Screwing up with Googlebot

Posted in blogging on @340 by pjh

Yesterday I had a heart-stopping experience with my site and Googlebot. I’d just signed up with Google Sitemaps and had been crawled. Then I realized that although my blog homepage was reachable, none of the article pages were. Neither was robots.txt, and I wasn’t sure whether a 403 Forbidden on that file would be treated as a missing file or as a deny *.

Happily, it’s treated as a missing file, so Googlebot cheerfully crawled all of my site that it could see. Unfortunately, that was only the home page; everything else came back with a 500 Server Error.

Luckily it was easy to fix. I’d recently rearranged my [VirtualHost][] settings and hadn’t tested them thoroughly. All of the pages that I visit regularly (the home page and the admin/edit pages) were working fine, so I never noticed that the rest were broken. Apologies to anyone who got errors because of it.
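A quick smoke test over a handful of URLs, run after any server config change, would catch this kind of breakage. The following is only a sketch using Python's standard library; the function names `fetch_status` and `broken_pages` are made up for illustration, and the URL list is whatever pages matter on your own site.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url, or None if no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 4xx and 5xx responses are raised as exceptions; the code is on the error.
        return error.code
    except urllib.error.URLError:
        # DNS failure, refused connection, timeout and similar.
        return None


def broken_pages(urls, fetch=fetch_status):
    """Return (url, status) pairs for every URL that did not answer 200."""
    failures = []
    for url in urls:
        status = fetch(url)
        if status != 200:
            failures.append((url, status))
    return failures
```

Calling `broken_pages` with the home page, robots.txt, and one or two article URLs should return an empty list when everything is healthy. The `fetch` parameter exists only so the check can be tried out without touching the network.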

Google cheerfully accepts a plain text sitemap, so after fixing the problem, I quickly created a short list of key URLs, uploaded it to the site, and told Google about it. Before long (I hope) Google will come along and revisit all of my pages based on that list.
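A plain text sitemap is just one full URL per line in a UTF-8 file, with nothing else in it. As a rough sketch (the helper name `text_sitemap` is hypothetical, and the URLs you pass in are your own), building one from a list could look like this:

```python
def text_sitemap(urls):
    """Build plain text sitemap content: one URL per line, no duplicates."""
    seen = set()
    lines = []
    for url in urls:
        url = url.strip()
        # Skip blank entries and anything already listed.
        if url and url not in seen:
            seen.add(url)
            lines.append(url)
    return "\n".join(lines) + "\n"
```

Write the returned string to a file (for example `sitemap.txt`, a name chosen here for illustration) with UTF-8 encoding and upload it alongside the rest of the site.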

Even easier, as I found out this evening, is to use Foo’s great WordPress plugin to generate a new sitemap with every post. It’s very easy to install: just a single .php file in your plugins directory. Configuration is on the Options > Sitemaps page.

One word of caution: remember that page priorities are relative. They only rank your own pages against each other, so use low priorities to emphasize the high ones. The 1 you put against your home page is only meaningful because other pages have a lower priority. Don’t be afraid to give unpopular posts a low priority like 0.1; it’s the overall effect you’re after.
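To make the priority idea concrete, here is a small sketch that writes an XML sitemap with a spread of priorities. It assumes the sitemaps.org 0.9 format, where each `<url>` entry may carry a `<priority>` between 0.0 and 1.0; the function name `xml_sitemap` and the example URLs are invented for illustration, and this is not how the plugin itself works.

```python
import xml.etree.ElementTree as ET

# Namespace of the sitemaps.org 0.9 schema (assumed format for this sketch).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def xml_sitemap(priorities):
    """Build a sitemap from a mapping of URL -> priority (0.0 to 1.0)."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, priority in priorities.items():
        if not 0.0 <= priority <= 1.0:
            raise ValueError(f"priority for {loc} must be between 0.0 and 1.0")
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "priority").text = f"{priority:.1f}"
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical site: the spread of values is what carries the information.
example = xml_sitemap({
    "http://example.com/": 1.0,
    "http://example.com/popular-post": 0.8,
    "http://example.com/forgotten-post": 0.1,
})
```

Setting every page to 1.0 would produce a valid file that says nothing, which is why the low values matter as much as the high ones.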