The SEO analysis I mentioned yesterday showed that six of our top ten clients have sitemap.xml files registered in Google's webmaster area. We always put priorities and last-modified ("lastmod") dates into the sitemap.xml files, setting lastmod to today's date where possible. These XML sitemaps are worth doing, if only because Google gives you all kinds of information about the links to your site and any trouble it is having crawling your site, information that is only available from the webmaster area at Google IF you have a sitemap for the site. We've discovered many a dead internal link using this tool from Google.
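For illustration, a bare-bones sitemap.xml entry of the kind we're describing looks something like this (the URL, date and priority here are placeholder values, not taken from any client site):

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>http://www.example.com/</loc>
        <lastmod>2006-05-01</lastmod>
        <priority>0.8</priority>
      </url>
    </urlset>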
It was no surprise to us that all of our top 10 sites at Google have a robots.txt file. We keep these simple.
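For the curious, "simple" means something along these lines (a generic sketch with a made-up directory name, not copied from any client's file):

    User-agent: *
    Disallow: /cgi-bin/

That lets every crawler in and just fences off one directory we don't want crawled.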
Of our top ten sites at Google, all are more than two years old. The oldest dates from October 1990 (ancient in internet years) and the youngest was created in February 2004, which also means that none of them are in the sandbox. Does this mean that Google likes sites that have built up a lot of links slowly over time? Probably.
All of the sites were a mixture of CSS and tables, and it didn't seem to make any difference whether a site was all pure, valid CSS or a bunch of dirty tables. It is my belief that Google strips all of that out and works from the same kind of text view you'd get in the Lynx browser. Matt Cutts said that the "Big Daddy" update of the last few months was about upgrading Googlebot to behave more like the Mozilla browser and less like the text-only Lynx browser on which Googlebot was based. If so, that's good news for all those webmasters who have been cramming their sites with JavaScript and Flash, as Lynx was absolutely awful at seeing or indexing any of the text in JavaScript menus.
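If you want something close to that text-only view of your own site, Lynx will give it to you; a quick way (the URL is a placeholder) is:

    lynx -dump http://www.example.com/

which prints the page as plain text with the links listed at the bottom, roughly the way an old-school text crawler would see it.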