
How Google Uses Sitemaps

Written by Jonathan Dingman
10/28/2008 3:07 ET - Filed under Google Search

So everyone goes around talking about Google Sitemaps, but have you ever stopped to ask yourself how Google uses sitemaps on its own properties?

Take another look at the google.com robots.txt file and you'll see this line at the bottom: Sitemap: http://www.gstatic.com/s2/sitemaps/profiles-sitemap.xml. Open that file and you'll find it's a sitemap index pointing to a number of other sitemap files:

<?xml version='1.0' encoding='UTF-8'?>
<sitemapindex xmlns='http://www.sitemaps.org/schemas/sitemap/0.9'>
	<sitemap>
		<loc>http://www.gstatic.com/s2/sitemaps/sitemap-000.txt</loc>
		<lastmod>2008-10-15</lastmod>
	</sitemap>
	<sitemap>
		<loc>http://www.gstatic.com/s2/sitemaps/sitemap-001.txt</loc>
		<lastmod>2008-10-15</lastmod>
	</sitemap>
	<sitemap>
		<loc>http://www.gstatic.com/s2/sitemaps/sitemap-002.txt</loc>
		<lastmod>2008-10-15</lastmod>
	</sitemap>
	<!-- ...more <sitemap> entries follow... -->
</sitemapindex>
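The discovery trail above (a Sitemap: line in robots.txt leading to a sitemap index, which in turn lists child sitemaps) can be followed programmatically. Here is a minimal Python sketch that works on text you've already fetched, so no network code is shown. Per the sitemaps.org protocol, each `<sitemap>` entry in an index carries its address in a `<loc>` element plus an optional `<lastmod>`; the function names are hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Namespace declared by the sitemaps.org 0.9 schema.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemaps_from_robots(robots_txt: str) -> list[str]:
    """Return the URL from every 'Sitemap:' line in a robots.txt body."""
    urls = []
    for line in robots_txt.splitlines():
        # Split on the first colon only, so the "http:" inside the URL survives.
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls


def child_sitemaps(index_xml: str) -> list[tuple[str, str | None]]:
    """Return (loc, lastmod) pairs from a sitemap index; lastmod may be None."""
    root = ET.fromstring(index_xml)
    entries = []
    for entry in root.findall("sm:sitemap", NS):
        loc = entry.findtext("sm:loc", namespaces=NS)
        lastmod = entry.findtext("sm:lastmod", namespaces=NS)
        if loc:
            entries.append((loc.strip(), lastmod))
    return entries


if __name__ == "__main__":
    robots = "User-agent: *\nSitemap: http://www.gstatic.com/s2/sitemaps/profiles-sitemap.xml\n"
    print(sitemaps_from_robots(robots))

    # Abbreviated index in the same shape as the one above (one entry only).
    index = (
        "<sitemapindex xmlns='http://www.sitemaps.org/schemas/sitemap/0.9'>"
        "<sitemap><loc>http://www.gstatic.com/s2/sitemaps/sitemap-000.txt</loc>"
        "<lastmod>2008-10-15</lastmod></sitemap>"
        "</sitemapindex>"
    )
    for loc, lastmod in child_sitemaps(index):
        print(loc, lastmod)
```

In practice you would fetch each file over HTTP first; the parsing itself needs only the standard library.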

The list runs to about sitemap-029.txt right now, so that's roughly 30 separate sitemaps, each currently holding 25,000 lines (one URL per line). The sitemaps.org protocol allows no more than 50,000 URLs per sitemap file, so Google is using only half of what the standard permits.
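Splitting a large URL list into text sitemaps the way Google appears to (one URL per line, 25,000 per file, files numbered from sitemap-000.txt upward) takes only a few lines. This is a sketch under those assumptions: the helper names and example.com URLs are hypothetical, the 25,000 default merely mirrors the observed files, and the protocol's separate cap on file size is not checked here.

```python
from __future__ import annotations

MAX_URLS_PER_SITEMAP = 50_000  # sitemaps.org cap on URLs in one sitemap file


def chunk_urls(urls: list[str], per_file: int = 25_000) -> list[list[str]]:
    """Split URLs into consecutive groups of at most per_file entries."""
    if not 0 < per_file <= MAX_URLS_PER_SITEMAP:
        raise ValueError(f"per_file must be between 1 and {MAX_URLS_PER_SITEMAP}")
    return [urls[i:i + per_file] for i in range(0, len(urls), per_file)]


def build_text_sitemaps(urls: list[str], per_file: int = 25_000) -> dict[str, str]:
    """Map names like sitemap-000.txt to text sitemap bodies (one URL per line)."""
    return {
        f"sitemap-{n:03d}.txt": "\n".join(chunk) + "\n"
        for n, chunk in enumerate(chunk_urls(urls, per_file))
    }


if __name__ == "__main__":
    # Hypothetical profile URLs, just to exercise the chunking.
    urls = [f"http://example.com/profiles/{i}" for i in range(750_000)]
    files = build_text_sitemaps(urls)
    print(len(files), min(files), max(files))  # 30 sitemap-000.txt sitemap-029.txt
```

Keeping each file well under the cap leaves headroom, so a file can grow before it has to be split again.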

So what could Google possibly have in a database that it wants to make available to other search engines? S2 profiles: the public profile pages where users can share information through Google.

Google wants your profile to be available so other search engines can find it. But hold up a second: 25,000 lines (one profile each) per file, times roughly 30 sitemaps, comes to only about 750,000 profiles. That's not much given the size of Google's user base; it's actually a tiny fraction of its overall network.
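The estimate is simple multiplication, and it can be set beside the protocol's ceilings for comparison. The file count and per-file figure come from this post; the two caps are the sitemaps.org limits of 50,000 URLs per sitemap file and 50,000 sitemaps per index file.

```python
URLS_PER_FILE = 25_000           # lines observed in each Google text sitemap
FILE_COUNT = 30                  # sitemap-000.txt through about sitemap-029.txt
MAX_URLS_PER_FILE = 50_000       # sitemaps.org cap per sitemap file
MAX_SITEMAPS_PER_INDEX = 50_000  # sitemaps.org cap per sitemap index


def capacity(files: int, urls_per_file: int) -> int:
    """Total URLs a set of sitemap files can list."""
    return files * urls_per_file


observed = capacity(FILE_COUNT, URLS_PER_FILE)
same_files_full = capacity(FILE_COUNT, MAX_URLS_PER_FILE)
one_full_index = capacity(MAX_SITEMAPS_PER_INDEX, MAX_URLS_PER_FILE)

print(f"{observed:,}")         # 750,000
print(f"{same_files_full:,}")  # 1,500,000
print(f"{one_full_index:,}")   # 2,500,000,000
```

So the roughly 750,000 figure is not a ceiling imposed by the sitemap format; a single index could reference thousands of times more.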

Still, the bottom line is that Google wants these profiles to be crawlable and findable whether or not anything links to them, which is precisely the purpose of having a sitemap in the first place.
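To make that purpose concrete, here is a toy model in Python: a crawler that only follows links reaches some set of pages, and anything listed in the sitemap but outside that set is discoverable only through the sitemap. The link graph, seed URL, and example.com profile URLs are hypothetical, and real crawlers are far more involved.

```python
from __future__ import annotations

from collections import deque


def reachable_by_links(seeds: list[str], links: dict[str, list[str]]) -> set[str]:
    """Breadth-first walk of a link graph: every page a link-following crawler finds."""
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def sitemap_only(sitemap_urls: list[str], seeds: list[str], links: dict[str, list[str]]) -> set[str]:
    """Sitemap URLs that following links from the seeds would never reach."""
    return set(sitemap_urls) - reachable_by_links(seeds, links)


if __name__ == "__main__":
    # Hypothetical site: the home page links to one profile; the other is unlinked.
    links = {
        "http://example.com/": ["http://example.com/profiles/alice"],
        "http://example.com/profiles/alice": ["http://example.com/"],
    }
    sitemap = ["http://example.com/profiles/alice", "http://example.com/profiles/bob"]
    print(sitemap_only(sitemap, ["http://example.com/"], links))
    # {'http://example.com/profiles/bob'}
```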
