<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Can Googlebot See You?  Let&#8217;s Find Out</title>
	<atom:link href="http://www.ginside.com/2007/1077/can-googlebot-see-you-lets-find-out/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.ginside.com/2007/1077/can-googlebot-see-you-lets-find-out/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=can-googlebot-see-you-lets-find-out</link>
	<description></description>
	<lastBuildDate>Sat, 10 Dec 2011 14:16:52 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
	<item>
		<title>By: Jonathan Dingman</title>
		<link>http://www.ginside.com/2007/1077/can-googlebot-see-you-lets-find-out/#comment-440</link>
		<dc:creator>Jonathan Dingman</dc:creator>
		<pubDate>Tue, 28 Aug 2007 00:17:11 +0000</pubDate>
		<guid isPermaLink="false">http://www.ginside.com/2007/1077/can-googlebot-see-you-lets-find-out/#comment-440</guid>
		<description>John,

I actually went through and verified it: if a section has no listing of its own, such as Disallow: &lt;whatever&gt;, the crawler takes whatever is listed above. With the file as I have it, Yahoo!&#039;s Slurp obeys both sections, and it has done exactly that in practice.

Personally, I don&#039;t want Googlebot to be the only robot allowed to crawl my site.  MSN and Yahoo! still send me a decent bit of traffic which I&#039;m appreciative of.

It probably is the case that only the major search engines obey the wildcards, but those are all that I&#039;m concerned with in this case.  I&#039;m just as happy not caring about the rest, since most people who search for my site use one of those three major search engines.

But in any case, yes, you would want to use the fully qualified paths if you were wanting to exclude *every* robot that obeys the rules.

Thanks for your two cents, John.</description>
		<content:encoded><![CDATA[<p>John,</p>
<p>I actually went through and verified it: if a section has no listing of its own, such as Disallow: &lt;whatever&gt;, the crawler takes whatever is listed above. With the file as I have it, Yahoo!&#8217;s Slurp obeys both sections, and it has done exactly that in practice.</p>
<p>Personally, I don&#8217;t want Googlebot to be the only robot allowed to crawl my site.  MSN and Yahoo! still send me a decent bit of traffic which I&#8217;m appreciative of.</p>
<p>It probably is the case that only the major search engines obey the wildcards, but those are all that I&#8217;m concerned with in this case.  I&#8217;m just as happy not caring about the rest, since most people who search for my site use one of those three major search engines.</p>
<p>But in any case, yes, you would want to use the fully qualified paths if you were wanting to exclude *every* robot that obeys the rules.</p>
<p>Thanks for your two cents, John.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: JohnMu</title>
		<link>http://www.ginside.com/2007/1077/can-googlebot-see-you-lets-find-out/#comment-439</link>
		<dc:creator>JohnMu</dc:creator>
		<pubDate>Mon, 20 Aug 2007 14:56:47 +0000</pubDate>
		<guid isPermaLink="false">http://www.ginside.com/2007/1077/can-googlebot-see-you-lets-find-out/#comment-439</guid>
		<description>I&#039;m fairly certain you don&#039;t want your robots.txt like that. By having both generic and specific sections, you are telling the Yahoo crawler to apply the crawl-delay but letting it crawl everything, including the feeds. The specific section overrides any setting you have in your generic section.

For instance:
&lt;blockquote&gt;user-agent: *
disallow: /

user-agent: Googlebot
disallow:&lt;/blockquote&gt;

would allow the Googlebot to crawl everything while disallowing access to all other crawlers. 

Similarly, you need to keep in mind that only Google, Yahoo and MSN (&quot;only&quot; probably 99.0%, but anyway :-)) actually use wildcards. This means that any other crawler that sees those disallow lines will not be able to parse them and will treat those URLs as allowed. If you need to keep them out, you would have to put the full URLs into your robots.txt.</description>
		<content:encoded><![CDATA[<p>I&#8217;m fairly certain you don&#8217;t want your robots.txt like that. By having both generic and specific sections, you are telling the Yahoo crawler to apply the crawl-delay but letting it crawl everything, including the feeds. The specific section overrides any setting you have in your generic section.</p>
<p>For instance:</p>
<blockquote><p>user-agent: *<br />
disallow: /</p>
<p>user-agent: Googlebot<br />
disallow:</p></blockquote>
<p>would allow the Googlebot to crawl everything while disallowing access to all other crawlers. </p>
<p>Similarly, you need to keep in mind that only Google, Yahoo and MSN (&#8220;only&#8221; probably 99.0%, but anyway :-)) actually use wildcards. This means that any other crawler that sees those disallow lines will not be able to parse them and will treat those URLs as allowed. If you need to keep them out, you would have to put the full URLs into your robots.txt.</p>
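<p>A minimal sketch of the precedence rule described above, using Python&#8217;s standard urllib.robotparser module. EXAMPLE is the file from the blockquote. MIXED is a hypothetical layout (the robots.txt under discussion is not reproduced here), and the helper name, the example.com URLs and the crawler name SomeOtherBot are illustrative assumptions as well. This only shows how one parser reads the file; a real crawler may behave differently.</p>

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, agent, url):
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# The example from the comment: block everyone, then exempt Googlebot.
EXAMPLE = """\
user-agent: *
disallow: /

user-agent: Googlebot
disallow:
"""

# Hypothetical layout: a generic section with a Disallow rule, plus a
# Slurp-specific section that only sets a crawl delay.
MIXED = """\
User-agent: *
Disallow: /feed/

User-agent: Slurp
Crawl-delay: 5
"""

# The empty "disallow:" in the Googlebot section permits everything.
print(allowed(EXAMPLE, "Googlebot", "http://example.com/page"))     # True
# Any other crawler falls back to the generic section and is blocked.
print(allowed(EXAMPLE, "SomeOtherBot", "http://example.com/page"))  # False

# Slurp matches its own section, which has no Disallow lines, so this
# parser does not apply the generic /feed/ rule to it.
print(allowed(MIXED, "Slurp", "http://example.com/feed/"))          # True
print(allowed(MIXED, "SomeOtherBot", "http://example.com/feed/"))   # False
```

<p>With this parser, keeping Slurp out of /feed/ would mean repeating the Disallow line inside the Slurp section.</p>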
]]></content:encoded>
	</item>
</channel>
</rss>

