<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: web crawling for google only in General Topics</title>
    <link>https://live.paloaltonetworks.com/t5/general-topics/web-crawling-for-google-only/m-p/24759#M18044</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;There is no easy way to have the googlebot identified in a rule so that it would be the only one allowed to crawl.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Google does provide a way to verify the bot via reverse dns lookup of the bot ip address.&amp;nbsp; Here you check the source of the crawler and lookup in DNS to a specific google subdomain as outlined in this document.&amp;nbsp; The issue is there is no way to have this kind of check in a PA rule at this point.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://support.google.com/webmasters/answer/80553?hl=en" title="https://support.google.com/webmasters/answer/80553?hl=en"&gt;Verifying Googlebot - Webmaster Tools Help&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Google bot also uses a specified user agent string.&amp;nbsp; So you could create a custom vulnerability signature to look for the string.&amp;nbsp; There are two issues with this method.&amp;nbsp; One it is a good way to block but not a way to permit only the hits.&amp;nbsp; And two there are people faking these user agents since they know they are trusted and would pass the test even without being the real deal.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://support.google.com/webmasters/answer/1061943" title="https://support.google.com/webmasters/answer/1061943"&gt;Google crawlers - Webmaster Tools Help&lt;/A&gt;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Thu, 26 Jun 2014 01:08:31 GMT</pubDate>
    <dc:creator>pulukas</dc:creator>
    <dc:date>2014-06-26T01:08:31Z</dc:date>
    <item>
      <title>web crawling for google only</title>
      <link>https://live.paloaltonetworks.com/t5/general-topics/web-crawling-for-google-only/m-p/24758#M18043</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I know this topic has been discussed before but there is never a clear answer. It seems it is not possible to allow only specific web crawlers such as google. If that's the case, I assume most of you have web-crawling enabled for your site only? Google is still getting blocked from crawling our site. I was hesitant to enable web-crawling but it sounds like that's the only way it's going to work.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 23 Jun 2014 23:18:10 GMT</pubDate>
      <guid>https://live.paloaltonetworks.com/t5/general-topics/web-crawling-for-google-only/m-p/24758#M18043</guid>
      <dc:creator>bino150</dc:creator>
      <dc:date>2014-06-23T23:18:10Z</dc:date>
    </item>
    <item>
      <title>Re: web crawling for google only</title>
      <link>https://live.paloaltonetworks.com/t5/general-topics/web-crawling-for-google-only/m-p/24759#M18044</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;There is no easy way to have the googlebot identified in a rule so that it would be the only one allowed to crawl.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Google does provide a way to verify the bot via reverse dns lookup of the bot ip address.&amp;nbsp; Here you check the source of the crawler and lookup in DNS to a specific google subdomain as outlined in this document.&amp;nbsp; The issue is there is no way to have this kind of check in a PA rule at this point.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://support.google.com/webmasters/answer/80553?hl=en" title="https://support.google.com/webmasters/answer/80553?hl=en"&gt;Verifying Googlebot - Webmaster Tools Help&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Google bot also uses a specified user agent string.&amp;nbsp; So you could create a custom vulnerability signature to look for the string.&amp;nbsp; There are two issues with this method.&amp;nbsp; One it is a good way to block but not a way to permit only the hits.&amp;nbsp; And two there are people faking these user agents since they know they are trusted and would pass the test even without being the real deal.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://support.google.com/webmasters/answer/1061943" title="https://support.google.com/webmasters/answer/1061943"&gt;Google crawlers - Webmaster Tools Help&lt;/A&gt;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 26 Jun 2014 01:08:31 GMT</pubDate>
      <guid>https://live.paloaltonetworks.com/t5/general-topics/web-crawling-for-google-only/m-p/24759#M18044</guid>
      <dc:creator>pulukas</dc:creator>
      <dc:date>2014-06-26T01:08:31Z</dc:date>
    </item>
    <item>
      <title>Re: web crawling for google only</title>
      <link>https://live.paloaltonetworks.com/t5/general-topics/web-crawling-for-google-only/m-p/24760#M18045</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Thank you.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 09 Jul 2014 04:50:53 GMT</pubDate>
      <guid>https://live.paloaltonetworks.com/t5/general-topics/web-crawling-for-google-only/m-p/24760#M18045</guid>
      <dc:creator>bino150</dc:creator>
      <dc:date>2014-07-09T04:50:53Z</dc:date>
    </item>
  </channel>
</rss>

