06-23-2014 04:18 PM
I know this topic has been discussed before, but there has never been a clear answer. It seems it is not possible to allow only specific web crawlers such as Google. If that's the case, I assume most of you have web-crawling enabled for your site only? Google is still getting blocked from crawling our site. I was hesitant to enable web-crawling, but it sounds like that's the only way it's going to work.
06-25-2014 06:08 PM
There is no easy way to identify Googlebot in a rule so that it would be the only crawler allowed.
Google does provide a way to verify the bot via a reverse DNS lookup of the bot's IP address. You take the crawler's source IP and look it up in DNS to confirm that it resolves to a specific Google subdomain, as outlined in this document. The issue is that there is currently no way to perform this kind of check in a PA rule.
Verifying Googlebot - Webmaster Tools Help
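For anyone who wants to run this check outside the firewall (for example, on the web server or against logs), here is a minimal sketch of the reverse-then-forward DNS lookup. The function name, the injectable lookup parameters, and the domain suffix list are my own assumptions for illustration; confirm the current domains against Google's document before relying on it. It handles IPv4 only.

```python
import socket

# Assumption: hostname suffixes to accept; verify against Google's documentation.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Return True if ip reverse-resolves to a Google hostname that
    forward-resolves back to the same ip. Hypothetical helper, IPv4 only."""
    # Default to real DNS; the parameters allow swapping in fakes for testing.
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda host: socket.gethostbyname_ex(host)[2])

    try:
        hostname = reverse_lookup(ip)  # step 1: PTR lookup on the source IP
    except OSError:
        return False

    # Step 2: the hostname must end in one of the expected Google domains.
    # The leading dot in each suffix rejects names like "fakegooglebot.com".
    if not hostname.lower().rstrip(".").endswith(GOOGLE_SUFFIXES):
        return False

    try:
        addresses = forward_lookup(hostname)  # step 3: forward lookup
    except OSError:
        return False

    # Step 4: the forward lookup must include the original IP; otherwise
    # the PTR record could have been set by whoever controls the IP block.
    return ip in addresses
```

Calling `is_verified_googlebot(source_ip)` with no extra arguments uses live DNS. The forward lookup matters because a reverse record alone is controlled by the owner of the address range, not by Google.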
Googlebot also uses a specific user agent string, so you could create a custom vulnerability signature to look for the string. There are two issues with this method. First, a signature is a good way to block traffic, but not a way to permit only the matching hits. Second, people fake these user agents because they know they are trusted, so a fake would pass the test without being the real deal.
Google crawlers - Webmaster Tools Help
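To illustrate why the user agent alone is weak evidence, here is a rough sketch of a string match of the kind a signature would perform. The function name is hypothetical, and the sample header is one of the Googlebot user agent strings; any client can send the same value.

```python
def claims_to_be_googlebot(user_agent):
    """Return True if the User-Agent header mentions Googlebot.
    This only shows what the client claims; it proves nothing about the source."""
    return "googlebot" in user_agent.lower()


# A Googlebot-style header. A spoofing client can send exactly this string.
sample_ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
matches = claims_to_be_googlebot(sample_ua)
```

Because the match succeeds for anyone who copies the header, it only makes sense as a first filter, combined with a source check such as the DNS verification Google describes.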