web crawling for google only

bino150 · ‎06-23-2014

I know this topic has been discussed before, but there is never a clear answer. It seems it is not possible to allow only specific web crawlers such as Google. If that's the case, I assume most of you have web-crawling enabled for your site only? Google is still getting blocked from crawling our site. I was hesitant to enable web-crawling, but it sounds like that's the only way it's going to work.

pulukas · ‎06-25-2014

There is no easy way to identify Googlebot in a rule so that it would be the only crawler allowed.

Google does provide a way to verify the bot via a reverse DNS lookup of the bot's IP address. You take the source IP of the crawler and check in DNS that it resolves to a specific Google domain, as outlined in this document. The issue is that there is no way to perform this kind of check in a PA rule at this point.

Verifying Googlebot - Webmaster Tools Help
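
For anyone who wants to run that check outside the firewall (on the web server or against logs, for example), here is a minimal Python sketch of the reverse-then-forward DNS test. The function name `is_verified_googlebot` and the injectable lookup parameters are my own for illustration, and the suffix list assumes the `googlebot.com` and `google.com` domains named in Google's document. It handles IPv4 only, since `socket.gethostbyname_ex` does not resolve IPv6.

```python
import socket

# Hostname suffixes assumed from Google's verification document.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(ip,
                          reverse_lookup=socket.gethostbyaddr,
                          forward_lookup=socket.gethostbyname_ex):
    """Return True if ip reverse-resolves to a Google hostname that
    forward-resolves back to the same ip (IPv4 only)."""
    try:
        # Step 1: reverse DNS (PTR) lookup on the source address.
        hostname = reverse_lookup(ip)[0]
    except OSError:
        return False

    # Step 2: the hostname must sit under one of the Google domains.
    if not hostname.lower().rstrip(".").endswith(GOOGLE_SUFFIXES):
        return False

    try:
        # Step 3: forward lookup must map the hostname back to the same IP,
        # otherwise anyone controlling their own PTR record could pass.
        addresses = forward_lookup(hostname)[2]
    except OSError:
        return False
    return ip in addresses
```

Calling `is_verified_googlebot(source_ip)` with the address from a log entry does live DNS queries, so results are worth caching if this runs per request.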

Googlebot also uses a specific user agent string, so you could create a custom vulnerability signature to look for the string. There are two issues with this method. One, it is a good way to block but not a way to permit only those hits. Two, people fake these user agents because they know they are trusted, so a fake would pass the test without being the real deal.

Google crawlers - Webmaster Tools Help
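
To show why the user agent alone is weak evidence, here is a small Python sketch of a string match. The helper name `claims_to_be_googlebot` is hypothetical, and the sample header is the commonly published Googlebot user agent; any client can send the same value.

```python
def claims_to_be_googlebot(user_agent):
    """Return True if the User-Agent header mentions Googlebot.

    This only tests what the client claims about itself; it does not
    prove the request came from Google.
    """
    return "googlebot" in user_agent.lower()


# A header like this is trivial for any client to copy.
sample_ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
matched = claims_to_be_googlebot(sample_ua)
```

A match here is at best a first filter; a spoofed request passes it just as easily as the real crawler does.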

Steve Puluka BSEET - IP Architect - DQE Communications (Metro Ethernet/ISP)
ACE PanOS 6; ACE PanOS 7; ASE 3.0; PSE 7.0 Foundations & Associate in Platform; Cyber Security; Data Center

bino150 · ‎07-08-2014

Thank you.
