12-05-2017 12:24 AM
Hi Guys,
I'm slightly confused about how URL filtering works: once you define a URL Filtering profile on a rule, how does a URL or website get categorized? I mean, how does the PA know that a given website belongs to a given category?
I know the PA periodically downloads a URL filtering DB. So is a website simply compared against this DB and the action taken, or am I missing something here?
Please help me understand this.
Thanks
12-05-2017 01:26 AM
Does this help answer your question?
Management plane cache: The seed database is placed into the management plane (MP) cache to provide quick URL lookups. The MP cache will pull more URLs and categories from the PAN-DB core as users access sites that are not currently in the MP cache. If the URL requested by a user is “unknown” to Palo Alto Networks, the URL will be examined, categorized, and implemented as appropriate.
Source: https://researchcenter.paloaltonetworks.com/2014/10/web-security-tips-pan-db-works/
12-05-2017 02:45 AM
In short, when a web-browsing session is started, the client sends an HTTP GET (or, in the case of SSL, the server returns a certificate with CN www.website.com).
The firewall checks whether it has a locally cached category for the URL. If it does, it simply applies policy (allow, alert, block, continue). If not, it does a cloud lookup and asks the URL filtering cloud for the category of that URL.
Once the category is returned, the firewall uses it to apply policy and also puts the URL + category in the cache, so next time it can retrieve the category from the local cache.
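To make the flow above concrete, here is a minimal Python sketch of "check the cache, fall back to the cloud, then cache the answer". This is only an illustration and not how PAN-OS is actually implemented: `POLICY`, `categorize`, `apply_policy`, and the `cloud_lookup` callable are all hypothetical names, and the default action for an unlisted category is an assumption.

```python
from typing import Callable, Dict

# Hypothetical policy table mapping a category to an action.
# The category names and actions are illustrative only.
POLICY: Dict[str, str] = {
    "social-networking": "block",
    "news": "allow",
    "unknown": "alert",
}


def categorize(url: str, cache: Dict[str, str],
               cloud_lookup: Callable[[str], str]) -> str:
    """Return the category for url, consulting the local cache first."""
    category = cache.get(url)
    if category is None:
        # Cache miss: ask the (stand-in) cloud service, then remember the answer.
        category = cloud_lookup(url)
        cache[url] = category
    return category


def apply_policy(url: str, cache: Dict[str, str],
                 cloud_lookup: Callable[[str], str]) -> str:
    """Look up the category and map it to an action."""
    category = categorize(url, cache, cloud_lookup)
    # Assumption: categories missing from the table default to "allow".
    return POLICY.get(category, "allow")
```

The second request for the same URL never reaches `cloud_lookup`, which is the point of the local cache.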
12-05-2017 03:01 AM
Hi reaper,
Thanks for your response.
If I understand correctly, this means the PA has info about all the URLs on the internet in its cache...?
And if it isn't able to find the category, it will reach out to the cloud to resolve the category... in which case the PA needs internet access to query the cloud.
Thanks
12-05-2017 03:13 AM
Hi @mahmoodm
well, not 'all' 🙂
The periodic download is a 'seed' file with popular URLs (these may not apply to your specific organization) to prepopulate your cache, but the cache, like any cache, is fleeting: entries will time out and get purged.
The seed file is only there to save you some lookups for very popular websites, but these may only be a fraction of the URLs your organization uses.
So for URL filtering to work optimally, you will need internet access, as eventually most of the lookups will happen in the cloud (the cache is simply there to serve the most popular URLs in your organization, as only the entries you use frequently will be kept).
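As a rough illustration of a seeded cache whose entries time out, here is a small Python sketch. It is a generic TTL cache, not the firewall's actual data structure: the `TtlCache` class, the `ttl_seconds` value, and the choice to refresh an entry's lifetime whenever it is read (so frequently used URLs stick around) are all assumptions made for the example.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TtlCache:
    """Toy URL-category cache in which unused entries expire."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be simulated
        self._entries: Dict[str, Tuple[str, float]] = {}

    def put(self, url: str, category: str) -> None:
        """Store a category with a fresh expiry time."""
        self._entries[url] = (category, self._clock() + self._ttl)

    def get(self, url: str) -> Optional[str]:
        """Return the cached category, or None if missing or expired."""
        entry = self._entries.get(url)
        if entry is None:
            return None
        category, expires_at = entry
        if self._clock() >= expires_at:
            # Entry timed out: purge it so the caller falls back to the cloud.
            del self._entries[url]
            return None
        # Assumption: reading an entry renews it, so popular URLs stay cached.
        self.put(url, category)
        return category


def load_seed(cache: TtlCache, seed: Dict[str, str]) -> None:
    """Prepopulate the cache from a seed of popular URLs."""
    for url, category in seed.items():
        cache.put(url, category)
```

A seeded entry that nobody requests simply ages out, while one that is requested regularly keeps getting renewed; anything that has aged out needs a cloud lookup again, hence the need for internet access.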
12-05-2017 03:32 AM
Hi reaper,
Thanks for the response, it clears things up now.
Actually, I didn't find this info documented anywhere.
Thanks