I'm involved in a project to migrate away from old ASA firewalls to a Palo Alto solution.
The process has gone well, but my peers and I are stumped by an odd issue and are looking for troubleshooting advice.
We have a number of HTTPS hosts in a DMZ, NAT'ed to be available to the public internet. Systems from all over the world can access these HTTPS hosts fine. That is, except for a small group of individuals all using a particular local ISP. The trouble is, this issue did not exist with the previous ASA firewalls.
The security policy is simple: allow TCP 443 in. People with this issue will intermittently have difficulty fully loading an HTTPS site behind our Palos. Pages will partially load or time out. Chrome debug logs from the end user's perspective show timeouts on connection attempts.
If a user with the issue retries enough, they will eventually establish a session and have no trouble. The HTTPS hosts in our DMZ show connection attempts from users on this ISP, and from the hosts' perspective traffic seems to just stop.
No drops or blocks appear in the threat or security logs on the Palo, and we don't see this behavior with users on any other ISP.
The ISP has insisted they are not mangling traffic, and from what I've seen I agree, but the problem persists. Ordinarily I'd write this off as an ISP doing something odd, but the problem didn't exist when our prior ASA firewalls were in place.
The issue feels like a content inspection or SSL decryption issue, but we've confirmed that's not in play. We've also confirmed that no outside IDS systems are mangling or dropping traffic. We don't see any asymmetric routing in play, so it does not appear the Palo is losing state and dropping traffic. Everything we can see appears the same as it was, but we continue to have issues with people using this one particular ISP.
My peers and I are looking for further troubleshooting insights or advice.
Any advice would be greatly appreciated.
On your NAT policy, do you have it set up as bi-directional? That could be the asymmetric routing issue you are referring to. Also check this article out for Direct Server Return:
Bi Directional NAT:
Just some thoughts.
Secondly, you may not be seeing dropped logs if you haven't clicked on the interzone-default rule and overridden it to log. What is the allow rule that traffic is hitting on the Palo? If you have the service set to application-default, clone the rule, place the clone underneath, and set its service to any port, just to see if any traffic is switching ports during the lifecycle of the session.
Thanks for the reply. Fair question and suggestions. Yes, the NAT is bidirectional, and we aren't load balancing between multiple upstream ISPs. There's only one return route back to the problem clients. The issue does resemble asymmetric routing, but we are looking for some kind of proof. We're struggling to understand why clients on this one particular local ISP are having difficulties when no one else is.
Logging interzone-default is something we hadn't considered. It's worth a shot. Thanks.
As for the allow rule, the current rule is:
allow any external IP, TCP 443 to the NAT'ed DMZ address, application type ssl
That works for every external client except for clients using this one particular ISP's service.
I've gone so far as to push an allow rule for any protocol, port, and zone (including any application) from a given source IP to the top of the rule base, and it made no impact on the problem.
Typically you want either ports or application in a policy rule, not both. To mirror the ASA rule, you should have Application set to any and Service set to "service-https". This will allow all TCP 443 traffic in. Application SSL does not include all 443 traffic; it is based on the Palo Alto App-ID. I have seen traffic that doesn't match SSL but is on 443.
What I typically do is create 2 rules. The first rule is to allow Application SSL with Service "application-default". The next rule is as explained above. I will monitor the rules for a while and eventually disable the 443 rule.
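To make the two-rule idea concrete, here is a minimal sketch in Python of top-down, first-match evaluation. It is a simplified model, not how PAN-OS is implemented: the rule names (allow-ssl, allow-tcp-443) and the port tables are made up for illustration, and zones, addresses, and mid-session App-ID changes are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    application: str  # App-ID name, or "any"
    service: str      # "application-default", "service-https", or "any"


# Illustrative tables only (assumptions, not exported from a firewall).
APP_DEFAULT_PORTS = {"ssl": {443}}
SERVICE_PORTS = {"service-https": {443}}


def rule_matches(rule: Rule, app: str, port: int) -> bool:
    """Return True if the identified app and destination port satisfy the rule."""
    if rule.application != "any" and rule.application != app:
        return False
    if rule.service == "any":
        return True
    if rule.service == "application-default":
        # Only the application's own default ports are accepted.
        return port in APP_DEFAULT_PORTS.get(app, set())
    return port in SERVICE_PORTS.get(rule.service, set())


def first_match(rules: list[Rule], app: str, port: int) -> str:
    """Walk the rule base top-down and return the name of the first matching rule."""
    for rule in rules:
        if rule_matches(rule, app, port):
            return rule.name
    return "interzone-default"  # nothing matched; falls through to the default rule


# Hypothetical rule names mirroring the two rules described above.
RULES = [
    Rule("allow-ssl", "ssl", "application-default"),
    Rule("allow-tcp-443", "any", "service-https"),
]

print(first_match(RULES, "ssl", 443))          # traffic identified as ssl
print(first_match(RULES, "unknown-tcp", 443))  # 443 traffic not identified as ssl
```

Watching which rule takes the hits tells you whether anything on 443 is not being identified as ssl before you disable the broader rule.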
Here is a picture of what I am explaining.
I am not sure this will fix the issue with the one ISP. Is this ISP the same as the one the PA FW is using?
Thanks for the insights and experiences with application definitions and tuning policies. I'll certainly keep your suggestion in mind; it sounds like a good strategy to determine exactly how the Palo sees the traffic passing through.
In this case the current "allow ssl" and "allow tcp port 443" rules seem to be working for everyone but users of this one ISP.
In answer to your question: yes, the Palo that was installed in place of the old ASAs is using the same connectivity. We essentially swapped the ASAs for Palos and copied over the rules.
To eliminate the possibility of a policy causing the issue, I've gone so far as to push an allow rule for any protocol, port, etc. (including any application) from a given source IP to the top of the rule base, and it made no impact on the problem.
It's really odd.
Thanks, it may come to a PA TAC case. My concern is that I'm struggling to define the problem, and I worry PA TAC may write it off as a problem with these particular users' internet provider. Ideally I'd have some definitive evidence to point to. Users on this problem ISP worked fine over the ASAs, so it feels like a hidden Palo setting somewhere... MTU or fragmentation, perhaps? Who knows.
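On the MTU guess, the arithmetic worth checking in a capture is the MSS each side advertises in its SYN. A rough sketch, assuming IPv4 and no IP or TCP options; the PPPoE figure is only an example of a smaller path MTU, and nothing in this thread confirms the problem ISP uses it.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options


def max_mss(path_mtu: int) -> int:
    """Largest TCP payload that fits in one unfragmented IPv4 packet."""
    return path_mtu - IPV4_HEADER - TCP_HEADER


# Example path MTUs only; the real values have to come from a capture or a path MTU test.
for label, mtu in [("plain Ethernet", 1500), ("PPPoE (8-byte overhead)", 1492)]:
    print(f"{label}: MTU {mtu} -> MSS {max_mss(mtu)}")
```

If the affected clients advertise a smaller MSS than everyone else, that would be one concrete difference to show TAC.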
To answer your question, the internet provider our users are having issues with is different from our internet provider.
We actually have a separate secondary datacenter, with its own HTTPS hosts, its own different internet provider, and a similar Palo setup providing inbound NAT for outside users to the HTTPS hosts. This same group of users on the problem ISP intermittently can't access that secondary datacenter either.
I had an issue with accessing ic3.gov. PA TAC was very helpful in troubleshooting this issue. We found a workaround, but the final answer did end up being "contact the website owner", which in my case was the FBI. Like that is going to happen. I temporarily rerouted that traffic through our DR site, which worked fine. After a public IP and ISP migration, the issue went away.
We did find a workaround. Based on packet captures, they determined that packets were passing in and out but the handshake never got a SYN/ACK. This is from my TAC case:
Facing an issue with accessing a particular website.
The "out-of-window packets dropped" counter incremented. Took captures with just the server IP specified and confirmed the server is sending a challenge ACK in response to the SYN, rather than a SYN/ACK.
To allow challenge ACK:
Device > Setup > Session > Edit TCP Settings > Check "Allow Challenge ACK" (this is "Allow arbitrary ACK in response to SYN")
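For anyone going through their own captures, here is a small sketch of how the first reply to a SYN can be classified from its TCP flags byte. The flag bit values are the standard TCP ones; the function name and the labels it returns are just illustrative. A bare ACK in reply to a SYN is the challenge ACK behavior described in RFC 5961, which a server typically sends when it believes a connection on that same address and port pair already exists.

```python
# Standard TCP flag bits.
SYN, RST, ACK = 0x02, 0x04, 0x10


def classify_reply_to_syn(flags: int) -> str:
    """Label the first packet a server sends back after receiving a SYN."""
    if flags & RST:
        return "reset"
    if flags & SYN and flags & ACK:
        return "syn-ack"        # normal second step of the three-way handshake
    if flags & ACK:
        return "challenge-ack"  # ACK without SYN: what the TAC case captured
    return "other"


print(classify_reply_to_syn(0x12))  # SYN+ACK
print(classify_reply_to_syn(0x10))  # ACK only
```

If the replies you see classify as challenge-ack, the "Allow Challenge ACK" setting above is the one to test.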