I have a B2B tunnel with a business partner. There are 22 proxy IDs, all defined host-to-host. The VPN peer is a Cisco firewall; I'm not sure of the model. The Phase 2 lifetime is 8 hours. One particular SA stops sending and receiving traffic at each Phase 2 renegotiation. When this happens, the SA shows active both on my PA-3250 (PAN-OS 9.1.10) and on the partner's Cisco. On my side I see encaps and no decaps; on his side he doesn't see my traffic for this SA coming in. No other SAs in this tunnel experience this issue. The only ways I have found to recover are to bounce the tunnel (not desirable, as it is in production and has other SAs that are just fine) or to remove the proxy ID for the affected host-to-host pair and re-add it. Either method works every time, until the next Phase 2 renegotiation. This tunnel is not new; it has been running fine for more than a year. There is nothing particularly special about this tunnel compared with my 80+ other B2B VPNs. This started happening about a week after upgrading from PAN-OS 9.0.13 to 9.1.10, but that is essentially the only change, and no other tunnels are affected.
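For anyone who wants to catch the "encaps but no decaps" symptom before the phone rings, here is a minimal sketch of the check. It assumes you already have some way to collect per-SA packet counters at two points in time. The `SaCounters` type, its field names, and the `find_stalled_sas` helper are hypothetical and are not part of any PAN-OS or Cisco API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SaCounters:
    """One snapshot of per-SA packet counters (hypothetical field names)."""
    encap_pkts: int
    decap_pkts: int


def find_stalled_sas(before, after, min_encap_delta=1):
    """Return names of SAs whose encap count grew while decap did not.

    `before` and `after` map a proxy-ID name to SaCounters taken at two
    points in time. An SA missing from the earlier snapshot is skipped.
    """
    stalled = []
    for name, new in after.items():
        old = before.get(name)
        if old is None:
            continue  # no baseline to compare against
        encap_delta = new.encap_pkts - old.encap_pkts
        decap_delta = new.decap_pkts - old.decap_pkts
        # Traffic is leaving on this SA but nothing is coming back.
        if encap_delta >= min_encap_delta and decap_delta <= 0:
            stalled.append(name)
    return sorted(stalled)
```

Counters may reset when an SA is rekeyed, so both snapshots should be taken within the same SA lifetime. A one-way flow would also trip this check, so it is only a hint, not proof of a dead SA.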
What could be affecting a single SA like this and not the others within the same tunnel? The Cisco engineer at the business partner site is competent. We've compared his encryption domain against my proxy IDs. We've made sure our SPIs match. He's working the same issue from his side, and neither of us is pointing a finger at the other; it could be either firewall or both.
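With 22 host-to-host pairs, comparing the two sides by eye is error-prone. The sketch below shows one way to do that comparison mechanically, assuming each side's selectors have been exported by hand as (local, remote) address pairs. The `compare_selectors` helper and the input format are hypothetical. The peer's entries are mirrored, because his local address is my remote address.

```python
import ipaddress


def _norm(addr):
    """Normalize '10.1.1.5' and '10.1.1.5/32' to the same string form."""
    return str(ipaddress.ip_network(addr, strict=False))


def compare_selectors(my_proxy_ids, peer_entries):
    """Compare my (local, remote) pairs with the peer's (local, remote) pairs.

    Returns two sorted lists, both written from my perspective:
    pairs only I have, and pairs only the peer has.
    """
    mine = {(_norm(local), _norm(remote)) for local, remote in my_proxy_ids}
    # Swap the peer's pairs so they read as (my local, my remote).
    theirs = {(_norm(remote), _norm(local)) for local, remote in peer_entries}
    return sorted(mine - theirs), sorted(theirs - mine)
```

Usage, with made-up documentation addresses:

```python
mine = [("10.1.1.5", "192.0.2.10"), ("10.1.1.6/32", "192.0.2.11")]
peer = [("192.0.2.10/32", "10.1.1.5/32"), ("192.0.2.12", "10.1.1.6")]
only_mine, only_peer = compare_selectors(mine, peer)
print(only_mine)  # [('10.1.1.6/32', '192.0.2.11/32')]
print(only_peer)  # [('10.1.1.6/32', '192.0.2.12/32')]
```

Two empty lists mean the selectors mirror each other exactly, which is what we found in our case.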
Thanks for the response, BPry. Yes, the tunnel and this particular SA are actively passing traffic until this situation occurs. The broader picture is that the host address on my local side of the proxy ID/encryption domain is a PAT address used by a few dozen users to connect to a host at the partner site. When Phase 2 renegotiates and this particular SA drops, my users are actively accessing the host at the partner site and are disconnected - they see it happen and my phone immediately lights up. I'm sure TAC would like to analyze this while it's in a failed state, but this is a critical system and my priority is to get it back into operation as quickly as possible.
I've been working with VPNs for over 20 years on a variety of platforms. There have been a couple of times on Cisco ASAs where unexplained weirdness was solved only by tearing down the tunnel config and re-building it from scratch. Maybe this is one of those times. I've got nothing else.
Following up - over the weekend I worked with the partner's engineer and we tore down and rebuilt the VPN on both sides, which appears to have fixed the issue. There have been no more SA drops. I suspect the issue was on the Cisco side, since in a previous life I had to tear down and rebuild VPNs on ASAs a few times to cure unexplained weird behavior, but I can't say for sure. Bottom line is that it's fixed.