Inbound Policy-Based Forwarding Issue - Intermittent loss of connectivity


L2 Linker

Hello,

 

Got a strange one that I am hoping someone with deep knowledge of PBF and symmetric return can advise on.

 

We have two (2) virtual routers because we have two different ISPs. We are migrating off one ISP so that it can finally be decommissioned.

 

Most of the internal DMZs are on VR1

New VR2 is the new ISP

We have eBGP between VRs using loopbacks to share hundreds of internal routes

 

We have multiple outbound PBF rules pushing traffic out from VR1 to VR2, which are zone-type PBF rules

We have several inbound PBF rules, all interface-type, pushing traffic to VR1 from VR2, all with Enforce Symmetric Return

 

One of these inbound services and PBF rules is intermittently failing.

 

Packet captures show:

 

Client to server reaching the Palo Alto in VR2

Palo Alto forwards that onto destination server in VR1 and applies the required NAT

Server returns TCP-SYN-ACK data to Palo Alto

However, the Palo Alto does not send that TCP SYN-ACK on to the client; it is seen in the drop captures instead

 

At this stage multiple TCP retransmissions ensue from both the client and the server, but these are also dropped until a TCP RST is sent from the server, which again is dropped
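For anyone who wants to confirm the same pattern in their own captures, here is a minimal Python sketch that counts the server's replies seen at the drop stage. It assumes the capture has already been exported into simple records. The `CapturedPacket` type, its field names, and the sample addresses are hypothetical, and NAT is ignored for simplicity.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class CapturedPacket:
    stage: str        # capture stage name, e.g. "receive", "transmit", "drop"
    src: str          # source IP address
    dst: str          # destination IP address
    flags: frozenset  # TCP flags set on the packet, e.g. frozenset({"SYN", "ACK"})


def dropped_server_replies(packets, server_ip):
    """Count drop-stage packets sent by the server, grouped by TCP flags."""
    counts = Counter()
    for p in packets:
        if p.stage == "drop" and p.src == server_ip:
            counts["+".join(sorted(p.flags))] += 1
    return dict(counts)


# Hypothetical records using documentation address ranges (not real data).
CLIENT, SERVER = "203.0.113.10", "192.0.2.20"
sample = [
    CapturedPacket("receive", CLIENT, SERVER, frozenset({"SYN"})),
    CapturedPacket("transmit", CLIENT, SERVER, frozenset({"SYN"})),
    CapturedPacket("drop", SERVER, CLIENT, frozenset({"SYN", "ACK"})),
    CapturedPacket("drop", SERVER, CLIENT, frozenset({"SYN", "ACK"})),
    CapturedPacket("drop", SERVER, CLIENT, frozenset({"RST"})),
]

print(dropped_server_replies(sample, SERVER))
```

A non-empty result for a session that the session browser shows as established would match the behaviour described above: the server's replies reach the firewall but never leave it.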

 

Session browser shows the session is built, with NAT applied and the symmetric return flag checked

 

This is an intermittent issue lasting several hours.  Other inbound PBF rules and services do not seem to be affected.

 

PA support are currently baffled and have not yet stumped up any real next steps.

 

We moved from PA-3020s on 9.1.x to new PA-5410s on 10.2.5 back in Nov 2023, which is when this issue began.

 

Are there limits on the number of PBF rules?

Are there limits on the number of outbound PBF rules and inbound PBF rules?

 

CPU is showing as low, at 5-10% on the data plane, and the session table is at 39500 out of 5000000 at its peak

Health check on physical network through FW, switches and ISP infrastructure shows as all clear, with no errors. (We would see errors in other services that share this infrastructure)

 

Looking for any advice and advanced CLI commands that could help me troubleshoot this problem.

 

Regards

 

 

 

 

3 REPLIES

L4 Transporter

Hello @GrantCampbell4 - I am going to suggest raising a TAC case for this one; if you've seen the behaviour introduced between 9.1 and 10.2, and no other changes were made, a TAC case is the appropriate way forward.

Iain Robertson
Senior Customer Success Engineer, NGFW, Palo Alto Networks

L0 Member

Hi,

 

Having the same problem. In my case, the default route sends traffic to another VSYS connected via a physical cable.
Normally, the PBF enforces the return to go through the original ingress interface, but it is going out through the default-route one and, for some reason, the SOURCE MAC address is the one the PBF enforces.

 

I have had a TAC case open for more than two months, and they still have not found the issue.

 

Did you guys find a solution?

Hi,

 

Just an update:

 

Still, no progress from TAC (yes, 8 months).

We have analyzed this more deeply and we are sure the issue is with the PBF's Symmetric Return.

 

Here is the issue:

 

Scenario when working:

- Incoming packet comes from a non-default route ISP;

- PBF forwards the packet to the correct server destination on an Internal network;

- The server responds to the packet;

- The PBF Symmetric Return kicks in and sends the returning packet to the same incoming interface, and with the next hop defined;

 

Non-working Scenario:

- Incoming packet comes from a non-default route ISP;

- PBF forwards the packet to the correct server destination on an Internal network;

- The server responds to the packet;

- The PBF Symmetric Return kicks in and sends the returning packet to the default route ISP, and with the next hop defined;  ---->   (Yes, the destination MAC address of that packet is still the same as in the working scenario)
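
To make the difference between the two scenarios concrete, here is a small Python sketch that classifies one observed reply. It is only a triage aid for capture data, not a model of how PAN-OS implements symmetric return. The `ReturnObservation` type, its field names, and the interface and MAC values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReturnObservation:
    ingress_if: str    # interface the client's packet arrived on
    egress_if: str     # interface the server's reply actually left on
    expected_mac: str  # next-hop MAC expected for the symmetric return
    dst_mac: str       # destination MAC seen on the reply


def classify(obs):
    """Label a reply by comparing its egress interface and destination MAC."""
    same_if = obs.ingress_if == obs.egress_if
    same_mac = obs.expected_mac.lower() == obs.dst_mac.lower()
    if same_if and same_mac:
        return "symmetric"        # the working scenario
    if same_mac:
        return "wrong-interface"  # the non-working scenario described above
    if same_if:
        return "wrong-mac"
    return "asymmetric"


# Hypothetical values for illustration only.
working = ReturnObservation("ethernet1/1", "ethernet1/1",
                            "00:00:5e:00:53:01", "00:00:5E:00:53:01")
failing = ReturnObservation("ethernet1/1", "ethernet1/2",
                            "00:00:5e:00:53:01", "00:00:5e:00:53:01")

print(classify(working), classify(failing))
```

Running this over the replies captured during a failure window would show whether every bad packet falls into the "wrong-interface" bucket, meaning the right next-hop MAC on the wrong egress interface.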

 

 

We still have not identified any correlation that explains when the issue happens. It occurs for a couple of minutes to an hour, and then stops.

 

 

 

Will update you further when TAC makes progress.
