We recently deployed several PA-5250s running PAN-OS 10.1.3, and there is an issue that comes and goes at random.
Latency for traffic passing through the firewalls spikes to 100-500 ms. I was able to capture one thing that looked peculiar: the flow_fpga_ingress_exception_err count was high (8169388322), and so was its rate (12468). However, I can't find a good definition of what this counter indicates.
I also caught the packet descriptor (on-chip) (average) reading 100 across the first two rows.
Unfortunately, I didn't capture the CPU core utilization at the same time.
Right now, they have told us that high flow_fpga_ingress_exception_err counts are expected behavior and nothing to worry about. As for the latency, we are just shotgunning a few changes to see if anything helps, such as reducing the port channel to a single link, possibly disabling offloading, and a couple of others. The last resort is downgrading to the preferred PAN-OS 9 release. I will let you know if I find anything.
The reason we suggest the downgrade is that we have one PA-5220 running 9 code, and it doesn't experience this issue. That's all we've got, though.
Any idea if there's any asymmetric routing going on?
A packet capture combined with the global counters may shed some light on this. If you can narrow it down to a sample source and destination, that would be perfect. Then check whether you get a drop pcap, and apply the same pcap filters to the global counters.
I'm sorry you are going through this; I'm sure you'll find the solution.
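If it helps with comparing the counters, here is a minimal Python sketch (my own helper, not a Palo Alto tool) that diffs two snapshots of global counter values taken with the same filter applied and reports which counters moved and at what per-second rate. It assumes you have already copied or parsed the counter names and values into dicts; the snapshot numbers and `example_counter` are made-up placeholders.

```python
"""Sketch: diff two snapshots of global counters to see which ones moved.

Assumption: counter values were copied by hand (or parsed from CLI output)
into plain dicts. All names and numbers in the usage section are hypothetical.
"""


def counter_deltas(before, after, interval_s):
    """Return {name: (delta, per_second_rate)} for counters that increased.

    Counters that stayed flat, or went down (e.g. after a counter clear),
    are left out of the result.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    result = {}
    for name, new_value in after.items():
        delta = new_value - before.get(name, 0)
        if delta > 0:
            result[name] = (delta, delta / interval_s)
    return result


if __name__ == "__main__":
    # Hypothetical snapshots taken 10 seconds apart with the same packet filter.
    before = {"flow_fpga_ingress_exception_err": 1_000, "example_counter": 500}
    after = {"flow_fpga_ingress_exception_err": 4_050, "example_counter": 500}
    for name, (delta, rate) in sorted(counter_deltas(before, after, 10).items()):
        print(f"{name}: +{delta} ({rate:.1f}/s)")
```

Only counters that actually increased between the two snapshots are printed, which keeps the output focused on what changed while the filtered traffic was flowing.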
At this point, since you are already at the pcap stage, I would run packet diagnostics with flow basic logging and look, at a low level, at what the firewall is doing with each packet/session in the flow logic.
I bet you are familiar with it or have already tried it, but if not, below is a good read:
My approach to reading these is different: I generate a tech support file (TSF), find the txt file in it, and open that in Notepad++.
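If Notepad++ struggles with a large file, a small Python sketch like the one below can pull out just the lines that mention a given counter. It assumes the TSF has already been extracted and the relevant txt file located; the command-line path is whatever file you point it at, and nothing here knows the TSF's internal layout.

```python
"""Sketch: pull every line mentioning a counter out of a text file from a TSF.

Assumption: the TSF has already been extracted and the .txt file located.
The script only does a plain substring search; it does not parse PAN-OS output.
"""
import sys
from pathlib import Path


def grep_counter(path, counter_name):
    """Return (line_number, line) pairs for lines containing counter_name."""
    matches = []
    # errors="replace" keeps the scan going if the file has stray non-UTF-8 bytes.
    with Path(path).open(encoding="utf-8", errors="replace") as fh:
        for lineno, line in enumerate(fh, start=1):
            if counter_name in line:
                matches.append((lineno, line.rstrip("\n")))
    return matches


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python grep_counter.py <extracted_tsf_file.txt>
    for lineno, line in grep_counter(sys.argv[1], "flow_fpga_ingress_exception_err"):
        print(f"{lineno}: {line}")
```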
We're having a very similar issue on our PA-5220 (PAN-OS 10.1.4). The latency comes and goes. CPU and memory usage are close to nothing, and the same goes for session utilization. However, every few seconds the flow_fpga_ingress_exception_err counter rises: the delta shows 50 more drops in one second and 3000 the next.
I also noticed something strange. We gather metrics with Prometheus (never mind the software) and monitor ifHCOutOctets and ifHCInOctets via SNMP, both on the firewall interfaces and on the (Cisco) switch ports they're connected to. We use the same formula for the bandwidth calculation on both, yet get massive differences: on the switchport, the nightly backups consume the whole 1 Gbit/s of bandwidth on our graphs, while over the same time period the matching firewall interface shows only ~700 Mbit/s. These are the two ends of the same wire!
I'm not saying it's related to flow_fpga_ingress_exception_err, but packet drops (a few thousand every few seconds) could explain the difference between the two measured values.
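For anyone wanting to reproduce the comparison, this is a minimal Python sketch of the usual "delta octets × 8 / delta seconds" calculation on two ifHC*Octets samples, with modulo handling for a Counter64 wrap. It is an assumption that this matches the formula used above, and it is not Prometheus's exact `rate()` logic; the 60-second sample values are hypothetical and were chosen to reproduce the 1 Gbit/s vs. ~700 Mbit/s gap.

```python
"""Sketch: turn two ifHCInOctets/ifHCOutOctets samples into a bit rate.

Assumption: this mirrors the usual delta-octets * 8 / delta-seconds formula;
it is not Prometheus's rate() implementation. Sample values are hypothetical.
"""

COUNTER64_MODULUS = 2 ** 64  # ifHCInOctets/ifHCOutOctets are Counter64 in IF-MIB


def bits_per_second(octets_t0, octets_t1, seconds):
    """Bit rate between two Counter64 octet samples, allowing for one wrap.

    Note: a counter reset (e.g. a reboot) would be misread as a huge jump.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    delta = (octets_t1 - octets_t0) % COUNTER64_MODULUS
    return delta * 8 / seconds


def percent_difference(reference_bps, measured_bps):
    """How far measured_bps falls below reference_bps, as a percentage."""
    return (reference_bps - measured_bps) / reference_bps * 100


if __name__ == "__main__":
    # Hypothetical 60-second samples for the same direction of traffic:
    # switch-port egress (ifHCOutOctets) vs. firewall ingress (ifHCInOctets).
    switch_bps = bits_per_second(0, 7_500_000_000, 60)
    firewall_bps = bits_per_second(0, 5_250_000_000, 60)
    gap = percent_difference(switch_bps, firewall_bps)
    print(f"switch {switch_bps / 1e6:.0f} Mbit/s, "
          f"firewall {firewall_bps / 1e6:.0f} Mbit/s, gap {gap:.1f}%")
```

The comparison only makes sense direction by direction: the switch port's out-octets toward the firewall should be matched against the firewall interface's in-octets, and vice versa.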