Intermittent random packet drops to/from NGFW


L1 Bithead

Seemingly out of the blue, with no configuration changes on our firewall(s), we began experiencing random periods of "network outages" on our main data center firewall. The symptoms are as follows:
Our Pingdom test to our OWA website shows as down (i.e. the web page hosted behind the firewall cannot be reached from the internet)

Users connected via GlobalProtect experience either horrible performance or outright disconnects.

GP connected users running ping tests to internal resources show 10-60+% packet loss. (it varies widely)

Users connected in the data center running ping tests to ISP gateway show similar packet loss patterns and experience horrible performance / no connection to the internet.

This condition lasts 1-20+ minutes and is obviously causing major headaches.

I've opened a TAC case, but as seems to be the norm lately, the wait times to talk to a human are long.

We have an HA pair in an active/passive setup at the data center and I've failed over with no apparent change in this condition. I've also removed the in-line switch (that takes the one physical connection from our ISP and splits it to the HA pair) from the equation with no change in the condition.
I can't MAKE it happen, but it has been happening regularly enough to be a real thorn in our side.

I haven't seen anything in the system log that I recognize as being relevant to this behavior. 

I upgraded the pair to PAN-OS 10.0.8-h8 in the hopes I'd tripped an obscure bug and the update would fix it. Sadly it did not.

If anyone has ideas, I am wide open to them.

6 REPLIES

L0 Member

We have an issue where our VPN is no longer working. A WildFire update timestamp coincides with the first report from DNS Made Easy that our VPN IP addresses were unreachable. This morning I was able to get a PA tech to help, but he didn't find any issues and, like you, we haven't made any configuration changes. I've spent 2 hours trying to get another tech with no luck. Our issues started this morning around 7am. In our case, no one can connect to VPN. Have you heard back from Palo Alto Support?

L1 Bithead


It may also make sense to change the MSS on the firewall in order to keep TCP packets small enough to avoid fragmentation. If you have dropped the MTU down to 1492, the default MSS of 1380 (which I see has been changed in the configuration for some reason) should help keep TCP in check as well. You can set this value with the command 'sysopt connection tcpmss 1380'. Note that this is Cisco ASA syntax; on PAN-OS the comparable setting is the per-interface "Adjust TCP MSS" option. An MSS of 1380 leaves 112 bytes available for headers etc. (1492 - 1380).
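As a rough check of the arithmetic above, here is a minimal Python sketch. It assumes plain IPv4 and TCP headers of 20 bytes each with no options; the function names are made up for illustration and are not part of any firewall tooling.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options


def max_mss(mtu: int, extra_overhead: int = 0) -> int:
    """Largest TCP payload that fits in one unfragmented packet of size `mtu`."""
    return mtu - IPV4_HEADER - TCP_HEADER - extra_overhead


def headroom(mtu: int, mss: int) -> int:
    """Bytes left over for all headers and encapsulation at a given MSS."""
    return mtu - mss


if __name__ == "__main__":
    print(max_mss(1492))         # largest MSS for a 1492-byte MTU
    print(headroom(1492, 1380))  # bytes left for headers at MSS 1380
```

With an MTU of 1492 the largest usable MSS is 1452, so an MSS of 1380 leaves 112 bytes in total: 40 for the basic IP and TCP headers and 72 spare for things like tunnel encapsulation.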

 

When our connectivity is working, we can still connect to our VPN gateway.
We have found that we are getting DUP! packets in alarming numbers when pinging our ISP's gateway. So we're trying to get the ISP to help, but so far no joy.

Hi @JPhilip ,

It sounds like one of those problems where the cause could be anywhere in the network, but everybody is blaming only the firewall.

From the above, it doesn't sound like you have been able to rule out a problem with your Internet connection.

 

My suggestions would be:

- Run a packet capture for your Pingdom test (because a long-running packet capture for ping will be significantly smaller than anything else).

- Run the capture on the internet-facing interface.

- Once the issue occurs, check the capture and see if you receive any packets at all.

- If the ping packets are indeed captured, set the capture on the inside interface and wait for the next occurrence. But if GP users are having issues, it should be the outside connection that is having problems.

 

My humble opinion is to first confirm that the packets are indeed reaching the firewall, and only if they are, to dig deeper into debugging the firewall.
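
One way to answer the "do we receive any packets at all" question is to look for long silent periods in the capture. The sketch below is only an illustration: it assumes you have already exported the packet arrival times from the capture as epoch seconds (for example with a tool such as tshark), and the function name and 5-second threshold are arbitrary choices.

```python
from typing import Iterable, List, Tuple


def find_gaps(timestamps: Iterable[float], threshold: float = 5.0) -> List[Tuple[float, float]]:
    """Return (last_seen, next_seen) pairs where no packet arrived for
    longer than `threshold` seconds."""
    ts = sorted(timestamps)
    return [(a, b) for a, b in zip(ts, ts[1:]) if b - a > threshold]


if __name__ == "__main__":
    # Hypothetical arrival times: steady probes, then a 60-second hole.
    arrivals = [0.0, 1.0, 2.0, 62.0, 63.0]
    for start, end in find_gaps(arrivals):
        print(f"no packets between t={start} and t={end} ({end - start:.0f}s)")
```

If the gaps line up with the outage windows, the probes never reached the outside interface, which points upstream of the firewall. If there are no gaps during an outage, the packets arrived and the next step is the inside interface.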

 

Meanwhile you can check some trivial stuff:

- Physical interface on the firewalls to outside switch for errors

- Switch to ISP CPE interfaces for errors.

 

 

Thanks @aleksandar.astardzhiev for the thoughts.
The Pingdom check is for a web page, so it is port 80 rather than ICMP, which is a bit too noisy to filter effectively. But a few other pieces of information have come to the forefront.

The following is true regardless of which firewall is active out of the HA pair. They behave the same.

If we ping from the outside interface to the ISP gateway, we get duplicate packets and dropped packets.

If we ping from the inside interface to the next hop switch, we get dropped packets and no duplicates.

For both the above pings the latency averages between 100 and 250 ms!

Pings from the next hop switch to anywhere else in our network average in the sub 10ms range.

The duplicate packets from the ISP point to a problem there, but the high latency on all pings casts suspicion on the firewall itself.
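
To compare the outside and inside ping results side by side, the counts can be pulled out of saved ping output. This is a minimal sketch that assumes Linux-style `ping` output, where duplicate replies are marked with `(DUP!)`; the function name and the sample lines (using a documentation IP address) are invented for illustration and are not our real data.

```python
import re
from statistics import mean

# Matches reply lines such as:
#   64 bytes from 192.0.2.1: icmp_seq=1 ttl=64 time=120 ms (DUP!)
REPLY = re.compile(r"icmp_seq=(\d+).*?time=([\d.]+) ms(.*)")


def summarize_ping(output: str, sent: int) -> dict:
    """Count unique replies, duplicates, loss and average latency."""
    seen = set()
    dups = 0
    times = []
    for line in output.splitlines():
        m = REPLY.search(line)
        if not m:
            continue
        seq = int(m.group(1))
        if "DUP!" in m.group(3) or seq in seen:
            dups += 1  # duplicates are counted but left out of the latency average
        else:
            times.append(float(m.group(2)))
        seen.add(seq)
    lost = sent - len(seen)
    return {
        "sent": sent,
        "received": len(seen),
        "duplicates": dups,
        "loss_pct": 100.0 * lost / sent if sent else 0.0,
        "avg_ms": mean(times) if times else None,
    }


SAMPLE = """\
64 bytes from 192.0.2.1: icmp_seq=1 ttl=64 time=120 ms
64 bytes from 192.0.2.1: icmp_seq=1 ttl=64 time=135 ms (DUP!)
64 bytes from 192.0.2.1: icmp_seq=2 ttl=64 time=210 ms
64 bytes from 192.0.2.1: icmp_seq=4 ttl=64 time=180 ms
"""

if __name__ == "__main__":
    print(summarize_ping(SAMPLE, sent=4))
```

Running the same summary on pings from each interface makes the pattern explicit: duplicates only on the outside path, loss and high latency on both.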

I feel like I'm missing something basic, but don't know what it is. Like I'm too close to the tree to see the forest.  😕

 

L1 Bithead

Just in case this weirdness ever happens to anyone else and they end up here, the following is what we found after days of searching.

We have a small network in general: a headquarters location (HQ) with a single PA-3220 out to the internet, and a router across a metro-E style connection back to a router at our data center (colo), which also has 2 PA-3220s in an active-passive HA pair.

The active 3220 at the Colo was being overwhelmed with over 300,000 packets/sec. When we narrowed it down, we discovered virtually all the traffic was coming from our HQ's default internet IP!?
It turns out that, for some still unknown reason, clients that brought their computers back into the office and left their GlobalProtect client enabled - even if they didn't authenticate - were the ones causing the issue.
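
For anyone trying to narrow down a flood like this, the basic idea is to total packets per source address and see who dominates. The sketch below is a generic illustration rather than what we actually ran: it assumes you have exported (source IP, packet count) pairs from traffic logs or a capture, and the function name and sample numbers are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_talkers(records: Iterable[Tuple[str, int]], n: int = 5) -> List[Tuple[str, int, float]]:
    """Sum packet counts per source and return the top `n` sources
    as (source, packets, percent_of_total)."""
    totals = Counter()
    for src, packets in records:
        totals[src] += packets
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []
    return [(src, pk, 100.0 * pk / grand_total) for src, pk in totals.most_common(n)]


if __name__ == "__main__":
    # Hypothetical one-second sample using documentation IP addresses.
    sample = [
        ("198.51.100.10", 290000),
        ("203.0.113.5", 6000),
        ("198.51.100.10", 4000),
    ]
    for src, pk, pct in top_talkers(sample):
        print(f"{src:15} {pk:>8} packets  {pct:5.1f}%")
```

When a single source accounts for nearly all of the packets, as our HQ's public IP did, that address is the place to start digging.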

More specifically, a single client could, in essence, DDoS the Colo firewall with the number of packets it was sending out. The common element was a zoom meeting on a client whose GP software was either connected or trying to connect.

We ended up creating a block rule on the HQ firewall to prevent the GP clients from reaching the Colo firewall while in the office. That stopped the issue. 

  • 11459 Views
  • 6 replies
  • 0 Likes