SYN packets dropped in HA mode


L1 Bithead

  Sorry for reporting this here rather than through the proper channels, but I am not a direct customer, and our service provider, who could submit this through the proper channels, has not done so (at least to my knowledge).

 

 

 

  Our network is behind a pair of 5050s in HA mode and I have documented that between 10 and 20 percent of outbound SYN packets are dropped when the source IP address ends in an even number.

 

  I have also documented SYN drops on inbound traffic but have not tested whether they vary with the source IP address.

 

 

Here is an example command that demonstrates the issue.

 

sudo hping3 --fast -q -c 100 -S -p 80 xxx.xxx.215.5

HPING 165.234.215.5 (enp0s3 165.234.215.5): S set, 40 headers + 0 data bytes

 

--- xxx.xxx.215.5 hping statistic ---

100 packets transmitted, 84 packets received, 16% packet loss

round-trip min/avg/max = 9.9/12.4/26.0 ms

 

Here is the same command run from a computer whose IP address ends in an odd number. It sends the same packets but shows 0% loss.

 

sudo hping3 --fast -q -c 100 -S -p 80 xxx.xxx.215.5

HPING 165.234.215.5 (en0 165.234.215.5): S set, 40 headers + 0 data bytes

 

--- xxx.xxx.215.5 hping statistic ---

100 packets tramitted, 100 packets received, 0% packet loss

round-trip min/avg/max = 9.3/9.6/12.0 ms
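
For anyone who wants to compare many runs rather than read the summaries by eye, here is a minimal sketch that pulls the counts out of the hping3 summary line. It assumes the summary format shown in the two outputs above; the function name `parse_hping_summary` is made up for illustration. The verb is matched loosely because the two outputs above spell it differently ("transmitted" and "tramitted").

```python
import re

# Matches the hping3 summary line, e.g.
# "100 packets transmitted, 84 packets received, 16% packet loss".
# The verb is matched with \w+ because the two runs in this thread
# print it as "transmitted" and "tramitted" respectively.
SUMMARY_RE = re.compile(
    r"(\d+) packets \w+, (\d+) packets received, (\d+)% packet loss"
)


def parse_hping_summary(output: str) -> dict:
    """Return sent/received counts and the loss fraction from hping3 output."""
    match = SUMMARY_RE.search(output)
    if match is None:
        raise ValueError("no hping3 summary line found")
    sent, received = int(match.group(1)), int(match.group(2))
    if sent == 0:
        raise ValueError("summary reports zero packets sent")
    return {"sent": sent, "received": received, "loss": (sent - received) / sent}


# Summary lines copied from the two runs in this post.
even_run = "100 packets transmitted, 84 packets received, 16% packet loss"
odd_run = "100 packets tramitted, 100 packets received, 0% packet loss"

for label, text in (("even source", even_run), ("odd source", odd_run)):
    stats = parse_hping_summary(text)
    print(f"{label}: {stats['loss']:.0%} SYN loss")
```

Feeding it the full captured output of each run works too, since the pattern is searched for anywhere in the text.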

 

Here is a graph showing that the problem started at the time of a software upgrade.

 

 

SYN start.jpg

 

Here is a graph from a sensor that I set up to change IP numbers every 20 minutes.  The periods with normal latency are when the sensor has an odd IP number, and the periods with 1 second latency are when the sensor has an even IP number.  The 1 second latencies are from TCP sessions where the SYN packet was dropped and the OS waited 1 second to resend the SYN packet.

 

SYN 20 minute.jpg
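
A sensor like this can be approximated with a few lines of Python. This is only a sketch, not the sensor used for the graph: it times the TCP handshake and flags any connect that took at least one retransmission timeout. The 1 second threshold comes from the observation above; the function names are hypothetical.

```python
import socket
import time


def connect_time(host: str, port: int, timeout: float = 5.0) -> float:
    """Return the seconds needed to complete a TCP handshake with host:port."""
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout):
        pass  # the handshake is done once the socket is connected
    return time.monotonic() - start


def likely_syn_retransmit(elapsed: float, initial_rto: float = 1.0) -> bool:
    """True when a connect took at least one SYN retransmission timeout.

    initial_rto=1.0 is an assumption based on the 1 second delays
    described in this thread; other operating systems may differ.
    """
    return elapsed >= initial_rto


def delayed_fraction(samples, initial_rto: float = 1.0) -> float:
    """Fraction of connect times that look like a retransmitted SYN."""
    if not samples:
        return 0.0
    delayed = sum(1 for s in samples if likely_syn_retransmit(s, initial_rto))
    return delayed / len(samples)
```

For example, collecting `connect_time("192.0.2.5", 80)` (a placeholder address) a hundred times from an even source address and again from an odd one, then comparing `delayed_fraction` for the two lists, should show the same split as the graph if the problem is present.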

 

 

  If this is a known bug, I apologize.  However, if it is not a known bug, some of you may be experiencing the problem right now.

 

14 REPLIES

L4 Transporter

Are you running in VWire or Layer 3 mode?  We have been facing an issue for some time with our 50xx series firewalls in Active/Active HA mode, but we are running in VWire mode that's sitting on a Layer 2 port channel.

 

Through a few debug sessions with Palo Alto we have discovered the following:

 

1) Sometimes sessions aren't being replicated to the other Palo Alto in the HA pair, and when a packet arrives on that firewall, it's dropped.

2) Sometimes sessions would not be created at all for small, chatty traffic.

 

At this time we still do not have Active / Active working.

 

One thing that we changed that helped (but didn't fix it) was setting the session and session create owner to "first packet".

 

Matt

It is running in Vwire mode.

 

Our service provider will likely change the config to not use HA mode.

 

But since our service provider did not say whether they contacted Palo Alto about the issue, I wanted to be sure that others running HA mode knew of the potential effect on their customers, and that the bug can be properly reported to Palo Alto if someone else can reproduce it.

Sounds almost exactly like our issue.  Are you running 6.1.x?  It sounds like it may be behind a port channel as well, where packets may sometimes look asymmetrical to the Palo Alto, but in reality it's just the Cisco hash algorithm balancing traffic.

 

You could open a ticket with Palo Alto on this and reference case: 00431191

 

I would be interested to hear if this may speed our resolution of this bug as I finally got the debugs to them at the beginning of the week.

Our service provider runs the box so I'm not sure which software version they are running.

 

I do know that the problem started with a software upgrade on August 19 if that helps you determine what version we might be running.

 

While the symptoms are similar, the problem is not with a port channel on other network devices.

 

As I said above the problem started to the minute with a software upgrade on the Palo Alto.  The packets in question arrive at a single 10 gig port on one of the two HA Active Active Palo Altos.  The return packets also arrive on a single 10 gig interface on the same Palo Alto.

 

Also, I do have access to some MRTG graphs for the Palo Altos, and every night at 2 AM there is a spike in the Palo Alto "Management CPU".  (Strangely, last night there was no spike, but that is not relevant.)

 

A few weeks ago the problem disappeared at exactly 2 AM.  Well, it mostly disappeared from the IPs I was monitoring with.  I did find signs that it moved, perhaps being sensitive to the destination IP number etc.  And then after a day or two it reappeared and acted exactly as it has since August 19.  I don't have the exact time that the problem reappeared because my monitor was down, but the window during which it could have happened includes 2 AM again...

 

I suggest that you install hping or a similar tool to test whether SYN packets are being randomly dropped.  You will want to test from both an even and an odd IP number.  It would be best to test to both an even and an odd destination IP number as well.
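
To keep track of the four source/destination combinations, something like the following sketch may help. It groups test results by the parity of the source and destination addresses and reports the loss for each group. The addresses are placeholders from the documentation ranges, not real hosts, and the counts are the 84/100 and 100/100 figures from the first post. The function names are hypothetical.

```python
import ipaddress
from collections import defaultdict


def parity(ip: str) -> str:
    """Return 'even' or 'odd' for the numeric value of an IPv4 address."""
    return "odd" if int(ipaddress.IPv4Address(ip)) & 1 else "even"


def loss_by_parity(results):
    """Aggregate (src, dst, sent, received) tuples by address parity.

    Returns {(src_parity, dst_parity): loss_fraction}.
    """
    totals = defaultdict(lambda: [0, 0])
    for src, dst, sent, received in results:
        key = (parity(src), parity(dst))
        totals[key][0] += sent
        totals[key][1] += received
    return {
        key: (sent - received) / sent
        for key, (sent, received) in totals.items()
        if sent
    }


# Placeholder addresses; counts taken from the hping3 runs in the first post.
results = [
    ("192.0.2.10", "198.51.100.5", 100, 84),   # even source, odd destination
    ("192.0.2.11", "198.51.100.5", 100, 100),  # odd source, odd destination
]

for (src_parity, dst_parity), loss in sorted(loss_by_parity(results).items()):
    print(f"src {src_parity:<4} -> dst {dst_parity:<4}: {loss:.0%} loss")
```

Adding runs against an even destination fills in the other two combinations.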

Palo starts nightly report generation at 2am. That is the cause for management cpu spike at that time.
Enterprise Architect, Security @ Cloud Carib Ltd
Palo Alto Networks certified from 2011

  Our service provider upgraded software on the Palo Alto on May 7th but the SYN packets are still dropped.

Maybe a long shot, but have you got zone protection enabled with Random Early Drop set to activate at 0 packets/sec ?

 

if so, you'll want to increase the activate threshold to a much higher starting value, or switch to SYN cookies

Tom Piens
PANgurus - Strata specialist; config reviews, policy optimization

I don't know if our service provider has Random Early Drop enabled but I doubt it.  

 

I know that the service provider had SYN cookies enabled and likely still has SYN cookies enabled.  (With the associated slowdown when SYN cookies are triggered because of the disabling of large TCP windows)

 

https://en.wikipedia.org/wiki/SYN_cookies

 

Even if the issue was Random Early Drop it would still be a bug because it drops SYN packets from even IP numbers but not from odd IP numbers.

L1 Bithead

 

It seems likely that the SYN packets are being dropped on the HA3 link.

 

https://cyruslab.net/2013/01/03/palo-alto-networks-activeactive-high-availability/

 

Probably because of a bug introduced when the behavior changed or more options were included in version 6.

 

https://live.paloaltonetworks.com/t5/Management-Articles/HA3-Packet-Forwarding-in-Active-Active-Chan...

HA3 is only used in an active-active HA environment to share packets with the session owner:

 

Active-Active mode is specifically designed to accommodate asymmetrical network environments. If peer1 sees the SYN packet and peer2 receives the ACK packet, then depending on how your session setup is configured, one peer will send all packets to the other peer, so that peer has all the packets for session setup, AppID etc.

 

So if peer1 is the session owner, peer2 will forward the ACK to peer1, which will decide whether this packet is allowed and update the session info (via HA2), at which point peer2 releases the ACK onto the network. Peer1 will then receive another ACK, and then peer2 will receive packet4 and forward it to peer1 again; peer1 will analyse it, apply AppID if possible, update the session and inform peer2 that the packet is OK (and share the session info through HA2), at which point peer2 again releases packet4 onto the network.

 

so ha3 is only used to share a copy of a packet

 

on the subject of HA3: since it encapsulates the full packets so that the session owner gets to see the full session, jumbo frames are enabled on the HA3 link by default. If there are switches or other infrastructure in between, these need to support jumbo frames as well

Tom Piens
PANgurus - Strata specialist; config reviews, policy optimization

L1 Bithead

 

I suspect that my service provider has configured the "session setup" option shown in the link below to "IP modulo", since 10 to 20% of SYN packets sourced from an IP address that ends in an even number are dropped, while 0% of SYN packets sourced from an odd IP address are dropped.

 

https://live.paloaltonetworks.com/t5/Management-Articles/HA3-Packet-Forwarding-in-Active-Active-Chan...

 

So when I test through one box, the traffic from odd IP numbers has its session created and "owned" by the same box the traffic is flowing through.  When I send traffic through that same box sourced from an even IP number, the session cannot be created until packets are exchanged over the HA links with the "other" box in the HA pair.
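
The reasoning can be sketched as follows. This is an illustration of the suspicion, not a description of the firewall's actual code: it assumes that "IP modulo" picks the setup device from the parity of the source address, that even addresses map to device 0 and odd addresses to device 1, and that the box my traffic flows through is device 1. All three are assumptions, and the function names are made up.

```python
import ipaddress


def setup_device(src_ip: str, devices: int = 2) -> int:
    """Device ID assumed to set up sessions for src_ip under IP modulo.

    Assumption: even source addresses map to device 0 and odd ones to
    device 1. The real mapping on the firewall may be the other way round.
    """
    return int(ipaddress.IPv4Address(src_ip)) % devices


def needs_ha_forward(src_ip: str, receiving_device: int) -> bool:
    """True when the box that received the SYN is not the setup device,
    so the packet has to cross the HA links before a session exists."""
    return setup_device(src_ip) != receiving_device


# Placeholder addresses; device 1 is assumed to be the box in the path.
for src in ("192.0.2.10", "192.0.2.11"):
    crosses = needs_ha_forward(src, receiving_device=1)
    print(f"{src}: {'crosses HA links' if crosses else 'handled locally'}")
```

Under these assumptions only the even source needs the extra hop, which matches the addresses that see SYN loss.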

 

It seems obvious to me that packets are being dropped in the exchange over the HA links, likely because of a software bug.

 

One of the two boxes serving us was just replaced due to hardware failure and the problem still persists.  

 

Also as I mentioned in earlier posts it is only the SYN packets that are being dropped.  Once a SYN packet is allowed through we don't see any further packet loss.

 

The drop of the SYN packets results in a 1 second delay in starting a session because most OSes wait 1 second before retransmitting the lost SYN packet.
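
As a rough illustration of what that costs per connection, here is a small sketch of the added handshake delay when one or more SYNs in a row are lost. It assumes a 1 second initial retransmission timeout that doubles on each retry, which is common but not universal; the function name is hypothetical.

```python
def syn_delay(dropped_syns: int, initial_rto: float = 1.0) -> float:
    """Seconds added to the handshake after consecutive SYN drops.

    Assumes the retransmission timer starts at initial_rto and doubles
    after every lost SYN (an assumption about the sending OS).
    """
    return float(sum(initial_rto * 2 ** attempt for attempt in range(dropped_syns)))


for drops in range(4):
    print(f"{drops} dropped SYN(s): {syn_delay(drops):.0f} s added delay")
```

With a single drop this gives the 1 second delay seen on the graph; two drops in a row would cost 3 seconds under the same assumptions.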

 

Our service provider recently acknowledged, at last, that they can reproduce the problem on traffic inbound to us, but they have still declined to open a case with Palo Alto.

 

Customers who have HA Active/Active but configured session setup to "first packet" or in some cases "primary device" might also be affected by the bug but they might be less likely to notice since their traffic that is affected would be much much less than the 50%  that we see (odd vs even).

I'd encourage you to twist your service provider's arm to have them open a case with support; only then can someone verify exactly what is happening and, if it's a bug, get that fixed (or upgrade your firewall to a version that doesn't have this issue if one is already available)

 

since you don't have access to the firewalls, there could be all sorts of things

 

There could possibly be a problem with one of the firewalls' upstream connections. Since the problem seems to be tied to even-numbered IPs, there could also simply be a modulo configured for which firewall serves a floating IP, and the upstream router mangles the SYN packet

 

on a different note: does your network require A/A because of asymmetry ? if there's no asymmetric routing i'd recommend switching to A/P as there's no advantage of having A/A in a symmetrical environment (depending on the configuration, there could even be an overall performance decrease)

Tom Piens
PANgurus - Strata specialist; config reviews, policy optimization

  I won't be arm wrestling with our service provider.  Our relationship with our service provider is not typical.  Our service provider has a great deal of power and would like to see me fired and has already applied pressure towards that end.  

 

  There is no problem with the firewall's upstream connection.  The observed problems started to the minute when a software upgrade was applied to the firewall.  No other network changes were made at that time, upstream or downstream.  Also, it is much more likely for a firewall to treat a SYN packet differently than for a router to treat a SYN packet differently.

 

  The service provider will be changing the config to  active/passive  on September 10th.

 

Our service provider should have opened a ticket long ago but chose to let the SYN packets be dropped until they could make some other changes in the network to allow the change to active/passive.

 

I posted the issue here because unlike our service provider I think that reporting the problem so that it may get fixed before others experience it is the right thing to do.  Our service provider would rather let someone else experience the issue and let them report it.

 

So we will likely no longer experience the problem after September 10th, but hopefully, if someone experiences the problem in the future, these posts will reduce the time they spend exploring the problem and help them convince Palo Alto that there is a bug.

 

Sorry to hear about your service provider 😕

Tom Piens
PANgurus - Strata specialist; config reviews, policy optimization