Firewall drops VSS-Management trailer due to Layer 4 checksum enabled



L2 Linker

This is not a question but a write-up of a problem we ran into with a customer last weekend. The behavior is not well documented by Palo Alto TAC, and we only resolved it with the help of another customer who had hit the same issue with the same application vendor.

 

One of our electronic access control (EAC) systems stopped working after we replaced the perimeter firewalls with PA-5260s over the last weekend. The EAC has controllers (used for building access) that need to contact a server in the cloud. From the firewall's perspective the configuration is simple: a security policy allowing the traffic, plus NAT for the internal-to-external communication.

 

After placing the PA-5260 firewalls in the environment, the controllers had trouble connecting to the server, and we started seeing the server send RST-ACKs in the packet captures.

 

The session teardown reason was tcp-rst-from-server.

 

After taking captures on the server, the firewalls, and the controllers, we saw strange behavior on the firewall. Tracing a single TCP stream, the controller sends 37 packets to the server, but the firewall ingress interface receives only 36 packets and forwards 36 packets to the server. The client keeps retransmitting the PUSH-ACK packet that never reached the firewall, since it gets no ACK for it. After waiting 14 seconds, the server sends a reset.
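The comparison between capture points described above can be sketched roughly as follows. This is only an illustration: the function name `missing_segments` and the sample values are made up, and it assumes each capture has already been reduced to a list of (TCP sequence number, payload length) pairs for one stream.

```python
def missing_segments(sent, received):
    """Return segments present in `sent` that never appear in `received`.

    Each segment is a (tcp_seq, payload_len) tuple. Retransmissions of
    the same segment are reported only once.
    """
    seen = set(received)
    missing = []
    for seg in sent:
        if seg not in seen and seg not in missing:
            missing.append(seg)
    return missing


# Made-up values for illustration only, not taken from the real capture.
controller_side = [(1, 100), (101, 200), (301, 60), (301, 60), (301, 60)]
firewall_side = [(1, 100), (101, 200)]

# The repeated (301, 60) entries model the client retransmitting a segment
# that the firewall-side capture never shows.
print(missing_segments(controller_side, firewall_side))
```

A segment that is retransmitted on the sender side and absent on the firewall side is the one worth opening in the capture to look for anything unusual, such as a trailer.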

 

Deeper investigation revealed that the packet that does not show up on the firewall is the one to which the controllers append a VSS-Management trailer.

 

VarunRao_0-1597719204504.png

The root cause lies with newer hardware models that use the FE100 FPGA chip, which causes the firewall to drop certain segments containing a VSS-Management trailer. The firewall validates the Layer 4 checksum on ingress, the added VSS-Management trailer breaks that check, and the segment never reaches its destination.
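To see why extra bytes can make a checksum validation fail, here is a minimal sketch using the standard Internet checksum (RFC 1071). It is not a description of how the FE100 actually performs its check; it only assumes, for illustration, a checker that includes trailing bytes in its calculation. The payload and trailer bytes are placeholders, not a real VSS-Management trailer, and the TCP pseudo-header is left out for brevity.

```python
def internet_checksum(data: bytes) -> int:
    """RFC 1071 one's-complement checksum over `data`."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def validates(covered: bytes) -> bool:
    """A receiver accepts the data if the checksum over it comes out as 0."""
    return internet_checksum(covered) == 0


# Simplified "segment": payload followed by its own checksum field.
payload = b"hello world!"
segment = payload + internet_checksum(payload).to_bytes(2, "big")

# Placeholder trailer bytes (hypothetical, not the real trailer format).
trailer = b"\x00\x01\x02\x03"

print(validates(segment))            # checksum covers only the segment
print(validates(segment + trailer))  # checker also covers the trailer
```

The first check passes and the second fails, which matches the symptom in the thread: the segment itself is fine, but a check that takes the trailer into account rejects it.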

 

The global counters on the firewalls showed the following:

 

> show counter global filter delta yes | match L4
flow_fpga_rcv_igr_L4CHKSUMERR 46 5 info flow offload FPGA IGR Exception: L4CHKSUMERR
appid_ident_by_dport_first 1386 177 info appid pktproc Application identified by L4 dport first
appid_ident_by_dport 38 4 info appid pktproc Application identified by L4 dport

 

The firewall drops these packets because of the checksum failure, so they never make it to the dataplane.
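If you collect this counter output regularly, a small script can flag the checksum exception for you. The sketch below is an assumption-laden helper, not a Palo Alto tool: `parse_counters` is a hypothetical name, and it assumes each line starts with the counter name followed by its value and rate, as in the output above.

```python
def parse_counters(output: str) -> dict:
    """Parse 'name value rate ...' lines into {name: {"value", "rate"}}."""
    counters = {}
    for line in output.splitlines():
        parts = line.split()
        # Skip prompts, headers, and anything without two numeric columns.
        if len(parts) >= 3 and parts[1].isdigit() and parts[2].isdigit():
            counters[parts[0]] = {"value": int(parts[1]), "rate": int(parts[2])}
    return counters


sample = """\
flow_fpga_rcv_igr_L4CHKSUMERR 46 5 info flow offload FPGA IGR Exception: L4CHKSUMERR
appid_ident_by_dport_first 1386 177 info appid pktproc Application identified by L4 dport first
appid_ident_by_dport 38 4 info appid pktproc Application identified by L4 dport
"""

counters = parse_counters(sample)
checksum_hits = {k: v for k, v in counters.items() if "L4CHKSUMERR" in k}
print(checksum_hits)
```

A non-zero delta on the L4CHKSUMERR counter while the affected traffic is flowing is the signal to look at the hardware checksum setting.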

 

Disable Layer 4 Checksums
Perform the steps below on both firewalls in the HA pair, passive first, to minimise any impact.

1. On the firewall, disable the Layer 4 checksum using the command below:
> set system setting layer4-checksum disable

2. Reboot the device.

3. After the firewall comes back up, confirm the setting in the sdb:
> show system state | match fe100
Result: you should see 'l4_chk_sum': 0, as below:
cfg.hw.fe100: { 'cfg_mode': 10, 'l4_chk_sum': 0, 'usecase': 1, 'v4_v6_choice': 2,
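If you want to script the confirmation in step 3, a rough sketch is shown below. The helper name `l4_checksum_setting` is hypothetical, and the code only assumes the output contains a `'l4_chk_sum': <number>` pair as in the line above.

```python
import re


def l4_checksum_setting(state_output: str):
    """Return the l4_chk_sum value from 'show system state' output, or None."""
    match = re.search(r"'l4_chk_sum':\s*(\d+)", state_output)
    return int(match.group(1)) if match else None


line = "cfg.hw.fe100: { 'cfg_mode': 10, 'l4_chk_sum': 0, 'usecase': 1, 'v4_v6_choice': 2,"

value = l4_checksum_setting(line)
if value == 0:
    print("l4_chk_sum is 0: setting applied")
else:
    print(f"l4_chk_sum is {value}: setting not applied or not found")
```

Run it against the output from each HA peer after its reboot, so you know both units have picked up the change.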

Even though the firewalls no longer perform this Layer 4 checksum, TCP retransmissions caused by a genuinely invalid checksum will still occur, because the server and client validate the checksum themselves.

VarunRao_1-1597720335639.png

Related documents:

https://knowledgebase.paloaltonetworks.com/KCSArticleDetail?id=kA10g000000PLpICAW

 

https://docs.paloaltonetworks.com/pan-os/8-1/pan-os-release-notes/pan-os-8-1-addressed-issues/pan-os...

 

 

 



Thanks & Regards,
Varun Rao
7 REPLIES

Community Team Member

@VarunRao ,

 

Awesome debugging!

Thanks for sharing!

 

Cheers,

-Kiwi.

 
LIVEcommunity team member, CISSP

Cheers mate!!

 

Thanks,

VR




L0 Member

Great find, I believe we are running into exactly the same issue. 

 

We initially just tried running the command below, but that didn't seem to have any real effect by itself.

set session strict-checksum no

We'll give this a try next week and see how we go.

 

I was keen to understand the security impact of doing this, though, because I don't really understand the risks associated with disabling this checksum. Do you have any insight into this?

Hi Joe,

 

Palo Alto firewalls normally perform an L4 checksum on the dataplane, and you can enable or disable it there with the command you used. On the 5200 and 3200 models, however, an L4 checksum is also performed on the network processor. If your issue matches the conditions in the post above, you will have to disable it on the network processor and keep the L4 checksum on the dataplane. That should not compromise your security posture.

 

Below is a doc that I could find explaining both dataplane and network processor L4 checksums:

https://knowledgebase.paloaltonetworks.com/KCSArticleDetail?id=kA10g000000PLpICAW

 

Hope that helps,

Varun Rao




Thanks Mate, that's really helpful. 

 

We disabled the layer 4 checksum on the network processor and that resolved this issue for us as well. 

We made sure that we left the checksum enabled on the dataplane as well. 

 

Cracking effort tracking this issue down!

 

Thanks

Joe

Glad it is helping people on the forum; that is the sole reason I documented it here. There is no Palo Alto document on this, since according to PA it is not an issue but expected behavior of the network processor in the 5200/3200 series, which sometimes causes problems with other vendors' traffic.

 

Happy to help!!




L1 Bithead

@VarunRao - Curious to know your perspective on the issue. Would it be safe to assume the access controller vendors are meant to fix this on their end? It is their device that introduces the VSS-Management trailer, which then breaks the checksum.

 

Turning off the L4 checksum on the network processor just seems to be a workaround that does not fix the root cause? I'm just trying to understand the industry perspective on where the fix should lie.
