High availability failover due to high dataplane usage

L0 Member

Our internet went down a few weeks ago when our primary PA failed over to the secondary PA. After some research and investigation, we found that this was due to the number of new sessions being created, which caused the PA to use the slowpath and consume more CPU resources. Once we failed over, we had internet access for about 5-10 minutes and then suddenly lost it. After talking to tech support, we came to the conclusion that this might have been due to ARP. We have about 25 static NATs and 3 DNATs; could this have been the cause? If so, why did we have internet for a while and then suddenly lose the connection?

Our failover condition is based on link monitoring (trust and untrust) and path monitoring, which targets our gateway.

Why didn't we fail back to the original active firewall when we lost access to the internet? Our path was never down on our secondary FW.
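To make the question concrete, here is a minimal sketch of a monitoring-based failover check. It is a simplified model, not PAN-OS logic: the `MonitorState` class, the `has_monitored_failure` function, and the interface names are hypothetical. It only illustrates that a unit whose monitored links and path all stay up sees no failure condition, even if users behind it have lost internet access for some other reason.

```python
from dataclasses import dataclass, field


@dataclass
class MonitorState:
    """Hypothetical snapshot of what one HA member is monitoring."""
    # Link monitoring: interface name -> link is up
    links_up: dict = field(default_factory=dict)
    # Path monitoring: the monitored gateway answers probes
    path_reachable: bool = True


def has_monitored_failure(state: MonitorState) -> bool:
    """Return True only if a monitored link or the monitored path is down."""
    any_link_down = not all(state.links_up.values())
    return any_link_down or not state.path_reachable


# The secondary during the outage: links up, gateway still reachable.
secondary = MonitorState(links_up={"trust": True, "untrust": True},
                         path_reachable=True)
print(has_monitored_failure(secondary))  # no monitored item failed
```

Under this model, exhausted sessions or stale ARP entries upstream are invisible to the check, because neither takes a monitored link or the monitored gateway down.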

We're running 8.0.2

Community Manager

Re: High availability failover due to high dataplane usage

Sounds like your device may have been flooded. This would explain why your connectivity came back for a few minutes while the newly active secondary firewall's session table and resources were rapidly being depleted.

This could be an outside attack or an inside burst (for example, Windows Update triggering simultaneously on a lot of internal devices).

 

One way to prevent this type of issue from taking down your firewall is to enable Zone Protection profiles, which start discarding packets above a certain packet rate or use SYN cookies to prevent malicious flooding of TCP sockets.
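As a rough illustration of how rate-based flood protection behaves, here is a small sketch. The three thresholds mirror the alarm, activate, and maximum rates found in flood protection settings, but the class, the function, and the numbers are hypothetical and not taken from any real configuration.

```python
from dataclasses import dataclass


@dataclass
class FloodThresholds:
    """Hypothetical thresholds, in new connections per second."""
    alarm_rate: int     # log an alarm at or above this rate
    activate_rate: int  # start mitigation (e.g. SYN cookies) at or above this rate
    maximum_rate: int   # drop everything at or above this rate


def classify(rate: int, t: FloodThresholds) -> str:
    """Map an observed connection rate to the action taken."""
    if rate >= t.maximum_rate:
        return "drop"
    if rate >= t.activate_rate:
        return "mitigate"
    if rate >= t.alarm_rate:
        return "alarm"
    return "allow"


# Example values only; real thresholds should come from a traffic baseline.
thresholds = FloodThresholds(alarm_rate=10_000,
                             activate_rate=20_000,
                             maximum_rate=40_000)
for observed in (5_000, 15_000, 25_000, 50_000):
    print(observed, classify(observed, thresholds))
```

The point of the tiers is that a burst gets throttled before it can exhaust the session table, rather than being handled only once the dataplane is already saturated.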

 

Did you happen to collect a tech support file right after the incident on your primary? If so, you could take a look at the dataplane resources in the dataplane logs to see if your packet descriptors were filling up or software pools were draining.

 

Is there any additional information you can share?

You mentioned ARP; could you elaborate on this conclusion?


Reaper out
L5 Sessionator

Re: High availability failover due to high dataplane usage

Hi,

 

A huge number of new sessions ... normally, sessions are synced between cluster members, then ...

Which Palo Alto model are you using?

In my experience, path monitoring is not always effective.

Maybe when your first Palo crashed, the failover happened but path monitoring was not up on the backup, so there was no internet.

 

Hope this helps.

 

V.
