Failover Behaviors

L1 Bithead

Hi All,

 

Setup: Active-Passive

Path Monitoring: enabled, but not configured (nothing under the path group)

Version: 7.1.14

 

 

Would an active firewall change its state to non-functional if both its HA2 and HA2-Backup links go down?

 

Related Logs:

2019/12/04 09:41:04 critical ha ha2-lin 0 All HA2 links down
2019/12/04 09:41:04 high ha session 0 HA Group 1: Ignoring session synchronization due to HA2-unavailable
2019/12/04 09:41:04 high ha ha2-lin 0 HA2-Backup link down
2019/12/04 09:41:04 critical general general 0 Chassis Master Alarm: HA-event
2019/12/04 09:41:04 critical ha ha2-lin 0 HA2 link down
2019/12/04 09:41:04 critical ha state-c 0 HA Group 1: Moved from state Active to state Non-Functional
2019/12/04 09:41:04 critical ha datapla 0 HA Group 1: Dataplane is down: path monitor failure
2019/12/04 09:41:04 high general general 0 9: path_monitor HB failures seen, triggering HA DP down
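For anyone who wants to pull the state changes out of an export like the one above, here is a minimal Python sketch. It assumes each line has the shape `date time severity type subtype object description`, which is only inferred from the lines pasted here; the field labels and function names are my own, not an official log schema or API.

```python
import re

# Assumed line shape, inferred from the pasted lines only:
#   date time severity type subtype object description
LINE_RE = re.compile(
    r"^(\d{4}/\d{2}/\d{2}) (\d{2}:\d{2}:\d{2}) (\S+) (\S+) (\S+) (\S+) (.*)$"
)
MOVE_RE = re.compile(r"Moved from state (\S+) to state (\S+)")
REASON_RE = re.compile(r"Dataplane is down: (.+)$")


def parse_line(line):
    """Split one log line into labelled fields, or return None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    date, time, severity, log_type, subtype, obj, description = m.groups()
    return {
        "date": date,
        "time": time,
        "severity": severity,
        "type": log_type,
        "subtype": subtype,  # truncated in the paste (e.g. "ha2-lin"), kept as-is
        "object": obj,
        "description": description,
    }


def state_transitions(lines):
    """Return (date, time, old_state, new_state) for every HA state change found."""
    found = []
    for line in lines:
        rec = parse_line(line)
        if rec is None:
            continue
        m = MOVE_RE.search(rec["description"])
        if m:
            found.append((rec["date"], rec["time"], m.group(1), m.group(2)))
    return found


def dataplane_down_reasons(lines):
    """Return the reason text of every 'Dataplane is down' entry found."""
    reasons = []
    for line in lines:
        rec = parse_line(line)
        if rec is None:
            continue
        m = REASON_RE.search(rec["description"])
        if m:
            reasons.append(m.group(1))
    return reasons
```

Fed the lines above, `state_transitions` reports a single move from Active to Non-Functional, and `dataplane_down_reasons` reports `path monitor failure`, which is the reason the log itself gives for the transition.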

 

Also, is there an HA failover table I could refer to that describes Palo Alto's behavior when, say, HA1 fails or HA2 fails?

 

Thanks,

John

 

13 REPLIES 13

Cyber Elite

Did the secondary device go to non-functional?

 

The primary should not go into a faulty state if the HA2 links go down. The secondary, however, has just lost its ability to take over seamlessly if the primary were to go down, since it no longer receives session state information.

 

If both HA1 links go down, the primary peer will remain active because it assumes the secondary peer went down, and the secondary peer will take the active role because it thinks the primary went down. Now both are active and no one is happy.
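As a toy illustration of that reasoning, here is a small Python sketch. It is not PAN-OS's actual election logic: the single rule "a peer that hears nothing assumes the other is down" and all function names are assumptions made up for this example.

```python
def next_role(current_role, hears_peer):
    """Toy rule: keep the current role while the peer is heard;
    otherwise assume the peer is down and go (or stay) active."""
    return current_role if hears_peer else "active"


def roles_after_ha1_change(ha1_up, ha1_backup_up):
    """Return (primary_role, secondary_role) given the state of both HA1 links."""
    # The peers can still hear each other as long as either control link is up.
    control_link_up = ha1_up or ha1_backup_up
    primary = next_role("active", control_link_up)
    secondary = next_role("passive", control_link_up)
    return primary, secondary


def is_split_brain(roles):
    """Both peers active at the same time."""
    return roles == ("active", "active")
```

With one HA1 link still up, the roles stay `("active", "passive")`. With both down, the function returns `("active", "active")`, which is the split-brain case described above.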

Tom Piens
PANgurus - Strata specialist; config reviews, policy optimization

Very good and useful info.

MP

Help the community: Like helpful comments and mark solutions.

Hi,

 

The active firewall went into non-functional state, so the passive firewall took over as active.


xxxx@xxxxxx-fw(passive)> show high-availability state

Group 1:
Mode: Active-Passive
Local Information:
Version: 1
Mode: Active-Passive
State: passive (last 17 hours)
Last non-functional state reason: Dataplane down: path monitor failure.

 

Some related logs on the ha_agent.log:

2019-12-04 09:41:04.464 +0000 debug: ha_slot_sysd_dp_down_notify_cb(src/ha_slot.c:641): Got initial dataplane down (slot 1; reason path monitor failure)
2019-12-04 09:41:04.464 +0000 The dataplane is going down
2019-12-04 09:41:04.464 +0000 Warning: ha_event_log(src/ha_event.c:47): HA Group 1: Dataplane is down: path monitor failure
2019-12-04 09:41:04.464 +0000 Going to non-functional for reason Dataplane down: path monitor failure
2019-12-04 09:41:04.464 +0000 debug: ha_state_transition(src/ha_state.c:1329): Group 1: transition to state Non-Functional
2019-12-04 09:41:04.464 +0000 debug: ha_state_start_monitor_holdup(src/ha_state.c:2518): Skipping monitor holdup for group 1
2019-12-04 09:41:04.464 +0000 debug: ha_state_monitor_holdup_callback(src/ha_state.c:2611): Going to Non-Functional state state
2019-12-04 09:41:04.464 +0000 debug: ha_state_move(src/ha_state.c:1423): Group 1: moving from state Active to Non-Functional
2019-12-04 09:41:04.464 +0000 Warning: ha_event_log(src/ha_event.c:47): HA Group 1: Moved from state Active to state Non-Functional
2019-12-04 09:41:04.464 +0000 debug: ha_sysd_dev_state_update(src/ha_sysd.c:1434): Set dev state to Non-Functional
2019-12-04 09:41:04.464 +0000 debug: ha_sysd_dev_alarm_update(src/ha_sysd.c:1400): Set dev alarm to on
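If it helps when reading a longer ha_agent.log, the lines above can be split into fields with a short Python sketch like this one. The layout (`timestamp tz [level:] [function(file:line):] message`) is only inferred from the excerpt pasted here, so treat the regex and the function names as assumptions rather than a documented format.

```python
import re

# Assumed layout, inferred from the excerpt only:
#   timestamp tz [level:] [function(file:line):] message
AGENT_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) ([+-]\d{4}) "
    r"(?:(\w+): )?"                        # optional level, e.g. "debug" or "Warning"
    r"(?:(\w+)\(([^:()]+):(\d+)\): )?"     # optional function(file:line)
    r"(.*)$"
)
REASON_RE = re.compile(r"Going to non-functional for reason (.+)$")


def parse_agent_line(line):
    """Return a dict of fields for one ha_agent.log line, or None if it does not match."""
    m = AGENT_RE.match(line.strip())
    if m is None:
        return None
    timestamp, tz, level, function, source_file, source_line, message = m.groups()
    return {
        "timestamp": timestamp,
        "tz": tz,
        "level": level,                # None when the line carries no level prefix
        "function": function,          # None when no function(file:line) is present
        "file": source_file,
        "line": int(source_line) if source_line else None,
        "message": message,
    }


def non_functional_reasons(lines):
    """Collect the reason text from every 'Going to non-functional' entry."""
    reasons = []
    for line in lines:
        rec = parse_agent_line(line)
        if rec is None:
            continue
        m = REASON_RE.search(rec["message"])
        if m:
            reasons.append(m.group(1))
    return reasons
```

On the excerpt above, `non_functional_reasons` returns one entry, `Dataplane down: path monitor failure`, matching the reason shown in `show high-availability state`.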

 

 

 

I also did a test:

Active-Passive PA pair.

Only HA1 is connected; no HA1 backup is connected.

Heartbeat backup is checked on both firewalls.

I disconnected HA1, and the Dashboard shows both HA1 and heartbeat backup as down.

Both PAs became active.

Even though heartbeat backup is checked and the management interface on both PAs is up, why does heartbeat backup show as down on both firewalls?

Is this expected behavior?

MP


@Jonathan_Panes,

Losing HA2 and having a device go into a non-functional state is certainly not expected behavior. However, multiple HA fixes have been made in later 7.1 maintenance releases, so you could be running into a bug. I generally don't like recommending an upgrade unless I can point to a specific issue ID, but you are running an older maintenance release with open security advisories, so I'll use those instead and recommend you upgrade to 7.1.25. That will hopefully fix the issue you ran into here and will also patch some security issues.

 

PAN-SA-2019-0013

PAN-SA-2019-0019

PAN-SA-2019-0021

PAN-SA-2019-0022

https://securityadvisories.paloaltonetworks.com/

@MP18,

How is your MGMT traffic routed? It's possible that, in the split-brain scenario that occurs when HA1 is removed, the two devices can't send heartbeat traffic to each other because of routing issues that arise when both firewalls are active. We would need to look at your actual network design to be certain, but that would be my first guess.

Hi BPry,

 

We are running 8.1.9 on this PA 3020.

These are our lab firewalls, and no traffic passes through their dataplane.

For management plane routing, both firewalls are in the same subnet.

All service routing goes through the management plane only.

 

Regards

Mike

 

 

 

MP



@MP18 wrote:

 

Disconnected the HA1 and  Dashboard shows both HA1 and heartbeat are down.

Both PA became active.

 


that's not how it's supposed to work 😞

Tom Piens
PANgurus - Strata specialist; config reviews, policy optimization

I am running 8.1.9 on PA 3020.

Am I hitting a bug?

MP


Is there anything I should check config-wise?

Or I can open a TAC case.

MP


@MP18,

I'd be interested in seeing what your ha_agent.log reports when this issue pops up, to see exactly what the agent is seeing. I haven't seen any keepalive bugs with 8.1.9, and none of the addressed issues in 8.1.10 or 8.1.11 appear to be related to this.

Hi BPry,

 

It seems I was testing HA1 with encryption disabled on one firewall and left enabled on the other.

It is not supposed to work like that, much as an OSPF neighborship will not form when authentication is enabled on one router and not on the other.
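A quick way to catch this kind of mismatch before testing failover is to compare the HA1 settings of both peers side by side. Here is a minimal Python sketch; the dictionaries and the key name `ha1_encryption_enabled` are hypothetical stand-ins for values you would copy from each firewall's HA configuration by hand, not real PAN-OS config paths or API fields.

```python
def ha1_setting_mismatches(local, peer, keys=("ha1_encryption_enabled",)):
    """Return {key: (local_value, peer_value)} for every listed key whose
    value differs between the two peers. Key names are hypothetical."""
    return {
        key: (local.get(key), peer.get(key))
        for key in keys
        if local.get(key) != peer.get(key)
    }
```

An empty result means the listed settings agree on both peers. Any entry in the result is a setting to fix before concluding that HA1 or heartbeat backup is misbehaving.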

 

Many thanks for pointing me in the right direction.

 

MP


L2 Linker

Hi All,

There was a 15-minute downtime while a customer was replacing the passive device in an A/P pair with an RMA device.

As soon as they connected only the HA1 (Aux1) cable to the new RMA device (no other interfaces were connected because link monitoring was enabled), there was a split-brain scenario for a few minutes, during which the peer firewall that had been active became passive and dropped traffic. The customer suspended the new RMA device, both firewalls recovered from the split-brain scenario, and traffic passed through the expected (active) firewall.

My question: with preemption disabled, if a split-brain scenario occurs in an A/P pair, which firewall owns the active state after recovery? (My answer is that the firewall with the lowest priority will have the active role after recovery, even if its network interfaces are not connected and link monitoring is enabled on those interfaces.)

Thanks in advance.

 
