Path Monitoring: enabled, but not configured (nothing under the Path Group)
Would an active firewall change its state to non-functional if both its HA2 and HA2-Backup links go down?
2019/12/04 09:41:04 critical ha ha2-lin 0 All HA2 links down
2019/12/04 09:41:04 high ha session 0 HA Group 1: Ignoring session synchronization due to HA2-unavailable
2019/12/04 09:41:04 high ha ha2-lin 0 HA2-Backup link down
2019/12/04 09:41:04 critical general general 0 Chassis Master Alarm: HA-event
2019/12/04 09:41:04 critical ha ha2-lin 0 HA2 link down
2019/12/04 09:41:04 critical ha state-c 0 HA Group 1: Moved from state Active to state Non-Functional
2019/12/04 09:41:04 critical ha datapla 0 HA Group 1: Dataplane is down: path monitor failure
2019/12/04 09:41:04 high general general 0 9: path_monitor HB failures seen, triggering HA DP down
Also, is there an HA failover table I could refer to that describes Palo Alto's behavior when, say, HA1 fails or HA2 fails?
Did the secondary device go to non-functional?
The primary should not go into a faulty state if the HA2 links go down. The secondary, however, just lost its ability to take over seamlessly if the primary were to go down, since it no longer receives session state information.
If both HA1 links go down, the primary peer remains active because it assumes the secondary peer went down, and the secondary peer takes the active role because it thinks the primary went down. Now both are active (split brain) and no one is happy.
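The two cases described above can be written down as a small lookup. This is only a sketch of what this reply says, not an official failover table: the names `Outcome` and `expected_outcome` are made up for illustration, and it deliberately ignores heartbeat backup, preemption, and link/path monitoring.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    primary: str             # expected state of the peer that was active
    secondary: str           # expected state of the peer that was passive
    seamless_failover: bool  # can the secondary take over with session state?
    split_brain: bool        # are both peers active at the same time?


def expected_outcome(ha1_up: bool, ha2_up: bool) -> Outcome:
    """Expected A/P behavior per the reply above (hypothetical helper).

    ha1_up: at least one HA1 (control) path between the peers is up.
    ha2_up: at least one HA2 (session sync) path between the peers is up.
    """
    if not ha1_up:
        # Each peer assumes the other one died, so both end up active.
        return Outcome("active", "active", False, True)
    # HA1 is fine: roles are unchanged, but without HA2 the passive peer
    # no longer receives session state, so a takeover would not be seamless.
    return Outcome("active", "passive", ha2_up, False)


if __name__ == "__main__":
    for ha1_up, ha2_up in [(True, True), (True, False), (False, True)]:
        print(f"HA1 up={ha1_up}, HA2 up={ha2_up}:", expected_outcome(ha1_up, ha2_up))
```

Note that neither case ends with the active peer going non-functional, which is why the HA2 outage alone does not explain the state change in the logs.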
The active firewall went into non-functional state, so the passive firewall took over as active.
xxxx@xxxxxx-fw(passive)> show high-availability state
State: passive (last 17 hours)
Last non-functional state reason: Dataplane down: path monitor failure.
Some related logs from ha_agent.log:
2019-12-04 09:41:04.464 +0000 debug: ha_slot_sysd_dp_down_notify_cb(src/ha_slot.c:641): Got initial dataplane down (slot 1; reason path monitor failure)
2019-12-04 09:41:04.464 +0000 The dataplane is going down
2019-12-04 09:41:04.464 +0000 Warning: ha_event_log(src/ha_event.c:47): HA Group 1: Dataplane is down: path monitor failure
2019-12-04 09:41:04.464 +0000 Going to non-functional for reason Dataplane down: path monitor failure
2019-12-04 09:41:04.464 +0000 debug: ha_state_transition(src/ha_state.c:1329): Group 1: transition to state Non-Functional
2019-12-04 09:41:04.464 +0000 debug: ha_state_start_monitor_holdup(src/ha_state.c:2518): Skipping monitor holdup for group 1
2019-12-04 09:41:04.464 +0000 debug: ha_state_monitor_holdup_callback(src/ha_state.c:2611): Going to Non-Functional state state
2019-12-04 09:41:04.464 +0000 debug: ha_state_move(src/ha_state.c:1423): Group 1: moving from state Active to Non-Functional
2019-12-04 09:41:04.464 +0000 Warning: ha_event_log(src/ha_event.c:47): HA Group 1: Moved from state Active to state Non-Functional
2019-12-04 09:41:04.464 +0000 debug: ha_sysd_dev_state_update(src/ha_sysd.c:1434): Set dev state to Non-Functional
2019-12-04 09:41:04.464 +0000 debug: ha_sysd_dev_alarm_update(src/ha_sysd.c:1400): Set dev alarm to on
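To pull the relevant facts out of a longer ha_agent.log, something like the following could help. It is a rough sketch that assumes the lines look like the excerpt above; the regular expressions and the `summarize` helper are illustrative, not a PAN-OS tool, and the message wording may differ between releases.

```python
import re

# Patterns are based only on the wording of the excerpt above.
REASON_RE = re.compile(r"Going to non-functional for reason (.+)$")
MOVE_RE = re.compile(r"moving from state (\S+) to (\S+)")

SAMPLE = """\
2019-12-04 09:41:04.464 +0000 Going to non-functional for reason Dataplane down: path monitor failure
2019-12-04 09:41:04.464 +0000 debug: ha_state_transition(src/ha_state.c:1329): Group 1: transition to state Non-Functional
2019-12-04 09:41:04.464 +0000 debug: ha_state_move(src/ha_state.c:1423): Group 1: moving from state Active to Non-Functional
2019-12-04 09:41:04.464 +0000 Warning: ha_event_log(src/ha_event.c:47): HA Group 1: Moved from state Active to state Non-Functional
"""


def summarize(log_text: str) -> dict:
    """Collect non-functional reasons and state moves from ha_agent.log text."""
    reasons, moves = [], []
    for line in log_text.splitlines():
        reason = REASON_RE.search(line)
        if reason:
            reasons.append(reason.group(1).strip())
        move = MOVE_RE.search(line)
        if move:
            moves.append((move.group(1), move.group(2)))
    return {"reasons": reasons, "moves": moves}


if __name__ == "__main__":
    print(summarize(SAMPLE))
```

On the excerpt above, the only reason reported is the dataplane path monitor failure; the HA2 links are not named as the cause of the state change.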
I also ran a test:
Active/Passive PA pair.
Only HA1 is connected; no HA1 backup is connected.
Heartbeat Backup is checked on both firewalls.
I disconnected HA1, and the Dashboard shows both HA1 and the heartbeat backup as down.
Both PAs became active.
Even though Heartbeat Backup is checked and the management interface on both PAs is up, why does the heartbeat backup show as down on both firewalls?
Is this expected behaviour?
Losing HA2 and having a device go into a non-functional state is certainly not expected behavior. There are, however, multiple HA fixes in later 7.1 maintenance releases, so you could be running into a bug. While I generally don't like recommending an upgrade unless I can point to a specific issue ID, you are running an older maintenance release with open security advisories, so I'm going to use those instead and recommend you upgrade to 7.1.25, which will hopefully fix the issue you ran into here as well as patch some security issues.
How is your MGMT traffic routed? It's possible that, in the split-brain scenario created when HA1 is removed, the two devices can't send heartbeat traffic to each other because of routing issues that arise when both firewalls are active. We would need to look at your actual network design to be certain, but that would be my first guess.
We are running 8.1.9 on this PA-3020.
These are our lab firewalls, and no traffic passes through the dataplane.
Management plane routing: both firewalls are in the same subnet.
All service routes go via the management plane only.
I'd be interested in seeing what your ha_agent.log reports when this issue pops up, to see exactly what the agent is seeing. I haven't seen any keepalive bugs with 8.1.9, and none of the addressed issues in 8.1.10 or 8.1.11 appear to be related to this.
It seems I was testing HA1 with encryption disabled on one firewall and left enabled on the other.
It is not supposed to work like this, much as an OSPF neighborship won't form when authentication is enabled on one router but not on the other.
Many thanks for pointing me in right direction.
There was a 15-minute outage while the customer was replacing the passive device in an A/P pair with an RMA device.
As soon as they connected only the HA1 (Aux1) cable to the new RMA device (no other interfaces were connected because link monitoring was enabled), there was a split-brain scenario for a few minutes in which the peer firewall that had been running as active became passive and dropped traffic. The customer suspended the new RMA device, both firewalls recovered from the split-brain scenario, and traffic passed through the expected (active) firewall.
My question: with preemption disabled, if a split-brain scenario occurs in an A/P pair, which firewall owns the active state after recovery? (My answer: the firewall with the lowest priority will have the active role after recovery, even if its network interfaces are not connected and link monitoring is enabled on those interfaces.)
Thanks in advance.