HA Active‑Passive 3420 Both Nodes Stuck – Suspecting LACP Issue

L1 Bithead

Hello,

Yesterday our HA infrastructure on a pair of Palo Alto PA‑3420 (Active‑Passive) firewalls completely froze. Both units continued to believe they were the active peer, and automatic failover never occurred. We had to manually reboot the actual active node to restore service. We suspect the root cause is related to LACP on our aggregated interfaces.

1. Configuration Details
Model: PA‑3420

PAN‑OS Version: 11.1.6‑h3

HA Mode: Active‑Passive

HA Group: 25

Network Interfaces:

4 physical ports aggregated into AE‑group ae1 (Ethernet1/13‑16)

LACP enabled, “fast” rate

Preemptive: Disabled
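
For context on the "fast" rate above, here is a rough sketch of the standard LACP timers. The values are the IEEE 802.1AX defaults (1 s or 30 s between LACPDUs, with the partner timed out after three missed intervals). Whether the firewall and the upstream switch in this setup use exactly these defaults is an assumption, and the names below are illustrative only.

```python
# Standard LACP timer values from IEEE 802.1AX, in seconds.
# Assumption: both the firewall and the switch follow these defaults.
PERIODIC_INTERVAL = {"fast": 1, "slow": 30}
TIMEOUT_MULTIPLIER = 3  # partner is timed out after three missed LACPDUs


def lacp_timeout(rate):
    """Return the seconds after which a silent LACP partner is timed out."""
    return PERIODIC_INTERVAL[rate] * TIMEOUT_MULTIPLIER


for rate in ("fast", "slow"):
    print(rate, lacp_timeout(rate))
```

With the fast rate, a member port that stops receiving LACPDUs leaves the bundle after roughly 3 seconds, which is worth keeping in mind when reading the timestamps around a failover.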

2. Problem Description
Both peers continuously report themselves as “active.”

Traffic halts because no clean role transition occurs.

Manual reboot of the true active node is required to recover.

We have already disabled preemption, but the behavior persists.

3. Relevant Log Messages
critical - link down description contains 'LACP interface ethernet1/13 moved out of AE-group ae1. Selection state Unselected (Link down)'
critical - description contains 'HA Group 25: Can't synchronize control plane data; some state may be lost on switchover'
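
To correlate LACP events like the one above with the HA events, it can help to pull the interface and AE group out of each line. The following is a minimal sketch, not an official parser: the regular expression is modelled only on the single log line quoted in this post, and the function name is hypothetical. Other log formats or PAN-OS releases may word the message differently.

```python
import re

# Pattern modelled on the one LACP log line quoted in this post;
# the exact wording may differ between releases (assumption).
LACP_EVENT = re.compile(
    r"LACP interface (?P<interface>\S+) moved out of AE-group (?P<ae_group>\S+)\. "
    r"Selection state (?P<state>\w+) \((?P<reason>[^)]*)\)"
)


def parse_lacp_event(line):
    """Return interface, AE group, state and reason, or None if no match."""
    match = LACP_EVENT.search(line)
    return match.groupdict() if match else None


sample = (
    "critical - link down description contains 'LACP interface "
    "ethernet1/13 moved out of AE-group ae1. Selection state "
    "Unselected (Link down)'"
)
event = parse_lacp_event(sample)
print(event)
```

Running this over an exported system log and grouping by interface would show whether all four members of ae1 dropped together or only ethernet1/13.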

4. What We’ve Checked So Far
LACP Status: All ports show “collecting/distributing” when up

Cabling & Switches: Verified with loopback tests—no physical-layer errors

Software Version: Running latest H3 for 11.1.6

HA Config: PSK, HA IPs, and settings matched on both peers

5. Questions
Which LACP or HA CLI parameters can we tweak to prevent AE‑group flapping during failover?

Are there any known bugs in 11.1.6‑h3 affecting HA synchronization or LACP?

Any recommended workarounds or best practices for stabilizing AE‑groups in HA setups?

Thanks in advance for any guidance or suggestions!

1 REPLY

Cyber Elite

Are the firewalls "flapping", or are they both active at the same time?

In the latter case, this is due to an HA1 problem: ensure the HA1 link is up and reliable, and that you either have a backup HA1 interface or have "Heartbeat Backup" enabled on both peers.

Also make sure there are no other clusters in the same management network with the same Group ID.

Tom Piens
PANgurus - Strata specialist; config reviews, policy optimization
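
The two checks suggested in the reply above can be sketched in a few lines. This is only an illustration under stated assumptions: the state strings, function names, and cluster names are hypothetical, and the inputs would have to be collected from each firewall by whatever means you already use.

```python
from collections import Counter


def classify_ha_pair(state_a, state_b):
    """Classify the HA states two peers report at the same moment."""
    states = {state_a.lower(), state_b.lower()}
    if states == {"active"}:
        return "split-brain"  # both peers claim the active role
    if states == {"active", "passive"}:
        return "healthy"
    return "degraded"  # anything else needs a closer look


def duplicate_group_ids(clusters):
    """Return HA group IDs that are used by more than one cluster.

    `clusters` maps a (hypothetical) cluster name to its HA group ID.
    """
    counts = Counter(clusters.values())
    return sorted(gid for gid, n in counts.items() if n > 1)


print(classify_ha_pair("active", "active"))
print(duplicate_group_ids({"fw-pair-a": 25, "fw-pair-b": 25, "fw-pair-c": 7}))
```

A "split-brain" result points at the HA1 link or heartbeat path rather than at LACP, and a non-empty list from the second function means two clusters on the same management network share a group ID.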