01-08-2025 04:58 AM
This is a heads-up for anyone running PA-1420 firewalls in a high-availability (HA) configuration on PAN-OS 11.1.4 or later. We recently encountered significant issues with our NGFW operating in HA mode. Specifically, the HA setup failed on the active firewall, and failover to the secondary (standby) device did not occur as expected. I had to manually suspend the active firewall so the secondary could take over, resulting in a brief but severe outage.
A ticket has been opened with TAC, and it appears we are among the first customers to report this issue. TAC has confirmed it to be an LACP-related bug that originated in version 11.1.4. Unfortunately, even downgrading to the preferred version 11.1.4-h7 does not resolve the problem. According to TAC, this issue is still under internal investigation and has not yet been made public. Their senior engineers are actively working on it.
As a temporary workaround, I configured LACP to passive mode between the firewall and the core switch. So far, this adjustment has stabilized the HA setup, with both firewalls operating in Active/Standby mode without errors.
If you encounter a similar issue, provide TAC with the following reference issue: PAN-275888.
01-13-2025 12:38 PM
TAC confirmed they introduced a new feature to help with performance when the packet rate is low. However, this "feature" has a side effect when the packet rate spikes, causing the data plane to crash. They recommend upgrading to 11.1.6 (not a preferred release) to fix issue PAN-263208.
01-15-2025 10:44 PM
Hi, thanks for sharing.
It looks like it is an LACP issue for us too. After upgrading from 11.0.3-h10 to 11.1.2-h15, the firewall was unable to learn the MAC address from the core switch, and all services became unreachable.
02-14-2025 06:46 AM
We had multiple interface failures. Even after upgrading to 11.1.6 and replacing the box via RMA, we had another similar crash on 02-13. All interfaces (OOB, Inside and DMZ) went down, which of course caused OSPF to fail and left all users without internet. The failover didn't happen, even though HA is enabled. We raised the ticket to critical, and TAC is still trying to figure out this garbage.
02-16-2025 03:44 PM
We had the same issue this morning. LACP failed between our core Cisco 9500 switch and the primary Palo 1420 firewall. The HA peer never took over, and I had to reboot the firewall to get all Layer 3 connectivity back up. I have a case open with TAC, and they are reviewing the TSF. We are on preferred code version 11.1.4-h7. Waiting for an update, and wondering whether this same potential bug can affect our 5260 firewalls and cause an LACP failure there too.
02-17-2025 01:10 AM
This LACP issue seems to be common on the 1420 model. We got the recommendation below from PA TAC for our post-upgrade issue:
************************************************************************************************************
Root Cause:
The issue is triggered when the HA device has ha_group_id set to a nonzero value, which causes both devices to use the same system MAC address regardless of whether the configuration option "Same System MAC Address for Active-Passive HA" is enabled or disabled.
Conditions that trigger the issue: HA A/P is enabled, LACP is enabled on AE interfaces with 'Enable in HA Passive State' enabled, and 'Same System MAC Address for Active-Passive HA' is disabled.
To verify this, you can run the command:
show lacp aggregate-ethernet all
If the System MAC address appears the same on both devices, this confirms the issue. We have also successfully reproduced this in our lab environment.
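If you want to script the comparison, here is a rough Python sketch. It assumes you have saved the output of `show lacp aggregate-ethernet all` from each firewall as text, that the local system MAC is printed on a line containing the words "System MAC", and that the local entry appears before any partner entry. The exact output layout is an assumption on my part, so adjust the pattern to whatever your boxes actually print. The function names are made up for illustration.

```python
import re

# Assumption: the local system MAC is on a line that contains "System MAC"
# followed somewhere by a MAC in aa:bb:cc:dd:ee:ff (or dash-separated) form.
MAC_RE = re.compile(
    r"system\s+mac.*?((?:[0-9a-f]{2}[:-]){5}[0-9a-f]{2})",
    re.IGNORECASE,
)


def local_system_mac(cli_output):
    """Return the first system MAC found in the saved CLI output, or None."""
    match = MAC_RE.search(cli_output)
    if match is None:
        return None
    # Normalize case and separators so the two devices compare cleanly.
    return match.group(1).lower().replace("-", ":")


def same_system_mac(output_a, output_b):
    """True only if both outputs contain a system MAC and they are identical."""
    mac_a = local_system_mac(output_a)
    mac_b = local_system_mac(output_b)
    return mac_a is not None and mac_a == mac_b
```

If `same_system_mac(...)` returns True for the two HA peers, that matches the condition TAC describes above. A result of False or a missing MAC just means the script did not find a match, so check the raw output by hand before drawing conclusions.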
Issue Details & Fix Versions:
Issue ID: PAN-278296
Fix Versions:
11.1.8 – March 6, 2025
11.2.8 – May 5, 2025
11.1.11 – June 6, 2025
12.1.2 – (Date TBD)
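For anyone tracking a fleet, a small Python helper can flag which boxes are still below the fix release for their train. This is only a sketch: the table below is copied from the list above (using the earliest listed release per train, which is my assumption about how to read the two 11.1 entries), and the version format `major.minor.maintenance[-hN]` is inferred from the version strings in this thread. Confirm the actual fix versions with TAC.

```python
import re

# Earliest fix release per train for PAN-278296, taken from the list above.
# Assumption: for 11.1 the lower of the two listed releases (11.1.8) applies.
FIX_BY_TRAIN = {(11, 1): 8, (11, 2): 8, (12, 1): 2}


def parse_version(version):
    """Split a string like '11.1.4-h7' into (major, minor, maintenance, hotfix)."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)(?:-h(\d+))?", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, maint, hotfix = match.groups()
    return int(major), int(minor), int(maint), int(hotfix or 0)


def includes_fix(version):
    """True/False if the train is listed above, None if it is not listed."""
    major, minor, maint, _hotfix = parse_version(version)
    fix_maint = FIX_BY_TRAIN.get((major, minor))
    if fix_maint is None:
        return None
    return maint >= fix_maint
```

For example, `includes_fix("11.1.4-h7")` is False and `includes_fix("11.1.8")` is True, while a train that is not in the list returns None rather than guessing.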
02-17-2025 05:02 AM
We recently encountered the same issue even after replacing the device through an RMA. As a result, we escalated the support case to a Critical priority, prompting involvement from Palo Alto Engineering, as the problem began impacting production during peak hours. The most recent incident occurred on February 13, 2025.
Following their investigation, TAC confirmed that the root cause of the PA crash is related to processing a DNS over HTTPS (DoH) packet. This feature was introduced in PAN-OS 11.0 and is enabled by default in PAN-OS 11.0.x. This explanation aligns with our environment, as we run DNS Security on our Edge PA but not on our SD-WAN PA, despite both operating on the same software version.
According to TAC, this issue was expected to be resolved in PAN-OS 11.1.6, which we upgraded to in January. However, the problem persists. If you are running the DNS Security license on your firewalls, TAC recommends running the first command below (the dns-over-https setting) on both devices in an HA pair as a workaround until the PAN-OS 11.1.8 fix is released, which is scheduled for the first week of March. Save your configuration first as a precaution, and verify with TAC whether DNS Security is enabled in your environment before proceeding.
Additionally, regarding the failover issue, TAC confirmed that it did not occur because the system MAC address remains the same on both devices, as previously discussed in this thread. You can run the second command to verify this. This issue is also expected to be addressed in PAN-OS 11.1.8.
set deviceconfig setting dns-over-https enable no
show lacp aggregate-ethernet all