PA460 issues

BigPalo · ‎07-02-2024

Hi,

We have two PA-460 firewalls in HA, one active and one passive. We have several issues related to configuration synchronization and HA:

1- Synchronization before a commit can take up to 8 minutes. With the old firewalls a commit took less than a minute, and with these newer models it has gotten worse. It would not matter much, except that in cases like PAN-OS upgrades we are out of service, and we think this model should be able to do better.

2- When the roles switch (passive to active and active to passive) we have a network outage of 3 to 4 minutes. We have verified that the cause is not the LACP negotiation on the interfaces; rather, the firewalls themselves take all this time to detect the failure or to make the switch. HA is not useful to us if a failover takes 4 minutes. We bought two PA-460s instead of one so we could have HA, and we are not getting the benefit of it. We have been advised that it appears to be a bug on the PA-460/PA-400 series, but after a year (we installed the firewalls in June/July 2023) we still have the problem despite receiving updates.

3- We have "ghost" certificates installed on the firewalls that we don't see in the GUI. We created these a few months ago and they didn't show up once generated, but they are present in the config XML. We see this in two places:
A) When we commit, we get warnings indicating that we have 3 duplicate certificates, but we only see one in the GUI under "Device > Certificates".
B) Listing the certificates from the CLI shows several distinct certificates that do not appear in the GUI.

Any ideas? I have already noticed that PAN-OS versions above 9.1 are much slower than previous versions 😞

reaper · ‎07-02-2024

1. 8 minutes is extremely long. Which PAN-OS version are you using, and how big is your configuration file (the size is displayed in the commit-completed popup)? Are many admins connected at the same time, or do you have many scheduled reports set for a short period of time?

2. This is also suspiciously long; failover normally takes milliseconds. What parameters are you using during the failover (are you manually forcing a failure, or are these real failures)? What is being monitored (path, interface, ...) to trigger failovers?

Is LACP set to pre-negotiate on the passive device? (Make sure to set the passive link state to auto.)

3. How did you generate these certificates? Did they use to be visible but become invisible after an upgrade? You could try to remove them from an exported XML and then reimport them.
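
Before editing an exported XML by hand, it may help to list where the certificate entries actually live. The following is a minimal sketch, not a vendor tool: it assumes certificates appear as `entry` elements directly under a `certificate` element somewhere in the exported config, and the sample XML, the names in it, and both helper functions are made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Invented sample; a real export will be far larger and may be laid out differently.
SAMPLE = """<config>
  <shared><certificate>
    <entry name="web-cert"/><entry name="old-cert"/>
  </certificate></shared>
  <devices><entry name="localhost.localdomain"><vsys><entry name="vsys1">
    <certificate><entry name="web-cert"/></certificate>
  </entry></vsys></entry></devices>
</config>"""


def find_certificates(xml_text):
    """Return {certificate name: [config paths where an entry with that name occurs]}."""
    found = defaultdict(list)

    def walk(node, path):
        for child in node:
            # Assumption: a certificate is an <entry> directly under <certificate>.
            if node.tag == "certificate" and child.tag == "entry":
                found[child.get("name")].append(path)
            label = child.tag
            if child.get("name"):
                label += f"[{child.get('name')}]"
            walk(child, f"{path}/{label}")

    root = ET.fromstring(xml_text)
    walk(root, root.tag)
    return dict(found)


def hidden_names(found, gui_names):
    """Names present in the XML but absent from the list shown in the GUI."""
    return sorted(set(found) - set(gui_names))


found = find_certificates(SAMPLE)
for name, paths in found.items():
    print(name, len(paths), paths)
print("not in GUI:", hidden_names(found, {"web-cert"}))
```

A name that shows up under more than one path, or that is missing from the GUI list you pass in, is a candidate to look at before removing anything.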

Tom Piens
PANgurus - Strata specialist; config reviews, policy optimization

reaper · ‎07-02-2024

1) Have you tried 'revert to running config' to refresh the candidate config? I can't say any of my 400s are super slow...

2) Those are the default timers. You could set the timers a little more aggressively, but that would not explain taking minutes to fail over...

Did you set up any Link and Path monitoring?

Do change the passive link state to 'auto' so LACP can be pre-negotiated.

3) I've had this issue appear at another customer; we're considering opening a case, as it appears to be a bug (we tried several things to fix it, but to no avail).


BigPalo · ‎07-09-2024

Issues were solved

reaper · ‎07-10-2024

The HA sync is a second commit that takes place after the commit has completed locally, in which the config is synced to the peer.

This adds some latency to commits, as you need to wait for the peer to complete its commit before the entire operation is deemed successful.

So this is more or less normal, but you can track it on both sides to see how long it takes for the config to be transferred and how quickly the peer commit starts and finishes.
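
As a rough way to do that tracking, the sketch below splits one commit into local commit time, hand-off time to the peer, and peer commit time. It assumes you have noted four timestamps by hand from the job lists on both nodes and that both clocks are in sync (for example via NTP); the timestamp format, the function name, and the example values are assumptions for illustration only.

```python
from datetime import datetime

FMT = "%Y/%m/%d %H:%M:%S"  # assumed format of the timestamps you copied down


def commit_timeline(local_start, local_end, peer_start, peer_end):
    """Break a commit plus HA sync into phases, in seconds."""
    t0, t1, t2, t3 = (
        datetime.strptime(s, FMT)
        for s in (local_start, local_end, peer_start, peer_end)
    )
    return {
        "local_commit_s": (t1 - t0).total_seconds(),
        "handoff_s": (t2 - t1).total_seconds(),   # config transfer and queueing on the peer
        "peer_commit_s": (t3 - t2).total_seconds(),
        "total_s": (t3 - t0).total_seconds(),
    }


# Invented example values, not measurements from this thread.
timeline = commit_timeline(
    "2024/07/10 10:00:00", "2024/07/10 10:01:30",
    "2024/07/10 10:01:50", "2024/07/10 10:07:40",
)
for phase, seconds in timeline.items():
    print(f"{phase}: {seconds:.0f}")
```

Whichever phase dominates tells you which node's logs are worth reading first.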

The EDL refresh may add a little latency, as that also happens every commit, to update all the EDLs you're currently using.

The device certificate fetch, however, is a little odd... Does it appear after every commit? It should not happen frequently: it is the certificate used to communicate with Palo Alto Networks cloud services and is usually valid for 3 months (at which time it refreshes automatically).


BigPalo · ‎07-10-2024

Thanks as usual Reaper for your answer.

reaper · ‎07-11-2024

If you really, really want to know, you can set both the management server (mgmt) and the device server (devsrv) to debug mode on both nodes, push a commit (turn off debug mode once it's done), and then go trawling through the debug logs (> less mp-log ms.log and > less mp-log devsrv.log, or collect a tech support file from both) to see where the delays occur during the commit process

or open a support case and have someone else dig through logs 😉
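
If you do go through the logs yourself, one way to narrow things down is to look for the largest time gaps between consecutive log lines. This is only a sketch under assumptions: it presumes each relevant line starts with a `YYYY-MM-DD HH:MM:SS.mmm` timestamp (adjust the pattern if yours differ), and the sample lines and the `largest_gaps` helper are invented for illustration.

```python
import re
from datetime import datetime

# Assumed timestamp prefix; change the pattern if the log format differs.
TS = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})")


def largest_gaps(lines, top=3):
    """Return up to `top` tuples (gap_seconds, line_before, line_after), largest gap first."""
    stamped = []
    for line in lines:
        match = TS.match(line)
        if match:  # lines without a timestamp (continuations) are skipped
            when = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
            stamped.append((when, line.rstrip()))
    gaps = [
        ((b[0] - a[0]).total_seconds(), a[1], b[1])
        for a, b in zip(stamped, stamped[1:])
    ]
    return sorted(gaps, key=lambda g: g[0], reverse=True)[:top]


# Invented log lines, only to show the shape of the input.
sample_log = [
    "2024-07-11 10:00:00.000 commit job started",
    "2024-07-11 10:00:02.500 phase one done",
    "  continuation line without timestamp",
    "2024-07-11 10:03:12.500 phase two done",
    "2024-07-11 10:03:13.000 commit job finished",
]
for gap, before, after in largest_gaps(sample_log, top=2):
    print(f"{gap:.1f}s between:\n  {before}\n  {after}")
```

On a real file you would pass the open file object instead of the sample list, then read the lines around the biggest gaps.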


PavelK · ‎03-29-2025

I came across this KB: "During HA failover on PA-400 series firewalls, interface link-up will take a time on the new Active ...", where the long failover time / outage during failover is similar to what is described in the initial post. Maybe that was the root cause.
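
To compare an observed outage against a description like that, it helps to have a number rather than an impression. A minimal sketch, assuming you already have a series of reachability probes through the firewall pair recorded as (seconds, success) pairs; the probe data and the `longest_outage` helper are hypothetical.

```python
def longest_outage(samples):
    """Longest span in seconds from the first failed probe to the next successful one."""
    longest = 0.0
    down_since = None
    for t, ok in samples:
        if not ok and down_since is None:
            down_since = t            # outage begins at the first failed probe
        elif ok and down_since is not None:
            longest = max(longest, t - down_since)
            down_since = None         # service is back
    if down_since is not None and samples:
        # Still down at the end of the capture: count up to the last probe.
        longest = max(longest, samples[-1][0] - down_since)
    return longest


# Invented data: one probe per second, failing from t=10 until t=200.
probes = [(float(t), not (10 <= t < 200)) for t in range(300)]
print(longest_outage(probes))  # 190.0
```

The resolution is limited by the probe interval, so a one-second probe cannot confirm or rule out sub-second failover.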

Kind Regards

Pavel
