During our last PAN-OS upgrade we had to fail over between two firewalls in an HA configuration. The failover took an unusually long time, during which Internet access was unavailable. Roughly 10-15 pings (to an Internet host) were lost before the passive unit became active. We had opened a case with PAN support, and our Zoom meeting dropped and reconnected automatically after about 15 seconds. At one of my previous jobs failover was very quick; I would lose only 1 or 2 pings to 18.104.22.168.
Our HA setup is like this:
HA1 - over aux-1
HA2 - over eth1/10
Mode - active-passive
Config sync - enabled
Passive link state - auto
Preemption - not set up
LACP/LLDP - not configured
Link and path monitoring - enabled
I'm wondering if anyone has had a similar experience and what the solution was to speed up the failover.
This is the best practice for upgrades. Hopefully you didn't just reboot the firewall and instead used the 'Suspend' feature.
I appreciate your concern; I have been working with PAN for quite some time and have never had an issue with an OS upgrade, but that was not my question...
This sounds like a spanning-tree issue: the time it takes for that port to come up could be STP going through the listening, learning, ... stages.
What @OtakarKlier was rightfully pointing out was that the proper upgrade procedure would likely have prevented any extended failover outage, as it's a 'clean' way of switching active status to the other peer. If you simply restarted the firewall as part of the upgrade procedure without suspending it, there are a variety of settings that could cause an extended period of time to elapse before traffic starts flowing through the peer unit. If this is the case, we would actually recommend looking at different log files instead of looking for configuration issues that would cause an extended failover time.
There are a variety of settings that I would look at to narrow it down. The first is whether LACP aggregates are in use at all, then the HA timer settings deployed on the device, then whether STP is set up correctly on your switches (it took me too long to type this reply, +1 to @Sec101 for being technically the first person to bring up STP), and lastly what @Gertjan-HFG mentioned with his PPPoE suggestion. You've already said no to two of the four, so the remaining two are things you should look into.
FYI, your comment to @OtakarKlier came off a little rude. Please keep in mind that some suggestions or comments, when answered, will lead to your solution. As mentioned earlier, the order in which you performed the upgrade is actually highly important in knowing where we should be looking for issues. So if you didn't follow the recommended procedure, we kinda need to know about it so we don't send you down a rabbit hole troubleshooting the wrong thing.
Unless someone has the title 'Community Manager', everyone who comments on this post is devoting time out of their day to help you with a question or problem you are having. Please consider that when responding to someone spending part of their day helping others on this forum.
Thanks BPry, I appreciate your comment and the time spent answering the question.
The 'power off' upgrade process is unlikely to apply to my case. As I mentioned in the original post, at my previous jobs I had successfully performed PAN-OS upgrades while losing only 1 or 2 pings, which makes me believe I did follow the upgrade process properly and did not just power off the firewall. I would expect powering off the firewall to cause more lost pings.
I am leaning more towards LACP settings at this moment.
I am a strong believer that this community is a great place to get answers and share ideas, best practices, tips and tricks, and that everyone's time is valuable, including mine. Having been in this line of work for quite some time, I do understand the importance of the right information, so I try to put as many useful details in the original post as possible, which should point people willing to assist in the right direction. If there is not enough information, it is much easier to ask a question than to make assumptions. I think everyone will benefit from this.
Also, in version 8 you can modify the parameters that are used to speed this up a bit: Device tab -> High Availability -> General tab -> Election Settings. I would set them to Aggressive and give it a test.
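If it helps to put a number on "give it a test", here is a rough Python sketch for timing the outage while you suspend the active peer. It is only an illustration and not anything from PAN: the function names (tcp_probe, longest_outage, monitor) and the default port 443 are my own assumptions, and it uses a TCP connect as a stand-in for ping because ICMP usually needs elevated privileges. At one probe per second the result should be comparable to a count of lost pings.

```python
import socket
import time


def tcp_probe(host, port=443, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical stand-in for an ICMP ping; port 443 is an assumption.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def longest_outage(samples):
    """Return the longest outage in seconds.

    samples is a list of (timestamp_seconds, ok) tuples in time order.
    An outage runs from the first failed probe to the next successful
    one; a run of failures that never recovers is ignored.
    """
    longest = 0.0
    outage_start = None
    for timestamp, ok in samples:
        if not ok and outage_start is None:
            outage_start = timestamp
        elif ok and outage_start is not None:
            longest = max(longest, timestamp - outage_start)
            outage_start = None
    return longest


def monitor(host, duration=120.0, interval=1.0):
    """Probe host roughly once per interval for duration seconds."""
    samples = []
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        started = time.monotonic()
        samples.append((started, tcp_probe(host)))
        # Sleep out the rest of the interval so probes stay evenly spaced.
        time.sleep(max(0.0, interval - (time.monotonic() - started)))
    return samples
```

To use it, start something like monitor("example.com") from a host behind the firewalls (example.com is a placeholder for whatever Internet host you normally ping), trigger the failover while it runs, then pass the returned samples to longest_outage. Comparing the result before and after changing the timers should show whether the change made a difference.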