The following change log may be useful to all of you wondering how an upgrade goes in an HA active-passive pair. It would be nice if PAN support were to put this into a tech note. Each step is essentially a check or an observation from top to bottom.
2050 Firewall Upgrade 4.0.7 to 4.1.2 Log:
Pre-download PAN-OS 4.1.0 and 4.1.2 to both units
No commits pending
Firewalls show HA is in sync
The same version of dynamic updates is installed on both units
Back up the configs from both units and export them
On PAN2 (the current passive), suspend the unit via GUI:device/ha/suspend @ 9:17 am
PAN2 upgrade to 4.1.0 starting @ 9:18am
PAN2 reboot @ 9:22
PAN2 back up at 9:25 and auto com runs until 9:38. PAN2 comes up in the non-functional state. Note that you will not be able to log in to the unit for several minutes while the upgrade process is running.
PAN2 Started 4.1.0 to 4.1.2 upgrade at 9:40
PAN2 reboot at 9:44
PAN2 back up at 9:50 and auto com runs until 9:53. PAN2 comes up in the non-functional state.
Because 4.1.x HA is not compatible with 4.0.x HA, there is no way to make the newly upgraded firewall the active unit directly. The firewall running the older code must be suspended first, and then the new firewall is made functional, which brings it into the active state. Because of this limitation, the sessions going through the active unit are severed when it is suspended. Detail below:
Do quickly on PAN1 from the CLI: request high-availability state suspend
NOTE: active sessions are severed when the active unit becomes suspended
Do quickly on PAN2 from the CLI: request high-availability state functional
PAN2 immediately becomes active. Loss of 14 pings.
PAN1 upgrade to 4.1.0 starting @ 10:56am
On PAN2 (not PAN1!!), redefined pre-emption to 98 so that PAN2 is favored over PAN1 and PAN1 does not become active when it reboots into the initial 4.1.0 install.
PAN1 reboot @ 11:04
PAN1 back up at 11:10 and auto com until 11:20
PAN1 state shows as "initial" after auto com completes
PAN1 Started 4.1.0 to 4.1.2 upgrade at 11:23
PAN1 state transitioned to passive while 4.1.2 upgrade was running
PAN1 reboot at 11:27
PAN1 back up at 11:32 and auto com until 11:35. PAN1 comes up as passive.
HA dashboard widgets on both sides show no errors and states are synched.
PAN2 pre-emption number set back to 101 then PAN2 was suspended. At this point an auto com job started running on PAN2 after PAN1 became active. This job took about 1 minute. This may be a synch check.
PAN2 commit config (to set back preemption value to 101)
PAN2 "request high-availability state functional" and state then becomes passive
Process completed at 11:43 am
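For anyone planning a change window, here is a rough sketch that adds up how long each phase took, using only the timestamps from the log above. The helper name `minutes_between` and the phase labels are my own, not anything from PAN-OS, and the phase boundaries (install start to end of auto com) are an assumption about how to group the log entries.

```python
from datetime import datetime


def minutes_between(start: str, end: str) -> int:
    """Whole minutes between two same-day clock times given as H:MM (24h)."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


# (label, start, end) taken from the log; each phase runs from the start of
# the install to the end of the auto com job that follows the reboot.
phases = [
    ("PAN2 to 4.1.0", "9:18", "9:38"),
    ("PAN2 4.1.0 to 4.1.2", "9:40", "9:53"),
    ("PAN1 to 4.1.0", "10:56", "11:20"),
    ("PAN1 4.1.0 to 4.1.2", "11:23", "11:35"),
    ("Whole change window", "9:17", "11:43"),
]

durations = {label: minutes_between(start, end) for label, start, end in phases}

for label, mins in durations.items():
    print(f"{label}: {mins} min")
```

Each individual upgrade hop took roughly 12 to 24 minutes on this pair, so most of the window was spent on checks and waiting between steps rather than on the installs themselves.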
Post upgrade items to do:
Download GlobalProtect client
Activate GlobalProtect client
Enable User-ID on the outside zone per error message (probably a GlobalProtect requirement)
Ensure User-ID agent reconnects
Test GlobalProtect upgrade and function
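The order of the two CLI commands in the failover step of the log matters: suspend the unit on the old code first, then make the upgraded unit functional. Below is a minimal sketch that just writes that order down as a small runbook. The two command strings are the ones from the log; the function name `failover_plan` and the idea of building the plan in Python are purely illustrative, and nothing here talks to a firewall.

```python
from typing import List, Tuple

# CLI commands exactly as used in the log above.
SUSPEND = "request high-availability state suspend"
FUNCTIONAL = "request high-availability state functional"


def failover_plan(old_active: str, upgraded: str) -> List[Tuple[str, str]]:
    """Return (unit, CLI command) pairs in the order they should be run.

    old_active: unit still on the old code and currently active.
    upgraded:   unit already upgraded and sitting in the suspended /
                non-functional state.
    """
    if old_active == upgraded:
        raise ValueError("old_active and upgraded must be different units")
    return [
        (old_active, SUSPEND),    # sessions through this unit are severed here
        (upgraded, FUNCTIONAL),   # upgraded unit then comes up as active
    ]


plan = failover_plan("PAN1", "PAN2")
for unit, command in plan:
    print(f"{unit}> {command}")
```

Having both CLI sessions open and the commands ready to paste keeps the gap between the two steps short; in the log above the gap cost 14 pings.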
Yesterday evening I wanted to upgrade our PAN 2050 directly from 4.1.8 to 5.0.1 and ran into an error message on my active unit (peer non-functional, and peer version not compatible or not the same as local, something like that).
I searched but did not find any document that explains the upgrade process from 4.1.x to 5.0.x, so I downgraded. Today I saw this discussion, which is very helpful. Do you think I have to follow the same process described here to upgrade to 5.0.1, and do you think it is really mandatory to go through 5.0.0 first before going to 5.0.1?
Thanks for your help
Thanks for your quick answer, but in fact I already did that. I first downloaded v5.0.0 on both my devices, active and passive (just the download, without installing, as the device specified), then downloaded v5.0.1 and installed it on the passive unit. After the passive unit rebooted, its status was non-functional, and on the active unit the HA status showed something like "peer non-functional (peer version not compatible or mismatch)", so I downgraded.
It would be very helpful for Palo Alto to have an official process or document for this. I really don't understand why there is no documentation for it.
Thanks for the link, but I followed the process described in that document up to point 4. At that point the HA status was non-functional ("peer version mismatch or not the same"), so my understanding was that HA was not working. If I had gone on to point 6 and suspended the second device, I would have lost connectivity, and I did not know whether the upgraded device would take the active role or, if so, how long that would take. Since I had not planned any downtime of our internet line that evening, I rolled back.
Thanks for your comment
I get why you'd be concerned, and the document could be improved by describing what is expected to happen at step 6. As you stated, it appears like you have a broken cluster and so FUD creeps in and so you are reluctant to pull the trigger and finish the upgrade.
In reality, all is well and when you suspend the active device at step 6, the newly upgraded device will take over, it just isn't documented or explained anywhere that I've seen. You might drop a couple of pings but that's it.
I'd like to hear from PAN support to explain what's going on at step 6. If I had to guess, it would have something to do with the HA interfaces being smart and taking action on the cluster, even though it looks broken at the time.
Open a support case and talk to them about it, then give it a shot!