Upgrading 4.07 to 4.1.2 in HA environment

gmoerschel · 01-26-2012

The following change log may be useful to all of you wondering how an upgrade goes in an HA active-passive pair. It would be nice if PAN support were to put this into a tech note. Each step is essentially a check or an observation from top to bottom.

2050 Firewall Upgrade 4.07 to 4.1.2 Log:

Pre-download PAN-OS 4.1.0 and 4.1.2 to both units

No commits pending

Both firewalls show HA is in sync

The same versions of dynamic updates are installed on both units

Back up the configs on both units and export them

On PAN2 (the current passive), suspend the unit via GUI:device/ha/suspend @ 9:17 am

PAN2 upgrade to 4.1.0 starting @ 9:18am

PAN2 reboot @ 9:22

PAN2 back up at 9:25; auto commit ran until 9:38. PAN2 comes up in the non-functional state. Note that you will not be able to log in to the unit for several minutes while the upgrade process is running.

PAN2 Started 4.1.0 to 4.1.2 upgrade at 9:40

PAN2 reboot at 9:44

PAN2 back up at 9:50; auto commit ran until 9:53. PAN2 comes up in the non-functional state.

Because 4.1.x HA is not compatible with 4.0.x HA, there is no way to fail over gracefully to the newly upgraded firewall. The firewall running the older code must be suspended first, and then the new firewall is made functional, which brings it into the active state. Because of this limitation, the sessions going through the active unit are dropped when it is suspended. Detail below:

Quickly, on PAN1 from the CLI: request high-availability state suspend

NOTE: active sessions are dropped when the active unit is suspended

Quickly, on PAN2 from the CLI: request high-availability state functional

PAN2 immediately becomes active. Loss of 14 pings.

PAN1 upgrade to 4.1.0 starting @ 10:56am

On PAN2 (not PAN1!!), changed the pre-emption value to 98 so that PAN2 is favored over PAN1 and PAN1 does not become active when it reboots into the initial 4.1.0 install.

PAN1 reboot @ 11:04

PAN1 back up at 11:10; auto commit ran until 11:20

PAN1 state shows as "initial" after auto com completes

PAN1 Started 4.1.0 to 4.1.2 upgrade at 11:23

PAN1 state transitioned to passive while 4.1.2 upgrade was running

PAN1 reboot at 11:27

PAN1 back up at 11:32; auto commit ran until 11:35. PAN1 comes up as passive.

HA dashboard widgets on both sides show no errors and the states are in sync.

PAN2 pre-emption value set back to 101, then PAN2 was suspended. After PAN1 became active, an auto commit job started running on PAN2. This job took about 1 minute and may be a sync check.

PAN2 commit config (to set back preemption value to 101)

PAN2 "request high-availability state functional" and state then becomes passive

Process completed at 11:43 am
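For anyone planning a maintenance window from this log, the timestamps above can be turned into per-unit durations. The following is only a rough Python sketch: the event labels are paraphrased from the log, and the 1-second ping interval used for the outage estimate is an assumption (the log does not state the interval).

```python
# Timestamps copied from the upgrade log above; labels are paraphrased.
EVENTS = {
    "PAN2": [
        ("suspended", "9:17"),
        ("4.1.0 install started", "9:18"),
        ("reboot", "9:22"),
        ("back up", "9:25"),
        ("auto commit done", "9:38"),
        ("4.1.2 install started", "9:40"),
        ("reboot", "9:44"),
        ("back up", "9:50"),
        ("auto commit done", "9:53"),
    ],
    "PAN1": [
        ("4.1.0 install started", "10:56"),
        ("reboot", "11:04"),
        ("back up", "11:10"),
        ("auto commit done", "11:20"),
        ("4.1.2 install started", "11:23"),
        ("reboot", "11:27"),
        ("back up", "11:32"),
        ("auto commit done", "11:35"),
    ],
}


def to_minutes(hhmm):
    """Convert an 'H:MM' clock time into minutes since midnight."""
    hours, minutes = map(int, hhmm.split(":"))
    return hours * 60 + minutes


def unit_duration(unit):
    """Minutes from the first to the last logged event for one unit."""
    times = [to_minutes(t) for _, t in EVENTS[unit]]
    return times[-1] - times[0]


def step_durations(unit):
    """Minutes between each pair of consecutive events for one unit."""
    events = EVENTS[unit]
    return [
        (f"{a} -> {b}", to_minutes(tb) - to_minutes(ta))
        for (a, ta), (b, tb) in zip(events, events[1:])
    ]


def estimated_outage_seconds(lost_pings, interval_s=1.0):
    """Rough outage estimate; interval_s is an assumed ping interval."""
    return lost_pings * interval_s


if __name__ == "__main__":
    for unit in ("PAN2", "PAN1"):
        print(f"{unit}: {unit_duration(unit)} min in total")
        for label, minutes in step_durations(unit):
            print(f"  {label}: {minutes} min")
    # Whole change window: PAN2 suspended at 9:17, process completed at 11:43.
    print("window:", to_minutes("11:43") - to_minutes("9:17"), "min")
    print("failover outage (assumed 1 s pings):", estimated_outage_seconds(14), "s")
```

The longest single waits in this run are the first auto commit after each 4.1.0 install (13 minutes on PAN2, 10 minutes on PAN1), which is worth knowing before concluding that a unit has hung.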

Post upgrade items to do:

Download GlobalProtect client

Activate GlobalProtect client

Enable User-ID on the outside zone per the error message (probably a GlobalProtect requirement)

Ensure User-ID agent reconnects

Test GlobalProtect upgrade and function

BSadozai · 01-18-2013

Hello,

Thanks for the link, but I followed the process described in this doc up to point 4. At that point the HA status was non-functional ("peer version mismatch" or similar), so my understanding was that HA was not working. If I had gone on to point 6 and suspended the second device, I would have lost connectivity, and I didn't know whether the upgraded device would take the active role or, if so, after how long. As I hadn't planned any downtime of our internet line that evening, I did a rollback.

1. First suspend the active unit. From the CLI run the command:
> request high-availability state suspend

or

From the GUI go to Device > High Availability > Operations > Suspend local device.

Note: This will cause an HA failover. It is recommended to do this first to verify the HA functionality is working before initiating the upgrade.
2. Verify network stability on the new active device with the previously active device suspended.
3. Install the new PAN-OS on the suspended device, then reboot the device to complete the install.
4. When the upgraded device is rebooted, the CLI prompt should show passive (or non-operational, if on a different major release, i.e. 4.0 to 4.1) and the PAN-OS version should reflect the new version.
5. On current passive device, verify auto commit completes successfully (FIN OK) by running command: show jobs all before proceeding to the next step.
6. Suspend second device (should be current active device).
7. Upgrade the second device, then reboot it. When second device restarts, the first device that was already upgraded takes over as active.
8. As HA functionality was verified (step 1) and the config was successfully pushed to the dataplane on the new PAN-OS (step 5), the failover should be seamless.
9. When the second unit reboots it will come up as the passive unit. Validate the auto commit completes on this device by running command: show jobs all on this device (as done in step 5) to complete the upgrade. The original active device before the upgrade will be the active device now.
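The "FIN OK" check on the show jobs all output is easy to get wrong when read by eye during a change window. Below is a minimal Python sketch of that check, run against pasted CLI text. Only the "FIN OK" marker comes from the steps above: the "AutoCom" job-type label, the "ACT"/"PEND" values, and the column layout of the sample lines are assumptions, and the samples are made-up placeholders, not output captured from a device.

```python
def auto_commit_finished(show_jobs_output, job_type="AutoCom"):
    """Return True if any line for job_type carries both FIN and OK tokens.

    job_type is an assumed label for the auto commit job; adjust it to
    whatever the device actually prints.
    """
    for line in show_jobs_output.splitlines():
        tokens = line.split()
        if job_type in tokens and "FIN" in tokens and "OK" in tokens:
            return True
    return False


# Hypothetical sample lines (placeholders, not real device output).
SAMPLE_DONE = "<enqueued>  1  AutoCom  FIN  OK  <completed>"
SAMPLE_RUNNING = "<enqueued>  1  AutoCom  ACT  PEND  <completed>"

if __name__ == "__main__":
    print(auto_commit_finished(SAMPLE_DONE))
    print(auto_commit_finished(SAMPLE_RUNNING))
```

Matching on whole tokens rather than substrings keeps a line that merely contains "OK" inside another word from counting as success.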

Thanks for your comment

BES

msullivan · 01-18-2013

I get why you'd be concerned, and the document could be improved by describing what is expected to happen at step 6. As you stated, it looks like you have a broken cluster, so FUD creeps in and you are reluctant to pull the trigger and finish the upgrade.

In reality, all is well: when you suspend the active device at step 6, the newly upgraded device will take over. It just isn't documented or explained anywhere that I've seen. You might drop a couple of pings, but that's it.

I'd like to hear from PAN support to explain what's going on at step 6. If I had to guess, it would have something to do with the HA interfaces being smart and taking action on the cluster, even though it looks broken at the time.

Open a support case and talk to them about it, then give it a shot!

Cheers,

Mike
