How to upgrade a High Availability (HA) pair

by djipp on 10-29-2012 05:27 PM - edited on 11-28-2016 04:22 AM (92,913 Views)

Overview

 

The following procedure for upgrading an HA pair is recommended because:

  • It verifies HA functionality before starting the upgrade.
  • It ensures the upgrade is successfully applied to the first device before starting the upgrade on the second.
  • If an issue arises at any point in the procedure, the upgrade can be reverted without expected downtime (unless you run dynamic routing protocols such as OSPF/BGP).
  • When finished, the final active/passive device state will be the same as it was before the upgrade with the fewest number of failovers possible (2).

Preparation:

 

  1. Back up the configuration and generate a Tech Support file on both HA peers. Give each file a descriptive name.

    • Device > Setup > Operations > Save Named Configuration Snapshot

    • Device > Setup > Operations > Export Named Configuration Snapshot

    • Device > Setup > Operations > Export Device State (if the device is managed by Panorama)

    • Device > Support > Generate Tech Support File, and then download it. (It may be needed if issues arise.)

  2. (Optional but recommended) Disable preemption in the High Availability settings to avoid unwanted failovers. The change that disables preemption must be committed on both peers. Likewise, once the upgrade is complete, re-enabling preemption must be committed on both peers.


    To disable preempt, go to Device > High Availability > Election Settings and uncheck Preemptive.  Then, perform a commit.



  3. If the upgrade is between major versions (4.1 -> 5.0 or 5.0 -> 6.0), it is advisable to disable TCP-Reject-Non-SYN so that sessions can fail over even when they are not in sync. Re-enable it once both devices have been upgraded.

    # set deviceconfig setting session tcp-reject-non-syn no
    # commit
    
  4. (Optional but recommended) Arrange for out-of-band (console) access to the firewall if possible. This helps you recover from any unexpected situation in which you cannot log in to the firewall.

 

Steps: 

 

  1. First suspend the active unit from the CLI. Run the command:
    > request high-availability state suspend
    or
    From the GUI, go to Device > High Availability > Operations > Suspend local device.
    Note: This will cause an HA failover.  It is recommended to do this first to verify the HA functionality is working before initiating the upgrade.
  2. Verify network stability on the new active device with the previously active device suspended.
  3. Install the new PAN-OS on the suspended device, then reboot the device to complete the install. For details, see How to Upgrade PAN-OS and Panorama.
  4. After the upgraded device reboots, the CLI prompt should show passive (or non-functional if the peers are on different major releases, e.g. 5.0 to 7.0), and the PAN-OS version should reflect the new version.
  5. Before proceeding to the next step, verify on the current passive device that the auto-commit completed successfully (FIN OK) by running:
    > show jobs all 
    Note: If the current passive device is in a non-functional state, run the following command to make it functional again: 
    > request high-availability state functional
  6. Suspend the second device (current active device).  When the second device is suspended, the first device, already upgraded, takes over as active.
    Note: If you run dynamic routing protocols like OSPF/BGP, suspending the current active device may cause a short traffic outage, depending on your network. This is because the adjacency with existing neighbors goes down and comes back up with the new active device. To eliminate this downtime, enable Graceful Restart on the firewall and the neighboring devices. Note that Graceful Restart is only supported on PAN-OS 6.0 or later.
  7. Upgrade the second device, then reboot it.
  8. When the second unit reboots, it will come up as the passive unit. To complete the upgrade, verify that the auto-commit completes on this device (as in step 5) by running:
    > show jobs all
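
The auto-commit check in steps 5 and 8 can be scripted once the output of "show jobs all" has been captured as text. The following is a minimal sketch, not an official tool: the sample output layout, the "AutoCom" job type label, and the function name autocommit_ok are assumptions made for illustration, so compare them against what your own device prints before relying on it.

```python
# Illustrative sample only: the exact column layout of "show jobs all"
# is an assumption and may differ between PAN-OS versions.
SAMPLE_OUTPUT = """\
Enqueued              ID  Type     Status  Result  Completed
-------------------------------------------------------------
2016/11/28 04:22:10   1   AutoCom  FIN     OK      04:25:31
"""


def autocommit_ok(show_jobs_output: str) -> bool:
    """Return True if the first auto-commit job line reports FIN OK.

    Assumes the job type appears as the token "AutoCom", immediately
    followed by the status and result columns.
    """
    for line in show_jobs_output.splitlines():
        fields = line.split()
        if "AutoCom" in fields:
            idx = fields.index("AutoCom")
            # Status and result are the two tokens after the job type.
            return fields[idx + 1: idx + 3] == ["FIN", "OK"]
    # No auto-commit job listed yet.
    return False


print(autocommit_ok(SAMPLE_OUTPUT))  # True
```

A job that is still running (for example, status ACT with result PEND) makes the function return False, which is the signal to wait and check again before moving on.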

Note: To upgrade an Active-Active HA pair, follow the same steps as for an Active-Passive pair. The steps and terms used for the Active and Passive devices correspond to Active-Primary and Active-Secondary, respectively. Be aware that the whole upgrade process might take upwards of 30 minutes.
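
To see why the procedure ends in the original active/passive arrangement after exactly two failovers, the sequence can be traced on a toy model. This is only a sketch under simplifying assumptions: the Peer class, the upgrade_pair function, the device names, and the version strings are hypothetical, and the model ignores the non-functional state, auto-commit, and session sync.

```python
from dataclasses import dataclass


@dataclass
class Peer:
    name: str
    version: str
    state: str  # "active", "passive", or "suspended"


def upgrade_pair(a: Peer, b: Peer, new_version: str) -> int:
    """Trace the upgrade steps on a toy model and return the failover count."""
    failovers = 0
    # The currently active peer is suspended and upgraded first.
    first, second = (a, b) if a.state == "active" else (b, a)
    for target, other in ((first, second), (second, first)):
        # Steps 1 and 6: suspend the active peer; its partner takes over.
        assert target.state == "active"
        target.state = "suspended"
        other.state = "active"
        failovers += 1
        # Steps 3-4 and 7-8: install, reboot, and rejoin as passive.
        target.version = new_version
        target.state = "passive"
    return failovers


# Hypothetical devices and versions, for illustration only.
fw1 = Peer("fw1", "6.0.10", "active")
fw2 = Peer("fw2", "6.0.10", "passive")
count = upgrade_pair(fw1, fw2, "6.1.10")
print(count, fw1.state, fw2.state)  # 2 active passive
```

The second suspend is what hands the active role back to the device that held it originally, which is why no extra failover is needed at the end.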

 

How to Downgrade

If an issue occurs on the new version and a downgrade is necessary:

  • To revert to the previous PAN-OS version, run the following CLI command:
    > debug swm revert 

This causes the firewall to boot from the partition in use prior to the upgrade. Nothing will be uninstalled and no configuration change will be made.

 

However, be aware of the following when running this command:
After rebooting from an SWM revert, the configuration that was active before the upgrade is loaded along with the previous partition. Configuration changes made after the upgrade are not carried over and must be recovered manually by loading the latest configuration version and committing it.

 

See Also

How to Check the Status of an Auto-Commit

 

 

owner: djipp

Comments
by etnerual
on ‎03-20-2013 01:33 PM

Just wanted to share this info.  I upgraded from 3.x to 5.x, and after performing step #6 above, both devices ended up in suspended mode - neither was active.  Apparently, the unit that was upgraded to 5.x remained in suspended mode because the other HA unit's version was "too old."  I tried request high-availability state functional on the 5.x unit, but that didn't work.  I was forced to disable HA on the 5.x unit for it to be functional again.

Lesson: Don't expect HA functionality to work after upgrading couple major releases.

by djipp
on ‎03-23-2013 04:37 AM

The correct upgrade path from 3.1 to 5.0 is 3.1.x -> 4.0.x -> 4.1.x -> 5.0.x.  Each step must be completed on both devices in the cluster before proceeding.  Upgrades directly from PANOS 3.x to 5.x should not be attempted in HA.
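
The stepwise path can be computed rather than memorized. The sketch below is illustrative only: the FEATURE_RELEASES list is assembled from versions mentioned on this page and is not an authoritative release list, and the function name upgrade_path is hypothetical.

```python
from typing import List

# Assumption: feature releases in order, collected from versions named
# on this page. Check the release notes for the authoritative sequence.
FEATURE_RELEASES = ["3.1", "4.0", "4.1", "5.0", "6.0", "6.1", "7.0", "7.1"]


def upgrade_path(current: str, target: str) -> List[str]:
    """List every feature release to pass through, without skipping any."""
    start = FEATURE_RELEASES.index(current)
    end = FEATURE_RELEASES.index(target)
    if end < start:
        raise ValueError("target is older than the current release")
    return FEATURE_RELEASES[start:end + 1]


print(" -> ".join(upgrade_path("3.1", "5.0")))  # 3.1 -> 4.0 -> 4.1 -> 5.0
```

In an HA pair, each hop in the returned list would be completed on both devices before starting the next one.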

by etnerual
on ‎03-26-2013 12:36 AM

If you want to upgrade and retain all of your configs then PANOS will not allow you to skip major releases.

by MCmgt
on ‎03-20-2014 10:56 AM

Thanks, worked great today. Some enhancement ideas:

  • Include instructions for how to disable preemption.
  • Include GUI instructions for all steps.
  • Integrate doc-1115 to provide more detail in the "verify auto commit completes" step. It took me a while to figure this out. A note as to the time this step can complete (30minutes) would be good.
  • In the "upgrade the second device" step I noticed that the failover took place during the upgrade...before the reboot
by Evan.Sink
on ‎05-01-2015 09:32 AM

One thing that I ran into last night upgrading from 5.0.11 to 6.0.10 was that I could NOT seamlessly have the upgraded device take over. HA was incompatible, and I had to suspend the 5.0.10 active device in order for the 6.0.11 device to take over. There was a log entry that stated that the HA versions were mismatched and not compatible so I had to suspend the active device first. Lost 15 pings. It seems like Seamless upgrade of HA pair is only possible on minor release updates?

This guide is very accurate in the steps you take to upgrade an HA pair, but I wish it would have told me that major versions can not be seamlessly upgraded. I even called into PAN support and that was the only option he said we had.

by EdwinD
on ‎05-07-2015 05:40 PM

Evan.Sink,

Thank you very much for sharing this information.   I just ran into the same issue.

To clarify, once the PanOS 6.0.10 device was up and in a non functional state (Step 5) I did a  show jobs all as the step states and confirmed autocomplete was FIN OK.

The next part of step #5 didn't work. request high-availability state functional actually returned the message "Successfully changed HA state to functional", but the passive unit stayed in the state non-functional.

After reading your post, I performed on the current active member this command: request high-availability state suspend.   This caused the active PanOS 5.0.16 device to become suspended and the PanOS 6.0.10 device to become active.

Once I did this, I had almost 1000 certificates appear as OCSP revoked.   I couldn't get to https://google.com as the cert was showing revoked.   This required me to clear the OCSP cache on the Palo Alto Firewall as per View/Delete CRL and OCSP cache

by Evan.Sink
on ‎05-08-2015 05:44 AM

Glad I could help. I did the same thing suspending the device with the old OS, but I had to manually put the 6.0.10 device back into active mode. This caused me to lose 5-10 packets. this would be horrible if it was on the edge and you were doing it remote!

by clockhart
on ‎05-19-2015 06:59 AM

I just participated in an upgrade of two 7050 chassis running in active/active mode. Following the directions in this KB article.

There were a few modifications, but for the most part, the directions were spot on for our configuration. The notable omission is in waiting for the LPC disks to fully populate and mirror the RAID disks ... but if you wait between failovers 5-8 minutes, it isn't an issue. Here are the steps we used to complete the upgrade:

  1. Upgrade Panorama to 6.0.10 prior to upgrading gateways to 6.0.10.
  2. Establish web UI session for both active and passive Palo units.
  3. Establish ssh session for both active and passive Palo units.
  4. Disable preemption on active-primary device.
  5. Committed config.
  6. Disabled preemption on active-secondary device.
  7. Committed config.
  8. Verify network connectivity for critical applications to ensure commit had no effect on traffic processing (it should not).
    1. From CLI: “show session all” to ensure sessions still being processed by Unit1.
  9. From CLI Unit 1: "request high-availability state suspend” command.
  10. Verify network connectivity for critical applications.
  11. From CLI Unit1: “show session all” to ensure sessions being processed by Unit2 (now active-primary).
  12. Run software upgrade install process from WebUI on Unit1 (now active-secondary).
  13. Software upgrade will require a reboot.
  14. Reboot Unit1.
  15. After reboot completes, re-establish ssh connection to Unit1.
  16. Request CLI command to make HA functional again: “request high-availability state functional”.
  17. Unit1 becomes active-secondary device.
  18. On Unit1, wait for LPC raid disks to populate and become Ready state before attempting to make Unit1 a primary. This is outlined in The PA-7050 Hardware Guide.
    1. Basically running “show system raid detail”
    2. Wait for all disks to be present and available.
  19. Again, validate that your network traffic is still traversing Unit2. You will next place Unit2 into suspend mode.
  20. From Unit2 CLI: "request high-availability state suspend” command.
  21. Verify network connectivity for critical applications. All traffic should be traversing Unit1. Unit2 is now active-secondary. Unit1 is active-primary.
  22. Run software upgrade install process from WebUI on Unit2 (now active-secondary).
  23. Software upgrade will require a reboot.
  24. Reboot Unit2.
  25. After reboot completes, re-establish ssh connection to Unit2.
  26. Request CLI command to make HA functional again: “request high-availability state functional”.
  27. Unit2 becomes active-secondary device.
  28. Modify preemption settings on Unit1 (lower priority=100) using WebUI. Commit change.
  29. Modify preemption settings on Unit2 (higher priority=200) using WebUI. Commit change.

Note: on the 7050 chassis, if you are moving to 6.0.10, you must also enable the HA1 Backup Link in the High Availability settings.

You should be all set.

by sworton
on ‎05-26-2015 02:39 PM

MCmgt wrote:

In the "upgrade the second device" step I noticed that the failover took place during the upgrade...before the reboot

I noticed this too, when I suspended the 2nd device the failover happened straight away, I wasn't expecting it until the reboot. I'm going from 6.0.3-h to 6.1.4 via 6.1.0

by spiromruen
on ‎06-01-2015 07:20 PM

As per the PAN-OS release notes, when upgrading HA across major versions (i.e. 4.1 -> 5.0, 5.0 -> 6.0, 6.0 -> 6.1), the session table does not synchronize.

To keep TCP sessions from being dropped, temporarily configure both PAs to disable TCP 3-way-handshake inspection until the upgrade is completed on both PAs:

configure

set deviceconfig setting session tcp-reject-non-syn  no

commit

exit

If the commit was successful, double-check that tcp-reject-non-syn is definitely disabled (value is FALSE):

show session info

Once the upgrade is completed, roll back to the default configuration:

configure

set deviceconfig setting session tcp-reject-non-syn  yes

commit

exit

show session info

by MMCiobanu
on ‎07-24-2015 10:03 AM

These instructions say to upgrade the passive device first, then the active one:

Upgrade an HA Firewall Pair to PAN-OS 6.0

Does it matter?

by clockhart
‎07-24-2015 01:04 PM - edited ‎01-26-2016 10:25 AM

The benefit of upgrading the passive device first is that you do not have to incur a failover before running the upgrade on the passive device. You can validate the upgrade before you fail to the active, newly upgraded unit. On the other hand, the benefit of failing the active unit over to the passive is that you are testing the viability of your current HA configuration; a successful failover to the passive unit indicates the configuration is correct. But essentially, you would then upgrade the unit that is passive, which you will be doing anyway because the device reboot will force the member to become passive. I would recommend upgrading the primary device after forcing it to standby. Kills two birds with one stone by testing current HA. If you didn't want to test the current HA configuration, you would simply upgrade the passive device first without failing over in the initial step. I hope this clarifies things a little bit.

by clockhart
on ‎07-24-2015 01:05 PM

Signed, Captain Obvious. :smileywink:

by M.Alzahaby
on ‎08-07-2015 05:59 PM

If you suspend the currently active firewall, the other peer will take over.


by Peet_SBI
on ‎01-28-2016 02:34 AM

So when upgrading, do I have to start with the x.0 version before I can install the latest in that train or can I go directly?

 

We're currently running 6.0.10 so I'd like to know if I can go to 6.1.8 directly or do I have to install 6.1.0 first.

by TomDuong
‎02-26-2016 05:03 AM - edited ‎02-26-2016 05:04 AM

What are the community's thoughts on upgrading just one node and letting it run for a week or two before upgrading the second node?

by MCmgt
on ‎03-11-2016 05:42 AM

Again, having just upgraded to 6.1.10, I am thankful for this handy reference document.

 

This document has been heavily edited. I think there are a couple of potentially misleading artifacts of old versions that should be removed.

 

Step 8 can be deleted as Step 7 no longer results in a failover.

 

This statement in Step 9 is misleading:

 

"The original active device before the upgrade will be the active device now."

 

The original active device before the upgrade becomes the active device again at Step 6.

 

Suggested improvement: include the optional preempt disabling/re-enabling as formal (optional) steps. The current situation facilitates error. In the half hour it takes me to do the upgrade I may forget the preempt issue. Let's explicitly include it as the last step to remove that chance. 

 

 

by
on ‎03-11-2016 12:15 PM

@Peet_SBI, When upgrading, you have to Download the .0 release, then the version that you would like to go to. You DO NOT need to install the .0 version, just have it downloaded.

 

@TomDuong, If you are upgrading within versions.. ex. 6.1.3 to 6.1.4, then this might be OK. But if this is for any major release, ex. 6.1.3 to 7.0.2, then it would not work properly. 

Either way, we do not recommend you stay out of sync version wise, as this could cause unknown issues.

by lsaintig
on ‎08-03-2016 05:43 AM

Excellent, easy to follow instructions. Worked for us like a charm! We run 5060's in active-passive.

by Abdool_Yacub
on ‎08-21-2016 12:30 PM

What happens if you don't "Disable preemption on High Availability settings"? Will both devices go into standby when you update them one at a time?

by fozail
on ‎12-04-2016 01:12 AM

Hi Abdool Yacub,

 

Preemption is a feature that ensures the primary appliance always processes traffic actively as long as all is well with that appliance. Hence, to avoid unwanted multiple failovers while performing a PAN-OS upgrade, it is highly recommended to disable preemption on both appliances.

 

Best Regards,

Fozail

by JeroenL
‎02-07-2017 09:49 AM - edited ‎02-07-2017 10:23 AM

Hi all,

 

I'm planning to upgrade a PA-3050 cluster from 6.1.10 to 7.1.7.

Can I suspend the active member and upgrade it first to 7.0.1 and then 7.1.7 and then upgrade the second one to 7.0.1 and 7.1.7?

Or is it advised to upgraded both members to 7.0.1, bring the cluster back up perform the same steps to upgrade to 7.1.7?

 

Thanks for the feedback.

 

Best regards,

Jeroen

by FJU-ITCS
on ‎02-08-2017 04:53 AM

Hi Jeroen,

 

If you first update the first node from 6.1.10 -> 7.1.7, the node will NOT become functional and HA will not work. The best way is to update one node to 7.0.x, make it active again, and test that everything is working. Then update the second one and rebuild the HA. After this you can update to 7.1.x.

 

 

by TimmyLamar
on ‎03-19-2017 08:46 AM

I have an upgrade planned in 5 days and am running a simulation now.

The pair is a 2050 and the upgrade is from 4.1.11 to 7.1.6.

I tested in my lab with a 5050 and hit an issue on the upgraded pair.

I upgraded the passive device to 6.0.0 (the active is still on 4.1.11).

I disabled non-SYN rejection on both with "set deviceconfig setting session tcp-reject-non-syn no".

After the reboot, the passive device's HA status became suspended and I was unable to change the HA state to functional.

I tried suspending the active device, but the passive stayed in suspended mode.

The only way I could force traffic to the passive was to disable HA fully from the GUI (untick Enable HA) and suspend the active firewall.

Is this expected?

Can anyone help?

by
on ‎03-20-2017 05:42 AM

Hi @TimmyLamar

 

Did you use the below command to unsuspend the passive device: 

> request high-availability state functional

If this didn't change the passive device's state to non-functional, there might be a bug: make sure you install the latest maintenance release, even for intermediate steps (see Best Practices for PAN-OS Upgrade). Note that the device will never become "passive" while there is a PAN-OS mismatch. Non-functional indicates that it is participating in the cluster but currently in a failed state. It will, however, become active if the primary unit goes offline.

 

In an HA environment with mismatched PAN-OS versions, the lower version will always assume the active role until it is rebooted or manually suspended. The higher version will go into a "non-functional" state, which is a participating role limited to 'last resort' if the primary goes offline.

 

by santonic
a month ago

Weren't these guides available as a PDF download as well? Is that option gone? Or i just can't find it before i get my morning coffee?

by
a month ago

@santonic they used to be (some older ones are still out there) but since those are a real mess to keep updated, we stopped doing that
