How to upgrade a High Availability (HA) pair

by djipp on 10-29-2012 05:27 PM - edited 3 weeks ago

This article is outdated. For updated information, please refer to:
Best Practices for PAN-OS Upgrade

Comments
by etnerual
on 03-20-2013 01:33 PM

Just wanted to share this info. I upgraded from 3.x to 5.x, and after performing step #6 from above, both devices ended up in suspended mode; neither was active. Apparently, the unit that was upgraded to 5.x remained in suspended mode because the other HA unit's version was "too old." I tried "request high-availability state functional" on the 5.x unit, but that didn't work. I was forced to disable HA on the 5.x unit for it to become functional again.

Lesson: Don't expect HA to work after jumping a couple of major releases in one upgrade.

by djipp
on 03-23-2013 04:37 AM

The correct upgrade path from 3.1 to 5.0 is 3.1.x -> 4.0.x -> 4.1.x -> 5.0.x. Each step must be completed on both devices in the cluster before proceeding. Do not attempt to upgrade an HA pair directly from PAN-OS 3.x to 5.x.
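The hop-by-hop rule above can be sketched in Python. This is a minimal sketch under stated assumptions: `FEATURE_RELEASES`, `feature_release` and `upgrade_path` are hypothetical names, and the release list is hand-maintained for illustration only, not an official PAN-OS support matrix.

```python
# Hypothetical, hand-maintained list of PAN-OS feature releases, oldest first.
# This is an assumption for illustration only, not an official support matrix.
FEATURE_RELEASES = ["3.1", "4.0", "4.1", "5.0", "6.0", "6.1", "7.0", "7.1"]


def feature_release(version: str) -> str:
    """Return the feature-release train of a version, e.g. '4.1.11' -> '4.1'."""
    major, minor = version.split(".")[:2]
    return f"{major}.{minor}"


def upgrade_path(current: str, target: str) -> list[str]:
    """List the feature releases to step through, one hop at a time.

    Each hop is meant to be completed on BOTH HA peers before the next one.
    """
    start = FEATURE_RELEASES.index(feature_release(current))
    end = FEATURE_RELEASES.index(feature_release(target))
    if end < start:
        raise ValueError("downgrades are out of scope for this sketch")
    # Skip the current train itself; include every train up to the target.
    return FEATURE_RELEASES[start + 1 : end + 1]


if __name__ == "__main__":
    print(upgrade_path("3.1.x", "5.0.x"))  # ['4.0', '4.1', '5.0']
```

Applied to the 4.1.11 -> 7.1.6 upgrade asked about later in this thread, it returns five hops (5.0, 6.0, 6.1, 7.0 and 7.1), each of which would need to be finished on both peers.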

by etnerual
on 03-26-2013 12:36 AM

If you want to upgrade and retain all of your configuration, PAN-OS will not allow you to skip major releases.

by MCmgt
on 03-20-2014 10:56 AM

Thanks, worked great today. Some enhancement ideas:

  • Include instructions for how to disable preemption.
  • Include GUI instructions for all steps.
  • Integrate doc-1115 to provide more detail in the "verify auto commit completes" step. It took me a while to figure this out. A note on how long this step can take (30 minutes) would be good.
  • In the "upgrade the second device" step, I noticed that the failover took place during the upgrade, before the reboot.

by Evan.Sink
on 05-01-2015 09:32 AM

One thing that I ran into last night upgrading from 5.0.11 to 6.0.10 was that I could NOT have the upgraded device take over seamlessly. HA was incompatible, and I had to suspend the 5.0.11 active device in order for the 6.0.10 device to take over. A log entry stated that the HA versions were mismatched and not compatible, so I had to suspend the active device first. Lost 15 pings. It seems like a seamless upgrade of an HA pair is only possible for minor release updates?

This guide is very accurate in the steps you take to upgrade an HA pair, but I wish it had told me that major versions cannot be upgraded seamlessly. I even called PAN support, and the engineer said that was the only option we had.

by EdwinD
on 05-07-2015 05:40 PM

Evan.Sink,

Thank you very much for sharing this information. I just ran into the same issue.

To clarify, once the PAN-OS 6.0.10 device was up and in a non-functional state (Step 5), I ran "show jobs all" as the step states and confirmed the auto-commit job showed FIN / OK.

The next part of step #5 didn't work. "request high-availability state functional" returned the message "Successfully changed HA state to functional", but the passive unit stayed in the non-functional state.

After reading your post, I ran this command on the current active member: "request high-availability state suspend". This caused the active PAN-OS 5.0.16 device to become suspended and the PAN-OS 6.0.10 device to become active.

Once I did this, almost 1000 certificates appeared as OCSP revoked. I couldn't get to https://google.com because its certificate was showing as revoked. This required me to clear the OCSP cache on the Palo Alto firewall, as per View/Delete CRL and OCSP cache.

by Evan.Sink
on 05-08-2015 05:44 AM

Glad I could help. I did the same thing, suspending the device with the old OS, but I had to manually put the 6.0.10 device back into active mode. This caused me to lose 5-10 packets. This would be horrible if it was on the edge and you were doing it remotely!

by clockhart
on 05-19-2015 06:59 AM

I just participated in an upgrade of two 7050 chassis running in active/active mode, following the directions in this KB article.

There were a few modifications, but for the most part, the directions were spot on for our configuration. The notable omission is waiting for the LPC disks to fully populate and mirror the RAID disks, but if you wait 5-8 minutes between failovers, it isn't an issue. Here are the steps we used to complete the upgrade:

  1. Upgrade Panorama to 6.0.10 prior to upgrading gateways to 6.0.10.
  2. Establish web UI session for both active and passive Palo units.
  3. Establish ssh session for both active and passive Palo units.
  4. Disable preemption on active-primary device.
  5. Commit config.
  6. Disable preemption on active-secondary device.
  7. Commit config.
  8. Verify network connectivity for critical applications to ensure commit had no effect on traffic processing (it should not).
    1. From CLI: “show session all” to ensure sessions are still being processed by Unit1.
  9. From CLI on Unit1: “request high-availability state suspend” command.
  10. Verify network connectivity for critical applications.
  11. From CLI Unit1: “show session all” to ensure sessions are being processed by Unit2 (now active-primary).
  12. Run software upgrade install process from WebUI on Unit1 (now active-secondary).
  13. Software upgrade will require a reboot.
  14. Reboot Unit1.
  15. After reboot completes, re-establish ssh connection to Unit1.
  16. Request CLI command to make HA functional again: “request high-availability state functional”.
  17. Unit1 becomes active-secondary device.
  18. On Unit1, wait for the LPC RAID disks to populate and reach the Ready state before attempting to make Unit1 a primary. This is outlined in the PA-7050 Hardware Guide.
    1. Run “show system raid detail”.
    2. Wait for all disks to be present and available.
  19. Again, validate that your network traffic is still traversing Unit2. You will next place Unit2 into suspend mode.
  20. From CLI on Unit2: “request high-availability state suspend” command.
  21. Verify network connectivity for critical applications. All traffic should be traversing Unit1. Unit2 is now active-secondary. Unit1 is active-primary.
  22. Run software upgrade install process from WebUI on Unit2 (now active-secondary).
  23. Software upgrade will require a reboot.
  24. Reboot Unit2.
  25. After reboot completes, re-establish ssh connection to Unit2.
  26. Request CLI command to make HA functional again: “request high-availability state functional”.
  27. Unit2 becomes active-secondary device.
  28. Modify preemption settings on Unit1 (device priority 100; in PAN-OS a lower value means a higher priority) using WebUI. Commit change.
  29. Modify preemption settings on Unit2 (device priority 200) using WebUI. Commit change.

Note: on the 7050 chassis, if you are moving to 6.0.10, you must also enable the HA1 Backup Link in the High Availability settings.

You should be all set.
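The suspend/upgrade/resume sequence in the comment above can be turned into a printable checklist. A minimal sketch, assuming two units named as in the comment: `ha_upgrade_runbook` is a hypothetical helper, the CLI strings are the ones quoted in this thread, and the WebUI steps are left as plain-language reminders rather than commands.

```python
def ha_upgrade_runbook(first: str, second: str, pa7050: bool = False) -> list[str]:
    """Build an ordered checklist for a suspend -> upgrade -> functional cycle.

    CLI strings are the ones quoted in this thread; WebUI steps are reminders only.
    """
    steps = [
        f"[{first}] WebUI: disable preemption, commit",
        f"[{second}] WebUI: disable preemption, commit",
    ]
    # Upgrade one unit completely before touching its peer.
    for unit, peer in ((first, second), (second, first)):
        steps += [
            f"[{unit}] show session all   # traffic should still be on {unit}",
            f"[{unit}] request high-availability state suspend",
            f"[{peer}] show session all   # traffic should now be on {peer}",
            f"[{unit}] WebUI: install the new PAN-OS image, then reboot",
            f"[{unit}] request high-availability state functional",
        ]
        if pa7050:
            # PA-7050 only: let the LPC RAID disks finish populating first.
            steps.append(f"[{unit}] show system raid detail   # wait for all disks ready")
    steps.append("[both] WebUI: restore preemption / device priority settings, commit")
    return steps


if __name__ == "__main__":
    for line in ha_upgrade_runbook("Unit1", "Unit2", pa7050=True):
        print(line)
```

Keeping the order in one generated list makes it harder to, for example, suspend Unit2 before Unit1 has been made functional again.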

by sworton
on 05-26-2015 02:39 PM

MCmgt wrote:

In the "upgrade the second device" step I noticed that the failover took place during the upgrade...before the reboot

I noticed this too: when I suspended the 2nd device, the failover happened straight away, which I wasn't expecting until the reboot. I'm going from 6.0.3-h to 6.1.4 via 6.1.0.

by spiromruen
on 06-01-2015 07:20 PM

According to the PAN-OS release notes, when upgrading an HA pair across a major version (e.g. 4.1 -> 5.0, 5.0 -> 6.0, 6.0 -> 6.1), the session table does not synchronize.

To keep TCP sessions from being dropped, temporarily configure both firewalls to disable the TCP 3-way-handshake check (rejection of non-SYN first packets) until the upgrade is complete on both:

configure
set deviceconfig setting session tcp-reject-non-syn no
commit
exit

If the commit succeeds, double-check that tcp-reject-non-syn is definitely disabled (the value should be FALSE):

show session info

Once the upgrade is complete, roll back to the default setting:

configure
set deviceconfig setting session tcp-reject-non-syn yes
commit
exit

show session info
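A sketch of the ordering constraint in the comment above: the non-SYN check is relaxed on both firewalls before either is upgraded, and restored only after both are done. `non_syn_toggle`, `wrap_upgrade` and the `<upgrade>` placeholder are hypothetical; only the CLI lines come from the comment.

```python
def non_syn_toggle(reject_non_syn: bool) -> list[str]:
    """CLI lines from the comment above, with only the yes/no value varying.

    reject_non_syn=False -> temporary setting during a cross-version HA upgrade
    reject_non_syn=True  -> restore the default afterwards
    """
    value = "yes" if reject_non_syn else "no"
    return [
        "configure",
        f"set deviceconfig setting session tcp-reject-non-syn {value}",
        "commit",
        "exit",
        "show session info",  # confirm the setting took effect
    ]


def wrap_upgrade(firewalls: list[str]) -> list[tuple[str, str]]:
    """Relax on every firewall first; restore only after ALL are upgraded."""
    plan = [(fw, line) for fw in firewalls for line in non_syn_toggle(False)]
    plan += [(fw, "<upgrade>") for fw in firewalls]  # hypothetical placeholder step
    plan += [(fw, line) for fw in firewalls for line in non_syn_toggle(True)]
    return plan


if __name__ == "__main__":
    for fw, line in wrap_upgrade(["fw1", "fw2"]):
        print(f"{fw}: {line}")
```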

by MMCiobanu
on 07-24-2015 10:03 AM

These instructions say to upgrade the passive device first, then the active one:

Upgrade an HA Firewall Pair to PAN-OS 6.0

Does it matter?

by clockhart
07-24-2015 01:04 PM - edited 01-26-2016 10:25 AM

The benefit of upgrading the passive device first is that you do not have to incur a failover before running the upgrade on the passive device, and you can validate the upgrade before failing over to the newly upgraded unit. On the other hand, the benefit of first failing the active unit over to the passive is that you test the viability of your current HA configuration; a successful failover to the passive unit indicates the configuration is correct. Either way, you then upgrade whichever unit is passive, which you will be doing anyway because the device reboot forces the member to become passive. I would recommend upgrading the primary device after forcing it to standby: it kills two birds with one stone by also testing the current HA configuration. If you don't want to test the current HA configuration, simply upgrade the passive device first without failing over in the initial step. I hope this clarifies things a little bit.

by clockhart
on 07-24-2015 01:05 PM

Signed, Captain Obvious. ;)

by M.Alzahaby
on 08-07-2015 05:59 PM

If you suspend the currently active firewall, the other peer will take over.

[Screenshot: HA Suspend]

by Peet_SBI
on 01-28-2016 02:34 AM

So when upgrading, do I have to start with the x.0 version before I can install the latest in that train, or can I go directly?

We're currently running 6.0.10, so I'd like to know if I can go to 6.1.8 directly or if I have to install 6.1.0 first.

by TomDuong
02-26-2016 05:03 AM - edited 02-26-2016 05:04 AM

What are the community's thoughts on upgrading just one node and letting it run for a week or two before upgrading the second node?

by MCmgt
on 03-11-2016 05:42 AM

Again, having just upgraded to 6.1.10, I am thankful for this handy reference document.

This document has been heavily edited. I think there are a couple of potentially misleading artifacts of old versions that should be removed.

Step 8 can be deleted, as Step 7 no longer results in a failover.

This statement in Step 9 is misleading:

"The original active device before the upgrade will be the active device now."

The original active device before the upgrade becomes the active device again at Step 6.

Suggested improvement: include the optional preempt disabling/re-enabling as formal (optional) steps. As written, it invites mistakes: in the half hour it takes me to do the upgrade, I may forget about preemption. Let's explicitly include re-enabling it as the last step to remove that chance.

by
on 03-11-2016 12:15 PM

@Peet_SBI, when upgrading, you have to download the .0 release first, then the version that you would like to go to. You DO NOT need to install the .0 version, just have it downloaded.

@TomDuong, if you are upgrading within a feature release (e.g. 6.1.3 to 6.1.4), this might be OK. But across a major release (e.g. 6.1.3 to 7.0.2), it would not work properly.

Either way, we do not recommend leaving the peers on different versions, as this could cause unpredictable issues.
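The "download the .0 base image, install only the target" rule from the @Peet_SBI reply above can be expressed as a small planning function. This is a sketch assuming plain `major.minor.patch` version strings and a single hop to the same or the next feature release; `ImagePlan`, `train` and `plan_single_hop` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ImagePlan:
    download: list[str]  # images to fetch onto the firewall, in order
    install: str         # the only image that actually gets installed


def train(version: str) -> str:
    """Feature-release train of a version, e.g. '6.1.8' -> '6.1'."""
    return ".".join(version.split(".")[:2])


def plan_single_hop(current: str, target: str) -> ImagePlan:
    """Images for one hop, following the 'download .0, install target' rule.

    Assumes the hop is to the same or the next feature release only.
    """
    base = f"{train(target)}.0"
    if train(target) == train(current) or target == base:
        # Same train (base image assumed already present) or the target IS the base.
        return ImagePlan(download=[target], install=target)
    # New train: fetch the .0 base image first, but install only the target.
    return ImagePlan(download=[base, target], install=target)


if __name__ == "__main__":
    print(plan_single_hop("6.0.10", "6.1.8"))
```

For Peet_SBI's case (6.0.10 to 6.1.8), it lists 6.1.0 and 6.1.8 for download and 6.1.8 as the only install.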

by lsaintig
on 08-03-2016 05:43 AM

Excellent, easy-to-follow instructions. Worked for us like a charm! We run 5060s in active-passive.

by Abdool_Yacub
on 08-21-2016 12:30 PM

What happens if you don't "Disable preemption on High Availability settings"? Will both devices go into standby when you update each one at a time?

by fozail
on 12-04-2016 01:12 AM

Hi Abdool Yacub,

Preemption is a feature that ensures the primary appliance always processes traffic as long as all is well with that appliance. Hence, to avoid unwanted multiple failovers while performing a PAN-OS upgrade, it is highly recommended to disable preemption on both appliances.

Best Regards,

Fozail
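To see why preemption matters, here is a deliberately simplified two-node model that counts how often traffic moves during an upgrade. Assumptions: the primary starts active, preemption takes effect immediately (no hold timer), and version-mismatch behaviour is ignored; `count_failovers` and `UPGRADE_EVENTS` are hypothetical.

```python
PEER = {"primary": "secondary", "secondary": "primary"}


def count_failovers(events: list[tuple[str, str]], preempt: bool) -> int:
    """Count traffic-moving failovers in a simplified two-node HA model.

    events: (device, action) pairs, where action is "suspend" or "functional".
    Assumptions: the primary starts active, preemption is immediate
    (no hold timer), and version mismatches are ignored.
    """
    active = "primary"
    failovers = 0
    for device, action in events:
        if action == "suspend" and device == active:
            active = PEER[device]  # the peer takes over
            failovers += 1
        elif action == "functional" and preempt and device == "primary" and active != "primary":
            active = "primary"  # preemption pulls traffic back to the primary
            failovers += 1
    return failovers


# Upgrade the secondary (passive) unit first, then the primary.
UPGRADE_EVENTS = [
    ("secondary", "suspend"),
    ("secondary", "functional"),
    ("primary", "suspend"),
    ("primary", "functional"),
]

if __name__ == "__main__":
    print(count_failovers(UPGRADE_EVENTS, preempt=True))   # 2
    print(count_failovers(UPGRADE_EVENTS, preempt=False))  # 1
```

In this model, the same upgrade moves traffic twice with preemption enabled (away from the primary and back again) and only once with it disabled.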

by JeroenL
02-07-2017 09:49 AM - edited 02-07-2017 10:23 AM

Hi all,

I'm planning to upgrade a PA-3050 cluster from 6.1.10 to 7.1.7.

Can I suspend the active member and upgrade it first to 7.0.1 and then 7.1.7, and then upgrade the second one to 7.0.1 and 7.1.7?

Or is it advisable to upgrade both members to 7.0.1, bring the cluster back up, and then perform the same steps to upgrade to 7.1.7?

Thanks for the feedback.

Best regards,

Jeroen

by FJU-ITCS
on 02-08-2017 04:53 AM

Hi Jeroen,

If you first upgrade the first node all the way from 6.1.10 -> 7.1.7, the node will NOT become functional and HA will not work. The best way is to upgrade one node to 7.0.x, make it active again, and test that everything is working. Then upgrade the second one and rebuild HA. After this, you can upgrade to 7.1.x.
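Jeroen's two options can be checked against the "finish each hop on both devices" rule with a small validator. A sketch only: `plan_is_safe` and the node names are hypothetical, and `FEATURE_RELEASES` is an assumed list of the trains involved in this example.

```python
# Assumed trains for the 6.1.10 -> 7.1.7 example in this thread, oldest first.
FEATURE_RELEASES = ["6.1", "7.0", "7.1"]


def plan_is_safe(plan: list[tuple[str, str]], start: str,
                 nodes: tuple[str, str] = ("node1", "node2")) -> bool:
    """Check a rolling upgrade plan for a two-node HA pair.

    plan: ordered (node, feature_release) upgrade steps.
    Rejects the plan if a node skips a feature release, or if one node ever
    gets more than one feature release ahead of its peer (a hop was not
    finished on both devices before the next hop started).
    """
    position = {node: FEATURE_RELEASES.index(start) for node in nodes}
    for node, release in plan:
        new = FEATURE_RELEASES.index(release)
        if new != position[node] + 1:
            return False  # skipped a train, or went backwards
        position[node] = new
        if max(position.values()) - min(position.values()) > 1:
            return False  # peers drifted more than one train apart
    return True


if __name__ == "__main__":
    hop_by_hop = [("node1", "7.0"), ("node2", "7.0"), ("node1", "7.1"), ("node2", "7.1")]
    one_node_first = [("node1", "7.0"), ("node1", "7.1"), ("node2", "7.0"), ("node2", "7.1")]
    print(plan_is_safe(hop_by_hop, "6.1"))      # True
    print(plan_is_safe(one_node_first, "6.1"))  # False
```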


by TimmyLamar
on 03-19-2017 08:46 AM

I have an upgrade planned in 5 days' time, and I am doing a simulation now.

The pair is 2050s, and the OS is going from 4.1.11 to 7.1.6.

I tested in my lab with 5050s, and I ran into an issue with the upgraded pair.

I upgraded the passive unit to 6.0.0 (the active unit is still on 4.1.11).

I disabled non-SYN rejection with "set deviceconfig setting session tcp-reject-non-syn no" on both.

After the reboot, the passive unit's HA status became suspended, and I was unable to change the HA state to "functional".

I tried suspending the active unit, but the passive unit stayed in suspended mode.

The only way I could force traffic to the passive unit was to disable HA fully from the GUI (untick "Enable HA") and suspend the active firewall.

Is this expected?

Can anyone help?

by
on 03-20-2017 05:42 AM

Hi @TimmyLamar

Did you use the command below to unsuspend the passive device?

> request high-availability state functional

If this didn't change the passive device's state to non-functional, there might be a bug. Make sure you install the latest maintenance release, even for intermediate steps (see Best Practices for PAN-OS Upgrade). Note that the device will never become "passive" while there is a PAN-OS mismatch: non-functional indicates it is participating in the cluster but currently in a failed state. It will, however, become active if the primary unit goes offline.

In an HA environment with mismatched PAN-OS versions, the device with the lowest version will always assume the active role until it is rebooted or manually suspended, and the device with the highest version will go into a "non-functional" state (a participating role limited to acting as a last resort if the primary goes offline).
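The role behaviour described above can be captured in a small function. This is a sketch, assuming that only a feature-release difference (for example 5.0 vs 6.0) counts as a mismatch, in line with the earlier comments that maintenance-release upgrades were seamless; `train` and `ha_roles` are hypothetical names, and "election" simply stands for the normal priority-based active/passive election.

```python
def train(version: str) -> tuple[int, int]:
    """Numeric feature release, e.g. '6.1.10' -> (6, 1).

    Assumes plain 'major.minor.patch' strings (no hotfix suffixes).
    """
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def ha_roles(version_a: str, version_b: str,
             lower_available: bool = True) -> tuple[str, str]:
    """Expected (role_a, role_b) for a two-node pair, per the behaviour above.

    Assumption: only a feature-release difference counts as a mismatch;
    maintenance-release differences fall back to the normal election.
    """
    ta, tb = train(version_a), train(version_b)
    if ta == tb:
        return ("election", "election")
    if lower_available:
        lower_role, higher_role = "active", "non-functional"
    else:
        # Lower-version device rebooted or manually suspended: the
        # non-functional peer acts as the last resort and becomes active.
        lower_role, higher_role = "suspended/offline", "active"
    return (lower_role, higher_role) if ta < tb else (higher_role, lower_role)


if __name__ == "__main__":
    print(ha_roles("5.0.16", "6.0.10"))                         # ('active', 'non-functional')
    print(ha_roles("5.0.16", "6.0.10", lower_available=False))  # ('suspended/offline', 'active')
```

Comparing versions as integer tuples rather than strings matters: as strings, "10.0" sorts before "9.1".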


by santonic
on 03-30-2017 11:19 PM

Weren't these guides available as a PDF download as well? Is that option gone, or can I just not find it before I get my morning coffee?

by
on 03-30-2017 11:54 PM

@santonic, they used to be (some older ones are still out there), but since those are a real mess to keep updated, we stopped doing that.
