PA-5250 RAID Integrity Check


L2 Linker

Hi everyone,

 

A good day to all! 

 

I encountered the following issue when upgrading our physical PA-5250 firewalls from 10.0.10-h1 to 10.1.0 and then from 10.1.0 to 10.1.5-h2.
After the upgrade, the Log Quota showed 0 MB and no logs were displayed (e.g. System Logs). When we ran 'show system raid status' on the CLI, we saw that the spare disk was in a to-be-repaired state. Only after the firewall displayed 'Done checking the integrity of the RAID log disks' and started rebuilding did the logs begin to appear and the Log Quota values get reflected.

Is this a known issue for the PA-5250 model? We upgraded a PA-3220 as well, but it did not show this behavior.
As far as I know, the PA-3220 will not display any RAID-related info because that model does not support RAID. I suspect this is an fsck run. The firewalls are managed by Panorama (PAN-PRA-25).
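
For anyone triaging the same symptom, the relevant CLI checks are along these lines ('show system raid detail' is the variant used later in this thread):

> show system logdb-quota
(shows the log quota allocations; these were reflected as 0 MB during the check)
> show system raid detail
(shows the per-disk RAID state, e.g. 'spare rebuilding' vs 'active sync')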

 

Can someone enlighten me as to what happened? Does traffic still pass through the firewall while it is performing such integrity checks?

 

Cheers,

Renz

 


15 REPLIES

L1 Bithead

I'm seeing the same thing. I'm upgrading from 10.0.10-h1 to 10.1.0 and from 10.1.0 to 10.1.8-h2. When the GUI loads, I see "RAID log disks check in progress, please wait..."

 

I ran the command you provided and I'm seeing the same output:

 


Logging Drives RAID status
--------------------------------------------------------------------------------
Disk Pair Log Unavailable
Status Admin disabled
Disk id Log1 Present
model : ST2000NX0253
size : 1907729 MB
status : active sync
Disk id Log2 Present
model : ST2000NX0253
size : 1907729 MB
status : spare rebuilding

 

I'm upgrading a physical PA-5220, so it's probably related to this hardware family. How long did it take for your firewall to come back up? I can log in to the GUI/CLI, but my interfaces are down.

 

Thanks

 

 

Cyber Elite

Hello @fabianmartinez

 

the behavior you are seeing is expected. Please refer to the KBs below:

https://knowledgebase.paloaltonetworks.com/KCSArticleDetail?id=kA14u000000sZwNCAU

https://knowledgebase.paloaltonetworks.com/KCSArticleDetail?id=kA14u000000wkxPCAQ

 

Until the disk rebuild is completed, the auto-commit will not finish, and until the auto-commit is completed, the data plane interfaces will show as down.
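
If you want to confirm that the auto-commit is the blocker, you can watch the job queue from the CLI (standard PAN-OS commands; the exact job listing varies a bit by release):

> show jobs all

Look for the AutoCom job. Until its status reaches FIN with result OK, the data plane interfaces will stay down.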

 

Regarding the upgrade to 10.1.8-h2, you do not have to install 10.1.0. You only need to download the 10.1.0 base image, then download and install 10.1.8-h2:

https://docs.paloaltonetworks.com/pan-os/11-0/pan-os-upgrade/upgrade-pan-os/upgrade-the-firewall-pan...
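
In CLI terms the sequence looks roughly like this sketch (the same steps can be done from Device > Software in the GUI):

> request system software download version 10.1.0
> request system software download version 10.1.8-h2
> request system software install version 10.1.8-h2
> request restart system

The 10.1.0 base image only needs to be present on disk; it is never installed or booted.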

 

Kind Regards

Pavel

Help the community: Like helpful comments and mark solutions.

L1 Bithead

Thank you for replying, Pavel. Yes, it took about an hour for my firewall to finish the RAID repair. Once it completed, I got the following message on the CLI:


Broadcast message from root (Thu Mar 2 03:15:37 2023):

Done checking the integrity of the RAID log disks.

 

I'm upgrading my secondary now, but it should be all good. I only downloaded the base image of 10.1.0 and didn't install it.

 

Thank you,

Fabian

L1 Bithead

Hi Everyone,

We upgraded our PA-5250s in HA and experienced the spare rebuilding process.

1.) Upgraded FW2 to 10.0.11-h1
2.) Failed over traffic from FW2 to FW1
3.) Upgraded FW1 to 10.0.11-h1
4.) Upgraded FW1 to 10.1.9-h3
5.) Encountered "auto commit error" on FW1
6.) Checked the RAID detail on FW1; saw disk pair logs unavailable, status spare rebuilding
7.) Also checked the RAID detail on FW2; saw status spare rebuilding

Can someone please enlighten me? We also encountered the RAID process on FW2, which is on 10.0.11-h1, yet as per the KB we should only encounter this on version 10.1.4 or greater. Also, is this rebuild process a one-time thing, or is there a possibility of encountering it again when upgrading to later versions such as 10.2.x or higher?

Thank you and regards,
Paolo


I would like to know if there is any way to check in advance whether this RAID rebuild will happen when upgrading. Since the node will be unavailable for 1-3 hours, it raises a risk in case something happens with the network infrastructure while the other node is active.

I requested this same information from Palo Alto Support but was told there is no way to verify in advance whether this situation will occur. If anyone is aware of any options to either prevent or reduce this impact, I would love to hear the steps you took.

L1 Bithead

One more request. Palo Alto Support ran the command (tail follow yes mp-log raid.log) that provides the status of the RAID repair. Does anyone have documentation on how many passes there are on the drives, including what each pass does? I have been on Pass 1 for 60 minutes as I write this post.

 

user2@PaloFirewall> tail follow yes mp-log raid.log
Oct 26 05:43:22 INFO: md2 : active raid1 sda2[0] sdb2[1]
Oct 26 05:43:22 INFO: 40009792 blocks [2/2] [UU]
Oct 26 05:43:22 INFO: sd detected: /dev/sda /dev/sda1 /dev/sda2 /dev/sda3 /dev/sda4 /dev/sda5 /dev/sda6 /dev/sda7 /dev/sda8 /dev/sdb /dev/sdb1 /dev/sdb2 /dev/sdb3 /dev/sdb4 /dev/sdb5 /dev/sdb6 /dev/sdb7 /dev/sdb8 /dev/sdc /dev/sdc1 /dev/sdd /dev/sdd1
Oct 26 05:43:22 INFO: md detected: /dev/md1 /dev/md2 /dev/md3 /dev/md5 /dev/md6 /dev/md7 /dev/md8 /dev/md9
Oct 26 05:43:23 DEBUG: /dev/md9 has not been mounted
Oct 26 05:43:23 DEBUG: sanitize_md: md_array[9]: ['sdc', 'sdd']; md_array_unchecked[9]: ['sdc', 'sdd']
Oct 26 05:43:23 DEBUG: start fsck check for /dev/md9 ...
Oct 26 05:43:23 ERROR: e2fsck 1.45.6 (20-Mar-2020)
Oct 26 05:43:26 DEBUG: Log has been mounted 8 times without being checked, check forced.
Oct 26 05:43:26 DEBUG: Pass 1: Checking inodes, blocks, and sizes
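
For what it's worth, those passes come from the standard Linux e2fsck utility (the log above shows e2fsck 1.45.6), not from anything PAN-OS specific. A normal e2fsck run has five passes:

Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information

Pass 1 walks every inode on the volume, so on a ~2 TB log disk it is usually by far the longest pass.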

Hi Chris,

 

As I write this message, we are also impacted by this RAID check.

I found the URL below describing the pass checks:

https://knowledgebase.paloaltonetworks.com/KCSArticleDetail?id=kA14u000000kGs2CAE

We encountered this issue when upgrading one firewall in a PA-5260 cluster from 10.0.11-h2 to 10.1.10-h1; it took 1 hr 30 min for the RAID checks to complete, after which the firewall was up.

We then encountered it again when upgrading the other firewall in the PA-5260 cluster from 10.1.10-h2 to 10.2.5; as I write this, that RAID check has been running for more than 2 hours.

Any suggestions or solutions will be appreciated!

 

 

L1 Bithead

Here is my experience during this past week. Unfortunately, each time you encounter the issue you will need to wait until the rebuild reaches a certain state that allows the device to function; the rebuild will not be 100% complete at that point. Two of the HA pairs had the issue going from 10.0.xx to 10.1.xx, while the last HA pair encountered it during the jump from 10.1.xx to 10.2.5. Palo Alto Support confirmed this issue can occur at any point after 10.0.4, which gives me concern for any upgrade going forward.

 

If you keep running the commands 1) tail follow yes mp-log raid.log and 2) show system raid detail, you will be able to monitor the status. The interfaces will recover prior to the rebuild finishing; I believe it takes about 12 hours for the rebuild to fully complete. The rebuild needs to reach a specific state, but not completion, before the device becomes functional.
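
For reference, the field to watch in 'show system raid detail' is the per-disk status line shown in Fabian's output earlier in this thread:

status : spare rebuilding   <-- rebuild still in progress
status : active sync        <-- disk healthy and in sync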

 

PA-5260 HA Pair

9.1.16 >> 10.0.11-h1: ~12 minutes (per device) from clicking 'Reboot' until all interfaces are green/connected

10.0.11-h1 >> 10.1.10-h2: ~50 minutes (for Primary) from clicking 'Reboot' until all interfaces are green/connected. Secondary = 12 minutes

10.1.10-h2 >> 10.2.5: ~12 minutes (per device) from clicking 'Reboot' until all interfaces are green/connected

PA-5250 HA Pair

9.1.16 >> 10.0.11-h1: ~12 minutes (per device) from clicking 'Reboot' until all interfaces are green/connected

10.0.11-h1 >> 10.1.10-h2: ~2 hours (for Primary) from clicking 'Reboot' until all interfaces are green/connected. Secondary = 12 minutes

10.1.10-h2 >> 10.2.5: ~12 minutes (for Primary) from clicking 'Reboot' until all interfaces are green/connected. Secondary = 12 minutes

PA-5220 HA Pair

9.1.16 >> 10.0.11-h1: ~12 minutes (per device) from clicking 'Reboot' until all interfaces are green/connected

10.0.11-h1 >> 10.1.10-h2: ~12 minutes (per device) from clicking 'Reboot' until all interfaces are green/connected

10.1.10-h2 >> 10.2.5: ~100 minutes (for Primary) from clicking 'Reboot' until all interfaces are green/connected. Secondary = 20 minutes

L1 Bithead

Hi Chris,

 

Thanks for the prompt response.

 

We managed to upgrade the 5260s and followed the same upgrade path, but we were unlucky with RAID check hits, which delayed our overall upgrade duration.

The PA-5260 HA pair runs in Active/Active mode.

 

9.1.16 >> 10.0.11-h1: ~12 minutes (per device) from clicking 'Reboot' until all interfaces are green/connected

10.0.11-h1 >> 10.1.10-h2: ~90 minutes due to RAID checks (for Primary) from clicking 'Reboot' until all interfaces are green/connected. Secondary = 12 minutes

10.1.10-h2 >> 10.2.5: ~13 minutes (for Primary) from clicking 'Reboot' until all interfaces are green/connected. Secondary = 130 minutes due to RAID checks

 

Hi Team,

 

I ran into the same issue myself while upgrading a PA-5220 pair in HA, with one PA running 10.1.3.

I am waiting to see how long this RAID process will take to complete.

 

Seems it will be a long day today.

 

Also, with TAC's help you can go to the root shell and run a command to check how much RAID rebuild time is remaining.
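
For anyone curious what that looks like: the log drives are standard Linux software RAID (md), so from the root shell (TAC-assisted only) the usual check is the md status file, which includes a progress bar and a 'finish=' time estimate. A rough sketch of the output (details will vary):

cat /proc/mdstat
md9 : active raid1 sdd1[1] sdc1[0]
      [=======>.............]  recovery = 38.2% (...)  finish=112.4min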

 

Regards

MP

Help the community: Like helpful comments and mark solutions.

MP18,

 

Hopefully the RAID build did not take the full 180 minutes.

 

Chris

@Chris_O   It took us almost 5 hours.

MP

Help the community: Like helpful comments and mark solutions.

L1 Bithead

Is there any upgrade path that avoids this downtime? I don't have an HA pair, and potentially 5 hours of downtime is just crazy.
