PA-5250 RAID Integrity Check


L2 Linker

Hi everyone,

 

A good day to all! 

 

I encountered the following when upgrading our physical PA-5250 firewalls from 10.0.10-h1 to 10.1.0, and then from 10.1.0 to 10.1.5-h2: the Log Quota showed 0 MB and no logs were displayed (e.g., System Logs). When we ran 'show system raid status' on the CLI, we saw that the spare disk was in a to-be-repaired state. Only after it displayed 'Done checking the integrity of the RAID log disks' and started rebuilding did the logs reappear and the Log Quota values return.

Is this a known issue for the PA-5250 model? We upgraded a PA-3220 as well, but it showed no sign of the issue. As far as I know, a PA-3220 will not display any RAID-related info, as that model does not support RAID. I suspect this is an fsck. The firewalls are managed by Panorama (PAN-PRA-25).

 

Can someone enlighten me as to what happened? Does traffic pass through the firewall while it is performing these integrity checks?

 

Cheers,

Renz

 


15 Replies

L1 Bithead

I'm seeing the same thing. I'm upgrading from 10.0.10-h1 to 10.1.0 and from 10.1.0 to 10.1.8-h2. When the GUI loads I see "RAID log disks check in progress, please wait..."

 

I ran the command you provided and I'm seeing the same output:

 


Logging Drives RAID status
--------------------------------------------------------------------------------
Disk Pair Log                 Unavailable
  Status                      Admin disabled
Disk id Log1                  Present
  model  : ST2000NX0253
  size   : 1907729 MB
  status : active sync
Disk id Log2                  Present
  model  : ST2000NX0253
  size   : 1907729 MB
  status : spare rebuilding

 

I'm upgrading a physical PA-5220, so it's probably related to this hardware series rather than the PA-5250 specifically. How long did it take for your firewall to come back up? I can log in to the GUI/CLI, but my interfaces are down.

 

Thanks

 

 

Cyber Elite

Hello @fabianmartinez

 

The behavior you are seeing is expected. Please refer to the KBs below:

https://knowledgebase.paloaltonetworks.com/KCSArticleDetail?id=kA14u000000sZwNCAU

https://knowledgebase.paloaltonetworks.com/KCSArticleDetail?id=kA14u000000wkxPCAQ

 

Until the disk rebuild is completed, the auto-commit will not finish, and until the auto-commit is completed, the data plane interfaces will show as down.
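
If you want to watch that dependency directly, the auto-commit shows up as the AutoCom job in 'show jobs all'. Below is a minimal sketch of checking it remotely over the XML API; treat it as an illustration only: the host and key are placeholders, and the op-command XML is my assumption of how the CLI hierarchy maps (you can confirm the exact XML on your own unit with 'debug cli on').

# check_autocommit.py -- one-shot check of the auto-commit job via the PAN-OS XML API.
# Sketch only: placeholder host/key, and the op-command XML is assumed to mirror
# the CLI command 'show jobs all'.
import urllib.parse
import requests

FW_HOST = "198.51.100.1"      # hypothetical management IP
API_KEY = "REPLACE_WITH_KEY"  # generate one via the API's type=keygen request

cmd = urllib.parse.quote("<show><jobs><all></all></jobs></show>")
url = f"https://{FW_HOST}/api/?type=op&cmd={cmd}&key={urllib.parse.quote(API_KEY)}"

# management certs are often self-signed; adjust verify= for your environment
resp = requests.get(url, verify=False, timeout=30)
resp.raise_for_status()

# while the RAID check is running, expect the AutoCom job to remain pending
print(resp.text)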

 

Regarding the upgrade to 10.1.8-h2: you do not have to install 10.1.0. You only need to download that base image, then download and install 10.1.8-h2:

https://docs.paloaltonetworks.com/pan-os/11-0/pan-os-upgrade/upgrade-pan-os/upgrade-the-firewall-pan...

 

Kind Regards

Pavel

Help the community: Like helpful comments and mark solutions.

L1 Bithead

Thank you for replying, Pavel. Yes, it took about an hour for my firewall to finish the RAID repair. Once it completed, I got the following message on the CLI:


Broadcast message from root (Thu Mar 2 03:15:37 2023):

Done checking the integrity of the RAID log disks.

 

I'm upgrading my secondary now, but it should be all good. I only downloaded the base image of 10.1.0 and didn't install it.

 

Thank you,

Fabian

L1 Bithead

Hi Everyone,

We upgraded our PA-5250 HA pair and experienced the spare-rebuilding process.

1.) Upgraded FW2 to 10.0.11-h1
2.) Failed traffic over from FW2 to FW1
3.) Upgraded FW1 to 10.0.11-h1
4.) Upgraded FW1 to 10.1.9-h3
5.) Encountered an "auto commit error" on FW1
6.) Checked RAID detail on FW1; saw the disk pair logs unavailable, status spare rebuilding
7.) Also checked RAID detail on FW2; saw status spare rebuilding

Can someone please enlighten me? We also encountered the RAID process on FW2, which is on 10.0.11-h1, even though per the KB we should only encounter this on version 10.1.4 or greater. Also, is this rebuild process a one-time thing, or is there a possibility of encountering it again when upgrading to later versions such as 10.2.x or higher?

Thank you and regards,
Paolo

   

I would like to know if there is any way to check in advance whether this RAID rebuild will happen when upgrading. Since the node will be unavailable for 1-3 hours, it raises a risk in case something happens with the network infrastructure while the other node is active.

I requested this same information from Palo Alto Support, but was told there is no way to verify in advance whether this situation will occur. If anyone is aware of any options to either prevent or reduce this impact, I would love to hear the steps you took.

L1 Bithead

One more request. Palo Alto Support ran the command (tail follow yes mp-log raid.log), which provides the status of the RAID repair. Does anyone have documentation on how many passes there are on the drives, including what each pass does? I have been on Pass 1 for 60 minutes as I write this post.

 

user2@PaloFirewall> tail follow yes mp-log raid.log
Oct 26 05:43:22 INFO: md2 : active raid1 sda2[0] sdb2[1]
Oct 26 05:43:22 INFO: 40009792 blocks [2/2] [UU]
Oct 26 05:43:22 INFO: sd detected: /dev/sda /dev/sda1 /dev/sda2 /dev/sda3 /dev/sda4 /dev/sda5 /dev/sda6 /dev/sda7 /dev/sda8 /dev/sdb /dev/sdb1 /dev/sdb2 /dev/sdb3 /dev/sdb4 /dev/sdb5 /dev/sdb6 /dev/sdb7 /dev/sdb8 /dev/sdc /dev/sdc1 /dev/sdd /dev/sdd1
Oct 26 05:43:22 INFO: md detected: /dev/md1 /dev/md2 /dev/md3 /dev/md5 /dev/md6 /dev/md7 /dev/md8 /dev/md9
Oct 26 05:43:23 DEBUG: /dev/md9 has not been mounted
Oct 26 05:43:23 DEBUG: sanitize_md: md_array[9]: ['sdc', 'sdd']; md_array_unchecked[9]: ['sdc', 'sdd']
Oct 26 05:43:23 DEBUG: start fsck check for /dev/md9 ...
Oct 26 05:43:23 ERROR: e2fsck 1.45.6 (20-Mar-2020)
Oct 26 05:43:26 DEBUG: Log has been mounted 8 times without being checked, check forced.
Oct 26 05:43:26 DEBUG: Pass 1: Checking inodes, blocks, and sizes

Hi Chris,

As I write this message, we are also impacted by this RAID check.

I found the URL below covering the pass checks:

https://knowledgebase.paloaltonetworks.com/KCSArticleDetail?id=kA14u000000kGs2CAE

We encountered this issue when upgrading one firewall in a PA-5260 cluster from 10.0.11-h2 to 10.1.10-h1; it took 1 hr 30 mins for the RAID checks to complete, after which the firewall was up.

We then encountered it again when upgrading the other firewall in the cluster from 10.1.10-h2 to 10.2.5; that RAID check has been running for more than 2 hours as I write this.

Any suggestions or solutions would be appreciated!
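
For quick reference on the passes themselves: those raid.log messages come from e2fsck, the standard ext filesystem checker, which runs five passes (Pass 1: inodes, blocks, and sizes; Pass 2: directory structure; Pass 3: directory connectivity; Pass 4: reference counts; Pass 5: group summary information). If you save a copy of the log, a small sketch like the one below can report the most recent pass reached; the line format is taken from the excerpt earlier in the thread, and the filename is a placeholder.

# raid_log_pass.py -- report the most recent e2fsck pass seen in a saved raid.log.
# The "Pass N: ..." line format is based on the excerpt above; adjust if yours differs.
import re

PASS_RE = re.compile(r"Pass (\d+): (.+)")

def last_pass(path):
    current = None
    with open(path) as f:
        for line in f:
            m = PASS_RE.search(line)
            if m:
                current = (int(m.group(1)), m.group(2).strip())
    if current:
        print(f"Most recent pass: Pass {current[0]} ({current[1]}) of 5")
    else:
        print("No e2fsck pass markers found")

last_pass("raid.log")  # e.g. a copy captured from 'tail follow yes mp-log raid.log'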

 

 

L1 Bithead

Here is my experience from this past week. Unfortunately, each time you encounter the issue you will need to wait until the rebuild reaches a certain state that allows the device to function; the rebuild will not be 100% complete at that point. Two of the HA pairs had the issue going from 10.0.xx to 10.1.xx, while the last HA pair encountered it during the jump from 10.1.xx to 10.2.5. Palo Alto support confirmed this issue can occur at any point after 10.0.4, which gives me concern for any upgrade going forward.

 

If you continue to run the commands 1) tail follow yes mp-log raid.log and 2) show system raid detail, you will be able to monitor the status. The interfaces recover before the rebuild finishes; I believe it takes about 12 hours for the rebuild to complete fully. The rebuild needs to reach a specific state, though not completion, before the device becomes functional.
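
If you would rather not keep an SSH session open, a rough automation of that monitoring loop over the XML API might look like the sketch below. The same caveats as the job-check sketch earlier in the thread apply: the host and key are placeholders, and the op-command XML for 'show system raid detail' is my assumption of the CLI-to-XML mapping, so verify it on your own firewall first.

# poll_raid.py -- poll the RAID rebuild state via the PAN-OS XML API.
# Sketch only: placeholder host/key; op-command XML assumed from the CLI hierarchy.
import time
import urllib.parse
import requests

FW_HOST = "198.51.100.1"      # hypothetical management IP
API_KEY = "REPLACE_WITH_KEY"

CMD = urllib.parse.quote("<show><system><raid><detail></detail></raid></system></show>")

while True:
    url = f"https://{FW_HOST}/api/?type=op&cmd={CMD}&key={urllib.parse.quote(API_KEY)}"
    resp = requests.get(url, verify=False, timeout=30)  # self-signed mgmt certs are common
    resp.raise_for_status()
    print(resp.text)
    if "spare rebuilding" not in resp.text:
        print("RAID state changed -- check the firewall")
        break
    time.sleep(300)  # re-check every 5 minutes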

 

PA-5260 HA Pair

9.1.16 >> 10.0.11-h1: ~12 minutes (per device) from clicking 'Reboot' until all interfaces are green/connected

10.0.11-h1 >> 10.1.10-h2: ~50 minutes (for Primary) from clicking 'Reboot' until all interfaces are green/connected. Secondary = 12 minutes

10.1.10-h2 >> 10.2.5: ~12 minutes (per device) from clicking 'Reboot' until all interfaces are green/connected

PA-5250 HA Pair

9.1.16 >> 10.0.11-h1: ~12 minutes (per device) from clicking 'Reboot' until all interfaces are green/connected

10.0.11-h1 >> 10.1.10-h2: ~2 hours (for Primary) from clicking 'Reboot' until all interfaces are green/connected. Secondary = 12 minutes

10.1.10-h2 >> 10.2.5: ~12 minutes (for Primary) from clicking 'Reboot' until all interfaces are green/connected. Secondary = 12 minutes

PA-5220 HA Pair

9.1.16 >> 10.0.11-h1: ~12 minutes (per device) from clicking 'Reboot' until all interfaces are green/connected

10.0.11-h1 >> 10.1.10-h2: ~12 minutes (per device) from clicking 'Reboot' until all interfaces are green/connected

10.1.10-h2 >> 10.2.5: ~100 minutes (for Primary) from clicking 'Reboot' until all interfaces are green/connected. Secondary = 20 minutes

L1 Bithead

Hi Chris,

Thanks for the prompt response.

We managed to upgrade the 5260s and also followed the same upgrade path, but we were unlucky with the RAID check hits, which delayed our overall upgrade.

Our PA-5260 HA pair runs in Active/Active mode.

9.1.16 >> 10.0.11-h1: ~12 minutes (per device) from clicking 'Reboot' until all interfaces are green/connected

10.0.11-h1 >> 10.1.10-h2: ~90 minutes due to RAID checks (for Primary) from clicking 'Reboot' until all interfaces are green/connected. Secondary = 12 minutes

10.1.10-h2 >> 10.2.5: ~13 minutes (for Primary) from clicking 'Reboot' until all interfaces are green/connected. Secondary = 130 minutes due to RAID checks

 

Hi Team,

 

I ran into the same issue myself while upgrading a PA-5220 HA pair while one PA was running 10.1.3.

I am waiting to see how long this RAID process takes to complete.

 

Seems it will be a long day today.

 

Also, with TAC's help you can go to root and run a command to check how much RAID rebuild time remains.
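
For context on that root-level check: the raid.log excerpts earlier in the thread show standard Linux md devices underneath ('active raid1', md1 through md9), so the progress counter TAC reads is presumably the kernel's /proc/mdstat, which includes a 'finish=' estimate while a rebuild is running. That is an assumption on my part, not something TAC documented here. A minimal sketch of pulling the estimate out, given a root shell or a saved copy of the file:

# mdstat_eta.py -- extract rebuild progress and ETA from /proc/mdstat.
# Assumes Linux md RAID underneath, consistent with the raid.log excerpts above;
# run from a TAC-assisted root shell, or point it at a saved copy of the file.
import re

def rebuild_eta(path="/proc/mdstat"):
    with open(path) as f:
        content = f.read()
    # a rebuilding array typically shows a line like:
    #   [==>......]  recovery = 12.6% (240640/1907729) finish=93.3min speed=...
    m = re.search(r"recovery\s*=\s*([\d.]+)%.*?finish=([\d.]+)min", content)
    if m:
        print(f"rebuild {m.group(1)}% complete, roughly {m.group(2)} minutes remaining")
    else:
        print("no rebuild in progress (or a different mdstat format)")

rebuild_eta()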

 

Regards

MP

Help the community: Like helpful comments and mark solutions.

MP18,

 

Hopefully the RAID rebuild did not take the full 180 minutes.

 

Chris

@Chris_O  It took us almost 5 hours.

MP

Help the community: Like helpful comments and mark solutions.

L1 Bithead

Is there any upgrade path that avoids this downtime? I don't have an HA pair, and potentially 5 hours of downtime is just crazy.
