Procedure to Replace a Failed M-500 in Hybrid Panorama - Log Collector


This procedure applies to PAN-OS versions 8.0 and below.

 

Scenario

A standalone M-500 Panorama in Hybrid mode (Panorama device management and a local Log Collector configured) has suffered a hardware failure that requires chassis replacement.

The M-500 uses 8 disk pairs for storing the logs received from managed devices.


Naming convention

The faulty M-500 device to be replaced will be called "Old-M-500".
The newly received replacement device will be called "New-M-500".

You can use any names you like in your environment; these names are used only to make the operations in this procedure easier to follow.

 

Requirements

In order to replace the faulty Old-M-500 chassis, we need to have the configuration saved so that we can import it into the New-M-500.
The configuration can be exported by following the procedure in this Live article:


How to Back Up Panorama
or by following the administrator manual:
Export Panorama and Firewall Configurations
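
As a rough CLI sketch, the saved configuration can also be exported over SCP directly from the Old-M-500 while it is still reachable; the user, host, and path below are placeholders for your own environment:

admin@Old-M-500> scp export configuration from running-config.xml to admin@backup-host:/backups/Old-M-500.xml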

The Old-M-500 has 8 disk pairs that will be moved to the New-M-500.

 

Procedure details

1) Power down the failed M-500 platform - Old-M-500.

 

Shut Down Panorama
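
If CLI access to the Old-M-500 is still available, the appliance can also be shut down gracefully from the CLI before removing power, for example:

admin@Old-M-500> request shutdown system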


2) Configure the New-M-500 in Panorama mode and import the configuration exported from the faulty device.

  • Import the configuration exported from Old-M-500 into the New-M-500.
  • Load the imported named configuration on the New-M-500.
  • Change the hostname from Old-M-500 to New-M-500.

 

Commit the configuration to Panorama.
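
A minimal CLI sketch of the same step is shown below; the backup host, path, and file name are placeholders for your own environment:

admin@New-M-500> scp import configuration from admin@backup-host:/backups/Old-M-500.xml

admin@New-M-500> configure

admin@New-M-500# load config from Old-M-500.xml

admin@New-M-500# set deviceconfig system hostname New-M-500

admin@New-M-500# commit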

3) Take the Primary disks from Old-M-500 (A1, B1, C1, D1, E1, F1, G1, H1) and move them to the same Primary positions in New-M-500 (A1, B1, C1, D1, E1, F1, G1, H1).

 

Check the M-500 hardware documentation to correctly identify the disks.

M-500 Hardware Guide

 

The picture below shows the physical positioning of the drives inside the M-500 devices.

 

[Image: drive positions in the M-500 chassis]

 

On the New-M-500 we are going to add the Primary log disks to RAID using CLI commands.


We must use the "force" and "no-format" options.
The "force" option allows adding a disk pair that was previously associated with another Log Collector.
The "no-format" option preserves the logs by not formatting the disk storage.

In this step we are going to add the Primary log disks only.

 

The Secondary Log Disks will be added towards the end of the procedure.

This is because the Secondary Log Disks serve as the data backup, and we do not want to touch them until the log migration is confirmed.

 

In our example we have 8 active RAID pairs (A, B, C, D, E, F, G, H).


The full list of commands to attach the 8 primary disks is:

admin@New-M-500> request system raid add A1 force no-format

admin@New-M-500> request system raid add B1 force no-format

admin@New-M-500> request system raid add C1 force no-format

admin@New-M-500> request system raid add D1 force no-format

admin@New-M-500> request system raid add E1 force no-format

admin@New-M-500> request system raid add F1 force no-format

admin@New-M-500> request system raid add G1 force no-format

admin@New-M-500> request system raid add H1 force no-format


4) Check the result of the disk add operation by verifying the RAID status:

 

> show system raid detail


Example output with the 8 primary disks inserted, after the add operation ends:

admin@New-M-500> show system raid detail

Disk Pair A                           Available
   Status                       clean, degraded
   Disk id A1                           Present
       model        : ST91000640NS    
       size         : 953869 MB
       status       : active sync
   Disk id A2                           Missing
Disk Pair B                           Available
   Status                       clean, degraded
   Disk id B1                           Present
       model        : ST91000640NS    
       size         : 953869 MB
       status       : active sync
   Disk id B2                           Missing


....


Disk Pair G                           Available
   Status                       clean, degraded
   Disk id G1                           Present
       model        : ST91000640NS    
       size         : 953869 MB
       status       : active sync
   Disk id G2                           Missing
Disk Pair H                           Available
   Status                       clean, degraded
   Disk id H1                           Present
       model        : ST91000640NS    
       size         : 953869 MB
       status       : active sync
   Disk id H2                           Missing


To follow the progress of the add operation, you can check the management-plane debug log raid.log through the CLI:
   

 >  tail lines 120 mp-log raid.log

 

This command shows the last 120 lines, which contain all the log entries needed to check the disk operations.
 

Mar 20 00:01:37 DEBUG: raid_util: argv: ['GetArrayId', 'A1']

Mar 20 00:01:37 DEBUG: raid_util: argv: ['Add', 'A1', 'force', 'no-format', 'verify']

Mar 20 00:01:37 DEBUG: Verifying drive A1 to be added.

Mar 20 00:01:37 DEBUG: create_md 1, sdb

Mar 20 00:01:38 DEBUG: raid_util: argv: ['Add', 'A1', 'force', 'no-format']

Mar 20 00:01:38 INFO: Adding drive A1 (sdb)

Mar 20 00:01:38 DEBUG: create_md 1, sdb

Mar 20 00:01:38 DEBUG: create_md_paired_drive 1, sdb, no_format=True

Mar 20 00:01:38 DEBUG: Mounting Disk Pair A (/dev/md1)

Mar 20 00:01:38 DEBUG: set_drive_pairing_one 1

Mar 20 00:01:38 INFO: New Disk Pair A detected.

Mar 20 00:01:38 DEBUG: Created Disk Pair A (/dev/md1) from A1 (/dev/sdb1)

Mar 20 00:01:38 INFO: Done Adding drive A1



...


Mar 20 00:02:41 DEBUG: raid_util: argv: ['GetArrayId', 'H1']

Mar 20 00:02:41 DEBUG: raid_util: argv: ['Add', 'H1', 'force', 'no-format', 'verify']

Mar 20 00:02:41 DEBUG: Verifying drive H1 to be added.

Mar 20 00:02:41 DEBUG: create_md 8, sdp

Mar 20 00:02:41 DEBUG: raid_util: argv: ['Add', 'H1', 'force', 'no-format']

Mar 20 00:02:41 INFO: Adding drive H1 (sdp)

Mar 20 00:02:41 DEBUG: create_md 8, sdp

Mar 20 00:02:41 DEBUG: create_md_paired_drive 8, sdp, no_format=True

Mar 20 00:02:42 DEBUG: Mounting Disk Pair H (/dev/md8)

Mar 20 00:02:42 DEBUG: set_drive_pairing_one 8

Mar 20 00:02:42 INFO: New Disk Pair H detected.

Mar 20 00:02:42 DEBUG: Created Disk Pair H (/dev/md8) from H1 (/dev/sdp1)

Mar 20 00:02:42 INFO: Done Adding drive H1


5) The next step is to regenerate the log-disk metadata for each RAID disk slot.
Note: This command can take a long time to finish, depending on the amount of data stored on the disks, because it rebuilds all the log indexes.

 

> request metadata-regenerate slot 1
> request metadata-regenerate slot 2
> request metadata-regenerate slot 3
> request metadata-regenerate slot 4
> request metadata-regenerate slot 5
> request metadata-regenerate slot 6
> request metadata-regenerate slot 7
> request metadata-regenerate slot 8


Sample Output:

 

Bringing down vld: vld-0-0

Process 'vld-0-0' executing STOP

Removing old metadata from /opt/pancfg/mgmt/vld/vld-0

Process 'vld-0-0' executing START

Done generating metadata for LD:1

....

admin@New-M-500> request metadata-regenerate slot 8

Bringing down vld: vld-7-0

Process 'vld-7-0' executing STOP

Removing old metadata from /opt/pancfg/mgmt/vld/vld-7

Process 'vld-7-0' executing START

Done generating metadata for LD:8


You can check the status of the metadata regeneration by opening a new CLI window and following the debug log file vldmgr.log:

 

> tail lines 100 follow yes mp-log vldmgr.log


This command shows the last 100 lines and then follows the log file vldmgr.log:

Sample output:

2017-03-19 23:38:42.836 -0700 sysd send 'stop LD:1 became unavailable' to 'vld-0-0' vldmgr:vldmgr
2017-03-19 23:38:43.185 -0700 Error:  _process_fd_event(pan_vld_mgr.c:2113): connection failed on fd:13 for cs:vld-0-0
2017-03-19 23:38:43.185 -0700 Sending to MS new status for slot 0, vldid 1280: not online
2017-03-19 23:38:43.185 -0700 setting LD refcount in var:runtime.ld-refcount.LD1 to 0. create:false
2017-03-19 23:38:46.186 -0700 vldmgr vldmgr diskinfo cb from sysd

....

2017-03-20 00:20:56.792 -0700 setting LD refcount in var:runtime.ld-refcount.LD7 to 2. create:false
2017-03-20 00:20:56.792 -0700 Sending to MS new status for slot 6, vldid 1286: online
2017-03-20 00:20:56.905 -0700 connection failed for err 111 with vld-7-0. Will start retry 3 in 2000
2017-03-20 00:20:58.907 -0700 connection failed for err 111 with vld-7-0. Will start retry 4 in 2000
2017-03-20 00:21:00.908 -0700 Connection to vld-7-0 established
2017-03-20 00:21:00.908 -0700 connect(2) succeeded on fd:20 for cs:vld-7-0
2017-03-20 00:21:00.908 -0700 setting LD refcount in var:runtime.ld-refcount.LD8 to 2. create:false
2017-03-20 00:21:00.908 -0700 Sending to MS new status for slot 7, vldid 1287: online


6) On the New-M-500, add a new local Log Collector.
Click Add under Panorama > Managed Collectors to add a new Collector. On the General tab, enter the serial number of the New-M-500 device to which we are moving the disks.

(A visual example can be found below.)

We will add the disks to the New-M-500 Log Collector in a later step.

[Image: adding an empty Log Collector under Panorama > Managed Collectors]
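
For reference, the equivalent managed-collector entry can also be created from the Panorama CLI. This is only a sketch: the serial number is the one from this example's output, and the exact configuration hierarchy should be verified with the CLI's built-in help (?) on your PAN-OS version. The related commits are performed later, in steps 11 and 12.

admin@New-M-500> configure

admin@New-M-500# set log-collector 007307000539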

 

7) Check the status of the new Log Collector. Check for the following in the output of the command:
a. The Connected status should display "yes".
b. The disk capacity should display the correct size.
c. The disk pairs will display as "Disabled"; this is expected behavior at this stage of the RMA process.

 

>  show log-collector serial-number <serial-number-of-New-M-500>


Sample output:

 

> show log-collector serial-number 007307000539 

Serial           CID      Hostname           Connected    Config Status    SW Version         IPv4 - IPv6                                                     

---------------------------------------------------------------------------------------------------------

007307000539     0        M-500_LAB          yes          Out of Sync      7.1.7              10.193.81.241 - unknown

Redistribution status:       none

Last commit-all: commit succeeded, >>>>>>>>current ring version 0<<<<<<<<

md5sum  updated at ?

Raid disks

DiskPair A: Disabled,  Status: Present/Available,  Capacity: 870 GB

DiskPair B: Disabled,  Status: Present/Available,  Capacity: 870 GB

DiskPair C: Disabled,  Status: Present/Available,  Capacity: 870 GB

DiskPair D: Disabled,  Status: Present/Available,  Capacity: 870 GB

DiskPair E: Disabled,  Status: Present/Available,  Capacity: 870 GB

DiskPair F: Disabled,  Status: Present/Available,  Capacity: 870 GB

DiskPair G: Disabled,  Status: Present/Available,  Capacity: 870 GB

DiskPair H: Disabled,  Status: Present/Available,  Capacity: 870 GB


8) Add the disks to the New-M-500 Log Collector configuration:
Go to Panorama > Managed Collectors.
Click the name of the Log Collector (e.g. New-M-500).
Click the Disks tab.
Add all the disk pairs that were moved to the New-M-500 device (e.g. A, B, C, D, E, F, G, H).

 

[Image: adding disks to the new Log Collector]

 

[Image: disks added to the new Log Collector]

 

9) On the New-M-500, add the newly created local Log Collector to the existing Collector Group that the Old-M-500 was part of. In this example, the Old-M-500 Log Collector was part of the "default" Collector Group.

Add the New-M-500 Log Collector where the Old-M-500 Log Collector was present:

 

[Image: adding the new Log Collector to the Collector Group]

 

 


10) Delete the failed Log Collector from the Collector Group.

In the web GUI, go to Panorama > Collector Groups.
Select the Collector Group in which the New-M-500 is configured.

In the Collector Group dialog, select the "Device Log Forwarding" tab.
Delete all references to the serial number of the failed Old-M-500.

 

[Image: removing the old Log Collector references from the Collector Group]


11) Issue a Commit to Panorama only.

 

[Image: Commit to Panorama]


12) Issue a Commit to the Collector Group.

[Image: Commit to Collector Group]
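
For reference, both commits can also be issued from the Panorama CLI. This is only a sketch: the Collector Group name "default" is taken from this example, and the commit-all syntax varies between PAN-OS releases, so confirm it with the CLI's built-in help (?) before using it:

admin@New-M-500> configure

admin@New-M-500# commit

admin@New-M-500# exit

admin@New-M-500> commit-all log-collector-config log-collector-group default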


13) Check that the old logs are visible.

 

[Image: verifying that the old logs are present]
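
In addition to the check in the Monitor tab above, you can confirm from the CLI that managed firewalls are forwarding logs to the rebuilt collector; the serial number below is a placeholder for one of your managed firewalls:

admin@New-M-500> show logging-status device <firewall-serial-number>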

 

 

14) Add the spare disks to RAID, so that full RAID redundancy is rebuilt now that the log migration has been confirmed.
Physically move disks A2, B2, C2, D2, E2, F2, G2, H2 from the Old-M-500 to positions A2, B2, C2, D2, E2, F2, G2, H2 in the New-M-500.

 

Check that the disks are available to be added to RAID:

 

> show system raid detail 


The newly inserted disks will show the state "Present" and the status "not in use".


admin@New-M-500> show system raid detail


Sample Output:
Disk Pair A                           Available
   Status                       clean, degraded
   Disk id A1                           Present
       model        : ST91000640NS    
       size         : 953869 MB
       status       : active sync
   Disk id A2                           Present
       model        : ST91000640NS    
       size         : 953869 MB
       status       : not in use


....


Disk Pair H                           Available
   Status                       clean, degraded
   Disk id H1                           Present
       model        : ST91000640NS    
       size         : 953869 MB
       status       : active sync
   Disk id H2                           Present
       model        : ST91000640NS    
       size         : 953869 MB
       status       : not in use
       

 

15) Add the secondary disks (A2, B2, C2, D2, E2, F2, G2, H2) to RAID using the commands:

 

> request system raid add A2 force

> request system raid add B2 force
> request system raid add C2 force
> request system raid add D2 force
> request system raid add E2 force
> request system raid add F2 force
> request system raid add G2 force
> request system raid add H2 force

 

Note: Each command prompts "Executing this command may delete all data on the drive being added. Do you want to continue? (y or n)".
Press the key "y" to accept.

After these commands, the RAID pairs enter the "Spare Rebuild" operation.

Please note that this can be a lengthy operation; it runs in the background until it completes.
During this time, logging to the Log Collector Group is on hold.
Once the operation is over, log forwarding to the New-M-500 resumes.

You can check the status of the operation by running the command:

> show system raid detail

Sample Output:

> show system raid detail 

Disk Pair A                           Available
   Status     clean, degraded, recovering (2% complete)
   Disk id A1                           Present
       model        : ST91000640NS    
       size         : 953869 MB
       status       : active sync
   Disk id A2                           Present
       model        : ST91000640NS    
       size         : 953869 MB
       status       : spare rebuilding

 



....


Disk Pair H                           Available
   Status     clean, degraded, recovering (0% complete)
   Disk id H1                           Present
       model        : ST91000640NS    
       size         : 953869 MB
       status       : active sync
   Disk id H2                           Present
       model        : ST91000640NS    
       size         : 953869 MB
       status       : spare rebuilding
       


16) Once the spare rebuild operation is finished, the New-M-500 is fully operational and the RMA process is complete.

Comments

How long will the RAID rebuild process take to complete?

hi @sarumughan

 

this could take a few hours depending on the disk size

you can track the progress with the below command:

 

> show system raid detail
Disk Pair A                                                            Available 
Status                        clean,degraded, recovering (14% complete)
Disk id A1                                                             Present 
  model                     : ST91000640NS 
  size                      : 953869 MB 
  status                    : active sync 
Disk id A2                                                             Present 
  model                     : ST91000640NS 
  size                      : 953869 MB 
  status                    : spare rebuilding