What causes an HA Primary to go into Suspended State?

L1 Bithead

I've set up an HA pair with a primary priority of 10 and a secondary priority of 20 (both with preemption enabled).

The primary keeps going into the Suspended state. What would cause this?

L6 Presenter

Without looking at the logs I would just be guessing. Do you see any message in the system logs that would indicate an obvious problem on the primary?

In any event you should probably open a case with support to investigate this issue.

-Benjamin

L1 Bithead

Here are the log entries. Not much to go on!

02/14 12:47:33  ha  critical  state-change  HA Group 1: moved from state Non-Functional to state Suspended

02/14 12:47:33  ha  critical  preempt-loop  HA Group 1: going to suspended state due to detection of a preemption loop after 3 loops
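For anyone who wants to scan an exported system log for this event, here is a minimal Python sketch. It assumes each entry has been flattened to one line in the order time, type, severity, event, description; that layout, and the names `HaLogEntry`, `parse_line` and `find_preempt_loops`, are illustrative assumptions and not a documented PAN-OS export format.

```python
import re
from typing import List, NamedTuple, Optional


class HaLogEntry(NamedTuple):
    timestamp: str
    log_type: str
    severity: str
    event: str
    description: str


# Assumed one-line layout: "MM/DD HH:MM:SS type severity event description"
LINE_RE = re.compile(
    r"^(\d{2}/\d{2} \d{2}:\d{2}:\d{2})\s+(\S+)\s+(\S+)\s+(\S+)\s+(.*)$"
)


def parse_line(line: str) -> Optional[HaLogEntry]:
    """Parse one flattened log line; return None if it does not match."""
    match = LINE_RE.match(line.strip())
    return HaLogEntry(*match.groups()) if match else None


def find_preempt_loops(lines: List[str]) -> List[HaLogEntry]:
    """Return only the entries whose event field is 'preempt-loop'."""
    entries = (parse_line(line) for line in lines)
    return [e for e in entries if e is not None and e.event == "preempt-loop"]


# The two entries from this thread, flattened to the assumed layout.
SAMPLE = [
    "02/14 12:47:33 ha critical state-change "
    "HA Group 1: moved from state Non-Functional to state Suspended",
    "02/14 12:47:33 ha critical preempt-loop "
    "HA Group 1: going to suspended state due to detection of a "
    "preemption loop after 3 loops",
]

if __name__ == "__main__":
    for entry in find_preempt_loops(SAMPLE):
        print(entry.timestamp, entry.description)
```

Filtering on the event field rather than on the description text keeps the check independent of the exact wording of the message.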

L4 Transporter

Hi - This link will help describe the scenario:

https://live.paloaltonetworks.com/docs/DOC-1142

Are you seeing this?

Thanks

James

L1 Bithead

Hi James


I did find that link, but what does "non-functional" mean? An interface going down, or the whole device going down?

I've also got link monitoring set on the primary, but not the secondary. Would that be causing issues? I thought the primary config would sync to the secondary, but it doesn't.

L4 Transporter

Hi,

This is a good doc:

https://live.paloaltonetworks.com/docs/DOC-1656

Useful excerpts are:

Non-functional: Error state due to data plane crash or monitor failure

Non-functional loop
A non-functional loop occurs when both devices in an HA pair have link or path monitoring failures that are not detectable while in the non-functional state. This happens when the link state on the passive device is set to shutdown in Layer 3 mode. The link state on the passive device is always shutdown in vwire and Layer 2 deployments.

If a device in the HA cluster starts in the active state and detects a link or path down, it changes state to non-functional. The peer device then goes active. The non-functional device remains in that state for the monitor-fail-holddown time and then changes state to passive. The active device, upon seeing the peer device as passive, changes to non-functional because of the link failure. At this point, if monitoring fails again, the device enters a loop that repeats the active -> non-functional -> passive -> active transitions. These state transitions are referred to as flaps.

The default number of flaps is 3. A value of "0" means infinite flaps. The maximum number of flaps must occur within 15 minutes, after which the device enters the suspended state. The device remains in the suspended state even if link or path connectivity is restored; it requires user intervention to transition back to a functional state. This is done with the operational command "request high-availability state functional".
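To make the flap-counting rule in the excerpt concrete, here is a small Python model of it. This is only a sketch of the behaviour as described above (default of 3 flaps, 0 meaning unlimited, a 15-minute window, manual recovery), not how PAN-OS implements it. The class name `FlapTracker`, its methods, and the choice to clear the flap history on manual recovery are assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass, field

FLAP_WINDOW_SECONDS = 15 * 60  # the 15-minute window quoted in the excerpt


@dataclass
class FlapTracker:
    """Toy model of the preemption-loop rule; not the PAN-OS implementation."""

    max_flaps: int = 3  # default from the excerpt; 0 means unlimited flaps
    window: float = FLAP_WINDOW_SECONDS
    suspended: bool = False
    _flaps: deque = field(default_factory=deque, repr=False)

    def record_flap(self, now: float) -> bool:
        """Record one flap at time `now` (seconds); return True if suspended."""
        if self.suspended:
            return True  # stays suspended until an operator intervenes
        self._flaps.append(now)
        # Forget flaps that fall outside the sliding window.
        while self._flaps and now - self._flaps[0] > self.window:
            self._flaps.popleft()
        if self.max_flaps > 0 and len(self._flaps) >= self.max_flaps:
            self.suspended = True
        return self.suspended

    def make_functional(self) -> None:
        """Stand-in for an operator running
        `request high-availability state functional`.
        Clearing the flap history here is an assumption."""
        self.suspended = False
        self._flaps.clear()


if __name__ == "__main__":
    tracker = FlapTracker()
    for t in (0, 60, 120):  # three flaps within two minutes
        state = "suspended" if tracker.record_flap(t) else "running"
        print(f"flap at t={t}s: {state}")
```

In this model, three flaps spread more than 15 minutes apart never trigger suspension, which matches the excerpt's statement that the flaps must occur within the window.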

Not all parameters are synchronised in HA - HA settings themselves are not synchronised, since some items need to be different on each device.

Thanks

James

Not applicable

Thanks James.. That doc is spot-on.
