Monitoring A/A HA status and session sync

Not applicable

Monitoring A/A HA status and session sync

How have folks set up automated monitoring of HA status and session sync?

We see HA instability on a 5060 A/A cluster during periods of high load. The boxes get too busy to respond to HA messages, lose heartbeat, and start to think the links have failed. Normally they recover automatically when the load decreases. But this weekend we had a case where the HA1 heartbeat did not return, and session sync failed as a result.

But I can't figure out what to poll. There's nothing about HA in SNMP, so I'm looking at the CLI/API. "show high-availability state" seems to display the configuration, not the status, and it doesn't give specific status info for each HA link, the heartbeats, or sync. "show high-availability state-synchronization" looks promising, but I can't tell if it's reporting configuration or status.

Ross

L6 Presenter

Re: Monitoring A/A HA status and session sync

Hi... Did you set up HA heartbeat backup, where heartbeats are sent over the mgmt interface in addition to HA1? That will help when the dataplane is under high load. Thanks.

Not applicable

Re: Monitoring A/A HA status and session sync

Yes - we have HA heartbeat backup configured. It doesn't really help - or at least, it doesn't help enough.

L6 Presenter

Re: Monitoring A/A HA status and session sync

Here is a CLI command that you can use to see the hello timeouts and failures. Another method is to configure the system log to forward to SNMP and/or syslog, and then monitor for HA heartbeat events from your SNMP/syslog console. Thanks.

admin@PA-5060(active)> show high-availability control-link statistics

Group 1:
  Mode: Active-Passive
  Control Link Statistics:
    HA1:
      Messages-TX               : 23004
      Messages-RX               : 22973
      Capability-Msg-TX         : 17
      Capability-Msg-RX         : 17
      Error-Msg-TX              : 5
      Error-Msg-RX              : 1
      Preempt-Msg-TX            : 0
      Preempt-Msg-RX            : 0
      Preempt-Ack-Msg-TX        : 0
      Preempt-Ack-Msg-RX        : 0
      Primary-Msg-TX            : 7
      Primary-Msg-RX            : 7
      Primary-Ack-Msg-TX        : 7
      Primary-Ack-Msg-RX        : 7
      Hello-Msg-TX              : 22954
      Hello-Msg-RX              : 22927
      Hello-Timeouts            : 0
      Hello-Failures            : 0

L3 Networker

Re: Monitoring A/A HA status and session sync

I set up email alerts for Critical events. That way, when one occurs, you get notified of what happened. Since I have over 5 HA pairs, I set it up on Panorama.

Here is what the emails look like:

Subject: SYSTEM ALERT : critical : HA Group 1: Moved from state Active to state Non-Functional

Body:

domain: 1
receive_time: 2015/01/07 15:22:52
serial:
seqno: 155559
actionflags: 0x8000000000000000
type: SYSTEM
subtype: general
config_ver: 0
time_generated: 2015/01/07 15:22:50
vsys:
eventid: general
object:
fmt: 0
id: 0
module: general
severity: critical
opaque: Chassis Master Alarm: HA-event

Not applicable

Re: Monitoring A/A HA status and session sync

That looks potentially promising - we could pull those counters into our monitoring system and alert on their incrementing.
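
For what it's worth, here is a rough Python sketch of that idea: parse the text of "show high-availability control-link statistics" and report which counters went up since the previous poll. How you collect the CLI text (SSH, a scheduled script, etc.) is not covered here, and the choice of counters in WATCHED is only my assumption about which ones are worth alerting on. The function names are made up for illustration.

```python
import re

# Counters to alert on when they increase between polls.
# This selection is an assumption; adjust it for your environment.
WATCHED = ("Hello-Timeouts", "Hello-Failures", "Error-Msg-TX", "Error-Msg-RX")

# Matches lines such as "      Hello-Timeouts            : 0".
COUNTER_LINE = re.compile(r"^\s*([A-Za-z0-9-]+)\s*:\s*(\d+)\s*$")


def parse_counters(cli_text):
    """Return {counter_name: int} for every 'Name : number' line in the output."""
    counters = {}
    for line in cli_text.splitlines():
        match = COUNTER_LINE.match(line)
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters


def incremented(previous, current, watched=WATCHED):
    """Return {name: delta} for watched counters that grew since the last poll.

    A counter that went down (for example after a reboot) is ignored.
    """
    deltas = {}
    for name in watched:
        delta = current.get(name, 0) - previous.get(name, 0)
        if delta > 0:
            deltas[name] = delta
    return deltas


if __name__ == "__main__":
    earlier = "     Hello-Timeouts            : 0\n     Hello-Failures            : 0\n"
    later = "     Hello-Timeouts            : 3\n     Hello-Failures            : 0\n"
    print(incremented(parse_counters(earlier), parse_counters(later)))
```

Lines without a numeric value, such as "Mode: Active-Passive" or "HA1:", are skipped by the pattern, so the whole command output can be passed in as-is.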

Unfortunately I can't get the data from the API.  If I call "<show><high-availability><control-link><statistics></statistics></control-link></high-availability></show>", the response looks like this:

<response status="success">
  <result>
    <enabled>yes</enabled>
    <group>
      <mode>Active-Active</mode>
      <control-stats/>
    </group>
  </result>
</response>

So there is some data there, but we can't get at it programmatically.
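
In case it helps anyone checking the same thing on their own boxes, this is roughly how the response could be tested in Python. It only relies on the element path visible in the response above (result/group/control-stats). I don't know what child elements would appear if the element were populated, so the sketch simply flattens whatever leaf elements it finds; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def control_stats(response_xml):
    """Return {tag: text} for leaf elements under <control-stats>.

    An empty dict means the API returned no counters, as in the response
    shown above. The layout of a populated <control-stats> is unknown to
    me, so this just collects every leaf element beneath it.
    """
    root = ET.fromstring(response_xml)
    if root.get("status") != "success":
        raise ValueError("API call did not succeed: %r" % root.get("status"))
    stats = root.find("result/group/control-stats")
    if stats is None:
        return {}
    return {
        element.tag: (element.text or "").strip()
        for element in stats.iter()
        if element is not stats and len(element) == 0
    }


if __name__ == "__main__":
    observed = (
        '<response status="success"><result><enabled>yes</enabled>'
        "<group><mode>Active-Active</mode><control-stats/></group>"
        "</result></response>"
    )
    print(control_stats(observed) or "no counters returned by the API")
```

With the response I'm getting, the result is an empty dict, which at least lets a script detect that the counters are missing rather than silently treating them as zero.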

Not applicable

Re: Monitoring A/A HA status and session sync

I was hoping to get something we could feed into our monitoring system, so that our NOC will be alerted automatically.

We have the emails set up, and they are helpful when you're looking at email, but that doesn't really scale for automated monitoring.

L3 Networker

Re: Monitoring A/A HA status and session sync

In addition to the emails, I export all logs from Panorama to our log management system and then set up a custom alert in that system. Any log manager or SIEM should be able to accomplish this.
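
As a rough illustration of what such a custom alert might match on, here is a small Python sketch. The message text is taken from the email subject I posted earlier; the actual syslog line layout on your system may differ, and the set of states that should page someone is only an assumption (only "Non-Functional" comes from that example). The function names are made up.

```python
import re

# Pattern based on the alert subject shown earlier in this thread:
# "HA Group 1: Moved from state Active to state Non-Functional"
HA_TRANSITION = re.compile(
    r"HA Group (\d+): Moved from state ([\w-]+) to state ([\w-]+)"
)

# States that should raise an alert. Assumption: extend this with whatever
# states matter in your environment.
ALERT_STATES = {"Non-Functional"}


def parse_ha_transition(log_line):
    """Return (group, old_state, new_state) or None if the line does not match."""
    match = HA_TRANSITION.search(log_line)
    if match is None:
        return None
    return int(match.group(1)), match.group(2), match.group(3)


def should_alert(log_line, alert_states=ALERT_STATES):
    """True when the line records a move into one of the alerting states."""
    transition = parse_ha_transition(log_line)
    return transition is not None and transition[2] in alert_states


if __name__ == "__main__":
    line = ("SYSTEM ALERT : critical : HA Group 1: "
            "Moved from state Active to state Non-Functional")
    print(parse_ha_transition(line), should_alert(line))
```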

Not applicable

Re: Monitoring A/A HA status and session sync

I guess there's no other option. It seems like there's no other way to get at this data. Shame it's so hard to collect some simple counters!
