How have folks set up automated monitoring of HA status and session sync?
We see HA instability on a 5060 A/A cluster during periods of high load. The boxes get too busy to respond to HA messages, lose heartbeat, and start to think the links have failed. Normally, they recover automatically when the load decreases. But this weekend, we had a case where HA1 heartbeat did not return, and session sync failed as a result.
I'd like to alert on this, but I can't figure out what to poll. There's nothing about HA in SNMP, so I'm looking at the CLI/API. "show high-availability state" seems to display the configuration, not the status, and it doesn't give specific status info for each HA link, heartbeats, or sync. "show high-availability state-synchronization" looks promising, but I can't tell if it's reporting configuration or status.
Here is a CLI command that you can use to see the hello timeouts and failures. Another method is to configure the system log to forward to SNMP and/or syslog, and monitor for HA heartbeat events from your SNMP/syslog console. Thanks.
admin@PA-5060(active)> show high-availability control-link statistics
Control Link Statistics:
Messages-TX : 23004
Messages-RX : 22973
Capability-Msg-TX : 17
Capability-Msg-RX : 17
Error-Msg-TX : 5
Error-Msg-RX : 1
Preempt-Msg-TX : 0
Preempt-Msg-RX : 0
Preempt-Ack-Msg-TX : 0
Preempt-Ack-Msg-RX : 0
Primary-Msg-TX : 7
Primary-Msg-RX : 7
Primary-Ack-Msg-TX : 7
Primary-Ack-Msg-RX : 7
Hello-Msg-TX : 22954
Hello-Msg-RX : 22927
Hello-Timeouts : 0
Hello-Failures : 0
I set up email alerts for Critical events. That way, when one occurs, you get notified of what happened. Since I have more than five HA pairs, I set it up on Panorama.
Here is what the emails look like:
Subject: SYSTEM ALERT : critical : HA Group 1: Moved from state Active to state Non-Functional
receive_time: 2015/01/07 15:22:52
time_generated: 2015/01/07 15:22:50
opaque: Chassis Master Alarm: HA-event
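If these alerts land in a mailbox or log collector that a script can read, the subject line could be matched to pull out the HA state transition. The following is only a rough sketch: the pattern is based solely on the one sample subject above, so other alert formats may not match, and the function name `parse_ha_alert` is made up for illustration.

```python
import re

# Pattern derived from the single sample subject in this thread (an assumption;
# real alerts may use other wording for other HA events).
HA_STATE_CHANGE = re.compile(
    r"SYSTEM ALERT\s*:\s*(?P<severity>\w+)\s*:\s*"
    r"HA Group (?P<group>\d+): "
    r"Moved from state (?P<old>[\w-]+) to state (?P<new>[\w-]+)"
)


def parse_ha_alert(subject):
    """Return the HA state transition found in an alert subject, or None."""
    match = HA_STATE_CHANGE.search(subject)
    if match is None:
        return None
    return {
        "severity": match.group("severity"),
        "group": int(match.group("group")),
        "old_state": match.group("old"),
        "new_state": match.group("new"),
    }


if __name__ == "__main__":
    sample = ("Subject: SYSTEM ALERT : critical : HA Group 1: "
              "Moved from state Active to state Non-Functional")
    print(parse_ha_alert(sample))
```

Subjects that don't describe a state change return None, so a caller can skip them rather than raise an alarm on every critical email.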
That looks potentially promising - we could pull those counters into our monitoring system and alert on their incrementing.
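For what it's worth, here is a minimal sketch of that idea. It assumes the CLI text from "show high-availability control-link statistics" has already been captured somehow (for example over SSH) and that the previous sample is stored by the monitoring system. The function names and the choice of watched counters are my own, not anything provided by PAN-OS.

```python
import re

# Counters worth alerting on when they grow; this selection is an assumption.
WATCHED = ("Hello-Timeouts", "Hello-Failures", "Error-Msg-TX", "Error-Msg-RX")

# Matches lines such as "Hello-Timeouts : 0"; the prompt and heading lines
# in the CLI output do not match and are skipped.
COUNTER_LINE = re.compile(r"^\s*([A-Za-z][A-Za-z-]*)\s*:\s*(\d+)\s*$")


def parse_control_link_stats(cli_text):
    """Turn 'Name : value' lines from the CLI output into a dict of ints."""
    stats = {}
    for line in cli_text.splitlines():
        match = COUNTER_LINE.match(line)
        if match:
            stats[match.group(1)] = int(match.group(2))
    return stats


def incremented_counters(previous, current, watched=WATCHED):
    """Return {name: delta} for watched counters that grew since last poll."""
    return {
        name: current[name] - previous[name]
        for name in watched
        if name in previous and name in current
        and current[name] > previous[name]
    }
```

A poller would call `parse_control_link_stats` on each sample and alert whenever `incremented_counters(previous, current)` is non-empty. A counter that goes down (for example after a reboot) is ignored rather than treated as an increment.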
Unfortunately I can't get the data from the API. If I call "<show><high-availability><control-link><statistics></statistics></control-link></high-availability></show>", the response looks like this:
So there is some data there, but we can't get at it programmatically.