01-21-2015 11:01 AM
How have folks set up automated monitoring of HA status and session sync?
We see HA instability on a 5060 A/A cluster during periods of high load. The boxes get too busy to respond to HA messages, lose heartbeat, and start to think the links have failed. Normally, they recover automatically when the load decreases. But this weekend, we had a case where HA1 heartbeat did not return, and session sync failed as a result.
But I can't figure out what to poll. There's nothing about HA in SNMP, so I'm looking at the CLI/API. "show high-availability state" seems to display the configuration, not the status, and it doesn't give specific status info for each HA link, heartbeats, or sync. "show high-availability state-synchronization" looks promising, but I can't tell if it's reporting configuration or status.
Ross
01-22-2015 07:20 AM
Yes - we have HA heartbeat backup configured. It doesn't really help - or at least, it doesn't help enough.
01-22-2015 01:49 PM
Here is a CLI command that you can use to see the hello timeouts and failures. Another method is to configure the system log to forward to SNMP and/or syslog, and monitor for HA heartbeat events from your SNMP/syslog console. Thanks.
admin@PA-5060(active)> show high-availability control-link statistics
Group 1:
Mode: Active-Passive
Control Link Statistics:
HA1:
Messages-TX : 23004
Messages-RX : 22973
Capability-Msg-TX : 17
Capability-Msg-RX : 17
Error-Msg-TX : 5
Error-Msg-RX : 1
Preempt-Msg-TX : 0
Preempt-Msg-RX : 0
Preempt-Ack-Msg-TX : 0
Preempt-Ack-Msg-RX : 0
Primary-Msg-TX : 7
Primary-Msg-RX : 7
Primary-Ack-Msg-TX : 7
Primary-Ack-Msg-RX : 7
Hello-Msg-TX : 22954
Hello-Msg-RX : 22927
Hello-Timeouts : 0
Hello-Failures : 0
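If the output above is captured as text (for example over an SSH session, which is an assumption here and not something shown in this thread), the counters could be turned into numbers and compared between polls. This is only a minimal sketch: the function names and the choice of which counters to watch are hypothetical, and the parsing relies on the "Name : value" layout seen in the output above.

```python
import re

# Counters whose growth may indicate heartbeat trouble. The names come from the
# CLI output above; which ones to watch is an assumption for this sketch.
WATCHED = ("Hello-Timeouts", "Hello-Failures", "Error-Msg-TX", "Error-Msg-RX")

# Matches lines such as "Hello-Timeouts : 0". Lines like "Group 1:" or
# "Mode: Active-Passive" do not match because nothing numeric follows the colon.
COUNTER_LINE = re.compile(r"^\s*([A-Za-z0-9-]+)\s*:\s*(\d+)\s*$")


def parse_control_link_stats(text):
    """Return {counter_name: int} for every numeric 'Name : value' line."""
    counters = {}
    for line in text.splitlines():
        match = COUNTER_LINE.match(line)
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters


def incremented(previous, current, watched=WATCHED):
    """Return {name: delta} for watched counters that grew between two polls.

    Counters that went down (for example after a reboot) are ignored.
    """
    return {
        name: current[name] - previous[name]
        for name in watched
        if name in previous and name in current and current[name] > previous[name]
    }
```

A poller would keep the result of the previous run and raise an alert whenever incremented() returns a non-empty dict.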
01-22-2015 03:04 PM
I set up email alerts for Critical events. That way, when one occurs, you get notified of what happened. Since I have more than 5 HA pairs, I set it up on the Panorama.
Here is what the emails look like:
Subject: SYSTEM ALERT : critical : HA Group 1: Moved from state Active to state Non-Functional
Body:
domain: 1
receive_time: 2015/01/07 15:22:52
serial:
seqno: 155559
actionflags: 0x8000000000000000
type: SYSTEM
subtype: general
config_ver: 0
time_generated: 2015/01/07 15:22:50
vsys:
eventid: general
object:
fmt: 0
id: 0
module: general
severity: critical
opaque: Chassis Master Alarm: HA-event
01-22-2015 03:23 PM
That looks potentially promising - we could pull those counters into our monitoring system and alert on their incrementing.
Unfortunately I can't get the data from the API. If I call "<show><high-availability><control-link><statistics></statistics></control-link></high-availability></show>", the response looks like this:
<response status="success">
<result>
<enabled>yes</enabled>
<group>
<mode>Active-Active</mode>
<control-stats/>
</group>
</result>
</response>
So there is some data there, but we can't get at it programmatically.
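For anyone scripting against this, a poller could at least detect the empty element and fall back to another source instead of silently reporting nothing. A rough sketch using only the response shape shown above; the function name is hypothetical, and what a populated control-stats element would contain is unknown here, so the check only asks whether it has any children or text.

```python
import xml.etree.ElementTree as ET


def control_stats_available(response_xml):
    """Return True only if the API response carries a non-empty <control-stats>.

    Based solely on the response layout quoted in this thread:
    <response status="success"><result><group><control-stats/>...
    """
    root = ET.fromstring(response_xml)
    if root.get("status") != "success":
        return False
    stats = root.find("./result/group/control-stats")
    if stats is None:
        return False
    # An element with no child elements and no text is the empty
    # <control-stats/> case described above.
    has_children = len(stats) > 0
    has_text = bool((stats.text or "").strip())
    return has_children or has_text
```

With the response quoted above this returns False, which a monitoring script could treat as "data not exposed via the API" and not as "all counters are zero".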
01-22-2015 03:24 PM
I was hoping to get something we could feed into our monitoring system, so that our NOC will be alerted automatically.
We have the emails set up, and they are helpful when you're looking at email - but that doesn't really scale for automated monitoring.
01-22-2015 03:44 PM
I have this as well. The way I do it is by exporting all logs from the Panorama to the log management system and then setting up a custom alert in that system. Any log manager or SIEM should be able to accomplish this.
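As a rough illustration of such a custom alert, the state-change text from the email subject earlier in this thread could be matched with a regular expression. This is a sketch under assumptions: the message format is taken from that single example, the helper names are hypothetical, and "Non-Functional" is the only state treated as alert-worthy because it is the only one shown here.

```python
import re

# Pattern based on the one example in this thread:
# "HA Group 1: Moved from state Active to state Non-Functional"
HA_STATE_CHANGE = re.compile(
    r"HA Group (?P<group>\d+): Moved from state (?P<old>\S+) to state (?P<new>\S+)"
)

# Assumption: only the state seen in this thread is listed; extend as needed.
ALERT_STATES = {"Non-Functional"}


def parse_ha_state_change(message):
    """Return group/old/new for an HA state-change message, or None."""
    match = HA_STATE_CHANGE.search(message)
    if match is None:
        return None
    return {
        "group": int(match.group("group")),
        "old": match.group("old"),
        "new": match.group("new"),
    }


def should_alert(message):
    """True if the message reports a move into a state listed in ALERT_STATES."""
    change = parse_ha_state_change(message)
    return change is not None and change["new"] in ALERT_STATES
```

Most log managers have their own rule syntax for this; the regular expression is the part that would carry over.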
01-23-2015 07:41 AM
I guess there's no other option - seems like there's no other way to get at this data. Shame it's so hard to collect some simple counters!