I have a question about high availability with A-P mode.
We found a critical system log on the active device reporting that the HA1 connection went down, but no split-brain occurred. (system log: type ha / severity critical / event connect-change / description HA Group 1: HA1 Connection down.)
The HA1 link simply goes down and comes back up within a few seconds.
The HA1 settings are auto speed and auto duplex, with everything else left at the default values, and the HA1 link between the two devices is a direct connection with a straight-through cable.
We recently upgraded the OS version from 5.0.1 to 5.0.11.
When we were running OS version 5.0.1, we never saw this problem.
What is the root cause? Could it be related to the monitor hold time on HA1?
The HA1 connection goes down when heartbeats are missed over that connection; taking the connection down is the device's attempt to recover the HA1 link.
If you have Heartbeat Backup enabled, the peers use the mgmt interface to send heartbeat pings and check each other's status.
A typical reason for missed pings is a busy management plane (MP). Heartbeats keep track of the peer's management plane, so if the MP is busy and does not respond to heartbeats in a timely fashion, the peer counts this as a missed ping.
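To make the mechanism above concrete, here is a minimal Python sketch of how a peer might count missed heartbeats and declare the link down. This is a toy model, not PAN-OS code: the class name `HeartbeatMonitor`, the 1000 ms interval, and the threshold of 3 consecutive misses are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class HeartbeatMonitor:
    """Toy model of a peer heartbeat check (hypothetical, not PAN-OS code)."""

    interval_ms: int = 1000  # assumed heartbeat interval
    max_missed: int = 3      # assumed number of consecutive misses tolerated
    missed: int = 0
    link_up: bool = True

    def on_tick(self, reply_latency_ms):
        """Process one heartbeat interval and return the link state.

        reply_latency_ms is None when no reply arrived at all, otherwise
        the time the peer's management plane took to answer.
        """
        late = reply_latency_ms is None or reply_latency_ms > self.interval_ms
        if late:
            self.missed += 1
        else:
            self.missed = 0
            self.link_up = True   # a timely reply restores the link
        if self.missed >= self.max_missed:
            self.link_up = False  # too many misses: declare the link down
        return self.link_up


# A briefly busy management plane: three late or missing replies in a row,
# then a timely one.
monitor = HeartbeatMonitor()
states = [monitor.on_tick(latency) for latency in [10, None, 2500, None, 20]]
```

In this model a short burst of slow replies is enough to log a "connection down" event, and the link comes back as soon as the peer answers on time again, which would look like the brief down/up flap described in the question.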
Hope that explanation helps.
I also see this after going from 4.03 to 5.0.11. dpalani, your explanation doesn't really account for the change after the upgrade. Should we conclude that 5.0.11 consumes more CPU on the MP?
"don't really account for the change after upgrade"
I would like you to explain or elaborate on that.
"Should we conclude that 5.0.11 consumes more CPU on the MP?"
No. CPU utilization is process dependent, so for a better understanding you need to follow the logs of a process that is consuming significant resources and check whether that usage is justified (commits, updates, User-ID updates, group mapping updates, etc.).
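One way to do that check is to line up the times of the HA1 down events with per-process CPU samples taken on the management plane. The Python sketch below assumes you have already exported both into simple lists; the function name `busiest_process_before`, the 30-second window, and the sample data (including the process names) are hypothetical, not output from a real device.

```python
from datetime import datetime, timedelta


def busiest_process_before(events, samples, window_s=30):
    """For each event time, return the (process, cpu_percent) pair with the
    highest CPU usage sampled in the window_s seconds before the event,
    or None if no sample falls in that window.

    events:  list of datetime objects (HA1 down timestamps)
    samples: list of (datetime, process_name, cpu_percent) tuples
    """
    window = timedelta(seconds=window_s)
    result = {}
    for event in events:
        nearby = [s for s in samples if event - window <= s[0] <= event]
        if nearby:
            top = max(nearby, key=lambda s: s[2])
            result[event] = (top[1], top[2])
        else:
            result[event] = None
    return result


# Hypothetical data for illustration only.
ha1_down = [datetime(2014, 1, 1, 10, 0, 30)]
cpu_samples = [
    (datetime(2014, 1, 1, 9, 0, 0), "process_c", 99.0),    # outside the window
    (datetime(2014, 1, 1, 10, 0, 10), "process_a", 95.0),
    (datetime(2014, 1, 1, 10, 0, 20), "process_b", 40.0),
]
report = busiest_process_before(ha1_down, cpu_samples)
```

If the same process shows up before every flap, its own logs are the place to look to decide whether the load (a commit, an update, a mapping refresh) was expected.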
I'm surprised to see this crop up. There were HA issues in 5.0.10, and the release notes for 5.0.11 list three HA bugs fixed by this release. You may have run into a new bug with this issue.
You may want to open a ticket so support can examine it and confirm whether this is a problem. If it is, that will get it into the bug fix chain.