I saw some strange logs and would like your help.
Please look at the following logs.
The PA missed 4 pings and then restarted the connection, so the HA1 interface was down for about 1 second.
As I remember it, a failover simply happens after 4 missed pings.
I do not think I have ever seen a log entry about restarting the connection before.
Has the default behavior changed, or is there something about HA that I am missing?
Please let me know.
2016-05-07 00:01:34.763 +0900 Error: ha_ping_peer_miss(src/ha_ping.c:756): Missed 1 ping timeouts out of 3 (ha1)
2016-05-07 00:01:35.763 +0900 Error: ha_ping_peer_miss(src/ha_ping.c:756): Missed 2 ping timeouts out of 3 (ha1)
2016-05-07 00:01:36.763 +0900 Error: ha_ping_peer_miss(src/ha_ping.c:756): Missed 3 ping timeouts out of 3 (ha1)
2016-05-07 00:01:36.763 +0900 Error: ha_ping_peer_miss(src/ha_ping.c:763): We have missed 4 pings from the peer for group 1 (ha1), restarting connection
2016-05-07 00:01:36.765 +0900 Warning: ha_event_log(src/ha_event.c:47): HA Group 1: HA1 connection down
2016-05-07 00:01:36.765 +0900 debug: ha_peer_send_error(src/ha_peer.c:1641): Group 1 (HA1-MAIN): Sending errro message
flags : 0x2 (close:)
err code : Heartbeat ping failure (16)
num tlvs : 1
Printing out 1 tlvs
TLV: type 5 (ERR_STRING); len 23; value:
48656172 74626561 74207069 6e672066 61696c75 726500
2016-05-07 00:01:36.765 +0900 Error: ha_peer_disconnect(src/ha_peer.c:1776): Group 1 (HA1-MAIN): peer connection error msg set: Heartbeat ping failure
2016-05-07 00:01:36.765 +0900 Group 1 (HA1-BKUP): new primary (error), going away from NONE
2016-05-07 00:01:36.765 +0900 Warning: ha_event_log(src/ha_event.c:47): HA Group 1: Control link running on HA1-Backup connection
2016-05-07 00:01:36.765 +0900 debug: ha_peer_send_primary(src/ha_peer.c:5292): Group 1 (HA1-BKUP): Sending primary message
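For reference, the hex dump under the ERR_STRING TLV in the log above looks like plain ASCII. A minimal Python sketch to decode it, assuming the value is a NUL-terminated ASCII string (the helper name `decode_tlv_hex` is made up for illustration and is not part of any PAN-OS tooling):

```python
# Hex dump copied from the "TLV: type 5 (ERR_STRING); len 23" entry in the log.
HEX_DUMP = "48656172 74626561 74207069 6e672066 61696c75 726500"


def decode_tlv_hex(hex_dump: str) -> str:
    """Decode a space-separated hex dump into text.

    Assumption: the payload is ASCII with an optional trailing NUL byte.
    """
    raw = bytes.fromhex(hex_dump)  # fromhex() skips the spaces between groups
    return raw.rstrip(b"\x00").decode("ascii")


if __name__ == "__main__":
    print(len(bytes.fromhex(HEX_DUMP)), "bytes:", decode_tlv_hex(HEX_DUMP))
```

The decoded text is the same "Heartbeat ping failure" string that appears in the `err code` line, and the byte count matches the `len 23` shown in the log once the trailing NUL is included.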
These messages are 'normal' in themselves, but they do indicate that there is an issue.
The HA1 heartbeat was lost for 4 consecutive instances, which causes an error state. This could have several different causes, such as a faulty network cable, an issue on the switch or router the HA1 interface is connected to, or a remote issue on the HA peer. To make sure, you should also check the logs on the remote HA peer to see whether it had any problems at that time (for example, a process was restarted or something else happened).
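The timing of the missed pings can be checked against the log excerpt with a short script. This is only a rough Python sketch: the regular expressions are written against the few lines pasted in this thread, and the function names are hypothetical, not something PAN-OS provides.

```python
import re
from datetime import datetime

# Lines copied from the log excerpt in this thread.
LOG = """\
2016-05-07 00:01:34.763 +0900 Error: ha_ping_peer_miss(src/ha_ping.c:756): Missed 1 ping timeouts out of 3 (ha1)
2016-05-07 00:01:35.763 +0900 Error: ha_ping_peer_miss(src/ha_ping.c:756): Missed 2 ping timeouts out of 3 (ha1)
2016-05-07 00:01:36.763 +0900 Error: ha_ping_peer_miss(src/ha_ping.c:756): Missed 3 ping timeouts out of 3 (ha1)
2016-05-07 00:01:36.765 +0900 Warning: ha_event_log(src/ha_event.c:47): HA Group 1: HA1 connection down
"""

# Assumed line layout: "<date> <time> <utc offset> <level>: <function>(...): <message>"
TS = r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+ [+-]\d{4})"
MISS_RE = re.compile(TS + r" Error: ha_ping_peer_miss\(.*?\): Missed (\d+) ping timeouts")
DOWN_RE = re.compile(TS + r" Warning: .*HA1 connection down")


def parse_ts(text):
    """Parse a timestamp such as '2016-05-07 00:01:34.763 +0900'."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S.%f %z")


def summarize(log):
    """Return the miss counters, the gaps between misses, and the time to 'connection down'."""
    misses = []  # (timestamp, counter) for each "Missed N ping timeouts" line
    down = None  # timestamp of the first "HA1 connection down" line
    for line in log.splitlines():
        miss = MISS_RE.match(line)
        if miss:
            misses.append((parse_ts(miss.group(1)), int(miss.group(2))))
            continue
        hit = DOWN_RE.match(line)
        if hit and down is None:
            down = parse_ts(hit.group(1))
    gaps = [(b[0] - a[0]).total_seconds() for a, b in zip(misses, misses[1:])]
    total = (down - misses[0][0]).total_seconds() if misses and down else None
    return {
        "miss_counts": [count for _, count in misses],
        "gaps_seconds": gaps,
        "first_miss_to_down_seconds": total,
    }


if __name__ == "__main__":
    print(summarize(LOG))
```

For the excerpt in this thread, the misses are logged 1 second apart and the "HA1 connection down" event follows about 2 seconds after the first miss. Running the same check on the peer's log for the same time window may help show which side stopped seeing heartbeats first.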
To prevent the cluster from failing over, it is recommended to enable HA1 backup. This can also be configured on the management interface if you do not want to dedicate an interface to it. Note that the management interface will only serve as a backup heartbeat to prevent a 'split-brain', whereas a dedicated interface could perform all the HA1 tasks.