HA1 interface went down for 1 second when the connection restarted after a heartbeat failure

KiCheon.Lee · ‎05-10-2016

Hello,

I noticed some strange logs.

So I would like to ask for your help.

Please look at the following logs.

The PA missed 4 pings and then restarted the connection, so the HA1 interface was down for 1 second.

As I remember it, a takeover simply happens after 4 missed pings.

I do not think I have ever seen a log about restarting the connection before.

Has the default behavior changed?

Or is there something about HA that I do not know?

Please let me know.

2016-05-07 00:01:34.763 +0900 Error: ha_ping_peer_miss(src/ha_ping.c:756): Missed 1 ping timeouts out of 3 (ha1)
2016-05-07 00:01:35.763 +0900 Error: ha_ping_peer_miss(src/ha_ping.c:756): Missed 2 ping timeouts out of 3 (ha1)
2016-05-07 00:01:36.763 +0900 Error: ha_ping_peer_miss(src/ha_ping.c:756): Missed 3 ping timeouts out of 3 (ha1)
2016-05-07 00:01:36.763 +0900 Error: ha_ping_peer_miss(src/ha_ping.c:763): We have missed 4 pings from the peer for group 1 (ha1), restarting connection
2016-05-07 00:01:36.765 +0900 Warning: ha_event_log(src/ha_event.c:47): HA Group 1: HA1 connection down
2016-05-07 00:01:36.765 +0900 debug: ha_peer_send_error(src/ha_peer.c:1641): Group 1 (HA1-MAIN): Sending errro message

Error Msg
---------
flags : 0x2 (close:)
err code : Heartbeat ping failure (16)
num tlvs : 1
Printing out 1 tlvs
TLV[1]: type 5 (ERR_STRING); len 23; value:
48656172 74626561 74207069 6e672066 61696c75 726500

2016-05-07 00:01:36.765 +0900 Error: ha_peer_disconnect(src/ha_peer.c:1776): Group 1 (HA1-MAIN): peer connection error msg set: Heartbeat ping failure
2016-05-07 00:01:36.765 +0900 Group 1 (HA1-BKUP): new primary (error), going away from NONE
2016-05-07 00:01:36.765 +0900 Warning: ha_event_log(src/ha_event.c:47): HA Group 1: Control link running on HA1-Backup connection
2016-05-07 00:01:36.765 +0900 debug: ha_peer_send_primary(src/ha_peer.c:5292): Group 1 (HA1-BKUP): Sending primary message
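
The TLV value in the error message dump above appears to be plain ASCII printed as hex. A minimal Python sketch, assuming that encoding (the function name decode_tlv_value is made up for illustration and is not part of any PAN-OS tool), decodes it:

```python
def decode_tlv_value(hex_dump: str) -> bytes:
    """Turn a space-separated hex dump into raw bytes."""
    # Drop all whitespace first so the grouping of the dump does not matter.
    return bytes.fromhex("".join(hex_dump.split()))


# Value copied from the ERR_STRING TLV in the log above.
DUMP = "48656172 74626561 74207069 6e672066 61696c75 726500"

raw = decode_tlv_value(DUMP)
# Assumption: the string is NUL-terminated ASCII, so strip the trailing NUL.
text = raw.rstrip(b"\x00").decode("ascii")

print(len(raw), repr(text))
```

The decoded text is the same "Heartbeat ping failure" string shown in the err code line, and the 23 bytes (22 characters plus a trailing NUL) agree with the reported len 23.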

rmonvon · ‎05-10-2016

Hi, can you share the HA1 configuration settings? Do you have HA1 heartbeat backup enabled? What about an HA1 backup link?

reaper · ‎05-11-2016

Hi

These messages are 'normal' in themselves, but they do indicate that there is an issue.

The HA1 heartbeat was lost 4 times in a row, which causes an error state. There are several possible causes, such as a faulty network cable, an issue on the switch or router that HA1 is connected to, or a problem on the remote HA peer. To be sure, you should also check the logs on the remote HA peer to see whether it had any problems at that time (for example, a process was restarted or something else happened).
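
To line up the timestamps with the remote peer's logs, it can help to pull the miss events out of the log first. The following is only a rough Python sketch that assumes the line format shown in the original post; the names MISS_RE, parse_misses and miss_window_seconds are made up for illustration.

```python
import re
from datetime import datetime

# Assumed line format, taken from the posted log:
# "<date> <time.ms> <tz> Error: ...: Missed N ping timeouts out of M (link)"
MISS_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) \S+ .*"
    r"Missed (\d+) ping timeouts out of (\d+) \((\w+)\)"
)


def parse_misses(lines):
    """Return (timestamp, missed, limit, link) for every ping-miss line."""
    misses = []
    for line in lines:
        m = MISS_RE.match(line)
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
            misses.append((ts, int(m.group(2)), int(m.group(3)), m.group(4)))
    return misses


def miss_window_seconds(misses):
    """Seconds between the first and the last recorded miss."""
    if len(misses) < 2:
        return 0.0
    return (misses[-1][0] - misses[0][0]).total_seconds()


SAMPLE = [
    "2016-05-07 00:01:34.763 +0900 Error: ha_ping_peer_miss(src/ha_ping.c:756): Missed 1 ping timeouts out of 3 (ha1)",
    "2016-05-07 00:01:35.763 +0900 Error: ha_ping_peer_miss(src/ha_ping.c:756): Missed 2 ping timeouts out of 3 (ha1)",
    "2016-05-07 00:01:36.763 +0900 Error: ha_ping_peer_miss(src/ha_ping.c:756): Missed 3 ping timeouts out of 3 (ha1)",
]

misses = parse_misses(SAMPLE)
print(len(misses), "misses over", miss_window_seconds(misses), "seconds")
```

With the posted lines this gives three misses one second apart, so the window to check on the peer is roughly 00:01:33 to 00:01:37 (+0900).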

To prevent the cluster from failing over, it is recommended to enable HA1 backup. This can also be done on the management interface if you do not want to dedicate an interface to it. The management interface will only serve as a backup heartbeat to prevent a 'split-brain', whereas a dedicated interface could perform all the HA1 tasks.

Tom Piens
PANgurus - Strata & Prisma Access specialist
