HA1 interface went down for 1 second when the connection restarted after a heartbeat failure


L4 Transporter

Hello,

 

I noticed some strange logs and would like your help.

 

Please look at the following logs.

The PA missed 4 pings and then restarted the connection, so the HA1 interface was down for 1 second.

As I remember it, a failover simply happens after 4 missed pings.

I do not think I have ever seen the "restarting connection" log before.

Did the default behavior change?

Or is there something about HA that I am missing?

 

Please let me know.

 

 

2016-05-07 00:01:34.763 +0900 Error: ha_ping_peer_miss(src/ha_ping.c:756): Missed 1 ping timeouts out of 3 (ha1)
2016-05-07 00:01:35.763 +0900 Error: ha_ping_peer_miss(src/ha_ping.c:756): Missed 2 ping timeouts out of 3 (ha1)
2016-05-07 00:01:36.763 +0900 Error: ha_ping_peer_miss(src/ha_ping.c:756): Missed 3 ping timeouts out of 3 (ha1)
2016-05-07 00:01:36.763 +0900 Error: ha_ping_peer_miss(src/ha_ping.c:763): We have missed 4 pings from the peer for group 1 (ha1), restarting connection
2016-05-07 00:01:36.765 +0900 Warning: ha_event_log(src/ha_event.c:47): HA Group 1: HA1 connection down
2016-05-07 00:01:36.765 +0900 debug: ha_peer_send_error(src/ha_peer.c:1641): Group 1 (HA1-MAIN): Sending errro message

Error Msg
---------
flags : 0x2 (close:)
err code : Heartbeat ping failure (16)
num tlvs : 1
Printing out 1 tlvs
TLV[1]: type 5 (ERR_STRING); len 23; value:
48656172 74626561 74207069 6e672066 61696c75 726500

2016-05-07 00:01:36.765 +0900 Error: ha_peer_disconnect(src/ha_peer.c:1776): Group 1 (HA1-MAIN): peer connection error msg set: Heartbeat ping failure
2016-05-07 00:01:36.765 +0900 Group 1 (HA1-BKUP): new primary (error), going away from NONE
2016-05-07 00:01:36.765 +0900 Warning: ha_event_log(src/ha_event.c:47): HA Group 1: Control link running on HA1-Backup connection
2016-05-07 00:01:36.765 +0900 debug: ha_peer_send_primary(src/ha_peer.c:5292): Group 1 (HA1-BKUP): Sending primary message
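
For readers wondering what the hex bytes in the ERR_STRING TLV above contain, here is a minimal Python sketch that decodes them. The helper name `decode_tlv_value` is hypothetical, and the sketch assumes the value is a NUL-terminated ASCII string, which matches the reported length of 23 bytes.

```python
def decode_tlv_value(hex_dump: str) -> str:
    """Decode a space-separated hex dump into text (hypothetical helper).

    Assumption: the TLV value is a NUL-terminated ASCII string.
    """
    raw = bytes.fromhex(hex_dump)  # fromhex() skips the spaces between groups
    return raw.rstrip(b"\x00").decode("ascii")


# Hex value copied from the TLV[1] line in the log above.
tlv_hex = "48656172 74626561 74207069 6e672066 61696c75 726500"

tlv_len = len(bytes.fromhex(tlv_hex))  # 23, including the trailing NUL
tlv_text = decode_tlv_value(tlv_hex)

print(tlv_len, tlv_text)  # prints: 23 Heartbeat ping failure
```

The decoded text is the same string that appears as the err code description, so the TLV carries no extra information beyond "Heartbeat ping failure".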

2 REPLIES

L6 Presenter

Hi, can you share the HA1 configuration settings? Do you have HA1 heartbeat backup enabled? How about an HA1 backup link?

Cyber Elite

Hi

 

These messages in themselves are 'normal', but they do indicate there is an issue.

The HA1 heartbeat was lost for 4 consecutive intervals, which causes an error state. This can have several causes, such as a faulty network cable, an issue on the switch or router the HA1 link is connected to, or a remote issue on the HA peer. To be sure, you should also check the logs on the remote HA peer to see whether it had any problems at that time (for example, a process was restarted or something else happened).
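
When comparing logs from both peers, it can help to pull out the timeline around the event. The following is a rough Python sketch, not an official tool: the function names are hypothetical, and it assumes log lines use the same timestamp format and message text as the excerpt in the question. It measures the time from the first missed ping to the "HA1 connection down" event.

```python
import re
from datetime import datetime

# Assumed line layout: "YYYY-MM-DD HH:MM:SS.mmm +ZZZZ <message>"
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3} [+-]\d{4}) (.*)$"
)


def parse_line(line):
    """Return (timestamp, message), or None if the line has no timestamp."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f %z")
    return stamp, match.group(2)


def miss_to_down_seconds(lines):
    """Seconds between the first missed ping and 'HA1 connection down'."""
    first_miss = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # skip untimestamped lines such as the TLV dump
        stamp, message = parsed
        if first_miss is None and "Missed 1 ping timeouts" in message:
            first_miss = stamp
        elif first_miss is not None and "HA1 connection down" in message:
            return (stamp - first_miss).total_seconds()
    return None


# Lines copied from the log excerpt in the question.
sample = [
    "2016-05-07 00:01:34.763 +0900 Error: ha_ping_peer_miss(src/ha_ping.c:756): Missed 1 ping timeouts out of 3 (ha1)",
    "2016-05-07 00:01:35.763 +0900 Error: ha_ping_peer_miss(src/ha_ping.c:756): Missed 2 ping timeouts out of 3 (ha1)",
    "2016-05-07 00:01:36.763 +0900 Error: ha_ping_peer_miss(src/ha_ping.c:756): Missed 3 ping timeouts out of 3 (ha1)",
    "2016-05-07 00:01:36.765 +0900 Warning: ha_event_log(src/ha_event.c:47): HA Group 1: HA1 connection down",
]

elapsed = miss_to_down_seconds(sample)
print(elapsed)
```

Running the same function over the peer's log for the same time window shows whether the peer also logged missed pings, or whether it was busy with something else at that moment.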

 

To prevent the cluster from failing over, it's recommended to enable HA1 backup. This can also be done on the management interface if you do not want to dedicate an interface to it. The management interface will only serve as a backup heartbeat to prevent a 'split-brain', whereas a dedicated interface could perform all the HA1 tasks.

Tom Piens
PANgurus - Strata specialist; config reviews, policy optimization