HA1 Backup link went down root cause analysis


Hi Team,

 

I have an issue on a Palo Alto PA-3260 firewall where the HA1 backup link went down. Even though there was no production impact, the issue occurred without any cable change or other activity.

 

This was due to a heartbeat ping failure, but I want to know what caused the ping failure.

 

I am already running PAN-OS 8.1.4-h2, whose release notes say that an issue with unexpected HA1 backup port behaviour was fixed.

 

Below is the error message:

 

Error Msg
---------
flags    : 0x2 (close:)
err code : Heartbeat ping failure (16)
num tlvs : 1
  Printing out 1 tlvs
  TLV[1]: type 5 (ERR_STRING); len 23; value:
    48656172 74626561 74207069 6e672066 61696c75 726500
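
The hex dump under `TLV[1]` looks like plain ASCII for the error string. A minimal Python sketch for decoding it, assuming the value is a NUL-terminated ASCII string (the function name `decode_err_string` is made up for illustration and is not part of PAN-OS):

```python
def decode_err_string(hex_dump: str) -> str:
    """Decode a space-separated hex dump into text.

    Assumption: the TLV value is ASCII with a trailing NUL byte.
    """
    raw = bytes.fromhex("".join(hex_dump.split()))
    return raw.rstrip(b"\x00").decode("ascii")


hex_dump = "48656172 74626561 74207069 6e672066 61696c75 726500"
print(decode_err_string(hex_dump))  # Heartbeat ping failure
```

The 23 bytes are the 22 characters of the message plus the terminating NUL, which matches `len 23`. So the TLV only repeats the `err code` text and carries no extra diagnostic detail.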

 

Regards

Venky

 

28 REPLIES

@reaper @BPry

 

Can you look into the error below and let me know the root cause?

 

 

Hi @Venkatesan_radhakrishnan

 

How is the HA1-backup connected (over a switch or directly? On a dedicated interface, mgmt, ...?) How were the link speed and duplex set on all connecting nodes? What type of cable was used? Have you tried a different cable or port?

 

The HA1-backup issue fixed in 8.1.4-h2 is a permanently down link, not one with ping timeouts, so you are most likely experiencing something different.

Tom Piens
PANgurus - Strata specialist; config reviews, policy optimization

Yes, I'm experiencing something different. The firewalls are directly connected by cable via the HA1-B port on both sides.

 

The link went down. I then opened the HA1 Backup port settings and selected the same interface again, and it came up. There has been no flap or down event since.

 

The firewall is already running version 8.1.4-h2.

 

 

@reaper I haven't tried a different cable; it is a straight-through cable. Once I reselected the same interface that was already in use and committed the configuration, it started to work again.

 

I'm not seeing any abnormalities in the firewall related to that HA port.

 

Can I change the link state and speed for dedicated HA ports? If so, how?

 

 

 

Hi Guys,

 

Can anyone reply on this issue?

 

 

 


 

 

You can't change the settings of the dedicated ports.

 

If they're connected directly, all that is left is to try replacing the cable, I think. If that fails too, you'll need to reach out to support to help investigate this issue.

Tom Piens
PANgurus - Strata specialist; config reviews, policy optimization

I don't think it is a cable issue, because the last event occurred on Jan 29 and the link has been stable to date.

 

I'm not seeing any issue after that event, but the customer is afraid it may recur in future.

 

 

Did you collect a tech support file at the time of the event?

You could still have that reviewed by support to make sure.

If not, you'll have to wait it out, collect one as soon as something does happen, and then get in touch with support ASAP.

Tom Piens
PANgurus - Strata specialist; config reviews, policy optimization

Hi Reaper,

I have shared the error message that was logged at the time of the event.

It is in my first post, but I wasn't able to understand it.

 

 

Unfortunately, this error message only indicates that a heartbeat was missed.

To find the cause, troubleshooting needs to be done, which will likely require a tech support file from each peer and some debugging logs to be enabled.
Tom Piens
PANgurus - Strata specialist; config reviews, policy optimization

I have tech support files collected from both peers at the time of the issue.

What debugging needs to be done?

You would need to lay both TSFs side by side and compare their logs to see if anything interesting happens in the hours to minutes before the event.
Tom Piens
PANgurus - Strata specialist; config reviews, policy optimization
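
One way to do the side-by-side comparison is to merge both peers' log lines into a single timeline around the event. The following is only a sketch: it assumes each line starts with a timestamp in the format seen in this thread (`2019-01-29 11:51:33.447 +0400`), and the names `parse_lines` and `merged_window` are made up for illustration. The peer B line is a hypothetical placeholder, since no peer B log was posted.

```python
import re
from datetime import datetime, timedelta

# Timestamp layout assumed from the log lines posted in this thread.
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3} [+-]\d{4}) (.*)$")
TS_FMT = "%Y-%m-%d %H:%M:%S.%f %z"


def parse_lines(text, peer):
    """Return (timestamp, peer, message) tuples; lines without a timestamp are skipped."""
    entries = []
    for line in text.splitlines():
        m = TS_RE.match(line.strip())
        if m:
            entries.append((datetime.strptime(m.group(1), TS_FMT), peer, m.group(2)))
    return entries


def merged_window(log_a, log_b, event_time,
                  before=timedelta(minutes=30), after=timedelta(minutes=1)):
    """Merge both peers' entries and keep only those near event_time, oldest first."""
    entries = parse_lines(log_a, "A") + parse_lines(log_b, "B")
    window = [e for e in entries if event_time - before <= e[0] <= event_time + after]
    return sorted(window, key=lambda e: e[0])


# Peer A: lines copied from this thread.
PEER_A = """\
2019-01-29 11:51:33.447 +0400 Warning: ha_event_log(src/ha_event.c:47): HA1-Backup peer link down
2019-01-29 11:51:35.230 +0400 Warning: ha_event_log(src/ha_event.c:47): HA Group 1: HA1-Backup connection down
"""
# Peer B: HYPOTHETICAL placeholder line, not real log data.
PEER_B = "2019-01-29 11:51:33.000 +0400 debug: placeholder(hypothetical.c:1): hypothetical peer B entry\n"

event = datetime.strptime("2019-01-29 11:51:35.230 +0400", TS_FMT)
for ts, peer, msg in merged_window(PEER_A, PEER_B, event):
    print(ts.isoformat(), peer, msg)
```

In practice the two text inputs would be read from the HA agent log inside each tech support file. Widening `before` to a few hours makes it easier to spot earlier warnings on either peer.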

This is what I can see at the time of the issue.

I can't understand the error message numbers and the rest of the output.

 

2019-01-29 11:51:33.447 +0400 debug: ha_sysd_haX_link_change(src/ha_sysd.c:2221): Seeing HA1-Backup peer link down, waiting hold

2019-01-29 11:51:33.447 +0400 Warning: ha_event_log(src/ha_event.c:47): HA1-Backup peer link down

2019-01-29 11:51:34.229 +0400 Error: ha_ping_peer_miss(src/ha_ping.c:756): Missed 2 ping timeouts out of 3 (ha1-backup)

2019-01-29 11:51:35.229 +0400 Error: ha_ping_peer_miss(src/ha_ping.c:756): Missed 3 ping timeouts out of 3 (ha1-backup)

2019-01-29 11:51:35.229 +0400 Error: ha_ping_peer_miss(src/ha_ping.c:763): We have missed 4 pings from the peer for group 1 (ha1-backup), restarting connection

2019-01-29 11:51:35.230 +0400 Warning: ha_event_log(src/ha_event.c:47): HA Group 1: HA1-Backup connection down

2019-01-29 11:51:35.230 +0400 debug: ha_peer_send_error(src/ha_peer.c:1517): Group 1 (HA1-BKUP): Sending errro message

Error Msg
---------
flags    : 0x2 (close:)
err code : Heartbeat ping failure (16)
num tlvs : 1
  Printing out 1 tlvs
  TLV[1]: type 5 (ERR_STRING); len 23; value:
    48656172 74626561 74207069 6e672066 61696c75 726500

2019-01-29 11:51:35.230 +0400 Error: ha_peer_disconnect(src/ha_peer.c:1652): Group 1 (HA1-BKUP): peer connection error msg set: Heartbeat ping failure

2019-01-29 11:51:35.230 +0400 debug: ha_ping_stop(src/ha_ping.c:407): Group 1: Stopping pings for ha1-backup

2019-01-29 11:51:35.230 +0400 debug: ha_ping_stop(src/ha_ping.c:407): Group 1: Stopping pings for ha1-backup

2019-01-29 11:51:35.230 +0400 debug: ha_ping_start(src/ha_ping.c:210): Group 1: Starting pings for ha1-backup

2019-01-29 11:51:35.230 +0400 debug: ha_peer_start(src/ha_peer.c:246): Group 1 (HA1-BKUP): waiting for ping response before starting connection

2019-01-29 11:51:39.195 +0400 debug: cfgagent_flags_callback(pan_cfgagent.c:226): ha_agent: cfg agent received flags from server

2019-01-29 11:51:39.195 +0400 debug: cfgagent_flags_callback(pan_cfgagent.c:230): new flags=0x4

2019-01-29 11:51:39.195 +0400 debug: cfgagent_config_callback(pan_cfgagent.c:253): ha_agent: cfg agent received configuration from server

2019-01-29 11:51:39.195 +0400 debug: cfgagent_config_callback(pan_cfgagent.c:275): config length=45594
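
To make the sequence in this excerpt easier to read, each line can be split into timestamp, level, function, source location and message, and the gap between two events measured. This is a rough sketch that assumes the line layout seen above; the regular expression and the helper names `parse_entry` and `first_time` are illustrative, not an official log format definition.

```python
import re
from datetime import datetime

# Layout assumed from the excerpt: "<timestamp> <level>: <func>(<file>:<line>): <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3} [+-]\d{4}) "
    r"(?P<level>\w+): (?P<func>\w+)\((?P<src>[^:()]+):(?P<lineno>\d+)\): (?P<msg>.*)$"
)


def parse_entry(line):
    """Split one log line into its fields, or return None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    entry = m.groupdict()
    entry["ts"] = datetime.strptime(entry["ts"], "%Y-%m-%d %H:%M:%S.%f %z")
    entry["lineno"] = int(entry["lineno"])
    return entry


def first_time(entries, needle):
    """Timestamp of the first entry whose message contains needle, else None."""
    for e in entries:
        if needle in e["msg"]:
            return e["ts"]
    return None


# Lines copied from the excerpt in this thread.
EXCERPT = """\
2019-01-29 11:51:33.447 +0400 Warning: ha_event_log(src/ha_event.c:47): HA1-Backup peer link down
2019-01-29 11:51:34.229 +0400 Error: ha_ping_peer_miss(src/ha_ping.c:756): Missed 2 ping timeouts out of 3 (ha1-backup)
2019-01-29 11:51:35.229 +0400 Error: ha_ping_peer_miss(src/ha_ping.c:756): Missed 3 ping timeouts out of 3 (ha1-backup)
2019-01-29 11:51:35.230 +0400 Warning: ha_event_log(src/ha_event.c:47): HA Group 1: HA1-Backup connection down
"""

entries = [e for e in map(parse_entry, EXCERPT.splitlines()) if e]
link_down = first_time(entries, "peer link down")
conn_down = first_time(entries, "connection down")
gap = (conn_down - link_down).total_seconds()
print(f"link down -> connection down: {gap:.3f} s")  # 1.783 s
```

Read this way, the excerpt shows the "peer link down" event being logged first and the ping misses following within about two seconds, so the heartbeat failure looks like a consequence of the link event rather than its cause. Why the link dropped is not visible in these lines.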
