PAN-VM HA Link Group Monitoring Issue



Hi,

I have a pair of PAN-VM firewalls in active/passive mode, with link group monitoring configured on four member ports. When I disconnect one of the ports in vSphere, failover happens quickly and the node is marked "non-functional (Link down)". But when I reconnect the port, the status does not change and failback does not happen unless I remove the HA link group from the passive node. Any idea what may be wrong?

I am using version 6.0.4

Thanks,

Saeed


Hello Saeed,

From what I can see, the active device (A, IP 10.101.200.70) is configured with priority 50 and the passive device (B, IP 10.101.200.71) with priority 100. Preemption is enabled on both firewalls.

Hence, once firewall B has become active and the monitored link group (SP-IF-MON) comes back up on firewall A, FW A should automatically become active again without any manual intervention.
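To make that expectation concrete, here is a rough Python sketch of the decision as I understand it. It is only a simplified model, not how PAN-OS implements it: the `Peer` class and `expected_active` function are made-up names, hold timers are ignored, and it assumes the lower priority value is the preferred device.

```python
from dataclasses import dataclass


@dataclass
class Peer:
    name: str
    priority: int        # assumption: lower value = preferred device
    preemptive: bool     # preemption enabled on this firewall
    links_healthy: bool  # monitored link group currently up


def expected_active(a: Peer, b: Peer, current_active: str) -> str:
    """Return the name of the peer we would expect to be active."""
    healthy = [p for p in (a, b) if p.links_healthy]
    if not healthy:
        # Nothing better to move to; keep whoever is active now.
        return current_active
    if len(healthy) == 1:
        # Only one peer can pass traffic, so it should be active.
        return healthy[0].name
    if a.preemptive and b.preemptive:
        # Both healthy and preemption on both: preferred peer takes over.
        return min(healthy, key=lambda p: p.priority).name
    # Without preemption the current active device stays active.
    return current_active


fw_a = Peer("A", priority=50, preemptive=True, links_healthy=True)
fw_b = Peer("B", priority=100, preemptive=True, links_healthy=True)
print(expected_active(fw_a, fw_b, current_active="B"))
```

With the priorities from this thread and both link groups healthy, the sketch picks A even though B is currently active, which is the failback behaviour I would expect here.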

A--IP-10.101.200.70

B--IP-10.101.200.71

Logs from firewall A:

2014-08-26 16:50:12 2014-08-26 16:50:12.118 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/3 up

2014-08-26 16:50:12 2014-08-26 16:50:12.118 +1000 Group 1: Link 'ethernet1/3' in link group 'SP-IF-MON' state is going from down to up

2014-08-26 16:50:12 2014-08-26 16:50:12.118 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/2 up

2014-08-26 16:50:12 2014-08-26 16:50:12.118 +1000 Group 1: Link 'ethernet1/2' in link group 'SP-IF-MON' state is going from down to up

2014-08-26 16:50:12 2014-08-26 16:50:12.119 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/4 up

2014-08-26 16:50:12 2014-08-26 16:50:12.119 +1000 Group 1: Link 'ethernet1/4' in link group 'SP-IF-MON' state is going from down to up>>>>>>>>>>>>>>> link group  came UP on firewall A

As expected, FW A became Active:

2014-08-26 16:51:21 2014-08-26 16:51:21.411 +1000 debug: ha_state_transition(src/ha_state.c:1301): Group 1: transition to state Active

2014-08-26 16:51:21 2014-08-26 16:51:21.411 +1000 debug: ha_state_move(src/ha_state.c:1386): Group 1: moving from state Active to Active >>>>>>>>>>>>>>>> Going to Active state

But at the same time we observed that the HA1 link went down, and the monitored interface went down again:

2014-08-26 16:51:21 2014-08-26 16:51:21.411 +1000 Group 1 (HA1-MAIN): Starting hello with timeout: 8s/0ns

2014-08-26 16:51:21 2014-08-26 16:51:21.411 +1000 debug: ha_peer_start_hello(src/ha_peer.c:1064): Group 1 (HA1-BKUP): can't start hello, no connection

2014-08-26 16:51:21 2014-08-26 16:51:21.411 +1000 debug: ha_peer_start_hello(src/ha_peer.c:1064): Group 1 (HA1-MGMT): can't start hello, no connection

2014-08-26 16:53:07 2014-08-26 16:53:07.730 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/4 down

2014-08-26 16:53:07 2014-08-26 16:53:07.730 +1000 Group 1: Link 'ethernet1/4' in link group 'SP-IF-MON' state is going from up to down

2014-08-26 16:53:07 2014-08-26 16:53:07.730 +1000 Warning: ha_event_log(src/ha_event.c:47): HA Group 1: Link group 'SP-IF-MON' link 'ethernet1/4' is down

2014-08-26 16:53:07 2014-08-26 16:53:07.731 +1000 Warning: ha_event_log(src/ha_event.c:47): HA Group 1: Link group 'SP-IF-MON' failure; one or more links are down >>>>>>>>>>> Link DOWN

2014-08-26 16:53:07 2014-08-26 16:53:07.731 +1000 debug: ha_state_transition(src/ha_state.c:1301): Group 1: transition to state Non-Functional  >>>>>>>>>>>>> The firewall went into non-functional state.
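If you want to pull the same timeline out of a longer log yourself, a small Python sketch like the one below could help. It only assumes the line format seen in the excerpts above; the `summarize` function and the two regexes are my own and have not been checked against other PAN-OS versions.

```python
import re

# Matches e.g. "... Link 'ethernet1/4' in link group 'SP-IF-MON' state is going from up to down"
LINK_RE = re.compile(
    r"^(\S+ \S+) .*Link '([^']+)' in link group '([^']+)' "
    r"state is going from (\w+) to (\w+)"
)
# Matches e.g. "... Group 1: transition to state Non-Functional"
STATE_RE = re.compile(r"^(\S+ \S+) .*transition to state ([\w-]+)")


def summarize(lines):
    """Return (timestamp, description) tuples for link and HA state changes."""
    events = []
    for line in lines:
        m = LINK_RE.search(line)
        if m:
            ts, link, group, old, new = m.groups()
            events.append((ts, f"{group}/{link}: {old} -> {new}"))
            continue
        m = STATE_RE.search(line)
        if m:
            events.append((m.group(1), f"HA state -> {m.group(2)}"))
    return events


sample = [
    "2014-08-26 16:53:07 2014-08-26 16:53:07.730 +1000 Group 1: Link 'ethernet1/4' "
    "in link group 'SP-IF-MON' state is going from up to down",
    "2014-08-26 16:53:07 2014-08-26 16:53:07.731 +1000 debug: "
    "ha_state_transition(src/ha_state.c:1301): Group 1: transition to state Non-Functional",
]
for ts, what in summarize(sample):
    print(ts, what)
```

Feeding it the full log from each firewall makes it easier to line up when a link flapped against when the HA state changed.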

Suggestion: In the current HA configuration, the failure condition is set to "any". Could you please change it to "all" and run the same test again?

Failure condition: any >>>>>>>>>>

Group SP-IF-MON:

link-monitoring.jpg
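To illustrate the difference between the two settings, here is a minimal Python sketch of how I understand the failure condition to be evaluated. The function name `link_group_failed` is hypothetical and this is not taken from PAN-OS itself; the interface names are the ones that appear in the logs above.

```python
from typing import Mapping


def link_group_failed(link_up: Mapping[str, bool], condition: str) -> bool:
    """Decide whether a link group counts as failed.

    link_up maps interface name -> True if the link is up.
    condition is "any" (one down link fails the group) or
    "all" (the group fails only when every link is down).
    """
    down = [not up for up in link_up.values()]
    if condition == "any":
        return any(down)
    if condition == "all":
        return all(down)
    raise ValueError(f"unknown failure condition: {condition!r}")


# State after ethernet1/4 went down in the log above.
links = {"ethernet1/2": True, "ethernet1/3": True, "ethernet1/4": False}
print(link_group_failed(links, "any"))  # True: group fails, node goes non-functional
print(link_group_failed(links, "all"))  # False: group still considered healthy
```

So with "any", a single flapping member is enough to push the firewall back into the non-functional state, while with "all" the same event would be ignored.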

Hope this helps.

Thanks

Hi Hulk,

Thanks for taking the time to analyse the logs. Once the active firewall goes into the non-functional state, it will not negotiate HA with its peer to become active again, even after the network has recovered from the failure, unless I disable the SP-IF-MON monitor. My requirement is that a single link failure should trigger failover to the other peer. This is to address an incident I had recently, so setting the condition to "all" has no value in my scenario.

Cheers,

Saeed

Hi,

I am using the default of 1 minute, and in one of the tests I waited 30 minutes with no change. The only cure is to disable the monitor.

Regards,

Saeed
