PAN-VM HA Link Group Monitoring Issue

cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 

PAN-VM HA Link Group Monitoring Issue

Not applicable

Hi,

I have a pair of PAN-VM in active/passive mode and configured link group monitoring with four member ports and when I disconnect one of the ports from vSphere the failover happens quickly and marks the node as "non-functional (Link down)" but when I connect back the port the status does not change and failback not happening unless I remove the HA link group from the passive node. Any idea what may be wrong?

I am using version 6.0.4

Thanks,

Saeed

12 REPLIES 12

L5 Sessionator

Hi saeed.pazoki

Could you please confirm if you preempt enabled under election settings ? Without it I don't think the old active node will take over even if you connect back the port.

Thanks

Hi,

Yes its enabled and that's why when I remove the link group it falls back to the node with lower priority. The HA failover/fallback has been working as expected for the past few months since setup and its just that I am improving the HA function and its not behaving as expected.

Thanks,

Saeed

Could you please let us know the priority for Active and passive FW ...?

Device Priority— A priority value to identify the active firewall. The firewall with the lower value (higher priority) becomes the active firewall (range 0-255) when the preemptive capability is enabled on both firewalls in the pair.

Thanks

Hi,

One is set to 50 and the other 100 and preemtion is enabled as you can see below and the one with priority 50 becomes active when healthy:

Group 1: xxx

  Mode: Active-Passive

  Local Information:

    Version: 1

    Mode: Active-Passive

    State: active (last 8 minutes)

    Last non-functional state reason: Link down

    Device Information:

      Management IPv4 Address: xxx

      Management IPv6 Address:

      Jumbo-Frames disabled; MTU 1500

    HA1 Control Links Joint Configuration:

      Encryption Enabled: no

    Election Option Information:

      Priority: 50

      Preemptive: yes

    Version Compatibility:

      Software Version: Match

      Application Content Compatibility: Match

      Threat Content Compatibility: Match

      Anti-Virus Compatibility: Match

      VPN Client Software Compatibility: Match

      Global Protect Client Software Compatibility: Match

  Peer Information:

    Connection status: up

    Version: 1

    Mode: Active-Passive

    State: passive (last 1 minutes)

    Last non-functional state reason: Link down

    Device Information:

      Management IPv4 Address: xxx

      Management IPv6 Address:

      Jumbo-Frames disabled; MTU 1500

      Connection up; Primary HA1 link

    Election Option Information:

      Priority: 100

      Preemptive: yes

  Configuration Synchronization:

    Enabled: yes

    Running Configuration: synchronized

To get to know the exact reason behind this behavior, we may need to analyze the HA-agent logs:

> less mp-log ha_agent.log  ( enter)

Press Shift + G

Thanks

Hi Hulk,

please check the following logs:

-----------------------------------------------------------------

16:51 enable "SP-IF-MON" link group and commit

16:53 VM interface change via vSphere client (change to isolated vlan to simulate network problem)

16:54 Revert vSphere interface change (no failback so proceed to next step to make failback happen)

16:55 Disable "SP-IF-MON" link group

-----------------------------------------------------------------

2014-08-26 16:51:16.098 +1000 debug: cfgagent_flags_callback(pan_cfgagent.c:194): ha_agent: cfg agent received flags from server

2014-08-26 16:51:16.098 +1000 debug: cfgagent_flags_callback(pan_cfgagent.c:198): new flags=0x3

2014-08-26 16:51:16.098 +1000 debug: cfgagent_config_callback(pan_cfgagent.c:219): ha_agent: cfg agent received configuration from server

2014-08-26 16:51:16.098 +1000 debug: cfgagent_config_callback(pan_cfgagent.c:235): config length=15491

2014-08-26 16:51:16.098 +1000 debug: ha_cfgagent_phase1(src/ha_cfgagent.c:549): start

2014-08-26 16:51:16.098 +1000 debug: ha_cfgagent_phase1_callback(src/ha_cfgagent.c:491): start

2014-08-26 16:51:16.099 +1000 debug: ha_cfgagent_phase1_callback(src/ha_cfgagent.c:520): sending back true for p1done

2014-08-26 16:51:21.337 +1000 debug: ha_sysd_config_status_notifier_callback(src/ha_sysd.c:2693): Starting monitor increase holdup on phase2 start

2014-08-26 16:51:21.337 +1000 debug: ha_state_start_increase_monitor_holdup(src/ha_state.c:1181): Starting monitor holdup increase during commit

2014-08-26 16:51:21.410 +1000 debug: ha_cfgagent_phase2(src/ha_cfgagent.c:723): start

2014-08-26 16:51:21.410 +1000 debug: ha_cfgagent_phase2_callback(src/ha_cfgagent.c:673): start

2014-08-26 16:51:21.411 +1000 debug: ha_state_eval_peer(src/ha_state.c:3150): Group 1: Goto active automatically

2014-08-26 16:51:21.411 +1000 debug: ha_state_config_change(src/ha_state.c:567): Group 1: Config change request to move group to Active state

2014-08-26 16:51:21.411 +1000 debug: ha_state_transition(src/ha_state.c:1301): Group 1: transition to state Active

2014-08-26 16:51:21.411 +1000 debug: ha_state_move(src/ha_state.c:1386): Group 1: moving from state Active to Active

2014-08-26 16:51:21.411 +1000 debug: ha_state_start_preemption_hold(src/ha_state.c:2271): Group 1: no need for preemption waiting

2014-08-26 16:51:21.411 +1000 debug: ha_state_start_monitor_hold(src/ha_state.c:1153): Starting initial monitor hold for group 1; linkmon not monitored

        Ignoring link and path monitoring failures due to an HA state transition

2014-08-26 16:51:21.411 +1000 debug: ha_peer_send_hello(src/ha_peer.c:4900): Group 1 (HA1-MAIN): Sending hello message

Hello Msg

---------

flags    : 0x1 (preempt:)

state    : Active (5)

priority : 50

cookie   : 2789

num tlvs : 2

  Printing out 2 tlvs

  TLV[1]: type 8 (CFG_SYNC_COMPAT); len 4; value:

    00000001

  TLV[2]: type 11 (SYSD_PEER_DOWN); len 4; value:

    00000000

2014-08-26 16:51:21.411 +1000 Group 1 (HA1-MAIN): Starting hello with timeout: 8s/0ns

2014-08-26 16:51:21.411 +1000 debug: ha_peer_start_hello(src/ha_peer.c:1064): Group 1 (HA1-BKUP): can't start hello, no connection

2014-08-26 16:51:21.411 +1000 debug: ha_peer_start_hello(src/ha_peer.c:1064): Group 1 (HA1-MGMT): can't start hello, no connection

2014-08-26 16:51:21.411 +1000 debug: ha_cfgagent_phase2_callback(src/ha_cfgagent.c:699): sending back true for p2done

2014-08-26 16:51:21.411 +1000 debug: ha_state_monitor_hold_callback(src/ha_state.c:2106): Group 1: ending initial monitor hold; no longer ignoring link and path monitoring failures due to an HA state transition

2014-08-26 16:51:24.638 +1000 debug: ha_state_cfg_md5_set(src/ha_state_cfg.c:480): We were in sync and now we are in sync; autocommit no; ha-sync no; panorama no; cfg-sync-off no; pre-old-insync yes; pre-new-insync yes

2014-08-26 16:51:24.638 +1000 debug: ha_peer_send_hello(src/ha_peer.c:4900): Group 1 (HA1-MAIN): Sending hello message

Hello Msg

---------

flags    : 0x1 (preempt:)

state    : Active (5)

priority : 50

cookie   : 2789

num tlvs : 3

  Printing out 3 tlvs

  TLV[1]: type 62 (CONFIG_MD5_PRE); len 33; value:

    38666163 30376537 33626265 62666139 38633466 61376565

    35343763 39656461 00

  TLV[2]: type 2 (CONFIG_MD5SUM); len 33; value:

    31613835 64353131 31613964 63643636 33663735 30613766

    36396134 32656635 00

  TLV[3]: type 11 (SYSD_PEER_DOWN); len 4; value:

    00000000

2014-08-26 16:51:25.699 +1000 debug: ha_sysd_config_status_notifier_callback(src/ha_sysd.c:2685): Ending monitor increase holdup on commit end

2014-08-26 16:51:25.699 +1000 debug: ha_state_stop_increase_monitor_holdup(src/ha_state.c:1203): Ending monitor holdup increase after commit in 60 seconds

2014-08-26 16:53:07.730 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/4 down

2014-08-26 16:53:07.730 +1000 Group 1: Link 'ethernet1/4' in link group 'SP-IF-MON' state is going from up to down

2014-08-26 16:53:07.730 +1000 Warning:  ha_event_log(src/ha_event.c:47): HA Group 1: Link group 'SP-IF-MON' link 'ethernet1/4' is down

2014-08-26 16:53:07.731 +1000 Warning:  ha_event_log(src/ha_event.c:47): HA Group 1: Link group 'SP-IF-MON' failure; one or more links are down

2014-08-26 16:53:07.731 +1000 debug: ha_state_transition(src/ha_state.c:1301): Group 1: transition to state Non-Functional

2014-08-26 16:53:07.731 +1000 debug: ha_state_start_monitor_holdup(src/ha_state.c:2475): Starting monitor holdup for group 1: 500ms

2014-08-26 16:53:07.733 +1000 debug: ha_state_monitor_holdup_callback(src/ha_state.c:2557): Going to Non-Functional state state

2014-08-26 16:53:07.733 +1000 debug: ha_state_move(src/ha_state.c:1386): Group 1: moving from state Active to Non-Functional

2014-08-26 16:53:07.733 +1000 debug: ha_state_stop_loop_hold(src/ha_state.c:2678): Group 1: clearing preempt loop hold

2014-08-26 16:53:07.733 +1000 Warning:  ha_event_log(src/ha_event.c:47): HA Group 1: Moved from state Active to state Non-Functional

2014-08-26 16:53:07.733 +1000 debug: ha_sysd_dev_state_update(src/ha_sysd.c:1354): Set dev state to Non-Functional

2014-08-26 16:53:07.733 +1000 debug: ha_sysd_dev_alarm_update(src/ha_sysd.c:1326): Set dev alarm to on

2014-08-26 16:53:07.733 +1000 debug: ha_state_move_degraded(src/ha_state.c:1693): Group 1: Non-functional loop count updated to 1

2014-08-26 16:53:07.733 +1000 debug: ha_state_check_nonfunc_hold(src/ha_state.c:2599): No non-func hold based on no path failures with passive list state set to auto

2014-08-26 16:53:07.733 +1000 debug: ha_peer_send_hello(src/ha_peer.c:4900): Group 1 (HA1-MAIN): Sending hello message

Hello Msg

---------

flags    : 0x1 (preempt:)

state    : Non-Functional (2)

priority : 50

cookie   : 2789

num tlvs : 3

  Printing out 3 tlvs

  TLV[1]: type 8 (CFG_SYNC_COMPAT); len 4; value:

    00000001

  TLV[2]: type 11 (SYSD_PEER_DOWN); len 4; value:

    00000000

  TLV[3]: type 50 (STATE_REASON); len 10; value:

    4c696e6b 20646f77 6e00

2014-08-26 16:53:07.733 +1000 Group 1 (HA1-MAIN): Starting hello with timeout: 8s/0ns

2014-08-26 16:53:07.733 +1000 debug: ha_peer_start_hello(src/ha_peer.c:1064): Group 1 (HA1-BKUP): can't start hello, no connection

2014-08-26 16:53:07.733 +1000 debug: ha_peer_start_hello(src/ha_peer.c:1064): Group 1 (HA1-MGMT): can't start hello, no connection

2014-08-26 16:53:07.741 +1000 debug: ha_peer_recv_hello(src/ha_peer.c:4953): Group 1 (HA1-MAIN): Receiving hello message

Msg Hdr

-------

version : 1

groupID : 1

type    : Hello (2)

token   : 0xe10a

flags   : 0x1 (req:)

length  : 52

  Hello Msg

  ---------

  flags    : 0x1 (preempt:)

  state    : Active (5)

  priority : 100

  cookie   : 2789

  num tlvs : 2

    Printing out 2 tlvs

    TLV[1]: type 8 (CFG_SYNC_COMPAT); len 4; value:

      00000001

    TLV[2]: type 11 (SYSD_PEER_DOWN); len 4; value:

      00000000

2014-08-26 16:53:07.741 +1000 debug: ha_peer_recv_tlv(src/ha_peer.c:3717): Group 1 (HA1-MAIN): Cfg Sync compat from peer set to true

2014-08-26 16:53:07.741 +1000 debug: ha_sysd_dev_peer_state_update(src/ha_sysd.c:1387): Set dev peer state to Active

2014-08-26 16:53:07.767 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/1 up

2014-08-26 16:53:07.818 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/1 down

2014-08-26 16:53:07.818 +1000 Group 1: Link 'ethernet1/1' in link group 'SP-IF-MON' state is going from up to down

2014-08-26 16:53:07.819 +1000 Warning:  ha_event_log(src/ha_event.c:47): HA Group 1: Link group 'SP-IF-MON' link 'ethernet1/1' is down

2014-08-26 16:53:07.848 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/2 up

2014-08-26 16:53:07.912 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/2 down

2014-08-26 16:53:07.912 +1000 Group 1: Link 'ethernet1/2' in link group 'SP-IF-MON' state is going from up to down

2014-08-26 16:53:07.912 +1000 Warning:  ha_event_log(src/ha_event.c:47): HA Group 1: Link group 'SP-IF-MON' link 'ethernet1/2' is down

2014-08-26 16:53:07.918 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/3 up

2014-08-26 16:53:07.963 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/3 down

2014-08-26 16:53:07.963 +1000 Group 1: Link 'ethernet1/3' in link group 'SP-IF-MON' state is going from up to down

2014-08-26 16:53:07.963 +1000 Warning:  ha_event_log(src/ha_event.c:47): HA Group 1: Link group 'SP-IF-MON' link 'ethernet1/3' is down

2014-08-26 16:53:07.981 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/4 down

2014-08-26 16:53:08.027 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/5 down

2014-08-26 16:53:08.028 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/6 down

2014-08-26 16:53:08.029 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/7 down

2014-08-26 16:53:08.030 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/8 down

2014-08-26 16:53:08.031 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/9 down

2014-08-26 16:53:08.032 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/10 down

2014-08-26 16:53:08.033 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/11 down

2014-08-26 16:53:08.034 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/12 down

2014-08-26 16:53:08.035 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/13 down

2014-08-26 16:53:08.036 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/14 down

2014-08-26 16:53:08.037 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/15 down

2014-08-26 16:53:08.038 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/16 down

2014-08-26 16:53:08.039 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/17 down

2014-08-26 16:53:08.039 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/18 down

2014-08-26 16:53:08.040 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/19 down

2014-08-26 16:53:08.041 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/20 down

2014-08-26 16:53:08.042 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/21 down

2014-08-26 16:53:08.043 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/22 down

2014-08-26 16:53:08.044 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/23 down

2014-08-26 16:53:08.045 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/24 down

2014-08-26 16:55:10.948 +1000 debug: cfgagent_flags_callback(pan_cfgagent.c:194): ha_agent: cfg agent received flags from server

2014-08-26 16:55:10.948 +1000 debug: cfgagent_flags_callback(pan_cfgagent.c:198): new flags=0x3

2014-08-26 16:55:10.948 +1000 debug: cfgagent_config_callback(pan_cfgagent.c:219): ha_agent: cfg agent received configuration from server

2014-08-26 16:55:10.948 +1000 debug: cfgagent_config_callback(pan_cfgagent.c:235): config length=15490

2014-08-26 16:55:10.948 +1000 debug: ha_cfgagent_phase1(src/ha_cfgagent.c:549): start

2014-08-26 16:55:10.948 +1000 debug: ha_cfgagent_phase1_callback(src/ha_cfgagent.c:491): start

2014-08-26 16:55:10.949 +1000 debug: ha_cfgagent_phase1_callback(src/ha_cfgagent.c:520): sending back true for p1done

2014-08-26 16:55:16.126 +1000 debug: ha_sysd_config_status_notifier_callback(src/ha_sysd.c:2693): Starting monitor increase holdup on phase2 start

2014-08-26 16:55:16.126 +1000 debug: ha_state_start_increase_monitor_holdup(src/ha_state.c:1181): Starting monitor holdup increase during commit

2014-08-26 16:55:16.188 +1000 debug: ha_cfgagent_phase2(src/ha_cfgagent.c:723): start

2014-08-26 16:55:16.188 +1000 debug: ha_cfgagent_phase2_callback(src/ha_cfgagent.c:673): start

2014-08-26 16:55:16.189 +1000 debug: ha_state_config_change(src/ha_state.c:567): Group 1: Config change request to move group to Passive state

2014-08-26 16:55:16.189 +1000 debug: ha_state_transition(src/ha_state.c:1301): Group 1: transition to state Passive

2014-08-26 16:55:16.189 +1000 debug: ha_state_start_rt_sync_hold(src/ha_state.c:2023): Group 1: starting runtime state sync hold (0)

2014-08-26 16:55:16.189 +1000 debug: ha_cfgagent_phase2_callback(src/ha_cfgagent.c:699): sending back true for p2done

2014-08-26 16:55:16.189 +1000 debug: ha_state_rt_sync_hold_callback(src/ha_state.c:2052): Group 1: ending runtime state sync hold

2014-08-26 16:55:16.189 +1000 debug: ha_state_move(src/ha_state.c:1386): Group 1: moving from state Non-Functional to Passive

2014-08-26 16:55:16.189 +1000 HA Group 1: Moved from state Non-Functional to state Passive

2014-08-26 16:55:16.189 +1000 debug: ha_sysd_dev_state_update(src/ha_sysd.c:1354): Set dev state to Passive

2014-08-26 16:55:16.189 +1000 debug: ha_sysd_dev_alarm_update(src/ha_sysd.c:1326): Set dev alarm to off

2014-08-26 16:55:16.189 +1000 debug: ha_state_clear_monitor_log_history(src/ha_state.c:3702): Clearing all monitoring log history

2014-08-26 16:55:16.189 +1000 debug: ha_state_start_monitor_hold(src/ha_state.c:1153): Starting initial monitor hold for group 1; linkmon not monitored

        Ignoring link and path monitoring failures due to an HA state transition

2014-08-26 16:55:16.189 +1000 debug: ha_peer_send_hello(src/ha_peer.c:4900): Group 1 (HA1-MAIN): Sending hello message

Hello Msg

---------

flags    : 0x1 (preempt:)

state    : Passive (4)

priority : 50

cookie   : 2789

num tlvs : 2

  Printing out 2 tlvs

  TLV[1]: type 8 (CFG_SYNC_COMPAT); len 4; value:

    00000001

  TLV[2]: type 11 (SYSD_PEER_DOWN); len 4; value:

    00000000

2014-08-26 16:55:16.189 +1000 Group 1 (HA1-MAIN): Starting hello with timeout: 8s/0ns

2014-08-26 16:55:16.189 +1000 debug: ha_peer_start_hello(src/ha_peer.c:1064): Group 1 (HA1-BKUP): can't start hello, no connection

2014-08-26 16:55:16.189 +1000 debug: ha_peer_start_hello(src/ha_peer.c:1064): Group 1 (HA1-MGMT): can't start hello, no connection

2014-08-26 16:55:16.204 +1000 debug: ha_peer_recv_hello(src/ha_peer.c:4953): Group 1 (HA1-MAIN): Receiving hello message

Msg Hdr

-------

version : 1

groupID : 1

type    : Hello (2)

token   : 0xe11b

flags   : 0x1 (req:)

length  : 52

  Hello Msg

  ---------

  flags    : 0x1 (preempt:)

  state    : Active (5)

  priority : 100

  cookie   : 2789

  num tlvs : 2

    Printing out 2 tlvs

    TLV[1]: type 8 (CFG_SYNC_COMPAT); len 4; value:

      00000001

    TLV[2]: type 11 (SYSD_PEER_DOWN); len 4; value:

      00000000

2014-08-26 16:55:16.204 +1000 debug: ha_peer_recv_tlv(src/ha_peer.c:3717): Group 1 (HA1-MAIN): Cfg Sync compat from peer set to true

2014-08-26 16:55:19.479 +1000 debug: ha_state_cfg_md5_set(src/ha_state_cfg.c:480): We were in sync and now we are in sync; autocommit no; ha-sync no; panorama no; cfg-sync-off no; pre-old-insync yes; pre-new-insync yes

2014-08-26 16:55:19.479 +1000 debug: ha_peer_send_hello(src/ha_peer.c:4900): Group 1 (HA1-MAIN): Sending hello message

Hello Msg

---------

flags    : 0x1 (preempt:)

state    : Passive (4)

priority : 50

cookie   : 2789

num tlvs : 3

  Printing out 3 tlvs

  TLV[1]: type 62 (CONFIG_MD5_PRE); len 33; value:

    38666163 30376537 33626265 62666139 38633466 61376565

    35343763 39656461 00

  TLV[2]: type 2 (CONFIG_MD5SUM); len 33; value:

    31613835 64353131 31613964 63643636 33663735 30613766

    36396134 32656635 00

  TLV[3]: type 11 (SYSD_PEER_DOWN); len 4; value:

    00000000

2014-08-26 16:55:20.620 +1000 debug: ha_sysd_config_status_notifier_callback(src/ha_sysd.c:2685): Ending monitor increase holdup on commit end

2014-08-26 16:55:20.620 +1000 debug: ha_state_stop_increase_monitor_holdup(src/ha_state.c:1203): Ending monitor holdup increase after commit in 60 seconds

2014-08-26 16:56:16.189 +1000 debug: ha_state_monitor_hold_callback(src/ha_state.c:2106): Group 1: ending initial monitor hold; no longer ignoring link and path monitoring failures due to an HA state transition

2014-08-26 16:56:16.190 +1000 debug: ha_peer_recv_preempt(src/ha_peer.c:5070): Group 1 (HA1-MAIN): Receiving preempt message

Msg Hdr

-------

version : 1

groupID : 1

type    : Preempt (4)

token   : 0xe123

flags   : 0x1 (req:)

length  : 24

  Preempt Msg

  -----------

  flags    : 0x1 (preempt:)

  priority : 100

  num tlvs : 0

2014-08-26 16:56:16.190 +1000 debug: ha_state_transition(src/ha_state.c:1301): Group 1: transition to state Active

2014-08-26 16:56:16.190 +1000 debug: ha_state_start_promotion_hold(src/ha_state.c:2144): Group 1: starting promotion hold (0 ms)

2014-08-26 16:56:16.190 +1000 Group 1: Received preempt message for us to take over which we accepted

2014-08-26 16:56:16.190 +1000 debug: ha_peer_send_preempt_ack(src/ha_peer.c:428): Group 1 (HA1-MAIN): Sending preempt ack message

Ack Msg

-------

accept   : 1

status   : 0

num tlvs : 0

2014-08-26 16:56:16.190 +1000 debug: ha_state_promotion_hold_callback(src/ha_state.c:2201): Group 1: ending promotion hold

2014-08-26 16:56:16.190 +1000 debug: ha_state_start_monitor_hold(src/ha_state.c:1153): Starting initial monitor hold for group 1; linkmon not monitored

        Ignoring link and path monitoring failures due to an HA state transition

2014-08-26 16:56:16.190 +1000 debug: ha_state_promotion_hold_callback(src/ha_state.c:2236): Group 1: moving to Active due to preemption

2014-08-26 16:56:16.190 +1000 debug: ha_state_move(src/ha_state.c:1386): Group 1: moving from state Passive to Active

2014-08-26 16:56:16.190 +1000 Warning:  ha_event_log(src/ha_event.c:47): HA Group 1: Moved from state Passive to state Active

2014-08-26 16:56:16.190 +1000 debug: ha_sysd_dev_state_update(src/ha_sysd.c:1354): Set dev state to Active

2014-08-26 16:56:16.190 +1000 debug: ha_state_start_preemption_hold(src/ha_state.c:2271): Group 1: no need for preemption waiting

2014-08-26 16:56:16.190 +1000 debug: ha_state_start_loop_hold(src/ha_state.c:2733): Group 1: starting preempt loop hold

2014-08-26 16:56:16.190 +1000 debug: ha_state_start_monitor_hold(src/ha_state.c:1153): Starting initial monitor hold for group 1; linkmon not monitored

        Ignoring link and path monitoring failures due to an HA state transition

2014-08-26 16:56:16.190 +1000 debug: ha_peer_send_hello(src/ha_peer.c:4900): Group 1 (HA1-MAIN): Sending hello message

Hello Msg

---------

flags    : 0x1 (preempt:)

state    : Active (5)

priority : 50

cookie   : 2789

num tlvs : 2

  Printing out 2 tlvs

  TLV[1]: type 8 (CFG_SYNC_COMPAT); len 4; value:

    00000001

  TLV[2]: type 11 (SYSD_PEER_DOWN); len 4; value:

    00000000

2014-08-26 16:56:16.190 +1000 Group 1 (HA1-MAIN): Starting hello with timeout: 8s/0ns

2014-08-26 16:56:16.190 +1000 debug: ha_peer_start_hello(src/ha_peer.c:1064): Group 1 (HA1-BKUP): can't start hello, no connection

2014-08-26 16:56:16.190 +1000 debug: ha_peer_start_hello(src/ha_peer.c:1064): Group 1 (HA1-MGMT): can't start hello, no connection

2014-08-26 16:56:16.195 +1000 debug: ha_peer_recv_hello(src/ha_peer.c:4953): Group 1 (HA1-MAIN): Receiving hello message

Msg Hdr

-------

version : 1

groupID : 1

type    : Hello (2)

token   : 0xe124

flags   : 0x1 (req:)

length  : 52

  Hello Msg

  ---------

  flags    : 0x1 (preempt:)

  state    : Passive (4)

  priority : 100

  cookie   : 2789

  num tlvs : 2

    Printing out 2 tlvs

    TLV[1]: type 8 (CFG_SYNC_COMPAT); len 4; value:

      00000001

    TLV[2]: type 11 (SYSD_PEER_DOWN); len 4; value:

      00000000

2014-08-26 16:56:16.195 +1000 debug: ha_peer_recv_tlv(src/ha_peer.c:3717): Group 1 (HA1-MAIN): Cfg Sync compat from peer set to true

2014-08-26 16:56:16.195 +1000 debug: ha_sysd_dev_peer_state_update(src/ha_sysd.c:1387): Set dev peer state to Passive

2014-08-26 16:56:16.195 +1000 Group 1 peer state moved from Active to Passive

2014-08-26 16:56:16.195 +1000 debug: ha_state_eval_peer(src/ha_state.c:3150): Group 1: Goto active automatically

2014-08-26 16:56:16.195 +1000 debug: ha_state_peer_change(src/ha_state.c:471): Group 1: Peer change requests group move to Active state from Active

2014-08-26 16:56:16.195 +1000 debug: ha_state_transition(src/ha_state.c:1301): Group 1: transition to state Active

2014-08-26 16:56:16.195 +1000 debug: ha_state_move(src/ha_state.c:1386): Group 1: moving from state Active to Active

2014-08-26 16:56:16.195 +1000 debug: ha_state_start_preemption_hold(src/ha_state.c:2271): Group 1: no need for preemption waiting

2014-08-26 16:56:16.195 +1000 debug: ha_peer_send_hello(src/ha_peer.c:4900): Group 1 (HA1-MAIN): Sending hello message

Hello Msg

---------

flags    : 0x1 (preempt:)

state    : Active (5)

priority : 50

cookie   : 2789

num tlvs : 2

  Printing out 2 tlvs

  TLV[1]: type 8 (CFG_SYNC_COMPAT); len 4; value:

    00000001

  TLV[2]: type 11 (SYSD_PEER_DOWN); len 4; value:

    00000000

2014-08-26 16:56:16.195 +1000 Group 1 (HA1-MAIN): Starting hello with timeout: 8s/0ns

2014-08-26 16:56:16.195 +1000 debug: ha_peer_start_hello(src/ha_peer.c:1064): Group 1 (HA1-BKUP): can't start hello, no connection

2014-08-26 16:56:16.195 +1000 debug: ha_peer_start_hello(src/ha_peer.c:1064): Group 1 (HA1-MGMT): can't start hello, no connection

2014-08-26 16:56:16.222 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/1 down

2014-08-26 16:56:16.222 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/1 down

2014-08-26 16:56:16.222 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/1 up

2014-08-26 16:56:16.222 +1000 Group 1: Link 'ethernet1/1' in link group 'SP-IF-MON' state is going from down to up

2014-08-26 16:56:16.225 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/2 down

2014-08-26 16:56:16.228 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/2 down

2014-08-26 16:56:16.233 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/3 down

2014-08-26 16:56:16.233 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/3 down

2014-08-26 16:56:16.235 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/4 down

2014-08-26 16:56:16.243 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/4 down

2014-08-26 16:56:16.245 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/5 down

2014-08-26 16:56:16.247 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/6 down

2014-08-26 16:56:16.248 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/7 down

2014-08-26 16:56:16.249 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/8 down

2014-08-26 16:56:16.250 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/9 down

2014-08-26 16:56:16.251 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/10 down

2014-08-26 16:56:16.252 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/11 down

2014-08-26 16:56:16.253 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/12 down

2014-08-26 16:56:16.254 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/13 down

2014-08-26 16:56:16.255 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/14 down

2014-08-26 16:56:16.256 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/15 down

2014-08-26 16:56:16.257 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/16 down

2014-08-26 16:56:16.258 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/17 down

2014-08-26 16:56:16.259 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/18 down

2014-08-26 16:56:16.260 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/19 down

2014-08-26 16:56:16.261 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/20 down

2014-08-26 16:56:16.262 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/21 down

2014-08-26 16:56:16.263 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/22 down

2014-08-26 16:56:16.264 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/23 down

2014-08-26 16:56:16.265 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/24 down

2014-08-26 16:56:16.266 +1000 debug: ha_rts_local_update(src/ha_rts.c:79): Group 1: local status User-disabled, new local status In-progress, force no

2014-08-26 16:56:16.266 +1000 debug: ha_rts_local_update(src/ha_rts.c:79): Group 1: local status User-disabled, new local status In-progress, force no

2014-08-26 16:56:16.841 +1000 debug: ha_rts_local_update(src/ha_rts.c:79): Group 1: local status User-disabled, new local status In-progress, force no

2014-08-26 16:56:17.227 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/3 up

2014-08-26 16:56:17.227 +1000 Group 1: Link 'ethernet1/3' in link group 'SP-IF-MON' state is going from down to up

2014-08-26 16:56:17.227 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/2 up

2014-08-26 16:56:17.227 +1000 Group 1: Link 'ethernet1/2' in link group 'SP-IF-MON' state is going from down to up

2014-08-26 16:56:17.227 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/4 up

2014-08-26 16:56:17.227 +1000 Group 1: Link 'ethernet1/4' in link group 'SP-IF-MON' state is going from down to up

2014-08-26 16:56:24.769 +1000 debug: ha_rts_local_update(src/ha_rts.c:79): Group 1: local status User-disabled, new local status In-progress, force no

2014-08-26 16:56:24.777 +1000 debug: ha_rts_local_update(src/ha_rts.c:79): Group 1: local status User-disabled, new local status Complete, force no

2014-08-26 16:56:24.777 +1000 debug: ha_state_runtime_insync(src/ha_state.c:661): Ignoring runtime insync while HA2 links down: NO

We need to check HA-agent logs from both firewalls. Do you have any case open with support for this issue...? If so, please update the case ID here.

Thanks

Hi Hulk,

thanks for the help. I just logged a case and number is 00247140.

Regards,

Saeed

Hello saeed,

I am not sure if you have checked this setting or not but, just want to put it out there. Under active/passive settings, there is something called as "Monitor hold down time". It is the length of time (minutes) that a firewall will spend in the non-functional state before becoming passive or active (depends on the priority). This timer is used only when the failure reason is a link or path monitor failure. By default it is set to 1 min. The range is from 1 to 60 min. There might be a possibility that you might have configured it to hold the non functional state for more then 1 min. Hope this helps.

Thanks

Hello Saeed,

As per my observation: The Active device (A--IP-10.101.200.70) is configured with Priority 50 and Passive device(B--IP-10.101.200.71) configured with priority 100. Preemtion has enabled on both firewalls.

Hence, once firewall B will become active and the monitored link (SP-IF-MON) came UP on firewall A, the FW A should automatically become Active without any manual intervention.

A--IP-10.101.200.70

B--IP-10.101.200.71

Logs from firewall A:

2014-08-26 16:50:12 2014-08-26 16:50:12.118 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/3 up

2014-08-26 16:50:12 2014-08-26 16:50:12.118 +1000 Group 1: Link 'ethernet1/3' in link group 'SP-IF-MON' state is going from down to up

2014-08-26 16:50:12 2014-08-26 16:50:12.118 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/2 up

2014-08-26 16:50:12 2014-08-26 16:50:12.118 +1000 Group 1: Link 'ethernet1/2' in link group 'SP-IF-MON' state is going from down to up

2014-08-26 16:50:12 2014-08-26 16:50:12.119 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/4 up

2014-08-26 16:50:12 2014-08-26 16:50:12.119 +1000 Group 1: Link 'ethernet1/4' in link group 'SP-IF-MON' state is going from down to up>>>>>>>>>>>>>>> link group  came UP on firewall A

As per expectation, the FW A became Active:

2014-08-26 16:51:21 2014-08-26 16:51:21.411 +1000 debug: ha_state_transition(src/ha_state.c:1301): Group 1: transition to state Active

2014-08-26 16:51:21 2014-08-26 16:51:21.411 +1000 debug: ha_state_move(src/ha_state.c:1386): Group 1: moving from state Active to Active >>>>>>>>>>>>>>>> Going to Active state

But, at the same time we observed that HA-1 link became DOWN and the monitor interface went down again :

2014-08-26 16:51:21 2014-08-26 16:51:21.411 +1000 Group 1 (HA1-MAIN): Starting hello with timeout: 8s/0ns

2014-08-26 16:51:21 2014-08-26 16:51:21.411 +1000 debug: ha_peer_start_hello(src/ha_peer.c:1064): Group 1 (HA1-BKUP): can't start hello, no connection

2014-08-26 16:51:21 2014-08-26 16:51:21.411 +1000 debug: ha_peer_start_hello(src/ha_peer.c:1064): Group 1 (HA1-MGMT): can't start hello, no connection

2014-08-26 16:53:07 2014-08-26 16:53:07.730 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/4 down

2014-08-26 16:53:07 2014-08-26 16:53:07.730 +1000 Group 1: Link 'ethernet1/4' in link group 'SP-IF-MON' state is going from up to down

2014-08-26 16:53:07 2014-08-26 16:53:07.730 +1000 Warning: ha_event_log(src/ha_event.c:47): HA Group 1: Link group 'SP-IF-MON' link 'ethernet1/4' is down

2014-08-26 16:53:07 2014-08-26 16:53:07.731 +1000 Warning: ha_event_log(src/ha_event.c:47): HA Group 1: Link group 'SP-IF-MON' failure; one or more links are down >>>>>>>>>>> Link DOWN

2014-08-26 16:53:07 2014-08-26 16:53:07.731 +1000 debug: ha_state_transition(src/ha_state.c:1301): Group 1: transition to state Non-Functional  >>>>>>>>>>>>> The firewall went into non-functional state.

Suggestion: According to the current HA configuration, you have set failure condition as "any". Could you please change it to "all" and perform the same test.

Failure condition: any >>>>>>>>>>

Group SP-IF-MON:

link-monitoring.jpg

Hope this helps.

Thanks

Hi Hulk,

Thanks for your time to analyse the logs. Once the active firewall goes into non-functional mode it will not negotiate any HA with its peer to become active even if network has restored from failure unless I disable the SP-IF-MON monitor and my requirement is that even if a single link fails just failover to the other peer and this is to address an incident that I recently had and setting it to "all" will have no value in my scenario .

Cheers,

Saeed

Hi,

I am using the default 1 min and in one of the tests I waited 30 minutes and no change! the only cure is disable monitor.

Regards,

Saeed

  • 8636 Views
  • 12 replies
  • 0 Likes
Like what you see?

Show your appreciation!

Click Like if a post is helpful to you or if you just want to show your support.

Click Accept as Solution to acknowledge that the answer to your question has been provided.

The button appears next to the replies on topics you’ve started. The member who gave the solution and all future visitors to this topic will appreciate it!

These simple actions take just seconds of your time, but go a long way in showing appreciation for community members and the LIVEcommunity as a whole!

The LIVEcommunity thanks you for your participation!