- Access exclusive content
- Connect with peers
- Share your expertise
- Find support resources
08-25-2014 10:49 PM
Hi,
I have a pair of PAN-VM in active/passive mode and configured link group monitoring with four member ports and when I disconnect one of the ports from vSphere the failover happens quickly and marks the node as "non-functional (Link down)" but when I connect back the port the status does not change and failback not happening unless I remove the HA link group from the passive node. Any idea what may be wrong?
I am using version 6.0.4
Thanks,
Saeed
08-25-2014 10:59 PM
Hi saeed.pazoki
Could you please confirm if you preempt enabled under election settings ? Without it I don't think the old active node will take over even if you connect back the port.
Thanks
08-25-2014 11:05 PM
Hi,
Yes its enabled and that's why when I remove the link group it falls back to the node with lower priority. The HA failover/fallback has been working as expected for the past few months since setup and its just that I am improving the HA function and its not behaving as expected.
Thanks,
Saeed
08-25-2014 11:07 PM
Could you please let us know the priority for Active and passive FW ...?
Device Priority— A priority value to identify the active firewall. The firewall with the lower value (higher priority) becomes the active firewall (range 0-255) when the preemptive capability is enabled on both firewalls in the pair.
Thanks
08-25-2014 11:14 PM
Hi,
One is set to 50 and the other 100 and preemtion is enabled as you can see below and the one with priority 50 becomes active when healthy:
Group 1: xxx
Mode: Active-Passive
Local Information:
Version: 1
Mode: Active-Passive
State: active (last 8 minutes)
Last non-functional state reason: Link down
Device Information:
Management IPv4 Address: xxx
Management IPv6 Address:
Jumbo-Frames disabled; MTU 1500
HA1 Control Links Joint Configuration:
Encryption Enabled: no
Election Option Information:
Priority: 50
Preemptive: yes
Version Compatibility:
Software Version: Match
Application Content Compatibility: Match
Threat Content Compatibility: Match
Anti-Virus Compatibility: Match
VPN Client Software Compatibility: Match
Global Protect Client Software Compatibility: Match
Peer Information:
Connection status: up
Version: 1
Mode: Active-Passive
State: passive (last 1 minutes)
Last non-functional state reason: Link down
Device Information:
Management IPv4 Address: xxx
Management IPv6 Address:
Jumbo-Frames disabled; MTU 1500
Connection up; Primary HA1 link
Election Option Information:
Priority: 100
Preemptive: yes
Configuration Synchronization:
Enabled: yes
Running Configuration: synchronized
08-25-2014 11:22 PM
To get to know the exact reason behind this behavior, we may need to analyze the HA-agent logs:
> less mp-log ha_agent.log ( enter)
Press Shift + G
Thanks
08-26-2014 12:04 AM
Hi Hulk,
please check the following logs:
-----------------------------------------------------------------
16:51 enable "SP-IF-MON" link group and commit
16:53 VM interface change via vSphere client (change to isolated vlan to simulate network problem)
16:54 Revert vSphere interface change (no failback so proceed to next step to make failback happen)
16:55 Disable "SP-IF-MON" link group
-----------------------------------------------------------------
2014-08-26 16:51:16.098 +1000 debug: cfgagent_flags_callback(pan_cfgagent.c:194): ha_agent: cfg agent received flags from server
2014-08-26 16:51:16.098 +1000 debug: cfgagent_flags_callback(pan_cfgagent.c:198): new flags=0x3
2014-08-26 16:51:16.098 +1000 debug: cfgagent_config_callback(pan_cfgagent.c:219): ha_agent: cfg agent received configuration from server
2014-08-26 16:51:16.098 +1000 debug: cfgagent_config_callback(pan_cfgagent.c:235): config length=15491
2014-08-26 16:51:16.098 +1000 debug: ha_cfgagent_phase1(src/ha_cfgagent.c:549): start
2014-08-26 16:51:16.098 +1000 debug: ha_cfgagent_phase1_callback(src/ha_cfgagent.c:491): start
2014-08-26 16:51:16.099 +1000 debug: ha_cfgagent_phase1_callback(src/ha_cfgagent.c:520): sending back true for p1done
2014-08-26 16:51:21.337 +1000 debug: ha_sysd_config_status_notifier_callback(src/ha_sysd.c:2693): Starting monitor increase holdup on phase2 start
2014-08-26 16:51:21.337 +1000 debug: ha_state_start_increase_monitor_holdup(src/ha_state.c:1181): Starting monitor holdup increase during commit
2014-08-26 16:51:21.410 +1000 debug: ha_cfgagent_phase2(src/ha_cfgagent.c:723): start
2014-08-26 16:51:21.410 +1000 debug: ha_cfgagent_phase2_callback(src/ha_cfgagent.c:673): start
2014-08-26 16:51:21.411 +1000 debug: ha_state_eval_peer(src/ha_state.c:3150): Group 1: Goto active automatically
2014-08-26 16:51:21.411 +1000 debug: ha_state_config_change(src/ha_state.c:567): Group 1: Config change request to move group to Active state
2014-08-26 16:51:21.411 +1000 debug: ha_state_transition(src/ha_state.c:1301): Group 1: transition to state Active
2014-08-26 16:51:21.411 +1000 debug: ha_state_move(src/ha_state.c:1386): Group 1: moving from state Active to Active
2014-08-26 16:51:21.411 +1000 debug: ha_state_start_preemption_hold(src/ha_state.c:2271): Group 1: no need for preemption waiting
2014-08-26 16:51:21.411 +1000 debug: ha_state_start_monitor_hold(src/ha_state.c:1153): Starting initial monitor hold for group 1; linkmon not monitored
Ignoring link and path monitoring failures due to an HA state transition
2014-08-26 16:51:21.411 +1000 debug: ha_peer_send_hello(src/ha_peer.c:4900): Group 1 (HA1-MAIN): Sending hello message
Hello Msg
---------
flags : 0x1 (preempt:)
state : Active (5)
priority : 50
cookie : 2789
num tlvs : 2
Printing out 2 tlvs
TLV[1]: type 8 (CFG_SYNC_COMPAT); len 4; value:
00000001
TLV[2]: type 11 (SYSD_PEER_DOWN); len 4; value:
00000000
2014-08-26 16:51:21.411 +1000 Group 1 (HA1-MAIN): Starting hello with timeout: 8s/0ns
2014-08-26 16:51:21.411 +1000 debug: ha_peer_start_hello(src/ha_peer.c:1064): Group 1 (HA1-BKUP): can't start hello, no connection
2014-08-26 16:51:21.411 +1000 debug: ha_peer_start_hello(src/ha_peer.c:1064): Group 1 (HA1-MGMT): can't start hello, no connection
2014-08-26 16:51:21.411 +1000 debug: ha_cfgagent_phase2_callback(src/ha_cfgagent.c:699): sending back true for p2done
2014-08-26 16:51:21.411 +1000 debug: ha_state_monitor_hold_callback(src/ha_state.c:2106): Group 1: ending initial monitor hold; no longer ignoring link and path monitoring failures due to an HA state transition
2014-08-26 16:51:24.638 +1000 debug: ha_state_cfg_md5_set(src/ha_state_cfg.c:480): We were in sync and now we are in sync; autocommit no; ha-sync no; panorama no; cfg-sync-off no; pre-old-insync yes; pre-new-insync yes
2014-08-26 16:51:24.638 +1000 debug: ha_peer_send_hello(src/ha_peer.c:4900): Group 1 (HA1-MAIN): Sending hello message
Hello Msg
---------
flags : 0x1 (preempt:)
state : Active (5)
priority : 50
cookie : 2789
num tlvs : 3
Printing out 3 tlvs
TLV[1]: type 62 (CONFIG_MD5_PRE); len 33; value:
38666163 30376537 33626265 62666139 38633466 61376565
35343763 39656461 00
TLV[2]: type 2 (CONFIG_MD5SUM); len 33; value:
31613835 64353131 31613964 63643636 33663735 30613766
36396134 32656635 00
TLV[3]: type 11 (SYSD_PEER_DOWN); len 4; value:
00000000
2014-08-26 16:51:25.699 +1000 debug: ha_sysd_config_status_notifier_callback(src/ha_sysd.c:2685): Ending monitor increase holdup on commit end
2014-08-26 16:51:25.699 +1000 debug: ha_state_stop_increase_monitor_holdup(src/ha_state.c:1203): Ending monitor holdup increase after commit in 60 seconds
2014-08-26 16:53:07.730 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/4 down
2014-08-26 16:53:07.730 +1000 Group 1: Link 'ethernet1/4' in link group 'SP-IF-MON' state is going from up to down
2014-08-26 16:53:07.730 +1000 Warning: ha_event_log(src/ha_event.c:47): HA Group 1: Link group 'SP-IF-MON' link 'ethernet1/4' is down
2014-08-26 16:53:07.731 +1000 Warning: ha_event_log(src/ha_event.c:47): HA Group 1: Link group 'SP-IF-MON' failure; one or more links are down
2014-08-26 16:53:07.731 +1000 debug: ha_state_transition(src/ha_state.c:1301): Group 1: transition to state Non-Functional
2014-08-26 16:53:07.731 +1000 debug: ha_state_start_monitor_holdup(src/ha_state.c:2475): Starting monitor holdup for group 1: 500ms
2014-08-26 16:53:07.733 +1000 debug: ha_state_monitor_holdup_callback(src/ha_state.c:2557): Going to Non-Functional state state
2014-08-26 16:53:07.733 +1000 debug: ha_state_move(src/ha_state.c:1386): Group 1: moving from state Active to Non-Functional
2014-08-26 16:53:07.733 +1000 debug: ha_state_stop_loop_hold(src/ha_state.c:2678): Group 1: clearing preempt loop hold
2014-08-26 16:53:07.733 +1000 Warning: ha_event_log(src/ha_event.c:47): HA Group 1: Moved from state Active to state Non-Functional
2014-08-26 16:53:07.733 +1000 debug: ha_sysd_dev_state_update(src/ha_sysd.c:1354): Set dev state to Non-Functional
2014-08-26 16:53:07.733 +1000 debug: ha_sysd_dev_alarm_update(src/ha_sysd.c:1326): Set dev alarm to on
2014-08-26 16:53:07.733 +1000 debug: ha_state_move_degraded(src/ha_state.c:1693): Group 1: Non-functional loop count updated to 1
2014-08-26 16:53:07.733 +1000 debug: ha_state_check_nonfunc_hold(src/ha_state.c:2599): No non-func hold based on no path failures with passive list state set to auto
2014-08-26 16:53:07.733 +1000 debug: ha_peer_send_hello(src/ha_peer.c:4900): Group 1 (HA1-MAIN): Sending hello message
Hello Msg
---------
flags : 0x1 (preempt:)
state : Non-Functional (2)
priority : 50
cookie : 2789
num tlvs : 3
Printing out 3 tlvs
TLV[1]: type 8 (CFG_SYNC_COMPAT); len 4; value:
00000001
TLV[2]: type 11 (SYSD_PEER_DOWN); len 4; value:
00000000
TLV[3]: type 50 (STATE_REASON); len 10; value:
4c696e6b 20646f77 6e00
2014-08-26 16:53:07.733 +1000 Group 1 (HA1-MAIN): Starting hello with timeout: 8s/0ns
2014-08-26 16:53:07.733 +1000 debug: ha_peer_start_hello(src/ha_peer.c:1064): Group 1 (HA1-BKUP): can't start hello, no connection
2014-08-26 16:53:07.733 +1000 debug: ha_peer_start_hello(src/ha_peer.c:1064): Group 1 (HA1-MGMT): can't start hello, no connection
2014-08-26 16:53:07.741 +1000 debug: ha_peer_recv_hello(src/ha_peer.c:4953): Group 1 (HA1-MAIN): Receiving hello message
Msg Hdr
-------
version : 1
groupID : 1
type : Hello (2)
token : 0xe10a
flags : 0x1 (req:)
length : 52
Hello Msg
---------
flags : 0x1 (preempt:)
state : Active (5)
priority : 100
cookie : 2789
num tlvs : 2
Printing out 2 tlvs
TLV[1]: type 8 (CFG_SYNC_COMPAT); len 4; value:
00000001
TLV[2]: type 11 (SYSD_PEER_DOWN); len 4; value:
00000000
2014-08-26 16:53:07.741 +1000 debug: ha_peer_recv_tlv(src/ha_peer.c:3717): Group 1 (HA1-MAIN): Cfg Sync compat from peer set to true
2014-08-26 16:53:07.741 +1000 debug: ha_sysd_dev_peer_state_update(src/ha_sysd.c:1387): Set dev peer state to Active
2014-08-26 16:53:07.767 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/1 up
2014-08-26 16:53:07.818 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/1 down
2014-08-26 16:53:07.818 +1000 Group 1: Link 'ethernet1/1' in link group 'SP-IF-MON' state is going from up to down
2014-08-26 16:53:07.819 +1000 Warning: ha_event_log(src/ha_event.c:47): HA Group 1: Link group 'SP-IF-MON' link 'ethernet1/1' is down
2014-08-26 16:53:07.848 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/2 up
2014-08-26 16:53:07.912 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/2 down
2014-08-26 16:53:07.912 +1000 Group 1: Link 'ethernet1/2' in link group 'SP-IF-MON' state is going from up to down
2014-08-26 16:53:07.912 +1000 Warning: ha_event_log(src/ha_event.c:47): HA Group 1: Link group 'SP-IF-MON' link 'ethernet1/2' is down
2014-08-26 16:53:07.918 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/3 up
2014-08-26 16:53:07.963 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/3 down
2014-08-26 16:53:07.963 +1000 Group 1: Link 'ethernet1/3' in link group 'SP-IF-MON' state is going from up to down
2014-08-26 16:53:07.963 +1000 Warning: ha_event_log(src/ha_event.c:47): HA Group 1: Link group 'SP-IF-MON' link 'ethernet1/3' is down
2014-08-26 16:53:07.981 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/4 down
2014-08-26 16:53:08.027 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/5 down
2014-08-26 16:53:08.028 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/6 down
2014-08-26 16:53:08.029 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/7 down
2014-08-26 16:53:08.030 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/8 down
2014-08-26 16:53:08.031 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/9 down
2014-08-26 16:53:08.032 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/10 down
2014-08-26 16:53:08.033 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/11 down
2014-08-26 16:53:08.034 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/12 down
2014-08-26 16:53:08.035 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/13 down
2014-08-26 16:53:08.036 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/14 down
2014-08-26 16:53:08.037 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/15 down
2014-08-26 16:53:08.038 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/16 down
2014-08-26 16:53:08.039 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/17 down
2014-08-26 16:53:08.039 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/18 down
2014-08-26 16:53:08.040 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/19 down
2014-08-26 16:53:08.041 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/20 down
2014-08-26 16:53:08.042 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/21 down
2014-08-26 16:53:08.043 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/22 down
2014-08-26 16:53:08.044 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/23 down
2014-08-26 16:53:08.045 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/24 down
2014-08-26 16:55:10.948 +1000 debug: cfgagent_flags_callback(pan_cfgagent.c:194): ha_agent: cfg agent received flags from server
2014-08-26 16:55:10.948 +1000 debug: cfgagent_flags_callback(pan_cfgagent.c:198): new flags=0x3
2014-08-26 16:55:10.948 +1000 debug: cfgagent_config_callback(pan_cfgagent.c:219): ha_agent: cfg agent received configuration from server
2014-08-26 16:55:10.948 +1000 debug: cfgagent_config_callback(pan_cfgagent.c:235): config length=15490
2014-08-26 16:55:10.948 +1000 debug: ha_cfgagent_phase1(src/ha_cfgagent.c:549): start
2014-08-26 16:55:10.948 +1000 debug: ha_cfgagent_phase1_callback(src/ha_cfgagent.c:491): start
2014-08-26 16:55:10.949 +1000 debug: ha_cfgagent_phase1_callback(src/ha_cfgagent.c:520): sending back true for p1done
2014-08-26 16:55:16.126 +1000 debug: ha_sysd_config_status_notifier_callback(src/ha_sysd.c:2693): Starting monitor increase holdup on phase2 start
2014-08-26 16:55:16.126 +1000 debug: ha_state_start_increase_monitor_holdup(src/ha_state.c:1181): Starting monitor holdup increase during commit
2014-08-26 16:55:16.188 +1000 debug: ha_cfgagent_phase2(src/ha_cfgagent.c:723): start
2014-08-26 16:55:16.188 +1000 debug: ha_cfgagent_phase2_callback(src/ha_cfgagent.c:673): start
2014-08-26 16:55:16.189 +1000 debug: ha_state_config_change(src/ha_state.c:567): Group 1: Config change request to move group to Passive state
2014-08-26 16:55:16.189 +1000 debug: ha_state_transition(src/ha_state.c:1301): Group 1: transition to state Passive
2014-08-26 16:55:16.189 +1000 debug: ha_state_start_rt_sync_hold(src/ha_state.c:2023): Group 1: starting runtime state sync hold (0)
2014-08-26 16:55:16.189 +1000 debug: ha_cfgagent_phase2_callback(src/ha_cfgagent.c:699): sending back true for p2done
2014-08-26 16:55:16.189 +1000 debug: ha_state_rt_sync_hold_callback(src/ha_state.c:2052): Group 1: ending runtime state sync hold
2014-08-26 16:55:16.189 +1000 debug: ha_state_move(src/ha_state.c:1386): Group 1: moving from state Non-Functional to Passive
2014-08-26 16:55:16.189 +1000 HA Group 1: Moved from state Non-Functional to state Passive
2014-08-26 16:55:16.189 +1000 debug: ha_sysd_dev_state_update(src/ha_sysd.c:1354): Set dev state to Passive
2014-08-26 16:55:16.189 +1000 debug: ha_sysd_dev_alarm_update(src/ha_sysd.c:1326): Set dev alarm to off
2014-08-26 16:55:16.189 +1000 debug: ha_state_clear_monitor_log_history(src/ha_state.c:3702): Clearing all monitoring log history
2014-08-26 16:55:16.189 +1000 debug: ha_state_start_monitor_hold(src/ha_state.c:1153): Starting initial monitor hold for group 1; linkmon not monitored
Ignoring link and path monitoring failures due to an HA state transition
2014-08-26 16:55:16.189 +1000 debug: ha_peer_send_hello(src/ha_peer.c:4900): Group 1 (HA1-MAIN): Sending hello message
Hello Msg
---------
flags : 0x1 (preempt:)
state : Passive (4)
priority : 50
cookie : 2789
num tlvs : 2
Printing out 2 tlvs
TLV[1]: type 8 (CFG_SYNC_COMPAT); len 4; value:
00000001
TLV[2]: type 11 (SYSD_PEER_DOWN); len 4; value:
00000000
2014-08-26 16:55:16.189 +1000 Group 1 (HA1-MAIN): Starting hello with timeout: 8s/0ns
2014-08-26 16:55:16.189 +1000 debug: ha_peer_start_hello(src/ha_peer.c:1064): Group 1 (HA1-BKUP): can't start hello, no connection
2014-08-26 16:55:16.189 +1000 debug: ha_peer_start_hello(src/ha_peer.c:1064): Group 1 (HA1-MGMT): can't start hello, no connection
2014-08-26 16:55:16.204 +1000 debug: ha_peer_recv_hello(src/ha_peer.c:4953): Group 1 (HA1-MAIN): Receiving hello message
Msg Hdr
-------
version : 1
groupID : 1
type : Hello (2)
token : 0xe11b
flags : 0x1 (req:)
length : 52
Hello Msg
---------
flags : 0x1 (preempt:)
state : Active (5)
priority : 100
cookie : 2789
num tlvs : 2
Printing out 2 tlvs
TLV[1]: type 8 (CFG_SYNC_COMPAT); len 4; value:
00000001
TLV[2]: type 11 (SYSD_PEER_DOWN); len 4; value:
00000000
2014-08-26 16:55:16.204 +1000 debug: ha_peer_recv_tlv(src/ha_peer.c:3717): Group 1 (HA1-MAIN): Cfg Sync compat from peer set to true
2014-08-26 16:55:19.479 +1000 debug: ha_state_cfg_md5_set(src/ha_state_cfg.c:480): We were in sync and now we are in sync; autocommit no; ha-sync no; panorama no; cfg-sync-off no; pre-old-insync yes; pre-new-insync yes
2014-08-26 16:55:19.479 +1000 debug: ha_peer_send_hello(src/ha_peer.c:4900): Group 1 (HA1-MAIN): Sending hello message
Hello Msg
---------
flags : 0x1 (preempt:)
state : Passive (4)
priority : 50
cookie : 2789
num tlvs : 3
Printing out 3 tlvs
TLV[1]: type 62 (CONFIG_MD5_PRE); len 33; value:
38666163 30376537 33626265 62666139 38633466 61376565
35343763 39656461 00
TLV[2]: type 2 (CONFIG_MD5SUM); len 33; value:
31613835 64353131 31613964 63643636 33663735 30613766
36396134 32656635 00
TLV[3]: type 11 (SYSD_PEER_DOWN); len 4; value:
00000000
2014-08-26 16:55:20.620 +1000 debug: ha_sysd_config_status_notifier_callback(src/ha_sysd.c:2685): Ending monitor increase holdup on commit end
2014-08-26 16:55:20.620 +1000 debug: ha_state_stop_increase_monitor_holdup(src/ha_state.c:1203): Ending monitor holdup increase after commit in 60 seconds
2014-08-26 16:56:16.189 +1000 debug: ha_state_monitor_hold_callback(src/ha_state.c:2106): Group 1: ending initial monitor hold; no longer ignoring link and path monitoring failures due to an HA state transition
2014-08-26 16:56:16.190 +1000 debug: ha_peer_recv_preempt(src/ha_peer.c:5070): Group 1 (HA1-MAIN): Receiving preempt message
Msg Hdr
-------
version : 1
groupID : 1
type : Preempt (4)
token : 0xe123
flags : 0x1 (req:)
length : 24
Preempt Msg
-----------
flags : 0x1 (preempt:)
priority : 100
num tlvs : 0
2014-08-26 16:56:16.190 +1000 debug: ha_state_transition(src/ha_state.c:1301): Group 1: transition to state Active
2014-08-26 16:56:16.190 +1000 debug: ha_state_start_promotion_hold(src/ha_state.c:2144): Group 1: starting promotion hold (0 ms)
2014-08-26 16:56:16.190 +1000 Group 1: Received preempt message for us to take over which we accepted
2014-08-26 16:56:16.190 +1000 debug: ha_peer_send_preempt_ack(src/ha_peer.c:428): Group 1 (HA1-MAIN): Sending preempt ack message
Ack Msg
-------
accept : 1
status : 0
num tlvs : 0
2014-08-26 16:56:16.190 +1000 debug: ha_state_promotion_hold_callback(src/ha_state.c:2201): Group 1: ending promotion hold
2014-08-26 16:56:16.190 +1000 debug: ha_state_start_monitor_hold(src/ha_state.c:1153): Starting initial monitor hold for group 1; linkmon not monitored
Ignoring link and path monitoring failures due to an HA state transition
2014-08-26 16:56:16.190 +1000 debug: ha_state_promotion_hold_callback(src/ha_state.c:2236): Group 1: moving to Active due to preemption
2014-08-26 16:56:16.190 +1000 debug: ha_state_move(src/ha_state.c:1386): Group 1: moving from state Passive to Active
2014-08-26 16:56:16.190 +1000 Warning: ha_event_log(src/ha_event.c:47): HA Group 1: Moved from state Passive to state Active
2014-08-26 16:56:16.190 +1000 debug: ha_sysd_dev_state_update(src/ha_sysd.c:1354): Set dev state to Active
2014-08-26 16:56:16.190 +1000 debug: ha_state_start_preemption_hold(src/ha_state.c:2271): Group 1: no need for preemption waiting
2014-08-26 16:56:16.190 +1000 debug: ha_state_start_loop_hold(src/ha_state.c:2733): Group 1: starting preempt loop hold
2014-08-26 16:56:16.190 +1000 debug: ha_state_start_monitor_hold(src/ha_state.c:1153): Starting initial monitor hold for group 1; linkmon not monitored
Ignoring link and path monitoring failures due to an HA state transition
2014-08-26 16:56:16.190 +1000 debug: ha_peer_send_hello(src/ha_peer.c:4900): Group 1 (HA1-MAIN): Sending hello message
Hello Msg
---------
flags : 0x1 (preempt:)
state : Active (5)
priority : 50
cookie : 2789
num tlvs : 2
Printing out 2 tlvs
TLV[1]: type 8 (CFG_SYNC_COMPAT); len 4; value:
00000001
TLV[2]: type 11 (SYSD_PEER_DOWN); len 4; value:
00000000
2014-08-26 16:56:16.190 +1000 Group 1 (HA1-MAIN): Starting hello with timeout: 8s/0ns
2014-08-26 16:56:16.190 +1000 debug: ha_peer_start_hello(src/ha_peer.c:1064): Group 1 (HA1-BKUP): can't start hello, no connection
2014-08-26 16:56:16.190 +1000 debug: ha_peer_start_hello(src/ha_peer.c:1064): Group 1 (HA1-MGMT): can't start hello, no connection
2014-08-26 16:56:16.195 +1000 debug: ha_peer_recv_hello(src/ha_peer.c:4953): Group 1 (HA1-MAIN): Receiving hello message
Msg Hdr
-------
version : 1
groupID : 1
type : Hello (2)
token : 0xe124
flags : 0x1 (req:)
length : 52
Hello Msg
---------
flags : 0x1 (preempt:)
state : Passive (4)
priority : 100
cookie : 2789
num tlvs : 2
Printing out 2 tlvs
TLV[1]: type 8 (CFG_SYNC_COMPAT); len 4; value:
00000001
TLV[2]: type 11 (SYSD_PEER_DOWN); len 4; value:
00000000
2014-08-26 16:56:16.195 +1000 debug: ha_peer_recv_tlv(src/ha_peer.c:3717): Group 1 (HA1-MAIN): Cfg Sync compat from peer set to true
2014-08-26 16:56:16.195 +1000 debug: ha_sysd_dev_peer_state_update(src/ha_sysd.c:1387): Set dev peer state to Passive
2014-08-26 16:56:16.195 +1000 Group 1 peer state moved from Active to Passive
2014-08-26 16:56:16.195 +1000 debug: ha_state_eval_peer(src/ha_state.c:3150): Group 1: Goto active automatically
2014-08-26 16:56:16.195 +1000 debug: ha_state_peer_change(src/ha_state.c:471): Group 1: Peer change requests group move to Active state from Active
2014-08-26 16:56:16.195 +1000 debug: ha_state_transition(src/ha_state.c:1301): Group 1: transition to state Active
2014-08-26 16:56:16.195 +1000 debug: ha_state_move(src/ha_state.c:1386): Group 1: moving from state Active to Active
2014-08-26 16:56:16.195 +1000 debug: ha_state_start_preemption_hold(src/ha_state.c:2271): Group 1: no need for preemption waiting
2014-08-26 16:56:16.195 +1000 debug: ha_peer_send_hello(src/ha_peer.c:4900): Group 1 (HA1-MAIN): Sending hello message
Hello Msg
---------
flags : 0x1 (preempt:)
state : Active (5)
priority : 50
cookie : 2789
num tlvs : 2
Printing out 2 tlvs
TLV[1]: type 8 (CFG_SYNC_COMPAT); len 4; value:
00000001
TLV[2]: type 11 (SYSD_PEER_DOWN); len 4; value:
00000000
2014-08-26 16:56:16.195 +1000 Group 1 (HA1-MAIN): Starting hello with timeout: 8s/0ns
2014-08-26 16:56:16.195 +1000 debug: ha_peer_start_hello(src/ha_peer.c:1064): Group 1 (HA1-BKUP): can't start hello, no connection
2014-08-26 16:56:16.195 +1000 debug: ha_peer_start_hello(src/ha_peer.c:1064): Group 1 (HA1-MGMT): can't start hello, no connection
2014-08-26 16:56:16.222 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/1 down
2014-08-26 16:56:16.222 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/1 down
2014-08-26 16:56:16.222 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/1 up
2014-08-26 16:56:16.222 +1000 Group 1: Link 'ethernet1/1' in link group 'SP-IF-MON' state is going from down to up
2014-08-26 16:56:16.225 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/2 down
2014-08-26 16:56:16.228 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/2 down
2014-08-26 16:56:16.233 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/3 down
2014-08-26 16:56:16.233 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/3 down
2014-08-26 16:56:16.235 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/4 down
2014-08-26 16:56:16.243 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/4 down
2014-08-26 16:56:16.245 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/5 down
2014-08-26 16:56:16.247 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/6 down
2014-08-26 16:56:16.248 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/7 down
2014-08-26 16:56:16.249 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/8 down
2014-08-26 16:56:16.250 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/9 down
2014-08-26 16:56:16.251 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/10 down
2014-08-26 16:56:16.252 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/11 down
2014-08-26 16:56:16.253 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/12 down
2014-08-26 16:56:16.254 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/13 down
2014-08-26 16:56:16.255 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/14 down
2014-08-26 16:56:16.256 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/15 down
2014-08-26 16:56:16.257 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/16 down
2014-08-26 16:56:16.258 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/17 down
2014-08-26 16:56:16.259 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/18 down
2014-08-26 16:56:16.260 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/19 down
2014-08-26 16:56:16.261 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/20 down
2014-08-26 16:56:16.262 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/21 down
2014-08-26 16:56:16.263 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/22 down
2014-08-26 16:56:16.264 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/23 down
2014-08-26 16:56:16.265 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/24 down
2014-08-26 16:56:16.266 +1000 debug: ha_rts_local_update(src/ha_rts.c:79): Group 1: local status User-disabled, new local status In-progress, force no
2014-08-26 16:56:16.266 +1000 debug: ha_rts_local_update(src/ha_rts.c:79): Group 1: local status User-disabled, new local status In-progress, force no
2014-08-26 16:56:16.841 +1000 debug: ha_rts_local_update(src/ha_rts.c:79): Group 1: local status User-disabled, new local status In-progress, force no
2014-08-26 16:56:17.227 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/3 up
2014-08-26 16:56:17.227 +1000 Group 1: Link 'ethernet1/3' in link group 'SP-IF-MON' state is going from down to up
2014-08-26 16:56:17.227 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/2 up
2014-08-26 16:56:17.227 +1000 Group 1: Link 'ethernet1/2' in link group 'SP-IF-MON' state is going from down to up
2014-08-26 16:56:17.227 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/4 up
2014-08-26 16:56:17.227 +1000 Group 1: Link 'ethernet1/4' in link group 'SP-IF-MON' state is going from down to up
2014-08-26 16:56:24.769 +1000 debug: ha_rts_local_update(src/ha_rts.c:79): Group 1: local status User-disabled, new local status In-progress, force no
2014-08-26 16:56:24.777 +1000 debug: ha_rts_local_update(src/ha_rts.c:79): Group 1: local status User-disabled, new local status Complete, force no
2014-08-26 16:56:24.777 +1000 debug: ha_state_runtime_insync(src/ha_state.c:661): Ignoring runtime insync while HA2 links down: NO
08-26-2014 12:14 AM
We need to check HA-agent logs from both firewalls. Do you have any case open with support for this issue...? If so, please update the case ID here.
Thanks
08-26-2014 03:57 PM
Hi Hulk,
thanks for the help. I just logged a case and number is 00247140.
Regards,
Saeed
08-26-2014 07:01 PM
Hello saeed,
I am not sure if you have checked this setting or not but, just want to put it out there. Under active/passive settings, there is something called as "Monitor hold down time". It is the length of time (minutes) that a firewall will spend in the non-functional state before becoming passive or active (depends on the priority). This timer is used only when the failure reason is a link or path monitor failure. By default it is set to 1 min. The range is from 1 to 60 min. There might be a possibility that you might have configured it to hold the non functional state for more then 1 min. Hope this helps.
Thanks
08-27-2014 01:21 AM
Hello Saeed,
As per my observation: The Active device (A--IP-10.101.200.70) is configured with Priority 50 and Passive device(B--IP-10.101.200.71) configured with priority 100. Preemtion has enabled on both firewalls.
Hence, once firewall B will become active and the monitored link (SP-IF-MON) came UP on firewall A, the FW A should automatically become Active without any manual intervention.
A--IP-10.101.200.70
B--IP-10.101.200.71
Logs from firewall A:
2014-08-26 16:50:12 2014-08-26 16:50:12.118 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/3 up
2014-08-26 16:50:12 2014-08-26 16:50:12.118 +1000 Group 1: Link 'ethernet1/3' in link group 'SP-IF-MON' state is going from down to up
2014-08-26 16:50:12 2014-08-26 16:50:12.118 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/2 up
2014-08-26 16:50:12 2014-08-26 16:50:12.118 +1000 Group 1: Link 'ethernet1/2' in link group 'SP-IF-MON' state is going from down to up
2014-08-26 16:50:12 2014-08-26 16:50:12.119 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/4 up
2014-08-26 16:50:12 2014-08-26 16:50:12.119 +1000 Group 1: Link 'ethernet1/4' in link group 'SP-IF-MON' state is going from down to up>>>>>>>>>>>>>>> link group came UP on firewall A
As per expectation, the FW A became Active:
2014-08-26 16:51:21 2014-08-26 16:51:21.411 +1000 debug: ha_state_transition(src/ha_state.c:1301): Group 1: transition to state Active
2014-08-26 16:51:21 2014-08-26 16:51:21.411 +1000 debug: ha_state_move(src/ha_state.c:1386): Group 1: moving from state Active to Active >>>>>>>>>>>>>>>> Going to Active state
But, at the same time we observed that HA-1 link became DOWN and the monitor interface went down again :
2014-08-26 16:51:21 2014-08-26 16:51:21.411 +1000 Group 1 (HA1-MAIN): Starting hello with timeout: 8s/0ns
2014-08-26 16:51:21 2014-08-26 16:51:21.411 +1000 debug: ha_peer_start_hello(src/ha_peer.c:1064): Group 1 (HA1-BKUP): can't start hello, no connection
2014-08-26 16:51:21 2014-08-26 16:51:21.411 +1000 debug: ha_peer_start_hello(src/ha_peer.c:1064): Group 1 (HA1-MGMT): can't start hello, no connection
2014-08-26 16:53:07 2014-08-26 16:53:07.730 +1000 debug: ha_sysd_linkmon_link_change(src/ha_sysd.c:3916): Link 1/4 down
2014-08-26 16:53:07 2014-08-26 16:53:07.730 +1000 Group 1: Link 'ethernet1/4' in link group 'SP-IF-MON' state is going from up to down
2014-08-26 16:53:07 2014-08-26 16:53:07.730 +1000 Warning: ha_event_log(src/ha_event.c:47): HA Group 1: Link group 'SP-IF-MON' link 'ethernet1/4' is down
2014-08-26 16:53:07 2014-08-26 16:53:07.731 +1000 Warning: ha_event_log(src/ha_event.c:47): HA Group 1: Link group 'SP-IF-MON' failure; one or more links are down >>>>>>>>>>> Link DOWN
2014-08-26 16:53:07 2014-08-26 16:53:07.731 +1000 debug: ha_state_transition(src/ha_state.c:1301): Group 1: transition to state Non-Functional >>>>>>>>>>>>> The firewall went into non-functional state.
Suggestion: According to the current HA configuration, you have set failure condition as "any". Could you please change it to "all" and perform the same test.
Failure condition: any >>>>>>>>>>
Group SP-IF-MON:
Hope this helps.
Thanks
08-27-2014 04:04 PM
Hi Hulk,
Thanks for your time to analyse the logs. Once the active firewall goes into non-functional mode it will not negotiate any HA with its peer to become active even if network has restored from failure unless I disable the SP-IF-MON monitor and my requirement is that even if a single link fails just failover to the other peer and this is to address an incident that I recently had and setting it to "all" will have no value in my scenario .
Cheers,
Saeed
08-27-2014 04:07 PM
Hi,
I am using the default 1 min and in one of the tests I waited 30 minutes and no change! the only cure is disable monitor.
Regards,
Saeed
Click Accept as Solution to acknowledge that the answer to your question has been provided.
The button appears next to the replies on topics you’ve started. The member who gave the solution and all future visitors to this topic will appreciate it!
These simple actions take just seconds of your time, but go a long way in showing appreciation for community members and the LIVEcommunity as a whole!
The LIVEcommunity thanks you for your participation!