Getting errors: Running Configuration not synchronized after retries

ZachSmith · ‎03-17-2014

Hello. I have two PA-5050s in an HA active/passive pair. At random, roughly once a week, I get the message "SYSTEM ALERT : critical : HA Group 1 : Running configuration not synchronized after retries". If I check Dashboard > High Availability, I see the config is not synchronized. If I wait a few seconds and refresh, the status shows the configs are in sync. I am running version 5.0.8. I have seen this post, but it says the issue was fixed in 5.0.7. Any help is appreciated.

PAN-OS 5.0.7: Addressed Issues

Zach

HULK · ‎03-17-2014

Hello Zach,

1. Could you please check ha_agent.log for more detailed information about this failure?

2. If you have configured or uploaded any certificates on one HA member, please make sure the same certificates are present on the passive member as well.

3. According to previous support cases, 5.0.9 and 5.0.10 do not have this issue.

4. Could you please take a look at the management server CPU and memory usage?

Thanks

ZachSmith · ‎03-18-2014

Thank you for the reply.

1 - Are you referring to the Monitor > System tab on the passive unit? If so, nothing there is helpful other than the critical alert, followed about 1 minute later by a message stating the configs have successfully synced.

2 - Certificates are the same on both boxes

3 - We are on 5.0.8. Can you point me to the documentation that says to upgrade past 5.0.8 to resolve this issue? (I want to establish whether I am fighting a config issue or a bug.)

4 - CPU utilization on the HA peer at the time of the issue was DP: 0, MP: 2. CPU utilization on the active unit at that time was DP: 12, MP: 9. And wow, I just realized I'm not collecting memory information. Adding this monitor.

ZachSmith · ‎03-18-2014

So it would appear I cannot monitor memory utilization on my 5050s running 5.0.8. If anyone is aware of a way to do this, let me know. I looked at the two links below and found nothing for memory/RAM utilization. I am already monitoring CPU on MP/DP.

https://live.paloaltonetworks.com/message/17485#17485

https://live.paloaltonetworks.com/docs/DOC-1744

HULK · ‎03-18-2014

Hello Sir,

1. I am not talking about the system logs. Please find below the command to view ha_agent.log.

> less mp-log ha_agent.log   (press "/" followed by a keyword to search; Shift + G jumps to the end of the file)

2. My suggestion was based on previous case history (in all 3 cases the problem was resolved after upgrading the firewall to 5.0.9 / 5.0.10).
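
For anyone who would rather search a saved copy of the log offline than page through it with less, here is a minimal Python sketch. It assumes the text of ha_agent.log has already been copied off the firewall into a string. The function name and the keyword list are illustrative choices, not anything PAN-OS provides.

```python
def sync_problem_lines(log_text, keywords=("Error:", "Warning:", "out of sync")):
    """Return the log lines that contain any of the keywords (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    return [
        line
        for line in log_text.splitlines()
        if any(k in line.lower() for k in lowered)
    ]


# Sample lines in the ha_agent.log format, used here only as illustrative input.
sample = """\
Mar 17 16:07:51 ha_sysd_dev_cfgsync_update(src/ha_sysd.c:1505): Set dev cfgsync to Committing
Mar 17 16:07:59 Error: ha_state_cfg_dosync_fail(src/ha_state_cfg.c:387): Group 1: Config sync start failed on local mgmt srvr
Mar 17 16:07:59 Warning: ha_event_log(src/ha_event.c:47): HA Group 1: Running configuration not synchronized after retries
"""

for line in sync_problem_lines(sample):
    print(line)
```

Adjust the keyword tuple to whatever you would otherwise type after "/" in less.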

Thanks

ZachSmith · ‎03-18-2014

Here are the relevant logs from the active device:

Mar 17 16:07:39 cfgagent_flags_callback(pan_cfgagent.c:187): ha_agent: cfg agent received flags from server
Mar 17 16:07:39 cfgagent_flags_callback(pan_cfgagent.c:191): new flags=0x3
Mar 17 16:07:39 cfgagent_config_callback(pan_cfgagent.c:212): ha_agent: cfg agent received configuration from server
Mar 17 16:07:39 cfgagent_config_callback(pan_cfgagent.c:228): config length=193157
Mar 17 16:07:39 ha_cfgagent_phase1(src/ha_cfgagent.c:545): start
Mar 17 16:07:39 ha_cfgagent_phase1_callback(src/ha_cfgagent.c:485): start
Mar 17 16:07:39 ha_state_cfg_commit_start(src/ha_state_cfg.c:589): Starting monitor hold (no timeout) during phase1
Mar 17 16:07:39 ha_cfgagent_phase1_callback(src/ha_cfgagent.c:516): sending back true for p1done
Mar 17 16:07:46 ha_cfgagent_phase2(src/ha_cfgagent.c:722): start
Mar 17 16:07:46 ha_cfgagent_phase2_callback(src/ha_cfgagent.c:670): start
Mar 17 16:07:46 ha_cfgagent_phase2_callback(src/ha_cfgagent.c:697): sending back true for p2done
Mar 17 16:07:46 ha_state_cfg_commit_succeed(src/ha_state_cfg.c:640): Starting monitor hold after commit
Mar 17 16:07:46 ha_state_start_monitor_hold(src/ha_state.c:1014): Starting initial monitor hold for group 1; linkmon monitored
Ignoring link and path monitoring failures due to an HA state transition
Mar 17 16:07:47 Received HA2 MAC address: <output omitted>
Mar 17 16:07:47 Received HA2 MAC address: <output omitted>
Mar 17 16:07:51 ha_sysd_config_md5_notifier_callback(src/ha_sysd.c:2743): Got new config md5: <output omitted>
Mar 17 16:07:51 ha_state_cfg_md5_set(src/ha_state_cfg.c:458): We were in sync and now we are out of sync; autocommit no; ha-sync no; panorama no; cfg-sync-off no
Mar 17 16:07:51 ha_state_cfg_sync_start(src/ha_state_cfg.c:738): Starting config sync for group 1
Mar 17 16:07:51 ha_sysd_start_config_sync(src/ha_sysd.c:781): Sending start sync to mgmtsrvr
Mar 17 16:07:51 ha_sysd_dev_cfgsync_update(src/ha_sysd.c:1505): Set dev cfgsync to Committing
Mar 17 16:07:51 ha_peer_send_hello(src/ha_peer.c:4629): Group 1 (HA1-MAIN): Sending hello message

Hello Msg
---------
flags    : 0x0 ()
state    : Active (5)
priority : 10
cookie   : 64519
num tlvs : 2
Printing out 2 tlvs
TLV[1]: type 2 (CONFIG_MD5SUM); len 33; value:
    64373232 61383635 64663231 65626361 32323462 37353739
    66313261 31313865 00
TLV[2]: type 11 (SYSD_PEER_DOWN); len 4; value:
    00000000
Mar 17 16:07:59 ha_state_cfg_check_insync(src/ha_state_cfg.c:279): group 1: mgmtsrvr insync: NO
Mar 17 16:08:46 ha_state_monitor_hold_callback(src/ha_state.c:1936): Group 1: ending initial monitor hold; no longer ignoring link and path monitoring failures due to an HA state transition
Mar 17 16:08:51 ha_state_cfg_sync_callback(src/ha_state_cfg.c:836): ha_state_cfg_sync_callback: retries: 1; insync: no
Mar 17 16:08:51 ha_state_cfg_sync_start(src/ha_state_cfg.c:738): Starting config sync for group 1
Mar 17 16:08:51 ha_sysd_start_config_sync(src/ha_sysd.c:781): Sending start sync to mgmtsrvr
Mar 17 16:08:59 ha_state_cfg_check_insync(src/ha_state_cfg.c:279): group 1: mgmtsrvr insync: NO
Mar 17 16:09:18 ha_peer_recv_hello(src/ha_peer.c:4682): Group 1 (HA1-MAIN): Receiving hello message

Msg Hdr
-------
version : 1
groupID : 1
type    : Hello (2)
token   : 0x6f58
flags   : 0x1 (req:)
length : 81

Hello Msg
---------
flags    : 0x0 ()
state    : Passive (4)
priority : 100
cookie   : 64519
num tlvs : 2
    Printing out 2 tlvs
    TLV[1]: type 2 (CONFIG_MD5SUM); len 33; value:
      64373232 61383635 64663231 65626361 32323462 37353739
      66313261 31313865 00
    TLV[2]: type 11 (SYSD_PEER_DOWN); len 4; value:
      00000000
Mar 17 16:09:18 ha_state_cfg_md5_set(src/ha_state_cfg.c:458): We were out of sync and now we are in sync; autocommit no; ha-sync no; panorama no; cfg-sync-off no
Mar 17 16:09:18 ha_sysd_dev_cfgsync_update(src/ha_sysd.c:1505): Set dev cfgsync to In-Sync
Mar 17 16:12:51 ha_state_cfg_sync_callback(src/ha_state_cfg.c:836): ha_state_cfg_sync_callback: retries: 2; insync: yes
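
The CONFIG_MD5SUM TLV in the hello messages above is printed as hex words. It appears to be the ASCII text of the config MD5 followed by a NUL terminator (len 33 = 32 characters + 1), so the values advertised by each peer can be compared directly. A minimal Python sketch of that decoding, assuming the hex words are pasted in as a string (the function and variable names are mine, not part of any PAN-OS tooling):

```python
def decode_md5_tlv(hex_dump: str) -> str:
    """Decode the hex words of a CONFIG_MD5SUM TLV into the string they spell.

    Assumes the TLV value is ASCII text with a trailing NUL byte.
    """
    raw = bytes.fromhex("".join(hex_dump.split()))  # drop spaces and newlines
    return raw.rstrip(b"\x00").decode("ascii")


# TLV[1] value copied from the hello message in the log above.
tlv_value = """
64373232 61383635 64663231 65626361 32323462 37353739
66313261 31313865 00
"""

print(decode_md5_tlv(tlv_value))  # d722a865df21ebca224b7579f12a118e
```

Running the same function on the TLV from the peer's hello message and comparing the two strings shows whether both units are advertising the same config MD5.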

*********************************

Here are the logs from the HA peer (passive device) for the same period:

Mar 17 16:07:51 ha_peer_recv_hello(src/ha_peer.c:4682): Group 1 (HA1-MAIN): Receiving hello message

Msg Hdr
-------
version : 1
groupID : 1
type    : Hello (2)
token   : 0x7136
flags   : 0x1 (req:)
length : 81

Hello Msg
---------
flags    : 0x0 ()
state    : Active (5)
priority : 10
cookie   : 64519
num tlvs : 2
    Printing out 2 tlvs
    TLV[1]: type 2 (CONFIG_MD5SUM); len 33; value:
      64373232 61383635 64663231 65626361 32323462 37353739
      66313261 31313865 00
    TLV[2]: type 11 (SYSD_PEER_DOWN); len 4; value:
      00000000
Mar 17 16:07:51 ha_state_cfg_md5_set(src/ha_state_cfg.c:458): We were in sync and now we are out of sync; autocommit no; ha-sync no; panorama no; cfg-sync-off no
Mar 17 16:07:51 ha_sysd_dev_cfgsync_update(src/ha_sysd.c:1505): Set dev cfgsync to Committing
Mar 17 16:07:51 ha_state_cfg_from_insync_to_outsync(src/ha_state_cfg.c:686): peer group 1 has changed the md5, waiting for an update
Mar 17 16:07:59 Error: ha_state_cfg_dosync_fail(src/ha_state_cfg.c:387): Group 1: Config sync start failed on local mgmt srvr
Mar 17 16:07:59 ha_sysd_dev_cfgsync_update(src/ha_sysd.c:1505): Set dev cfgsync to Out-of-Sync
Mar 17 16:07:59 ha_state_cfg_dosync_fail(src/ha_state_cfg.c:397): Group 1: setting reason to failure for config sync when we got a dosync failure
Mar 17 16:07:59 ha_state_cfg_sync_callback(src/ha_state_cfg.c:836): ha_state_cfg_sync_callback: retries: 4; insync: no
Mar 17 16:07:59 Warning: ha_event_log(src/ha_event.c:47): HA Group 1: Running configuration not synchronized after retries
Mar 17 16:09:04 cfgagent_flags_callback(pan_cfgagent.c:187): ha_agent: cfg agent received flags from server
Mar 17 16:09:04 cfgagent_flags_callback(pan_cfgagent.c:191): new flags=0x4
Mar 17 16:09:04 cfgagent_config_callback(pan_cfgagent.c:212): ha_agent: cfg agent received configuration from server
Mar 17 16:09:04 cfgagent_config_callback(pan_cfgagent.c:228): config length=193161
Mar 17 16:09:04 ha_cfgagent_phase1(src/ha_cfgagent.c:545): start
Mar 17 16:09:04 ha_cfgagent_phase1_callback(src/ha_cfgagent.c:485): start
Mar 17 16:09:04 ha_state_cfg_commit_start(src/ha_state_cfg.c:589): Starting monitor hold (no timeout) during phase1
Mar 17 16:09:04 ha_cfgagent_phase1_callback(src/ha_cfgagent.c:516): sending back true for p1done
Mar 17 16:09:13 ha_cfgagent_phase2(src/ha_cfgagent.c:722): start
Mar 17 16:09:13 ha_cfgagent_phase2_callback(src/ha_cfgagent.c:670): start
Mar 17 16:09:13 ha_cfgagent_phase2_callback(src/ha_cfgagent.c:697): sending back true for p2done
Mar 17 16:09:13 ha_state_cfg_commit_succeed(src/ha_state_cfg.c:640): Starting monitor hold after commit
Mar 17 16:09:13 ha_state_start_monitor_hold(src/ha_state.c:1014): Starting initial monitor hold for group 1; linkmon monitored
        Ignoring link and path monitoring failures due to an HA state transition
Mar 17 16:09:15 Received HA2 MAC address: <output omitted>
Mar 17 16:09:15 Received HA2 MAC address: <output omitted>
Mar 17 16:09:18 ha_sysd_config_md5_notifier_callback(src/ha_sysd.c:2743): Got new config md5: <output omitted>
Mar 17 16:09:18 ha_state_cfg_md5_set(src/ha_state_cfg.c:458): We were out of sync and now we are in sync; autocommit no; ha-sync yes; panorama no; cfg-sync-off no
Mar 17 16:09:18 ha_sysd_dev_cfgsync_update(src/ha_sysd.c:1505): Set dev cfgsync to In-Sync
Mar 17 16:09:18 ha_peer_send_hello(src/ha_peer.c:4629): Group 1 (HA1-MAIN): Sending hello message

Hello Msg
---------
flags    : 0x0 ()
state    : Passive (4)
priority : 100
cookie   : 64519
num tlvs : 2
Printing out 2 tlvs
TLV[1]: type 2 (CONFIG_MD5SUM); len 33; value:
    64373232 61383635 64663231 65626361 32323462 37353739
    66313261 31313865 00
TLV[2]: type 11 (SYSD_PEER_DOWN); len 4; value:
    00000000
Mar 17 16:10:13 ha_state_monitor_hold_callback(src/ha_state.c:1936): Group 1: ending initial monitor hold; no longer ignoring link and path monitoring failures due to an HA state transition
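
To see how long each unit stayed out of sync, the "Set dev cfgsync to ..." lines can be pulled out of the log as a timeline. This is a rough Python sketch, assuming the log text is available as a string and that every line starts with a "Mon DD HH:MM:SS" timestamp as in the excerpts above. The regular expression and function name are my own.

```python
import re

# Timestamp at the start of the line, then the state after "Set dev cfgsync to".
STATE_RE = re.compile(
    r"^(\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) .*Set dev cfgsync to (\S+)"
)


def cfgsync_timeline(log_text):
    """Return (timestamp, state) pairs for every cfgsync state change logged."""
    timeline = []
    for line in log_text.splitlines():
        match = STATE_RE.match(line)
        if match:
            timeline.append((match.group(1), match.group(2)))
    return timeline


# State-change lines copied from the passive unit's log above.
passive_log = """\
Mar 17 16:07:51 ha_sysd_dev_cfgsync_update(src/ha_sysd.c:1505): Set dev cfgsync to Committing
Mar 17 16:07:59 ha_sysd_dev_cfgsync_update(src/ha_sysd.c:1505): Set dev cfgsync to Out-of-Sync
Mar 17 16:09:18 ha_sysd_dev_cfgsync_update(src/ha_sysd.c:1505): Set dev cfgsync to In-Sync
"""

for timestamp, state in cfgsync_timeline(passive_log):
    print(timestamp, state)
```

On the passive unit's excerpt this gives Committing at 16:07:51, Out-of-Sync at 16:07:59, and In-Sync at 16:09:18, which matches the alert clearing on its own a little over a minute later.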
