Getting errors: Running Configuration not synchronized after retries

Hello. I have two PA-5050s in an HA active/passive pair. Randomly, about once a week, I get the message "SYSTEM ALERT : critical : HA Group 1 : Running configuration not synchronized after retries". If I check the High Availability widget on the dashboard, I see the config is not synchronized. If I wait a few seconds and refresh the status, the configs are synced. I am running version 5.0.8. I have seen this post, but it says the issue was fixed in 5.0.7. Any help is appreciated.

PAN-OS 5.0.7: Addressed Issues

Zach

L7 Applicator

Re: Getting errors: Running Configuration not synchronized after retries

Hello Zach,

1. Could you please check ha_agent.log for more detailed information about this failure?

2. If you have configured or uploaded any certificates on one HA member, please make sure the same certificates are present on the passive member as well.

3. According to previous support cases, 5.0.9 and 5.0.10 do not have this issue.

4. Could you please take a look at the management server CPU and memory usage?

Thanks

Not applicable

Re: Getting errors: Running Configuration not synchronized after retries

Thank you for the reply.

1 - Are you referring to the Monitor - System tab on the passive unit? If so, nothing there is helpful other than the critical alert, followed about 1 minute later by a message stating that the configs have successfully synced.

2 - Certificates are the same on both boxes

3 - We are on 5.0.8. Can you point me to documentation that says to upgrade past 5.0.8 to resolve this issue? (I want to determine whether I am fighting a config issue or a bug.)

4 - CPU utilization on the HA peer at the time of the issue was DP: 0, MP: 2. CPU utilization on the active unit at that time was DP: 12, MP: 9. And wow, I just realized I'm not collecting memory information. Adding this monitor.

Not applicable

Re: Getting errors: Running Configuration not synchronized after retries

So it would appear I cannot monitor memory utilization on my 5050s running 5.0.8. If someone is aware of a way to do this, let me know. I looked at the two links below and found nothing for memory/RAM utilization. I am already monitoring CPU on MP/DP.

https://live.paloaltonetworks.com/message/17485#17485

https://live.paloaltonetworks.com/docs/DOC-1744

L7 Applicator

Re: Getting errors: Running Configuration not synchronized after retries

Hello Sir,

1. I am not talking about the system logs. Please use the command below to view ha_agent.log.

> less mp-log ha_agent.log   ( use "/" to search for a keyword | Shift + G to go to the end of the file )

2. I made the suggestion based on previous case history (in all 3 cases the problem was resolved after bringing the firewall to 5.0.9 or 5.0.10).
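
If the contents of ha_agent.log have been copied into a local text file, the sync-related entries can be pulled out of the noise with a short script. This is only a sketch: the keyword list is an assumption based on the wording of log lines quoted in this thread, not an official list of message types, and `sync_lines` is a hypothetical helper name.

```python
import re

# Keywords are assumptions drawn from log lines quoted in this thread.
SYNC_PATTERN = re.compile(
    r"cfgsync|insync|out of sync|in sync|dosync|not synchronized|retries",
    re.IGNORECASE,
)


def sync_lines(log_text):
    """Return only the lines that mention config-sync state or failures."""
    return [line for line in log_text.splitlines() if SYNC_PATTERN.search(line)]


sample = """\
Mar 17 16:07:59 Error: ha_state_cfg_dosync_fail(src/ha_state_cfg.c:387): Group 1: Config sync start failed on local mgmt srvr
Mar 17 16:07:59 ha_sysd_dev_cfgsync_update(src/ha_sysd.c:1505): Set dev cfgsync to Out-of-Sync
Mar 17 16:09:15 Received HA2 MAC address: <output ommitted>
"""

for line in sync_lines(sample):
    print(line)
```

With the three sample lines, the two config-sync entries are printed and the HA2 MAC line is dropped.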


Thanks


Not applicable

Re: Getting errors: Running Configuration not synchronized after retries

Here are the relevant logs from the active device:

Mar 17 16:07:39 cfgagent_flags_callback(pan_cfgagent.c:187): ha_agent: cfg agent received flags from server
Mar 17 16:07:39 cfgagent_flags_callback(pan_cfgagent.c:191): new flags=0x3
Mar 17 16:07:39 cfgagent_config_callback(pan_cfgagent.c:212): ha_agent: cfg agent received configuration from server
Mar 17 16:07:39 cfgagent_config_callback(pan_cfgagent.c:228): config length=193157
Mar 17 16:07:39 ha_cfgagent_phase1(src/ha_cfgagent.c:545): start
Mar 17 16:07:39 ha_cfgagent_phase1_callback(src/ha_cfgagent.c:485): start
Mar 17 16:07:39 ha_state_cfg_commit_start(src/ha_state_cfg.c:589): Starting monitor hold (no timeout) during phase1
Mar 17 16:07:39 ha_cfgagent_phase1_callback(src/ha_cfgagent.c:516): sending back true for p1done
Mar 17 16:07:46 ha_cfgagent_phase2(src/ha_cfgagent.c:722): start
Mar 17 16:07:46 ha_cfgagent_phase2_callback(src/ha_cfgagent.c:670): start
Mar 17 16:07:46 ha_cfgagent_phase2_callback(src/ha_cfgagent.c:697): sending back true for p2done
Mar 17 16:07:46 ha_state_cfg_commit_succeed(src/ha_state_cfg.c:640): Starting monitor hold after commit
Mar 17 16:07:46 ha_state_start_monitor_hold(src/ha_state.c:1014): Starting initial monitor hold for group 1; linkmon monitored
        Ignoring link and path monitoring failures due to an HA state transition
Mar 17 16:07:47 Received HA2 MAC address: <output ommitted>
Mar 17 16:07:47 Received HA2 MAC address: <output ommitted>
Mar 17 16:07:51 ha_sysd_config_md5_notifier_callback(src/ha_sysd.c:2743): Got new config md5: <output ommitted>
Mar 17 16:07:51 ha_state_cfg_md5_set(src/ha_state_cfg.c:458): We were in sync and now we are out of sync; autocommit no; ha-sync no; panorama no; cfg-sync-off no
Mar 17 16:07:51 ha_state_cfg_sync_start(src/ha_state_cfg.c:738): Starting config sync for group 1
Mar 17 16:07:51 ha_sysd_start_config_sync(src/ha_sysd.c:781): Sending start sync to mgmtsrvr
Mar 17 16:07:51 ha_sysd_dev_cfgsync_update(src/ha_sysd.c:1505): Set dev cfgsync to Committing
Mar 17 16:07:51 ha_peer_send_hello(src/ha_peer.c:4629): Group 1 (HA1-MAIN): Sending hello message

Hello Msg
---------
flags    : 0x0 ()
state    : Active (5)
priority : 10
cookie   : 64519
num tlvs : 2
  Printing out 2 tlvs
  TLV[1]: type 2 (CONFIG_MD5SUM); len 33; value:
    64373232 61383635 64663231 65626361 32323462 37353739
    66313261 31313865 00
  TLV[2]: type 11 (SYSD_PEER_DOWN); len 4; value:
    00000000
Mar 17 16:07:59 ha_state_cfg_check_insync(src/ha_state_cfg.c:279): group 1: mgmtsrvr insync: NO
Mar 17 16:08:46 ha_state_monitor_hold_callback(src/ha_state.c:1936): Group 1: ending initial monitor hold; no longer ignoring link and path monitoring failures due to an HA state transition
Mar 17 16:08:51 ha_state_cfg_sync_callback(src/ha_state_cfg.c:836): ha_state_cfg_sync_callback: retries: 1; insync: no
Mar 17 16:08:51 ha_state_cfg_sync_start(src/ha_state_cfg.c:738): Starting config sync for group 1
Mar 17 16:08:51 ha_sysd_start_config_sync(src/ha_sysd.c:781): Sending start sync to mgmtsrvr
Mar 17 16:08:59 ha_state_cfg_check_insync(src/ha_state_cfg.c:279): group 1: mgmtsrvr insync: NO
Mar 17 16:09:18 ha_peer_recv_hello(src/ha_peer.c:4682): Group 1 (HA1-MAIN): Receiving hello message

Msg Hdr
-------
version : 1
groupID : 1
type    : Hello (2)
token   : 0x6f58
flags   : 0x1 (req:)
length  : 81

  Hello Msg
  ---------
  flags    : 0x0 ()
  state    : Passive (4)
  priority : 100
  cookie   : 64519
  num tlvs : 2
    Printing out 2 tlvs
    TLV[1]: type 2 (CONFIG_MD5SUM); len 33; value:
      64373232 61383635 64663231 65626361 32323462 37353739
      66313261 31313865 00
    TLV[2]: type 11 (SYSD_PEER_DOWN); len 4; value:
      00000000
Mar 17 16:09:18 ha_state_cfg_md5_set(src/ha_state_cfg.c:458): We were out of sync and now we are in sync; autocommit no; ha-sync no; panorama no; cfg-sync-off no
Mar 17 16:09:18 ha_sysd_dev_cfgsync_update(src/ha_sysd.c:1505): Set dev cfgsync to In-Sync
Mar 17 16:12:51 ha_state_cfg_sync_callback(src/ha_state_cfg.c:836): ha_state_cfg_sync_callback: retries: 2; insync: yes
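
To see how long the pair stayed out of sync, the "Set dev cfgsync to ..." lines can be paired up by timestamp. The following is a minimal sketch, assuming the log text is available as a string; `out_of_sync_windows` is a hypothetical helper, and the `year` argument is a placeholder because the log lines carry no year.

```python
import re
from datetime import datetime

# Matches e.g. "Mar 17 16:07:51 ... Set dev cfgsync to Committing"
STATE_RE = re.compile(r"^(\w{3} +\d+ \d\d:\d\d:\d\d) .*Set dev cfgsync to (\S+)")


def out_of_sync_windows(log_text, year=2000):
    """Return (start, end, seconds) for each span spent outside In-Sync."""
    start = None
    windows = []
    for line in log_text.splitlines():
        m = STATE_RE.match(line)
        if not m:
            continue
        # The year is a placeholder; only the time difference matters here.
        ts = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
        state = m.group(2)
        if state != "In-Sync" and start is None:
            start = ts
        elif state == "In-Sync" and start is not None:
            windows.append((start, ts, (ts - start).total_seconds()))
            start = None
    return windows


sample = """\
Mar 17 16:07:51 ha_sysd_dev_cfgsync_update(src/ha_sysd.c:1505): Set dev cfgsync to Committing
Mar 17 16:09:18 ha_sysd_dev_cfgsync_update(src/ha_sysd.c:1505): Set dev cfgsync to In-Sync
"""

for start, end, seconds in out_of_sync_windows(sample):
    print(start.time(), "->", end.time(), seconds, "seconds")
```

For the two active-unit lines above (16:07:51 and 16:09:18), the window works out to 87 seconds.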

*********************************

Here are the logs for the same period from the passive (HA peer) device:

Mar 17 16:07:51 ha_peer_recv_hello(src/ha_peer.c:4682): Group 1 (HA1-MAIN): Receiving hello message

Msg Hdr
-------
version : 1
groupID : 1
type    : Hello (2)
token   : 0x7136
flags   : 0x1 (req:)
length  : 81

  Hello Msg
  ---------
  flags    : 0x0 ()
  state    : Active (5)
  priority : 10
  cookie   : 64519
  num tlvs : 2
    Printing out 2 tlvs
    TLV[1]: type 2 (CONFIG_MD5SUM); len 33; value:
      64373232 61383635 64663231 65626361 32323462 37353739
      66313261 31313865 00
    TLV[2]: type 11 (SYSD_PEER_DOWN); len 4; value:
      00000000
Mar 17 16:07:51 ha_state_cfg_md5_set(src/ha_state_cfg.c:458): We were in sync and now we are out of sync; autocommit no; ha-sync no; panorama no; cfg-sync-off no
Mar 17 16:07:51 ha_sysd_dev_cfgsync_update(src/ha_sysd.c:1505): Set dev cfgsync to Committing
Mar 17 16:07:51 ha_state_cfg_from_insync_to_outsync(src/ha_state_cfg.c:686): peer group 1 has changed the md5, waiting for an update
Mar 17 16:07:59 Error: ha_state_cfg_dosync_fail(src/ha_state_cfg.c:387): Group 1: Config sync start failed on local mgmt srvr
Mar 17 16:07:59 ha_sysd_dev_cfgsync_update(src/ha_sysd.c:1505): Set dev cfgsync to Out-of-Sync
Mar 17 16:07:59 ha_state_cfg_dosync_fail(src/ha_state_cfg.c:397): Group 1: setting reason to failure for config sync when we got a dosync failure
Mar 17 16:07:59 ha_state_cfg_sync_callback(src/ha_state_cfg.c:836): ha_state_cfg_sync_callback: retries: 4; insync: no
Mar 17 16:07:59 Warning: ha_event_log(src/ha_event.c:47): HA Group 1: Running configuration not synchronized after retries
Mar 17 16:09:04 cfgagent_flags_callback(pan_cfgagent.c:187): ha_agent: cfg agent received flags from server
Mar 17 16:09:04 cfgagent_flags_callback(pan_cfgagent.c:191): new flags=0x4
Mar 17 16:09:04 cfgagent_config_callback(pan_cfgagent.c:212): ha_agent: cfg agent received configuration from server
Mar 17 16:09:04 cfgagent_config_callback(pan_cfgagent.c:228): config length=193161
Mar 17 16:09:04 ha_cfgagent_phase1(src/ha_cfgagent.c:545): start
Mar 17 16:09:04 ha_cfgagent_phase1_callback(src/ha_cfgagent.c:485): start
Mar 17 16:09:04 ha_state_cfg_commit_start(src/ha_state_cfg.c:589): Starting monitor hold (no timeout) during phase1
Mar 17 16:09:04 ha_cfgagent_phase1_callback(src/ha_cfgagent.c:516): sending back true for p1done
Mar 17 16:09:13 ha_cfgagent_phase2(src/ha_cfgagent.c:722): start
Mar 17 16:09:13 ha_cfgagent_phase2_callback(src/ha_cfgagent.c:670): start
Mar 17 16:09:13 ha_cfgagent_phase2_callback(src/ha_cfgagent.c:697): sending back true for p2done
Mar 17 16:09:13 ha_state_cfg_commit_succeed(src/ha_state_cfg.c:640): Starting monitor hold after commit
Mar 17 16:09:13 ha_state_start_monitor_hold(src/ha_state.c:1014): Starting initial monitor hold for group 1; linkmon monitored
        Ignoring link and path monitoring failures due to an HA state transition
Mar 17 16:09:15 Received HA2 MAC address: <output ommitted>
Mar 17 16:09:15 Received HA2 MAC address: <output ommitted>
Mar 17 16:09:18 ha_sysd_config_md5_notifier_callback(src/ha_sysd.c:2743): Got new config md5: <output ommitted>
Mar 17 16:09:18 ha_state_cfg_md5_set(src/ha_state_cfg.c:458): We were out of sync and now we are in sync; autocommit no; ha-sync yes; panorama no; cfg-sync-off no
Mar 17 16:09:18 ha_sysd_dev_cfgsync_update(src/ha_sysd.c:1505): Set dev cfgsync to In-Sync
Mar 17 16:09:18 ha_peer_send_hello(src/ha_peer.c:4629): Group 1 (HA1-MAIN): Sending hello message

Hello Msg
---------
flags    : 0x0 ()
state    : Passive (4)
priority : 100
cookie   : 64519
num tlvs : 2
  Printing out 2 tlvs
  TLV[1]: type 2 (CONFIG_MD5SUM); len 33; value:
    64373232 61383635 64663231 65626361 32323462 37353739
    66313261 31313865 00
  TLV[2]: type 11 (SYSD_PEER_DOWN); len 4; value:
    00000000
Mar 17 16:10:13 ha_state_monitor_hold_callback(src/ha_state.c:1936): Group 1: ending initial monitor hold; no longer ignoring link and path monitoring failures due to an HA state transition
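
The CONFIG_MD5SUM value in the hello messages can be decoded to make the comparison between peers easier to read. This sketch assumes the TLV holds the MD5 as an ASCII hex string followed by a NUL byte; that is an inference from the length of 33 and the printable byte values, not from any documented format. `decode_md5_tlv` is a hypothetical helper.

```python
def decode_md5_tlv(hex_dump):
    """Decode the hex words of a CONFIG_MD5SUM TLV into an ASCII string."""
    # Join the whitespace-separated hex words into one hex string.
    raw = bytes.fromhex("".join(hex_dump.split()))
    # Drop the trailing NUL terminator (assumed) before decoding.
    return raw.rstrip(b"\x00").decode("ascii")


hello_value = """
    64373232 61383635 64663231 65626361 32323462 37353739
    66313261 31313865 00
"""

print(decode_md5_tlv(hello_value))  # d722a865df21ebca224b7579f12a118e
```

All four hello messages quoted above carry the same value, which is consistent with both units reporting In-Sync at 16:09:18.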
