V5.0.4 HA Group1: Running configuration not synchronized after retries

L2 Linker

This message appears (by email and SNMP trap) pretty much any time I run a "commit" on the box. It appears cosmetic, as the GUI on both boxes shows them as being in sync. (Could latency or delay during the sync be causing this misfire?)

I noted this was an issue in the 3.x versions of the code; did they somehow "unfix" this bug? Is anyone else experiencing this error message?



L6 Presenter

I did not see this with 5.0.4, but if your issue is like the following, you can try upgrading to 5.0.5 or open a case.

Automatic configuration synchronization was not occurring between peers in an HA configuration after a policy change. Status of the synchronization was not correct, the device that the configuration change was made on showed sync was complete, but the peer device showed it was in progress.

Thanks for the quick response and suggestions.  Ours is somewhat similar, but appears cosmetic. The device we did the 'commit' on will still be showing "synching to peer" when we get the error message. This will change to "Synchronized" on both devices after a short while. More of an irritant than an issue at this point, but wanted to know if this is just on our installation, or if it is a version issue...

What do the mgmtsrvr and/or devsrvr resources show when the logs are generated? When you perform another sync or see another sync pending, try to retrieve the aforementioned as follows:

admin@Phoenix-VM-Lab143> show system resources | match mgmt

580       20   0  442m 266m 8052 S    0  6.7   5:19.64 mgmtsrvr

admin@Phoenix-VM-Lab143> show system resources | match devsrvr

2288       20   0  221m  74m  12m S    0  1.9   4:01.07 devsrvr
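
If you want to record these figures each time a sync is pending, a small parser can help. This is only a sketch in Python: it assumes every matched row has the eleven whitespace-separated columns seen in the output above (a top-style row with the USER column apparently blank), and `ProcSample` and `parse_resource_line` are hypothetical helper names, not part of PAN-OS.

```python
from typing import NamedTuple


class ProcSample(NamedTuple):
    """One process row from `show system resources` (column meanings assumed from top)."""
    pid: int
    virt: str
    res: str
    shr: str
    state: str
    cpu_pct: float
    mem_pct: float
    cpu_time: str
    command: str


def parse_resource_line(line: str) -> ProcSample:
    """Split a pasted row into named fields; raise if the layout differs."""
    fields = line.split()
    if len(fields) != 11:
        # Assumption: PID PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
        raise ValueError(f"expected 11 columns, got {len(fields)}: {line!r}")
    return ProcSample(
        pid=int(fields[0]),
        virt=fields[3],
        res=fields[4],
        shr=fields[5],
        state=fields[6],
        cpu_pct=float(fields[7]),
        mem_pct=float(fields[8]),
        cpu_time=fields[9],
        command=fields[10],
    )


if __name__ == "__main__":
    row = "580       20   0  442m 266m 8052 S    0  6.7   5:19.64 mgmtsrvr"
    print(parse_resource_line(row))
```

Collecting one sample per commit makes it easier to see whether mgmtsrvr memory or CPU changes around the time the warning fires.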

You can also tail and collect ha_agent.log when the issue takes place:

admin@Phoenix-VM-Lab143> tail follow yes mp-log ha_agent.log

It would behoove us to have more data, so opening a support case would likely yield a more expeditious response.

Thank you. I will try to do this tomorrow. (I tried capturing this information earlier during two separate commits and got no error; then I did not try, and the error popped up. That has been my luck today!) It still looks like a cosmetic or timing issue with the sync.

If I am able to get the detail, I will open a case and send this up.

If it is cosmetic, we can enable a php debug and grab UI I/O as well as attempt replication.

L2 Linker

I am holding off on starting a support case on this, as I have one open already, and will be starting another today (first for DataPlane 100% CPU utilization, slowly getting resolved, second for Custom Application signature Pattern Matching issue, pattern not getting any "hits" on the Security Rule).

It took me half a dozen commits to make the issue occur again (it never happens when you want it to). I was running the log tail, and below is a snippet of the log.

It appears that there IS an error: the sync eventually times out, but then it starts over and succeeds:

Jun 18 09:45:07 ha_peer_send_hello(src/ha_peer.c:4547): Group 1 (HA1-MAIN): Sending hello message

Hello Msg
---------
flags    : 0x1 (preempt:)
state    : Active (5)
priority : 100
cookie   : 7596
num tlvs : 2
  Printing out 2 tlvs
  TLV[1]: type 2 (CONFIG_MD5SUM); len 33; value:
    37396664 63616664 36343735 30663435 39646539 37373935
    36653839 66383230 00
  TLV[2]: type 11 (SYSD_PEER_DOWN); len 4; value:
    00000000
Jun 18 09:45:45 ha_state_monitor_hold_callback(src/ha_state.c:1936): Group 1: ending initial monitor hold; no longer ignoring link and path monitoring failures due to an HA state transition
Jun 18 09:46:07 ha_state_cfg_sync_callback(src/ha_state_cfg.c:836): ha_state_cfg_sync_callback: retries: 1; insync: no
Jun 18 09:46:07 ha_state_cfg_sync_start(src/ha_state_cfg.c:738): Starting config sync for group 1
Jun 18 09:46:07 ha_sysd_start_config_sync(src/ha_sysd.c:781): Sending start sync to mgmtsrvr
Jun 18 09:46:07 Error: ha_sysd_mgmt_dosync_modify_callback(src/ha_sysd.c:3701): Error when trying to modify sw.mgmt.runtime.dosync
Jun 18 09:46:07 Error: ha_state_cfg_dosync_fail(src/ha_state_cfg.c:387): Group 1: Config sync start failed on local mgmt srvr
Jun 18 09:46:07 ha_sysd_dev_cfgsync_update(src/ha_sysd.c:1506): Set dev cfgsync to Out-of-Sync
Jun 18 09:46:07 ha_state_cfg_dosync_fail(src/ha_state_cfg.c:397): Group 1: setting reason to failure for config sync when we got a dosync failure
Jun 18 09:46:12 ha_state_cfg_sync_callback(src/ha_state_cfg.c:836): ha_state_cfg_sync_callback: retries: 2; insync: no
Jun 18 09:46:12 ha_state_cfg_sync_start(src/ha_state_cfg.c:738): Starting config sync for group 1
Jun 18 09:46:12 ha_sysd_start_config_sync(src/ha_sysd.c:781): Sending start sync to mgmtsrvr
Jun 18 09:46:12 ha_sysd_dev_cfgsync_update(src/ha_sysd.c:1506): Set dev cfgsync to Committing
Jun 18 09:47:07 Error: ha_sysd_mgmt_dosync_modify_callback(src/ha_sysd.c:3701): Error when trying to modify sw.mgmt.runtime.dosync
Jun 18 09:47:07 Error: ha_state_cfg_dosync_fail(src/ha_state_cfg.c:387): Group 1: Config sync start failed on local mgmt srvr
Jun 18 09:47:07 ha_sysd_dev_cfgsync_update(src/ha_sysd.c:1506): Set dev cfgsync to Out-of-Sync
Jun 18 09:47:07 ha_state_cfg_dosync_fail(src/ha_state_cfg.c:397): Group 1: setting reason to failure for config sync when we got a dosync failure
Jun 18 09:47:13 ha_state_cfg_sync_callback(src/ha_state_cfg.c:836): ha_state_cfg_sync_callback: retries: 3; insync: no
Jun 18 09:47:13 ha_state_cfg_sync_start(src/ha_state_cfg.c:738): Starting config sync for group 1
Jun 18 09:47:13 ha_sysd_start_config_sync(src/ha_sysd.c:781): Sending start sync to mgmtsrvr
Jun 18 09:47:13 ha_sysd_dev_cfgsync_update(src/ha_sysd.c:1506): Set dev cfgsync to Committing
Jun 18 09:47:13 Error: ha_sysd_mgmt_dosync_modify_callback(src/ha_sysd.c:3701): Error when trying to modify sw.mgmt.runtime.dosync
Jun 18 09:47:13 Error: ha_state_cfg_dosync_fail(src/ha_state_cfg.c:387): Group 1: Config sync start failed on local mgmt srvr
Jun 18 09:47:13 ha_sysd_dev_cfgsync_update(src/ha_sysd.c:1506): Set dev cfgsync to Out-of-Sync
Jun 18 09:47:13 ha_state_cfg_dosync_fail(src/ha_state_cfg.c:397): Group 1: setting reason to failure for config sync when we got a dosync failure
Jun 18 09:47:13 ha_state_cfg_sync_callback(src/ha_state_cfg.c:836): ha_state_cfg_sync_callback: retries: 4; insync: no
Jun 18 09:47:13 Warning: ha_event_log(src/ha_event.c:47): HA Group 1: Running configuration not synchronized after retries
Jun 18 09:47:23 Error: elog_callback(pan_elog.c:51): failed to send system log to management server: TIMEOUT
Jun 18 09:48:13 Error: ha_sysd_mgmt_dosync_modify_callback(src/ha_sysd.c:3701): Error when trying to modify sw.mgmt.runtime.dosync
Jun 18 09:49:58 ha_peer_recv_hello(src/ha_peer.c:4600): Group 1 (HA1-MAIN): Receiving hello message

Msg Hdr
-------
version : 1
groupID : 1
type    : Hello (2)
token   : 0xff0f
flags   : 0x1 (req:)
length  : 81

  Hello Msg
  ---------
  flags    : 0x1 (preempt:)
  state    : Passive (4)
  priority : 101
  cookie   : 7596
  num tlvs : 2
    Printing out 2 tlvs
    TLV[1]: type 2 (CONFIG_MD5SUM); len 33; value:
      37396664 63616664 36343735 30663435 39646539 37373935
      36653839 66383230 00
    TLV[2]: type 11 (SYSD_PEER_DOWN); len 4; value:
      00000000
Jun 18 09:49:58 ha_state_cfg_md5_set(src/ha_state_cfg.c:458): We were out of sync and now we are in sync; autocommit no; ha-sync no; panorama no; cfg-sync-off no
Jun 18 09:49:58 ha_sysd_dev_cfgsync_update(src/ha_sysd.c:1506): Set dev cfgsync to In-Sync
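
For anyone comparing their own ha_agent.log against this one, here is a rough Python sketch that pulls out just the retry, error/warning, and cfgsync state lines, and decodes the CONFIG_MD5SUM TLV. It assumes the line layout seen in the excerpt above, and it assumes the TLV value is the ASCII hex digest followed by a NUL byte (which matches the len of 33 shown). `sync_timeline` and `decode_md5_tlv` are hypothetical helper names, not PAN-OS tools.

```python
import re

# Assumed line shape: "Mon DD HH:MM:SS <rest>", as in the excerpt above.
ENTRY = re.compile(r"^(?P<ts>[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}) (?P<rest>.*)$")
RETRY = re.compile(r"retries: (\d+); insync: (\w+)")
STATE = re.compile(r"Set dev cfgsync to (\S+)")


def sync_timeline(log_text):
    """Return (timestamp, event) pairs for retries, errors/warnings and cfgsync changes."""
    events = []
    for line in log_text.splitlines():
        m = ENTRY.match(line.strip())
        if not m:
            continue  # hello dump bodies and blank lines carry no timestamp
        ts, rest = m["ts"], m["rest"]
        retry = RETRY.search(rest)
        state = STATE.search(rest)
        if retry:
            events.append((ts, f"retry {retry[1]} (insync: {retry[2]})"))
        elif rest.startswith(("Error:", "Warning:")):
            severity = rest.split(":", 1)[0]
            message = rest.split("): ", 1)[-1]  # drop the function(file:line) prefix
            events.append((ts, f"{severity}: {message}"))
        elif state:
            events.append((ts, f"cfgsync -> {state[1]}"))
    return events


def decode_md5_tlv(hex_dump):
    """Turn the grouped hex dump of a CONFIG_MD5SUM TLV into the digest string."""
    raw = bytes.fromhex("".join(hex_dump.split()))
    return raw.rstrip(b"\x00").decode("ascii")


if __name__ == "__main__":
    tlv = """37396664 63616664 36343735 30663435 39646539 37373935
             36653839 66383230 00"""
    print(decode_md5_tlv(tlv))  # 79fdcafd64750f459de977956e89f820
```

Both hello messages in the log above carry the same TLV bytes, so they decode to the same digest, which is consistent with the devices reporting In-Sync at 09:49:58.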

L2 Linker

Tech Support had me change the Heartbeat Interval from 1000ms to 2000ms, but that did not make any difference.

We upgraded to 5.0.5, and have not had the issue since.  I cannot say if it was a bug or not, but the upgrade is what seemed to fix our issue.

Russ
