V5.0.4 HA Group1: Running configuration not synchronized after retries


L2 Linker

This message appears (via email and SNMP trap) pretty much any time I run a "commit" on the box. It appears cosmetic, as the GUI on both boxes shows them in sync. (Possibly a latency/delay issue during the sync is causing this misfire?)

I noted this was an issue in the 3.x version of the code; did they somehow "unfix" this bug? Is anyone else experiencing this error message?
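
For reference, the sync state the GUI reports can also be double-checked from the CLI (the hostname in the prompt below is just a placeholder):

admin@firewall> show high-availability state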


7 REPLIES

L6 Presenter

I did not see this with 5.0.4, but if your issue is like the following, you can try upgrading to 5.0.5 or open a case.

Automatic configuration synchronization was not occurring between peers in an HA configuration after a policy change. The synchronization status was also incorrect: the device on which the configuration change was made showed the sync as complete, but the peer device showed it as in progress.

Thanks for the quick response and suggestions. Ours is somewhat similar, but it appears cosmetic: the device we ran the 'commit' on will still show "synching to peer" when we get the error message, and this changes to "Synchronized" on both devices after a short while. It is more of an irritant than an issue at this point, but I wanted to know whether this is just our installation or a version issue...

What do the mgmtsrvr and/or devsrvr resources show when the logs are generated? When you perform another sync, or see another sync pending, try to retrieve the aforementioned as follows:

admin@Phoenix-VM-Lab143> show system resources | match mgmt

580       20   0  442m 266m 8052 S    0  6.7   5:19.64 mgmtsrvr

admin@Phoenix-VM-Lab143> show system resources | match devsrvr

2288       20   0  221m  74m  12m S    0  1.9   4:01.07 devsrvr
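
If you want to watch those two processes live while the commit runs, the same command should also support a follow mode (similar to top):

admin@Phoenix-VM-Lab143> show system resources follow yes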

You can also attempt to tail and collect ha_agent.log when the issue takes place:

admin@Phoenix-VM-Lab143> tail follow yes mp-log ha_agent.log
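
If following the log in real time proves awkward, pulling a recent slice or paging through it after the fact works as well:

admin@Phoenix-VM-Lab143> tail lines 100 mp-log ha_agent.log

admin@Phoenix-VM-Lab143> less mp-log ha_agent.log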

It would behoove us to have more data, so opening a support case would likely yield a more expeditious response.

Thank you. I will try to do this tomorrow. (I tried capturing this information earlier with two separate commits and got no error; then I did not try, and the error popped up... That has been my luck today!) It still looks like a cosmetic or timing issue on the sync.

If I am able to get the detail, I will open a case and send this up.

If it is cosmetic, we can enable a PHP debug and grab the UI I/O, as well as attempt replication.

L2 Linker

I am holding off on starting a support case for this, as I have one open already and will be starting another today (the first for 100% DataPlane CPU utilization, which is slowly getting resolved; the second for a custom application signature pattern-matching issue, where the pattern is not getting any "hits" on the security rule).

It took me half a dozen "commits" to get the issue to occur again (it never happens when you want it to). I was running the log tail, and below is a snip of the log.

It appears that there IS an error: the sync retries eventually time out, but then the process starts over and succeeds:

Jun 18 09:45:07 ha_peer_send_hello(src/ha_peer.c:4547): Group 1 (HA1-MAIN): Sending hello message

Hello Msg
---------
flags    : 0x1 (preempt:)
state    : Active (5)
priority : 100
cookie   : 7596
num tlvs : 2
  Printing out 2 tlvs
  TLV[1]: type 2 (CONFIG_MD5SUM); len 33; value:
    37396664 63616664 36343735 30663435 39646539 37373935
    36653839 66383230 00
  TLV[2]: type 11 (SYSD_PEER_DOWN); len 4; value:
    00000000
Jun 18 09:45:45 ha_state_monitor_hold_callback(src/ha_state.c:1936): Group 1: ending initial monitor hold; no longer ignoring link and path monitoring failures due to an HA state transition
Jun 18 09:46:07 ha_state_cfg_sync_callback(src/ha_state_cfg.c:836): ha_state_cfg_sync_callback: retries: 1; insync: no
Jun 18 09:46:07 ha_state_cfg_sync_start(src/ha_state_cfg.c:738): Starting config sync for group 1
Jun 18 09:46:07 ha_sysd_start_config_sync(src/ha_sysd.c:781): Sending start sync to mgmtsrvr
Jun 18 09:46:07 Error: ha_sysd_mgmt_dosync_modify_callback(src/ha_sysd.c:3701): Error when trying to modify sw.mgmt.runtime.dosync
Jun 18 09:46:07 Error: ha_state_cfg_dosync_fail(src/ha_state_cfg.c:387): Group 1: Config sync start failed on local mgmt srvr
Jun 18 09:46:07 ha_sysd_dev_cfgsync_update(src/ha_sysd.c:1506): Set dev cfgsync to Out-of-Sync
Jun 18 09:46:07 ha_state_cfg_dosync_fail(src/ha_state_cfg.c:397): Group 1: setting reason to failure for config sync when we got a dosync failure
Jun 18 09:46:12 ha_state_cfg_sync_callback(src/ha_state_cfg.c:836): ha_state_cfg_sync_callback: retries: 2; insync: no
Jun 18 09:46:12 ha_state_cfg_sync_start(src/ha_state_cfg.c:738): Starting config sync for group 1
Jun 18 09:46:12 ha_sysd_start_config_sync(src/ha_sysd.c:781): Sending start sync to mgmtsrvr
Jun 18 09:46:12 ha_sysd_dev_cfgsync_update(src/ha_sysd.c:1506): Set dev cfgsync to Committing
Jun 18 09:47:07 Error: ha_sysd_mgmt_dosync_modify_callback(src/ha_sysd.c:3701): Error when trying to modify sw.mgmt.runtime.dosync
Jun 18 09:47:07 Error: ha_state_cfg_dosync_fail(src/ha_state_cfg.c:387): Group 1: Config sync start failed on local mgmt srvr
Jun 18 09:47:07 ha_sysd_dev_cfgsync_update(src/ha_sysd.c:1506): Set dev cfgsync to Out-of-Sync
Jun 18 09:47:07 ha_state_cfg_dosync_fail(src/ha_state_cfg.c:397): Group 1: setting reason to failure for config sync when we got a dosync failure
Jun 18 09:47:13 ha_state_cfg_sync_callback(src/ha_state_cfg.c:836): ha_state_cfg_sync_callback: retries: 3; insync: no
Jun 18 09:47:13 ha_state_cfg_sync_start(src/ha_state_cfg.c:738): Starting config sync for group 1
Jun 18 09:47:13 ha_sysd_start_config_sync(src/ha_sysd.c:781): Sending start sync to mgmtsrvr
Jun 18 09:47:13 ha_sysd_dev_cfgsync_update(src/ha_sysd.c:1506): Set dev cfgsync to Committing
Jun 18 09:47:13 Error: ha_sysd_mgmt_dosync_modify_callback(src/ha_sysd.c:3701): Error when trying to modify sw.mgmt.runtime.dosync
Jun 18 09:47:13 Error: ha_state_cfg_dosync_fail(src/ha_state_cfg.c:387): Group 1: Config sync start failed on local mgmt srvr
Jun 18 09:47:13 ha_sysd_dev_cfgsync_update(src/ha_sysd.c:1506): Set dev cfgsync to Out-of-Sync
Jun 18 09:47:13 ha_state_cfg_dosync_fail(src/ha_state_cfg.c:397): Group 1: setting reason to failure for config sync when we got a dosync failure
Jun 18 09:47:13 ha_state_cfg_sync_callback(src/ha_state_cfg.c:836): ha_state_cfg_sync_callback: retries: 4; insync: no
Jun 18 09:47:13 Warning: ha_event_log(src/ha_event.c:47): HA Group 1: Running configuration not synchronized after retries
Jun 18 09:47:23 Error: elog_callback(pan_elog.c:51): failed to send system log to management server: TIMEOUT
Jun 18 09:48:13 Error: ha_sysd_mgmt_dosync_modify_callback(src/ha_sysd.c:3701): Error when trying to modify sw.mgmt.runtime.dosync
Jun 18 09:49:58 ha_peer_recv_hello(src/ha_peer.c:4600): Group 1 (HA1-MAIN): Receiving hello message

Msg Hdr
-------
version : 1
groupID : 1
type    : Hello (2)
token   : 0xff0f
flags   : 0x1 (req:)
length  : 81

  Hello Msg
  ---------
  flags    : 0x1 (preempt:)
  state    : Passive (4)
  priority : 101
  cookie   : 7596
  num tlvs : 2
    Printing out 2 tlvs
    TLV[1]: type 2 (CONFIG_MD5SUM); len 33; value:
      37396664 63616664 36343735 30663435 39646539 37373935
      36653839 66383230 00
    TLV[2]: type 11 (SYSD_PEER_DOWN); len 4; value:
      00000000
Jun 18 09:49:58 ha_state_cfg_md5_set(src/ha_state_cfg.c:458): We were out of sync and now we are in sync; autocommit no; ha-sync no; panorama no; cfg-sync-off no
Jun 18 09:49:58 ha_sysd_dev_cfgsync_update(src/ha_sysd.c:1506): Set dev cfgsync to In-Sync
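
In the meantime, if the peers get stuck showing Out-of-Sync, my understanding is that a manual push can be forced from the device that took the commit, and the result checked afterwards (commands from memory, so the exact syntax may vary by release; the prompt hostname is a placeholder):

admin@firewall> request high-availability sync-to-remote running-config

admin@firewall> show high-availability all | match sync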

L2 Linker

Tech Support had me change the Heartbeat Interval from 1000 ms to 2000 ms, but that did not make any difference.
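
(For anyone wanting to try the same tweak: the setting is under the HA election options, and from configure mode it should be something close to the following, though I am not certain the exact path is the same across PAN-OS releases:)

admin@firewall# set deviceconfig high-availability group election-option heartbeat-interval 2000

admin@firewall# commit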

We upgraded to 5.0.5 and have not had the issue since. I cannot say whether or not it was a bug, but the upgrade is what seemed to fix our issue.

Russ
