03-17-2014 01:57 PM
Hello. I have two PA-5050s in an HA active/passive pair. Randomly, about once a week, I get the message "SYSTEM ALERT : critical : HA Group 1 : Running configuration not synchronized after retries". If I check the High Availability widget on the dashboard, I see that the config is not synchronized. If I wait a few seconds and refresh the status, the configs are synchronized again. I am running version 5.0.8. I have seen this post, but it says the issue was fixed in 5.0.7. Any help is appreciated.
PAN-OS 5.0.7: Addressed Issues
Zach
03-17-2014 04:19 PM
Hello Zach,
1. Could you please check ha_agent.log for more detailed information about this failure?
2. If you have configured or uploaded any certificates on one HA member, please make sure the same certificates are present on the passive member as well.
3. According to previous support cases, 5.0.9 and 5.0.10 do not have this issue.
4. Could you please take a look at the management server CPU and memory usage?
Thanks
03-18-2014 05:46 AM
Thank you for the reply.
1 - Are you referring to the Monitor > System tab on the passive unit? If so, nothing there is helpful other than the critical alert and then, about 1 minute later, an entry stating that the configs have successfully synced.
2 - Certificates are the same on both boxes.
3 - We are on 5.0.8. Can you point me to the documentation that says to upgrade past 5.0.8 to resolve this issue? (I want to know whether I am fighting a config issue or a bug.)
4 - CPU utilization on the HA peer at the time of the issue was DP: 0, MP: 2. CPU utilization on the active unit at that time was DP: 12, MP: 9. And wow, I just realized I'm not collecting memory information. Adding this monitor.
03-18-2014 06:19 AM
So it would appear I cannot monitor memory utilization on my 5050s running 5.0.8. If anyone is aware of a way to do this, let me know. I looked at the two links below and found nothing for memory/RAM utilization. I am already monitoring CPU on MP/DP.
03-18-2014 07:42 AM
Hello Sir,
1. I am not talking about the system logs. Please use the command below to view ha_agent.log.
> less mp-log ha_agent.log (type "/" followed by a keyword to search; press Shift + G to go to the end of the file)
2. My suggestion was based on previous case history (in all 3 cases the problem was resolved after upgrading the firewall to 5.0.9 or 5.0.10).
Thanks
03-18-2014 08:15 AM
Here are the relevant logs from the active device:
Mar 17 16:07:39 cfgagent_flags_callback(pan_cfgagent.c:187): ha_agent: cfg agent received flags from server
Mar 17 16:07:39 cfgagent_flags_callback(pan_cfgagent.c:191): new flags=0x3
Mar 17 16:07:39 cfgagent_config_callback(pan_cfgagent.c:212): ha_agent: cfg agent received configuration from server
Mar 17 16:07:39 cfgagent_config_callback(pan_cfgagent.c:228): config length=193157
Mar 17 16:07:39 ha_cfgagent_phase1(src/ha_cfgagent.c:545): start
Mar 17 16:07:39 ha_cfgagent_phase1_callback(src/ha_cfgagent.c:485): start
Mar 17 16:07:39 ha_state_cfg_commit_start(src/ha_state_cfg.c:589): Starting monitor hold (no timeout) during phase1
Mar 17 16:07:39 ha_cfgagent_phase1_callback(src/ha_cfgagent.c:516): sending back true for p1done
Mar 17 16:07:46 ha_cfgagent_phase2(src/ha_cfgagent.c:722): start
Mar 17 16:07:46 ha_cfgagent_phase2_callback(src/ha_cfgagent.c:670): start
Mar 17 16:07:46 ha_cfgagent_phase2_callback(src/ha_cfgagent.c:697): sending back true for p2done
Mar 17 16:07:46 ha_state_cfg_commit_succeed(src/ha_state_cfg.c:640): Starting monitor hold after commit
Mar 17 16:07:46 ha_state_start_monitor_hold(src/ha_state.c:1014): Starting initial monitor hold for group 1; linkmon monitored
Ignoring link and path monitoring failures due to an HA state transition
Mar 17 16:07:47 Received HA2 MAC address: <output ommitted>
Mar 17 16:07:47 Received HA2 MAC address: <output ommitted>
Mar 17 16:07:51 ha_sysd_config_md5_notifier_callback(src/ha_sysd.c:2743): Got new config md5: <output ommitted>
Mar 17 16:07:51 ha_state_cfg_md5_set(src/ha_state_cfg.c:458): We were in sync and now we are out of sync; autocommit no; ha-sync no; panorama no; cfg-sync-off no
Mar 17 16:07:51 ha_state_cfg_sync_start(src/ha_state_cfg.c:738): Starting config sync for group 1
Mar 17 16:07:51 ha_sysd_start_config_sync(src/ha_sysd.c:781): Sending start sync to mgmtsrvr
Mar 17 16:07:51 ha_sysd_dev_cfgsync_update(src/ha_sysd.c:1505): Set dev cfgsync to Committing
Mar 17 16:07:51 ha_peer_send_hello(src/ha_peer.c:4629): Group 1 (HA1-MAIN): Sending hello message
Hello Msg
---------
flags : 0x0 ()
state : Active (5)
priority : 10
cookie : 64519
num tlvs : 2
Printing out 2 tlvs
TLV[1]: type 2 (CONFIG_MD5SUM); len 33; value:
64373232 61383635 64663231 65626361 32323462 37353739
66313261 31313865 00
TLV[2]: type 11 (SYSD_PEER_DOWN); len 4; value:
00000000
Mar 17 16:07:59 ha_state_cfg_check_insync(src/ha_state_cfg.c:279): group 1: mgmtsrvr insync: NO
Mar 17 16:08:46 ha_state_monitor_hold_callback(src/ha_state.c:1936): Group 1: ending initial monitor hold; no longer ignoring link and path monitoring failures due to an HA state transition
Mar 17 16:08:51 ha_state_cfg_sync_callback(src/ha_state_cfg.c:836): ha_state_cfg_sync_callback: retries: 1; insync: no
Mar 17 16:08:51 ha_state_cfg_sync_start(src/ha_state_cfg.c:738): Starting config sync for group 1
Mar 17 16:08:51 ha_sysd_start_config_sync(src/ha_sysd.c:781): Sending start sync to mgmtsrvr
Mar 17 16:08:59 ha_state_cfg_check_insync(src/ha_state_cfg.c:279): group 1: mgmtsrvr insync: NO
Mar 17 16:09:18 ha_peer_recv_hello(src/ha_peer.c:4682): Group 1 (HA1-MAIN): Receiving hello message
Msg Hdr
-------
version : 1
groupID : 1
type : Hello (2)
token : 0x6f58
flags : 0x1 (req:)
length : 81
Hello Msg
---------
flags : 0x0 ()
state : Passive (4)
priority : 100
cookie : 64519
num tlvs : 2
Printing out 2 tlvs
TLV[1]: type 2 (CONFIG_MD5SUM); len 33; value:
64373232 61383635 64663231 65626361 32323462 37353739
66313261 31313865 00
TLV[2]: type 11 (SYSD_PEER_DOWN); len 4; value:
00000000
Mar 17 16:09:18 ha_state_cfg_md5_set(src/ha_state_cfg.c:458): We were out of sync and now we are in sync; autocommit no; ha-sync no; panorama no; cfg-sync-off no
Mar 17 16:09:18 ha_sysd_dev_cfgsync_update(src/ha_sysd.c:1505): Set dev cfgsync to In-Sync
Mar 17 16:12:51 ha_state_cfg_sync_callback(src/ha_state_cfg.c:836): ha_state_cfg_sync_callback: retries: 2; insync: yes
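For anyone comparing the CONFIG_MD5SUM values in these hello messages by eye: the TLV value looks like a NUL-terminated ASCII string of hex digits (len 33 would be 32 characters plus one 0x00 byte). That reading is an assumption based on the dump itself, not on Palo Alto documentation. A minimal Python sketch that turns the dumped words back into a readable checksum, with `decode_md5_tlv` being a made-up helper name:

```python
def decode_md5_tlv(hex_dump: str) -> str:
    """Convert the space-separated hex words of a CONFIG_MD5SUM TLV to text.

    Assumes the value is an ASCII string terminated by a single NUL byte,
    which is what the dump in the log above appears to contain.
    """
    # Join the words (and any line breaks) into one continuous hex string.
    raw = bytes.fromhex("".join(hex_dump.split()))
    # Drop the trailing NUL terminator before decoding.
    return raw.rstrip(b"\x00").decode("ascii")


# TLV[1] value copied from the hello message sent by the active device.
sent_by_active = """64373232 61383635 64663231 65626361 32323462 37353739
66313261 31313865 00"""

# TLV[1] value copied from the hello message received from the passive device.
received_from_passive = """64373232 61383635 64663231 65626361 32323462 37353739
66313261 31313865 00"""

active_md5 = decode_md5_tlv(sent_by_active)
passive_md5 = decode_md5_tlv(received_from_passive)

print("active :", active_md5)
print("passive:", passive_md5)
print("match  :", active_md5 == passive_md5)
```

Both dumps in the active device's log decode to the same string, so the two peers were advertising the same checksum in those hello messages.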
*********************************
Here are the same logs from the HA (passive) device:
Mar 17 16:07:51 ha_peer_recv_hello(src/ha_peer.c:4682): Group 1 (HA1-MAIN): Receiving hello message
Msg Hdr
-------
version : 1
groupID : 1
type : Hello (2)
token : 0x7136
flags : 0x1 (req:)
length : 81
Hello Msg
---------
flags : 0x0 ()
state : Active (5)
priority : 10
cookie : 64519
num tlvs : 2
Printing out 2 tlvs
TLV[1]: type 2 (CONFIG_MD5SUM); len 33; value:
64373232 61383635 64663231 65626361 32323462 37353739
66313261 31313865 00
TLV[2]: type 11 (SYSD_PEER_DOWN); len 4; value:
00000000
Mar 17 16:07:51 ha_state_cfg_md5_set(src/ha_state_cfg.c:458): We were in sync and now we are out of sync; autocommit no; ha-sync no; panorama no; cfg-sync-off no
Mar 17 16:07:51 ha_sysd_dev_cfgsync_update(src/ha_sysd.c:1505): Set dev cfgsync to Committing
Mar 17 16:07:51 ha_state_cfg_from_insync_to_outsync(src/ha_state_cfg.c:686): peer group 1 has changed the md5, waiting for an update
Mar 17 16:07:59 Error: ha_state_cfg_dosync_fail(src/ha_state_cfg.c:387): Group 1: Config sync start failed on local mgmt srvr
Mar 17 16:07:59 ha_sysd_dev_cfgsync_update(src/ha_sysd.c:1505): Set dev cfgsync to Out-of-Sync
Mar 17 16:07:59 ha_state_cfg_dosync_fail(src/ha_state_cfg.c:397): Group 1: setting reason to failure for config sync when we got a dosync failure
Mar 17 16:07:59 ha_state_cfg_sync_callback(src/ha_state_cfg.c:836): ha_state_cfg_sync_callback: retries: 4; insync: no
Mar 17 16:07:59 Warning: ha_event_log(src/ha_event.c:47): HA Group 1: Running configuration not synchronized after retries
Mar 17 16:09:04 cfgagent_flags_callback(pan_cfgagent.c:187): ha_agent: cfg agent received flags from server
Mar 17 16:09:04 cfgagent_flags_callback(pan_cfgagent.c:191): new flags=0x4
Mar 17 16:09:04 cfgagent_config_callback(pan_cfgagent.c:212): ha_agent: cfg agent received configuration from server
Mar 17 16:09:04 cfgagent_config_callback(pan_cfgagent.c:228): config length=193161
Mar 17 16:09:04 ha_cfgagent_phase1(src/ha_cfgagent.c:545): start
Mar 17 16:09:04 ha_cfgagent_phase1_callback(src/ha_cfgagent.c:485): start
Mar 17 16:09:04 ha_state_cfg_commit_start(src/ha_state_cfg.c:589): Starting monitor hold (no timeout) during phase1
Mar 17 16:09:04 ha_cfgagent_phase1_callback(src/ha_cfgagent.c:516): sending back true for p1done
Mar 17 16:09:13 ha_cfgagent_phase2(src/ha_cfgagent.c:722): start
Mar 17 16:09:13 ha_cfgagent_phase2_callback(src/ha_cfgagent.c:670): start
Mar 17 16:09:13 ha_cfgagent_phase2_callback(src/ha_cfgagent.c:697): sending back true for p2done
Mar 17 16:09:13 ha_state_cfg_commit_succeed(src/ha_state_cfg.c:640): Starting monitor hold after commit
Mar 17 16:09:13 ha_state_start_monitor_hold(src/ha_state.c:1014): Starting initial monitor hold for group 1; linkmon monitored
Ignoring link and path monitoring failures due to an HA state transition
Mar 17 16:09:15 Received HA2 MAC address: <output ommitted>
Mar 17 16:09:15 Received HA2 MAC address: <output ommitted>
Mar 17 16:09:18 ha_sysd_config_md5_notifier_callback(src/ha_sysd.c:2743): Got new config md5: <output ommitted>
Mar 17 16:09:18 ha_state_cfg_md5_set(src/ha_state_cfg.c:458): We were out of sync and now we are in sync; autocommit no; ha-sync yes; panorama no; cfg-sync-off no
Mar 17 16:09:18 ha_sysd_dev_cfgsync_update(src/ha_sysd.c:1505): Set dev cfgsync to In-Sync
Mar 17 16:09:18 ha_peer_send_hello(src/ha_peer.c:4629): Group 1 (HA1-MAIN): Sending hello message
Hello Msg
---------
flags : 0x0 ()
state : Passive (4)
priority : 100
cookie : 64519
num tlvs : 2
Printing out 2 tlvs
TLV[1]: type 2 (CONFIG_MD5SUM); len 33; value:
64373232 61383635 64663231 65626361 32323462 37353739
66313261 31313865 00
TLV[2]: type 11 (SYSD_PEER_DOWN); len 4; value:
00000000
Mar 17 16:10:13 ha_state_monitor_hold_callback(src/ha_state.c:1936): Group 1: ending initial monitor hold; no longer ignoring link and path monitoring failures due to an HA state transition
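To see how long the pair stays out of sync without reading the whole file, it can help to pull only the "Set dev cfgsync to ..." lines out of ha_agent.log. The sketch below is one way to do that in Python, assuming the log has been exported to text and that lines keep the format shown above; `cfgsync_transitions` and the sample text are illustrative names, not anything provided by PAN-OS.

```python
import re

# Matches lines such as:
# "Mar 17 16:07:59 ha_sysd_dev_cfgsync_update(...): Set dev cfgsync to Out-of-Sync"
LINE_RE = re.compile(
    r"^(\w{3} +\d+ \d{2}:\d{2}:\d{2}) .*Set dev cfgsync to (\S+)"
)


def cfgsync_transitions(log_text):
    """Return (timestamp, state) pairs for every cfgsync state change."""
    transitions = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if match:
            transitions.append((match.group(1), match.group(2)))
    return transitions


# A few lines copied from the passive device's log above.
sample = """\
Mar 17 16:07:51 ha_sysd_dev_cfgsync_update(src/ha_sysd.c:1505): Set dev cfgsync to Committing
Mar 17 16:07:59 Error: ha_state_cfg_dosync_fail(src/ha_state_cfg.c:387): Group 1: Config sync start failed on local mgmt srvr
Mar 17 16:07:59 ha_sysd_dev_cfgsync_update(src/ha_sysd.c:1505): Set dev cfgsync to Out-of-Sync
Mar 17 16:09:18 ha_sysd_dev_cfgsync_update(src/ha_sysd.c:1505): Set dev cfgsync to In-Sync
"""

for timestamp, state in cfgsync_transitions(sample):
    print(timestamp, state)
```

Run against the passive device's log, this shows the state going to Out-of-Sync at 16:07:59 and back to In-Sync at 16:09:18, which lines up with the alert clearing on its own after about a minute.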