
HA sync fail, not normal

L1 Bithead

Need sync help. About two months ago we noticed our passive 3250 would slowly stop responding. It finally died, with no console response. We opened a ticket, got an RMA, and powered it up. We imported the saved config we took before it died, checked IPs, management settings, etc., and wired it up. The configs were not in sync, which we expected. Sync to peer from active to passive via the GUI fails. Sync via the CLI fails. The response we get back is "there was an error during the syncronation." That's it. There is only one error in the HA log on the active. HA state on the CLI is normal for both. There is no PAN.

2024-04-23 16:09:34.483 -0500 Error: ha_peer_hello_callback(src/ha_peer.c:5373): Group 1 (HA1-MAIN): Peer namespace on peer device missing too long, trying to restart

It is two 3250s running 11.0.2-h4. They have both been rebooted recently due to code upgrades: upgraded passive, reboot; upgraded active, reboot. The passive took over with no issues and the active took control when it came back online (we used preempt). We use the management port as HA1 and eth1/10 as HA2. Eth1/10 has been configured as an HA interface. The management IP is on the management port. The two can ping and ssh to each other. We have a set of 850s set up the same way that are working fine. We cannot direct connect the HA ports because they are 50 miles from each other.

 

We have opened multiple tickets with TAC. It has been over a month with no progress. There were six techs who all advised me to copy active to passive, change mgmt settings, sync. Done it eight times, still fails. Tried it with sync GUI, sync CLI, management plane restart, and force commit. We've defaulted the RMA and tried again. These are production so no "true" HA setup.

 

This is interesting. Passive can get active versions, active can not get passive versions. Idk.

 

[Attached screenshots: Active_HA_no_ip.png, Passive_HA_no_ip.png]


Cyber Elite

If all else fails: have you tried simply setting up the passive one with a basic config (manually) so it can join the cluster, then pushing everything from the active down to the passive?

 

The HA error is a bit weird, log name space... Do the firewalls have very long host names?

Tom Piens
PANgurus - Strata specialist; config reviews, policy optimization

We did default it. Set up HA and management settings. That's it. No luck.

 

Hostname is 13 characters, domain is 22. The only special characters are two '-' in the hostname.

Cyber Elite

@C.Dunn241943,

An interesting test that would remove HA from the equation would be to take the running-config on your active firewall and swap out everything in <deviceconfig> in the XML file for what's in your passive firewall's config. Then load that updated configuration onto the passive firewall and commit. 
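For anyone who wants to script that swap rather than hand-edit the export, here is a minimal Python sketch. It assumes both exports are well-formed XML and that each contains a `<deviceconfig>` element. The function name `swap_deviceconfig` is made up for illustration, and the sketch uses the first `<deviceconfig>` found in the passive export. Review the result by hand before loading it on a production firewall.

```python
import copy
import xml.etree.ElementTree as ET


def swap_deviceconfig(active_xml: str, passive_xml: str) -> str:
    """Return the active config with its <deviceconfig> replaced by the passive's.

    Hypothetical helper: assumes <deviceconfig> exists in both exports.
    """
    active_root = ET.fromstring(active_xml)
    passive_root = ET.fromstring(passive_xml)

    passive_dc = passive_root.find(".//deviceconfig")
    if passive_dc is None:
        raise ValueError("passive config has no <deviceconfig> element")

    # Collect parents first so the tree is not modified while it is being walked.
    parents = [p for p in active_root.iter() if p.find("deviceconfig") is not None]
    if not parents:
        raise ValueError("active config has no <deviceconfig> element")

    for parent in parents:
        for index, child in enumerate(list(parent)):
            if child.tag == "deviceconfig":
                # Keep the element at the same position among its siblings.
                parent.remove(child)
                parent.insert(index, copy.deepcopy(passive_dc))

    return ET.tostring(active_root, encoding="unicode")
```

Usage would be along the lines of reading the two exported files into strings, calling `swap_deviceconfig(active_text, passive_text)`, and writing the result to a new file to import on the passive firewall.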

 

Also just to be clear, I'm assuming that you haven't already done this. Your second paragraph where you mention the passive device taking over, it's not clear to me if it doesn't have your configuration and you just took the outage or if you already did the above step to manually sync things across and the updates simply won't sync automatically. 

I get typing and things get confusing, sorry.

 

"There were six techs who all advised me to copy active to passive, change mgmt settings, sync. Done it eight times, still fails. Tried it with sync GUI, sync CLI, management plane restart, and force commit."

- This is what you were recommending. Take the active config, load it on the passive, change all the HA and management settings to the passive ones, then commit.

 

"The passive took over with no issues and the active took control when it came back online (we used preempt)."

- This is from when the active was rebooted for code upgrade.  We duplicate all the changes on the active to the passive since we can't sync.

 

Latest update:

This is an update on case xxxxxxxx to inform you that we are engaging additional resources on your case to proceed further. We will get back to you within the next 1 Business day.

 

L0 Member

Hi guys,
Have you found a solution in the meantime? Unfortunately, I have a similar issue.


On both firewalls the App, Threat, AV versions are Unknown. The running configuration is synchronized, but I had to synchronize them manually.
But auto sync is still not working afterwards. If I just click on "Click to see local and peer running configuration diff", I get the following error message: "Failed to get content for 'base--peer-running'!" and I found the same error as C.Dunn241943 in mp-log ha_agent.log:

Error: ha_peer_hello_callback(src/ha_peer.c:5373): Group 1 (HA1-MAIN): peer namespace on peer device missing too long, trying to restart
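To check how often this error recurs on each firewall, an exported copy of the log could be scanned with something like the sketch below. The line layout is inferred only from the two lines quoted in this thread (optional timestamp, `Error:`, function, source location, group, link name, message); it is not a documented format. `HaError`, `parse_ha_errors`, and `count_namespace_errors` are hypothetical names.

```python
import re
from typing import Iterable, List, NamedTuple, Optional


class HaError(NamedTuple):
    timestamp: Optional[str]  # None when the line carries no timestamp
    function: str
    source: str
    group: int
    link: str
    message: str


# Pattern assumed from the log lines quoted in this thread.
LINE_RE = re.compile(
    r"^(?:(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3} [+-]\d{4})\s+)?"
    r"Error:\s+(?P<func>\w+)\((?P<src>[^)]+)\):\s+"
    r"Group (?P<group>\d+) \((?P<link>[^)]+)\):\s+(?P<msg>.*)$"
)


def parse_ha_errors(lines: Iterable[str]) -> List[HaError]:
    """Return one HaError per line that matches the assumed error layout."""
    errors = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            errors.append(
                HaError(m["ts"], m["func"], m["src"], int(m["group"]), m["link"], m["msg"])
            )
    return errors


def count_namespace_errors(lines: Iterable[str]) -> int:
    """Count 'peer namespace' errors, ignoring letter case in the message."""
    return sum(
        1 for e in parse_ha_errors(lines) if "peer namespace" in e.message.lower()
    )
```

Running it over the same log from both peers would show whether only one side reports the missing peer namespace.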

 

 
