PA 500 cluster synchronization failure


Hello,

I have a problem with a cluster of PA-500s running PAN-OS 4.1.8.

Configuration synchronization is not working between the members.

After a config change is made on the active member, the following error message appears in the log file of the passive member:

HA Group 1: Running configuration not synchronized after retries

The only way to sync is to go to the CLI on the active member and sync manually (request high-availability sync-to-remote running-config).

No problem before upgrading to 4.1.8...

Regards,

HA

Accepted Solutions

The fix for HA sync will be introduced in software version 4.1.9.

However, a 4.1.8 hotfix is now available. Please open a support ticket with Palo Alto Networks; once the issue is verified, the hotfix will be made available to you.

The 4.1.8 hotfix should take care of HA A/P, A/A, and Panorama HA.

For more details, see this document:

https://live.paloaltonetworks.com/docs/DOC-3890

Regards

Parth

11 Replies

Hello,

So, as far as I understand, after the upgrade to 4.1.8 the automatic HA config sync is no longer triggered after a config change.

Do you see the following behavior?

>When the commit is successful on the active unit, HA sync on the passive unit runs for a long time.

>No jobs are seen on the passive device for HA sync (admin@PA> show jobs processed).

>Running Configuration on the passive device shows: synchronization in progress

>After a few minutes, the config on the passive device goes out of sync.

It will show the following:

Running Configuration: not synchronized

Out-of-sync Reason: Failure to complete config sync

>However, at this time the running configuration on the active device shows "synchronized".

If the above is the case, please open a support ticket with Palo Alto Networks to get the issue looked into.
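
For anyone who wants to check this symptom from saved output rather than by eye, here is a minimal sketch. It is not a Palo Alto Networks tool: it assumes you have pasted the HA status text from the passive unit into a string, and it relies only on the two field labels quoted in this thread (Running Configuration and Out-of-sync Reason). The function names are hypothetical.

```python
def parse_ha_sync_status(text):
    """Collect 'Label: value' pairs from pasted HA status text."""
    fields = {}
    for line in text.splitlines():
        label, sep, value = line.partition(":")
        if sep:  # keep only lines that actually contain a colon
            fields[label.strip()] = value.strip()
    return fields


def is_out_of_sync(fields):
    """True when the running configuration is reported as not synchronized."""
    return fields.get("Running Configuration", "").lower() == "not synchronized"


# Sample text copied from the status lines quoted in this thread.
sample = """Running Configuration: not synchronized
Out-of-sync Reason: Failure to complete config sync"""

status = parse_ha_sync_status(sample)
if is_out_of_sync(status):
    print("Passive unit out of sync:", status.get("Out-of-sync Reason", "unknown"))
```

The reason text differs between reports in this thread, so the sketch only prints it and does not match on it.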

I may have seen this issue while reproducing it in-house, but I'm curious to get into the details.

Regards

Parth

Hello,

First, thanks for your comment.

Q: Do you see the following behavior?

A: No jobs are seen on the passive device for HA sync (admin@PA> show jobs processed).

After a few minutes, the config on the passive device goes out of sync.

It shows the following:

Running Configuration: not synchronized

Out-of-sync Reason: Running configuration not synchronized after retries

Q: However, at this time the running configuration on the active device shows "synchronized".

A: Exactly.

I had to upgrade from 4.1.6 to 4.1.8 because of bug ID 43575 (mgmt-plane unresponsive).

This is the only problem I have with 4.1.8.

Regards,

HA

Hello,


When the commit is successful on the active unit, HA sync on the passive unit runs for a long time.

At that point, while the automatic synchronization is in progress, run the following command on the passive device:

admin@PA-500> tail lines 100 follow yes mp-log ha_agent.log

Look for this error:

mp \ ha_agent.log   ha_state_cfg_from_insync_to_outsync(src/ha_state_cfg.c:609): peer group 1 has changed the md5, waiting for an update

Submit all these details by opening up a support ticket.
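
If you have saved the ha_agent.log output to a file, a small script can pick out the relevant lines before you attach them to the case. This is only a sketch under assumptions: the two message strings are the ones quoted in this thread, the log is assumed to be plain text with one entry per line, and the names MARKERS and find_sync_markers are made up for the illustration.

```python
import re

# Message fragments taken from the log lines quoted in this thread.
MARKERS = {
    "md5_changed": re.compile(r"has changed the md5, waiting for an update"),
    "sync_failed": re.compile(r"Running configuration not synchronized after retries"),
}


def find_sync_markers(log_text):
    """Return (marker_name, line) pairs for every matching log line, in order."""
    hits = []
    for line in log_text.splitlines():
        for name, pattern in MARKERS.items():
            if pattern.search(line):
                hits.append((name, line.strip()))
    return hits


sample_log = """Oct 03 12:07:56 ha_state_cfg_from_insync_to_outsync(src/ha_state_cfg.c:609): peer group 1 has changed the md5, waiting for an update
Oct 03 12:07:57 Error: ha_ping_peer_miss(src/ha_ping.c:554): Missed 1 ping timeouts out of 3 (ha1-backup)
Oct 03 12:22:56 Warning: ha_event_log(src/ha_event.c:47): HA Group 1: Running configuration not synchronized after retries"""

for name, line in find_sync_markers(sample_log):
    print(name, "->", line)
```

Lines that were wrapped by the terminal (a message split across two lines) will not match, so it may help to widen the terminal or rejoin wrapped lines first.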

Regards

Parth

Also, when you open a support ticket, please make sure you attach the tech support files from both the active and passive units to the case.

How to generate the tech support (TS) file:

From the Palo Alto Networks device web interface:

1) Go to the Device tab --> Support

2) Click Generate Tech Support File

3) Once generated, download it to your desktop

4) Log into your case management tool to open the case, scroll down towards the bottom, and click "Upload File"

5) Click OK

Let me know if the above details help you proceed with the next steps.

Regards

Parth

Hello,

Here is the result of the command on both firewalls:

Active FW

---------

Oct 03 12:07:14 cfgagent_flags_callback(pan_cfgagent.c:178): ha_agent: cfg agent received flags from server

Oct 03 12:07:14 cfgagent_flags_callback(pan_cfgagent.c:182): new flags=0x6

Oct 03 12:07:14 cfgagent_config_callback(pan_cfgagent.c:203): ha_agent: cfg agent received configuration from server

Oct 03 12:07:14 cfgagent_config_callback(pan_cfgagent.c:219): config length=30532

Oct 03 12:07:14 ha_cfgagent_phase1(src/ha_cfgagent.c:514): start

Oct 03 12:07:14 ha_cfgagent_phase1_callback(src/ha_cfgagent.c:454): start

Oct 03 12:07:14 ha_state_cfg_commit_start(src/ha_state_cfg.c:512): Starting monitor hold (no timeout) during phase1

Oct 03 12:07:14 ha_cfgagent_phase1_callback(src/ha_cfgagent.c:485): sending back true for p1done

Oct 03 12:07:49 ha_cfgagent_phase2(src/ha_cfgagent.c:691): start

Oct 03 12:07:49 ha_cfgagent_phase2_callback(src/ha_cfgagent.c:639): start

Oct 03 12:07:49 ha_cfgagent_phase2_callback(src/ha_cfgagent.c:666): sending back true for p2done

Oct 03 12:07:49 ha_state_cfg_commit_succeed(src/ha_state_cfg.c:563): Starting monitor hold after commit

Oct 03 12:07:49 ha_state_start_monitor_hold(src/ha_state.c:973): Starting initial monitor hold for group 1; linkmon monitored

        Ignoring link and path monitoring failures due to an HA state transition

Oct 03 12:07:53 Error: ha_ping_send(src/ha_ping.c:488): Unable to send icmp packet:(errno: 22) Invalid argument

Oct 03 12:07:53 Error: ha_ping_send(src/ha_ping.c:488): Unable to send icmp packet:(errno: 22) Invalid argument

Oct 03 12:07:54 Error: ha_ping_send(src/ha_ping.c:488): Unable to send icmp packet:(errno: 22) Invalid argument

Oct 03 12:07:54 Received HA1 MAC address: 00:1b:17:54:f3:13

Oct 03 12:07:54 Received HA2 MAC address: 00:1b:17:54:f3:15

Oct 03 12:07:54 Received HA2 MAC address: 00:1b:17:54:f3:15

Oct 03 12:07:55 Error: ha_ping_peer_miss(src/ha_ping.c:554): Missed 1 ping timeouts out of 3 (ha1)

Oct 03 12:07:56 Error: ha_ping_send(src/ha_ping.c:488): Unable to send icmp packet:(errno: 22) Invalid argument

Oct 03 12:07:56 ha_sysd_config_md5_notifier_callback(src/ha_sysd.c:2615): Got new config md5: ed4dfc17e96c6dacdc0f8a9e72f60fa4

Oct 03 12:07:56 ha_state_cfg_md5_set(src/ha_state_cfg.c:411): We were in sync and now we are out of sync; autocommit no; ha-sync yes; cfg-sync-off no

Oct 03 12:07:56 ha_peer_send_hello(src/ha_peer.c:3950): Group 1 (HA1-MAIN): Sending hello message

Hello Msg

---------

flags    : 0x1 (preempt:)

state    : Active (5)

priority : 10

cookie   : 59474

num tlvs : 2

  Printing out 2 tlvs

  TLV[1]: type 2 (CONFIG_MD5SUM); len 33; value:

    65643464 66633137 65393663 36646163 64633066 38613965

    37326636 30666134 00

  TLV[2]: type 11 (SYSD_PEER_DOWN); len 4; value:

    00000000

Oct 03 12:07:57 Error: ha_ping_send(src/ha_ping.c:488): Unable to send icmp packet:(errno: 22) Invalid argument

Oct 03 12:07:57 Received HA1-Backup MAC address: 00:1b:17:54:f3:14

Oct 03 12:07:57 Received HA2-Backup MAC address: 00:1b:17:54:f3:16

Oct 03 12:07:57 Received HA2-Backup MAC address: 00:1b:17:54:f3:16

Oct 03 12:07:58 Error: ha_ping_peer_miss(src/ha_ping.c:554): Missed 1 ping timeouts out of 3 (ha1-backup)

Oct 03 12:08:49 ha_state_monitor_hold_callback(src/ha_state.c:1688): Group 1: ending initial monitor hold; no longer ignoring link and path monitoring failures due to an HA state transition

Passive FW

----------

Oct 03 12:07:54 Error: ha_ping_peer_miss(src/ha_ping.c:554): Missed 1 ping timeouts out of 3 (ha1)

Oct 03 12:07:56 ha_peer_recv_hello(src/ha_peer.c:3998): Group 1 (HA1-MAIN): Receiving hello message

Msg Hdr

-------

version : 1

groupID : 1

type    : Hello (2)

token   : 0x4e35

flags   : 0x1 (req:)

length  : 81

  Hello Msg

  ---------

  flags    : 0x1 (preempt:)

  state    : Active (5)

  priority : 10

  cookie   : 59474

  num tlvs : 2

    Printing out 2 tlvs

    TLV[1]: type 2 (CONFIG_MD5SUM); len 33; value:

      65643464 66633137 65393663 36646163 64633066 38613965

      37326636 30666134 00

    TLV[2]: type 11 (SYSD_PEER_DOWN); len 4; value:

      00000000

Oct 03 12:07:56 ha_peer_recv_tlv(src/ha_peer.c:2898): Group 1 (HA1-MAIN): Received TLVs:

Printing out 2 tlvs

TLV[1]: type 2 (CONFIG_MD5SUM); len 33; value:

  65643464 66633137 65393663 36646163 64633066 38613965

  37326636 30666134 00

TLV[2]: type 11 (SYSD_PEER_DOWN); len 4; value:

  00000000

Oct 03 12:07:56 ha_state_cfg_md5_set(src/ha_state_cfg.c:411): We were in sync and now we are out of sync; autocommit no; ha-sync no; cfg-sync-off no

Oct 03 12:07:56 ha_sysd_dev_cfgsync_update(src/ha_sysd.c:1474): Set dev cfgsync to Committing

Oct 03 12:07:56 ha_state_cfg_from_insync_to_outsync(src/ha_state_cfg.c:609): peer group 1 has changed the md5, waiting for an update

Oct 03 12:07:57 Error: ha_ping_peer_miss(src/ha_ping.c:554): Missed 1 ping timeouts out of 3 (ha1-backup)

Oct 03 12:22:56 ha_state_cfg_sync_callback(src/ha_state_cfg.c:751): ha_state_cfg_sync_callback: retries: 4; insync: no

Oct 03 12:22:56 Warning: ha_event_log(src/ha_event.c:47): HA Group 1: Running configuration not synchronized after retries

Oct 03 12:22:56 ha_sysd_dev_cfgsync_update(src/ha_sysd.c:1474): Set dev cfgsync to Out-of-Sync
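
Two details in these logs can be checked with a few lines of Python. The sketch below decodes the CONFIG_MD5SUM TLV hex dump into text so it can be compared with the "Got new config md5" line on the active unit, and it measures how long the passive unit waited between going out of sync and giving up. It assumes the TLV value is a NUL-terminated ASCII string (which matches len 33 in the dump) and that both timestamps fall on the same day; the function names are hypothetical.

```python
from datetime import datetime


def decode_md5_tlv(hex_dump):
    """Turn a whitespace-separated hex dump into the ASCII string it encodes."""
    raw = bytes.fromhex("".join(hex_dump.split()))
    return raw.rstrip(b"\x00").decode("ascii")  # drop the trailing NUL byte


def elapsed_seconds(start_stamp, end_stamp):
    """Seconds between two 'Mon DD HH:MM:SS' stamps, assuming the same day."""
    fmt = "%H:%M:%S"
    start = datetime.strptime(start_stamp.split()[-1], fmt)
    end = datetime.strptime(end_stamp.split()[-1], fmt)
    return (end - start).total_seconds()


# TLV[1] value copied from the hello message above.
tlv_value = """
65643464 66633137 65393663 36646163 64633066 38613965
37326636 30666134 00
"""

md5_text = decode_md5_tlv(tlv_value)
print("md5 carried in hello message:", md5_text)

# Timestamps copied from the passive unit's log above.
wait = elapsed_seconds("Oct 03 12:07:56", "Oct 03 12:22:56")
print("minutes until 'not synchronized after retries':", wait / 60)
```

With the values from this post, the decoded string is ed4dfc17e96c6dacdc0f8a9e72f60fa4, the same md5 the active unit logged at 12:07:56, and the passive unit waited 15 minutes before reporting the failure.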

Regards,

HA

Please open a support ticket with all the data requested in the posts above.

Thanks

Parth

We are seeing the same issue since the upgrade to 4.1.8. Is there a way to manually invoke a config sync in the meantime?

John,

Yes, a manual sync from the active device to the passive device should work.

From the CLI, use the following command:

admin@Lab-59-PA-500(active)> request high-availability sync-to-remote running-config

Regards

Parth

Hello,

I got the following official response from the TAC:

"We are aware of this issue which was introduced after PanOS 4.1.8.
  Our devteam is currently working on a fix for this issue.
  The workaround for now is to manually do the sync command in CLI."


Regards,


HA

Hello,

I updated a cluster of PA-500s yesterday.

The sync issue is now solved...

Regards,

HA
