Virtual-wire active/passive HA issue

Hello!

We are testing a topology in the lab with two PA-2020s in an active/passive HA cluster. They sit between two pairs of Cisco switches and should act as redundant in-line firewalls. They connect to the switches through fiber-optic (FO) modules on ports e1/13 and e1/14 (these ports are in a monitor group).

We have noticed some strange behaviour, and it is the same on PAN-OS 4.1.9 and 4.1.11.

If we pull out the cable on port e1/13 on the primary/active device, the firewalls fail over, and the secondary/passive device becomes secondary/active. The now primary/passive device goes to a non-functional state, after a minute to passive, and then moves to the active state again. Of course, the cable is still unplugged, so the failover happens again, and the secondary device becomes active once more. The process continues until the primary device moves into a suspended state (after 3 loops by default). Data traffic is heavily affected by the repeated failovers and the spanning-tree recalculations on the Cisco switches.

When we disable preemption, this does not happen, and failover worked perfectly across the different scenarios we tested.

So, my question is: should preemption be disabled in vwire active/passive HA? I have not found any reference or configuration best practice for this kind of topology in any document.

To me it seems logical that the firewall should check the state of the monitored interfaces (or path) before trying to resume its active role, even with preemption enabled.
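A minimal sketch of the check described above. This is a hypothetical model, not PAN-OS's actual logic; the `Unit` class, the `should_preempt` function, and the priority values are assumptions made up for illustration (the sketch treats a lower priority value as "preferred").

```python
from dataclasses import dataclass


@dataclass
class Unit:
    name: str
    priority: int             # assumption: lower value = preferred unit
    monitored_links_up: bool   # True only if every monitored link is up


def should_preempt(candidate: Unit, current_active: Unit) -> bool:
    """Let the candidate take over only if it is preferred AND healthy."""
    if not candidate.monitored_links_up:
        # A unit with a failed monitored link must never reclaim the active role.
        return False
    return candidate.priority < current_active.priority


# Hypothetical units mirroring the lab scenario: cable pulled on the primary.
primary = Unit("primary", priority=50, monitored_links_up=False)
secondary = Unit("secondary", priority=100, monitored_links_up=True)

print(should_preempt(primary, secondary))  # False: link is still down

primary.monitored_links_up = True          # cable plugged back in
print(should_preempt(primary, secondary))  # True: preferred and healthy
```

With a guard like this, the preferred unit would stay passive while its cable is unplugged, so the preemption loop would not start.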

Thanks!

2 REPLIES


Offhand, that sounds like a bug.

Also, even with preemption enabled, it shouldn't fail back to unit1 if unit1 isn't 100% available (that is, if it can't ping whatever gateways you are monitoring). On the other hand, I don't know how PA handles this case.

If you run vwire, you could use these two boxes as two independent PA units and put the same security policy on them through shared config in Panorama. This way no session sync is needed, and there is no hassle with failover, since each unit lives on its own.

As described in http://www.aristanetworks.com/media/system/pdf/palo_alto_networks_arista.pdf

The setup would be something like (example):

Cisco1: e1/13 (PA1_1), e1/14 (PA2_1)

||

PA1: VWIRE1: int1 (Cisco1_e1_13), int2 (Cisco2_e1_13)

PA2: VWIRE1: int1 (Cisco1_e1_14), int2 (Cisco2_e1_14)

||

Cisco2: e1/13 (PA1_2), e1/14 (PA2_2)

Then make sure that load balancing for the EtherChannel is L3 or lower (L2, etc.). That is, srcip+dstip on both ends is OK, but it's better to use dstip on the outer Cisco and srcip on the inner Cisco (to make life easier for the PA when it identifies BitTorrent, Skype, and other heuristic-based stuff). srcmac+dstmac would also be OK.
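To illustrate why those hash choices keep both directions of a flow on the same PA, here is a toy sketch. The modulo hash below is an assumption for illustration only (real Cisco hash algorithms differ), and the function names and example addresses are made up. It also assumes that "inner" means the switch facing the internal hosts and that both switches order their EtherChannel members the same way.

```python
import ipaddress


def pick_link(key_ip: str, n_links: int = 2) -> int:
    """Toy hash: map one IP address to an EtherChannel member index."""
    return int(ipaddress.ip_address(key_ip)) % n_links


def inner_switch_link(src: str, dst: str, n_links: int = 2) -> int:
    # Inner switch hashes on srcip (traffic leaving the internal network).
    return pick_link(src, n_links)


def outer_switch_link(src: str, dst: str, n_links: int = 2) -> int:
    # Outer switch hashes on dstip (traffic returning to the internal network).
    return pick_link(dst, n_links)


def symmetric_link(src: str, dst: str, n_links: int = 2) -> int:
    # srcip+dstip variant: XOR is order-independent, so both directions match.
    a = int(ipaddress.ip_address(src))
    b = int(ipaddress.ip_address(dst))
    return (a ^ b) % n_links


host, server = "10.0.0.5", "198.51.100.7"   # hypothetical addresses

# Request leaves through the inner switch, reply comes back through the outer.
print(inner_switch_link(host, server) == outer_switch_link(server, host))  # True
print(symmetric_link(host, server) == symmetric_link(server, host))        # True
```

With srcip on the inner switch and dstip on the outer switch, every session of a given internal host also lands on the same PA, which is what helps the heuristic-based identification.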

Most modern Cisco gear can use at least 8 paths for a single EtherChannel, in case you would like to scale things up.

Another note: if your Cisco boxes can do virtual chassis, then this EtherChannel can be shared by multiple boxes (otherwise you would need some active/passive mechanism on the Cisco gear, or spanning tree or similar, to disable the "looping" switch). That applies if you have 2 Ciscos as outer and 2 Ciscos as inner switches (and no virtual chassis). This would also mean two VWIREs on each PA unit.

Today I tested the same setup with a pair of PA-500s on PAN-OS 4.1.9, and the behaviour is the same.

Part of the log file from one of the devices:

2013/04/11 00:10:18 info     ha             state-c 0  HA Group 1: Moved from state Passive to state Active
2013/04/11 00:10:18 info     ras            rasmgr- 0  RASMGR daemon sync all user info to HA peer started.
2013/04/11 00:10:18 info     vpn            keymgr- 0  KEYMGR sync all IPSec SA to HA peer started.
2013/04/11 00:10:19 info     vpn            keymgr- 0  KEYMGR sync all IPSec SA to HA peer exit.
2013/04/11 00:10:19 info     routing        routed- 0  FIB HA sync started when local device becomes master.
2013/04/11 00:10:19 info     routing        routed- 0  FIB HA sync started when peer device becomes passive.
2013/04/11 00:10:20 info     port           link-ch 0  Port  2: Up   100Mb/s-full duplex
2013/04/11 00:10:22 info     port           link-ch 0  Port  1: Up   1Gb/s-full duplex
2013/04/11 00:11:04 info     port           link-ch 0  Port  1: Down 1Gb/s-full duplex  <-- pulling out the cable
2013/04/11 00:11:18 high     ha             link-mo 0  HA Group 1: Link group '1' link 'ethernet1/1' is down
2013/04/11 00:11:18 high     ha             link-mo 0  HA Group 1: Link group '1' link 'ethernet1/2' is down
2013/04/11 00:11:18 critical ha             link-mo 0  HA Group 1: Link group '1' failure; one or more links are down
2013/04/11 00:11:18 critical ha             state-c 0  HA Group 1: Moved from state Active to state Non-Functional
2013/04/11 00:11:18 info     ras            rasmgr- 0  RASMGR daemon sync all user info to HA peer no longer needed.
2013/04/11 00:11:18 info     vpn            keymgr- 0  KEYMGR sync all IPSec SA to HA peer no longer needed.
2013/04/11 00:11:18 critical general        general 0  Chassis Master Alarm: HA-event
2013/04/11 00:11:19 info     routing        routed- 0  FIB HA sync started when local device becomes master.
2013/04/11 00:12:18 info     ha             state-c 0  HA Group 1: Moved from state Non-Functional to state Passive
2013/04/11 00:12:18 critical general        general 0  Chassis Master Alarm: Cleared
2013/04/11 00:12:42 info     general        general 0  User admin accessed Monitor tab
2013/04/11 00:13:29 info     ha             state-c 0  HA Group 1: Moved from state Passive to state Active  <-- first transition to active state
2013/04/11 00:13:29 info     ras            rasmgr- 0  RASMGR daemon sync all user info to HA peer started.
2013/04/11 00:13:29 info     vpn            keymgr- 0  KEYMGR sync all IPSec SA to HA peer started.
2013/04/11 00:13:29 info     vpn            keymgr- 0  KEYMGR sync all IPSec SA to HA peer exit.
2013/04/11 00:13:30 info     routing        routed- 0  FIB HA sync started when local device becomes master.
2013/04/11 00:13:30 info     routing        routed- 0  FIB HA sync started when peer device becomes passive.
2013/04/11 00:13:30 info     port           link-ch 0  Port  2: Up   100Mb/s-full duplex
2013/04/11 00:14:29 high     ha             link-mo 0  HA Group 1: Link group '1' link 'ethernet1/1' is down
2013/04/11 00:14:29 high     ha             link-mo 0  HA Group 1: Link group '1' link 'ethernet1/2' is down
2013/04/11 00:14:29 critical ha             link-mo 0  HA Group 1: Link group '1' failure; one or more links are down
2013/04/11 00:14:29 critical ha             state-c 0  HA Group 1: Moved from state Active to state Non-Functional
2013/04/11 00:14:29 info     ras            rasmgr- 0  RASMGR daemon sync all user info to HA peer no longer needed.
2013/04/11 00:14:29 info     vpn            keymgr- 0  KEYMGR sync all IPSec SA to HA peer no longer needed.
2013/04/11 00:14:29 critical general        general 0  Chassis Master Alarm: HA-event
2013/04/11 00:14:30 info     routing        routed- 0  FIB HA sync started when local device becomes master.
2013/04/11 00:15:29 info     ha             state-c 0  HA Group 1: Moved from state Non-Functional to state Passive
2013/04/11 00:15:29 critical general        general 0  Chassis Master Alarm: Cleared
2013/04/11 00:15:38 info     general        general 0  User admin accessed Monitor tab
2013/04/11 00:16:39 info     ha             state-c 0  HA Group 1: Moved from state Passive to state Active  <-- second transition to active state
2013/04/11 00:16:39 info     ras            rasmgr- 0  RASMGR daemon sync all user info to HA peer started.
2013/04/11 00:16:39 info     vpn            keymgr- 0  KEYMGR sync all IPSec SA to HA peer started.
2013/04/11 00:16:40 info     vpn            keymgr- 0  KEYMGR sync all IPSec SA to HA peer exit.
2013/04/11 00:16:40 info     routing        routed- 0  FIB HA sync started when local device becomes master.
2013/04/11 00:16:40 info     routing        routed- 0  FIB HA sync started when peer device becomes passive.
2013/04/11 00:16:41 info     port           link-ch 0  Port  2: Up   100Mb/s-full duplex
2013/04/11 00:17:39 high     ha             link-mo 0  HA Group 1: Link group '1' link 'ethernet1/1' is down
2013/04/11 00:17:39 high     ha             link-mo 0  HA Group 1: Link group '1' link 'ethernet1/2' is down
2013/04/11 00:17:39 critical ha             link-mo 0  HA Group 1: Link group '1' failure; one or more links are down
2013/04/11 00:17:39 critical ha             state-c 0  HA Group 1: Moved from state Active to state Non-Functional  <-- third transition to active state, but momentarily moves to "non-functional"
2013/04/11 00:17:39 info     ras            rasmgr- 0  RASMGR daemon sync all user info to HA peer no longer needed.
2013/04/11 00:17:39 critical ha             preempt 0  HA Group 1: Going to Suspended state due to detection of a preemption loop after 3 loops
2013/04/11 00:17:39 info     vpn            keymgr- 0  KEYMGR sync all IPSec SA to HA peer no longer needed.
2013/04/11 00:17:39 critical ha             state-c 0  HA Group 1: Moved from state Non-Functional to state Suspended
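For anyone who wants to spot the same loop in their own system log, here is a small sketch that pulls the HA state changes out of a log like the one above. The function name `ha_transitions` and the regular expression are assumptions based only on the lines shown in this thread, not on any documented PAN-OS log format. The sample is a shortened copy of the log above, and the pattern tolerates the severity being fused to the timestamp.

```python
import re

# Matches lines such as:
#   2013/04/11 00:13:29 info ha state-c 0 HA Group 1: Moved from state Passive to state Active
LINE_RE = re.compile(
    r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\s*[a-z]+\s.*?"
    r"Moved from state ([\w-]+) to state ([\w-]+)"
)


def ha_transitions(log_text):
    """Return (timestamp, old_state, new_state) for every HA state change."""
    found = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            found.append(match.groups())
    return found


SAMPLE = """
2013/04/11 00:11:04 info port link-ch 0 Port 1: Down 1Gb/s-full duplex
2013/04/11 00:11:18 critical ha state-c 0 HA Group 1: Moved from state Active to state Non-Functional
2013/04/11 00:12:18 info ha state-c 0 HA Group 1: Moved from state Non-Functional to state Passive
2013/04/11 00:13:29 info ha state-c 0 HA Group 1: Moved from state Passive to state Active
2013/04/11 00:14:29 critical ha state-c 0 HA Group 1: Moved from state Active to state Non-Functional
2013/04/11 00:15:29 info ha state-c 0 HA Group 1: Moved from state Non-Functional to state Passive
2013/04/11 00:16:39 info ha state-c 0 HA Group 1: Moved from state Passive to state Active
2013/04/11 00:17:39 critical ha state-c 0 HA Group 1: Moved from state Active to state Non-Functional
2013/04/11 00:17:39critical ha state-c 0 HA Group 1: Moved from state Non-Functional to state Suspended
"""

transitions = ha_transitions(SAMPLE)
# Count how often the unit reclaimed the active role after the cable was pulled.
reactivations = sum(1 for _, old, new in transitions if (old, new) == ("Passive", "Active"))
suspended = any(new == "Suspended" for _, _, new in transitions)

for ts, old, new in transitions:
    print(f"{ts}  {old} -> {new}")
print("reactivations:", reactivations, "suspended:", suspended)
```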
