04-03-2013 07:37 AM
Hello!
We are testing a topology in the lab with two PA-2020s in an active/passive HA cluster. They sit between two pairs of Cisco switches and should act as redundant in-line firewalls. The firewalls connect to the switches through fiber-optic modules on ports e1/13 and e1/14 (both ports are in a monitor group).
We have noticed some strange behaviour, and it is the same with PAN-OS 4.1.9 and 4.1.11.
If we pull out the cable on port e1/13 on the primary/active device, the firewalls fail over, and the secondary/passive device becomes secondary/active. The primary device goes to the non-functional state, moves to passive after a minute, and then moves to the active state again. Of course, the cable is still unplugged, so the failover happens again, and the secondary device becomes active once more. This continues until the primary device moves into the suspended state (after 3 loops by default). Data traffic is heavily affected by the repeated failovers and the spanning-tree recalculations on the Cisco switches.
When we disable preemption, this does not happen, and failover worked perfectly in different scenarios.
So, my question is: should preemption be disabled in a vwire active/passive HA setup? I have not found any reference or configuration best practice for this kind of topology in any document.
To me it seems logical that the firewall should check the state of the monitored interfaces (or path) before trying to resume its active role, even with preemption enabled.
Thanks!
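The check asked for in this post can be sketched in a few lines. This is only an illustration of the expected logic, not how PAN-OS actually decides: the `Unit` class, its fields, and `should_preempt` are hypothetical names, and "lower priority value wins" is an assumed convention.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    name: str
    priority: int             # assumption: lower value = preferred unit
    monitored_links_up: bool  # True only if every link in the monitor group is up


def should_preempt(candidate: Unit, active_peer: Unit) -> bool:
    """Decide whether the passive candidate should take the active role back.

    Hypothetical logic: link health is checked first, so a unit with a
    failed monitored link never preempts, whatever its priority.
    """
    if not candidate.monitored_links_up:
        return False
    return candidate.priority < active_peer.priority


primary = Unit("primary", priority=50, monitored_links_up=False)
secondary = Unit("secondary", priority=100, monitored_links_up=True)
print(should_preempt(primary, secondary))  # cable still unplugged on primary
```

With a check like this, the primary would stay passive for as long as the cable is unplugged, and the loop described above would not start.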
04-03-2013 04:42 PM
Off the top of my head, that sounds like a bug.
Also, even with preemption enabled it shouldn't fail back to unit1 if unit1 isn't 100% available (that is, if it can't ping whatever gateways you are monitoring against). On the other hand, I don't know how PA handles this case.
If you run vwire you could use these two boxes as two independent PA units and put the same security policy on them through shared config in Panorama. This way no session sync is needed and there is no hassle with failover, since each unit lives on its own.
As described in http://www.aristanetworks.com/media/system/pdf/palo_alto_networks_arista.pdf
The setup would be something like (example):
Cisco1: e1/13 (PA1_1), e1/14 (PA2_1)
||
PA1: VWIRE1: int1 (Cisco1_e1_13), int2 (Cisco2_e1_13)
PA2: VWIRE1: int1 (Cisco1_e1_14), int2 (Cisco2_e1_14)
||
Cisco2: e1/13 (PA1_2), e1/14 (PA2_2)
and then make sure that load balancing for the EtherChannel is L3 or lower (L2 etc.). That is, srcip+dstip on both ends is OK, but it's better to use dstip on the outer Cisco and srcip on the inner Cisco (to make life easier for the PA when it identifies heuristic-based applications such as BitTorrent and Skype). srcmac+dstmac would also be OK.
Most modern Cisco gear can use at least 8 paths for a single EtherChannel, in case you would like to scale things up.
Another note: if your Cisco boxes can do virtual chassis, this EtherChannel can be shared by multiple boxes. Otherwise you would need some active/passive mechanism on the Cisco gear, or spanning tree or similar, to disable the "looping" switch. That applies if you have 2 Ciscos as outer and 2 Ciscos as inner switches (and no virtual chassis). It would also mean two VWIREs on each PA unit.
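Why srcip on the inner switch and dstip on the outer switch keeps both directions of a flow on the same PA can be shown with a toy hash. This is only a sketch: the CRC-based `pick_link` is a made-up stand-in for the switch's real (platform-specific) hash, and it assumes member link N on both switches goes to the same PA unit.

```python
import ipaddress
import zlib


def pick_link(ip: str, n_links: int) -> int:
    """Map one IP address to a member link index (toy hash, not Cisco's)."""
    packed = ipaddress.ip_address(ip).packed
    return zlib.crc32(packed) % n_links


def inner_switch_link(src_ip: str, dst_ip: str, n_links: int = 2) -> int:
    # Inner switch hashes on the source IP only (dst_ip is ignored).
    return pick_link(src_ip, n_links)


def outer_switch_link(src_ip: str, dst_ip: str, n_links: int = 2) -> int:
    # Outer switch hashes on the destination IP only (src_ip is ignored).
    return pick_link(dst_ip, n_links)


client, server = "10.0.0.5", "198.51.100.7"
outbound = inner_switch_link(src_ip=client, dst_ip=server)  # client -> server
inbound = outer_switch_link(src_ip=server, dst_ip=client)   # server -> client
print(outbound == inbound)  # both directions hash on the client address
```

Both switches end up hashing on the inside host's address, so a given client always traverses the same PA in both directions, whichever server it talks to.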
04-05-2013 12:48 AM
Today I tested the same setup with a pair of PA-500s on PAN-OS 4.1.9, and the behaviour is the same.
Part of the log file from one of the devices:
2013/04/11 00:10:18info ha state-c 0 HA Group 1: Moved from state Passive to state Active
2013/04/11 00:10:18info ras rasmgr- 0 RASMGR daemon sync all user info to HA peer started.
2013/04/11 00:10:18info vpn keymgr- 0 KEYMGR sync all IPSec SA to HA peer started.
2013/04/11 00:10:19info vpn keymgr- 0 KEYMGR sync all IPSec SA to HA peer exit.
2013/04/11 00:10:19info routing routed- 0 FIB HA sync started when local device becomes master.
2013/04/11 00:10:19info routing routed- 0 FIB HA sync started when peer device becomes passive.
2013/04/11 00:10:20info port link-ch 0 Port 2: Up 100Mb/s-full duplex
2013/04/11 00:10:22info port link-ch 0 Port 1: Up 1Gb/s-full duplex
2013/04/11 00:11:04info port link-ch 0 Port 1: Down 1Gb/s-full duplex <---------------------------------------------------------- pulling out the cable
2013/04/11 00:11:18high ha link-mo 0 HA Group 1: Link group '1' link 'ethernet1/1' is down
2013/04/11 00:11:18high ha link-mo 0 HA Group 1: Link group '1' link 'ethernet1/2' is down
2013/04/11 00:11:18critical ha link-mo 0 HA Group 1: Link group '1' failure; one or more links are down
2013/04/11 00:11:18critical ha state-c 0 HA Group 1: Moved from state Active to state Non-Functional
2013/04/11 00:11:18info ras rasmgr- 0 RASMGR daemon sync all user info to HA peer no longer needed.
2013/04/11 00:11:18info vpn keymgr- 0 KEYMGR sync all IPSec SA to HA peer no longer needed.
2013/04/11 00:11:18critical general general 0 Chassis Master Alarm: HA-event
2013/04/11 00:11:19info routing routed- 0 FIB HA sync started when local device becomes master.
2013/04/11 00:12:18info ha state-c 0 HA Group 1: Moved from state Non-Functional to state Passive
2013/04/11 00:12:18critical general general 0 Chassis Master Alarm: Cleared
2013/04/11 00:12:42info general general 0 User admin accessed Monitor tab
2013/04/11 00:13:29info ha state-c 0 HA Group 1: Moved from state Passive to state Active <---------------------------------------------------------- first transition to active state
2013/04/11 00:13:29info ras rasmgr- 0 RASMGR daemon sync all user info to HA peer started.
2013/04/11 00:13:29info vpn keymgr- 0 KEYMGR sync all IPSec SA to HA peer started.
2013/04/11 00:13:29info vpn keymgr- 0 KEYMGR sync all IPSec SA to HA peer exit.
2013/04/11 00:13:30info routing routed- 0 FIB HA sync started when local device becomes master.
2013/04/11 00:13:30info routing routed- 0 FIB HA sync started when peer device becomes passive.
2013/04/11 00:13:30info port link-ch 0 Port 2: Up 100Mb/s-full duplex
2013/04/11 00:14:29high ha link-mo 0 HA Group 1: Link group '1' link 'ethernet1/1' is down
2013/04/11 00:14:29high ha link-mo 0 HA Group 1: Link group '1' link 'ethernet1/2' is down
2013/04/11 00:14:29critical ha link-mo 0 HA Group 1: Link group '1' failure; one or more links are down
2013/04/11 00:14:29critical ha state-c 0 HA Group 1: Moved from state Active to state Non-Functional
2013/04/11 00:14:29info ras rasmgr- 0 RASMGR daemon sync all user info to HA peer no longer needed.
2013/04/11 00:14:29info vpn keymgr- 0 KEYMGR sync all IPSec SA to HA peer no longer needed.
2013/04/11 00:14:29critical general general 0 Chassis Master Alarm: HA-event
2013/04/11 00:14:30info routing routed- 0 FIB HA sync started when local device becomes master.
2013/04/11 00:15:29info ha state-c 0 HA Group 1: Moved from state Non-Functional to state Passive
2013/04/11 00:15:29critical general general 0 Chassis Master Alarm: Cleared
2013/04/11 00:15:38info general general 0 User admin accessed Monitor tab
2013/04/11 00:16:39info ha state-c 0 HA Group 1: Moved from state Passive to state Active <---------------------------------------------------------- second transition to active state
2013/04/11 00:16:39info ras rasmgr- 0 RASMGR daemon sync all user info to HA peer started.
2013/04/11 00:16:39info vpn keymgr- 0 KEYMGR sync all IPSec SA to HA peer started.
2013/04/11 00:16:40info vpn keymgr- 0 KEYMGR sync all IPSec SA to HA peer exit.
2013/04/11 00:16:40info routing routed- 0 FIB HA sync started when local device becomes master.
2013/04/11 00:16:40info routing routed- 0 FIB HA sync started when peer device becomes passive.
2013/04/11 00:16:41info port link-ch 0 Port 2: Up 100Mb/s-full duplex
2013/04/11 00:17:39high ha link-mo 0 HA Group 1: Link group '1' link 'ethernet1/1' is down
2013/04/11 00:17:39high ha link-mo 0 HA Group 1: Link group '1' link 'ethernet1/2' is down
2013/04/11 00:17:39critical ha link-mo 0 HA Group 1: Link group '1' failure; one or more links are down
2013/04/11 00:17:39critical ha state-c 0 HA Group 1: Moved from state Active to state Non-Functional <---------------------------------------------------------- third transition to active state, but momentarily moves to "non-functional"
2013/04/11 00:17:39info ras rasmgr- 0 RASMGR daemon sync all user info to HA peer no longer needed.
2013/04/11 00:17:39critical ha preempt 0 HA Group 1: Going to Suspended state due to detection of a preemption loop after 3 loops
2013/04/11 00:17:39info vpn keymgr- 0 KEYMGR sync all IPSec SA to HA peer no longer needed.
2013/04/11 00:17:39critical ha state-c 0 HA Group 1: Moved from state Non-Functional to state Suspended
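To pull just the HA state changes out of a log like the one above, a small script is enough. This is a rough sketch that assumes the line layout shown in this post (timestamp at the start, then the "Moved from state X to state Y" text); `state_transitions` and `count_flaps` are hypothetical helper names, not a PAN-OS tool.

```python
import re
from datetime import datetime

# Timestamp at the start of the line, then the state-change message.
TRANSITION = re.compile(
    r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}).*?"
    r"Moved from state ([\w-]+) to state ([\w-]+)"
)


def state_transitions(lines):
    """Return (timestamp, old_state, new_state) for every HA state change."""
    found = []
    for line in lines:
        match = TRANSITION.search(line)
        if match:
            when = datetime.strptime(match.group(1), "%Y/%m/%d %H:%M:%S")
            found.append((when, match.group(2), match.group(3)))
    return found


def count_flaps(transitions):
    """Count how often the unit dropped from Active to Non-Functional."""
    return sum(
        1 for _, old, new in transitions
        if old == "Active" and new == "Non-Functional"
    )


sample = [
    "2013/04/11 00:11:18critical ha state-c 0 HA Group 1: Moved from state Active to state Non-Functional",
    "2013/04/11 00:12:18info ha state-c 0 HA Group 1: Moved from state Non-Functional to state Passive",
]
for when, old, new in state_transitions(sample):
    print(when, old, "->", new)
```

Run against the full log above, this would find three Active to Non-Functional transitions (00:11:18, 00:14:29 and 00:17:39), which matches the "preemption loop after 3 loops" message.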