HA Active/Active Mode with Multi VSYS


L3 Networker

Hi All,

 

Is it possible, on a multi-VSYS Palo Alto pair in Active/Active HA mode, to have one VSYS active-primary on the first firewall and a second VSYS active-primary on the second firewall? I've done this on Cisco Active/Active firewalls, but I need to do it on a Palo Alto pair.

 

Regards

Adrian

 

Accepted Solutions

Cyber Elite

hi @a.jones 

 

On the Palo Alto chassis, HA is achieved at the system level, meaning that all components are subservient to the state of the chassis, so you can't have a vsys that is active on one chassis but not on the other.

 

What you can do to achieve a sort of 'vsys spread' among the peers is to use floating IPs with a preference priority for one member or the other, depending on where you want the specific vsys to receive its sessions.

floating IP.png

Tom Piens
PANgurus - Strata specialist; config reviews, policy optimization
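To make the floating IP idea above concrete, here is a minimal Python sketch. It is only a model of the behaviour, not PAN-OS configuration. The addresses, vsys names, and device IDs are hypothetical, and it assumes that the peer with the lower priority value owns the floating IP while both peers are healthy, which should be checked against the PAN-OS documentation for your release.

```python
from dataclasses import dataclass


@dataclass
class FloatingIP:
    address: str    # hypothetical gateway address used by one vsys
    vsys: str       # hypothetical vsys name
    priority: dict  # device ID -> priority (assumption: lower value wins)


def owner(fip, healthy_devices):
    """Return the device ID that should own the floating IP, or None."""
    candidates = [dev for dev in fip.priority if dev in healthy_devices]
    if not candidates:
        return None
    return min(candidates, key=lambda dev: fip.priority[dev])


# Spread two vsys across the pair by giving each floating IP
# the opposite preference.
FLOATING_IPS = [
    FloatingIP("10.1.1.1", "vsys1", {0: 0, 1: 255}),
    FloatingIP("10.2.1.1", "vsys2", {0: 255, 1: 0}),
]

if __name__ == "__main__":
    for fip in FLOATING_IPS:
        print(fip.vsys, "both up ->", owner(fip, {0, 1}),
              "| device 1 down ->", owner(fip, {0}))
```

With both peers healthy, each vsys receives its sessions on a different firewall; if one peer fails, the survivor owns both addresses.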


Agreed, don't think of Active/Active as Active Primary and Active Secondary. Think of them as equal partners, both able to process or hand off the same traffic simultaneously. If you really want to do any kind of traffic management and push certain traffic one direction or the other, you need to do this with your routing protocols and NOT a setting on the firewall. Usually this is done by using Anycast with your default gateway so that two physically disparate locations will prefer the firewall closest to them and not have to traverse or hairpin through some kind of site-to-site interconnect. Does this help?

 

PS - I love PAN's Active/Active implementation but I only consider it for very specific use cases.  If your firewalls are stacked together at the same location, you most likely should be using Active/Passive instead.  The goal of Active/Active is NOT to increase throughput.  If this is the mindset you are taking, you will most likely be VERY disappointed.


18 Replies

I ran active/active for nearly 2 years.

I would recommend staying away from Active/Active. What they call A/A is not production ready,

especially if you have asymmetric routing through the nodes

or if you use NAT.

Also, if you have OSPF, this can cause asymmetric routing and issues.

 

 

Interesting, I have run Active/Active with OSPF and NAT without a single issue.  I'd be curious to know what version of PAN-OS you were using and how you were setting up NAT.  The biggest hurdle is understanding how to set up your dynamic routing properly and how to set up NAT with floating IPs to make it work correctly.

I had issues with VIPs and the way they were implemented.

A packet would enter one node but would be routed out the OSPF backbone to the other node. This caused issues because the return path would be different, which affected session setup and lost packets. /32 routes were taking preference over /24 routes and forcing packets via a strange path. I should mention this was with a VIP for the default gateway.

With NAT, I was trying to set up a NAT pool with a single IP and port overload for the internet. It can't be active on both nodes; that is not supported, or so I was told by support.

It's been a year plus now, so it's a bit hard to remember all the details. But I gave up the fight after a long chat with an L2/L3 support person while working through some issue.

From memory, the A/A NAT pool setup was to have different SNAT addresses, one on each node.

Correct, you cannot use a single IP. You have to have 2 different rules, one for each side. They can, however, fail over for each other. I don't see why this would be a problem, though, unless you don't have some kind of control (i.e., BGP) on the internet side.

Because there would be asymmetric routing, going out one node and then returning via the other node.

There were (at the time) lots of issues. Plus, I wanted to use just 1 IP; I didn't see the reason to waste (duplicate) my IPs.

I also had issues with GP: portal, gateway, and NAT.

Plus (thinking of other things now), I used a load-based VIP as the default gateway (DGW) for my VLANs. Part of my DGW monitoring from VMs was to ping the DGW, and only half would work, because the /32 for the VIP would only be assigned to the active node, and sometimes, just the way it worked, the /32 wouldn't respond on the non-active node.

 

So, take node a and node b:

node a 192.168.1.2/24

node b 192.168.1.3/24

 

Node a is active, so it gets 192.168.1.1/32.

Sounds okay so far.

Turn on OSPF as passive for 192.168.1.0/24,

but active for your OSPF backbone, say 192.168.255.0/24, on a different interface than the one above.

So take a ping for 192.168.1.1 going to the 192.168.2.3 device: the device sees a route for 192.168.1.1/32 via the OSPF backbone, so it sends the ping out the OSPF backbone interface. Why? Because a /32 is more specific than a /24. So the ping goes not over the special node-to-node connection but out the normal interface. Of course the other node then asks: why am I getting a 192.168.1.0/24 packet from the OSPF backbone interface?
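The /32-versus-/24 behaviour described above is plain longest-prefix matching. A minimal Python sketch follows, assuming the non-active node holds the connected /24 on its LAN interface and has learned the VIP's /32 over the OSPF backbone. The interface labels and the routing table layout are hypothetical, not taken from a real firewall.

```python
import ipaddress


def lookup(routes, destination):
    """Return the next hop of the most specific route covering destination."""
    dest = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(prefix), next_hop)
        for prefix, next_hop in routes
        if dest in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None
    # Longest prefix wins: a /32 beats a /24 for the same destination.
    return max(matches, key=lambda match: match[0].prefixlen)[1]


# Hypothetical routing table on the non-active node (node b).
NODE_B_ROUTES = [
    ("192.168.1.0/24", "lan-interface (connected)"),
    ("192.168.255.0/24", "backbone-interface (connected)"),
    ("192.168.1.1/32", "backbone-interface (OSPF, via node a)"),
]

if __name__ == "__main__":
    # Traffic for the VIP leaves via the backbone, not the LAN segment.
    print(lookup(NODE_B_ROUTES, "192.168.1.1"))
    # Any other host in the /24 still uses the connected LAN route.
    print(lookup(NODE_B_ROUTES, "192.168.1.50"))
```

Only the VIP itself is pulled onto the backbone path; every other address in 192.168.1.0/24 still follows the connected route, which is why the symptom looked intermittent.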

 

 

 

 

Yes. Handling asymmetric routing is the biggest "use case"/reason you would use Active/Active, and it handles it very well. Is there a reason you are avoiding asymmetric routing?

Um, it didn't work well; that was the problem.

Apologies, as I've been away.

Thanks all. I will try the weighted solution when I migrate.

Bringing back an old thread.

But a question about A/A: how do you do SNAT if you want all traffic to SNAT to 1 IP?

 

Great question!

 

1. Make sure you are using a use case that merits Active/Active.  Usually this is two data centers very close together (HA3 = LOW LATENCY REQUIREMENTS).  This also means two ISP egress points and therefore two different SNAT routable interfaces.

2. You have to have your dynamic routing set up in a way that allows you to have movable/dynamic SNAT routable interfaces (so each firewall can take over if one side goes down).  You can also set this in a way that forces traffic to one firewall for egress.  But then you might as well use Active/Passive.

3. If your firewalls are in the same facility, why are you using Active/Active? Just use Active/Passive. A lot of people believe there is a throughput gain from Active/Active, and in "certain" scenarios there is. BUT, remember what happens if you lose a member. You have to push all that "theoretical" throughput through a single member. Plan capacity on the assumption that one of your firewalls is down.

4. Active/Active is designed to handle the asymmetric routing paradigm. If your firewalls have the same SNAT, I don't see how this applies.
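The capacity point in item 3 can be written as a quick check. This is a minimal Python sketch with made-up throughput figures; the function name and the numbers are hypothetical and only illustrate the arithmetic of sizing for one member being down.

```python
def survives_member_loss(offered_gbps, per_member_gbps, members=2):
    """True if the offered load still fits after losing one member."""
    surviving_capacity = per_member_gbps * (members - 1)
    return offered_gbps <= surviving_capacity


if __name__ == "__main__":
    # Hypothetical pair: each firewall rated at 10 Gbps.
    # 14 Gbps fits across both members but not on a single survivor.
    print(survives_member_loss(14, 10))
    # 8 Gbps still fits on one member, so a failure is not an outage.
    print(survives_member_loss(8, 10))
```

If the first case describes your design, the pair is only redundant on paper.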

Hello Jermey, All,

I was reading this forum because I have a similar issue. We have an A/A Palo Alto cluster; the two firewalls sit in 2 different DCs, each with a separate ISP connected for internet. NAT was configured accordingly, with each device ID bound to the internet interface IP of its own ISP for outbound traffic. All our firewall internal interfaces are configured with HA floating IPs, and all our internal networks use the respective floating IPs as their gateway addresses.

Over time one ISP link broke and has not been used for a while, some 3-4 years now. (I know that's a long time to fix an ISP issue, but this is the history.) Since then node-2, where the broken ISP was connected, has been suspended. Now there is immense pressure from the top to put the node back into the cluster as active for full redundancy, which is a very reasonable request.

So we connected the suspended node's internet interface to the other, live ISP that also serves the other node. The setup is: the ISP connects to a layer-2 switch in one DC where we have a dedicated VLAN, and the internet interfaces of both firewalls are in that same VLAN. (We run a spanned VLAN between the DCs, so it was easy to extend the firewall's internet connection to the other DC's ISP by extending the dedicated VLAN.) The firewalls' internet interfaces have different public IPs from the same pool, and no floating IP is configured for them. We also set up duplicate NAT rules, bound to both device IDs, for incoming traffic (we have many, for different hosted applications using dedicated public IPs in the /26 pool) and for outgoing traffic (using only the interface IP of node-1 in both duplicate NAT rules).

Once the secondary firewall was back active everything worked fine, but after some weeks we noticed that the performance of many hosted applications was affected. We then found an "arp duplication error" on the internet interfaces of the firewalls and immediately suspended node-2 again. This resolved the ARP duplication issue immediately.

But now we need to bring node-2 back to active-secondary ASAP with a workable solution for internet traffic. So we are thinking of giving the internet interfaces a floating IP as well, bound to the primary node. [Please note that all our LAN-side interfaces already have floating IPs configured with HA device ID priority, not bound to the primary node.] Will this solution work without any issues? What do you think? I have spoken to a couple of Palo Alto support engineers, but they couldn't answer it clearly.

Please help at the earliest. 

many thanks guys for reading this !

Libin

One more piece of information to add.

On the idea of giving the internet interfaces a floating IP bound to the primary node: we would also use duplicate outgoing NAT rules, bound to both devices, that use the same floating IP. So we assume any outgoing internet traffic will always NAT out via the floating IP on the active-primary at any given time. But will there be any issue for incoming NAT traffic (many NAT rules for different hosted applications)? Those rules are also duplicated to bind to both device IDs, but they use only virtual NAT IPs in the /26 pool, which are not configured as floating IPs or interface IPs on the firewalls.

Thanks again.

Libin
