Active/Active ECMP


L3 Networker

I have two Palo Alto 5250s running in my core network as the core firewalls for all campus and datacenter traffic. They are running active/active. I have layer 3 routing southbound to two Cat9500s that are not in VSS, so I am running HSRP on each 9500, alternating VLANs to utilize them both. All 4 units are running OSPF to advertise loopbacks, and iBGP is used to carry routes. The 9500s are set up for ECMP and so are the Palo Altos. I feel like there are some weird traffic issues with this. Should the Palo Altos even be set up with ECMP? If so, should I be using the symmetric return option? Would having ECMP on the Cat9500s be enough to achieve load sharing/balancing over each layer 3 link to each Palo Alto? Each Cat9500 has a layer 3 link to each Palo Alto. And yes, before people tell me Active/Active is not a good idea, I can't see why not when my network is symmetrical.


L0 Member

Could be this GUY?!?

 

L0 Member
 

Cyber Elite

@Stevenjwilliams83,

Should the Palo Altos even be set up with ECMP?

Depends on how many uplinks you have between the Palo Alto and the 9500s, and if the answer is more than one, are you not using simple AE interfaces? It sounds like you have two 9500s and each 9500 has a link to each active peer, correct? If that's the case, you could actually be introducing asymmetrical return traffic depending on your configuration.

 

If so, should I be using the symmetric return option?

We'd need to know more about how your firewall is actually configured to give a yay or nay on this. From the rough outline that can be gathered from your post, it sounds as though enforcing symmetrical return traffic would be a good idea, but we would need more info about the actual config to be positive.

 

Would having ECMP on the Cat9500s be enough to achieve load sharing/balancing over each layer 3 link to each Palo Alto?

I assume that you are talking about removing the ECMP configuration on the firewall? It would achieve a level of load balancing from your cores, but depending on the rest of your configuration this could cause issues.

 

And yes, before people tell me Active/Active is not a good idea, I can't see why not when my network is symmetrical.

With emphasis on the fact that you state your network is symmetrical, this would actually be an instance where we would generally not recommend Active/Active, outside of a handful of other factors. Active/Active is best deployed in a network where you have asymmetrical traffic.

 

 

Yes, each 9500 has a link to each Palo Alto, so "criss-crossed" essentially. No AE interfaces being used, just straight /30 routed links between the 9500s and the Palo Altos. So if I have VLANs 10, 20, 30, and 40, 9500-01 is HSRP active for VLANs 10 and 30, and 9500-02 is HSRP active for VLANs 20 and 40, to use both switches rather than have one sit idle.

 

To this day I do not understand why you would want active/active in an asymmetrical network rather than a symmetrical one. Also, wouldn't you want to utilize BOTH firewalls when you can?

@Stevenjwilliams83

I'll be honest here: unless someone is able to look at a detailed map of how you have things laid out, along with the configuration on your firewalls, it's impossible to say if things are configured correctly or where you could be introducing issues. The posts that I've seen and what you allude to in this post lead me to believe that you have something misconfigured and this is causing asymmetrical return traffic from your firewalls. I could be wrong, but until someone can take a look at how you have everything configured, we can only really guess at what could be misconfigured. When you start talking about ECMP and Active/Active, the first thing that comes to my mind is that you would really need to design this setup carefully to ensure you won't run into routing/pathing issues further down the road.

 

To this day I do not understand why you would want active/active in an asymmetrical network rather than a symmetrical one.

Because utilizing Active/Passive firewalls in a situation with asymmetrical return traffic would cause the traffic hitting the passive firewall to be dropped. This is eliminated in an Active/Active setup, as the HA3 link will bring the traffic back to the origin firewall as long as you've configured things properly. The downside to Active/Active is that if you've poorly configured or designed the setup, it can actually introduce asymmetrical routing where you wouldn't have had any to begin with. In addition, every single use of Active/Active introduces configuration complexity that otherwise wouldn't be there.
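
To make that difference concrete, here is a toy Python sketch of the two HA modes. It is only an illustration of the behavior described above, not how PAN-OS implements it; the class, the firewall names `fw1`/`fw2`, and the "first firewall to see the flow owns it" rule are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class HAPair:
    """Toy model of an HA firewall pair (illustrative only)."""
    mode: str  # "active/passive" or "active/active"
    sessions: dict = field(default_factory=dict)  # flow id -> owning firewall

    def handle(self, flow: str, arrives_on: str) -> str:
        if self.mode == "active/passive":
            # Assumption: fw1 is the active unit, fw2 the passive one.
            if arrives_on != "fw1":
                return "dropped by passive fw2"
            self.sessions.setdefault(flow, "fw1")
            return "processed by fw1"
        # Active/active: assume the first firewall to see the flow owns it.
        owner = self.sessions.setdefault(flow, arrives_on)
        if owner == arrives_on:
            return f"processed by {owner}"
        # Asymmetric packet: hand it to the owner over the HA3 link.
        return f"forwarded over HA3 from {arrives_on} to {owner}"
```

With a flow that leaves through `fw1` and comes back through `fw2`, the active/passive pair drops the return packet, while the active/active pair forwards it across HA3 to the owning unit.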

Active/Active, while being a supported configuration, has a lot of gotchas and issues in its own right. It can be worked around and everything can be perfectly fine, but it complicates the configuration, support, and maintenance of the device. There is a reason that PAN users and support have long had a stance that Active/Active is best used in asymmetrical environments, and it's largely to do with the fact that if it isn't designed and configured well, you can introduce a lot of issues into your environment.

 

Also, wouldn't you want to utilize BOTH firewalls when you can?

Real-world stats: utilizing both of these firewalls at the same time nets you a performance gain of +25% during peak workloads in the best, well-designed scenario, while still allowing a single firewall to carry your entire environment in the event one of those units fails. Most symmetrical environments would have no need to enable Active/Active for performance reasons unless they've failed to size their firewalls correctly or have outgrown the 7000 series chassis, where this becomes a necessity.

If you look at the stats on your firewall, I would be willing to bet that you're seeing a net-zero return from utilizing Active/Active over Active/Passive as far as performance goes. That means the only benefit Active/Active is actually giving you is near-instant failover times (again, depending on configuration), at the cost of increased configuration complexity and the chance of introducing issues into your network.

 

Thanks for the input. I will open a support case and have someone look into the configuration. 

The culprit in this whole thing is HSRP. The design of HSRP is flawed here, as each VLAN has an active and a standby HSRP peer. Because the VLAN network has an SVI on each HSRP peer regardless of active or standby, the network is known as connected on both, and both advertise it to each Palo Alto. So the Palo Alto sees each downstream HSRP peer as equal cost, and at times it sends return traffic for that VLAN to the standby HSRP peer. I assume the ARP and MAC addresses have some mismatches going on, since from the client's perspective the gateway address for the HSRP group answers with a virtual MAC, but when traffic returns the peers use their real MAC and not the virtual one.
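
A rough Python sketch of what I mean. This is a toy model, not the firewall's real ECMP hash: the CRC32 hash, the addresses, and the function names are all made up for illustration. The only point is that once both 9500s are equal-cost next hops for a VLAN, some share of return flows lands on the HSRP standby.

```python
import zlib

# Hypothetical HSRP layout: which switch is active for each VLAN.
HSRP_ACTIVE = {10: "9500-01", 20: "9500-02", 30: "9500-01", 40: "9500-02"}

# Both switches have an SVI in every VLAN, so both advertise each subnet
# and the firewall sees two equal-cost next hops per VLAN.
NEXT_HOPS = ["9500-01", "9500-02"]


def ecmp_next_hop(src_ip: str, dst_ip: str) -> str:
    """Pick a next hop by hashing the flow (stand-in for a real ECMP hash)."""
    key = f"{src_ip}->{dst_ip}".encode()
    return NEXT_HOPS[zlib.crc32(key) % len(NEXT_HOPS)]


def return_paths(vlan: int, hosts: range) -> dict:
    """Count return flows that land on the HSRP active vs. standby peer."""
    counts = {"active": 0, "standby": 0}
    for h in hosts:
        hop = ecmp_next_hop("203.0.113.10", f"10.0.{vlan}.{h}")
        counts["active" if hop == HSRP_ACTIVE[vlan] else "standby"] += 1
    return counts


if __name__ == "__main__":
    print(return_paths(10, range(1, 101)))
```

Running it for VLAN 10 splits the 100 return flows between the active and the standby peer, which is the behavior I think I am seeing.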

 

So I can fix this by just turning off ECMP on the Palo Altos, which means I will use ECMP from the core 9500s out to the rest of the network, but return traffic will only return on one path. BUT what I do not get is this: if I am setting my ECMP with symmetric return, why would it ingest traffic from the active HSRP peer and not send it back to the same peer, when the whole point of symmetric return is "return traffic on the same interface it was received on"?
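
For reference, this is how I picture symmetric return working, as a toy Python model. It is my mental model only, not PAN-OS code: the class, method names, and interface names are hypothetical, and I am assuming the override applies only to reply packets of a session whose first packet the firewall already saw.

```python
class Firewall:
    """Toy forwarding model; interface names are hypothetical."""

    def __init__(self, symmetric_return: bool):
        self.symmetric_return = symmetric_return
        self.ingress_by_flow = {}  # (client, server) -> ingress interface

    def client_to_server(self, client: str, server: str, ingress: str) -> None:
        # Remember which interface the session's first packet arrived on.
        self.ingress_by_flow.setdefault((client, server), ingress)

    def server_to_client(self, client: str, server: str, ecmp_choice: str) -> str:
        """Return the egress interface for a reply packet."""
        ingress = self.ingress_by_flow.get((client, server))
        if self.symmetric_return and ingress is not None:
            return ingress  # overrides the ECMP decision
        return ecmp_choice  # no recorded session, or option disabled
```

In this model a reply only follows the ingress interface when the session was recorded on this firewall first; anything else falls back to whatever ECMP picks.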

 

 

I was going to say it first but you beat me to it.  If you want to use ECMP then you need to ditch HSRP.  You can't use them together because HSRP by its very nature forces routing to a single point.  Does that make sense?

 

PS - Active/Active is not designed to increase throughput (even though this can be a slight byproduct in the right scenario).  It is designed to handle asymmetric routing.  If you are basing your purchasing decision on increased throughput, you will be disappointed.  Right-size your firewalls with the assumption that one of them is offline.
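
As a back-of-the-envelope check, the sizing rule can be written as a one-line Python test. The function name, the 80% headroom figure, and the throughput numbers in the usage note are placeholders, not vendor guidance.

```python
def survives_single_failure(peak_gbps: float, unit_capacity_gbps: float,
                            headroom: float = 0.8) -> bool:
    """True if one firewall alone can carry the peak load with headroom.

    headroom is the fraction of a unit's rated capacity you are willing
    to run at (hypothetical default of 80%).
    """
    return peak_gbps <= unit_capacity_gbps * headroom
```

For example, with a hypothetical 20 Gbps unit, a 10 Gbps peak passes the check and an 18 Gbps peak does not, no matter how the load is split while both units are up.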

Ya, what I figured out was that without any HSRP trickery or some kind of BGP attribute to influence routes inbound, each Palo is always going to choose Core SW01 due to lowest router-id. I guess I could migrate to GLBP or just stack the 9500s in VSS.
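
Roughly what I mean, as a simplified Python sketch. It models only the final tie-break and assumes every earlier BGP attribute is already equal; the dictionary keys, peer names, and router IDs are hypothetical.

```python
import ipaddress


def best_path(paths: list, multipath: bool = False) -> list:
    """Select the path(s) to install from otherwise-equal BGP paths.

    paths: dicts with hypothetical keys 'peer' and 'router_id'.
    """
    if multipath:
        # With multipath/ECMP, every equal path is installed.
        return list(paths)
    # Without it, the lowest router ID wins the tie every time.
    winner = min(paths, key=lambda p: ipaddress.IPv4Address(p["router_id"]))
    return [winner]


paths = [
    {"peer": "Core-SW02", "router_id": "10.255.0.2"},
    {"peer": "Core-SW01", "router_id": "10.255.0.1"},
]
print(best_path(paths))        # single winner: Core-SW01
print(best_path(paths, True))  # both paths installed
```

So unless something influences the decision earlier in the selection, the same core switch wins for every prefix.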

ECMP should still work as long as you remove HSRP regardless of router ID for "Equal Cost Paths" only.

Not running active/active for increased performance, because I know you get maybe 20% on that. I am using it because I have two active core switches and it becomes a load-share game. So ditch HSRP and use what for the LAN side? That's the real challenge.

 

 

MC-LAG?  Not sure the new 9500s support that.  Are you avoiding stacking them for any specific reason?

Cat9500s not stacked, no VSS, no Nexus core running active/active HSRP. So it doesn't qualify.

Now, I am fantasizing here, but I thought about running HSRP on the egress side to the Palos and running the Palos with a floating IP (essentially VRRP)....
