Multi site dual-isp with redundant VPN connections: PBF vs alternatives?

uvdes · ‎03-13-2018

When we got our PAN-OS firewalls a few years back, we set them up with a single virtual router and PBF to handle our active/passive ISPs.

Time went on, and to support fancier topologies, such as fully redundant VPN connections between us and AWS, we moved to dual VRs: one default that holds all our routes AND the standby ISP, and one that pretty much just holds our primary ISP. We still use PBF in this scenario.

PBF is great, but how it handles traffic within PAN-OS vs behind the firewall has always broken my brain a bit. I did our current setup myself, and understood it fully in an ill-constructed mind palace for about a day 🙂 We use IPSec tunnels for all our site-to-site VPNs (AWS and remote sites). I believe everything is working because the tunnels bypass the PBF and sit there set up, but the traffic passing OVER the tunnels follows PBF, and therefore things work as expected when we lose an ISP.

Given it's 2018, we're running 8.0.8, we're moving towards remote offices with dual ISPs and meshed site-to-site VPNs (four tunnels instead of two per remote site), and we're looking towards ECMP on our ISPs as we consider beefing up the backup ISP at the main office, is there a better way to do this? And, for me the million dollar question, is it documented somewhere? 😃

I've seen suggestions this can probably be done with a three router setup, but I'm not sure if that's a smart idea: https://live.paloaltonetworks.com/t5/Configuration-Articles/Multi-Site-Dual-ISP-Redundant-Site-to-Si...

The immediate task is to set up those meshed tunnels between our main site and our first remote site with two ISPs. I'm going out to that site to turn up the new ISP next week, at which point I'll begin experimenting with how to get this up and running. Our AWS VPN tunnels use BGP, and our current site-to-site tunnels use static routing with different metrics to get stuff to fail over properly. Given what documentation I found, with the increased site-to-site complexity, I'll probably try to use OSPF with different metrics.

Thanks all. Looking forward to any feedback, and will report what I find.

Remo · ‎03-13-2018

Hi @uvdes

May I ask some questions, which hopefully help us to come up with the best solution in your situation:

How many remote offices do you need to set up?
Do you have your own public IP range which is reachable over both ISPs in your main location?
On the remote sites do you have routers or also PAN firewalls?
What would you prefer for your new setup: stay with active/passive ISPs or have as much as possible an active/active setup?
What do you prefer: static or dynamic routing?

uvdes · ‎03-13-2018

Of course! Thank you for the response.

Currently one remote office, probably never more than 5
We have our own reachable public IP range for both ISPs at the main office, as well as for the ISPs at the remote office
PAN firewalls at current remote site. Will probably stay this way with more remote sites.
I suppose my preference is active/active, but at the moment it's not strong, as we haven't fully committed to using dual active/active ISPs, and we are currently active/passive
I have little experience with dynamic protocols, but no problem using them, especially if they make this multi-ISP, multi-VPN setup go more smoothly with less room for human error. We're routing maybe 10-15 subnets. I HAD to use BGP to make AWS work.

uvdes · ‎03-13-2018

I've been trying to get back up to speed on my config, and I think the only way things are working right now is that I'm using PBF to do the failover between the two ISPs at the main office for external resources, but explicitly telling the PBF not to touch private (and therefore internal) IPs, letting the normal routing capability take care of those.

Remo · ‎03-15-2018

Hi @uvdes

All right. One question I forgot: will you route everything to your main location and from there to the internet or will every branch office have direct internet access over the PAN firewall?

Anyway, let's start with a setup with static routing and an active-active config (the one I would prefer in your situation, assuming you don't need branch-to-branch office tunnels).

The following is the same on all (up to 6) firewalls:

2 virtual routers: 1 for the ISP interfaces and one for the internal and tunnel interfaces
2 default routes with integrated route monitoring on the external virtual router
ECMP enabled on both virtual routers

Then you need to set up the configuration for all the vpn tunnels (20 on the main location):

I recommend configuring all tunnels with transport networks (/30). You will need these IPs for the route monitoring later
Assign at least the same zone per location to all tunnel interfaces. This is required for active-active use of the connections, because otherwise the firewall probably drops most of the traffic because of asymmetric routing (or, if you haven't enabled that in the zone protection, you don't need firewall rules for both directions)
As already mentioned, assign all tunnel interfaces to the internal virtual router
Set up all the IPSec configurations
Create 4 static routes with equal metrics for each location. Every route points into one tunnel. Also configure route monitoring for each route; this is where you need the tunnel interface IP addresses. The monitoring is needed so that a failed tunnel isn't used any longer
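
To make the tunnel count and the /30 transport networks above concrete, here is a rough Python sketch of the planning step. The site names, ISP labels, and the 10.255.0.0/24 transport supernet are made-up placeholders, not values from this thread, and the script only plans addresses; it does not talk to a firewall.

```python
import ipaddress
from itertools import product

# Hypothetical inputs: adjust the names and the transport supernet to your environment.
MAIN_ISPS = ["main-isp1", "main-isp2"]
REMOTE_SITES = {"branch1": ["b1-isp1", "b1-isp2"]}
TRANSPORT_SUPERNET = ipaddress.ip_network("10.255.0.0/24")


def plan_tunnels(main_isps, remote_sites, supernet):
    """Return one entry per tunnel: a full mesh of main ISPs x remote ISPs,
    each with its own /30 transport network."""
    subnets = supernet.subnets(new_prefix=30)
    plan = []
    for site, isps in remote_sites.items():
        for main_isp, remote_isp in product(main_isps, isps):
            net = next(subnets)
            main_ip, remote_ip = list(net.hosts())  # a /30 has two usable hosts
            plan.append({
                "site": site,
                "main_isp": main_isp,
                "remote_isp": remote_isp,
                "transport": net,
                "main_tunnel_ip": main_ip,
                "remote_tunnel_ip": remote_ip,
            })
    return plan


tunnels = plan_tunnels(MAIN_ISPS, REMOTE_SITES, TRANSPORT_SUPERNET)
for t in tunnels:
    print(t["site"], t["main_isp"], "<->", t["remote_isp"], t["transport"])
```

With two ISPs on each side this gives four tunnels per remote site, so five remote sites come out at the 20 tunnels mentioned for the main location.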

At the main location you need a default route on the internal virtual router that points to the external virtual router for internet access. On the remote sites it depends on my question from the beginning of this post. If everything has to go to your main location, you only need 4 default routes for the 4 tunnels with the same metrics on every remote firewall. If these locations have direct internet access, then you need the same default route configuration as on the main location, plus routes for all internal networks that you want to route to the main location (every route 4 times).
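
As a sanity check on the "every route 4 times" idea, this is a small Python model of how equal-metric static routes with monitoring behave. It is only a sketch of the selection logic under my own assumptions (the `StaticRoute` class, the tunnel names, and the prefix are invented for illustration), not how PAN-OS implements it internally.

```python
from dataclasses import dataclass


@dataclass
class StaticRoute:
    destination: str         # prefix the route covers
    next_hop: str            # hypothetical tunnel interface name
    metric: int
    monitor_up: bool = True  # state reported by the route's path monitor


def active_routes(routes, destination):
    """Routes that would be used for a destination: drop routes whose monitor
    failed, then keep every route that shares the lowest remaining metric."""
    candidates = [r for r in routes
                  if r.destination == destination and r.monitor_up]
    if not candidates:
        return []
    best = min(r.metric for r in candidates)
    return [r for r in candidates if r.metric == best]


# Four equal-metric routes towards one main-office network, one per tunnel.
routes = [StaticRoute("10.0.0.0/16", f"tunnel.{n}", metric=10) for n in range(1, 5)]
print([r.next_hop for r in active_routes(routes, "10.0.0.0/16")])

# The monitor on the second tunnel fails: that route drops out, the rest keep sharing.
routes[1].monitor_up = False
print([r.next_hop for r in active_routes(routes, "10.0.0.0/16")])
```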

My recommendation so far is based on the fact that you don't need branch-to-branch and that you will not have more than 5 locations. If there are 6 or 7 locations at some point, the additional routing config will be doable, but if you see that you will have a lot more locations in the future, there is probably no way around dynamic routing. Another possibility would be Global Protect Large Scale VPN, but with that you cannot have active-active configurations utilizing both ISPs.

I hope this is somehow understandable. Feel free to ask again if something is not clear or if you have further questions.

Regards,

Remo

OtakarKlier · ‎03-15-2018

Hello,

So PBF takes place before the virtual router, i.e. if you have a PBF rule that sends x.x.x.x/24 to next hop y.y.y.y, traffic will go that way no matter what you put into your virtual router. If you want to use your two ISPs and VPNs in an active/passive mode and use PBF, make sure you put a monitor on your PBF rules and a route into your virtual router for the backup path, i.e. all traffic flows down path 1, and if that ISP/VPN goes down, the PBF rule is bypassed and the virtual router takes over.

https://live.paloaltonetworks.com/t5/Configuration-Articles/How-to-Configure-a-Palo-Alto-Networks-Fi...
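
The order of operations described here can be sketched roughly in Python as follows. This is a simplified model, assuming the PBF rule is set to stop matching when its monitor fails; the `PbfRule` class, gateway names, and addresses are hypothetical, and the destination exclusion mirrors the "don't touch private IPs" approach mentioned earlier in the thread.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class PbfRule:
    source: str               # source prefix the rule matches
    next_hop: str             # where matching traffic is forced
    exclude_dst: tuple = ()   # destination prefixes the rule must not touch
    monitor_up: bool = True   # state of the rule's monitor


def lookup_route(dst_ip, routing_table):
    """Longest-prefix match over a {prefix: next_hop} routing table."""
    dst = ipaddress.ip_address(dst_ip)
    matches = [(ipaddress.ip_network(prefix), hop)
               for prefix, hop in routing_table.items()
               if dst in ipaddress.ip_network(prefix)]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


def forward(src_ip, dst_ip, pbf_rules, routing_table):
    """PBF is evaluated first; the virtual router only decides when no
    healthy PBF rule matches the packet."""
    src = ipaddress.ip_address(src_ip)
    dst = ipaddress.ip_address(dst_ip)
    for rule in pbf_rules:
        excluded = any(dst in ipaddress.ip_network(n) for n in rule.exclude_dst)
        if rule.monitor_up and not excluded and src in ipaddress.ip_network(rule.source):
            return rule.next_hop
    return lookup_route(dst_ip, routing_table)


rules = [PbfRule("192.168.0.0/16", "isp1-gw", exclude_dst=("10.0.0.0/8",))]
table = {"0.0.0.0/0": "isp2-gw", "10.20.0.0/16": "tunnel.1"}

print(forward("192.168.1.10", "203.0.113.10", rules, table))  # PBF wins
print(forward("192.168.1.10", "10.20.1.5", rules, table))     # excluded, routed normally
rules[0].monitor_up = False
print(forward("192.168.1.10", "203.0.113.10", rules, table))  # falls back to the VR
```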

So I currently have a multipath setup for my remote sites where I use OSPF and multiple tunnels. The concept is the same as with PBF, but OSPF handles the failover rather than the PBF monitor and virtual router default path. However, I only use 1 virtual router to accomplish this rather than two, since it becomes a pain with dynamic routing and multiple VRs. I control the traffic flow by using the OSPF interface metrics to make certain paths more desirable.

There is more to the design etc., but the above is the CliffsNotes version. I don't think you really need 2 VRs to accomplish what you are looking for, however. Maybe just the PBF rule's monitor and a static default route to go down the other tunnel.

Hope that doesn't muddy the waters more.
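
For anyone who wants to see why interface metrics steer the traffic, here is a minimal Python sketch of a shortest-path calculation of the kind OSPF performs (Dijkstra over link costs). The topology, costs, and tunnel names are invented for illustration; each tunnel is modelled as its own node so that parallel tunnels between the same two firewalls can carry different costs.

```python
import heapq


def shortest_path(graph, source, target):
    """Dijkstra over directed link costs.
    Returns (total_cost, [nodes]) or (None, []) when the target is unreachable."""
    queue = [(0, source, [source])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, link_cost in graph.get(node, {}).items():
            if neighbor not in visited:
                heapq.heappush(queue, (cost + link_cost, neighbor, path + [neighbor]))
    return None, []


# Hypothetical topology: two tunnels from the main site to branch1.
# The lower cost on tunnel.1 makes it the preferred path.
graph = {
    "main": {"tunnel.1": 10, "tunnel.2": 20},
    "tunnel.1": {"branch1": 0},
    "tunnel.2": {"branch1": 0},
}
print(shortest_path(graph, "main", "branch1"))

# tunnel.1 goes down: the adjacency disappears and the next-best path takes over.
del graph["main"]["tunnel.1"]
print(shortest_path(graph, "main", "branch1"))
```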

Regards,

Remo · ‎03-15-2018

Hi @uvdes

@OtakarKlier is right. My solution also works with 1 virtual router per firewall. The solution with two routers is just my personal preference for how I would do it. This way you have external and internal routing completely separated, and if you start with static routing and want to change to dynamic in the future, you could simply enable the dynamic routing on the internal virtual router without touching the external one, which still handles both ISPs with ECMP.

Regards,

Remo

uvdes · ‎03-15-2018

Thank you so much for the feedback.

To answer the outstanding question: branch offices access the internet directly, and the tunnels just carry traffic destined for the home office.

Most of this makes sense to me at a high level so far. I'm tempted to use OSPF for the tunnels rather than static routing as there are quite a few routes to send back from the branch locations to the home office, and I can use tunnel metrics to pick the best tunnel.

I'm curious: in the two-VR design, why does ECMP need to be on both VRs?

uvdes · ‎03-15-2018

We currently have two virtual routers: one with the primary (active) ISP, and one with the secondary (passive) ISP and all the other routes. Please reality check my thoughts on this: I did this so that the IPSec tunnels on the primary and secondary ISPs could be up at the same time, and therefore fail over faster.

Does the same requirement for multiple routers hold true with ECMP to keep the tunnels active, or is it different now?

Remo · ‎03-15-2018

Hi @uvdes

@uvdes wrote:
I'm curious: in the two-VR design, why does ECMP need to be on both VRs?

I have to admit that with my previous assumption ECMP was only needed on both VRs at the main location. But now, with direct internet access, I would really use it on all VRs. Let's take a remote site as an example. There you have 2 ISPs on the external router and 4 tunnels on the internal router (and of course the internal interface(s)). To use both ISPs actively for internet connections, ECMP is required. And from the view of the internal VR you have 4 links towards your main location, so you need ECMP there as well to use all 4 links for connections towards your main location. Hope this makes sense.
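
To illustrate what ECMP does with those equal-cost links, here is a generic hash-based path selection in Python. Treat it as a sketch of the general idea only: PAN-OS has its own configurable ECMP load-balancing methods, and the hashing scheme, addresses, and tunnel names below are my own assumptions.

```python
import hashlib


def pick_path(paths, src_ip, dst_ip):
    """Map a flow to one of several equal-cost paths. The same source and
    destination pair always lands on the same path, so a session is not
    sprayed across links."""
    if not paths:
        raise ValueError("no equal-cost paths available")
    key = f"{src_ip}->{dst_ip}".encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(paths)
    return paths[index]


# Hypothetical internal VR on a remote site: four tunnels towards the main location.
tunnel_paths = ["tunnel.1", "tunnel.2", "tunnel.3", "tunnel.4"]
for host in range(1, 6):
    src = f"192.168.10.{host}"
    print(src, "->", pick_path(tunnel_paths, src, "10.0.0.5"))
```

If a tunnel fails and is removed from the list, the remaining paths are rehashed, which is why route or tunnel monitoring matters for keeping the path list accurate.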

Using OSPF works perfectly fine for this, also in combination with ECMP to utilize all available links from and to every location. From a purely technical view this active-active setup is just the "cooler" solution, as it would also be recommended to size the uplinks big enough that you won't have problems if one of them is down for a longer period of time. But this of course is up to you: if you simply need redundancy, and in case of a problem it is not a big deal to use "half" the max bandwidth, then it is simply cheaper with active-active as you don't need that much bandwidth.

If you don't want to use this all-active setup, I think you should also check out Global Protect Large Scale VPN because of the following reasons:

No static routing needed. It is possible to configure the satellites to announce their local networks to the main location when the connection is established
Redundancy is easy to set up. If one uplink fails at any location, Global Protect automatically connects the other way
Simple setup if you one day have even more locations. No additional configuration (IPSec tunnels, and routing in case of static) is required on the main firewall if you add new locations to the setup. Simply configure the Global Protect Portal to the new branch firewall, connect an internet uplink and you're done. (Works also when you don't have static IPs from your ISP.)

Regards,

Remo

Remo · ‎03-15-2018

@uvdes wrote:

Does the same requirement for multiple routers hold true with ECMP to keep the tunnels active, or is it different now?

I don't see a requirement for more than one virtual router. (Yes, I would use 2, but just to separate internal and external routing). Actually I don't really see a technical reason why you use 2 VRs today, as it is also possible to keep all tunnels up and running (with tunnel monitoring and/or PBF monitoring).

In any case with PAN-OS 8 and the route monitoring there is no need for PBF rules in your case.

uvdes · ‎03-15-2018

Thank you all for all the information. I'll let you know how things work out next week!

uvdes · ‎04-01-2018

I wanted to give a report on how this all went.

I ended up going active-standby on the ISPs since, while the primary ISPs are symmetric upload/download, the secondaries aren't, which makes balancing a challenge with ECMP.

I used two virtual routers in the active-standby ISP config to keep all IPSec tunnels up and running all the time. When I tried to use a single VR, I could only have one active default route at a time, so the standby tunnels weren't up and running. With two VRs, all the tunnels are up and running all the time. I have one main VR where most everything terminates, including IPSec tunnels and the primary ISP, and another VR where just the standby ISP terminates. In the main VR I set up path monitoring for the primary ISP default route, and if it drops, there is a default route with a higher metric that goes to the next VR.
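
In case it helps anyone picture the failover, this is roughly how the default route resolves across the two VRs, written as a small Python model. The VR names, gateway names, metrics, and the `next-vr:` prefix are placeholders I made up for the sketch; it models the selection logic only and is not firewall configuration.

```python
NEXT_VR_PREFIX = "next-vr:"  # hypothetical marker for "hand off to another VR"


def resolve_default(vrs, vr_name, seen=None):
    """Follow the best usable default route, hopping into another VR when
    the next hop points there. Returns the final gateway or None."""
    if seen is None:
        seen = set()
    if vr_name in seen:  # guard against two VRs pointing at each other
        return None
    seen.add(vr_name)
    usable = [r for r in vrs.get(vr_name, []) if r["monitor_up"]]
    if not usable:
        return None
    best = min(usable, key=lambda r: r["metric"])
    hop = best["next_hop"]
    if hop.startswith(NEXT_VR_PREFIX):
        return resolve_default(vrs, hop[len(NEXT_VR_PREFIX):], seen)
    return hop


vrs = {
    "main": [
        # Primary ISP default route with path monitoring.
        {"next_hop": "isp1-gw", "metric": 10, "monitor_up": True},
        # Higher-metric default route that hands traffic to the standby VR.
        {"next_hop": "next-vr:standby", "metric": 20, "monitor_up": True},
    ],
    "standby": [
        {"next_hop": "isp2-gw", "metric": 10, "monitor_up": True},
    ],
}

print(resolve_default(vrs, "main"))   # primary ISP while its monitor is up
vrs["main"][0]["monitor_up"] = False  # path monitoring detects the outage
print(resolve_default(vrs, "main"))   # traffic now leaves via the standby VR
```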

I also used OSPF to handle the tunnels. It worked really well, was relatively easy, and handles tunnel failover very nicely, as well as saving me from entering a ton of static routes.

Thanks for all your help!
