When we got our PAN-OS firewalls a few years back, we set them up with a single virtual router and PBF to handle our active/passive ISPs.
Time went on, and to support fancier topologies, such as fully redundant VPN connections between us and AWS, we moved to dual VRs: one default that holds all our routes AND the standby ISP, and one that pretty much just holds our primary ISP. We still use PBF in this scenario.
PBF is great, but how it handles traffic within PAN-OS vs. behind the firewall has always broken my brain a bit. I did our current setup myself, and understood it fully in an ill-constructed mind palace for about a day :-) We use IPSec tunnels for all our site-to-site VPNs (AWS and remote sites). I believe everything is working because the tunnels themselves bypass the PBF and just sit there set up, but the traffic passing OVER the tunnels follows PBF, and therefore things work as expected when we lose an ISP.
Given it's 2018, we're running 8.0.8, we're moving towards remote offices with dual ISPs and meshed site-to-site VPNs (four tunnels instead of two per remote site), and we're looking towards ECMP on our ISPs as we consider beefing up the backup ISP at the main office, is there a better way to do this? And, for me the million dollar question, is it documented somewhere? =)
I've seen suggestions this can probably be done with a three-router setup, but I'm not sure if that's a smart idea: https://live.paloaltonetworks.com/t5/Configuration-Articles/Multi-Site-Dual-ISP-Redundant-Site-to-Si...
The immediate task is to set up those meshed tunnels between our main site and our first remote site with two ISPs. I'm going out to that site to turn up the new ISP next week, at which point I'll begin experimenting with how to get this up and running. Our AWS VPN tunnels use BGP, and our current site-to-site tunnels use static routing with different metrics to get stuff to fail over properly. Given what documentation I found, with the increased site-to-site complexity, I'll probably try to use OSPF with different metrics.
Thanks all. Looking forward to any feedback, and will report what I find.
May I ask some questions, which will hopefully help us come up with the best solution for your situation:
Of course! Thank you for the response.
I've been trying to get back up to speed on my config, and I think the only way things are working right now is that I'm using PBF to do the failover between the two ISPs at the main office for external resources, but explicitly telling the PBF not to touch private (and therefore internal) IPs, letting the normal routing capability take care of those.
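To make the split described above concrete, here is a minimal Python sketch of the decision, not actual PAN-OS configuration. It assumes "private" means the three RFC 1918 ranges, and the function name `pbf_applies` is made up for illustration.

```python
import ipaddress

# RFC 1918 private ranges (assumption: this is what "private IPs" means here).
# Traffic to these is left to the normal route table instead of PBF.
PRIVATE_NETS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def pbf_applies(dst: str) -> bool:
    """Return True if PBF should steer this destination (it is not private)."""
    addr = ipaddress.ip_address(dst)
    return not any(addr in net for net in PRIVATE_NETS)


if __name__ == "__main__":
    for dst in ("10.20.30.40", "172.31.0.5", "8.8.8.8"):
        action = "PBF (ISP failover)" if pbf_applies(dst) else "virtual router lookup"
        print(f"{dst:>12} -> {action}")
```

On the firewall itself this would presumably be expressed as a PBF rule that excludes the private ranges, placed so it is evaluated before the ISP failover rule.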
All right. One question I forgot: will you route everything to your main location and from there to the internet, or will every branch office have direct internet access over the PAN firewall?
Anyway, let's start with a setup with static routing and an active-active config (the one I would prefer in your situation, assuming you don't need branch-to-branch office tunnels)
The following is the same on all of the (up to 6) firewalls:
Then you need to set up the configuration for all the VPN tunnels (20 on the main location):
On the main location you need a default route on the internal virtual router that points to the external virtual router for internet access. On the remote sites it depends on my question from the beginning of this post. If everything has to go to your main location, you only need 4 default routes for the 4 tunnels, with the same metrics, on every remote firewall. If these locations have direct internet access, then you need the same default route configuration as on the main location, plus routes for all internal networks that you want to route to the main location (every route 4 times).
My recommendation so far is based on the assumptions that you don't need branch-to-branch tunnels and that you will not have more than 5 locations. If there are 6 or 7 locations at some point, the additional routing config will still be doable, but if you expect to have a lot more locations in the future, there is probably no way around dynamic routing. Another possibility would be GlobalProtect Large Scale VPN, but with that you cannot have active-active configurations utilizing both ISPs.
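To show where the numbers above come from (4 tunnels per remote site, 20 on the main location, every route 4 times), here is a small Python sketch. The site names, ISP names, and prefixes are hypothetical, and it assumes one main site and five remote sites with two ISPs each.

```python
from itertools import product

# Hypothetical names; assumes 1 main site and 5 remote sites, 2 ISPs each.
MAIN_ISPS = ["main-isp1", "main-isp2"]
REMOTE_ISPS = ["isp1", "isp2"]
REMOTE_SITES = [f"remote{i}" for i in range(1, 6)]


def mesh_tunnels(site):
    """All tunnels between one remote site and the main site (2 x 2 = 4)."""
    return [(f"{site}-{r}", m) for r, m in product(REMOTE_ISPS, MAIN_ISPS)]


def static_routes(site, prefixes, metric=10):
    """One static route per prefix per tunnel, all with the same metric."""
    return [(p, tunnel, metric) for p in prefixes for tunnel in mesh_tunnels(site)]


all_tunnels = [t for site in REMOTE_SITES for t in mesh_tunnels(site)]
print(len(all_tunnels))  # tunnels terminating on the main firewall

routes = static_routes("remote1", ["10.0.0.0/16", "10.1.0.0/16", "10.2.0.0/16"])
print(len(routes))  # 3 prefixes x 4 tunnels
```

The route count grows as prefixes times tunnels on every firewall, which is why static routing stays manageable for a handful of sites but not for many more.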
I hope this is somehow understandable. Feel free to ask again if something is not clear or if you have further questions.
So PBF takes place before the virtual router, i.e. if you have a PBF rule sending x.x.x.x/24 to next hop y.y.y.y, traffic will go that way no matter what you put into your virtual router. If you want to use your two ISPs and VPNs in an active/passive mode and use PBF, make sure you put a monitor on your PBF rules and a route into your virtual router for the backup path, i.e. all traffic flows down path 1, and if that ISP/VPN goes down, the PBF rule is bypassed and the virtual router takes over.
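The ordering described above can be sketched in a few lines of Python. This is only a model of the behavior, under the assumption that a PBF rule whose monitor has failed is simply skipped; the class and interface names are made up.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class PbfRule:
    match: ipaddress.IPv4Network  # destinations this rule steers
    egress: str                   # where matching traffic is sent
    monitor_up: bool = True       # result of the rule's path monitor


def route_lookup(dst, routes):
    """Longest-prefix match over (network, egress) pairs; None if no match."""
    addr = ipaddress.ip_address(dst)
    candidates = [(net, egress) for net, egress in routes if addr in net]
    if not candidates:
        return None
    return max(candidates, key=lambda c: c[0].prefixlen)[1]


def forward(dst, pbf_rules, routes):
    """PBF is checked first; the virtual router is only consulted afterwards."""
    addr = ipaddress.ip_address(dst)
    for rule in pbf_rules:
        if addr in rule.match and rule.monitor_up:
            return rule.egress
    return route_lookup(dst, routes)


rules = [PbfRule(ipaddress.ip_network("0.0.0.0/0"), "isp-primary")]
routes = [
    (ipaddress.ip_network("0.0.0.0/0"), "isp-backup"),
    (ipaddress.ip_network("10.0.0.0/8"), "tunnel.1"),
]

print(forward("8.8.8.8", rules, routes))   # PBF wins while the monitor is up
rules[0].monitor_up = False
print(forward("8.8.8.8", rules, routes))   # rule skipped, route table decides
```

Note that while the monitor is up, even 10.x traffic would follow the catch-all PBF rule in this model, which is exactly why internal destinations are usually excluded from PBF.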
So I currently have a multipath setup for my remote sites where I use OSPF and multiple tunnels. The concept is the same as with PBF, but OSPF handles the failover rather than the PBF monitor and virtual router default path. However, I only use 1 virtual router to accomplish this rather than two, since it becomes a pain with dynamic routing and multiple VRs. I control the traffic flow by using the OSPF interface metrics to make certain paths more desirable.
There is more to the design etc., but above are the CliffsNotes. I don't think you really need 2 VRs to accomplish what you are looking for, however. Maybe just the PBF rule's monitor and a static default route to go down the other tunnel.
Hope that doesn't muddy the waters more.
@Otakar.Klier is right. My solution also works with 1 virtual router per firewall. The solution with two routers is just my personal preference for how I would do it. This way you have external and internal routing completely separated, and if you start with static routing and want to change to dynamic in the future, you could simply enable the dynamic routing on the internal virtual router without touching the external one, which still handles both ISPs with ECMP.
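As a rough illustration of steering traffic with interface metrics, here is a Python sketch that reduces OSPF's shortest-path calculation to a single hop over parallel tunnels: the live tunnel with the lowest cost wins, and traffic moves to the next-best one when it goes down. The tunnel names and costs are invented, and this is not how OSPF is implemented internally.

```python
def pick_tunnels(costs, up):
    """Return the live tunnels sharing the lowest cost (ties = equal-cost paths).

    costs: {tunnel_name: interface_cost}
    up:    set of tunnel names that currently have a working adjacency
    """
    live = {name: cost for name, cost in costs.items() if name in up}
    if not live:
        return []
    best = min(live.values())
    return sorted(name for name, cost in live.items() if cost == best)


# Hypothetical costs: tunnel.1 is preferred, tunnel.2/3 are equal second choices.
costs = {"tunnel.1": 10, "tunnel.2": 20, "tunnel.3": 20, "tunnel.4": 30}

print(pick_tunnels(costs, {"tunnel.1", "tunnel.2", "tunnel.3", "tunnel.4"}))
print(pick_tunnels(costs, {"tunnel.2", "tunnel.3", "tunnel.4"}))  # tunnel.1 down
```

When two live tunnels tie on cost, both are returned, which is the situation where ECMP would come into play.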
Thank you so much for the feedback.
To answer the outstanding question: branch offices access the internet directly, and the tunnels just carry traffic destined for the home office.
Most of this makes sense to me at a high level so far. I'm tempted to use OSPF for the tunnels rather than static routing as there are quite a few routes to send back from the branch locations to the home office, and I can use tunnel metrics to pick the best tunnel.
I'm curious: in the two-VR design, why does ECMP need to be on both VRs?
We currently have two virtual routers: one with the primary (active) ISP, and one with the secondary (passive) ISP and all the other routes. Please reality check my thoughts on this: I did this so that the IPSec tunnels on the primary and secondary ISPs could be up at the same time, and therefore fail over faster.
Does the same requirement for multiple routers hold true with ECMP to keep the tunnels active, or is it different now?
I'm curious: in the two-VR design, why does ECMP need to be on both VRs?
I have to admit that with my previous assumption, ECMP was only needed on the main location, for both VRs. But with direct internet access at the branches, I would now use it on all VRs. So let's take a remote site as an example. There you have 2 ISPs on the external router and 4 tunnels on the internal router (and of course the internal interface(s)). To use both ISPs actively for internet connections, ECMP is required on the external VR. And from the view of the internal VR, you have 4 links towards your main location, so you need ECMP there as well to use all 4 links for connections towards your main location. Hope this makes sense.
Using OSPF works perfectly fine for this, also in combination with ECMP to utilize all available links from and to every location. From a purely technical view, this active-active setup is just the "cooler" solution, as it would still be recommended to size the uplinks big enough that you won't have problems if one of them is down for a longer period of time. But this is of course up to you: if you simply need redundancy, and it is not a big deal to run on "half" the max bandwidth when there is a problem, then active-active is simply cheaper, as you don't need that much bandwidth.
If you don't want to use this all-active setup, I think you should also check out GlobalProtect Large Scale VPN, for the following reasons:
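For anyone wondering what ECMP over the 4 tunnels does per connection, here is a generic hash-based sketch in Python. It is an illustration of per-flow load sharing in general, not the algorithm PAN-OS uses; the function name, tunnel names, and addresses are made up.

```python
import hashlib


def ecmp_pick(flow, paths):
    """Map a flow 5-tuple onto one of the equal-cost paths.

    The same flow always hashes to the same path, so packets of one
    session are not spread (and reordered) across links.
    """
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    return paths[int.from_bytes(digest[:4], "big") % len(paths)]


tunnels = ["tunnel.1", "tunnel.2", "tunnel.3", "tunnel.4"]

# (src ip, dst ip, protocol, src port, dst port), hypothetical addresses
flow = ("10.1.0.15", "10.0.0.20", "tcp", 51514, 443)
print(ecmp_pick(flow, tunnels))

# Many different flows end up spread over all four tunnels.
used = {ecmp_pick(("10.1.0.15", "10.0.0.20", "tcp", port, 443), tunnels)
        for port in range(50000, 50200)}
print(sorted(used))
```

If a tunnel goes down, it drops out of the path list and the remaining flows are hashed over what is left, which is the active-active failover behavior discussed above.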