BGP in a cluster deployment

santonic · ‎04-04-2011

hi!

I was wondering how to use BGP in an HA active/standby deployment. A common design with floating IP addresses (HSRP/VRRP-like) is to use two additional switches to connect to two upstream ISPs, so that a link failure doesn't result in an active-member takeover within the cluster. Can you please help me understand how routing tables are synchronised between cluster members? Should we use iBGP, as in a typical CE deployment?

BR, Andrej

kbrazil · ‎04-04-2011

Hi Andrej,

Attached is a document with some scenarios I came up with. I have highlighted some that I would recommend over others, though the final recommendation depends on many factors. I have not tested all of these, but in theory it seems they should work fine. I listed some prerequisites and pros/cons in the scenarios. There are many ways to skin a cat with BGP.

Hope this helps!

Cheers,

Kelly

ncampagna · ‎04-04-2011

Hi Andrej,

To address the HA part of the question, we will sync the forwarding table to the passive device. The packet forwarding decisions made by the active device (before failure) will be the same as those made by the passive device once it takes over. A failover does not result in any loss of service due to routing. The routing table (and BGP protocol state) must be built up on the passive device when it becomes active. Graceful restart is recommended.

Thanks,

Nick Campagna

panman · ‎08-10-2012

Hi kbrazil-

Regarding "Dual box multi-homed bgp config – full mesh, HA cluster (Active/Passive)" on page 6 of your document: we are currently using that design on our production Palo cluster, deployed just a few weeks ago. I'm interested in what you mean by 'GRES for BGP failover' as one of the pros/cons. Could you explain more about how that works?

The best peering setup that I have come up with for the above design is:

ISP A and ISP B peers announce the summary routes of their owned prefixes, plus a default route.
We announce our AS prefix (public /24) to both ISPs.
An import route map sets a higher weight on learned routes from ISP A than those from ISP B. This installs on the Palo only the default route (0.0.0.0/0) from ISP A, and will install the ISP B default route only if ISP A peering fails.

The result is:

Externally sourced, ingress traffic bound for our AS arrives at the Palo on the closest AS path via either ISP.
Internally sourced, egress traffic bound for prefixes announced by either ISP will egress via that same ISP peer (closest AS path).
All other internally sourced, egress traffic will egress on the Palo's default route.
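
The selection behaviour described above can be sketched in a few lines of Python. This is only an illustration, not firewall configuration: the prefixes (documentation ranges), peer names, and weight values are hypothetical, and the sketch models just "highest weight wins per prefix" followed by longest-prefix match, not the full BGP decision process or any PAN-OS internals.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str   # e.g. "0.0.0.0/0"
    peer: str     # which ISP advertised it (hypothetical names)
    weight: int   # set by the import route map; higher is preferred


def install_best(routes):
    """Keep one route per prefix: the one with the highest weight."""
    best = {}
    for route in routes:
        net = ipaddress.ip_network(route.prefix)
        if net not in best or route.weight > best[net].weight:
            best[net] = route
    return best


def egress_peer(table, dst):
    """Longest-prefix match over the installed routes; None if no match."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in table if addr in net]
    if not matches:
        return None
    return table[max(matches, key=lambda net: net.prefixlen)].peer


# Hypothetical routes: each ISP sends its own summary plus a default.
LEARNED = [
    Route("0.0.0.0/0", "ISP-A", 200),
    Route("0.0.0.0/0", "ISP-B", 100),
    Route("198.51.100.0/24", "ISP-A", 200),
    Route("203.0.113.0/24", "ISP-B", 100),
]

if __name__ == "__main__":
    table = install_best(LEARNED)
    print(egress_peer(table, "203.0.113.10"))  # ISP B's own prefix
    print(egress_peer(table, "192.0.2.1"))     # falls through to the default
    # Simulate losing the ISP A peering: only ISP B routes remain.
    table = install_best([r for r in LEARNED if r.peer != "ISP-A"])
    print(egress_peer(table, "192.0.2.1"))
```

With both peers up, only ISP A's default is installed; once ISP A's routes disappear, the ISP B default takes over, which matches the behaviour described in the list above.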

I am debating whether there would be any benefit to dropping the default route method and instead routing internally sourced egress traffic across both ISPs with some type of load balancing mechanism. This could theoretically double our ISP bandwidth, among other benefits. Any ideas on which BGP path selection approach would best accomplish this? I'm also thinking that there would need to be some type of mechanism on the Palo for source and destination IP address binding, basically to ensure sessions route symmetrically, that is, that packets flow ingress and egress on the same path.

On the other hand, I am thinking that the only real improvement to my current setup listed above would be to have both ISPs announce full internet BGP tables and remove the default route, but that comes with another set of responsibilities...

Anyone have input on this BGP routing setup?

kbrazil · ‎01-16-2013

Hi Missouririver,

Sorry I didn't see this post... better late than never.

GRES is shorthand for Graceful Restart for BGP. This would allow the upstream ISP to continue forwarding traffic during a failover while the management plane BGP sessions are re-established between the ISP routers and the newly active firewall. Otherwise the forwarding would temporarily stop as the ISP notices the BGP session has reset. In an Active/Passive configuration the ISP router will not know it is talking to a different firewall and will just assume the TCP BGP session dropped and restarted.
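
To make the difference concrete, here is a rough Python model of how an upstream router might treat the firewall's routes when the session drops, with and without Graceful Restart. It is a sketch under stated assumptions: the class name, the 120-second restart time, and the idea that routes come back instantly on re-establishment are all hypothetical simplifications, not how any particular ISP router or PAN-OS implements it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NeighborView:
    """How an upstream router treats routes learned from the firewall."""
    graceful_restart: bool               # was the capability negotiated?
    restart_time: float = 120.0          # seconds; hypothetical value
    down_since: Optional[float] = None   # time the session dropped, if down
    routes_kept: bool = True

    def session_dropped(self, now: float) -> None:
        self.down_since = now
        # Without graceful restart the routes are withdrawn immediately.
        self.routes_kept = self.graceful_restart

    def session_established(self, now: float) -> None:
        # Simplification: assume the routes are re-advertised at once.
        self.down_since = None
        self.routes_kept = True

    def forwards_to_firewall(self, now: float) -> bool:
        if self.down_since is None:
            return self.routes_kept
        if not self.graceful_restart:
            return False
        # Stale routes are kept only until the restart timer expires.
        return now - self.down_since <= self.restart_time


if __name__ == "__main__":
    isp = NeighborView(graceful_restart=True)
    isp.session_dropped(now=0.0)              # HA failover resets the session
    print(isp.forwards_to_firewall(now=30.0))  # still forwarding on stale routes
```

The point of the model is only that, with Graceful Restart, the upstream keeps forwarding for a bounded window while the newly active firewall brings the session back up.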

I think receiving only a default route or a small set of partial routes from the ISP is the best option. Though the BGP process can support tens of thousands of prefixes, the forwarding table is limited to 20,000 routes or fewer, depending on the firewall platform.

You can ensure some symmetry with source NAT for outbound initiated traffic. For inbound initiated traffic it may be more difficult to achieve. Maybe the new 'return-to-sender' feature in PAN-OS 5.0 PBF can help here.

It's hard to say what is more beneficial - keeping as much symmetry as possible vs. load sharing the outbound. It just depends on the type of traffic going over the links and how different the paths are for the session.

Hope that helps.

Cheers,

Kelly

panman · ‎01-21-2013

Thanks Kelly. Yep, better late than never.

Yes I do remember now that configuring Nonstop Forwarding will allow a route process to continue processing and forwarding session packets upon a peer failure or stateful failover.

So in your reply, am I understanding correctly that when a failover occurs on the active/passive PAN pair, the BGP routing instance (and any dynamic protocol) is reset and actually has to re-converge once the passive firewall takes over the "high-availability state functional" status? Nick's reply (Apr 4, 2011 12:40 PM) above contradicts this. Or are we all saying the same thing: that if the PANs and upstream BGP peers at the ISPs are NSF aware (by use of GRES), then BGP forwarding will be completely uninterrupted by a PAN failover event? Gosh, I can't remember whether 'Graceful Restart' was enabled by default (in the VR's BGP>Advanced tab) on the PANs when we deployed them... either that or I enabled NSF awareness with our ISPs at the time, as you had suggested in your document.

Just for fun, can you elaborate on your "management plane BGP sessions" comment? I'm still confused about how the PAN handles routing processes on the management plane vs. the data plane, and where hardware offload for packet processing fits in (perhaps that is totally unrelated to the planes).

I've concluded the same as you regarding a BGP default plus partial tables, as the best option for small/medium enterprise.

Josh

kbrazil · ‎01-21-2013

Hi Josh,

You are correct - the routing protocols will reconverge and the routing table will be rebuilt on the management plane, but the forwarding table on the data plane is sync'd between A/P HA units. This allows Graceful Restart to work - even though the routing table is being rebuilt, the forwarding table is still correct so any packets sent to the firewall will still be correctly forwarded during that time. It looks like Nick was saying the same thing in a different way.

Graceful Restart basically signals the management planes (or routing engines) of the routers/firewalls to ignore the reconvergence and keep sending packets to the neighbor for a certain period of time. Since they both support Graceful Restart, they both know the forwarding tables are still capable of forwarding packets correctly during the short time it takes for the protocol to set up.

This really works best on routers or firewalls that have separate management and data planes like all of the PA-series. Basically the routing protocol daemons (OSPF, BGP, RIP, etc.) run on the management plane and build the routing table there. Think of the management plane as a regular Linux workstation. This is also where all redistribution of routes between protocols and route filtering happens. The routing table is then copied to the data plane where the physical next-hop for each destination is recorded. This new table is called the forwarding table and is how the hardware makes fast packet-by-packet forwarding decisions. The forwarding table is refreshed from the routing table regularly. The data plane is not like a regular Linux workstation, but contains a lot of custom silicon for fast handling of data packets. This is where the hardware offload for packet processing happens, though some software handling of packets also happens in the data plane. No packets are ever forwarded through the management plane.
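
As a toy illustration of the split described above, the following Python sketch keeps a routing table (management plane) and a forwarding table (data plane) as separate dictionaries and copies only the forwarding table to the passive unit. All names here (`Firewall`, `ha_sync`, the unit and next-hop labels) are made up for the example, lookups are exact-match for brevity, and nothing in it reflects actual PAN-OS internals.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Firewall:
    name: str
    # Management plane: built by the routing protocol daemons.
    routing_table: Dict[str, str] = field(default_factory=dict)
    # Data plane: what packet forwarding actually consults.
    forwarding_table: Dict[str, str] = field(default_factory=dict)

    def learn(self, prefix: str, next_hop: str) -> None:
        """Management plane: a routing daemon installs a route."""
        self.routing_table[prefix] = next_hop

    def refresh_forwarding_table(self) -> None:
        """Copy the routing table down to the data plane."""
        self.forwarding_table = dict(self.routing_table)

    def forward(self, prefix: str) -> Optional[str]:
        """Data plane: look up the next hop (exact match, for brevity)."""
        return self.forwarding_table.get(prefix)


def ha_sync(active: Firewall, passive: Firewall) -> None:
    """Copy only the forwarding table to the passive unit."""
    passive.forwarding_table = dict(active.forwarding_table)


if __name__ == "__main__":
    active = Firewall("fw-a")
    active.learn("0.0.0.0/0", "isp-a")
    active.refresh_forwarding_table()

    passive = Firewall("fw-b")
    ha_sync(active, passive)

    # After a failover the passive unit has no routing table yet,
    # but it can still forward using the synced forwarding table.
    print(passive.routing_table)
    print(passive.forward("0.0.0.0/0"))
```

The sketch deliberately does not refresh the passive unit's forwarding table from its empty routing table; in the description above, the synced forwarding table stays in use while the protocols reconverge.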

I believe Graceful Restart is configured by default for BGP on the PA-series.

Hope that helps.

Cheers,

Kelly
