Active-passive HA with BGP to 2 ISPs, BFD + graceful restart

PetGoh1 · 07-03-2023

Hi, has anyone ever configured BGP + BFD + graceful restart together? I'm trying to do this setup but am not sure whether there are any timers that ensure the behaviour below. I can't find anything on this in any knowledge base.

1. When an ISP link goes down, BFD ensures failover within seconds. The ISP gateways are on the same subnets, attached to the Palo through switches.

2. When the firewall fails over, the BGP sessions should be maintained and the session table not cleared, by using graceful restart.

BGP keepalive (hello) - 15 seconds, hold (dead) timer - 45 seconds

BFD - 600 ms, so expecting 1.8 seconds for it to detect the failed BGP link and fail over to the other gateway

Graceful restart - default at the moment.

What we are seeing for ISP failover is that the default route is maintained much longer than 1.8 seconds: 24 ping packets are lost before the default route to the gateway that is down is removed from the routing table. 😕
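
The timer arithmetic above can be sketched roughly as follows. This is only a back-of-the-envelope check, not output from the firewall: it assumes a BFD detection multiplier of 3 (implied by 600 ms giving 1.8 seconds) and one ping per second, neither of which is stated in the post, and the function name is hypothetical.

```python
def bfd_detection_time_s(interval_ms: int, multiplier: int) -> float:
    """Approximate time for BFD to declare a peer down: interval x multiplier."""
    return interval_ms * multiplier / 1000.0


BFD_INTERVAL_MS = 600    # from the post
BFD_MULTIPLIER = 3       # assumption: implied by 600 ms -> 1.8 s
BGP_HOLD_TIME_S = 45     # from the post (dead timer)
PINGS_LOST = 24          # observed in the test
PING_INTERVAL_S = 1.0    # assumption: one ping per second

expected = bfd_detection_time_s(BFD_INTERVAL_MS, BFD_MULTIPLIER)
observed = PINGS_LOST * PING_INTERVAL_S

print(f"expected BFD detection time: {expected:.1f} s")
print(f"observed outage (approx.):   {observed:.0f} s")
print(f"BGP hold timer on its own:   {BGP_HOLD_TIME_S} s")
```

Under those assumptions the observed outage matches neither the BFD detection time nor the BGP hold timer, which suggests something else is keeping the route in place.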

I wonder whether anyone has done this or a similar setup with BFD + graceful restart, and whether graceful restart is what is preventing BFD from doing its work?

PetGoh1 · 07-04-2023

Just to share more, as I guess no one is doing this setup..

1. With graceful restart enabled, it seems BFD does not clear/flush the routing table until the graceful restart timeout is exceeded.

Disabling graceful restart results in 1 lost ping when we fail over from one internet gateway to the other through BFD detection of the BGP links.

The question still remains whether it is possible to have BFD + graceful restart together, maybe with the graceful restart timer tweaked. I raised a TAC case; they have a lab where they can test it out. It would be good if they could come up with a knowledge base article on timers plus a recommendation for how to get this done.

Hopefully someone has done a similar setup. Would appreciate it if you can share your thoughts.

cheers

seb_rupik · 07-06-2023

Hi there,

I'd need to lab this up too, but the PA knowledgebase does give this piece of advice:

Configure BFD (paloaltonetworks.com)

cheers,

Seb.

PetGoh1 · 09-21-2023

Hi Seb,

Thanks heaps for replying and for the link. Yes, I read that already and tweaked the timers to 500 ms and 3x, so that 3 missed intervals (1.5 seconds) make the link fail over. That worked fine when either of the 2 BGP links failed.

Configured separately, BFD works and graceful restart works.

Graceful restart is on the control plane and functions to tell the other end that we are about to move from the active firewall to the passive one. This is so the far end does not clear its routing table and continues to send traffic across. That way the failover from one firewall to the other is seamless. Minimal outage.

BFD is on the data plane and detects when the other side is actually dead, then quickly moves to the secondary link by clearing the routes related to the primary link. Setting it to 500 ms x 3 tries means you have only a 1.5 second outage. Fast failover to the backup link.

What I want to figure out is whether Palo Alto supports the combination of the two (best of both worlds), but I did not see any knowledge base article or document that says whether combining them will work or not.

Here is what happened when I configured both in our environment.

Test done is as follows:
1. Bring down the BGP peering by physically disconnecting the remote router.

Expectation
BFD will detect that the BGP peer is gone after 1.5 seconds (3 x 500 ms) and clear the routing table, leaving the secondary link's default route (local pref of 100 vs. the primary, which we set to 300) as the only route available, so traffic flows again after 1.5 seconds.
I am expecting the Palo Alto firewalls to say: hmmm, I did not hear the remote peer tell me on the control plane that it is in the process of failing over, so this must be a link outage and I should trust BFD, clear my routing table and make the secondary link active.

But no... Palo seems to wait for 120 seconds before clearing the routing table..... as if saying: I did not hear from you, but I will assume you must be doing a failover and will wait so you can fail over gracefully.. instead of trusting BFD.

Actual result because of that..
The default route from the primary BGP peer that went down stays in the routing table for the full 120 seconds (default graceful restart timer).
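
To make the expected vs. actual behaviour concrete, here is a minimal sketch that models only what was observed in this test. It is not documented PAN-OS logic; the class, field, and function names are hypothetical, and the 120 seconds is simply the wait seen above.

```python
from dataclasses import dataclass


@dataclass
class PeerTimers:
    bfd_interval_ms: int = 500      # from the test setup
    bfd_multiplier: int = 3         # from the test setup
    graceful_restart: bool = True
    gr_wait_s: float = 120.0        # wait observed in the test (default GR timer)


def route_removal_delay_s(t: PeerTimers) -> float:
    """Seconds until the dead peer's routes are removed, per the observed behaviour."""
    bfd_detect_s = t.bfd_interval_ms * t.bfd_multiplier / 1000.0
    if t.graceful_restart:
        # Observed: routes are kept until the graceful restart wait expires,
        # even though BFD has already declared the peer down.
        return max(bfd_detect_s, t.gr_wait_s)
    # Observed: without graceful restart, BFD detection alone removes the routes.
    return bfd_detect_s


with_gr = route_removal_delay_s(PeerTimers())
without_gr = route_removal_delay_s(PeerTimers(graceful_restart=False))

print(f"graceful restart enabled:  ~{with_gr:.0f} s until the route is removed")
print(f"graceful restart disabled: ~{without_gr:.1f} s until the route is removed")
```

The expectation described earlier would correspond to always returning the BFD detection time after a real link failure; the sketch shows why the two settings appear to conflict in this test.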

Moving forward
For now, since Palo Alto TAC never came back with a solution, we will just disable graceful restart and only enable it when we upgrade the firewalls. That means living with a possible outage whenever the firewall fails over for whatever reason, as the remote end will clear its routing table and tear down the BGP session. Not ideal.. 😕

cheers

Pet
