Active-passive HA with BGP to 2 ISPs, BFD + graceful restart


L1 Bithead

Hi, has anyone ever configured BGP + BFD + graceful restart together? I'm trying to build this setup but I'm not sure which timers are needed to ensure the two points below. I can't find anything on this in any knowledge base.

1. When an ISP link goes down, BFD ensures failover within seconds. The ISP gateways are on the same subnets, attached to the Palo through switches.

2. When the firewall fails over, the BGP sessions should be maintained and the session table should not be cleared, by using graceful restart.

 

BGP hello - 15 seconds, dead timer 45 seconds

BFD - 600 ms, so expecting 1.8 seconds for it to detect the failed BGP link and make the other gateway active

Graceful restart - default at the moment. 
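
To make the timer arithmetic above explicit, here is a minimal sketch. It assumes a BFD detection multiplier of 3 (implied by 600 ms giving 1.8 seconds) and treats the BGP dead timer as 3 x hello. The function names are made up for illustration and are not PAN-OS settings.

```python
def bfd_detection_time_ms(interval_ms: int, multiplier: int = 3) -> int:
    """Simplified BFD detection time: interval times detection multiplier.

    The multiplier of 3 is an assumption inferred from the numbers in the post.
    """
    return interval_ms * multiplier


def bgp_dead_timer_s(hello_s: int, factor: int = 3) -> int:
    """BGP dead (hold) timer modelled as a multiple of the hello interval."""
    return hello_s * factor


if __name__ == "__main__":
    # Timers quoted in the post: BFD 600 ms, BGP hello 15 s.
    print("BFD detection (ms):", bfd_detection_time_ms(600))
    print("BGP dead timer (s):", bgp_dead_timer_s(15))
```

With these values BFD should notice a dead peer far sooner than the BGP dead timer would, which is the whole point of adding it.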

 

What we are seeing for ISP failover: the default route is maintained much longer than 1.8 seconds. We lose 24 ping packets before the default route to the gateway that is down is removed from the routing table. 😕

I wonder whether anyone has done this or a similar setup with BFD + graceful restart, and whether graceful restart is what is preventing BFD from doing its work?

3 REPLIES

L1 Bithead

Just to share more, since I guess no one is doing this setup..

1. With graceful restart enabled, it seems BFD does not clear/flush the routing table until the graceful restart timeout is exceeded.

 

With graceful restart disabled, we lose only 1 ping when we fail over from one internet gateway to the other through BFD detection of the BGP links.

The question remains whether it is possible to have BFD + graceful restart together, maybe with the graceful restart timer tweaked. I have raised a TAC case; they have a lab where they can test it out. It would be good if they could come up with a knowledge base article on the timers plus a recommendation on how to get this done.

Hopefully someone has done a setup similar to this. I would appreciate it if you could share your thoughts.

 

cheers

 

 

L4 Transporter

Hi there,

I'd need to lab this up too, but the PA knowledge base does give this piece of advice:

[Image: seb_rupik_0-1688648936148.png]

Configure BFD (paloaltonetworks.com)

 

cheers,

Seb.

Hi Seb,

 

Thanks heaps for replying and for the link. Yes, I had read that already and tweaked the timers to 500 ms and 3x, so that 3 missed intervals (1.5 seconds) make the link fail over. That worked fine when either of the 2 BGP links failed.

When configured separately, BFD works and graceful restart works.

 

Graceful restart is on the control plane, and its function is to tell the other end that we are about to move from the active firewall to the passive one. This is so the far end does not clear its routing table and continues to send traffic across. That way the failover from one firewall to the other is seamless, with minimal outage.

 

BFD is on the data plane. It detects when the other side is actually dead and quickly moves traffic to the secondary link by clearing the routes related to the primary link. Setting it to 500 ms x 3 tries means you have only a 1.5 second outage, so a fast failover to the backup link.

 

What I want to figure out is whether Palo Alto supports the combination of the two (best of both worlds). I did not see any knowledge base article or document that says whether combining them will work or not.

 

This is what I saw when I configured it in our environment.


The test done is as follows:
1. Bring down the BGP peering by physically disconnecting the remote router.

 

Expectation
BFD detects that the BGP peer is gone after 1.5 seconds (3 x 500 ms) and clears its routes from the routing table. The secondary link's default route (local pref of 100, versus 300 that we set on the primary) is then the only route available, and traffic flows again after 1.5 seconds.
I am expecting the Palo Alto firewall to say: hmm, I did not hear from the remote peer on the control plane telling me it is in the process of failing over, so this must be a link outage and I should trust BFD, clear my routing table and make the secondary link active.
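
The expected route selection can be sketched like this. It is only an illustration of "highest local preference wins among the routes still present"; the `Route` class and the next-hop labels are hypothetical, and only the local pref values 300 and 100 come from the post.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Route:
    prefix: str
    next_hop: str      # hypothetical label, not a real address
    local_pref: int


def best_route(routes: Iterable[Route], prefix: str = "0.0.0.0/0") -> Optional[Route]:
    """Pick the route for `prefix` with the highest local preference."""
    candidates = [r for r in routes if r.prefix == prefix]
    return max(candidates, key=lambda r: r.local_pref, default=None)


primary = Route("0.0.0.0/0", "isp-primary", 300)
secondary = Route("0.0.0.0/0", "isp-secondary", 100)

if __name__ == "__main__":
    # Both peers up: the primary default route is preferred.
    print(best_route([primary, secondary]))
    # Primary routes cleared after BFD declares the peer down:
    print(best_route([secondary]))
```

As long as the primary's default route stays in the table, the secondary one is never selected, so everything depends on how quickly the primary route is cleared.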


But no... the Palo seemed to wait for 120 seconds before clearing the routing table, as if saying: I did not hear from you, but I will assume you must be doing a failover and will wait so you can fail over gracefully, instead of trusting BFD.

 

Actual result because of that:
The default route from the primary BGP peer that went down stays in the routing table for the full 120 seconds (the default graceful restart timer).
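
To put the expectation and the actual result side by side, here is a rough model of what was observed in this thread. It is not documented PAN-OS behaviour; the function name is made up, and the 120 second value is the timer reported above.

```python
def observed_route_removal_s(
    bfd_interval_ms: int,
    bfd_multiplier: int,
    graceful_restart: bool,
    gr_timer_s: float = 120.0,  # value reported in this thread
) -> float:
    """Seconds until the dead peer's routes leave the routing table.

    Models the behaviour seen in the tests here: with graceful restart
    enabled the routes were held until the graceful restart timer ran out,
    otherwise they were cleared once BFD declared the peer down.
    """
    bfd_detect_s = bfd_interval_ms * bfd_multiplier / 1000.0
    if graceful_restart:
        return max(bfd_detect_s, gr_timer_s)
    return bfd_detect_s


if __name__ == "__main__":
    print("GR enabled :", observed_route_removal_s(500, 3, graceful_restart=True))
    print("GR disabled:", observed_route_removal_s(500, 3, graceful_restart=False))
```

In this model, the graceful restart timer decides the outage whenever graceful restart is on, however aggressive the BFD timers are.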

 

Moving forward
For now, since Palo TAC never came back with a solution, we will just disable graceful restart and enable it again when we upgrade the firewalls. That means living with a possible outage when the firewall fails over for whatever reason, as the remote end will clear its routing table and tear down the BGP session. Not ideal.. 😕

 

cheers

Pet
