BGP failover with a HA active/passive pair?

dimensiondavid · ‎10-19-2013

Hello All,

I have two PAN devices configured in HA active/passive mode. At the moment they are in a lab environment that simulates an imminent deployment, in which the PAN devices will connect to an ISP router and exchange routes via BGP.

The pair are configured for link monitoring, so that if the connection into either the trusted or the untrusted network is lost, a failover to the other PAN in the pair occurs. The lab BGP router (a Cisco device) is configured with some loopback addresses to simulate Internet addresses that it may advertise.

If I configure static routes for communication from the trusted network to the loopback addresses, pings succeed, and if I remove one of the links on the active PAN the firewalls fail over and only one ping is lost on average.

If I remove the statics and have the trusted networks and loopback addresses advertised via BGP, pings work between the two points (I can see that BGP is working and that the correct routes are being advertised between the PAN and the lab BGP router). However, when I remove one of the links on the active PAN, my lab BGP router receives a 'neighbor down Peer closed the session' message. The newly active PAN and the lab BGP router then need to set up a new BGP session and advertise routes to one another. This takes about 50 seconds from the moment the cable is pulled until the newly active PAN and the lab BGP router have created their new BGP session and exchanged routes, at which point pings start succeeding again. I have the BGP holdtime configured as 3 seconds on the PANs and the BGP router.
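For anyone wanting to put a number on the outage rather than counting lost pings by eye, here is a minimal Python sketch. It assumes the ping results have already been collected as (timestamp in seconds, reply received) pairs; the function name `longest_outage` and the sample data are made up for illustration and are not taken from the lab.

```python
from typing import List, Optional, Tuple


def longest_outage(samples: List[Tuple[float, bool]]) -> float:
    """Return the longest outage in seconds.

    An outage is measured from the last successful reply before a run
    of losses to the first successful reply after it. Losses that occur
    before any successful reply are ignored.
    """
    longest = 0.0
    last_ok: Optional[float] = None  # timestamp of the most recent good reply
    in_outage = False                # True while we are inside a run of losses

    for timestamp, ok in samples:
        if ok:
            if in_outage and last_ok is not None:
                longest = max(longest, timestamp - last_ok)
            last_ok = timestamp
            in_outage = False
        else:
            in_outage = True
    return longest


# Synthetic example: one ping per second, replies lost from t=10 to t=59.
samples = [(float(t), not (10 <= t < 60)) for t in range(70)]
print(longest_outage(samples))
```

Running the same function over a capture from the static-route test and one from the BGP test would give two directly comparable figures.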

My question is: is this expected behaviour? I was expecting that the failover would be relatively seamless and that the BGP sessions would move between the PANs and so stay established. If this is expected behaviour, what are the optimum settings so that the BGP sessions can be re-established as quickly as possible?

nbilly · ‎10-21-2013

Hello,

Yes, it is an expected behaviour.

This is due to the floating IPs used in A/P. Data-plane interfaces on the passive node are down (logically and/or physically), so after failover the BGP peering needs to be re-established and routes exchanged again.

You can alleviate this by using graceful restart, for instance.

But 50 seconds is an unexpectedly high delay for convergence, even without graceful restart configured...

-Nicolas
