10-19-2013 06:33 AM
I have two PAN devices configured in HA active/passive mode (currently in a lab environment, simulating an imminent deployment in which the PAN devices will connect to an ISP router and exchange routes via BGP).
The pair are configured for link monitoring, so that if either connection into the trusted or untrusted networks is lost then a failover to the other PAN in the pair occurs. The lab BGP router (Cisco device) is configured with some loopback addresses to simulate Internet addresses that it may advertise.
If I configure static routes from the trusted network to the loopback addresses, pings flow freely, and if I remove one of the links on the active PAN the firewalls fail over and only one ping is lost on average.
If I remove the statics and have the trusted networks and loopback addresses advertised via BGP, pings still work between the two points (I can see that BGP is working and that the correct routes are being advertised between the PAN and the lab BGP router). However, when I remove one of the links on the active PAN, my lab BGP router receives a 'neighbor down Peer closed the session' message. The newly active PAN and the lab BGP router then have to set up a new BGP session and advertise routes to one another. This takes about 50 seconds, measured from the moment the cable is pulled to the point where the newly active PAN and the lab BGP router have created their new BGP session and exchanged routes; only then do pings start succeeding again. I have the BGP hold time configured as 3 seconds on both the PANs and the BGP router.
My question is: is this expected behaviour? I was expecting the failover to be relatively seamless, with the BGP sessions moving between the PANs and so staying established. If this is expected behaviour, what are the optimum settings so that the BGP sessions can be re-established as quickly as possible?
10-21-2013 02:00 AM
Yes, this is expected behaviour.
It is due to the floating IPs used in A/P. The data-plane interfaces on the passive node are down (logically and/or physically), so the BGP peering has to be re-established and routes exchanged again after failover.
You can alleviate this by using graceful restart, for instance.
But 50 seconds is an unexpectedly high convergence delay, even without graceful restart configured.
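To pin down how long the outage really lasts when comparing settings, it can help to measure it from timestamped ping results instead of estimating by eye. Below is a minimal sketch in Python, assuming you have already collected one (timestamp in seconds, success flag) pair per ping; the function name `longest_outage` and the sample data are hypothetical and not part of any PAN or Cisco tooling.

```python
from typing import Iterable, Tuple


def longest_outage(samples: Iterable[Tuple[float, bool]]) -> float:
    """Return the longest outage in seconds.

    An outage is measured from the last successful ping before a run of
    failures to the first successful ping after it. Failures before the
    first success, or after the last one, are ignored because they
    cannot be bounded on both sides.
    """
    longest = 0.0
    last_ok = None      # timestamp of the most recent successful ping
    in_outage = False   # True while we are inside a run of failures
    for timestamp, ok in samples:
        if ok:
            if in_outage and last_ok is not None:
                longest = max(longest, timestamp - last_ok)
            last_ok = timestamp
            in_outage = False
        else:
            in_outage = True
    return longest


# Hypothetical data: one ping per second, two of them lost during failover.
samples = [(0.0, True), (1.0, True), (2.0, False), (3.0, False), (4.0, True)]
print(longest_outage(samples))  # 3.0
```

Running the same measurement with static routes, with BGP alone, and with graceful restart enabled would give comparable numbers for each case.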