Tomorrow I'm cutting over to a new pair of 3410s. I have 3 LAG connections (AE.1, AE.11, and AE.10). AE.11 carries the physical connections to my ISP switches. It has two L3 sub interfaces (VLAN 800 & 801): VLAN 800 = ISP1 and VLAN 801 = ISP2. Both ISP routes are static and have the same metric / AD. I'm using ECMP and it works well in my testing.
When I fail over (using a reboot to simulate power loss), the passive FW goes active immediately and, at worst, I may see one ping dropped on each ISP link. The failover is impressive.
When failing back to the primary FW, I lose ISP2 for approximately 12-20 seconds. I've configured the election settings to "standard" and tried 1min, 2min, and 5min for the Preemption Hold Time. "Preemptive" is checked on both FWs, the primary Device Priority is set to 100, and the secondary is set to 200. It works well except that when the primary preempts, the ISP2 circuit drops for 12-20 seconds. The ISP1 circuit may drop 1 packet, but is more consistent.
ISP1 has a /24 interface bound to VLAN 800 (I'm not crazy about this, but it's the situation I was handed).
ISP2 has a /30 interface bound to VLAN 801.
Both are L3 sub interfaces on the same LAG (AE11.801 and AE11.800).
I thought maybe LACP could be an issue, but if it were, it would impact both ISPs, as they traverse the same LAG. The upstream ISP switches are a pair of Extreme 10/100/1G switches in an MLAG configuration using Extreme ELRP for loop detection and prevention.
Any advice on other timers, tweaks, troubleshooting steps, etc. is greatly appreciated. Both ISPs use static routes, not BGP. Both have the same AD and metric (10 & 10).
I "think" this is layer 2, e.g. STP, messing with settings and MAC address advertisement. Or it could be the hold timers on the PAN, etc. I'm not sure how the Extreme switches deal with failover and clearing MAC tables, etc., but I know Cisco has a delay there, and I've just come to accept ~1-2 minutes of downtime. This article kinda goes into how to prevent that by keeping the passive interfaces in an 'UP' state.
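To help tell a layer 2 relearning delay from a timer issue, it can be useful to put exact numbers on the outage per ISP and line them up against the HA state-change timestamps. Below is a minimal Python sketch, assuming you have already collected ping results for each ISP as (timestamp, reply received) pairs. The function name `outage_windows` and the sample data are made up for illustration; nothing here is a PAN-OS or Extreme tool.

```python
from typing import Iterable, List, Tuple

# One probe result: (timestamp in seconds, True if a reply came back).
Sample = Tuple[float, bool]


def outage_windows(samples: Iterable[Sample]) -> List[Tuple[float, float]]:
    """Return a (start, end) pair for each run of consecutive lost probes.

    start is the timestamp of the first lost probe in the run; end is the
    timestamp of the first successful probe after it, or the timestamp of
    the last probe seen if the run never recovers.
    """
    windows: List[Tuple[float, float]] = []
    start = None
    last = None
    for ts, ok in sorted(samples):
        if not ok and start is None:
            start = ts  # outage begins
        elif ok and start is not None:
            windows.append((start, ts))  # outage ends on first good reply
            start = None
        last = ts
    if start is not None:
        windows.append((start, last))  # still down at the end of the capture
    return windows


if __name__ == "__main__":
    # Hypothetical one-second probes captured during a failback test.
    isp1 = [(float(t), t != 5) for t in range(30)]
    isp2 = [(float(t), not (5 <= t <= 18)) for t in range(30)]
    for name, data in (("ISP1", isp1), ("ISP2", isp2)):
        for begin, end in outage_windows(data):
            print(f"{name}: down from t={begin:.0f}s to t={end:.0f}s "
                  f"({end - begin:.0f}s)")
```

If the ISP2 window consistently starts at the moment the primary preempts and always lasts about the same time, that points toward something aging out upstream (MAC or ARP entries) rather than the hold timers on the firewalls.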
Also check the following to see if it applies to your scenario:
Here is a link to a bunch of HA articles:
I hope this helps explain some of what you are seeing.