Failover methods Manual vs Link Down (traffic loss)

Trustnet · ‎11-29-2017

There are a few triggers that can cause a failover in an HA cluster.

I'd like to understand the difference between a manual (graceful) failover and a hard failover such as Link Down.

In terms of network traffic loss, is there a difference between a failover triggered by Link Monitoring and a manual failover? In other words, would a manual failover cause less traffic loss than a Link Monitoring failover?

BPry · ‎11-29-2017

@Trustnet,

I'm not entirely sure what your question is, so this may not be the answer you are looking for exactly.

Manual Failover:

Manual failover would only need to be done if you are not setting up either Link Monitoring or Path Monitoring. If you are going to incur the costs of having an Active/Passive system, I'm not sure why you wouldn't have at least one monitoring profile in place, if not both.

Manual failover is not going to be something you really want to go with; by the time someone logs into the firewall to manually issue the suspend command you'll have already interrupted traffic to the organization, and that's if you notice it immediately.

Link Monitoring:

Link Monitoring is what I almost always see used by everyone if they only have one monitoring profile active. Link Monitoring does exactly what it sounds like: if the interface goes down, it fails the traffic over to the passive firewall.

You would set up Link Groups that specify the Group Failure condition, along with the interfaces. For example, if ethernet1/2 was your inside link to your network cores, you would likely want to fail over to the passive firewall immediately. You would likely do the same if you lost your DMZ link and you host external services. You might set up a lesser Link Group with a failure condition of 'all' if you have different zones for multiple different VPN connections.

The downside to Link Monitoring is that, unless it's paired with a Path Monitoring profile, you'll never experience a failover if ethernet1/2 is still up but traffic cannot reach anything.

Path Monitoring:

Path monitoring is something that should realistically be a part of any HA setup, for the exact reasons that are mentioned above. Since Link Monitoring is simply monitoring the interface status, you will not experience a failover if the ability to reach a host across that link goes down without the interface itself showing as down as well.

Path Monitoring is organized into groups in the same way as Link Monitoring. You can set it up so that a failover event takes place if all Path Groups go down, or so that it takes place if any of the Path Groups go down. The advantage of the Path Group is that you can manually specify which Destination IPs you are supposed to be able to reach.

As an example of Path Monitoring, I have three different Path Groups configured on all of my firewalls where applicable. The first checks outside connectivity to things like Google's DNS servers, OpenDNS, and a few other addresses that I have direct control over; if all of these destinations are ever reported as down, it will trigger a failover event. I have similar policies set up for my internal and DMZ links. The only precaution that you really need to take with Path Monitoring is that you don't want to have the Failure Condition set to 'Any' on a Path Group if you are monitoring servers that are not HA, as any maintenance on one of them would cause your firewall to fail over, since the Failure Condition would technically be met.
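The 'any' versus 'all' caveat can be illustrated with a small Python model. This is only a sketch of the decision logic as described above, not anything that runs on the firewall; the `group_failed` helper and the 192.0.2.10 host are made up for illustration.

```python
def group_failed(reachable, condition):
    """Return True if a path group meets its failure condition.

    reachable: dict mapping destination IP -> True if it currently answers
    condition: "any" (one unreachable destination is enough) or
               "all" (every destination must be unreachable)
    """
    down = [ip for ip, ok in reachable.items() if not ok]
    if condition == "any":
        return len(down) > 0
    if condition == "all":
        # An empty group never counts as failed.
        return bool(reachable) and len(down) == len(reachable)
    raise ValueError(f"unknown failure condition: {condition}")


# Hypothetical outside path group: two public resolvers plus one
# non-HA host that we control.
outside = {"8.8.8.8": True, "208.67.222.222": True, "192.0.2.10": True}

# Take the single non-HA host down for maintenance.
outside["192.0.2.10"] = False

any_result = group_failed(outside, "any")  # one host down is enough
all_result = group_failed(outside, "all")  # needs every host down
print("any:", any_result, "all:", all_result)
```

With 'any', routine maintenance on that one host is enough to meet the failure condition; with 'all', the group only fails once every monitored destination is unreachable.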

As far as a direct answer to the question of which one is faster, it's almost always going to be Link Monitoring if the link itself goes down. The chance that you could get the alert that the link is down, log into the firewall, and either suspend the device through the GUI or issue the request high-availability state suspend command through the CLI before the Link Monitoring profile triggers the failover automatically is next to nothing. Manual interaction will always be slower than letting the device take care of things itself.

Trustnet · ‎11-30-2017

Hi

My question is the difference between failovers regarding TRAFFIC LOSS.

Is the traffic loss the same in manual and automatic failover?

BPry · ‎11-30-2017

@Trustnet,

A failover is a failover. Regardless of whether it is triggered manually or automatically by the firewall's HA monitoring, the same commands are being issued. If there is any TRAFFIC LOSS, it would be exactly the same regardless of how the failover is initiated.

That being said, if you have user-noticeable traffic loss during a failover event, you need to evaluate your HA setup. If you follow best practices for your deployment type (L3 or L2), it shouldn't be noticeable to your end users that a failover event has even taken place.

Trustnet · ‎11-30-2017

So, based on what you are saying, assume I have the following scenario:

I need to change firewall connections to ports 1-4

So, I just disconnect them from the Primary (Active), replace and then the same for the secondary.

Is there no need to manually fail over before disconnecting the cables?

BPry · ‎11-30-2017

@Trustnet,

Can you explain exactly what you are attempting to do in greater detail? It sounds like there is a much better way to do this, but I need to be sure of what exactly you are attempting to do before I start telling you to do something that isn't going to work.

Trustnet · ‎11-30-2017

Very simple: replace cable for port 1 on both members

BPry · ‎11-30-2017

@Trustnet,

Okay, so this is a fairly simple failover event. You would likely want to replace the cable on the passive unit first, and then suspend the active firewall so that you can replace the cable on that unit while it isn't passing any traffic. You will just have to remember to go back and move the high-availability state to functional on the 'active' firewall that you suspended.

While removing port 1 would trigger a failover as long as it's in a Link Group that is actively being monitored, manual failover is recommended for maintenance because it isn't susceptible to any misconfiguration within the Link Monitoring or Link Group options, nor is it susceptible to poorly configured HA Timers that could interfere with HA failover times.

Brandon_Wertz · ‎11-30-2017

@Trustnet wrote:
Very simple: replace cable for port 1 on both members

Like @BPry mentioned, when you're working on an "active" device it's probably "best" to just perform a fail-over anyway.

That said, depending on your config, pulling one port out isn't necessarily going to cause an HA / fail-over scenario.

I have 4 links on my 5060s... 2 inside and 2 external links. In our HA config we'd need to lose both links in a single group to cause an automated HA event.

(This comes from the "failure condition" of ANY or ALL on the Link Group, coupled with a Link Monitoring failure condition of "ANY." If this were "ALL," then all 4 of my interfaces would need to go down in order to fail over, which wouldn't be a good thing.)
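To make the two levels of failure condition concrete, here is a rough Python sketch of the logic as described in this post. It only models the ANY/ALL decisions and is not taken from the firewall; the `evaluate` function, the group names, and the interface names are hypothetical.

```python
def evaluate(groups, group_condition, overall_condition):
    """Decide whether link monitoring would call for a failover.

    groups: dict of group name -> dict of interface name -> link up (bool)
    group_condition: "any" or "all" interfaces down fails a group
    overall_condition: "any" or "all" failed groups triggers the failover
    """
    pick = {"any": any, "all": all}
    # First level: has each link group met its own failure condition?
    failed_groups = [
        pick[group_condition](not up for up in links.values())
        for links in groups.values()
    ]
    # Second level: combine the per-group results.
    return pick[overall_condition](failed_groups)


def make_groups(down=()):
    """Two hypothetical groups of two links each; `down` lists pulled ports."""
    names = {
        "inside": ["ethernet1/1", "ethernet1/2"],
        "external": ["ethernet1/3", "ethernet1/4"],
    }
    return {g: {i: i not in down for i in ifs} for g, ifs in names.items()}


one_pulled = evaluate(make_groups(["ethernet1/1"]), "all", "any")
both_inside = evaluate(make_groups(["ethernet1/1", "ethernet1/2"]), "all", "any")
both_inside_all = evaluate(make_groups(["ethernet1/1", "ethernet1/2"]), "all", "all")
print(one_pulled, both_inside, both_inside_all)
```

In this model, pulling a single cable does nothing, losing both links in one group triggers a failover when the overall condition is ANY, and with an overall condition of ALL the same loss is ignored until every group has failed.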

Trustnet · ‎12-03-2017

So I will ask again: assuming HA is configured correctly and all timers are perfectly configured, why would I prefer a manual failover over an automatic one? (Again, assuming everything is working as expected. I'm not talking about any exceptional scenario.)

ET · ‎12-03-2017

Once the failover condition is met (the failure is detected), the time it takes to fail over should be the same, whether manual or automatic. For an automatic failover, the time to detect the failure may differ depending on the type of failure and, where applicable, your configuration.

Depending on the failure that triggers the failover, an automatic failover might cause more packet loss.

If the link failure happens on the firewall port, meaning that the port on the firewall itself is disconnected (not the remote port connected to a switch or router), I would say the firewall would detect it fairly quickly. In such a case the failover should be triggered immediately, just like with suspend.

If your network does not tolerate potential latency during failover, we recommend running a failover test to measure the exact delay when a failover is triggered by a link failure. The overall end-user impact may also be affected by the surrounding L2/L3 network design.

But if you need to fail over for a maintenance activity, we recommend using the suspend functionality. It is easier to control the failover via suspend, and the failover will be initiated immediately.
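The reasoning above amounts to "outage ≈ detection time + switchover time", where only the detection part depends on the trigger. A back-of-the-envelope Python sketch follows; every number in it is a made-up placeholder, not a measured or documented value, so substitute the results of your own failover test.

```python
# All values are hypothetical placeholders in milliseconds.
SWITCHOVER_MS = 500  # assumed identical for every trigger, per the post above

# Assumed detection delay for each kind of trigger.
detection_ms = {
    "manual suspend": 0,          # nothing to detect, the operator decides
    "local link down": 50,        # port on the firewall itself loses link
    "remote path failure": 600,   # e.g. several missed path-monitoring pings
}


def outage_ms(trigger):
    """Estimated traffic interruption for a given trigger."""
    return detection_ms[trigger] + SWITCHOVER_MS


estimates = {trigger: outage_ms(trigger) for trigger in detection_ms}
for trigger, total in estimates.items():
    print(f"{trigger}: ~{total} ms")
```

Whatever the real numbers are, the manual suspend case can never be slower than the others in this model, because its detection term is zero.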
