Layer 3 High Availability with Optimal Failover Times Best Practices

Created On 09/25/18 17:41 PM - Last Modified 07/14/20 00:45 AM


Symptom


  • When deploying a Palo Alto Networks (PAN) HA pair in Layer 3 (L3) mode, several considerations should be taken into account to achieve the shortest possible failover time.
 
  • The Palo Alto Firewall Series supports an active/passive configuration of two devices.  The active device continuously synchronizes its configuration and session information with the passive device over two dedicated interfaces and, in the event of a hardware or software disruption on the active firewall, the passive firewall becomes active automatically without loss of service.  The time it takes for the surrounding devices to begin forwarding traffic to the new active unit is the bottleneck to achieving optimal failover times.

 

  • Two areas of configuration need to be considered when trying to achieve the shortest failover time in an L3 HA deployment: the PAN HA configuration and the adjacent switch configuration.


Environment


  • PA-2000 series
  • PA-4000 series


Cause


PAN HA configuration considerations
  • An important fact to consider in designing an HA architecture is that the traffic-handling links on the passive device default to a down state. Upstream and downstream devices connected to the passive device therefore will not see a valid path until the passive PAN firewall becomes active. A configuration option was added in PAN-OS version 2.0.x to force the interfaces on the passive device to always be up. This option only takes effect if the interfaces are L3. With the links held up, the adjacent device does not have to go through a port transition when a failover occurs. This setting is configured under Device > High Availability > Election settings. The Passive Link State defaults to shutdown; set it to auto to force the link status up on the passive device.

 

Note: The other two considerations on the PAN firewall are the values configured for the Passive Hold Timer and the Hello interval. These two settings are configured on the HA Election settings page.
 

 

  • The Hello Interval is the time interval between heartbeat packets that are sent to verify that the opposing firewall is operational. The minimum value for hello interval is 1000 milliseconds (1 second) on the PA-4000 series, and 8000 milliseconds on the PA-2000 series. Setting the value to the minimum is recommended to achieve optimal failover times. When there is a loss of 3 hello messages the adjacent firewall is declared to be down and the passive device will become active.

 

  • The Passive hold down timer is the amount of time the passive device waits, once the active device has been declared down (because of a loss of heartbeat packets or a link or path monitoring failure), before switching to the Active state. Setting this value to 0 triggers an immediate switchover when the failure is detected. In general, the best practice is to set a nominal value that introduces some delay so that the surrounding devices can stabilize after the state changes. Recommendations for optimal failover time: on PA-2000 series, set the passive hold down timer to 2000 ms; on PA-4000 series, set the passive hold down timer to 0.
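
As a rough worked example of how the two timers combine, the sketch below estimates the delay between the last hello and the passive device taking over after a heartbeat loss. It assumes a simple additive model (three missed hello intervals, then the passive hold down timer); the function name and the model are illustrative assumptions, not a statement of how PAN-OS measures these intervals internally.

```python
# Rough estimate only: assumes detection takes exactly `missed_hellos`
# hello intervals and that the hold down timer is then added on top.
MISSED_HELLOS = 3  # from the article: a loss of 3 hellos declares the peer down


def heartbeat_failover_ms(hello_interval_ms, passive_hold_ms,
                          missed_hellos=MISSED_HELLOS):
    """Approximate time (ms) from the last hello to the passive unit going active."""
    if hello_interval_ms <= 0 or passive_hold_ms < 0 or missed_hellos <= 0:
        raise ValueError("timer values must be non-negative and intervals positive")
    detection_ms = missed_hellos * hello_interval_ms
    return detection_ms + passive_hold_ms


if __name__ == "__main__":
    # Recommended values quoted in this article.
    print("PA-4000:", heartbeat_failover_ms(1000, 0), "ms")
    print("PA-2000:", heartbeat_failover_ms(8000, 2000), "ms")
```

Under this model, the recommended values give roughly 3000 ms on PA-4000 and 26000 ms on PA-2000 for a heartbeat-detected failure.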

 

Note: In an L3 deployment the PAN device will issue a gratuitous ARP when a failover occurs; this populates the surrounding devices' forwarding tables so that traffic is forwarded to the newly activated device.
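
To make the gratuitous ARP concrete, the sketch below builds a generic gratuitous ARP request frame in Python: a broadcast Ethernet frame whose ARP sender IP and target IP are the same address. This shows the general frame layout only; the exact frames PAN-OS emits are not described in this article, and the MAC address, IP address, and the choice of an all-zero target hardware address are assumptions made for illustration.

```python
import socket
import struct

BROADCAST_MAC = b"\xff" * 6
ETHERTYPE_ARP = 0x0806


def mac_to_bytes(mac):
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    raw = bytes(int(part, 16) for part in mac.split(":"))
    if len(raw) != 6:
        raise ValueError("MAC address must have 6 octets")
    return raw


def build_gratuitous_arp(sender_mac, sender_ip):
    """Return a 42-byte Ethernet frame carrying a gratuitous ARP request."""
    sha = mac_to_bytes(sender_mac)
    spa = socket.inet_aton(sender_ip)
    eth_header = struct.pack("!6s6sH", BROADCAST_MAC, sha, ETHERTYPE_ARP)
    arp_payload = struct.pack(
        "!HHBBH6s4s6s4s",
        1,            # hardware type: Ethernet
        0x0800,       # protocol type: IPv4
        6,            # hardware address length
        4,            # protocol address length
        1,            # opcode: request
        sha,          # sender hardware address
        spa,          # sender protocol address
        b"\x00" * 6,  # target hardware address (assumed all zeros here)
        spa,          # target IP equals sender IP: this makes it gratuitous
    )
    return eth_header + arp_payload


if __name__ == "__main__":
    # Hypothetical addresses for illustration only.
    frame = build_gratuitous_arp("02:00:00:00:00:01", "192.0.2.1")
    print(len(frame), frame.hex())
```

Neighbors that receive such a frame can update their ARP and MAC tables to point at the sender, which is the effect the note above relies on.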
 

 

  • Implementing link monitoring is also a best practice when trying to optimize failover times. When this is configured the loss of a physical link will trigger a switchover from active to passive. If an adjacent device goes down or there is a physical connection issue the HA switchover will be immediate.  The time the passive device takes to become active will depend upon the hold down timer value set as discussed above.
 

Switch configuration considerations

  • Assuming the PAN HA pair is connected to a Layer 2 switch, there is one port-based setting that should be considered. Most Layer 2 switches have spanning tree enabled by default to prevent loops caused by cabling errors.

 

  • When spanning tree is enabled on a switch port, the port does not start forwarding data immediately. Instead, it goes through a number of states while it determines the topology of the network. This can cause a delay of 30 to 50 seconds before traffic starts to be forwarded. This applies to the original Spanning Tree Protocol (STP) defined in IEEE 802.1D. (See http://en.wikipedia.org/wiki/Spanning_tree_protocol for more details on the protocol and other references.)
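
The 30 to 50 second range can be reproduced with simple arithmetic. The sketch below assumes the IEEE 802.1D default timers (forward delay 15 seconds, max age 20 seconds); these defaults are not stated in this article and can be tuned on a given switch, so treat the numbers as an illustration rather than a measurement.

```python
# Assumed IEEE 802.1D default timers; a switch may be configured differently.
FORWARD_DELAY_S = 15  # time spent in each of the listening and learning states
MAX_AGE_S = 20        # time before stale topology information is discarded


def stp_forwarding_delay_s(wait_for_max_age=False,
                           forward_delay_s=FORWARD_DELAY_S,
                           max_age_s=MAX_AGE_S):
    """Seconds before a classic STP port forwards traffic."""
    listening = forward_delay_s
    learning = forward_delay_s
    total = listening + learning
    if wait_for_max_age:
        # Old topology information must age out before the transition starts.
        total += max_age_s
    return total


if __name__ == "__main__":
    print("port link-up:", stp_forwarding_delay_s(), "s")
    print("after max age expiry:", stp_forwarding_delay_s(wait_for_max_age=True), "s")
```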

 

  • Some vendors have implemented proprietary extensions to STP to minimize the delay when a switch port becomes active. Cisco switches have a configuration option called PortFast. PortFast immediately transitions the port into the STP forwarding state upon link up. The port still participates in STP, so if the port is determined to be part of a loop, it will still transition into the STP blocking state. (http://www.cisco.com/en/US/tech/tk389/tk621/technologies_tech_note09186a008009482f.shtml#topic1) The newer IEEE standard 802.1w (Rapid Spanning Tree) includes protocol extensions comparable to PortFast. Switch vendors other than Cisco will have a similar configuration setting, and it is recommended that you contact your switch manufacturer to find out how to configure the equivalent.

 

  • Here is a configuration example from a Cisco 29xx switch with PortFast enabled on a port:

      interface FastEthernet0/16
       switchport access vlan 500
       spanning-tree portfast

 

  • The best practice is to enable PortFast or the equivalent on all switch interfaces that connect to the PAN firewalls in an HA configuration.
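
When several switch ports face the HA pair, the same stanza has to be repeated for each of them. The following is a minimal sketch of a helper that renders the stanza from the Cisco example above for a list of interfaces. The function name, the second interface name, and the use of "!" as a separator line are assumptions for illustration; verify the generated commands against your switch model before applying them.

```python
def render_portfast_config(interfaces, access_vlan):
    """Render a Cisco-style PortFast stanza for each interface name."""
    if not interfaces:
        raise ValueError("at least one interface is required")
    lines = []
    for name in interfaces:
        lines.append(f"interface {name}")
        lines.append(f" switchport access vlan {access_vlan}")
        lines.append(" spanning-tree portfast")
        lines.append("!")  # conventional separator line in Cisco configs
    return "\n".join(lines)


if __name__ == "__main__":
    # FastEthernet0/16 comes from the example above; 0/17 is hypothetical.
    print(render_portfast_config(["FastEthernet0/16", "FastEthernet0/17"], 500))
```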


Resolution


The best practice for achieving the optimal failover time for a PAN L3 HA pair is as follows:

  • Set passive hold down timer to 0 ms on PA-4000, 2000 ms on PA-2000 (under HA election settings)
  • Set hello interval to 1000 ms on PA-4000, 8000 ms on PA-2000 (under HA election settings)
  • Configure link monitoring (under HA configuration for interfaces)
  • Configure “PortFast” or equivalent on adjacent L2 switch ports that the PAN firewall is connected to.
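
The checklist above can also be expressed as data and checked mechanically. The sketch below compares a hand-written description of a deployment against the recommended values in this article. The dictionary keys and the function name are hypothetical and do not correspond to any PAN-OS API or configuration schema.

```python
# Recommended values from this article; key names are illustrative only.
RECOMMENDED_TIMERS = {
    "PA-4000": {"hello_interval_ms": 1000, "passive_hold_ms": 0},
    "PA-2000": {"hello_interval_ms": 8000, "passive_hold_ms": 2000},
}
REQUIRED_FLAGS = ("link_monitoring", "switch_portfast")


def check_ha_settings(platform, settings):
    """Return a list of deviations from the article's recommendations."""
    if platform not in RECOMMENDED_TIMERS:
        raise ValueError(f"no recommendation recorded for {platform}")
    findings = []
    for key, want in RECOMMENDED_TIMERS[platform].items():
        got = settings.get(key)
        if got != want:
            findings.append(f"{key}: expected {want}, found {got}")
    for flag in REQUIRED_FLAGS:
        if not settings.get(flag, False):
            findings.append(f"{flag}: expected enabled")
    return findings


if __name__ == "__main__":
    # Hypothetical deployment description.
    deployed = {
        "hello_interval_ms": 8000,
        "passive_hold_ms": 0,
        "link_monitoring": True,
        "switch_portfast": False,
    }
    for finding in check_ha_settings("PA-2000", deployed):
        print(finding)
```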

 

owner: jnguyen


