Active/Passive HA with LAG to Virtual Chassis = Dropped Packets?

L1 Bithead

Good afternoon,

I tried to deploy an Active/Passive cluster yesterday with only partial success!

Things didn't work as expected. Sessions were forming, but servers worked only intermittently. At times things flipped: what had been working stopped, and what hadn't been working started. Some services worked fine for some people throughout, and for others nothing worked. After 45 minutes of trying various things we rolled back, and I got to wondering what I'd missed...

In the lab I'd modelled the setup fairly closely on the real-world scenario:


  • I have a total of 4 BGP peers - 2 for each device, sitting in the "Internet" zone on interfaces 1/1, 1/2, 1/3, 1/4 (1/1 & 1/3 are up on the Active and 1/2 & 1/4 are ready to come up on the Passive in a failover scenario)
  • We accept only a default route (0/0) and announce only 1 prefix. BGP/Routing worked in both the lab and the real world.
  • 3 ports per device form part of an aggregated Ethernet bundle, "AE1", making up the "Trust" zone. (1/5, 1/7, 1/9)
  • The AE1 bundle mounts a number of L3 subnets that act as default gateways for downstream servers.
  • The AE1 bundle connects from each PAN device to an EX4200 virtual switch stack running a single AE bundle, "AE11". (Not modelled in lab)
  • There is no routing occurring on the switch fabric.
  • There are no "Deny" rules - only a default Any/Any/Any "Allow" rule.
  • There are no failover rules enabled.

Things I tried

  1. Disabled Jumbo frames - didn't need them anyway, was a relic from an earlier Active/Active setup.
  2. Changed "Passive link state" to "Shutdown" from "Auto"

My current working theory is that having both PAN devices (even though one is shut down/passive) connected to the switch fabric over a single AE bundle caused traffic to get lost at L2. Is this possible? Perhaps I've missed something else. Either way, I'd love to know what I got wrong and how it can be fixed.
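To make that theory concrete, here is a minimal sketch of how a single bundle spanning both firewalls could lose traffic. It is an illustration under assumptions, not a confirmed diagnosis: it assumes the switch treats all six links as live members of one bundle and picks a member per flow by hashing. The `pick_member` and `delivered` helpers, the CRC32 hash, and the IP addresses are all hypothetical; a real EX4200 uses its own hash inputs.

```python
import zlib

# Hypothetical model: the switch sees all six links as one bundle (AE11).
# Ports 1/5, 1/7, 1/9 exist on each firewall, as described in the post.
BUNDLE = [
    ("active", "1/5"), ("active", "1/7"), ("active", "1/9"),
    ("passive", "1/5"), ("passive", "1/7"), ("passive", "1/9"),
]

def pick_member(src_ip, dst_ip, members):
    """Pick one member link from a hash of the flow (assumed behaviour)."""
    key = f"{src_ip}->{dst_ip}".encode()
    return members[zlib.crc32(key) % len(members)]

def delivered(src_ip, dst_ip, members):
    """A frame only gets through if it lands on the active firewall."""
    firewall, _port = pick_member(src_ip, dst_ip, members)
    return firewall == "active"

# Illustrative flows from 200 made-up hosts to one made-up destination.
flows = [(f"10.0.0.{i}", "192.0.2.1") for i in range(1, 201)]
ok = sum(delivered(s, d, BUNDLE) for s, d in flows)
print(f"{ok} of {len(flows)} flows reach the active firewall")
```

Because the hash is per flow, a given client/server pair would either always work or always fail until something changes the bundle membership, which would be consistent with "some services worked fine for some people throughout, and for others nothing worked".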

Thanks for your time,



Hello Supplier,

Global counters are the best way to find the root cause.

Apart from that, you can run a packet capture on the firewall to troubleshoot a particular data stream. That is effective too.

Further suggestions can be provided once we have the results of that output.


Hardik Shah

Hello Hardik,

My name is Simon!

Aside from that - I've been reading up and found that the scenario "Layer 3 Active/Passive with Link Aggregation" on page 80 of this document - Designing Networks with Palo Alto Networks Firewalls makes use of MC-LAG. I'm only using LAG. Could this be the problem?

Hello Simon,

The topologies are different, but both should be supported and neither should drop packets.

Global counter data would be really useful here.


Hardik Shah

L1 Bithead

Hello again,

To those of you interested - we successfully deployed these firewalls yesterday, after making a change to the topology.

We replaced the single LAG between the switch fabric and both firewalls with a separate LAG from the switch fabric to each device.

For whatever reason this has solved the issue and we're no longer seeing dropped packets.
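One possible reason, offered as a sketch rather than a confirmed explanation: with one LAG per firewall, the switch learns the gateway MAC on a single bundle and hashes only across that bundle's members. The `Switch` class, the bundle names, and the MAC address below are hypothetical and exist only for illustration.

```python
import zlib

# Hypothetical model of the revised topology: one LAG per firewall.
BUNDLES = {
    "ae-to-active": ["1/5", "1/7", "1/9"],
    "ae-to-passive": ["1/5", "1/7", "1/9"],
}

class Switch:
    def __init__(self):
        self.mac_table = {}  # MAC address -> bundle it was learned on

    def learn(self, mac, bundle):
        """Record the bundle on which a source MAC was seen."""
        self.mac_table[mac] = bundle

    def forward(self, dst_mac, flow_key):
        """Return (bundle, member port) for a frame, or None if unknown."""
        bundle = self.mac_table.get(dst_mac)
        if bundle is None:
            return None  # unknown unicast would be flooded; not modelled
        members = BUNDLES[bundle]
        return bundle, members[zlib.crc32(flow_key.encode()) % len(members)]

sw = Switch()
GATEWAY_MAC = "00:00:5e:00:53:01"  # made-up MAC for the default gateway
# Assumption: only the active firewall sends frames, so only its bundle
# ever teaches the switch where the gateway MAC lives.
sw.learn(GATEWAY_MAC, "ae-to-active")

results = [sw.forward(GATEWAY_MAC, f"10.0.0.{i}->192.0.2.1") for i in range(1, 201)]
print(all(bundle == "ae-to-active" for bundle, _port in results))
```

In this model the hash can still spread flows over ports 1/5, 1/7 and 1/9, but it can never select a link to the passive firewall. After a failover, the newly active device would need to re-teach the switch (for example via gratuitous ARP) so the MAC moves to the other bundle.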

Thank you hshah for your help.

Here is the revised, working topology. I hope it helps someone else.

