07-12-2014 09:45 AM
Good afternoon,
I tried to deploy an Active/Passive cluster yesterday with only partial success!
Things didn't work as expected. Sessions were forming but servers would work intermittently. At times it would change so that what was working, stopped, and what wasn't, started. Some services worked fine for some people throughout. And for others nothing worked. After 45 minutes of trying various things we rolled back and I got to wondering what I'd missed...
In the lab I'd modelled the setup fairly closely on the real-world scenario;
Things I tried
My current working theory is that having both PAN devices (even though one is shutdown/passive) connected to the switch fabric over a single AE bundle caused traffic to get lost at L2. Is this possible? Perhaps I've missed something else. Either way I'd love to know what I got wrong and how it can be fixed.
Thanks for your time,
Simon
07-12-2014 10:27 AM
Hi Simon,
A device in the passive state does not respond to any traffic; anything it receives, it simply drops. If you think the passive unit is receiving some of the traffic because of a switching issue, try the following:
1. Clear the MAC table on the switch. This removes any stale entry for the passive unit, if one exists.
2. If that doesn't fix the issue, take a packet capture on the passive unit. That will help you verify whether the firewall is receiving any traffic.
3. You may want to check hardware counters on switch connecting Passive unit, see if they are increasing.
If the issue is on the active unit and not on the passive one, then send me the output of the following command, run every 5 minutes. I need 6 samples. This will show the precise reason for the drops.
show counter global filter packet-filter yes delta yes severity drop
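The sampling routine described above (six samples, five minutes apart) could be scripted along these lines. This is only a sketch: `run_command` is a hypothetical callable that you would supply yourself (for example, a wrapper around an SSH session to the firewall), and it is not part of any Palo Alto tooling. Because `delta yes` reports the change since the previous run, keeping the interval steady makes the samples comparable.

```python
import time

def collect_samples(run_command, command, samples=6, interval_s=300, sleep=time.sleep):
    """Run `command` `samples` times, `interval_s` seconds apart.

    run_command: hypothetical callable taking the CLI string and returning its output.
    Returns a list of (timestamp, output) tuples, oldest first.
    """
    results = []
    for i in range(samples):
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        results.append((stamp, run_command(command)))
        if i < samples - 1:  # no need to wait after the last sample
            sleep(interval_s)
    return results

if __name__ == "__main__":
    # Dry run with a fake runner and a no-op sleep, just to show the shape of the result.
    fake_runner = lambda cmd: "(output of: %s)" % cmd
    for stamp, output in collect_samples(fake_runner, "<counter command from the post>",
                                         sleep=lambda s: None):
        print(stamp, output)
```

In real use you would pass the command from the post as `command` and leave `sleep` at its default, so the run takes about 25 minutes.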
Regards,
Hardik Shah
07-12-2014 10:50 AM
Thanks Hardik,
So this architecture is valid?
You say "3. You may want to check hardware counters on switch connecting Passive unit, see if they are increasing."
It's the same logical switch that's connected to the Active unit, under the same LAG. Just 3/6 of the LAG members are up and 3 down.
07-12-2014 10:55 AM
Hello Suppliers,
If the switch is a logical one, then I don't have much information on troubleshooting it.
The architecture is correct; first try to find out the reason for the drops.
It could be the firewall, the switch or the BGP routers.
Regards,
Hardik Shah
07-12-2014 02:31 PM
Thanks Hardik,
It won't be the BGP routers - they're 3rd party and are working currently.
It was the switch itself that seemed to be having trouble doing L2 when we ran tests, though I didn't know whether the LAG to the new firewalls had caused the issue...
I will try again in the next maintenance window and check the filters with the deltas.
If anyone else has any other ideas or suggestions, I'm all ears.
07-12-2014 02:51 PM
Hello Supplier,
The global counters are the best way to find the root cause.
Apart from that, you can take a packet capture on the firewall to troubleshoot a particular data stream. That is effective too.
I can make further suggestions once I have the results of this output.
Regards,
Hardik Shah
07-13-2014 03:15 AM
Hello Hardik,
My name is Simon!
Aside from that - I've been reading up and found that the scenario "Layer 3 Active/Passive with Link Aggregation" on page 80 of this document - Designing Networks with Palo Alto Networks Firewalls makes use of MC-LAG. I'm only using LAG. Could this be the problem?
07-13-2014 05:38 PM
Hello Simon,
The topologies are different, but both should be supported and neither should drop traffic.
Global counter data would be really useful here.
Regards,
Hardik Shah
07-17-2014 03:00 AM
Hello again,
To those of you interested - we successfully deployed these firewalls yesterday, after making a change to the topology.
We replaced the single LAG between the switch fabric and both firewalls with a separate LAG to each device.
For whatever reason this has solved the issue and we're no longer seeing dropped packets.
Thank you hshah for your help.
Here is the revised, working topology. I hope it helps someone else.
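The thread never pins down the root cause, so the following is only a toy model of Simon's working theory, not a confirmed diagnosis. It assumes the switch hashes each flow onto one member of a bundle, and that in the single-bundle design some members leading to the passive firewall are treated as usable (for example, on a static LAG whose passive-side links stay up). The firewall names, port names and hash function are all illustrative; a real switch uses its own hashing scheme.

```python
import zlib

ACTIVE = "fw-a"  # hypothetical name for the active firewall

def pick_member(flow, members):
    """Pick one bundle member for a flow using a deterministic hash (toy stand-in)."""
    key = "|".join(str(field) for field in flow).encode()
    return members[zlib.crc32(key) % len(members)]

def delivered_fraction(flows, bundle):
    """Fraction of flows whose chosen member leads to the active firewall."""
    hits = sum(1 for flow in flows if pick_member(flow, bundle)[1] == ACTIVE)
    return hits / len(flows)

# 500 made-up flows: (source IP, destination IP, source port, destination port)
flows = [("10.0.0.%d" % (i % 250 + 1), "192.0.2.10", 40000 + i, 443) for i in range(500)]

# One bundle spanning both firewalls: three members to each device.
single_lag = ([("port%d" % i, "fw-a") for i in range(1, 4)] +
              [("port%d" % i, "fw-b") for i in range(4, 7)])

# One bundle per device: traffic for the active firewall only uses its own bundle.
lag_to_active = [("port%d" % i, "fw-a") for i in range(1, 4)]

print("single LAG, fraction reaching active unit:", delivered_fraction(flows, single_lag))
print("per-device LAG, fraction reaching active unit:", delivered_fraction(flows, lag_to_active))
```

Under those assumptions, a per-flow hash would explain why some users worked throughout while others never did, and why the set of working flows could shift whenever bundle membership changed. With a LAG per device, the switch has no member towards the passive unit to choose from.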