PA-3050 stops processing traffic

mjcorriganaaa · 12-16-2013

Has anyone had a PA-3050 stop processing traffic? Our PA-3050 started dropping all traffic today (internet access, DMZ, etc.). We failed over to the standby unit and were able to restore service.

We currently have a support ticket open, but I wanted to know whether anyone here has had a similar experience. Thanks!

jdub01 · 01-06-2014

We are having a similar issue. Three times in two weeks, our primary 3050 (running 5.0.8) has stopped passing all traffic in v-wire mode and would not fail over automatically. We forced a failover to the passive box, rebooted fw1, and failed back to it, only to have it happen again a few weeks later. We are currently pushing all traffic through the backup 3050 until PAN comes up with a recommendation or a fix.

as-mg · 04-30-2014

Hi,

I've had this happen several times as well, most recently today. We also have an HA cluster, and the primary (active) firewall suddenly stops routing traffic, while the secondary (passive) does not try to take over and acts as if everything is OK.

Had to reboot the primary firewall for the secondary to become active, and I would like to avoid this happening again.

I've sent tech dumps to our support contact, so hopefully I'll get a better answer than "I'm sure this won't happen again" this time. Running 5.0.11, by the way.

HULK · 04-30-2014

Hello As-mg,

Could you please check the brdagent log from the CLI and verify whether it matches the symptoms below:

A. PAN> less dp-log brdagent.log

1. Error messages such as "need to reset ocelot link as XX error packets seen".

2. Link flap messages similar to the above.

3. XGE link errors.

4. Critical entries in the system log indicating “DP packet descriptor leak detected on slot 1 dp0” or similar.
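As a rough aid for the checks above, here is a minimal Python sketch that counts lines matching these symptom strings. It assumes you have copied the output of `less dp-log brdagent.log` (and, for the descriptor-leak check, the system log) into plain text. The pattern names and the `scan_log` / `reset_error_counts` helpers are hypothetical, and the exact message wording (especially for XGE link errors) is an assumption that may differ between PAN-OS releases.

```python
import re

# Symptom strings from the checklist above and the brdagent.log lines quoted
# later in this thread. Wording is assumed and may differ between PAN-OS
# releases, so matching is case-insensitive and slightly loose.
SYMPTOMS = {
    "ocelot_reset": re.compile(
        r"need to reset ocelot link as (\d+) error packets? seen", re.IGNORECASE
    ),
    "ocelot_flap": re.compile(r"flapping ocelot link", re.IGNORECASE),
    "xge_link_error": re.compile(r"xge link error", re.IGNORECASE),
    # This one normally appears in the system log rather than brdagent.log.
    "descriptor_leak": re.compile(r"dp packet descriptor leak detected", re.IGNORECASE),
}


def scan_log(text: str) -> dict[str, int]:
    """Count the lines that match each symptom pattern."""
    counts = {name: 0 for name in SYMPTOMS}
    for line in text.splitlines():
        for name, pattern in SYMPTOMS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


def reset_error_counts(text: str) -> list[int]:
    """Return the error-packet count reported by each ocelot reset message."""
    return [int(m.group(1)) for m in SYMPTOMS["ocelot_reset"].finditer(text)]


if __name__ == "__main__":
    # The two brdagent.log lines quoted later in this thread.
    sample = (
        "Flapping Ocelot link 1\n"
        "Error: poll_func(3000/osprey_oct.c:164): "
        "Need to reset ocelot link as 51 error packets seen!"
    )
    print(scan_log(sample))
    print(reset_error_counts(sample))
```

Run against those two lines, it reports one ocelot reset (51 error packets) and one link flap. A non-zero count is only a hint worth mentioning in a support case, not a diagnosis.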

Thanks

as-mg · 04-30-2014

Jumbo frames are not active for our HA links. I forgot to mention that this is a 3020 cluster.

After contacting support their best suggestion was to update to 5.0.12 or 6.0.2.

I have upgraded both to 5.0.12 tonight, but I have a feeling that the bug fixed in 6.0.2 better describes our situation, so I will upgrade the firewalls to 6.0.2 tomorrow.

Fixed in 5.0.12/6.0.1: When High Availability Active/Passive peers lost communication on HA1 and HA2 links, a race condition caused the dataplane to restart.

Fixed in 6.0.2: Fixed an issue with PA-3000 Series devices where traffic could stop passing through the firewall or the dataplane could restart due to an internal path monitoring failure.

HULK · 04-30-2014

Hello As-mg,

We recommend running signature pattern matching (AHO) in software:

> debug dataplane fpga set sw_aho yes

Thanks

networkadmin · 05-01-2014

We had this happen on a 500 running 5.0.10.

At the weekend the box spontaneously rebooted. According to support, "They've seen it happen with 5.0.x and have no idea why." There was nothing useful in the tech support bundle, so right now we're just waiting to see if it happens again.

marcin_koprowski · 05-05-2014

I had the same situation on a 500 running 5.0.11 one week ago. The device restarted suddenly with no cause recorded in the logs.

Marcin · 05-06-2014

Apparently we encountered the same behavior on our 3020 cluster running 6.0.0. The device stopped processing all traffic, and there was no evidence in the logs of what happened. Restarting the dataplane seems to work for us, but I'd like to know what the cause was.

I also checked dp-log brdagent.log as HULK suggested above, and we had these messages:

- Flapping Ocelot link 1

- Error: poll_func(3000/osprey_oct.c:164): Need to reset ocelot link as 51 error packets seen!

Does this mean I should run the debug dataplane fpga set sw_aho yes command on our boxes, or simply upgrade to 6.0.2?

HULK · 05-06-2014

Hello Marcin,

I would recommend setting pattern matching to software (> debug dataplane fpga set sw_aho yes) rather than upgrading PAN-OS.

Thanks

as-mg · 05-14-2014

This issue occurred again (for the third time) on our 3020 HA cluster today, running 6.0.2.

Awaiting response from support.

HULK · 05-14-2014

Hello as-mg,

I would recommend setting pattern matching to software (> debug dataplane fpga set sw_aho yes) to prevent it from happening again.

Thanks

as-mg · 05-14-2014

Support gave us the same suggestion.

However, they also said that this change is lost each time a firewall reboots, which I find inconvenient and which makes it less than a permanent fix.

From now on, we will also keep a computer connected to the failing device via the console, logging output in case the issue occurs again.

It has been less than a month since it last happened (on PAN-OS 5.0.11), and the issue persists on PAN-OS 6.0.2.

If you have any further suggestions, I would love to hear from you.

HULK · 05-14-2014

Hello as-mg,

There is a bug open with engineering to fix this issue. I agree this is not a permanent fix; the only catch is that you need to re-apply the command whenever you reboot the firewall.

Thanks

mizhou · 05-21-2014

The PA-500 reboot issue has been fixed in 5.0.12, so you can upgrade to that version now.
