PA-3050 stops processing traffic

Has anyone had a PA-3050 stop processing traffic? Our PA-3050 started dropping all traffic today (internet access, DMZ, etc.). We failed over to the standby unit and were able to restore service.

Currently we have a support ticket opened but wanted to know if anyone here has had a similar experience. Thanks!

We are having a similar issue. Three times in two weeks, our primary 3050, running 5.0.8 in virtual wire mode, stopped passing all traffic and did not fail over automatically. We forced a failover to the passive box, rebooted fw1, and failed back to it, only to have it happen again a few weeks later. We are currently pushing all traffic through the backup 3050 until PAN comes up with a recommendation or a fix.

Hi,

I've had this happen several times as well, most recently today. We also have an HA cluster, and the primary (active) firewall suddenly stops routing traffic, while the secondary (passive) does not try to take over and acts as if everything is OK.

Had to reboot the primary firewall for the secondary to become active, and I would like to avoid this happening again.

I've sent tech dumps to our support contact, so hopefully I'll get a better answer than "I'm sure this won't happen again" this time. Running 5.0.11, by the way.

Hello As-mg,

Could you please check the PAN brdagent log from the CLI and verify whether it matches the symptoms below?

A. PAN> less dp-log brdagent.log

1. Error messages like: "need to reset ocelot link as XX error packet seen"

2. Link flap messages similar to the above.

3. XGE link errors.

4. Critical entries in the system logs indicating "DP packet descriptor leak detected on slot 1 dp0" or similar.
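If the brdagent output (and the system log) is saved to a text file, the check above can be scripted. This is a minimal sketch, assuming the log lines are already available as plain strings; the regular expressions are only guesses built from the symptom wording in this thread, so the exact text on your PAN-OS release may differ and the patterns may need adjusting.

```python
import re
from collections import Counter

# Patterns based on the symptoms listed above. The exact log wording is an
# assumption; adjust these to match what your own firewall actually logs.
SYMPTOM_PATTERNS = {
    "ocelot_reset": re.compile(r"need to reset ocelot link", re.IGNORECASE),
    "link_flap": re.compile(r"flapping .*link", re.IGNORECASE),
    "xge_link_error": re.compile(r"xge.*error", re.IGNORECASE),
    "descriptor_leak": re.compile(r"packet descriptor leak detected", re.IGNORECASE),
}


def scan_log(lines):
    """Count how many log lines match each symptom pattern."""
    counts = Counter()
    for line in lines:
        for name, pattern in SYMPTOM_PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Flapping Ocelot link 1",
        "Error: poll_func(3000/osprey_oct.c:164): Need to reset ocelot link as 51 error packets seen!",
        "DP packet descriptor leak detected on slot 1 dp0",
        "routine status message",
    ]
    print(dict(scan_log(sample)))
```

A non-zero count only says that a line resembling one of the symptoms was logged; it does not confirm the root cause, which still has to come from support.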

Thanks

Jumbo frames are not active for our HA links. Forgot to mention that this is a 3020 cluster.

After contacting support their best suggestion was to update to 5.0.12 or 6.0.2.

I have upgraded both to 5.0.12 tonight, but I have a feeling that the bug fixed in 6.0.2 better describes our situation, so I will upgrade the firewalls to 6.0.2 tomorrow.

Fixed in 5.0.12/6.0.1: When High Availability Active/Passive peers lost communication on HA1 and HA2 links, a race condition caused the dataplane to restart.

Fixed in 6.0.2: Fixed an issue with PA-3000 Series devices where traffic could stop passing through the firewall or the dataplane could restart due to an internal path monitoring failure.

Hello As-mg,

It's recommended to run signature pattern matching (AHO) in software:

> debug dataplane fpga set sw_aho yes

Thanks

We had this happen on a 500 running 5.0.10.

At the weekend the box spontaneously rebooted - according to support "They've seen it happen with 5.0.x and have no idea why" - there was nothing useful in the tech support bundle so right now we're just waiting to see if it happens again.

I had the same situation on a 500 with 5.0.11 one week ago. The device restarted suddenly, with no reason given in the logs.

Apparently we encountered the same behavior on our 3020 cluster running 6.0.0. The device stopped processing all traffic, and there was no evidence in the logs of what happened. Restarting the dataplane seems to be a working solution for us, but I'd like to know what the cause was.

I also checked dp-log brdagent.log as HULK mentioned above, and we had these messages:

- Flapping Ocelot link 1

- Error: poll_func(3000/osprey_oct.c:164): Need to reset ocelot link as 51 error packets seen!
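To compare the error-packet counts across several occurrences (for example, to see whether they grow before an outage), the fields can be pulled out of the reset message. This is a minimal sketch that assumes every reset line follows the layout of the single message quoted above; the class and function names are hypothetical helpers, not part of PAN-OS.

```python
import re
from typing import NamedTuple, Optional

# Layout inferred from the one log line quoted above; other PAN-OS
# releases may format this message differently (assumption).
RESET_LINE = re.compile(
    r"poll_func\((?P<source>[^:]+):(?P<line>\d+)\):\s*"
    r"Need to reset ocelot link as (?P<errors>\d+) error packets seen"
)


class OcelotReset(NamedTuple):
    source: str  # source file reported in the message, e.g. 3000/osprey_oct.c
    line: int    # line number inside that source file
    errors: int  # error packets seen before the reset was requested


def parse_reset(text: str) -> Optional[OcelotReset]:
    """Return the fields of an ocelot link reset message, or None if absent."""
    match = RESET_LINE.search(text)
    if match is None:
        return None
    return OcelotReset(match["source"], int(match["line"]), int(match["errors"]))


if __name__ == "__main__":
    msg = ("Error: poll_func(3000/osprey_oct.c:164): "
           "Need to reset ocelot link as 51 error packets seen!")
    print(parse_reset(msg))
```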

Does this mean I should run the debug dataplane fpga set sw_aho yes command on our boxes, or simply upgrade to 6.0.2?

Hello Marcin,

I would recommend running pattern matching in software (> debug dataplane fpga set sw_aho yes) instead of upgrading PAN-OS.

Thanks

This issue occurred again (for the third time) on our 3020 HA cluster today, running 6.0.2.

Awaiting response from support.

Hello as-mg,

I would recommend running pattern matching in software (> debug dataplane fpga set sw_aho yes) to avoid any future occurrence.

Thanks

This was the suggestion given us by support as well.

However, they also said that this change is lost each time the firewall reboots, which I would consider less handy, and which makes it less of a permanent fix.

What we're going to do from now on is also have a computer connected to the failing device via the console, and keep it logging in case the issue occurs again.

It's been under a month since it last happened (PANOS 5.0.11) and the issue still persists on PANOS 6.0.2.

If you have any further suggestions, I would love to hear from you.

Hello as-mg,

There is a bug open with engineering to fix this issue. I do agree that this is not a permanent fix; you will need to re-apply the command whenever you reboot the firewall.

Thanks

The reboot issue on the PA-500 has been fixed in 5.0.12; you can upgrade to that version now.
