12-16-2013 01:55 PM
Has anyone had a PA-3050 stop processing traffic? Our PA-3050 started dropping all traffic today (internet access, DMZ, etc.). We failed over to the standby unit and were able to restore service.
We currently have a support ticket open, but I wanted to know if anyone here has had a similar experience. Thanks!
01-06-2014 01:54 PM
We are having a similar issue. Three times in two weeks our primary 3050, running 5.0.8, has stopped passing all traffic in v-wire mode and would not fail over automatically. We forced a failover to the passive box, rebooted fw1, and failed back to it, only to have it happen again a few weeks later. We are currently pushing all traffic through the backup 3050 until PAN comes up with a recommendation or a fix.
04-30-2014 03:40 AM
I've had this happen several times as well, the last one today. We also have an HA cluster, and the primary (active) firewall suddenly stops routing traffic while the secondary (passive) does not try to take over and acts like everything is OK.
Had to reboot the primary firewall for the secondary to become active, and I would like to avoid this happening again.
I've sent tech dumps to our support contact, so hopefully I'll get a better answer than "I'm sure this won't happen again" this time. Running 5.0.11, by the way.
04-30-2014 07:40 AM
Could you please check the PAN brdagent log with the CLI command below and see whether it matches the following symptoms?
A. PAN> less dp-log brdagent.log
1. An error message like "need to reset ocelot link as XX error packet seen".
2. Link flap messages similar to the above.
3. XGE link errors.
4. Also check the system logs for critical entries such as “DP packet descriptor leak detected on slot 1 dp0” or similar.
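For anyone who would rather check a saved copy of the log offline than page through it with `less`, here is a minimal sketch in Python. The patterns are taken only from the messages quoted in this thread; the exact wording on other PAN-OS releases is an assumption, and `find_symptoms` is a hypothetical helper name, not a PAN tool. No pattern is included for the XGE link error because its exact log text is not given here.

```python
import re

# Symptom patterns based on messages quoted in this thread.
# ASSUMPTION: the wording may differ between PAN-OS releases.
SYMPTOMS = {
    "ocelot_reset": re.compile(r"need to reset ocelot link", re.IGNORECASE),
    "ocelot_flap": re.compile(r"flapping ocelot link", re.IGNORECASE),
    # This one is reported in the system log, not in brdagent.log.
    "descriptor_leak": re.compile(r"packet descriptor leak detected", re.IGNORECASE),
}


def find_symptoms(lines):
    """Return {symptom_name: [matching lines]} for an iterable of log lines."""
    hits = {name: [] for name in SYMPTOMS}
    for line in lines:
        for name, pattern in SYMPTOMS.items():
            if pattern.search(line):
                hits[name].append(line.rstrip("\n"))
    return hits
```

To use it, open the exported log file and pass the file object to `find_symptoms`, then look at which lists are non-empty.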
04-30-2014 01:30 PM
Jumbo frames are not active for our HA links. Forgot to mention that this is a 3020 cluster.
After I contacted support, their best suggestion was to update to 5.0.12 or 6.0.2.
I have upgraded both to 5.0.12 tonight, but I have a feeling that the bug fixed in 6.0.2 better describes our situation, so I will upgrade the firewalls to 6.0.2 tomorrow.
Fixed in 5.0.12/6.0.1: When High Availability Active/Passive peers lost communication on HA1 and HA2 links, a race condition caused the dataplane to restart.
Fixed in 6.0.2: Fixed an issue with PA-3000 Series devices where traffic could stop passing through the firewall or the dataplane could restart due to an internal path monitoring failure.
04-30-2014 01:44 PM
It's recommended to run the signature pattern matching (AHO) in software:
> debug dataplane fpga set sw_aho yes
05-01-2014 01:19 AM
We had this happen on a 500 running 5.0.10.
At the weekend the box spontaneously rebooted - according to support "They've seen it happen with 5.0.x and have no idea why" - there was nothing useful in the tech support bundle so right now we're just waiting to see if it happens again.
05-05-2014 12:22 AM
I had the same situation on a 500 with 5.0.11 one week ago. The device restarted suddenly, with no reason given in the logs.
05-06-2014 08:01 AM
Apparently we have encountered the same behavior on our 3020 cluster running 6.0.0. The device stopped processing any traffic and there was no evidence in the logs of what happened. Restarting the dataplane seems to be a working solution for us, yet I'd like to know what the cause was.
I also checked dp-log brdagent.log as HULK mentioned above, and we had these messages:
- Flapping Ocelot link 1
- Error: poll_func(3000/osprey_oct.c:164): Need to reset ocelot link as 51 error packets seen!
Does this mean I should run the debug dataplane fpga set sw_aho yes command on our boxes, or simply go for 6.0.2?
05-06-2014 08:37 AM
I would recommend setting pattern matching to run in software (> debug dataplane fpga set sw_aho yes) instead of upgrading PAN-OS.
05-14-2014 04:14 PM
I would recommend setting pattern matching to run in software (> debug dataplane fpga set sw_aho yes) to avoid any future occurrence.
05-14-2014 09:47 PM
This was the suggestion support gave us as well.
However, they also said that this change is lost each time we reboot a firewall, which I would consider less handy and which makes this not much of a permanent fix.
What we're going to do from now on is also have a computer connected to the failing device via the console, and keep it logging in case the issue occurs again.
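As a rough idea of what the logging side could look like, here is a minimal Python sketch that prefixes every console line with a timestamp before writing it to a file, so the last messages before a hang can be lined up with the outage time. It assumes the console output is already reaching the PC as a text stream (for example piped from whatever terminal program is attached to the serial port); `timestamp_lines` is a hypothetical helper, and nothing here is PAN-specific.

```python
from datetime import datetime


def timestamp_lines(source, sink, now=datetime.now):
    """Copy each line from source to sink, prefixed with a timestamp.

    source: any iterable of text lines (e.g. sys.stdin).
    sink:   any writable text file object.
    now:    callable returning a datetime (injectable for testing).
    Returns the number of lines written.
    """
    count = 0
    for line in source:
        stamp = now().strftime("%Y-%m-%d %H:%M:%S")
        # rstrip() also removes the trailing "\r" that serial consoles often send.
        sink.write(f"{stamp} {line.rstrip()}\n")
        sink.flush()  # keep the file current in case the logging PC is interrupted
        count += 1
    return count
```

In practice this would be called with `sys.stdin` as the source and a file opened in append mode as the sink.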
It's been under a month since it last happened (PANOS 5.0.11) and the issue still persists on PANOS 6.0.2.
If you have any further suggestions, I would love to hear from you.
05-14-2014 10:01 PM
There is a bug open with engineering to fix this issue. I do agree that this is not a permanent fix; the only catch is that you need to re-apply the command if you reboot the FW.
05-21-2014 12:17 AM
The reboot issue on the PA-500 has been fixed in 5.0.12; you may upgrade to the new version now.