VM100 keeps rebooting


L4 Transporter

Hi all,

We have a Palo Alto VM-100 running under ESXi 5.0 which up until this week has been rock solid. 

On Monday it rebooted itself.  No config changes had been made for almost a month prior to this.  It also rebooted itself twice yesterday and once so far today.

The log messages are below. They are listed newest first, so read from bottom to top for the chronological order.

Autocommit job failed
Dataplane is now up
The system is starting up.
The system is shutting down.
data_plane: restarts exhausted, rebooting system
The dataplane is restarting.
supervisor: Exited 1 times, must be manually recovered.
tasks: Exited 1 times, must be manually recovered.
all_task_2: Exited 4 times, must be manually recovered.
PAN-DB cloud list loading failed (ERROR:Couldn't resolve host name).
all_task_3: exiting because missed too many heartbeats
all_task_2: exiting because missed too many heartbeats
all_task_3: exiting because missed too many heartbeats
all_task_2: exiting because missed too many heartbeats
all_task_3: exiting because missed too many heartbeats
all_task_2: exiting because missed too many heartbeats
all_task_3: exiting because missed too many heartbeats
all_task_2: exiting because missed too many heartbeats
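Because the GUI lists the newest entry first, the sequence is easy to misread. The minimal Python sketch below re-orders the excerpt oldest-first and counts heartbeat-timeout exits per task. It assumes the messages have been copied out as plain text, one per line; `RAW_LOG`, `chronological` and `heartbeat_exits` are hypothetical helpers for illustration, not part of PAN-OS or its CLI.

```python
from collections import Counter
from typing import List

# Log excerpt from the post, in the order the GUI listed it (newest first).
RAW_LOG = """\
Autocommit job failed
Dataplane is now up
The system is starting up.
The system is shutting down.
data_plane: restarts exhausted, rebooting system
The dataplane is restarting.
supervisor: Exited 1 times, must be manually recovered.
tasks: Exited 1 times, must be manually recovered.
all_task_2: Exited 4 times, must be manually recovered.
PAN-DB cloud list loading failed (ERROR:Couldn't resolve host name).
all_task_3: exiting because missed too many heartbeats
all_task_2: exiting because missed too many heartbeats
all_task_3: exiting because missed too many heartbeats
all_task_2: exiting because missed too many heartbeats
all_task_3: exiting because missed too many heartbeats
all_task_2: exiting because missed too many heartbeats
all_task_3: exiting because missed too many heartbeats
all_task_2: exiting because missed too many heartbeats
"""

HEARTBEAT_EXIT = "exiting because missed too many heartbeats"


def chronological(raw: str) -> List[str]:
    """Return the non-empty lines oldest-first (the excerpt is newest-first)."""
    lines = [line.strip() for line in raw.splitlines() if line.strip()]
    return lines[::-1]


def heartbeat_exits(events: List[str]) -> Counter:
    """Count heartbeat-timeout exits per task name (the text before the colon)."""
    counts: Counter = Counter()
    for event in events:
        if event.endswith(HEARTBEAT_EXIT):
            counts[event.split(":", 1)[0]] += 1
    return counts


events = chronological(RAW_LOG)
for number, event in enumerate(events, start=1):
    print(f"{number:2d}. {event}")
print(dict(heartbeat_exits(events)))
```

Read oldest-first, the excerpt shows an escalation: all_task_2 and all_task_3 alternately miss heartbeats (four exits each, matching the later "all_task_2: Exited 4 times" line), the affected tasks must be manually recovered, the dataplane restarts, and once its restarts are exhausted the whole system reboots.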

I logged a call over 24 hours ago with our support company but so far nobody has been able to offer any assistance at all.

I'm hoping somebody here has seen this before. Any help would be very much appreciated!

Many thanks,

Dave

Palo Alto VM-100

Software version 5.0.11

Application version 450-2330

Antivirus version 1346-1817

URL Filtering version 2014.08.13.411


L6 Presenter

You could try resetting from maintenance mode, if you have a config backup and don't mind losing the logs.

I saw this problem on my VM too; I fixed it by reverting.

L6 Presenter

Hi Dyoung,

Please provide the output of "show system files". That will confirm whether the reboot generated any crash/core files. If it did, finding the root cause should be much easier.

Regards,

Hardik Shah

L3 Networker

Hi Dave,

I'm facing the same problem on a VM-100 after upgrading to 5.0.14. Have you had any response from support?

Regards,

Predrag

L5 Sessionator

Hi DYoung,

Can you confirm whether you are using an AMD processor on your VM host? If so, this is a known issue that we have identified, and the fix is scheduled for an upcoming release. If you are not using AMD and are still seeing multiple crashes, I would suggest opening a case with PA support for further analysis. Hope this helps. Thank you.

Hi ssharma,

No, my VM runs on Intel server infrastructure, so I don't think that's the issue. But I have definitely found the cause of this crash and the behavior leading up to it. After one of my GlobalProtect (GP) clients successfully connects to the gateway, his GP client and computer (and only his, I don't know how) trigger "exiting because missed too many heartbeats", then "Exited 4 times, must be manually recovered", then "The dataplane is restarting" and finally "data_plane: restarts exhausted, rebooting system".

We tested this to be sure, and only this client causes it. We first saw it on version 5.0.14, but the same thing happens on 6.0.5. I have opened a support case with our local certified distributor and supplier.



[Screenshots of three occurrences: case one (1.jpg), case two (2.jpg), case three (3.jpg)]

It always happens after he connects to the GP gateway. Very, very strange.

Regards,

Predrag

Hi Predrag,

That's a nice observation, but it's still weird that one particular user would cause it to crash. Since you have already opened a case, the engineer should be able to find the root cause and a possible solution or workaround. Please update this thread once you have an answer from the engineer so that other users can benefit too. Thank you.

L4 Transporter

Hi all,

Thank you for your replies. I'm sorry I haven't responded sooner; I've not logged in for a while.

Our VM-100 stopped rebooting after we upgraded to 5.0.14. Back in August the case was escalated all the way to Palo Alto development, but still no cause for the reboots could be found.

The case was closed because the problem seemed to have gone away, not because the root cause had been found.

Unfortunately, on Tuesday this week our VM-100 started rebooting again. I have upgraded to 5.0.15 in the faint hope that this will help and contacted our support partner, but have not heard anything back yet.

Did any of you discover the cause of this issue? We are not using GP, so I don't think it can be related to that.

Also, we are running on an Intel Xeon E5-2650 v2, and "show system files" shows no files, just an empty crashinfo directory.

Many thanks,

Dave

[Attachment: PaloCrash.png]

Hi dyoung,

Yes, the problem has been identified and isolated. My support case took too long, but in the end support said they had reproduced the crash in a test environment while tracing another case.

Here is their answer:

A time-of-check-to-time-of-use race condition causes a buffer overflow that trashes a mutex. The mutex then never gets unlocked, causing a crash.

A fix has been coded for this race condition and buffer overflow.

Our QA team is currently testing this code, and once it is approved it will be introduced in new PAN-OS versions. I will update you as soon as I get confirmation of when we can expect this fix to be released.


The fix is scheduled for release in PAN-OS 6.1.3, which should be out around mid-March.

Backport to 6.0.x software is still pending.
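Support's explanation names a time-of-check-to-time-of-use (TOCTOU) race. The generic Python sketch below illustrates that class of bug and the usual fix; it is not PAN-OS code, and nothing here says how Palo Alto actually fixed it. `BoundedBuffer`, `append_racy`, `append_safe` and the `between` test hook are hypothetical names, and the list's `capacity` stands in for the bounds of a C buffer. In C, the extra write would land past the end of the array and could corrupt neighbouring memory, which is the kind of damage (a trashed mutex) support described.

```python
import threading
from typing import Callable, Optional


class BoundedBuffer:
    """Toy fixed-capacity buffer; `capacity` plays the role of a C array's size."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.items = []
        self._lock = threading.Lock()

    def append_racy(self, item, between: Optional[Callable[[], object]] = None) -> bool:
        if len(self.items) < self.capacity:  # time of check
            if between is not None:          # test hook that widens the race window
                between()
            self.items.append(item)          # time of use: the check may now be stale
            return True
        return False

    def append_safe(self, item) -> bool:
        with self._lock:                     # check and use form one atomic step
            if len(self.items) < self.capacity:
                self.items.append(item)
                return True
            return False


def run_threads(target: Callable[[int], object], count: int) -> None:
    """Start `count` threads running target(i) and wait for all of them."""
    threads = [threading.Thread(target=target, args=(i,)) for i in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


# Racy version: the barrier forces both threads past the check before either writes.
racy = BoundedBuffer(capacity=1)
barrier = threading.Barrier(2)
run_threads(lambda i: racy.append_racy(i, between=barrier.wait), 2)
print("racy:", len(racy.items), "items, capacity", racy.capacity)

# Locked version: many threads compete, but the capacity is never exceeded.
safe = BoundedBuffer(capacity=1)
run_threads(lambda i: safe.append_safe(i), 8)
print("safe:", len(safe.items), "items, capacity", safe.capacity)
```

The barrier makes the race deterministic for demonstration: each thread passes the capacity check while the buffer is still empty, so both write and the buffer ends up holding two items in a slot sized for one. Holding a single lock across the check and the write removes the window entirely.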

Regards,

Predrag

Many thanks, Predrag,

It is good to hear that support have now reproduced this!

Did they give any indication of what might be triggering the issue?

I'm chasing up my support people now, as I'm still waiting to hear back from them.

Regards,

Dave

Hi Dave,

No, they didn't. From the case history and progress pane, I can see that the issue was registered under bug numbers 69130 and 61575. I expect these bugs will be covered in the release notes for the next PAN-OS releases.


Regards,


Predrag

Hi Predrag,

Thanks for this.  Hopefully I can use it to steer our support people in the right direction to try and get a bit more info.

Best regards,

Dave
