Out of memory: Kill process xxxx (mgmtsrvr) score xx or sacrifice child


L1 Bithead

This is a recurring issue; a reboot helps only for the time being.

 

When we attempt to update to the latest antivirus version, the commit fails.

System resources look normal.

Looking at /var/log/messages in the tech support file, we see that across several attempts the mgmtsrvr, devsrvr, and logrcvr processes were killed due to out-of-memory conditions, and each out-of-memory condition is followed by a stack of call traces (crash stack).

 

 

Attaching the output of messages:

Jan  9 09:42:03 3000 klogd: Out of memory: Kill process 2714 (devsrvr) score 120 or sacrifice child
Jan  9 09:42:03 3000 klogd: Out of memory: Kill process 16347 (mgmtsrvr) score 75 or sacrifice child
Jan  9 09:44:47 3000 klogd: Out of memory: Kill process 32707 (logrcvr) score 103 or sacrifice child
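
For anyone wanting to tally which processes the OOM killer hit across a large messages file, here is a minimal Python sketch. It assumes the lines follow the same "Out of memory: Kill process PID (name) score N" pattern as the three lines quoted above. The function name and the SAMPLE_LINES variable are made up for illustration.

```python
import re
from collections import Counter

# Pattern for the kernel OOM-killer lines quoted in this thread, e.g.
# "Out of memory: Kill process 2714 (devsrvr) score 120 or sacrifice child"
OOM_RE = re.compile(r"Out of memory: Kill process (\d+) \(([^)]+)\) score (\d+)")

# The three lines from the original post, used here as sample input.
SAMPLE_LINES = [
    "Jan  9 09:42:03 3000 klogd: Out of memory: Kill process 2714 (devsrvr) score 120 or sacrifice child",
    "Jan  9 09:42:03 3000 klogd: Out of memory: Kill process 16347 (mgmtsrvr) score 75 or sacrifice child",
    "Jan  9 09:44:47 3000 klogd: Out of memory: Kill process 32707 (logrcvr) score 103 or sacrifice child",
]


def summarize_oom_kills(lines):
    """Return a Counter mapping process name -> number of OOM kills seen."""
    kills = Counter()
    for line in lines:
        match = OOM_RE.search(line)
        if match:
            kills[match.group(2)] += 1
    return kills


if __name__ == "__main__":
    for name, count in summarize_oom_kills(SAMPLE_LINES).most_common():
        print(name, count)
```

Pointing the same function at an open file handle for the extracted messages file should work too, since it only iterates over lines.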

 

 

 

The auto assistant tool points to a memory leak issue.

 

We restarted all three processes and tried to install the AV update again, but it failed.

We monitored resource utilization again and saw that the httpd process was consuming around 130% CPU, fluctuating up and down on a regular basis. After we restarted the web-server process, httpd CPU consumption went down, we were able to commit the changes, and the AV install was successful.

We did the same on the secondary device, and httpd CPU consumption went down there as well.
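
To catch a busy process like httpd without eyeballing the output each time, something like the following Python sketch could filter saved "show system resources" output. It assumes the output contains top-style rows with the columns PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND. The SAMPLE text is invented for illustration and is not real output from this firewall, and the function name and threshold are hypothetical.

```python
# Invented sample in the assumed top-style layout; NOT real device output.
SAMPLE = """\
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 4321 nobody    20   0  1.2g 310m  12m R  130  3.9  12:34.56 httpd
 2714 root      20   0  2.0g 600m  20m S    2  7.5  99:01.02 devsrvr
"""


def busy_processes(top_output, threshold=100.0):
    """Return (command, cpu_percent) pairs at or above the CPU threshold.

    Assumes each process row has at least 12 whitespace-separated columns,
    with the PID in column 1, %CPU in column 9, and the command in column 12.
    """
    hits = []
    for line in top_output.splitlines():
        fields = line.split()
        # Skip headers, summary lines, and anything not starting with a PID.
        if len(fields) < 12 or not fields[0].isdigit():
            continue
        try:
            cpu = float(fields[8])
        except ValueError:
            continue
        if cpu >= threshold:
            hits.append((fields[11], cpu))
    return hits


if __name__ == "__main__":
    for command, cpu in busy_processes(SAMPLE):
        print(f"{command} is at {cpu}% CPU")
```

The column positions are the main assumption here; if the layout differs on a given PAN-OS release, the indexes would need adjusting.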

 

Customer running on 9.1.15 (Preferred Release)

Running on PA-3250

(For the PA-3000 Series, PAN-OS 9.1.x is the latest version, so in our scenario we are on the latest stable build, 9.1.15, for this device.)

 

We see the following memory leak issues listed as already resolved, but our customer is already on 9.1.15 (Preferred Release):

PAN-175211: Fixed a memory leak issue in the mgmtsrvr process. Fixed in 9.0.16, 9.1.13, 10.0.9, 10.1.4.
PAN-93839: Linux kernels on PAN-OS 8.x/9.x have a memory leak that is fixed in mainstream Linux. Fixed in 8.0.10 and 8.1.1.
PAN-143485: Fixed a memory leak issue related to a process (devsrvr). Fixed in 9.0.13, 9.1.8, 10.0.0.

 

Is this bug behavior, or is there anything else that can be done? Please advise.

Is it recommended to downgrade to a lower version? If yes, which version?

Attachment: show system resources

 

@UtkarshKumar @Didar_Bajwa 


Community Team Member

Hi @Param_Upadhyay ,

 

A memory leak sounds like a definite bug.

For a memory leak issue, I'd recommend grabbing the TSF and submitting it to support for analysis. TAC can confirm whether you're hitting a known bug and guide you to the version with the fix (if available).

I can't confirm whether you're hitting any of the bugs listed.

 

Kind regards,

-Kiwi.

 

Cyber Elite

It would be better to upgrade to the latest 10.2.x version in case this bug is fixed there. Beyond that, you could try to find which process is causing the issue and, if it is not critical, simply restart it. You could also add automation with XSOAR or Ansible to trigger the restart each night until TAC finds the root cause.

 

Memory :

 

https://knowledgebase.paloaltonetworks.com/KCSArticleDetail?id=kA10g000000ClUb

 

https://knowledgebase.paloaltonetworks.com/KCSArticleDetail?id=kA14u000000oNDmCAM

 

 

Restart process:

 

https://knowledgebase.paloaltonetworks.com/KCSArticleDetail?id=kA10g000000PLUeCAO

 

 

https://knowledgebase.paloaltonetworks.com/KCSArticleDetail?id=kA10g000000ClaGCAS

 

https://knowledgebase.paloaltonetworks.com/KCSArticleDetail?id=kA10g000000POIHCA4

 

 

 

 

 

 

Ansible or XSOAR to periodically restart the process or the management plane:

 

https://paloaltonetworks.github.io/pan-os-ansible/modules/panos_op_module.html

 

https://xsoar.pan.dev/docs/reference/integrations/panorama
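
If neither Ansible nor XSOAR is at hand, the same nightly restart could in principle be driven through the PAN-OS XML API from a small script. The Python sketch below only builds the request and does not send it. It assumes the CLI command "debug software restart process management-server" maps to the nested XML shown (worth confirming in the firewall's API browser), that the API key is accepted in the X-PAN-KEY header on this release, and it uses a placeholder hostname and key. The function and constant names are made up for illustration.

```python
import urllib.parse
import urllib.request

# ASSUMPTION: XML form of "debug software restart process management-server".
# Verify the exact element names in the firewall's API browser before use.
RESTART_MGMTSRVR_XML = (
    "<debug><software><restart><process>"
    "<management-server/>"
    "</process></restart></software></debug>"
)


def build_restart_request(host, api_key, cmd_xml=RESTART_MGMTSRVR_XML):
    """Build (but do not send) an operational-command request for the XML API."""
    query = urllib.parse.urlencode({"type": "op", "cmd": cmd_xml})
    url = f"https://{host}/api/?{query}"
    # ASSUMPTION: the key is passed as a header so it stays out of the URL.
    return urllib.request.Request(url, headers={"X-PAN-KEY": api_key})


if __name__ == "__main__":
    # Placeholder host and key; replace with real values in a test environment.
    request = build_restart_request("fw.example.com", "REDACTED")
    print(request.full_url)
    # Sending is left out on purpose; it would be something like:
    # urllib.request.urlopen(request, timeout=30)
```

A cron entry could then call the script each night. Restarting the management server is disruptive to anyone logged in to the firewall, so this is only a stopgap until TAC identifies the root cause.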

 

Thanks, but we are on a PA-3250, and for the PA-3000 Series PAN-OS 9.1.x is the latest version, so in our scenario we are already on the latest stable build, 9.1.15, for this device. As advised, we will try to get TAC involved.

Still, if support takes too long, as a workaround you can install the free version of Ansible on Linux and trigger this task each night with a cron job, or just test the free version of Cortex XSOAR, as automation is the way to go nowadays.

 

https://start.paloaltonetworks.com/sign-up-for-community-edition.html

 

There are also free trainings:

 

https://www.redhat.com/en/services/training/do007-ansible-essentials-simplicity-automation-technical...

 

https://www.youtube.com/watch?v=BhpkZA9t1HA&list=PLD6FJ8WNiIqUVEA2e5LZhmqNnwFcFhDTZ

 
