Out of memory: Kill process xxxx (mgmtsrvr) score xx or sacrifice child


L2 Linker

This is a recurring issue; a reboot helps for the time being.

 

When attempting to update to the latest antivirus version, we see that the commit fails.

System resources look normal.

Looking at /var/log/messages in the tech support file, we see that across various attempts:

mgmtsrvr, devsrvr, and logrcvr were killed due to out-of-memory conditions, and each out-of-memory condition is followed by a stack of call traces (crash stack).

 

 

Attaching the output of messages:

messages (2023-01-09):

Jan  9 09:42:03 3000 klogd: Out of memory: Kill process 2714 (devsrvr) score 120 or sacrifice child
Jan  9 09:42:03 3000 klogd: Out of memory: Kill process 16347 (mgmtsrvr) score 75 or sacrifice child
Jan  9 09:44:47 3000 klogd: Out of memory: Kill process 32707 (logrcvr) score 103 or sacrifice child
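
For anyone who wants to tally which processes the OOM killer is hitting across a longer messages file, here is a minimal Python sketch. It assumes the lines follow the same "Out of memory: Kill process PID (name) score N" shape as the three quoted above; the function name `parse_oom_kills` is just illustrative.

```python
import re
from collections import Counter

# Pattern for the kernel OOM-killer lines quoted in the post.
OOM_RE = re.compile(
    r"Out of memory: Kill process (?P<pid>\d+) \((?P<name>[^)]+)\) score (?P<score>\d+)"
)


def parse_oom_kills(lines):
    """Return a list of (process_name, pid, score) for each OOM-killer line."""
    kills = []
    for line in lines:
        match = OOM_RE.search(line)
        if match:
            kills.append((match["name"], int(match["pid"]), int(match["score"])))
    return kills


# The three lines from the post, used here as sample input.
sample = [
    "Jan  9 09:42:03 3000 klogd: Out of memory: Kill process 2714 (devsrvr) score 120 or sacrifice child",
    "Jan  9 09:42:03 3000 klogd: Out of memory: Kill process 16347 (mgmtsrvr) score 75 or sacrifice child",
    "Jan  9 09:44:47 3000 klogd: Out of memory: Kill process 32707 (logrcvr) score 103 or sacrifice child",
]

kills = parse_oom_kills(sample)
counts = Counter(name for name, _pid, _score in kills)
for name, count in counts.most_common():
    print(f"{name}: {count} kill(s)")
```

Feeding it the full messages file instead of `sample` should show whether one process is killed far more often than the others, which may be useful detail to hand to TAC.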

 

 

 

The auto assistant tool points to a memory leak issue.

 

We restarted all three processes and tried to install the AV update again, but it still failed.

We monitored resource utilization again and saw that the httpd process was consuming around 130% CPU, fluctuating up and down regularly. After we restarted the web-server process, httpd consumption went down, we were able to commit the changes, and the AV install was successful.

 

We did the same on the secondary device, and httpd CPU consumption went down there as well.

 

The customer is running 9.1.15 (Preferred Release) on a PA-3250.

(For the PA-3000 Series, PAN-OS 9.1.x is the latest version, so in our scenario we are on the latest stable build, 9.1.15, for this device.)

 

We see the following already-resolved memory leak issues, but our customer is on 9.1.15 (Preferred Release):

PAN-175211: Fixed a memory leak issue in the mgmtsrvr process (mgmtsrvr process memory leak) - 9.0.16, 9.1.13, 10.0.9, 10.1.4
PAN-93839: Linux kernels on PAN-OS 8.x/9.x have a memory leak that is being fixed in mainstream Linux - 8.0.10 and 8.1.1
PAN-143485: Fixed a memory leak issue related to a process (*devsrvr*) (device server memory leak) - 9.0.13, 9.1.8, 10.0.0
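
To make the comparison explicit, here is a small Python sketch that checks a running version against the versions listed for each bug ID. It assumes the versions after each bug ID are the releases that carry the fix, and it only compares releases on the same major.minor train; hotfix suffixes such as "-h1" are not handled. The helper names are hypothetical.

```python
def parse_version(text):
    """Turn '9.1.15' into (9, 1, 15). Hotfix suffixes such as '-h1' are not handled."""
    return tuple(int(part) for part in text.strip().split("."))


def fix_included(running, fixed_in):
    """Return True/False if a fix release exists on the running major.minor train,
    or None if the list has no release on that train."""
    run = parse_version(running)
    for candidate in fixed_in:
        fix = parse_version(candidate)
        if fix[:2] == run[:2]:
            return run >= fix
    return None


# Fix releases as quoted in the post (assumed to be "fixed in" versions).
FIXES = {
    "PAN-175211": ["9.0.16", "9.1.13", "10.0.9", "10.1.4"],
    "PAN-93839": ["8.0.10", "8.1.1"],
    "PAN-143485": ["9.0.13", "9.1.8", "10.0.0"],
}

results = {bug: fix_included("9.1.15", versions) for bug, versions in FIXES.items()}
for bug, included in results.items():
    print(bug, included)
```

On 9.1.15 the 9.1-train fixes for PAN-175211 and PAN-143485 are already at or below the running release, while PAN-93839 has no 9.1-train entry in the list, so the sketch reports it as undetermined rather than guessing.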

 

Is this a bug, or is there something else we can do? Please advise.

Is it recommended to downgrade to a lower version? If yes, which version?

show system resources

 

@UtkarshKumar @Didar_Bajwa 

9 REPLIES

Community Team Member

Hi @Param_Upadhyay ,

 

A memory leak sounds like a definite bug.

For a memory leak issue, I'd recommend grabbing the TSF and submitting it to support for analysis. TAC can confirm whether you're hitting a known bug and guide you to the version with the fix (if available).

 

I can't confirm if you're hitting any of the bugs listed.

 

Kind regards,

-Kiwi.

 

L6 Presenter

It would be better to upgrade to the latest 10.2.x version in case this bug has been fixed there. Otherwise, you can try to find which process is causing the issue and, if it is not critical, just restart it. You could also add automation with XSOAR or Ansible to trigger the restart each night until TAC finds the root cause.

 

Memory:

https://knowledgebase.paloaltonetworks.com/KCSArticleDetail?id=kA10g000000ClUb

https://knowledgebase.paloaltonetworks.com/KCSArticleDetail?id=kA14u000000oNDmCAM

Restart process:

https://knowledgebase.paloaltonetworks.com/KCSArticleDetail?id=kA10g000000PLUeCAO

https://knowledgebase.paloaltonetworks.com/KCSArticleDetail?id=kA10g000000ClaGCAS

https://knowledgebase.paloaltonetworks.com/KCSArticleDetail?id=kA10g000000POIHCA4

Ansible or XSOAR to periodically restart the process or the management plane:

https://paloaltonetworks.github.io/pan-os-ansible/modules/panos_op_module.html

https://xsoar.pan.dev/docs/reference/integrations/panorama

 

Thanks, but we are on a PA-3250, and for the PA-3000 Series, PAN-OS 9.1.x is the latest version, so in our scenario we are on the latest stable build, 9.1.15, for this device. As advised, we will try to get TAC involved.

Still, if support takes too long, as a workaround you can install the free version of Ansible on Linux and trigger this task each night with a cron job, or just test the free version of Cortex XSOAR, as automation is the way to go nowadays.

 

https://start.paloaltonetworks.com/sign-up-for-community-edition.html

 

There are also free trainings:

 

https://www.redhat.com/en/services/training/do007-ansible-essentials-simplicity-automation-technical...

 

https://www.youtube.com/watch?v=BhpkZA9t1HA&list=PLD6FJ8WNiIqUVEA2e5LZhmqNnwFcFhDTZ

 

Hi @Param_Upadhyay ,

Just to clarify: the PA-3250 is from the PA-3200 series, which is the next generation after the PA-3000.

You are probably confused by the "Hardware End-of-Life-Dates - Palo Alto Networks" page, which lists only PA-3000 and not PA-3200. That is because end-of-life/end-of-sale has not yet been announced for PA-3200.

 

To summarize, your PA-3250 can support 10.1+, and you shouldn't have any issues upgrading to 10.1 or higher.

Thank you. That really helps.

It turns out that "debug..." can't be accessed by the API 🙂. So the workaround is good old expect over SSH:

 

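#!/usr/bin/expect -f
# Saved as expect_palo_alto.
# Logs in to the firewall over SSH and restarts the web-server process.

set timeout 60

# Placeholders: fill in the firewall credentials and management IP.
set USER xxxx
set PASS xxxx
set IP_AD xxxx

# Open the SSH session (the extra shell spawn and unused DEBUG flag were dropped).
spawn ssh $USER@$IP_AD
match_max 100000
expect "*?assword:*"
send -- "$PASS\r"
sleep 2

# Prepare the CLI for scripted use, waiting for the prompt before each command.
expect "*>*"
send -- "set cli terminal width 200\r"
sleep 2
expect "*>*"
send -- "set cli scripting-mode on\r"
sleep 2
expect "*>*"
send -- "set cli terminal type xterm\r"
sleep 2

# Restart the web server and wait for the confirmation line.
expect "*>*"
send -- "debug software restart process web-server\r"
expect "Process*"
sleep 1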
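
If the script is going to be run nightly from cron, a thin wrapper makes it easier to notice when a run did not actually work. The Python sketch below is only an illustration: it assumes `expect` is on the PATH, the function name `run_restart` and any script path are hypothetical, and treating the word "Process" in the output as success is an assumption borrowed from the script's own final expect pattern.

```python
import subprocess


def run_restart(script_path, timeout_s=120, runner=subprocess.run):
    """Run the expect script; return (succeeded, captured_output)."""
    try:
        result = runner(
            ["expect", "-f", script_path],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError) as exc:
        # Covers a hung SSH session and a missing expect binary.
        return False, f"run did not complete: {exc}"
    # Assumption: the firewall's confirmation line starts with "Process",
    # as the script's last expect pattern suggests.
    succeeded = result.returncode == 0 and "Process" in result.stdout
    return succeeded, result.stdout
```

A nightly cron entry could call `run_restart` with the path to expect_palo_alto and log or alert on a False result. The `runner` parameter exists only so the function can be exercised without touching a real firewall.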

 

L0 Member

I am also facing the same problem.

L0 Member

We have a PA-3420 cluster running software 11.0.3. We seem to have this happening at least once a week or so. The memory leak causes the Active node to reboot in most cases.

We have been told a fix is coming soon. We are just waiting for the new software release, which should have been out yesterday.

 
