Out of memory: Kill process xxxx (mgmtsrvr) score xx or sacrifice child

Param_Upadhyay · ‎01-10-2023

This is a recurring issue; a reboot helps for the time being.

When attempting to update to the latest antivirus version, we see that the commit fails.

System resources look normal.

Looking at /var/log/messages in the tech support file, we see that across the various attempts mgmtsrvr, devsrvr, and logrcvr were the processes killed due to out-of-memory conditions, and each out-of-memory event is followed by a stack of call traces (crash stack).

Attaching the output of messages:

Jan  9 09:42:03 3000 klogd: Out of memory: Kill process 2714 (devsrvr) score 120 or sacrifice child

Jan  9 09:42:03 3000 klogd: Out of memory: Kill process 16347 (mgmtsrvr) score 75 or sacrifice child

Jan  9 09:44:47 3000 klogd: Out of memory: Kill process 32707 (logrcvr) score 103 or sacrifice child
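
For anyone who wants to tally these kills across a longer messages file, here is a minimal Python sketch. It assumes the file has been exported from the tech support bundle to a workstation and that every OOM line follows the same klogd format as the three lines above. The function names (parse_oom_kills, kills_per_process) are made up for illustration and are not part of any Palo Alto tooling.

```python
import re
from collections import Counter

# Matches the klogd OOM-killer lines quoted in this post, e.g.
# "Out of memory: Kill process 2714 (devsrvr) score 120 or sacrifice child"
OOM_RE = re.compile(
    r"Out of memory: Kill process (?P<pid>\d+) \((?P<name>[^)]+)\) score (?P<score>\d+)"
)


def parse_oom_kills(lines):
    """Return one (pid, process_name, score) tuple per OOM-kill line."""
    kills = []
    for line in lines:
        match = OOM_RE.search(line)
        if match:
            kills.append(
                (int(match["pid"]), match["name"], int(match["score"]))
            )
    return kills


def kills_per_process(lines):
    """Count how many times each process name was chosen by the OOM killer."""
    return Counter(name for _pid, name, _score in parse_oom_kills(lines))


# The three lines from the messages output in this post.
SAMPLE = [
    "Jan  9 09:42:03 3000 klogd: Out of memory: Kill process 2714 (devsrvr) score 120 or sacrifice child",
    "Jan  9 09:42:03 3000 klogd: Out of memory: Kill process 16347 (mgmtsrvr) score 75 or sacrifice child",
    "Jan  9 09:44:47 3000 klogd: Out of memory: Kill process 32707 (logrcvr) score 103 or sacrifice child",
]

if __name__ == "__main__":
    for pid, name, score in parse_oom_kills(SAMPLE):
        print(f"{name} (pid {pid}) killed with score {score}")
```

To run it against a real file, pass an open file handle instead of SAMPLE, for example kills_per_process(open("messages")). A process that keeps showing up in the count is a reasonable candidate to mention when opening the TAC case.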

The auto assistant tool points to a memory leak issue.

We tried to restart all three processes and tried to install the AV update but it failed again.

We monitored the resource utilization again and saw that the httpd process was consuming around 130% CPU, fluctuating up and down on a regular basis. After restarting the web-server process, httpd consumption went down, we were able to commit the changes, and the AV install was successful.

We did the same on the secondary device, and httpd CPU consumption went down there as well.

Customer running on 9.1.15 (Preferred Release)

Running on PA-3250

(PA-3000 Series PAN-OS 9.1.x is the latest version, and so in our scenario, we are on the latest stable build 9.1.15 for this device.)

We see the following memory leak issues listed as already resolved, but our customer is on 9.1.15 (Preferred Release):

PAN-175211 Fixed a memory leak issue in the mgmtsrvr process. mgmtsrvr process memory leak - 9.0.16, 9.1.13, 10.0.9, 10.1.4
PAN-93839 Linux kernels on PAN-OS 8.x/9.x have a memory leak that is being fixed in mainline Linux - 8.0.10 and 8.1.1
PAN-143485 Fixed a memory leak issue related to a process (*devsrvr*). Device server memory leak - 9.0.13, 9.1.8, 10.0.0

Is this bug behavior, or is there something else that can be done? Please advise.

Is it recommended to downgrade to a lower version? If yes, which version?

show system resources

@UtkarshKumar @Didar_Bajwa

kiwi · ‎01-11-2023

Hi @Param_Upadhyay ,

Memory leak sounds like a definite bug.

For a memory leak issue, I'd recommend grabbing the TSF and submitting it to support for analysis. TAC can confirm whether you're hitting a known bug and guide you to the version with the fix (if available).

I can't confirm if you're hitting any of the bugs listed.

Kind regards,

-Kiwi.

LIVEcommunity team member, CISSP

nikoolayy1 · ‎01-11-2023

Better to upgrade to the latest 10.2.x version in case this bug is fixed there. Beyond that, you could try to find which process is causing the issue and, if it is not critical, simply restart it, and maybe add automation with XSOAR or Ansible to trigger the restart each night until TAC finds the root cause.

Memory :

https://knowledgebase.paloaltonetworks.com/KCSArticleDetail?id=kA10g000000ClUb

https://knowledgebase.paloaltonetworks.com/KCSArticleDetail?id=kA14u000000oNDmCAM

Restart process:

https://knowledgebase.paloaltonetworks.com/KCSArticleDetail?id=kA10g000000PLUeCAO

https://knowledgebase.paloaltonetworks.com/KCSArticleDetail?id=kA10g000000ClaGCAS

https://knowledgebase.paloaltonetworks.com/KCSArticleDetail?id=kA10g000000POIHCA4

Ansible or XSOAR to periodically restart the process or the management plane:

https://paloaltonetworks.github.io/pan-os-ansible/modules/panos_op_module.html

https://xsoar.pan.dev/docs/reference/integrations/panorama

Param_Upadhyay · ‎01-11-2023

Thanks, but we are on a PA-3250, and for the PA-3000 Series PAN-OS 9.1.x is the latest version, so in our scenario we are on the latest stable build, 9.1.15, for this device. As advised, we will try to get TAC involved.

nikoolayy1 · ‎01-12-2023

Still, if support takes too long, as a workaround you can install the free version of Ansible on Linux and trigger this task each night with a cron job, or test the free Cortex XSOAR edition, as automation is the way to go nowadays.

https://start.paloaltonetworks.com/sign-up-for-community-edition.html

There are also free trainings:

https://www.redhat.com/en/services/training/do007-ansible-essentials-simplicity-automation-technical...

https://www.youtube.com/watch?v=BhpkZA9t1HA&list=PLD6FJ8WNiIqUVEA2e5LZhmqNnwFcFhDTZ

aleksandar.astardzhiev · ‎01-24-2023

Hi @Param_Upadhyay ,

Just to clarify: the PA-3250 is from the PA-3200 series, which is the next generation after the PA-3000.

You are probably confused by "Hardware End-of-Life Dates - Palo Alto Networks", which only lists the PA-3000 and not the PA-3200. That is because end-of-life/sale has not yet been announced for the PA-3200.

To summarize, your PA-3250 can support 10.1+ and you shouldn't have any issues upgrading to 10.1 or higher.

Param_Upadhyay · ‎01-24-2023

Thank you. That really helps.

nikoolayy1 · ‎01-31-2023

It turns out that "debug..." commands can't be accessed through the API 🙂 . So the workaround is the good old expect and ssh:

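#!/usr/bin/expect -f

# Log in to the firewall over SSH and restart the web-server process.

# Give up on any expect that sees no match within 60 seconds
set timeout 60

# Placeholders: fill in the login credentials and the management IP
set USER xxxx
set PASS xxxx
set IP_AD xxxx

spawn ssh $USER@$IP_AD
match_max 100000

# Answer the password prompt
expect "*?assword:*"
send -- "$PASS\r"
sleep 2

# Wait for the operational prompt, then prepare the CLI for scripted use
expect "*>*"
send -- "set cli terminal width 200\r"
sleep 2
expect "*>*"
send -- "set cli scripting-mode on\r"
sleep 2
expect "*>*"
send -- "set cli terminal type xterm\r"
sleep 2

# Restart the web server and wait for output starting with "Process"
expect "*>*"
send -- "debug software restart process web-server\r"
expect "Process*"
sleep 1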

AliciaFoster · ‎02-21-2023

I am also facing the same problem.

Tawqeer-Hussain · ‎01-09-2024

We have a PA-3420 cluster running software 11.0.3. We seem to have this happening at least once a week or so. The memory leak causes the Active node to reboot in most cases.

We have been told that a fix is coming soon. We are just waiting for the new software release, which should have been out yesterday.

Jorge-Sanchez · ‎06-07-2024

Hi Hussain,

Hope you are well. We are seeing the same behavior with a PA-3410 firewall. Which PAN-OS version was recommended to you? Has it already been released? Did it solve the problem?

I hope you can help me with these questions, thank you very much.
