This is a recurring issue; a reboot helps for the time being.
When attempting to update to the latest antivirus version, we see that the commit fails.
System resources look normal.
Looking at /var/log/messages in the tech support file, we see that across the various attempts mgmtsrvr, devsrvr, and logrcvr were killed due to out-of-memory conditions, and each out-of-memory event is followed by a stack of call traces (crash stack).
Relevant output from messages:
Jan 9 09:42:03 3000 klogd: Out of memory: Kill process 2714 (devsrvr) score 120 or sacrifice child
Jan 9 09:42:03 3000 klogd: Out of memory: Kill process 16347 (mgmtsrvr) score 75 or sacrifice child
Jan 9 09:44:47 3000 klogd: Out of memory: Kill process 32707 (logrcvr) score 103 or sacrifice child
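For anyone going through a longer messages file, here is a minimal Python sketch that tallies OOM-killer events per process. It assumes the lines follow the exact format of the excerpt above; the function name `summarize_oom_kills` is only illustrative and not part of any Palo Alto tooling.

```python
import re
from collections import Counter

# Matches kernel OOM-killer lines in the format shown in the excerpt above.
OOM_RE = re.compile(r"Out of memory: Kill process (\d+) \(([^)]+)\) score (\d+)")

def summarize_oom_kills(lines):
    """Return (kill count per process name, highest OOM score per process name)."""
    counts = Counter()
    max_score = {}
    for line in lines:
        m = OOM_RE.search(line)
        if not m:
            continue  # not an OOM-killer line
        name, score = m.group(2), int(m.group(3))
        counts[name] += 1
        max_score[name] = max(score, max_score.get(name, 0))
    return counts, max_score

sample = [
    "Jan 9 09:42:03 3000 klogd: Out of memory: Kill process 2714 (devsrvr) score 120 or sacrifice child",
    "Jan 9 09:42:03 3000 klogd: Out of memory: Kill process 16347 (mgmtsrvr) score 75 or sacrifice child",
    "Jan 9 09:44:47 3000 klogd: Out of memory: Kill process 32707 (logrcvr) score 103 or sacrifice child",
]

counts, max_score = summarize_oom_kills(sample)
for name, n in counts.most_common():
    print(f"{name}: killed {n} time(s), highest score {max_score[name]}")
```

A process that shows up repeatedly with a rising score over several days would be the first one to mention to TAC.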
The auto assistant tool points to a memory leak issue.
We restarted all three processes and tried to install the AV update again, but it still failed.
We monitored resource utilization again and saw that the httpd process was consuming around 130% CPU, with regular fluctuations. After we restarted the web-server process, httpd consumption went down, the commit went through, and the AV install was successful.
We did the same on the secondary device, and httpd CPU consumption went down there as well.
The customer is running 9.1.15 (Preferred Release)
on a PA-3250.
(For the PA-3000 Series, PAN-OS 9.1.x is the latest supported release, so in our scenario we are already on the latest stable build, 9.1.15, for this device.)
We see the following memory leak issues listed as already resolved, but our customer is on 9.1.15 (Preferred Release), which is later than the 9.1.x fix versions listed:
PAN-175211: Fixed a memory leak issue in the mgmtsrvr process. Fixed in 9.0.16, 9.1.13, 10.0.9, 10.1.4
PAN-93839: Linux kernels on PAN-OS 8.x/9.x have a memory leak that has been fixed in mainline Linux. Fixed in 8.0.10 and 8.1.1
PAN-143485: Fixed a memory leak issue related to a process (devsrvr, the device server). Fixed in 9.0.13, 9.1.8, 10.0.0
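To make the version reasoning explicit, here is a small Python sketch that checks whether a running release is at or past the fix version on the same major.minor train. It only compares the version numbers quoted in this post; the helper names are hypothetical, and it returns `None` when no fix version is listed for the running train, since the list alone cannot answer that case.

```python
def parse_version(v):
    """Split 'major.minor.maintenance' into ints; a hotfix suffix like '-h1' is ignored."""
    base = v.split("-")[0]
    return tuple(int(p) for p in base.split("."))

def fix_included(running, fixed_in):
    """True/False if a fix is listed for the running major.minor train, else None."""
    run = parse_version(running)
    same_train = [parse_version(f) for f in fixed_in if parse_version(f)[:2] == run[:2]]
    if not same_train:
        return None  # no fix version listed for this train
    return any(run >= f for f in same_train)

# Fix versions as quoted in the post above.
fixes = {
    "PAN-175211": ["9.0.16", "9.1.13", "10.0.9", "10.1.4"],
    "PAN-93839": ["8.0.10", "8.1.1"],
    "PAN-143485": ["9.0.13", "9.1.8", "10.0.0"],
}

for bug, versions in fixes.items():
    print(bug, fix_included("9.1.15", versions))
```

For 9.1.15 this reports PAN-175211 and PAN-143485 as already included, which suggests the leak seen here is something other than those two issues.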
Is this a bug, or is there something else we can do? Please advise.
Is it recommended to downgrade to a lower version? If yes, which version?
Hi @Param_Upadhyay ,
Memory leak sounds like a definite bug.
For a memory leak issue, I'd recommend grabbing the TSF and submitting it to support for analysis. TAC can confirm whether you're hitting a known bug and guide you to the version with the fix (if available).
I can't confirm if you're hitting any of the bugs listed.
It may be better to upgrade to the latest 10.2.x version in case this bug has been fixed there. Otherwise, try to identify which process is causing the issue and, if it is not critical, simply restart it. You could also add automation with XSOAR or Ansible to trigger the restart each night until TAC finds the root cause.
Ansible or XSOAR to periodically restart the process or the management plane:
If support takes too long, as a workaround you can install the free version of Ansible on Linux and trigger this task each night with a cron job, or test the free version of Cortex XSOAR, as automation is the way to go nowadays.
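If you don't want to set up Ansible or XSOAR just for this, the same nightly restart could be sketched in plain Python against the firewall's XML API (an operational command sent with `type=op`). Treat this as a rough sketch under assumptions: the XML form of `debug software restart process web-server` shown below is my assumption and should be confirmed on your PAN-OS version, and `FW_HOST` / `FW_API_KEY` are hypothetical environment variable names.

```python
import os
import urllib.parse
import urllib.request

# ASSUMPTION: XML form of the CLI command "debug software restart process <name>".
# Confirm it on your PAN-OS version before relying on it.
RESTART_CMD = "<debug><software><restart><process><{name}/></process></restart></software></debug>"

def build_restart_request(host, api_key, process_name):
    """Build (but do not send) an XML API op request that restarts one process."""
    body = urllib.parse.urlencode({
        "type": "op",
        "cmd": RESTART_CMD.format(name=process_name),
        "key": api_key,
    }).encode()
    # POST keeps the API key out of the URL.
    return urllib.request.Request(f"https://{host}/api/", data=body, method="POST")

if __name__ == "__main__":
    # FW_HOST and FW_API_KEY are hypothetical names; nothing is sent unless both are set.
    host = os.environ.get("FW_HOST")
    key = os.environ.get("FW_API_KEY")
    if host and key:
        req = build_restart_request(host, key, "web-server")
        with urllib.request.urlopen(req, timeout=30) as resp:
            print(resp.read().decode())
```

A cron entry on the Linux host would then run the script once a night. Note that `urlopen` verifies the certificate by default, so a firewall with a self-signed management certificate needs its CA trusted on that host first.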
There are also free trainings: