I received a critical event in the system log with the error message "Abnormal system memory usage detected, restarting ha_agent with virtual memory 3607332 KB."
After that, our HA active-active pair ended up in a non-functional state:
- The active-primary machine thinks that the active-secondary is down and continues working as the only remaining cluster machine.
- The active-secondary "forgets" all information about HA and continues working as a single machine with no HA status.
As a result, we had inconsistent routing.
Fortunately, manually restarting the active-secondary machine from the CLI restored the normal HA state: the two machines "see" and "know" each other again, and since the restart they have been working normally as an HA active-active pair.
What happened here? Any ideas?
This is certainly a bug. There are a number of reported issues like this, so the next step is to find out which bug it is.
If the firewall is on a version below 5.0.14, the device could be affected by bug/65146. Upgrading to 5.0.14 is the solution.
There is another known issue, bug/62323, for which the fix is in 6.0.4.
Let me know if you have any additional questions.
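For anyone who wants to script the version check described above, here is a minimal sketch in Python. It only encodes the two bug IDs and fix releases mentioned in this thread; the function names and the `FIXED_IN` table are made up for illustration, and it assumes plain dotted version strings such as 6.0.3 (no hotfix suffix). It is a first filter, not a diagnosis: a bug only matters if it actually exists in your release train, which the release notes or support can confirm.

```python
def parse_version(text):
    """Turn '6.0.3' into (6, 0, 3) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in text.split("."))


# Fix releases named in this thread, keyed by bug ID (illustrative table, not an official list).
FIXED_IN = {
    "65146": "5.0.14",
    "62323": "6.0.4",
}


def candidate_bugs(running):
    """Return the bug IDs whose fix release is newer than the running version."""
    current = parse_version(running)
    return sorted(
        bug for bug, fixed in FIXED_IN.items()
        if current < parse_version(fixed)
    )


print(candidate_bugs("6.0.3"))
```

Comparing tuples of integers avoids the classic mistake of comparing version strings, where "5.0.9" would sort after "5.0.14".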
Could you please let us know whether the firewall generated any core files during this incident? You can check with the CLI command > show system files.
If a core file exists on this firewall, then instead of blindly assuming a software bug, you can open a ticket with Palo Alto support to get the root cause of the issue, a workaround, or a recommendation.
NOTE: Even if there are no core files, support can generate a tech-support file from the firewall for a deeper analysis.
Hope this helps.
Thank you for your reply. We are currently running 6.0.3, so the issue addressed in 6.0.4 could actually be the cause. (62323: Made fixes to improve the issue where the firewall went into a non-functional state due to an out of memory condition caused by an internal process. Updates have been made to resolve some of the memory utilization issues.) I will try to find a maintenance window for the update very soon.
Thank you so far...
Thank you for your reply. Unfortunately, there are no recent core files on the system; the latest files are from August 1st. So the update to 6.0.4 is the only option for now.
Thank you so far...