I received a critical event in the system log with the error message "Abnormal system memory usage detected, restarting ha_agent with virtual memory 3607332 KB."
After that, our HA active-active pair ended up in a non-functional state:
- The active-primary machine thinks that the active-secondary is down. It continues to work as the only remaining cluster member.
- The active-secondary "forgets" all information about HA and continues to work as a standalone machine with no HA status.
As a result, we had inconsistent routing.
Fortunately, manually restarting the active-secondary machine from the CLI restored the normal HA state: the two machines "see" and "know" each other again. Since the restart they have been working normally as an HA active-active pair.
What happened here? Any ideas?