08-27-2012 10:28 AM
So, I recently ran into an issue and wanted to see whether anyone else has had something similar happen to them.
We recently ran into an issue where our active firewall tanked and transferred responsibility to its peer. Everything was working as it should, so I contacted support to find out what the issue could have been. After looking at the tech support files, they discovered that it's a memory leak issue in the 4.1.5 release and that we should upgrade to 4.1.7 because apparently it fixes "hundreds of memory leak issues". So, we upgraded and everything was working fine... for about 2 hours. I tried accessing the CLI and GUI of the active firewall but was unable to. However, the passive was working fine AND the data plane on the active was still working as well. After I did a tac-login with a challenge/response to give the tech root access to my box, he was able to restart the authd service. It turns out there's yet another issue in 4.1.7, a race condition: lots of log queries happening at the same time cause the authd service to fail. This is where h2, or hotfix 2, comes in and fixes the issue.
Is it me, or does every new code version Palo Alto releases break something that was working in the previous release? I've been dealing with this exact scenario since the 4.0.x days, and frankly, it's getting annoying having to upgrade our firewalls every 6 weeks when they release new code.
09-27-2012 03:29 AM
I somewhat agree with you. I forgot to add that there is a possible workaround, but it is NOT guaranteed.
In some instances, we have had success getting the prompt by pressing Ctrl + C when we get errors similar to 'Cannot connect to management server'. Once we get the prompt, we can log in as root.
However, as I said, it is not guaranteed that we will get to enter the shell.
09-27-2012 06:19 AM
Same issue on 4.1.7. One of our 2050s became completely unresponsive on the management side. The data plane side worked great and continued to pass traffic, although I think I was having some issues with the MP dynamic URL cache because the management side was completely eaten up. I wasn't able to log in with the serial cable either. I eventually had to force a failover by disconnecting the HA1, HA2, and management interfaces and restarting the locked-up 2050.
Frustrating for the administrators, but the users never knew we were having a problem.
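Since the data plane keeps forwarding while the management side is hung, this kind of failure is easy to miss until someone needs the CLI or GUI. Below is a minimal sketch of an external check that simply tests whether the management interface still accepts TCP connections. The ports (22 for SSH, 443 for the web GUI) and the function names are my own assumptions for illustration, not anything from Palo Alto. It would also not catch a case where the login prompt still appears but the management server behind it is unresponsive.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def management_status(host, ports=(22, 443), timeout=3.0, probe=port_is_open):
    """Map each assumed management port to True (reachable) or False.

    The default ports are assumptions: 22 for SSH and 443 for the web GUI.
    `probe` is injectable so the check can be tested without a real firewall.
    """
    return {port: probe(host, port, timeout) for port in ports}
```

Calling `management_status("fw-mgmt.example.com")` (a placeholder hostname) from a monitoring host returns a dictionary of port to reachability, which could feed whatever alerting you already use.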
10-10-2012 02:44 AM
Just had the issue again on a 4050. We also had it some weeks ago on a 2050.
After pressing CTRL-C several times at the login prompt, I got the message 'Cannot connect to management server'.
I then tried 'debug software restart management-server' -> no help.
Then I issued the command 'request restart system' -> OK (but that means a reboot, of course, while the HA peer took over).
Opened a TAC case to obtain the 4.1.7 h2 hotfix.
Wondering why they don't post this one on the website....
10-10-2012 02:53 AM
The hotfix is listed in the Software Updates section, or at least in my account it is...
10-10-2012 03:01 AM
hmm, weird, not under my account...