My PA-220s needed some SSH changes. After these were committed locally, I ran the
"set ssh service-restart mgmt" command, as the manual says, in run mode.
The firewall pings for about 30 seconds, then reboots. I'm using 8.1.13.
Why does it reboot? How can I get around it?
I have had a ticket in for a week, and the lady working the ticket doesn't respond.
Can you get that to actually reproduce at all? It's possible that you may have found a bug in 8.1.13, but I can't get it to reproduce on a VM-50. If you can actually get it to reproduce, the only fix is getting support to validate the issue and raising it with the engineering team to get fixed in a future release.
Thanks for the reply!
I wish I COULDN'T get it to happen!
I've changed about 50 of our 60 firewalls over the last few days. They all do it, except in two cases where the restart seemed to happen but left the SSH process inaccessible, so I had to reboot those units anyway!
In one case, it seemed the process didn't need to be restarted. After a day or so, it had no issue and the changes were in place. But a 2% success rate isn't a good thing, especially since neither I nor Palo can explain why one worked and 2 killed the SSH process.
I also just don't get why there's no way to restart the process or push the SSH changes via Panorama. It just doesn't make sense. How is that "next generation"? It sounds more like "last century"!
And there's no version of 8.x.x that will allow me to change the Key Exchange protocol? I need 9.0 or 9.1? That's just idiotic. I just upgraded all of these to 8.1.13 like 2 months ago, so we're not going to 9.x!
I have an HA pair of 5220s and a single 5220 which I'll try this on. It may work, or totally crash. It's anyone's guess!
I needed to do the CLI changes to SSH to address audit vulnerabilities. I was able to change settings, but not the KEX protocol. This key exchange protocol can't be changed until 9.0.
Palo says they have a fix for the rebooting issue. Unfortunately, their fix was to use a new OS, which wasn't in the cards for us.
Then, it occurred to me how we might fix this.
We normally disable telnet and use SSH, as I expect most people do. The problem, according to Palo, was that the PA220 (and perhaps the 200) only had one key for SSH because that made logins faster (see below). I believe the issue was that I was connected over SSH when I restarted the service. A reboot was needed to recreate the SSH key. The reboot was forced because the existing SSH session couldn't find a good SSH key, so the service tried to restart 6 times or so, then forced the reboot when the service failed to start.
So, what I decided to do was enable telnet, then telnet into our 220s, exit any SSH sessions, apply the SSH CLI changes, commit them, then restart the SSH service. I could then SSH into the units with the new SSH settings. I then disabled telnet again.
This seems to restart the service OK and not reboot. This issue isn't a problem with the PA500 and above (like our 5220s), because they all have multiple SSH keys.
So try this on one of your firewalls, and see how it goes!
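To check a unit after restarting the service, one option is to poll its management address for the SSH banner from another machine. The following is a minimal Python sketch of that idea, not anything supplied by Palo Alto: the function names, retry counts, and the hostname in the usage comment are assumptions for illustration, and it uses only the standard library. A banner that returns within a few seconds suggests the service restarted cleanly, while a gap of several minutes suggests the unit rebooted.

```python
import socket
import time


def ssh_banner(host, port=22, timeout=3.0):
    """Return the SSH identification line the server sends, or None if unreachable."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            data = sock.recv(255)
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return None
    line = data.split(b"\n", 1)[0].strip().decode("ascii", errors="replace")
    return line if line.startswith("SSH-") else None


def wait_for_ssh(host, port=22, attempts=20, delay=3.0):
    """Poll until an SSH banner appears; return it, or None after all attempts."""
    for _ in range(attempts):
        banner = ssh_banner(host, port)
        if banner:
            return banner
        time.sleep(delay)
    return None


# Usage (hypothetical hostname):
#   banner = wait_for_ssh("fw-branch-01.example.com")
#   print(banner or "SSH did not come back")
```

This only confirms that sshd is answering on the port. It does not verify that the new cipher or MAC settings are in effect.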
See the info from my ticket:
"Our Engineering team has an update on the Root cause of the issue.
The sshd_config file has a list of sshd keys that the daemon needs to start with. The list of keys is read from the config file and those files are then read from the filesystem.
In 8.0 and earlier 8.1 releases, slower platforms like the 200, 500 and 800 series used to start the sshd service with 6 keys. But we later realized that this was slowing the login time. So in some maintenance release of 8.1 we reduced the number of keys from 6 to 1. This noticeably improved login times on all the slower platforms.
When we upgraded from an earlier version having 6 keys to a newer version having just 1 key, what happened was that the remaining 5 keys were never deleted from the cryptod keystore. Having 5 keys in cryptod and never using them is not an issue in itself. But the problem happened when we changed the default host key.
Now, the code that does all these changes checked if the new default host key (ecdsa 256 in this case) was present in cryptod, and since it was present, it did not generate that key, and more importantly it did not create the key file which the sshd daemon reads during startup. So what happened was that when sshd tried to start it could not find any keys to start with and did not start. That's why it kept on failing, and after trying a few times, the device rebooted.
When the device rebooted, a script that checks ssh keys in cryptod and adds the correct key files to filesystem ran and it added all that was necessary for sshd daemon to start. That's why sshd started properly after reboot.
We have a fix for this and it should prevent this issue from happening in future."
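The root cause in the ticket can be easier to follow as a toy model. The Python sketch below is only an illustration of the logic as described, not PAN-OS code: the class, method names, key labels other than ecdsa 256, and the retry count of 6 (taken from the "6 times or so" above) are all assumptions.

```python
MAX_START_ATTEMPTS = 6  # assumed from "6 times or so"; the real count is unknown


class Firewall:
    """Toy model: keystore stands in for cryptod, key_files for files sshd reads."""

    def __init__(self, keystore, key_files, default_key):
        self.keystore = set(keystore)
        self.key_files = set(key_files)
        self.default_key = default_key
        self.reboots = 0

    def change_default_key(self, new_key):
        self.default_key = new_key
        if new_key not in self.keystore:
            # Key is generated and its file is written.
            self.keystore.add(new_key)
            self.key_files.add(new_key)
        # Bug as described: if the key is already in the keystore,
        # generation is skipped AND no key file is written.

    def start_sshd(self):
        # sshd can only start if the configured key file exists on disk.
        return self.default_key in self.key_files

    def boot_script(self):
        # On boot, key files are recreated from what the keystore holds.
        if self.default_key in self.keystore:
            self.key_files.add(self.default_key)

    def restart_ssh_service(self):
        for _ in range(MAX_START_ATTEMPTS):
            if self.start_sshd():
                return "sshd restarted"
        # Every attempt failed, so the device reboots and the boot script runs.
        self.reboots += 1
        self.boot_script()
        return "rebooted" if self.start_sshd() else "sshd down"


# Upgraded unit: leftover keys remain in the keystore, but only one file is on disk.
upgraded = Firewall(
    keystore={"rsa-2048", "ecdsa-256", "ecdsa-384"},  # labels are hypothetical
    key_files={"rsa-2048"},
    default_key="rsa-2048",
)
upgraded.change_default_key("ecdsa-256")
print(upgraded.restart_ssh_service())

# Unit that never held the extra keys: the new key and its file are both created.
fresh = Firewall(keystore={"rsa-2048"}, key_files={"rsa-2048"}, default_key="rsa-2048")
fresh.change_default_key("ecdsa-256")
print(fresh.restart_ssh_service())
```

In this model the upgraded unit reboots once and then starts sshd, while the fresh unit restarts the service without rebooting, which matches the behaviour the ticket describes.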