02-20-2013 07:07 AM
After my Palo Alto PA-2050 pair in HA active/passive has been up for about a week, I begin to get errors when committing policies.
Management server failed to send phase 1 abort to client logrcvr
Management server failed to send phase 1 abort to client sslvpn
Management server failed to send phase 1 abort to client websrvr
commit failed
This gets worse as uptime increases. The problem existed in 4.1.9 and 4.1.8 as well. It began when I started using FQDN objects, although I do not know if that is related.
If I retry the commit, it eventually takes. This occurs from both the web GUI and the CLI.
I contacted support, and they simply suggested a commit force.
Can anyone shed any additional insight into this problem?
Thank you.
02-20-2013 07:31 AM
The short answer is that the management process is running high and consuming all of the memory/CPU on the management plane. The command "debug software restart management-server" will solve the problem temporarily. It will take 10-15 minutes before you can log back into the firewall to make the commit.
PAN Support will need your tech-support files to see why it is running so high.
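As a sketch, the temporary workaround described above would look something like this from the CLI (the prompt and hostname here are illustrative, not from the original post):

```
admin@PA-2050> show system resources | match mgmtsrvr
admin@PA-2050> debug software restart management-server
... wait 10-15 minutes for the management plane to come back up ...
admin@PA-2050> commit
```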
02-20-2013 11:11 AM
Hi,
There can be a few reasons why the commit is failing.
As mentioned above, it is quite possible that the management plane is running high.
High CPU can be caused by any single process (user-id, reporting, etc.).
You can run either of the following commands on the device to see if your management plane is running high:
show system resources
show system resources follow
Here is a sample output
admin@> show system resources
top - 11:00:16 up 22 days, 18:12, 1 user, load average: 0.29, 0.08, 0.11
Tasks: 101 total, 2 running, 99 sleeping, 0 stopped, 0 zombie
Cpu(s): 2.2%us, 1.4%sy, 1.5%ni, 94.1%id, 0.6%wa, 0.0%hi, 0.1%si, 0.0%st
Mem: 995872k total, 915016k used, 80856k free, 4836k buffers
Swap: 2212876k total, 863728k used, 1349148k free, 165548k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
20344 30 10 47116 7336 4144 S 46 0.7 0:00.24 pan_logdb_index
20345 30 10 26984 4260 1948 R 15 0.4 0:00.08 sdb
19778 30 10 3896 1284 1112 S 4 0.1 0:02.05 genindex.sh
20338 20 0 4468 1028 800 R 4 0.1 0:00.05 top
1 20 0 1836 564 536 S 0 0.1 0:02.48 init
2 20 0 0 0 0 S 0 0.0 0:00.00 kthreadd
3 RT 0 0 0 0 S 0 0.0 0:08.24 migration/0
4 20 0 0 0 0 S 0 0.0 0:00.17 ksoftirqd/0
5 RT 0 0 0 0 S 0 0.0 0:08.23 migration/1
6 20 0 0 0 0 S 0 0.0 0:00.07 ksoftirqd/1
7 20 0 0 0 0 S 0 0.0 1:46.39 events/0
8 20 0 0 0 0 S 0 0.0 0:40.84 events/1
9 20 0 0 0 0 S 0 0.0 0:00.02 khelper
12 20 0 0 0 0 S 0 0.0 0:00.00 async/mgr
112 20 0 0 0 0 S 0 0.0 0:00.00 sync_supers
114 20 0 0 0 0 S 0 0.0 0:00.00 bdi-default
115 20 0 0 0 0 S 0 0.0 0:11.58 kblockd/0
116 20 0 0 0 0 S 0 0.0 0:04.81 kblockd/1
125 20 0 0 0 0 S 0 0.0 0:00.00 ata/0
126 20 0 0 0 0 S 0 0.0 0:00.00 ata/1
127 20 0 0 0 0 S 0 0.0 0:00.00 ata_aux
132 20 0 0 0 0 S 0 0.0 0:00.00 khubd
135 20 0 0 0 0 S 0 0.0 0:00.00 kseriod
156 20 0 0 0 0 S 0 0.0 0:00.00 rpciod/0
157 20 0 0 0 0 S 0 0.0 0:00.00 rpciod/1
172 20 0 0 0 0 S 0 0.0 36:30.13 kswapd0
173 20 0 0 0 0 S 0 0.0 0:00.00 aio/0
174 20 0 0 0 0 S 0 0.0 0:00.00 aio/1
175 20 0 0 0 0 S 0 0.0 0:00.00 nfsiod
732 20 0 0 0 0 S 0 0.0 0:00.04 octeon-ethernet
760 20 0 0 0 0 S 0 0.0 0:00.00 scsi_eh_0
765 20 0 0 0 0 S 0 0.0 0:00.99 mtdblockd
793 20 0 0 0 0 S 0 0.0 0:00.00 usbhid_resumer
833 20 0 0 0 0 S 0 0.0 0:41.97 kjournald
886 16 -4 1996 404 400 S 0 0.0 0:01.23 udevd
1867 20 0 0 0 0 S 0 0.0 0:02.78 kjournald
1868 20 0 0 0 0 S 0 0.0 0:00.00 kjournald
1997 20 0 0 0 0 S 0 0.0 1:23.86 flush-8:0
2060 20 0 2008 620 572 S 0 0.1 0:10.14 syslogd
2063 20 0 1892 332 328 S 0 0.0 0:00.02 klogd
2072 20 0 1872 332 236 S 0 0.0 0:04.56 irqbalance
2080 rpc 20 0 2084 492 488 S 0 0.0 0:00.00 portmap
2098 20 0 2116 652 648 S 0 0.1 0:00.05 rpc.statd
2167 20 0 6868 584 500 S 0 0.1 0:02.87 sshd
2215 20 0 6804 388 384 S 0 0.0 0:00.00 sshd
2224 20 0 3280 620 616 S 0 0.1 0:00.03 xinetd
2243 20 0 0 0 0 S 0 0.0 0:00.00 lockd
2244 20 0 0 0 0 S 0 0.0 2:02.54 nfsd
2245 20 0 0 0 0 S 0 0.0 2:01.79 nfsd
2246 20 0 0 0 0 S 0 0.0 2:10.59 nfsd
2247 20 0 0 0 0 S 0 0.0 2:05.77 nfsd
2248 20 0 0 0 0 S 0 0.0 2:09.80 nfsd
2249 20 0 0 0 0 S 0 0.0 2:03.58 nfsd
2250 20 0 0 0 0 S 0 0.0 2:01.06 nfsd
2251 20 0 0 0 0 S 0 0.0 2:07.07 nfsd
2254 20 0 2488 672 580 S 0 0.1 0:01.55 rpc.mountd
2312 0 -20 65136 4624 1888 S 0 0.5 42:19.94 masterd_core
2315 20 0 1888 456 452 S 0 0.0 0:00.01 agetty
2322 0 -20 27864 1412 1036 S 0 0.1 7:29.42 masterd_manager
2329 15 -5 36656 2008 1216 S 0 0.2 254:45.16 sysd
2331 0 -20 32224 5084 1068 S 0 0.5 69:50.52 masterd_manager
2337 20 0 91984 2988 1676 S 0 0.3 1:53.24 dagger
2338 30 10 40568 3624 1656 S 0 0.4 59:15.43 python
2339 20 0 84284 3664 1644 S 0 0.4 0:39.58 cryptod
2340 20 0 166m 1760 1196 S 0 0.2 2:15.46 sysdagent
2354 20 0 7212 612 608 S 0 0.1 0:00.07 tscat
2357 20 0 71580 1056 928 S 0 0.1 0:09.70 brdagent
2358 20 0 31912 1084 928 S 0 0.1 0:25.94 ehmon
2359 20 0 47496 1036 908 S 0 0.1 0:01.15 chasd
2451 20 0 0 0 0 S 0 0.0 0:11.75 kjournald
2492 20 0 2900 628 572 S 0 0.1 0:03.45 crond
2503 20 0 646m 64m 63m S 0 6.7 150:48.80 useridd
2525 20 0 223m 71m 8864 S 0 7.3 45:02.84 devsrvr
2534 20 0 90584 1980 1520 S 0 0.2 0:16.09 ikemgr
2535 20 0 267m 4532 1832 S 0 0.5 2:28.18 logrcvr
2536 20 0 99744 2272 1520 S 0 0.2 0:05.44 rasmgr
2537 20 0 97720 1144 968 S 0 0.1 0:00.84 keymgr
2538 20 0 247m 2172 1532 S 0 0.2 102:25.79 varrcvr
2539 17 -3 56464 1716 1300 S 0 0.2 0:24.44 ha_agent
2540 20 0 112m 7096 1524 S 0 0.7 0:34.04 satd
2541 20 0 102m 1972 1300 S 0 0.2 0:04.48 sslmgr
2542 20 0 57136 1820 1392 S 0 0.2 0:02.48 dhcpd
2543 20 0 74708 2404 1440 S 0 0.2 0:03.97 dnsproxyd
2544 20 0 74392 1736 1356 S 0 0.2 0:04.19 pppoed
2546 20 0 141m 2708 1832 S 0 0.3 0:14.46 routed
2547 20 0 138m 4704 3540 S 0 0.5 2:16.44 authd
3796 20 0 27260 2100 1316 S 0 0.2 0:23.37 snmpd
5184 nobody 20 0 155m 6052 1552 S 0 0.6 1:49.99 appweb3
5190 nobody 20 0 122m 2208 1672 S 0 0.2 1:07.90 appweb3
16879 20 0 3744 3624 2756 S 0 0.4 0:00.02 ntpd
19653 20 0 21340 2448 2016 S 0 0.2 0:00.16 sshd
19664 admin 20 0 21476 1504 1044 S 0 0.2 0:00.03 sshd
19665 admin 20 0 97744 22m 10m S 0 2.3 0:02.97 cli
19695 20 0 2964 496 412 S 0 0.0 0:00.00 crond
19698 20 0 3720 1116 988 S 0 0.1 0:00.02 genindex_batch.
19702 20 0 33704 5176 3020 S 0 0.5 0:00.28 masterd_batch
20335 admin 20 0 2976 668 564 S 0 0.1 0:00.03 less
20337 20 0 3832 1192 1056 S 0 0.1 0:00.08 sh
20339 20 0 1940 536 464 S 0 0.1 0:00.00 sed
22550 20 0 616m 272m 3492 S 0 28.0 57:29.90 mgmtsrvr
26791 nobody 20 0 201m 24m 4296 S 0 2.5 6:35.70 appweb3
Check whether these processes are running high:
mgmtsrvr (if it is at 900m or above, that is not good)
devsrvr
If either of the above processes is high, you may have to restart it.
Also check whether %wa is high.
A high %wa indicates that you have too much logging, and you can try to reduce the logging.
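To illustrate the %wa check above, here is a minimal sketch that pulls the I/O-wait figure out of a top-style "Cpu(s):" line, assuming the exact field format shown in the sample output (the function name is my own, not a PAN tool):

```python
import re

def iowait_pct(cpu_line):
    """Extract the %wa (I/O wait) figure from a top-style Cpu(s) line,
    e.g. 'Cpu(s): 2.2%us, 1.4%sy, ..., 0.6%wa, ...'."""
    m = re.search(r'([\d.]+)%wa', cpu_line)
    if m is None:
        raise ValueError("no %wa field found")
    return float(m.group(1))

sample = "Cpu(s): 2.2%us, 1.4%sy, 1.5%ni, 94.1%id, 0.6%wa, 0.0%hi, 0.1%si, 0.0%st"
print(iowait_pct(sample))  # 0.6
```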
Hopefully this helps.
Thank you
02-20-2013 12:04 PM
Just to clarify the previous comment: it's the 5th column (although it will look like the 4th, since the USER field is blank for these processes) that you want to check for values over 900m. It's easier to read if you perform a "show system resources | match srvr".
I work by the rule of thumb that if either the management server or device server is over 850m, consider restarting it when you have a chance; if either is over 950m, restart it as soon as you have a maintenance window; and if either is over 1000m, restart it ASAP.
Be aware that a restart of the management server should not impact traffic throughput, although you will lose the ability to manage the PAN device while the management server restarts. A restart of the device server will not impact existing sessions; however, new sessions will not match any policy with users or groups in it, because there will be no user-to-IP cache or user-group cache until the device server has restarted. It's always a good idea to do this out of hours or in a maintenance window to reduce the impact of an unforeseen event.
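The rule-of-thumb thresholds above can be encoded in a small sketch. This assumes top's usual convention of printing VIRT as plain KiB numbers or with an 'm'/'g' suffix, as in the sample output; the function names are illustrative only:

```python
def virt_to_mb(field):
    """Convert a top VIRT field to megabytes.

    top prints plain numbers in KiB and suffixes larger values with
    'm' (MiB) or 'g' (GiB), e.g. '651m', '47116', '1.2g'."""
    field = field.lower()
    if field.endswith('g'):
        return float(field[:-1]) * 1024
    if field.endswith('m'):
        return float(field[:-1])
    return float(field) / 1024  # plain number: KiB

def restart_advice(virt_field):
    """Apply the rule-of-thumb thresholds from the post (850m/950m/1000m)."""
    mb = virt_to_mb(virt_field)
    if mb > 1000:
        return "restart ASAP"
    if mb > 950:
        return "restart in the next maintenance window"
    if mb > 850:
        return "restart when convenient"
    return "OK"

print(restart_advice("651m"))   # OK
print(restart_advice("980m"))   # restart in the next maintenance window
```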
02-20-2013 12:09 PM
Also, on some models, anything that relies on SSL termination will be affected during a restart of the management plane.
02-20-2013 12:46 PM
Thank you.
dogbert@PA-2050-Trailer-Rebuild(active)> show system resources | match srvr
2360 20 0 651m 106m 3192 S 0 10.9 535:48.96 mgmtsrvr
2381 20 0 422m 106m 9912 S 0 11.0 610:47.25 devsrvr
It sounds like this means the PA-2050 is underpowered for my needs. Since I have an HA pair, perhaps the better approach would be to completely reboot the active unit so that the passive one takes over. That way I wouldn't lose the SSL terminations or user-to-IP mappings. My shop is heavy on inbound and outbound SSL termination, and just about every outbound allow rule is based on Active Directory user mapping.
Thank you.
02-20-2013 01:01 PM
You should only really need to restart the management and device servers if their respective memory values are high or you've been directed to by support. Those values look OK, but keep an eye on them to see whether they increase over time. Another thing you can check for is the presence of backtraces and core files by performing a "show system files"; if anything has a timestamp close to a time you had issues, you may want to have PAN support investigate. It could be an as-yet-undiscovered bug.
If you feel that your 2050s aren't cutting it, you may wish to approach your account manager's SE about upgrading to 3020s; while they are more expensive, I believe they were doing some deals for people wanting to upgrade.
02-20-2013 01:57 PM
+1 on the PA-3000 series if you are doing a lot of QoS and SSL Decryption.
If memory serves, the PA-2000 series does QoS and SSL decryption in software, whereas the PA-3000 and PA-5000 handle these in hardware.
02-20-2013 02:11 PM
The PA-2000 series implements SSL decryption in hardware, while QoS and decompression are implemented in software. If you watch the video below you will see the layout of components for both the 3020 and 3050 about 3 minutes in. The PA-3020 does not have a network processor, so all of the routing, QoS and NAT is done in software. The PA-3050 does have the network processor and implements it all in hardware.
I would reiterate that I would monitor the PA-2000 to see whether there is a problem with its memory usage, and raise a support case if it continues to increase, as this could potentially be a bug. I would only suggest discussing a replacement PA-3000 with your SE if you feel the PA-2000 is not meeting your requirements, after establishing whether this is normal behaviour for it.