Commit failure 4.1.10


L3 Networker

After my Palo Alto 2050 in an HA active/passive pair has been up for about a week, I begin to get errors when committing policies.

Management server failed to send phase 1 abort to client logrcvr

Management server failed to send phase 1 abort to client sslvpn

Management server failed to send phase 1 abort to client websrvr

commit failed

This gets worse as uptime increases. The problem existed in 4.1.9 and 4.1.8 as well. It began when I started using FQDN objects, although I do not know whether that is related.

If I keep retrying, the commit eventually takes. This occurs from both the web GUI and the CLI.

I contacted support, and they simply suggested a commit force. 

Can anyone shed any additional insight into this problem? 

Thank you.

8 REPLIES

L4 Transporter

The short answer is that the management process is running high and consuming all of the memory/CPU on the management plane. The command "debug software restart management-server" will solve the problem temporarily. It will take 10-15 minutes before you can log back into the firewall to make the commit.

PAN Support will need your tech-support files to see why it is running so high.
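If you want to catch this before commits start failing, a scheduled check of those values can help. Below is a minimal monitoring sketch in Python; it is my own illustration, not a PAN-provided tool. It assumes SSH access to the management interface and the third-party paramiko library, and the hostname and credentials are placeholders. Since the PAN-OS CLI expects an interactive shell, it uses invoke_shell rather than exec_command.

#!/usr/bin/env python3
# Hypothetical monitor: log management-plane memory over SSH so growth
# is visible before commits start to fail. Host and credentials are
# placeholders; adjust for your environment.
import time
import paramiko

HOST, USER, PASSWORD = "fw-mgmt.example.com", "admin", "changeme"  # placeholders

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect(HOST, username=USER, password=PASSWORD)

# The PAN-OS CLI expects an interactive shell, so send commands through
# invoke_shell: disable the pager, then run the command from this thread.
chan = client.invoke_shell()
chan.send(b"set cli pager off\n")
chan.send(b"show system resources | match srvr\n")
time.sleep(3)  # crude wait for the CLI to produce output

output = chan.recv(65535).decode(errors="replace")
for line in output.splitlines():
    if line.strip().endswith(("mgmtsrvr", "devsrvr")):
        print(time.strftime("%Y-%m-%d %H:%M"), line.strip())
client.close()

Run from cron every few minutes, the timestamped lines make it easy to see whether mgmtsrvr's memory creeps up with uptime.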

L5 Sessionator

Hi,

There can be a few reasons why the commit is failing.

As mentioned above, it is highly possible that the management plane is running high.

High CPU can be caused by any process (user-id, reporting, etc.).

You can run one of the following commands on the device to see if your management plane is running high:

show system resources

show system resources follow

Here is a sample output:

admin@> show system resources

top - 11:00:16 up 22 days, 18:12,  1 user,  load average: 0.29, 0.08, 0.11

Tasks: 101 total,   2 running,  99 sleeping,   0 stopped,   0 zombie

Cpu(s):  2.2%us,  1.4%sy,  1.5%ni, 94.1%id,  0.6%wa,  0.0%hi,  0.1%si,  0.0%st

Mem:    995872k total,   915016k used,    80856k free,     4836k buffers

Swap:  2212876k total,   863728k used,  1349148k free,   165548k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND

20344       30  10 47116 7336 4144 S   46  0.7   0:00.24 pan_logdb_index

20345       30  10 26984 4260 1948 R   15  0.4   0:00.08 sdb

19778       30  10  3896 1284 1112 S    4  0.1   0:02.05 genindex.sh

20338       20   0  4468 1028  800 R    4  0.1   0:00.05 top

    1       20   0  1836  564  536 S    0  0.1   0:02.48 init

    2       20   0     0    0    0 S    0  0.0   0:00.00 kthreadd

    3       RT   0     0    0    0 S    0  0.0   0:08.24 migration/0

    4       20   0     0    0    0 S    0  0.0   0:00.17 ksoftirqd/0

    5       RT   0     0    0    0 S    0  0.0   0:08.23 migration/1

    6       20   0     0    0    0 S    0  0.0   0:00.07 ksoftirqd/1

    7       20   0     0    0    0 S    0  0.0   1:46.39 events/0

    8       20   0     0    0    0 S    0  0.0   0:40.84 events/1

    9       20   0     0    0    0 S    0  0.0   0:00.02 khelper

   12       20   0     0    0    0 S    0  0.0   0:00.00 async/mgr

  112       20   0     0    0    0 S    0  0.0   0:00.00 sync_supers

  114       20   0     0    0    0 S    0  0.0   0:00.00 bdi-default

  115       20   0     0    0    0 S    0  0.0   0:11.58 kblockd/0

  116       20   0     0    0    0 S    0  0.0   0:04.81 kblockd/1

  125       20   0     0    0    0 S    0  0.0   0:00.00 ata/0

  126       20   0     0    0    0 S    0  0.0   0:00.00 ata/1

  127       20   0     0    0    0 S    0  0.0   0:00.00 ata_aux

  132       20   0     0    0    0 S    0  0.0   0:00.00 khubd

  135       20   0     0    0    0 S    0  0.0   0:00.00 kseriod

  156       20   0     0    0    0 S    0  0.0   0:00.00 rpciod/0

  157       20   0     0    0    0 S    0  0.0   0:00.00 rpciod/1

  172       20   0     0    0    0 S    0  0.0  36:30.13 kswapd0

  173       20   0     0    0    0 S    0  0.0   0:00.00 aio/0

  174       20   0     0    0    0 S    0  0.0   0:00.00 aio/1

  175       20   0     0    0    0 S    0  0.0   0:00.00 nfsiod

  732       20   0     0    0    0 S    0  0.0   0:00.04 octeon-ethernet

  760       20   0     0    0    0 S    0  0.0   0:00.00 scsi_eh_0

  765       20   0     0    0    0 S    0  0.0   0:00.99 mtdblockd

  793       20   0     0    0    0 S    0  0.0   0:00.00 usbhid_resumer

  833       20   0     0    0    0 S    0  0.0   0:41.97 kjournald

  886       16  -4  1996  404  400 S    0  0.0   0:01.23 udevd

1867       20   0     0    0    0 S    0  0.0   0:02.78 kjournald

1868       20   0     0    0    0 S    0  0.0   0:00.00 kjournald

1997       20   0     0    0    0 S    0  0.0   1:23.86 flush-8:0

2060       20   0  2008  620  572 S    0  0.1   0:10.14 syslogd

2063       20   0  1892  332  328 S    0  0.0   0:00.02 klogd

2072       20   0  1872  332  236 S    0  0.0   0:04.56 irqbalance

2080 rpc       20   0  2084  492  488 S    0  0.0   0:00.00 portmap

2098       20   0  2116  652  648 S    0  0.1   0:00.05 rpc.statd

2167       20   0  6868  584  500 S    0  0.1   0:02.87 sshd

2215       20   0  6804  388  384 S    0  0.0   0:00.00 sshd

2224       20   0  3280  620  616 S    0  0.1   0:00.03 xinetd

2243       20   0     0    0    0 S    0  0.0   0:00.00 lockd

2244       20   0     0    0    0 S    0  0.0   2:02.54 nfsd

2245       20   0     0    0    0 S    0  0.0   2:01.79 nfsd

2246       20   0     0    0    0 S    0  0.0   2:10.59 nfsd

2247       20   0     0    0    0 S    0  0.0   2:05.77 nfsd

2248       20   0     0    0    0 S    0  0.0   2:09.80 nfsd

2249       20   0     0    0    0 S    0  0.0   2:03.58 nfsd

2250       20   0     0    0    0 S    0  0.0   2:01.06 nfsd

2251       20   0     0    0    0 S    0  0.0   2:07.07 nfsd

2254       20   0  2488  672  580 S    0  0.1   0:01.55 rpc.mountd

2312        0 -20 65136 4624 1888 S    0  0.5  42:19.94 masterd_core

2315       20   0  1888  456  452 S    0  0.0   0:00.01 agetty

2322        0 -20 27864 1412 1036 S    0  0.1   7:29.42 masterd_manager

2329       15  -5 36656 2008 1216 S    0  0.2 254:45.16 sysd

2331        0 -20 32224 5084 1068 S    0  0.5  69:50.52 masterd_manager

2337       20   0 91984 2988 1676 S    0  0.3   1:53.24 dagger

2338       30  10 40568 3624 1656 S    0  0.4  59:15.43 python

2339       20   0 84284 3664 1644 S    0  0.4   0:39.58 cryptod

2340       20   0  166m 1760 1196 S    0  0.2   2:15.46 sysdagent

2354       20   0  7212  612  608 S    0  0.1   0:00.07 tscat

2357       20   0 71580 1056  928 S    0  0.1   0:09.70 brdagent

2358       20   0 31912 1084  928 S    0  0.1   0:25.94 ehmon

2359       20   0 47496 1036  908 S    0  0.1   0:01.15 chasd

2451       20   0     0    0    0 S    0  0.0   0:11.75 kjournald

2492       20   0  2900  628  572 S    0  0.1   0:03.45 crond

2503       20   0  646m  64m  63m S    0  6.7 150:48.80 useridd

2525       20   0  223m  71m 8864 S    0  7.3  45:02.84 devsrvr

2534       20   0 90584 1980 1520 S    0  0.2   0:16.09 ikemgr

2535       20   0  267m 4532 1832 S    0  0.5   2:28.18 logrcvr

2536       20   0 99744 2272 1520 S    0  0.2   0:05.44 rasmgr

2537       20   0 97720 1144  968 S    0  0.1   0:00.84 keymgr

2538       20   0  247m 2172 1532 S    0  0.2 102:25.79 varrcvr

2539       17  -3 56464 1716 1300 S    0  0.2   0:24.44 ha_agent

2540       20   0  112m 7096 1524 S    0  0.7   0:34.04 satd

2541       20   0  102m 1972 1300 S    0  0.2   0:04.48 sslmgr

2542       20   0 57136 1820 1392 S    0  0.2   0:02.48 dhcpd

2543       20   0 74708 2404 1440 S    0  0.2   0:03.97 dnsproxyd

2544       20   0 74392 1736 1356 S    0  0.2   0:04.19 pppoed

2546       20   0  141m 2708 1832 S    0  0.3   0:14.46 routed

2547       20   0  138m 4704 3540 S    0  0.5   2:16.44 authd

3796       20   0 27260 2100 1316 S    0  0.2   0:23.37 snmpd

5184 nobody    20   0  155m 6052 1552 S    0  0.6   1:49.99 appweb3

5190 nobody    20   0  122m 2208 1672 S    0  0.2   1:07.90 appweb3

16879       20   0  3744 3624 2756 S    0  0.4   0:00.02 ntpd

19653       20   0 21340 2448 2016 S    0  0.2   0:00.16 sshd

19664 admin     20   0 21476 1504 1044 S    0  0.2   0:00.03 sshd

19665 admin     20   0 97744  22m  10m S    0  2.3   0:02.97 cli

19695       20   0  2964  496  412 S    0  0.0   0:00.00 crond

19698       20   0  3720 1116  988 S    0  0.1   0:00.02 genindex_batch.

19702       20   0 33704 5176 3020 S    0  0.5   0:00.28 masterd_batch

20335 admin     20   0  2976  668  564 S    0  0.1   0:00.03 less

20337       20   0  3832 1192 1056 S    0  0.1   0:00.08 sh

20339       20   0  1940  536  464 S    0  0.1   0:00.00 sed

22550       20   0  616m 272m 3492 S    0 28.0  57:29.90 mgmtsrvr

26791 nobody    20   0  201m  24m 4296 S    0  2.5   6:35.70 appweb3

Check whether these processes are running high:

mgmtsrvr (if it is at 900m or above, that is not good)

devsrvr

If either of these processes is high, you might have to restart it.

Also check whether %wa (I/O wait) is high.

If %wa is high, it indicates that there is too much logging, and you can try to reduce the logging.
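If you want to script that %wa check, here is a minimal Python sketch against the Cpu(s) line from the sample output above. The 10% threshold is an arbitrary placeholder of mine, not a vendor recommendation.

import re

# The Cpu(s) line as printed by 'show system resources' (sample above).
cpu_line = ("Cpu(s):  2.2%us,  1.4%sy,  1.5%ni, 94.1%id,  "
            "0.6%wa,  0.0%hi,  0.1%si,  0.0%st")

wa = float(re.search(r"([\d.]+)%wa", cpu_line).group(1))
# 10.0 is an arbitrary placeholder threshold, not a PAN-recommended value.
if wa >= 10.0:
    print(f"%wa is {wa} - I/O wait is high, consider reducing logging")
else:
    print(f"%wa is {wa} - I/O wait looks fine")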

Hopefully this helps.

Thank you

Just to clarify the previous comment: it's the 5th column, VIRT, that you want to check for values over 900m (although it will look like the 4th, since the USER field is often blank in the output). It's easier to read if you perform a "show system resources | match srvr".

I work by the rule of thumb that if either the management server or device server is over 850m, consider restarting it when you have a chance; if either is over 950m, restart it as soon as you have a maintenance window; and if either is over 1000m, restart it ASAP.
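As a sketch of that rule of thumb, the Python snippet below parses lines from "show system resources | match srvr" and applies the 850m/950m/1000m thresholds. It assumes the column layout shown in the sample output above (VIRT is the 8th field from the right, which also works when the USER field is blank) and top's usual KiB/m/g units; the function names are mine.

#!/usr/bin/env python3
# Sketch: apply the 850m/950m/1000m rule of thumb to output of
# 'show system resources | match srvr'. Column layout as in the
# samples above; thresholds are the rule of thumb from this thread.

def virt_mib(token):
    """Convert a top VIRT token ('651m', '1.2g', '90584') to MiB."""
    if token.endswith("m"):
        return float(token[:-1])
    if token.endswith("g"):
        return float(token[:-1]) * 1024
    return float(token) / 1024  # bare values are KiB

def classify(line):
    fields = line.split()
    name, virt = fields[-1], virt_mib(fields[-8])  # VIRT: 8th from the right
    if virt >= 1000:
        return f"{name}: {virt:.0f}m - restart ASAP"
    if virt >= 950:
        return f"{name}: {virt:.0f}m - restart in the next maintenance window"
    if virt >= 850:
        return f"{name}: {virt:.0f}m - consider restarting when convenient"
    return f"{name}: {virt:.0f}m - OK"

# The two srvr lines quoted later in this thread, used as test input.
sample = [
    "2360       20   0  651m 106m 3192 S    0 10.9 535:48.96 mgmtsrvr",
    "2381       20   0  422m 106m 9912 S    0 11.0 610:47.25 devsrvr",
]
for row in sample:
    print(classify(row))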

Be aware that a restart of the management server should not impact traffic throughput, although you will lose the ability to manage the PAN device while it restarts. A restart of the device server will not impact existing sessions; however, new sessions will not match any policy with users or groups in them, because there will be no user-to-IP cache or user-to-group cache until the device server has restarted. It's always a good idea to do this out of hours or in a maintenance window to reduce the impact of an unforeseeable event.

Also, anything that needs SSL termination (on some models) will be affected during a restart of the management plane.

Thank you.

dogbert@PA-2050-Trailer-Rebuild(active)> show system resources | match srvr

2360       20   0  651m 106m 3192 S    0 10.9 535:48.96 mgmtsrvr

2381       20   0  422m 106m 9912 S    0 11.0 610:47.25 devsrvr

It sounds like this means the PA-2050 is underpowered for my needs. Since I have an HA pair, perhaps the better approach would be to completely reboot the main (active) unit so that the passive one takes over. That way I wouldn't lose the SSL terminations or user-to-IP mappings. My shop is heavy on inbound and outbound SSL termination, and just about every outbound allow rule is based on Active Directory user mapping.

Thank you.

You should only really need to restart the management and device servers if their respective memory values are high or you've been directed to by support. Those values look OK, but keep an eye on them to see whether they increase over time. Another thing you can check for is the presence of backtraces and core files by performing a "show system files"; if there is anything with a recent timestamp from a time you had issues, you may want to have PAN support investigate. It could be an as-yet-undiscovered bug.

If you feel that your 2050s aren't cutting it, you may wish to approach your account manager's SE about upgrading to 3020s; while they are more expensive, I believe they were doing some deals for people wanting to upgrade.

+1 on the PA-3000 series if you are doing a lot of QoS and SSL Decryption.

If memory serves, the PA-2000 series does QoS and SSL decryption in software, whereas the PA-3000 and PA-5000 handle them in hardware.

Actually, the PA-2000 series implements SSL decryption in hardware, while QoS and decompression are implemented in software. If you watch the video below, you will see the layout of components for both the 3020 and 3050 about three minutes in. The PA-3020 does not have a network processor, so routing, QoS, and NAT are all done in software. The PA-3050 does have the network processor and implements them in hardware.

Video Link : 1238

I would reiterate that I would monitor the PA-2000's memory usage and raise a support case if it continues to increase, as this could potentially be a bug. I would only suggest discussing a PA-3000 replacement with your SE if, after establishing whether this is normal PA-2000 behaviour, you feel it is not meeting your requirements.
