01-10-2015 07:14 AM
Hi!
I'm new here, so will probably be asking lots of dumb questions ... but hopefully relatively interesting dumb questions.
Currently running PAN-OS 6.0.7 on a pair of PA-5060s (active-passive). Not currently live (switchover day is 17th January), and we're considering upgrading to 6.1.1 next week.
Recently (last few days) ran into an issue whereby configuration commits are being blocked ("Another commit/validate is in progress. Please try again later"). I can work around this by doing a graceful reboot of our pair of PA-5060s but this seems a little extreme and I thought I would dig into it a bit deeper (and raise a support call).
The only significant change recently (did I hear "Yeah! Right!" from the hecklers at the back?) was the addition of one of our Active Directory servers to Device -> User Identification -> Group Map Settings.
What I've been able to figure out on my own is that a job called "AddrObjRefresh" (visible in show jobs all) seems to be kicking off every 10 seconds or so :-
Enqueued             ID    Type            Status  Result  Completed
--------------------------------------------------------------------------
2015/01/10 14:54:20  8290  AddrObjRefresh  ACT     PEND    0%
2015/01/10 14:54:09  8289  AddrObjRefresh  FIN     OK      14:54:19
2015/01/10 14:53:59  8288  AddrObjRefresh  FIN     OK      14:54:08
2015/01/10 14:53:48  8287  AddrObjRefresh  FIN     OK      14:53:58
2015/01/10 14:53:38  8286  AddrObjRefresh  FIN     OK      14:53:47
2015/01/10 14:53:27  8285  AddrObjRefresh  FIN     OK      14:53:37
2015/01/10 14:53:17  8284  AddrObjRefresh  FIN     OK      14:53:26
2015/01/10 14:53:06  8283  AddrObjRefresh  FIN     OK      14:53:16
2015/01/10 14:52:56  8282  AddrObjRefresh  FIN     OK      14:53:05
2015/01/10 14:52:45  8281  AddrObjRefresh  FIN     OK      14:52:55
2015/01/10 14:52:35  8280  AddrObjRefresh  FIN     OK      14:52:44
2015/01/10 14:52:24  8279  AddrObjRefresh  FIN     OK      14:52:34
2015/01/10 14:52:13  8278  AddrObjRefresh  FIN     OK      14:52:23
2015/01/10 14:52:03  8277  AddrObjRefresh  FIN     OK      14:52:12
2015/01/10 14:51:52  8276  AddrObjRefresh  FIN     OK      14:52:02
2015/01/10 14:51:42  8275  AddrObjRefresh  FIN     OK      14:51:51
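For anyone wanting to confirm the "every 10 seconds or so" figure from a saved copy of this output, here is a minimal Python sketch. It assumes the column order shown above (date, time, ID, type, ...) and works only on pasted text; it does not talk to the firewall. The function names and the SAMPLE variable are made up for illustration.

```python
from datetime import datetime

# A few rows copied from the "show jobs all" output above.
SAMPLE = """\
2015/01/10 14:54:20 8290 AddrObjRefresh ACT PEND 0%
2015/01/10 14:54:09 8289 AddrObjRefresh FIN OK 14:54:19
2015/01/10 14:53:59 8288 AddrObjRefresh FIN OK 14:54:08
2015/01/10 14:53:48 8287 AddrObjRefresh FIN OK 14:53:58
"""


def enqueue_times(text, job_type="AddrObjRefresh"):
    """Return the sorted enqueue timestamps of all jobs of the given type."""
    times = []
    for line in text.splitlines():
        fields = line.split()
        # Assumed layout: date, time, job ID, job type, then status columns.
        if len(fields) >= 4 and fields[3] == job_type:
            try:
                stamp = datetime.strptime(
                    fields[0] + " " + fields[1], "%Y/%m/%d %H:%M:%S"
                )
            except ValueError:
                continue  # header, separator, or otherwise unparseable line
            times.append(stamp)
    return sorted(times)


def intervals_seconds(times):
    """Seconds between each pair of consecutive enqueue times."""
    return [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]


gaps = intervals_seconds(enqueue_times(SAMPLE))
print("gaps between jobs (s):", gaps)
print("mean gap (s): %.1f" % (sum(gaps) / len(gaps)))
```

Because the parser splits on whitespace, it should cope with either single-space or column-aligned copies of the table.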
Now for some questions :-
Please be gentle: I'm a clueless newbie
01-10-2015 02:42 PM
MikeMeredith,
There are no 'dumb' questions. We all start somewhere, so no worries!
You're on the right path with checking the jobs. Good work.
Could you run the following command in the CLI, via SSH session, and log the session output to a file?
'tail follow yes mp-log ms.log'
This will show you the output from the ms.log management log file. This may show us some more details about what is going on when these AddrObjRefresh jobs kick off.
I'm referencing a very similar case here, with a pair of 5000 series having the exact same issue. (225414)
They had also set up User-ID recently and experienced the same behavior, so this could be the same issue, or very close.
How many profiles were set up for User-ID (LDAP, etc.)?
In their scenario, the firewall was having trouble communicating with the LDAP server due to other network issues, so User-ID was taking up more CPU/RAM than expected and constantly refreshing as it attempted to pull the group information.
I will bet you $100 that if you were to disable the recent changes with User-ID and the server profiles, the issue would stop occurring. If you get the chance, please try this out and verify while it is not in production. This will help narrow down the issue.
Let me know and we can go from there,
Thanks!
01-11-2015 12:13 AM
Thanks for the reply.
I'm also pretty sure that disabling the "AD" profile will sort things out, but that's a test for Monday unless there's a CLI command along the lines of "commit when you next have a chance to do so" 🙂
There are two LDAP server profiles, but only one is enabled (the Active Directory one). I'd be mildly surprised if there are network issues between either PA-5060 and the relevant AD server; although they're on different VLANs, there's not a great deal of network between them. But I'd not be very surprised to find our Active Directory is in a bit of a state - a certain well known Python script that exports the binary database (1.6Gbytes!) to a text version took over a week to run just before Xmas.
I'll raise a support call with our reseller and mention 225414.
A short extract from the output of tail follow yes mp-log ms.log:-
2015-01-11 07:38:41.515 +0000 device server refresh triggered via sysd
2015-01-11 07:38:41.515 +0000 Aborting. Another refresh in progress
2015-01-11 07:38:42.162 +0000 client useridd disabled/restarted
2015-01-11 07:38:44.171 +0000 client useridd enabled
2015-01-11 07:38:44.171 +0000 device server refresh triggered via sysd
2015-01-11 07:38:44.172 +0000 Aborting. Another refresh in progress
2015-01-11 07:38:44.823 +0000 client useridd disabled/restarted
2015-01-11 07:38:45.579 +0000 client device reported Phase 1 was SUCCESSFUL
2015-01-11 07:38:46.842 +0000 client useridd enabled
2015-01-11 07:38:46.843 +0000 device server refresh triggered via sysd
2015-01-11 07:38:46.843 +0000 dnscfgmod: Main refresh function: (unknown)
2015-01-11 07:38:46.849 +0000 dnscfgmod:Fqdn refresh job 14006 scheduled
2015-01-11 07:38:47.503 +0000 client useridd disabled/restarted
2015-01-11 07:38:49.510 +0000 client useridd enabled
2015-01-11 07:38:49.511 +0000 device server refresh triggered via sysd
2015-01-11 07:38:49.511 +0000 Aborting. Another refresh in progress
2015-01-11 07:38:50.187 +0000 client useridd disabled/restarted
2015-01-11 07:38:52.189 +0000 client useridd enabled
2015-01-11 07:38:52.190 +0000 device server refresh triggered via sysd
2015-01-11 07:38:52.190 +0000 Aborting. Another refresh in progress
2015-01-11 07:38:52.844 +0000 client useridd disabled/restarted
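To see at a glance how often useridd is restarting and how often a refresh is aborted, a saved copy of the ms.log output can be tallied offline. The following is a rough Python sketch, not a Palo Alto tool: it assumes every entry starts with a timestamp in the format shown above, and the names TIMESTAMP, split_entries and SAMPLE are invented for this example. Splitting on the timestamp rather than on newlines means it also copes with pastes where entries have run together.

```python
import re
from collections import Counter

# Timestamp prefix as it appears in the extract, e.g.
# "2015-01-11 07:38:41.515 +0000 "
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3} [+-]\d{4} ")

# The first few entries from the extract above, deliberately left run
# together the way a terminal paste can leave them.
SAMPLE = (
    "2015-01-11 07:38:41.515 +0000 device server refresh triggered via sysd "
    "2015-01-11 07:38:41.515 +0000 Aborting. Another refresh in progress"
    "2015-01-11 07:38:42.162 +0000 client useridd disabled/restarted "
    "2015-01-11 07:38:44.171 +0000 client useridd enabled "
    "2015-01-11 07:38:44.171 +0000 device server refresh triggered via sysd "
    "2015-01-11 07:38:44.172 +0000 Aborting. Another refresh in progress"
    "2015-01-11 07:38:44.823 +0000 client useridd disabled/restarted"
)


def split_entries(text):
    """Split log text into message strings, one per timestamped entry."""
    parts = TIMESTAMP.split(text)
    return [part.strip() for part in parts if part.strip()]


# Count how many times each distinct message occurs.
counts = Counter(split_entries(SAMPLE))
for message, count in counts.most_common():
    print(count, message)
```

A high count of "client useridd disabled/restarted" relative to the time span covered would be consistent with the restart loop visible in the extract.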
01-11-2015 01:06 AM
MikeMeredith,
You're welcome.
Your log output matches the same errors referenced from the ms.log file in the case I mentioned, so it looks like this may be the same issue after all.
I'm trying to relate their solution to our situation.
While raising a support call, please reference the output from ms.log, along with generating a TechSupport file for them to look at.
Specifically we are looking to see if the RAM/CPU usage of the process 'useridd' is abnormally high.
Could you do a 'show system info' and paste the output? What is the uptime?
I know this sounds like a cliché response, but have you rebooted the device since this started happening? Perhaps something is just hanging up in the management plane, but that would be too easy.
You mentioned this is not in production yet, but have you been authenticating many users across the firewall?
Are you pulling all the groups from the AD or just some of them? How many groups are we talking about the firewall pulling/mapping ?
In the previous scenario, the useridd process was being 'oversubscribed'.
This can be potentially mitigated by limiting what groups the firewall will try to map and configuring the access list for the zone.
We are headed in the right direction here.
Let me know,
Thanks
EDIT: I'm unaware of a commit type where we can do that, but it would be highly convenient, right?
01-11-2015 12:27 PM
Thanks again.
I'll get onto all the necessary details tomorrow when I log a call with support. But some quick responses :-
msm@Hula(active)> show system info
hostname: Hula
ip-address: 10.14.4.64
netmask: 255.255.254.0
default-gateway: 10.14.5.254
ipv6-address: unknown
ipv6-link-local-address: fe80::290:bff:fe37:dd0c/64
ipv6-default-gateway:
mac-address: 00:90:0b:37:dd:0c
time: Sun Jan 11 20:19:24 2015
uptime: 2 days, 13:34:16
family: 5000
model: PA-5060
serial: 001901000769
sw-version: 6.0.7
global-protect-client-package-version: 0.0.0
app-version: 480-2519
app-release-date: 2015/01/06 14:56:48
av-version: 1459-1932
av-release-date: 2015/01/08 04:00:01
threat-version: 480-2519
threat-release-date: 2015/01/06 14:56:48
wildfire-version: 0
wildfire-release-date: unknown
url-filtering-version: 0000.00.00.000
global-protect-datafile-version: 0
global-protect-datafile-release-date: unknown
logdb-version: 6.0.6
platform-family: 5000
logger_mode: False
vpn-disable-mode: off
operational-mode: normal
multi-vsys: off
01-11-2015 12:43 PM
MikeMeredith,
Do you need the firewall to pull all groups from the AD, or could we get granular and only have it pull a set of specific ones? That should keep the useridd process cpu/mem usage down.
That has to be the issue then... useridd is running above its means, so we need to optimize it I suppose.
Let me know how it goes!
Don't forget to mark any answers here as 'correct' or 'helpful'.
Thanks!
01-12-2015 01:00 AM
Do you use FQDN Objects or dynamic Objects?
What do you see if you use the following CLI command "request system fqdn show"?
You could try to stop the last commit and then commit your configuration.
>show jobs all
-remember the job id for the AddrObjRefresh commit
>delete job id <id you want to delete>
>commit
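The "remember the job id" step above can be sketched in Python for a saved copy of the show jobs all output. This is only an illustration under assumptions: it expects the column order seen earlier in the thread (date, time, ID, type, status, ...), treats any status other than FIN as unfinished, and merely prints the delete job id line from the steps above for you to type yourself. The function name and SAMPLE are hypothetical.

```python
# Two rows copied from the "show jobs all" output earlier in the thread.
SAMPLE = """\
2015/01/10 14:54:20 8290 AddrObjRefresh ACT PEND 0%
2015/01/10 14:54:09 8289 AddrObjRefresh FIN OK 14:54:19
"""


def unfinished_job_ids(text, job_type="AddrObjRefresh"):
    """Return IDs of jobs of the given type whose status is not FIN."""
    ids = []
    for line in text.splitlines():
        fields = line.split()
        # Assumed layout: date, time, job ID, job type, status, ...
        if len(fields) >= 5 and fields[3] == job_type and fields[4] != "FIN":
            ids.append(fields[2])
    return ids


for job_id in unfinished_job_ids(SAMPLE):
    # Print the CLI line only; nothing is sent to the firewall.
    print("delete job id " + job_id)
```

Since a new AddrObjRefresh job appears every 10 seconds or so, the ID may well have changed by the time the command is typed, so it is worth re-checking show jobs all just before deleting.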