Frequent Running Of AddrObjRefresh Blocking Commits?

MikeMeredith · ‎01-10-2015

Hi!

I'm new here, so will probably be asking lots of dumb questions ... but hopefully relatively interesting dumb questions.

Currently running PANOS 6.0.7 on a pair of PA-5060s (active-passive). Not currently live (switchover day is 17th January), and we're considering upgrading to 6.1.1 next week.

Recently (last few days) ran into an issue whereby configuration commits are being blocked ("Another commit/validate is in progress. Please try again later"). I can work around this by doing a graceful reboot of our pair of PA-5060s but this seems a little extreme and I thought I would dig into it a bit deeper (and raise a support call).

The only significant change recently (did I hear "Yeah! Right!" from the hecklers at the back?) was the addition of one of our Active Directory servers to Device -> User Identification -> Group Mapping Settings.

What I've been able to figure out on my own is that a job called "AddrObjRefresh" (visible in show jobs all) kicks off every 10 seconds or so :-

Enqueued             ID    Type            Status  Result  Completed
--------------------------------------------------------------------------
2015/01/10 14:54:20  8290  AddrObjRefresh  ACT     PEND    0%
2015/01/10 14:54:09  8289  AddrObjRefresh  FIN     OK      14:54:19
2015/01/10 14:53:59  8288  AddrObjRefresh  FIN     OK      14:54:08
2015/01/10 14:53:48  8287  AddrObjRefresh  FIN     OK      14:53:58
2015/01/10 14:53:38  8286  AddrObjRefresh  FIN     OK      14:53:47
2015/01/10 14:53:27  8285  AddrObjRefresh  FIN     OK      14:53:37
2015/01/10 14:53:17  8284  AddrObjRefresh  FIN     OK      14:53:26
2015/01/10 14:53:06  8283  AddrObjRefresh  FIN     OK      14:53:16
2015/01/10 14:52:56  8282  AddrObjRefresh  FIN     OK      14:53:05
2015/01/10 14:52:45  8281  AddrObjRefresh  FIN     OK      14:52:55
2015/01/10 14:52:35  8280  AddrObjRefresh  FIN     OK      14:52:44
2015/01/10 14:52:24  8279  AddrObjRefresh  FIN     OK      14:52:34
2015/01/10 14:52:13  8278  AddrObjRefresh  FIN     OK      14:52:23
2015/01/10 14:52:03  8277  AddrObjRefresh  FIN     OK      14:52:12
2015/01/10 14:51:52  8276  AddrObjRefresh  FIN     OK      14:52:02
2015/01/10 14:51:42  8275  AddrObjRefresh  FIN     OK      14:51:51
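
For anyone wanting to measure the cadence rather than eyeball it, here is a minimal Python sketch. It assumes the `show jobs all` output has been saved as plain text with the column order shown above (date, time, ID, type, status, result, completed). The helper names `parse_jobs`, `enqueue_gaps` and `run_times` are hypothetical and not part of any PAN-OS tooling.

```python
from datetime import datetime, timedelta

# A few rows copied from the listing above, used as sample input.
SAMPLE = """\
2015/01/10 14:54:20 8290 AddrObjRefresh ACT PEND 0%
2015/01/10 14:54:09 8289 AddrObjRefresh FIN OK 14:54:19
2015/01/10 14:53:59 8288 AddrObjRefresh FIN OK 14:54:08
2015/01/10 14:53:48 8287 AddrObjRefresh FIN OK 14:53:58
"""


def parse_jobs(text, job_type="AddrObjRefresh"):
    """Return sorted (enqueued, job_id, status, completed) tuples for one job type."""
    jobs = []
    for line in text.splitlines():
        fields = line.split()  # tolerant of single or aligned spacing
        if len(fields) < 7 or fields[3] != job_type:
            continue  # header, separator, blank line, or another job type
        try:
            enqueued = datetime.strptime(
                fields[0] + " " + fields[1], "%Y/%m/%d %H:%M:%S"
            )
            job_id = int(fields[2])
        except ValueError:
            continue  # not a job row after all
        jobs.append((enqueued, job_id, fields[4], fields[6]))
    return sorted(jobs)


def enqueue_gaps(jobs):
    """Seconds between consecutive enqueue times."""
    return [int((b[0] - a[0]).total_seconds()) for a, b in zip(jobs, jobs[1:])]


def run_times(jobs):
    """Seconds from enqueue to completion for each finished (FIN) job."""
    result = {}
    for enqueued, job_id, status, completed in jobs:
        if status != "FIN":
            continue  # active jobs show a percentage, not a completion time
        done_time = datetime.strptime(completed, "%H:%M:%S").time()
        done = datetime.combine(enqueued.date(), done_time)
        if done < enqueued:
            done += timedelta(days=1)  # job finished after midnight
        result[job_id] = int((done - enqueued).total_seconds())
    return result


if __name__ == "__main__":
    jobs = parse_jobs(SAMPLE)
    print("gaps between enqueues (s):", enqueue_gaps(jobs))
    print("run time per job (s):", run_times(jobs))
```

Run against the full listing above, the gaps between enqueues are all 10 or 11 seconds and each finished job takes 9 or 10 seconds, which would suggest that a refresh is in flight almost continuously.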

Now for some questions :-

Do you think this might be what is blocking our commits?
Is running AddrObjRefresh so frequently a normal thing? If not, is there anything I can do to diagnose what it is trying to do?
Is it possible/sensible to try and temporarily stop the AddrObjRefresh job from being scheduled?

Please be gentle: I'm a clueless newbie

mmmccorkle · ‎01-10-2015

MikeMeredith,

There are no 'dumb' questions. We all start somewhere, so no worries!

You're on the right path with checking the jobs. Good work.

Could you run the following command in the CLI, via SSH session, and log the session output to a file?

'tail follow yes mp-log ms.log'

This will show you the output from the ms.log management log file. This may show us some more details about what is going on when these AddrObjRefresh jobs kick off.

I'm referencing a very similar case here, with a pair of 5000 series firewalls having the exact same issue (225414).

They had also set up User-ID recently and experienced the same behavior, so this could be the same issue, or very close.

How many profiles were set up for User-ID (LDAP, etc.)?

In their scenario, the firewall was having trouble communicating with the LDAP server due to other network issues, so User-ID was taking up more CPU/RAM than expected and constantly refreshing as it attempted to pull the group information.

I will bet you $100 that if you were to disable the recent changes with User-ID and the server profiles, the issue would stop occurring. If you get the chance, please try this out and verify while it is not in production. This will help narrow down the issue.

Let me know and we can go from there,

Thanks!

MikeMeredith · ‎01-11-2015

Thanks for the reply.

I'm also pretty sure that disabling the "AD" profile will sort things out, but that's a test for Monday unless there's a CLI command along the lines of "commit when you next have a chance to do so" 🙂

There are two LDAP server profiles, but only one is enabled (the Active Directory one). I'd be mildly surprised if there are network issues between either PA-5060 and the relevant AD server: although they're on different VLANs, there's not a great deal of network between them. But I'd not be very surprised to find our Active Directory is in a bit of a state - a certain well-known Python script that exports the binary database (1.6 Gbytes!) to a text version took over a week to run just before Xmas.

I'll raise a support call with our reseller and mention 225414.

A short extract from the output of tail follow yes mp-log ms.log:-

2015-01-11 07:38:41.515 +0000 device server refresh triggered via sysd
2015-01-11 07:38:41.515 +0000 Aborting. Another refresh in progress
2015-01-11 07:38:42.162 +0000 client useridd disabled/restarted
2015-01-11 07:38:44.171 +0000 client useridd enabled
2015-01-11 07:38:44.171 +0000 device server refresh triggered via sysd
2015-01-11 07:38:44.172 +0000 Aborting. Another refresh in progress
2015-01-11 07:38:44.823 +0000 client useridd disabled/restarted
2015-01-11 07:38:45.579 +0000 client device reported Phase 1 was SUCCESSFUL
2015-01-11 07:38:46.842 +0000 client useridd enabled
2015-01-11 07:38:46.843 +0000 device server refresh triggered via sysd
2015-01-11 07:38:46.843 +0000 dnscfgmod: Main refresh function: (unknown)
2015-01-11 07:38:46.849 +0000 dnscfgmod:Fqdn refresh job 14006 scheduled
2015-01-11 07:38:47.503 +0000 client useridd disabled/restarted
2015-01-11 07:38:49.510 +0000 client useridd enabled
2015-01-11 07:38:49.511 +0000 device server refresh triggered via sysd
2015-01-11 07:38:49.511 +0000 Aborting. Another refresh in progress
2015-01-11 07:38:50.187 +0000 client useridd disabled/restarted
2015-01-11 07:38:52.189 +0000 client useridd enabled
2015-01-11 07:38:52.190 +0000 device server refresh triggered via sysd
2015-01-11 07:38:52.190 +0000 Aborting. Another refresh in progress
2015-01-11 07:38:52.844 +0000 client useridd disabled/restarted
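
A longer capture of ms.log is easier to judge with a quick tally. The following is a minimal Python sketch, assuming the session was logged to a text file and that every entry starts with a timestamp in the format shown above. The helper names `parse_entries`, `summarize` and `restart_gaps` are hypothetical. The parser also copes with two entries running together on one line, which can happen in pasted extracts.

```python
import re
from collections import Counter
from datetime import datetime

# Zero-width match just before each "YYYY-MM-DD HH:MM:SS.mmm +ZZZZ " stamp.
STAMP = re.compile(r"(?=\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3} [+-]\d{4} )")

# Sample input copied from the extract above; the second and fifth lines
# each contain two entries fused together.
SAMPLE = """\
2015-01-11 07:38:41.515 +0000 device server refresh triggered via sysd
2015-01-11 07:38:41.515 +0000 Aborting. Another refresh in progress2015-01-11 07:38:42.162 +0000 client useridd disabled/restarted
2015-01-11 07:38:44.171 +0000 client useridd enabled 
2015-01-11 07:38:44.171 +0000 device server refresh triggered via sysd
2015-01-11 07:38:44.172 +0000 Aborting. Another refresh in progress2015-01-11 07:38:44.823 +0000 client useridd disabled/restarted
"""


def parse_entries(text):
    """Return (timestamp, message) pairs, one per log entry."""
    entries = []
    # Put a newline before every timestamp so fused entries are separated.
    for line in STAMP.sub("\n", text).splitlines():
        parts = line.strip().split(" ", 3)  # date, time, UTC offset, message
        if len(parts) < 4:
            continue
        try:
            # The UTC offset is ignored; every entry here uses +0000.
            when = datetime.strptime(
                parts[0] + " " + parts[1], "%Y-%m-%d %H:%M:%S.%f"
            )
        except ValueError:
            continue
        entries.append((when, parts[3].strip()))
    return entries


def summarize(entries):
    """Count how often each distinct message appears."""
    return Counter(message for _, message in entries)


def restart_gaps(entries, message="client useridd disabled/restarted"):
    """Seconds between consecutive occurrences of the given message."""
    times = [when for when, msg in entries if msg == message]
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


if __name__ == "__main__":
    entries = parse_entries(SAMPLE)
    for message, count in summarize(entries).most_common():
        print(count, message)
    print("seconds between useridd restarts:", restart_gaps(entries))
```

On the extract above, useridd is reported as disabled/restarted roughly every 2.7 seconds, and most "refresh triggered" entries are immediately followed by an abort.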

mmmccorkle · ‎01-11-2015

MikeMeredith,

You're welcome.

Your log output matches the same errors referenced from the ms.log file in the case I mentioned, so it looks like this may be the same issue after all.

I'm trying to relate their solution to our situation.

While raising a support call, please reference the output from ms.log, along with generating a TechSupport file for them to look at.

Specifically we are looking to see if the RAM/CPU usage of the process 'useridd' is abnormally high.

Could you do a 'show system info' and paste the output? What is the uptime?

I know this sounds like a clichéd response, but have you rebooted the device since this started happening? Perhaps something is just hanging up in the management plane, but that would be too easy.

You mentioned this is not in production yet, but have you been authenticating many users across the firewall?

Are you pulling all the groups from the AD or just some of them? How many groups is the firewall pulling/mapping?

In the previous scenario, the useridd process was being 'oversubscribed'.

This can be potentially mitigated by limiting what groups the firewall will try to map and configuring the access list for the zone.

We are headed in the right direction here.

Let me know,

Thanks

EDIT: I'm unaware of a commit type where we can do that, but it would be highly convenient, right?

MikeMeredith · ‎01-11-2015

Thanks again.

I'll get onto all the necessary details tomorrow when I log a call with support. But some quick responses :-

Yes, the useridd process is hogging CPU at times: show system resources follow ("top" is quicker to type) often shows useridd at 99%.
Can't list the groups through the firewall presently (I get an error), but there's a fair few - I recall seeing a figure of 3,000 but don't quote me on that.
Yes the firewalls have been rebooted after the problem arose; they operated fine for a day and then we started being unable to commit changes (as you can imagine when migrating a large ruleset from an old firewall there's a few changes to be made!).
No users authenticating (or being identified) as yet. There's no real traffic passing through, although we've done some "bench tests" - before adding in the AD details!
And lastly for now :-

msm@Hula(active)> show system info

hostname: Hula
ip-address: 10.14.4.64
netmask: 255.255.254.0
default-gateway: 10.14.5.254
ipv6-address: unknown
ipv6-link-local-address: fe80::290:bff:fe37:dd0c/64
ipv6-default-gateway:
mac-address: 00:90:0b:37:dd:0c
time: Sun Jan 11 20:19:24 2015
uptime: 2 days, 13:34:16
family: 5000
model: PA-5060
serial: 001901000769
sw-version: 6.0.7
global-protect-client-package-version: 0.0.0
app-version: 480-2519
app-release-date: 2015/01/06 14:56:48
av-version: 1459-1932
av-release-date: 2015/01/08 04:00:01
threat-version: 480-2519
threat-release-date: 2015/01/06 14:56:48
wildfire-version: 0
wildfire-release-date: unknown
url-filtering-version: 0000.00.00.000
global-protect-datafile-version: 0
global-protect-datafile-release-date: unknown
logdb-version: 6.0.6
platform-family: 5000
logger_mode: False
vpn-disable-mode: off
operational-mode: normal
multi-vsys: off
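
When attaching this kind of output to a support call, it can help to pull out the few fields that matter (software version, model, uptime). Here is a minimal Python sketch, assuming the output is saved as text in the `key: value` layout shown above and that uptime always looks like "N days, HH:MM:SS" or "HH:MM:SS" (only the first form appears here, so the second is a guess). The helper names `parse_system_info` and `parse_uptime` are hypothetical.

```python
import re
from datetime import timedelta

# A few lines copied from the output above, used as sample input.
SAMPLE = """\
msm@Hula(active)> show system info
hostname: Hula
ipv6-link-local-address: fe80::290:bff:fe37:dd0c/64
ipv6-default-gateway:
uptime: 2 days, 13:34:16
model: PA-5060
sw-version: 6.0.7
"""


def parse_system_info(text):
    """Map each 'key: value' line to a dict entry; other lines are skipped."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, so IPv6 and MAC values stay intact.
        key, sep, value = line.partition(":")
        if sep and key.strip():
            info[key.strip()] = value.strip()
    return info


def parse_uptime(value):
    """Convert an uptime string such as '2 days, 13:34:16' to a timedelta."""
    match = re.fullmatch(r"(?:(\d+) days?, )?(\d+):(\d{2}):(\d{2})", value)
    if match is None:
        raise ValueError("unrecognised uptime format: %r" % value)
    days, hours, minutes, seconds = (int(g or 0) for g in match.groups())
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


if __name__ == "__main__":
    info = parse_system_info(SAMPLE)
    print(info["model"], "running", info["sw-version"])
    print("up for", parse_uptime(info["uptime"]))
```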

mmmccorkle · ‎01-11-2015

MikeMeredith,

Do you need the firewall to pull all groups from the AD, or could we get granular and only have it pull a set of specific ones? That should keep the useridd process cpu/mem usage down.

That has to be the issue then... useridd is running above its means, so we need to optimize it I suppose.

Let me know how it goes!

Don't forget to mark any answers here as 'correct' or 'helpful'.

Thanks!

Wenar · ‎01-12-2015

Do you use FQDN Objects or dynamic Objects?

What do you see if you use the following CLI command "request system fqdn show"?

You could try to stop the last commit and then commit your configuration.

>show jobs all

-remember the job id for the AddrObjRefresh commit

>delete job id <id you want to delete>

>commit
