PA2020 High CPU utilization "useridd" 100% management plane

Dear all,

My PA2020 has two User-ID agents identifying my AD users, but the management plane is running at 100% CPU all day long.

Any suggestions?

Please find the show resources output below.

The PA2020 is running PAN-OS 5.0.2.

top - 18:26:05 up 6 days,  1:33,  1 user, load average: 10.26, 11.02, 12.17  <<<<<<<<<<<<<<<< !!!!!
Tasks: 100 total,   2 running,  98 sleeping,   0 stopped,   0 zombie
Cpu(s): 51.9%us, 46.0%sy,  2.1%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:    995872k total,   901792k used,    94080k free,     5996k buffers
Swap:  2212876k total,   647316k used,  1565560k free,   179620k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 2373 root      20   0  209m  72m  63m S  140  7.5  10861:51 useridd  <<<<<<<<<<<<<<<<< 140% CPU !!!!
21021 nobody    20   0  429m  51m 4808 S   37  5.3 329:34.26 appweb3
 2042 root      30  10  4468  964  792 R    4  0.1   0:00.12 top
 2371 root      20   0  651m 210m 4076 S    4 21.6 118:50.34 mgmtsrvr
 1720 admin     20   0  4532 1164  912 R    1  0.1   0:02.64 top
 2405 root      20   0  355m  89m 2192 S    1  9.2  48:59.31 logrcvr
 2142 root      15  -5 39636 2920 1240 S    1  0.3 106:28.41 sysd
 2151 root      30  10 40568 3644 1692 S    0  0.4  21:50.38 python
 2408 root      20   0  247m 2480 1628 S    0  0.2   5:39.85 varrcvr
 2415 root      20   0  141m 2640 1760 S    0  0.3   1:17.82 routed
    1 root      20   0  1836  560  536 S    0  0.1   0:02.30 init
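To spot this without eyeballing the whole listing, here is a minimal Python sketch that parses a top-style snapshot like the one above, pulls out the load averages, and flags processes over a CPU threshold. It assumes the process columns end with %CPU %MEM TIME+ COMMAND as shown and that COMMAND is a single token; the function names and the 100% default threshold are hypothetical choices, not anything PAN-OS provides. Keep in mind that top reports per-process %CPU relative to one core, so useridd at 140 is using more than a full core's worth of CPU.

```python
import re
from typing import List, NamedTuple, Optional, Tuple

# Matches the "load average: 10.26, 11.02, 12.17" part of top's first line.
LOAD_RE = re.compile(r"load average:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)")


class Proc(NamedTuple):
    pid: int
    cpu: float    # %CPU as reported by top (100.0 == one full core)
    mem: float    # %MEM
    command: str


def parse_load_average(text: str) -> Optional[Tuple[float, float, float]]:
    """Return the 1-, 5- and 15-minute load averages, or None if absent."""
    m = LOAD_RE.search(text)
    if m is None:
        return None
    return (float(m.group(1)), float(m.group(2)), float(m.group(3)))


def parse_processes(text: str) -> List[Proc]:
    """Parse top's process rows, reading the columns from the right.

    Reading from the right keeps the parse working even if a column near
    the left (such as USER) comes through blank.
    """
    procs = []
    for line in text.splitlines():
        tokens = line.split()
        # Process rows start with a numeric PID; headers and blanks are skipped.
        if len(tokens) < 5 or not tokens[0].isdigit():
            continue
        procs.append(Proc(pid=int(tokens[0]),
                          cpu=float(tokens[-4]),
                          mem=float(tokens[-3]),
                          command=tokens[-1]))
    return procs


def hot_processes(text: str, threshold: float = 100.0) -> List[Proc]:
    """Processes whose %CPU is at or above threshold, busiest first."""
    hot = [p for p in parse_processes(text) if p.cpu >= threshold]
    return sorted(hot, key=lambda p: p.cpu, reverse=True)


# A trimmed copy of the snapshot above, without the poster's annotations.
SAMPLE = """\
top - 18:26:05 up 6 days,  1:33,  1 user, load average: 10.26, 11.02, 12.17
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 2373 root      20   0  209m  72m  63m S  140  7.5  10861:51 useridd
21021 nobody    20   0  429m  51m 4808 S   37  5.3 329:34.26 appweb3
 2371 root      20   0  651m 210m 4076 S    4 21.6 118:50.34 mgmtsrvr
"""

if __name__ == "__main__":
    print(parse_load_average(SAMPLE))                   # (10.26, 11.02, 12.17)
    print([p.command for p in hot_processes(SAMPLE)])   # ['useridd']
```

Reading the numeric columns from the right is a deliberate choice: a snapshot with a blank USER field still parses correctly.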

5.x seems to require more management-plane CPU overall than 4.x did. That shouldn't surprise anyone, given all the new features. We can hope that efficiency will improve as the 5.x code matures.

However, 5.0.2 (and 4.1.11) seem to have a very clear bug in which the User-ID process (useridd) consumes excessive resources.

Honestly, I wish PA would slow down on new features and beef up stability and QA. It seems like we upgrade on support's advice to fix bugs, and then after we upgrade we find other bugs... it's bug whack-a-mole.

Yes, it doesn't make it any easier to recommend software versions to our customers.

I'm very disappointed that 4.1.11 seems to have the same bug, especially since that version was released some time after the bug was already known in 5.0.2.

/Jo Christian

I was thinking the exact same thing. I'm pretty annoyed to be rolling a box back from 4.1.11 to 4.1.10 tonight _after_ I rolled it back from 5.0.2 to 4.1 a couple of weeks ago.

It's the same story time and time again for us with PA... I like the "disruptive startup" nature of the company and all the features packed into the boxes they sell (and not having to deal with Check Point's insane licensing scheme), but these QA issues are making it hard for me to make a case to my management for handing more load to the PA boxes we have, especially since the Check Point firewalls we have in production seem to just hum along and "just work."

We're "dipping our toe" into Palo Alto slowly, and honestly these "bug whack-a-mole" issues are causing us to reconsider our firewall strategy.

Don't even get me started on the GlobalProtect client...

It is surely harder to achieve extremely high stability when dealing with so many things simultaneously than it is when simply checking ACLs. That PAN has been able to do what it does so effectively is impressive no matter how you look at it. I continue to be very, very impressed with the product (of course, as a partner I am biased, I suppose), and wouldn't recommend anything else given the current threat landscape, but I will be taking a less aggressive approach to updating firmware for a while. I tend to try to stay on the current release on the theory that I _should_ be keeping my bug exposure down. I pushed our various boxes up through all the 4.1.x releases with no ill effects and was lulled into overconfidence, I suppose. I suspect most people in these forums would say "what do you expect, running the very latest release"...

We're sticking with 4.1 on a pair of our PA devices and we're still running into bugs. Not trivial stuff either... things like PA's DHCP implementation not working correctly (ticket open for a month and a half) and GlobalProtect not working correctly, crashing, or throwing errors (both client and gateway).

We've got a ticket that's been open for two months for User-ID mapping not working correctly (on 4.1 code), where we basically can't use the 'user' column in our rulebase. That's a major feature that we can't take advantage of.

It's not a case of "oh, another customer complaining and whining"... it's features that are advertised as working but weren't tested, or that get broken by bug fixes.

I suppose your mileage may vary though.

Same issue here with 5.0.2 on a PA 2050.

top - 07:57:29 up 25 days, 11:10,  1 user,  load average: 12.15, 11.85, 11.66
Tasks: 105 total,   2 running, 102 sleeping,   1 stopped,   0 zombie
Cpu(s): 32.3%us, 47.5%sy,  5.0%ni, 13.8%id,  0.8%wa,  0.1%hi,  0.4%si,  0.0%st
Mem:    995872k total,   964828k used,    31044k free,    20132k buffers
Swap:  2008084k total,   545768k used,  1462316k free,   535468k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 1796       20   0  212m  73m  64m S  174  7.5   8106:37 useridd

Hello again,

It seems a hotfix is out for 4.1.11 that fixes this problem.

You need to contact support to get it.

/Jo Christian

It seems PA dismissed its QA team (maybe in favor of copyright lawyers) and customers are responsible for all the testing now.

I can get the management plane to calm down a bit by restarting useridd with the following command:

debug software restart user-id

Unsure how long it takes for useridd to get angry again after that.
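One way to answer that is to sample useridd's %CPU at a fixed interval after the restart (for example, by re-running the resources command and reading the useridd row) and note when it climbs back. Below is a minimal Python sketch, assuming you have already collected (minutes since restart, %CPU) pairs. The function name, the 100% threshold, and the "three samples in a row" rule are hypothetical choices, and the readings in the example are made up for illustration.

```python
from typing import Iterable, Optional, Tuple


def minutes_until_relapse(samples: Iterable[Tuple[float, float]],
                          threshold: float = 100.0,
                          sustained: int = 3) -> Optional[float]:
    """Return when useridd's CPU first stayed high after a restart.

    samples: (minutes_since_restart, useridd_cpu_percent) pairs in time order.
    A relapse is `sustained` consecutive samples at or above `threshold`;
    requiring several in a row ignores a single brief spike.
    Returns the timestamp of the first sample in that run, or None.
    """
    run_start = None
    run_length = 0
    for minute, cpu in samples:
        if cpu >= threshold:
            if run_length == 0:
                run_start = minute      # a new high-CPU run begins here
            run_length += 1
            if run_length >= sustained:
                return run_start
        else:
            run_length = 0              # CPU dropped; reset the run
    return None


if __name__ == "__main__":
    # Made-up readings taken every 5 minutes after "debug software restart user-id".
    readings = [(0, 12), (5, 18), (10, 130), (15, 40), (20, 110), (25, 145), (30, 150)]
    print(minutes_until_relapse(readings))  # 20 (the spike at 10 is ignored)
```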

SimasK, if I could moderate your post and mark it "+5 Insightful", I would :)

Well, you just answered my question - rolling back to 4.1.10 as I type.

> PA: How does this stuff get past the QA process?

More to the point - if it's a known issue which is being reported by lots of people, why do you have to log a fault to get access to the hotfix? Why doesn't PAN just release the hotfix for general distribution with a release note which specifies that it's only to fix the issue listed? This jumping through hoops to get fixes for known, impact-inducing bugs is extremely annoying.

And when I *did* log a case, the first thing I got back from the support partner was "We've escalated it to PAN for release of the hotfix, but why don't you update to 5.0.1 instead?"

And now I'm rolling back *again* after installing the "hotfix" 4.1.11-h1, because it bloody breaks HA sync between my peers.

This is beyond a joke, Palo Alto. Does *nobody* QA these things in all possible environments before release?
