Cortex XDR Agent profile: content auto-update delay

RobertoPastorino · ‎06-08-2022

Hi everyone, I was wondering how the content auto-update delay feature works when a CU borks a system.

Last week we experienced a sudden spike in CPU and RAM usage, and the affected machines crawled and stuttered, impacting production. Support told us to wait for a specific content update to be released, which would correct the problem (as it eventually did).

So, in a scenario like this:

Agent profile grp_CriticalServers: CU delay: 3 days
Agent profile grp_Workstations: CU delay: none

day 0: CU 500-00001 released and applied on grp_Workstations; grp_CriticalServers still on CU 500-00000
day 1: CU 500-00001 works as expected
day 2: CU 500-00002 wreaks havoc on grp_Workstations; grp_CriticalServers still on CU 500-00000
day 3: CU 500-00003 and CU 500-00004 released; grp_Workstations now working normally

Question: on day 3, which content update will be served to grp_CriticalServers? The latest available one, 00004? Or 00001, and then, after another 3 days, all the critical servers will be affected by the problematic CU 00002?

In the first case, going straight to the latest available CU comes with some risk; the second is not acceptable if there is no way to deprecate a CU. Or is there?

The end goal is to use the vast majority of machines as canaries for the critical servers.

How did you manage a situation like that?

Regards

bbarmanroy · ‎06-08-2022

Hi @RobertoPastorino, you can consider using the rollout delay for Content Updates to meet your needs. You will need to create a separate Agent Settings Profile and assign it to the targeted endpoints.

Refer to Step 12: https://docs.paloaltonetworks.com/cortex/cortex-xdr/cortex-xdr-pro-admin/endpoint-security/customiza...

RobertoPastorino · ‎06-08-2022

Hi, thank you for your reply, but this has already been considered.

What I don't know is how content updates are managed in the scenario I described.

What happens when a CU goes rogue and starts causing problems on the endpoints where it is deployed? In a delayed deployment, is there a way to exclude a specific content update from the available ones? That should be the rationale behind the delay: testing a CU before deploying it in a critical environment. But how can I be certain that a delayed, problematic update won't be pushed to the agent?

BR

afurze · ‎06-08-2022

Hi RobertoPastorino,

In a case like this, where a CU is identified by Palo Alto Networks as causing issues, the CU gets rolled back and then replaced. The endpoints that are delayed will never receive the "bad" CU; they will just get the next CU after the delay period ends. For example:

Hour 0: New CU released, agents without delay get updated

Hour 24: CU is identified by PANW as causing issues, CU is rolled back

Hour 48: New CU is released, agents without delay get updated

* 72 hours after new CU released, so hour 120 *

Hour 120: New CU is installed on agents with delay

These are just hypothetical numbers; issue discovery, CU rollback, and replacement are always dynamic and unique to a specific issue.
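
As a rough illustration of the timeline above, here is a minimal Python sketch. It is a toy model, not Palo Alto Networks' actual selection logic: the names `ContentUpdate`, `served_cu`, `CU-A`, and `CU-B` are hypothetical, and it assumes the delay is counted from each CU's own release time and that a rolled-back CU is never served again.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ContentUpdate:
    # Hypothetical model of a content update; not a real Cortex XDR object.
    name: str
    released_hour: int
    rolled_back_hour: Optional[int] = None  # None means never rolled back

    def eligible(self, now: int, delay_hours: int) -> bool:
        """True if this CU may be served at hour `now` to a profile with this delay."""
        if self.rolled_back_hour is not None and self.rolled_back_hour <= now:
            return False  # assumption: a withdrawn CU is never served
        return now >= self.released_hour + delay_hours


def served_cu(updates: List[ContentUpdate], now: int, delay_hours: int) -> Optional[ContentUpdate]:
    """Return the newest eligible CU, or None if no CU is eligible yet."""
    candidates = [cu for cu in updates if cu.eligible(now, delay_hours)]
    return max(candidates, key=lambda cu: cu.released_hour, default=None)


# The hypothetical timeline from the example above.
updates = [
    ContentUpdate("CU-A", released_hour=0, rolled_back_hour=24),  # the "bad" CU
    ContentUpdate("CU-B", released_hour=48),                      # its replacement
]

for hour in (0, 72, 120):
    no_delay = served_cu(updates, hour, delay_hours=0)
    delayed = served_cu(updates, hour, delay_hours=72)
    print(hour, no_delay.name if no_delay else None, delayed.name if delayed else None)
```

In this model the delayed profile gets nothing new at hour 72, because CU-A was withdrawn at hour 24, and it first receives CU-B at hour 120. If `rolled_back_hour` is left unset for CU-A, the same model serves CU-A to the delayed profile at hour 72, so under these assumptions the protection depends entirely on the rollback actually happening.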

RobertoPastorino · ‎06-08-2022

Fantastic, this is what I hoped to be the case 🙂

So, content updates are not cached locally and are revoked and replaced by PAN.

Thank you!

Luc_Desaulniers · ‎06-09-2022

Are you 100% sure of that?

This is what logically should happen, but I ran into the exact issue the OP is referring to: a CU was pushed and broke things, it was fixed by a new CU the next day, but my machines that were set to a 3-day delay still picked up the broken CU first.

This happened in my environment back in early March 2022.

Maybe things have changed now, but I can confirm that OP's concern is a valid one, as I've seen it happen. There is no way for us customers to exclude a content update. What I ended up doing was working around it: I changed the delay to a very long one, and once I was sure the latest CU wasn't causing issues, I switched the group that had the delay to immediate content updates so it would pick up the latest one. Once all the machines were at the point I wanted, I changed it back to a 3-day delay.

afurze · ‎06-09-2022

Hi Luc_Desaulniers,

It does of course depend on Palo Alto support identifying a widespread issue and engineering deciding to roll back the CU, but in the case of this most recent CU issue, I can confirm it was indeed rolled back. In your case, it is possible the CU was not rolled back by engineering. I recommend you speak with your account team to get a feature request submitted for a better way of managing CUs, if needed.
