XQL query language question - removing duplicates from results

L1 Bithead

Just getting into querying the datasets in the Cortex Data Lake; good stuff and lots of fun. Many of my queries return tons of duplicate hostname results. For example, when I search for all hosts running IIS (w3wp.exe), there are multiple processes running on a single host, so I get many rows of results for the same host. I want to eliminate the duplicates from the result set, based on the agent_hostname column. Does anyone know of a way to accomplish this?

Thanks

Accepted Solutions

L0 Member

When you query Cortex Data Lake datasets for a specific process such as w3wp.exe (the IIS worker process), it is common to get several rows for the same host. A single machine can run multiple instances of the process, and each one produces its own events, so agent_hostname repeats. To return one row per hostname, apply the dedup stage to the agent_hostname field. If you would rather see how many times the process appears on each host, aggregate with the comp stage and count the rows by agent_hostname. If you need only the most recent instance for each host, order the dedup by timestamp (dedup accepts a by asc | desc clause, for example on _time) so that the latest entry per hostname is kept.
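As a rough sketch of the two approaches described above, something like the following should work. Note the assumptions: I am guessing that the data lives in the xdr_data dataset and that the process name is in action_process_image_name (depending on how you built your query it may be actor_process_image_name instead), so adjust the dataset and field names to match what you already have.

```xql
// Sketch only: the dataset and the process-name field are assumptions.
// One row per host running w3wp.exe, keeping the most recent event for each host.
dataset = xdr_data
| filter action_process_image_name = "w3wp.exe"
| fields agent_hostname, action_process_image_name, _time
| dedup agent_hostname by desc _time

// Separate query: count matching events per host instead of dropping them.
// process_events is just an illustrative alias.
dataset = xdr_data
| filter action_process_image_name = "w3wp.exe"
| comp count(action_process_image_name) as process_events by agent_hostname
| sort desc process_events
```

Run the two queries separately. The first is the one to use if you only need the list of hosts; the second is handy when the number of hits per host matters.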

L5 Sessionator

Hi Kenlacrosse, 

To remove duplicates from your query results, please have a look at the dedup stage in XQL:

https://docs-cortex.paloaltonetworks.com/r/Cortex-XDR/Cortex-XDR-3.x-Documentation/dedup

If you feel this has answered your query, please let us know by clicking Like and marking this as a Solution. Thank you.

KR, 

Luis
