07-29-2025 04:29 PM
Just getting into querying the datasets in the Cortex Data Lake, good stuff and lots of fun. Many of my queries return tons of duplicate hostname results. For example, when I search for all hosts running IIS (w3wp.exe), there are multiple processes running on a single host, so I get many rows of results for the same host. I want to eliminate the duplicates from the result set, using the agent_hostname column. Does anyone know of a way to accomplish this?
Thanks
07-29-2025 10:19 PM
When querying datasets in Cortex Data Lake, especially when searching for specific processes like w3wp.exe (used by IIS), it's common to encounter multiple entries for the same host. This happens because multiple instances of the process may run on a single machine, so the same agent_hostname appears in many rows. To eliminate these duplicates and return only unique hostnames, you can use the dedup stage on the agent_hostname field, which reduces the results to one row per hostname. Alternatively, if you want to see how many times a process appears per host, you can aggregate with the comp stage and count occurrences by agent_hostname. If you need only the most recent instance for each host, have dedup order by the timestamp in descending order so that it keeps the latest entry per hostname.
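As a rough sketch of how that could look in XQL (the dataset name `xdr_data`, the `event_type` filter, and the `action_process_image_name` field are assumptions about your schema, so swap in whatever your existing query already uses):

```xql
// Assumed dataset and field names; adjust to match your own query.
dataset = xdr_data
| filter event_type = ENUM.PROCESS and action_process_image_name = "w3wp.exe"
| fields agent_hostname, action_process_image_name, _time
// Keep one row per hostname; "by desc _time" keeps the most recent one.
| dedup agent_hostname by desc _time
```

If a per-host count is more useful than a single row, replacing the dedup line with something like `| comp count(action_process_image_name) as process_count by agent_hostname` should return one row per host with the number of matching events. It is worth checking the dedup and comp pages in the XQL documentation for the exact syntax on your version.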
07-29-2025 10:26 PM
Hi Kenlacrosse,
To remove duplicates on queries please have a look at the dedup stage in XQL:
https://docs-cortex.paloaltonetworks.com/r/Cortex-XDR/Cortex-XDR-3.x-Documentation/dedup
If you feel this has answered your query, please let us know by clicking Like and "mark this as a Solution". Thank you.
KR,
Luis