01-23-2026 02:12 PM
Hello @K.Dadana633909 ,
Greetings for the day.
The duplicate host/application sets in your results occur because the host_inventory dataset is a log-based, point-in-time dataset. Unlike the standard endpoints dataset, which generally reflects the current state, host_inventory stores a historical record of every inventory scan (by default, one scan every 24 hours). Without a deduplication stage, your query returns every application scan reported for each host within the query timeframe, not just the latest one.
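To confirm this in your own tenant, a quick check along the following lines may help. It is only a sketch: it assumes the 7-day timeframe used later in this reply, and scan_count is just an illustrative alias. It counts how many inventory records each host has in the window, so hosts with a count above 1 are the ones producing repeated rows.

```
config timeframe = 7d
| dataset = host_inventory
// Count inventory records per host within the timeframe
| comp count(_time) as scan_count by host_name
// Keep only hosts that reported more than one scan
| filter scan_count > 1
| sort desc scan_count
```

With a daily scan and a 7-day window, you would expect most active hosts to show a count of roughly 7.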
To resolve this and ensure you are only seeing the most recent application data, follow these steps:
You must use the dedup stage to isolate the most recent scan for each host. It is most efficient to perform this deduplication before the arrayexpand and join stages to reduce the number of rows processed.
Recommended deduplication logic:
dataset = host_inventory
| dedup host_name by desc _time
This ensures you only process the single latest inventory report for each hostname. If your environment has hosts with duplicate names, deduplicate by agent_id or serial_number instead.
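If hostnames are not unique in your environment, the same stage can be keyed on a different identifier. The sketch below assumes the agent_id field mentioned above is present in your host_inventory schema; please verify the field name in your tenant before relying on it.

```
dataset = host_inventory
// Keep only the newest inventory record per agent rather than per hostname
| dedup agent_id by desc _time
```

Deduplicating on a per-agent identifier avoids collapsing two different machines that happen to share a hostname into a single row.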
For application-specific queries, using the dedicated preset is often more efficient and is the recommended approach for software inventory reports. This preset is optimized for application data and reduces the need for manual JSON extraction from the applications field.
If you are running Cortex XDR/XSIAM version 3.16, be aware of a known bug (CRTX-209483) that specifically caused the host_inventory dataset to return massive amounts of duplicate historical data regardless of filters. Ensure your tenant has received the latest backend hotfixes to address this.
Based on your original query, here is an optimized version that incorporates deduplication and performance best practices:
// Set a specific timeframe to avoid scanning unnecessary historical logs
config timeframe = 7d
| dataset = host_inventory
// 1. Get the latest scan per host before expanding applications
| dedup host_name by desc _time
| filter applications != null
| arrayexpand applications
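// Extract vendor and version first and overwrite applications last,
// so that every json_extract call reads the original JSON object
| alter
software_vendor = json_extract(applications, "$.vendor"),
software_version = json_extract(applications, "$.version"),
applications = json_extract(applications, "$.application_name")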
// 2. Join with deduped endpoints to ensure one-to-one mapping
| join (dataset = endpoints | dedup endpoint_name) as EP EP.endpoint_name = host_name
| fields host_name, applications, software_version, group_names
| arrayexpand group_names limit 1
| sort asc group_names
If you feel this has answered your query, please let us know by clicking Like and "Mark this as a Solution".
Happy New Year!
Thanks & Regards,
S. Subashkar Sekar