Join two data sets XQL

Showing results for 
Show  only  | Search instead for 
Did you mean: 
Please sign in to see details of an important advisory in our Customer Advisories area.

Join two data sets XQL

L2 Linker

Before I get into my failed script, please allow me to explain what Im attempting to do.


We are looking for windows and Macintosh devices with cortex xdr agent NOT installed.  To do this we installed the cortex DHCP log collector. It is reporting its findings into asset manager.  The problem is that when an IP changes, the agent doesnt show this in the endpoint manager and as a result the cortex asset manager reports devices as without the agent when they actually do (cortex please fix this).  So we have hundreds of false positives.


So, what we have to do is when DHCP logger reports a device with a new IP we have to look in endpoint manager for the name of the device. If we find it, the agent is installed, if we dont we need to audit the machine and install the agent if need be.


Now, I could export both of these into a TSV, bring them into excel and do the work (and I have),.. however, we want this to be a report in the portal and we want to send alerts..  


So, went into the xql query builder and tried a bunch of stuff..  Here is my lastest, which shows no errors in the edit box but the report finds nothing..  

dataset = microsoft_dhcp_raw
//| filter hostName != ""
| alter hn1 = split(hostName ,".")
/ dedup hostName
//| dedup ipAddress
//| union (dataset = endpoints)
//| filter endpoint_name = hn1
|join conflict_strategy = both type = inner (dataset = endpoints ) as EP EP.endpoint_name = hn1
| fields ipAddress,ip_address, hostName,hn1,endpoint_name


When this didnt work, I figured it had something to do with my alter statement and used an IP which should surely have a match:

dataset = microsoft_dhcp_raw
//| filter hostName != ""
| alter hn1 = split(hostName ,".")
|// dedup hostName
| dedup ipAddress
| union (dataset = endpoints)
//| filter endpoint_name = hn1
|join conflict_strategy = both type = inner (dataset = endpoints ) as EP EP.ip_address = ipAddress
| fields ipAddress,ip_address, hostName,hn1,endpoint_name


As you can see I played around with UNION as I have no idea what Im doing.  I left my commented out statements just to show what Ive played with.  Also, this would get me matches if it worked.. so how would I show no matches found for the join?

Both show no errors in the editor but when I run it I just get: ERROR: FAILED TO RUN with no result.


Im probably going about this all wrong, so any help would be appreciated.



L2 Linker

Just a quick note,
JOIN - combines data into new columns.
UNION - combines data into new rows

So if i understood your use case correctly , i may do it somehow like the below:

dataset = microsoft_dhcp_raw
| filter hostName != "" and ipAddress != ""
| alter hn1 = split(hostName ,".")
| join (dataset = endpoints ) as EP EP.endpoint_name = hostName
| dedup hostName
| dedup ipAddress



and this shouldn't return any run error , and if a match found by the hostname you should be able to see that in the result pane. 


I hope this helps.


Thank you   This helped.  I didnt explain why I was using the alter. It is because the hostname field has in it and the "endpoint_name" fied does not.  I was able to complete the script. 


dataset = microsoft_dhcp_raw

| alter hn1 = if (hostname contains ".domain.local",

| join (dataset = endpoints ) as EP EP.endpoint_name = hn1
| dedup hn1
| fields hostName ,hn1 ,endpoint_name,ipAddress


This gives me any hostnames that are found in both microsoft_dhcp_raw and endpoints.  Hopefully I can figure out how to find the reverse, anything in Microsoft_dhcp_raw that is not in "endpoints" as well. I think I went about this all wrong to be honest but at least I learned how to join two fields where one needs to be altered to match. 🙂


Thanks again!




L2 Linker

you are welcome, glad this at least shed some light.


L1 Bithead

Incase anyone else had the same issue, (needing to Not Match/AntiJoin in XQL), here is a way I did it without using Join. This also finishes OP's question, since the below query also shows which devices do NOT have an event, or when they do NOT have software installed:


Example for Restart event log NOT happening:

dataset = xdr_data // Using the xdr dataset
| alter conditional = if(action_evtlog_event_id = 1074,1,0) //if restart event happens, 1 is returned, anything else, then 2 is returned. Restart Event could be replaced with anything
| fields agent_hostname as hostname, conditional as restartcount // Selecting the relevant fields
| comp sum(restartcount) as totalrestarts by hostname // adds up and summarizes all "restartcount" values
| filter (totalrestarts = 0) // if 0, that means that the sum of all restart events is 0 (ie. it never restarted)


Example for chrome NOT Installed:

dataset = xdr_data // Using the xdr dataset
| filter event_type = ENUM.PROCESS and event_sub_type = ENUM.PROCESS_START //makes sure we get program logs
| alter conditional = if(action_process_image_name ~= "chrome",1,0)//filters names that include "chrome"
| fields action_process_image_path as Process_Path, action_process_image_command_line as Process_CMD, action_process_os_pid as Process_PID, actor_process_image_path as Parent_Process, actor_process_command_line as Parent_CMD, agent_hostname as hostname, conditional as runcount // Selecting some relevant fields
| comp sum(runcount) as totalruns by hostname //sums total runs by host
| filter (totalruns = 0) // like before, change this to > if you want to see the opposite results


This shows how you check for machines that never had an event/program occur. You can replace Chrome.exe or the Event ID with anything.


Edit: As shown in my next reply, you can combine the above method with Join, and use that to identify items that don't match between 2 datasets (basically an Anti Join)

L1 Bithead

Like mentioned in my previous reply, the sum method can be used to identify which Windows DHCP devices don't have Cortex, as shown below.


dataset = microsoft_dhcp_raw
| filter hostName != "" and ipAddress != "" //first few lines are same as OP
| alter FormattedName = if (hostname contains ".domain.local",replace(hostname,".domain.local",""),hostname)//replace .domain.local with your domain when running
| join conflict_strategy = left type = left (dataset = endpoints ) as ed ed.endpoint_name = FormattedName //left join ensures that all is returned from DHCP, and only matches from Endpoint
| alter conditional = if(FormattedName = endpoint_name, 1, 0)//if there is a match, it returns 1, otherwise, 0
| fields FormattedName , endpoint_name, conditional
| comp sum(conditional) as totalconnections by FormattedName // by summing on the conditional, if the sum is 0, that means there are 0 logs where DHCP matched with one of your endpoints
| filter (totalconnections = 0) // if you changed this to >0, you will get all devices in DHCP that ARE matched in the Cortex List


Just to mention, make sure you are retrieving enough data from DHCP for this to be accurate. You may need to adjust limits if you have enough DHCP logs that Cortex automatically stops too early.

  • 5 replies
Like what you see?

Show your appreciation!

Click Like if a post is helpful to you or if you just want to show your support.

Click Accept as Solution to acknowledge that the answer to your question has been provided.

The button appears next to the replies on topics you’ve started. The member who gave the solution and all future visitors to this topic will appreciate it!

These simple actions take just seconds of your time, but go a long way in showing appreciation for community members and the LIVEcommunity as a whole!

The LIVEcommunity thanks you for your participation!