Parsing at Broker VM level

L1 Bithead

I'm using a COLLECT parsing rule to manipulate data at the Broker VM level before ingestion.

The rule basically filters the raw logs that I generate for my test: each log line contains the text "criticalevent" along with a date and a random machine name.

[Collect: vendor="unknown", product="unknown", target_broker=(mybroker), no_hit=drop]
filter _raw_log contains "criticalevent"
| alter a = someregex fn
| alter b = someregex fn

[Ingest:vendor="unknown", product="unknown", target_dataset="my_parsed_logs", no_hit=drop]
fields a,b,c ..


Now the resulting dataset receives all the data, not just the filtered data. If I put the same filter condition inside the INGEST section, it works. But does that mean the filtering happens at the Broker VM or on the XDR side?


Is there something missing here?

Also, if I use INGEST directly, without a COLLECT section, and write straight into the dataset, I get the desired result. But I don't think that filtering happens at the broker. For example:


[Ingest:vendor="unknown", product="unknown", target_dataset="unknown_unknown_raw", no_hit=drop]
filter _raw_log contains "criticalevent"



Am I missing something in my understanding?




L5 Sessionator

Hello @Fm12345,


Thank you for reaching out on LIVEcommunity.


First, I would like to clarify a few things.

Ingest: An INGEST section is used to define the resulting dataset.


Collect: A COLLECT section defines a rule that enables data reduction and data manipulation at the Broker VM, to avoid sending unnecessary data to the Cortex XDR server, which reduces traffic, storage, and computing costs.


Below is a sample that you can refer to when correcting your query as needed.

[COLLECT:vendor="Apache", product="ApacheServer", target_brokers = (bvm1, bvm2, bvm3), no_hit = drop]
alter source_log = json_extract_scalar(_raw_log, "$.source")
| filter source_log = "WebApp-Logs"
| fields source_log, _raw_log;
[INGEST:vendor="Apache", product="ApacheServer", target_dataset = "dvwa_application_log"]
alter log_timestamp = json_extract_scalar(_raw_log, "$.timestamp")
| alter log_msg = json_extract_scalar(_raw_log, "$.msg")
| alter log_remote_ip = json_extract_scalar(_raw_log, "$.Remote_IP")
| alter scanned_ip = json_extract_scalar(_raw_log, "$.Scanned_IP")
| fields log_msg, log_remote_ip, log_timestamp, source_log, scanned_ip, _raw_log;
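Applying the same structure to the rule from the question could look like the sketch below. It is an untested sketch under assumptions, not a verified rule. `mybroker` is the broker name from the question. The field names `event_date` and `machine_name` and both regular expressions are hypothetical stand-ins for the "someregex fn" steps, and the log format they match (an ISO-style date and a `host=` token) is assumed. Unlike the rule in the question, the bracket header uses `target_brokers` (plural), as in the sample above, and each section's query ends with a semicolon. The COLLECT section only filters and keeps `_raw_log`, so the reduction happens on the Broker VM. The extraction is left to the INGEST section, using the XQL `regextract` and `arrayindex` functions.

```
[COLLECT:vendor="unknown", product="unknown", target_brokers = (mybroker), no_hit = drop]
filter _raw_log contains "criticalevent"
| fields _raw_log;
[INGEST:vendor="unknown", product="unknown", target_dataset = "my_parsed_logs", no_hit = drop]
alter event_date = arrayindex(regextract(_raw_log, "[0-9]{4}-[0-9]{2}-[0-9]{2}"), 0)
| alter machine_name = arrayindex(regextract(_raw_log, "host=([^ ]+)"), 0)
| fields event_date, machine_name, _raw_log;
```

Keeping the filter in COLLECT, as in the sample, is what limits the data that leaves the broker. The same filter placed only in INGEST would produce the same dataset, but the unfiltered logs would still be sent to Cortex XDR first.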


In case any further assistance is required, please feel free to reach out.



Ashutosh Patil