02-28-2019 10:09 PM - edited 02-28-2019 10:10 PM
Hi all,
The issue I currently have is that the exported traffic logs are too large for Expedition ML to handle and manipulate. One hour's worth of logs is around 2GB; two days' worth is around 100GB. After Machine Learning on the 100GB of logs completes (around 80k rules), the import into the project fails. An error dialog box pops up without much detail. Are there further error logs that might shed some light on the issue? Is there a limit on the number of logs/rules that can be processed?
I tried ML on a smaller 2GB file and it all works as expected.
We have tried to reduce the exported log size by removing duplicates in Excel; however, after copying the new file to Expedition via scp, it is no longer listed as a file available for M.Learning processing to the parquet format. (Interestingly, even though no file is listed, pressing the 'Process Files' button causes the file to be seen and processed, after which it is marked as 'Processed by User admin'. However, no parquet file is generated: there is no directory or data in /datastore.)
Removing the duplicates reduces the log file size by around 80%.
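As an aside, if you ever need to strip duplicate lines from a log file too large for Excel, a streaming approach avoids loading the whole file into memory. This is only an illustrative sketch (the function name and file paths are made up, not part of Expedition):

```python
import hashlib

def dedup_lines(src_path, dst_path):
    """Copy src_path to dst_path, keeping the header line and only
    the first occurrence of each subsequent line."""
    seen = set()
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        dst.write(src.readline())                # copy CSV header as-is
        for line in src:
            digest = hashlib.md5(line).digest()  # 16-byte fingerprint per line
            if digest not in seen:               # first time we see this line
                seen.add(digest)
                dst.write(line)
```

Storing a 16-byte digest per unique line keeps memory bounded compared to keeping the full lines, though for a 100GB file the set can still grow large; chunking the input by time window would keep it manageable.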
Current details:
Expedition version 1.1.7
6 vCPU
16GB RAM
200GB HDD
Any insights would be appreciated.
Thanks,
Yung
03-01-2019 01:52 AM
Given your description, it is not a matter of how big the log files are, but rather of how large the rule import into the project is.
My guess is that the issue may be a limit on the packet size for MySQL inserts.
Check your file
/etc/mysql/my.cnf
and verify that max_allowed_packet is large enough (try making it 4 times bigger), as well as the bulk_insert_buffer_size value.
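For reference, the relevant section might look something like this; the exact values shown here are illustrative, so size them to your environment:

```ini
[mysqld]
# Maximum size of a single packet between client and server;
# large bulk inserts fail if a statement exceeds this.
max_allowed_packet      = 64M
# Per-thread buffer used to speed up bulk INSERT statements.
bulk_insert_buffer_size = 64M
```

After editing, restart the MySQL service and confirm the running values with `SHOW VARIABLES LIKE 'max_allowed_packet';`.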
Let's see if this helps resolve your issue.
03-01-2019 01:55 AM
By the way, removing duplicate entries from the log is unnecessary. The first pre-processing pass already optimizes that, among other aspects.
For instance, it produces a summary of what you call duplicated connections and sums the bytes sent, bytes received, packets, etc.
If you check the "connections.parquet" folder that is created from the CSV files (within your Temporary Data Structures Folder), you will see that those 100GB of logs may have been reduced to a few MB.
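Conceptually, that summarization step works like the following sketch: group the log rows by connection key and sum the counters. This is only an illustration of the idea, not Expedition's actual code, and the column layout is assumed:

```python
from collections import defaultdict

def summarize(rows):
    """Collapse 'duplicate' connection rows into one entry per
    (src, dst, port, app) key, summing the traffic counters."""
    totals = defaultdict(lambda: [0, 0, 0])   # bytes_sent, bytes_recv, packets
    for src, dst, port, app, b_sent, b_recv, pkts in rows:
        t = totals[(src, dst, port, app)]
        t[0] += b_sent
        t[1] += b_recv
        t[2] += pkts
    return {k: tuple(v) for k, v in totals.items()}
```

Because most traffic logs repeat the same connections over and over, this kind of aggregation is what shrinks 100GB of raw CSV down to a few MB of parquet.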
03-03-2019 09:30 PM
Thanks @dgildelaig.
I've increased both the packet size and the buffer size, but it did not help with importing the large (~80k) set of rules. Also, in this state I can't remove the 'ML Enabled' tag from the rules; doing so returns the same error dialog box. I deleted the project to get around this issue.
I have now set 'ML Enabled' on only one of the rules to get a manageable number of rules.
Thanks,
Yung
03-03-2019 09:34 PM
You're right, processing the logs into a parquet file makes removing duplicate entries unnecessary.