<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Remove duplicate entries to reduce log file size for ML in Expedition Discussions</title>
    <link>https://live.paloaltonetworks.com/t5/expedition-discussions/remove-duplicate-entries-to-reduce-log-file-size-for-ml/m-p/251878#M1225</link>
    <description>&lt;P&gt;Given your description, the problem is not how big the log files are, but rather how large the rule set being imported into the project is.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;My guess is that the issue is a limit on the packet size for MySQL inserts.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Check your file&lt;/P&gt;
&lt;P&gt;/etc/mysql/my.cnf&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;and verify that your max_allowed_packet is large enough (make it 4 times bigger), as well as the bulk_insert_buffer_size value.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Let's see if this helps resolve your issue.&lt;/P&gt;</description>
    <pubDate>Fri, 01 Mar 2019 09:52:53 GMT</pubDate>
    <dc:creator>dgildelaig</dc:creator>
    <dc:date>2019-03-01T09:52:53Z</dc:date>
    <item>
      <title>Remove duplicate entries to reduce log file size for ML</title>
      <link>https://live.paloaltonetworks.com/t5/expedition-discussions/remove-duplicate-entries-to-reduce-log-file-size-for-ml/m-p/251862#M1224</link>
      <description>&lt;P&gt;Hi all,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The current issue I have is that the exported traffic logs are too large for Expedition ML and manipulation. One hour's worth of logs is around 2GB; two days is around 100GB. After Machine Learning on the 100GB of logs finishes (around 80k rules), the import into the project fails. An error dialog box pops up without much detail. Are there further error logs that might shed some light on the issue? Is there a limit on the number of logs/rules that can be processed?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I tried ML on a smaller 2GB file and it all worked as expected.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;We have tried to reduce the exported log size by removing duplicates in Excel. However, after copying the new file to Expedition with scp, it is no longer listed as a file for M.Learning processing into parquet format. (Interestingly, even though no file is listed, pressing the 'Process Files' button makes the file appear; it is processed and then recognised as 'Processed by User admin'. However, no parquet file is generated: there is no directory or data in /datastore.)&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Removing the duplicates reduces the log file by around 80%.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Current details:&lt;/P&gt;
&lt;P&gt;Expedition version 1.1.7&lt;/P&gt;
&lt;P&gt;6 vCPU&lt;/P&gt;
&lt;P&gt;16GB RAM&lt;/P&gt;
&lt;P&gt;200GB HDD&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Any insights would be appreciated.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Yung&lt;/P&gt;</description>
      <pubDate>Fri, 01 Mar 2019 06:10:41 GMT</pubDate>
      <guid>https://live.paloaltonetworks.com/t5/expedition-discussions/remove-duplicate-entries-to-reduce-log-file-size-for-ml/m-p/251862#M1224</guid>
      <dc:creator>YungOng</dc:creator>
      <dc:date>2019-03-01T06:10:41Z</dc:date>
    </item>
    <item>
      <title>Re: Remove duplicate entries to reduce log file size for ML</title>
      <link>https://live.paloaltonetworks.com/t5/expedition-discussions/remove-duplicate-entries-to-reduce-log-file-size-for-ml/m-p/251878#M1225</link>
      <description>&lt;P&gt;Given your description, the problem is not how big the log files are, but rather how large the rule set being imported into the project is.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;My guess is that the issue is a limit on the packet size for MySQL inserts.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Check your file&lt;/P&gt;
&lt;P&gt;/etc/mysql/my.cnf&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;and verify that your max_allowed_packet is large enough (make it 4 times bigger), as well as the bulk_insert_buffer_size value.&lt;/P&gt;
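&lt;P&gt;As a rough illustration of the "4 times bigger" arithmetic, here is a minimal Python sketch. It assumes the two options are written in the usual option = value form with an optional K, M or G suffix; the sample values (16M and 8M) and the helper names parse_size and read_settings are made up for the example and are not taken from an actual Expedition install.&lt;/P&gt;

```python
import re

# Multipliers for the K/M/G suffixes that MySQL accepts on size options.
UNITS = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text):
    """Convert a MySQL-style size such as '16M' into a number of bytes."""
    match = re.fullmatch(r"(\d+)([KMG]?)", text.strip().upper())
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    return int(match.group(1)) * UNITS[match.group(2)]


def read_settings(config_text, names):
    """Collect the named options from my.cnf-style text (last value wins)."""
    found = {}
    for line in config_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        key = key.replace("-", "_")  # MySQL treats dashes and underscores alike
        if key in names:
            found[key] = parse_size(value)
    return found


# Hypothetical file contents; the real values on your VM may differ.
SAMPLE = """
[mysqld]
max_allowed_packet = 16M
bulk_insert_buffer_size = 8M
"""

current = read_settings(SAMPLE, {"max_allowed_packet", "bulk_insert_buffer_size"})
for name, size in sorted(current.items()):
    print(f"{name}: {size} bytes now, {size * 4} bytes after a 4x increase")
```

&lt;P&gt;Edits to the configuration file only take effect once the MySQL service has been restarted.&lt;/P&gt;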
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Let's see if this helps resolve your issue.&lt;/P&gt;</description>
      <pubDate>Fri, 01 Mar 2019 09:52:53 GMT</pubDate>
      <guid>https://live.paloaltonetworks.com/t5/expedition-discussions/remove-duplicate-entries-to-reduce-log-file-size-for-ml/m-p/251878#M1225</guid>
      <dc:creator>dgildelaig</dc:creator>
      <dc:date>2019-03-01T09:52:53Z</dc:date>
    </item>
    <item>
      <title>Re: Remove duplicate entries to reduce log file size for ML</title>
      <link>https://live.paloaltonetworks.com/t5/expedition-discussions/remove-duplicate-entries-to-reduce-log-file-size-for-ml/m-p/251879#M1226</link>
      <description>&lt;P&gt;Btw, removing duplicate entries in the log is unnecessary. The first pre-processing pass already optimizes that and other aspects as well.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;For instance, it summarizes what you call duplicated connections and sums bytes sent, bytes received, packets, etc.&lt;/P&gt;
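&lt;P&gt;The idea of that summary can be sketched in a few lines of Python. This is only an illustration of grouping and summing, not Expedition's actual code: the choice of columns that identify a connection, the sample rows and the summarise helper are all assumptions made for the example.&lt;/P&gt;

```python
from collections import defaultdict

# Hypothetical, simplified log rows: (source, destination, port, application,
# bytes_sent, bytes_received, packets). Real traffic logs carry more columns.
ROWS = [
    ("10.0.0.5", "192.0.2.10", 443, "ssl", 1200, 5300, 14),
    ("10.0.0.5", "192.0.2.10", 443, "ssl", 800, 2700, 9),
    ("10.0.0.7", "192.0.2.20", 53, "dns", 90, 210, 2),
    ("10.0.0.5", "192.0.2.10", 443, "ssl", 500, 1000, 6),
]


def summarise(rows):
    """Collapse rows sharing a connection key and sum their counters."""
    totals = defaultdict(lambda: [0, 0, 0, 0])  # sent, received, packets, rows
    for src, dst, port, app, sent, received, packets in rows:
        entry = totals[(src, dst, port, app)]
        entry[0] += sent
        entry[1] += received
        entry[2] += packets
        entry[3] += 1
    return dict(totals)


summary = summarise(ROWS)
for key, (sent, received, packets, hits) in sorted(summary.items()):
    print(key, "sent:", sent, "received:", received, "packets:", packets, "rows:", hits)
```

&lt;P&gt;Four input rows collapse into two summary rows here, which is why deduplicating the CSV by hand beforehand gains nothing.&lt;/P&gt;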
&lt;P&gt;If you check the folder "connections.parquet" that is created from the CSV files (within your Temporary Data Structures Folder), you will see that those 100GB of logs may have been reduced to some MBs.&lt;/P&gt;</description>
      <pubDate>Fri, 01 Mar 2019 09:55:36 GMT</pubDate>
      <guid>https://live.paloaltonetworks.com/t5/expedition-discussions/remove-duplicate-entries-to-reduce-log-file-size-for-ml/m-p/251879#M1226</guid>
      <dc:creator>dgildelaig</dc:creator>
      <dc:date>2019-03-01T09:55:36Z</dc:date>
    </item>
    <item>
      <title>Re: Remove duplicate entries to reduce log file size for ML</title>
      <link>https://live.paloaltonetworks.com/t5/expedition-discussions/remove-duplicate-entries-to-reduce-log-file-size-for-ml/m-p/252107#M1239</link>
      <description>&lt;P&gt;Thanks&amp;nbsp;&lt;SPAN&gt;&lt;a href="https://live.paloaltonetworks.com/t5/user/viewprofilepage/user-id/36606"&gt;@dgildelaig&lt;/a&gt;. &lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;I've increased the size of both the packet and the buffer settings, but it did not help with importing the large number (~80k) of rules. Also, in this state, I can't remove the 'ML Enabled' tag from the rules; it returns the same error dialog box. I deleted the project to get around this issue.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;I have now set 'ML Enabled' on only one of the rules to get a manageable number of rules.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Thanks,&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Yung&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 04 Mar 2019 05:30:55 GMT</pubDate>
      <guid>https://live.paloaltonetworks.com/t5/expedition-discussions/remove-duplicate-entries-to-reduce-log-file-size-for-ml/m-p/252107#M1239</guid>
      <dc:creator>YungOng</dc:creator>
      <dc:date>2019-03-04T05:30:55Z</dc:date>
    </item>
    <item>
      <title>Re: Remove duplicate entries to reduce log file size for ML</title>
      <link>https://live.paloaltonetworks.com/t5/expedition-discussions/remove-duplicate-entries-to-reduce-log-file-size-for-ml/m-p/252108#M1240</link>
      <description>&lt;P&gt;You're right, processing the logs into a parquet file makes removing duplicate entries unnecessary.&lt;/P&gt;</description>
      <pubDate>Mon, 04 Mar 2019 05:34:09 GMT</pubDate>
      <guid>https://live.paloaltonetworks.com/t5/expedition-discussions/remove-duplicate-entries-to-reduce-log-file-size-for-ml/m-p/252108#M1240</guid>
      <dc:creator>YungOng</dc:creator>
      <dc:date>2019-03-04T05:34:09Z</dc:date>
    </item>
  </channel>
</rss>

