Disk usage steadily increases and high CPU usage

DanWoodruff · ‎09-26-2016

Hi,

I integrated a few custom feeds into MineMeld that have a high number of indicators. The dashboard shows 202.4K indicators across 21 Miners. I noticed two things that I'm hoping I can work around:

Disk usage is steadily increasing in the /opt/minemeld/local/data directory. Some of the feed directories are over 2GB and increasing. When I left the office on Friday that directory was at about 6GB after running for a few days, and today it is at 12GB. A restart of the MineMeld service clears them out. Is this expected? Is there a way to mitigate the slow creep or a set of files I can manually remove with a cron job? I can see us not restarting the service for several weeks and the drive filling up.
Also, I believe that because of those large feeds, the 2 CPU cores are pegged at 100% about 75% of the time. Looking through the MineMeld engine log, I'm guessing that's because every feed is automatically polled for changes every minute. Do you have any recommendations for decreasing overall CPU usage? Perhaps a way to change the polling interval? We also might throw more cores at it.
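To track which feed directories are growing between restarts, a small script can total the size of each subdirectory. This is only a sketch: the base path is the one mentioned above, and the assumption that each feed gets its own subdirectory comes from the description in this post, not from MineMeld documentation.

```python
import os


def dir_size_bytes(path):
    """Total size in bytes of all files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                # A file may disappear while we walk (e.g. database cleanup).
                pass
    return total


def usage_by_subdir(base):
    """Map each immediate subdirectory of base to its size in bytes."""
    sizes = {}
    for entry in os.scandir(base):
        if entry.is_dir(follow_symlinks=False):
            sizes[entry.name] = dir_size_bytes(entry.path)
    return sizes


if __name__ == "__main__":
    base = "/opt/minemeld/local/data"  # path taken from the post
    if os.path.isdir(base):
        ranked = sorted(usage_by_subdir(base).items(),
                        key=lambda kv: kv[1], reverse=True)
        for name, size in ranked:
            print(f"{size / 2**20:10.1f} MiB  {name}")
```

Running it from cron and appending the output to a log would show the growth rate per feed without touching any MineMeld files.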

Thanks for the great tool! We're excited about using it.

Dan

lmori · ‎09-26-2016

Hi Dan,

Disk and CPU usage heavily depend on the type of Miner being used.

For disk usage:

Miners run a garbage collection cycle every X minutes. X depends on the specific Miner config.
Aged-out indicators are garbage collected (deleted) and withdrawn from the downstream nodes. If the same indicator has been published by more than one Miner, it is removed from the aggregator nodes only when it has been aged out by all of those Miners.
Disk usage depends on the type of Miner more than on the number of Miners. Some Miners/feeds are really "quiet", while others dump 100K indicators per hour. You can age out indicators more aggressively (by changing the prototype config), but that could affect the effectiveness of the feed. Basically, some Miners require more disk space than others.
An additional component affecting disk space is the logging subsystem, which stores its data in the /opt/minemeld/local/trace directory. More indicators => more logs. By default logs are kept for 30 days; you can change this as well.

Could you share more details about the Miners you are running, and which one was using 2GB of compressed DB?

For CPU, again, it depends on the Miners. The more cores you add (even virtual ones), the better. There are also some hot spots inside MineMeld that we are working on optimizing, stay tuned 🙂

DanWoodruff · ‎09-27-2016

Thanks for some insight into the internal workings!

The miners that I am using are all the stdlib.list*Generic miners. I have a few of each IPv4, Domain, and URL. The largest feeds are in the Domain and URL categories - one is 10K indicators and another is a little more than 20K. These are the ones that were using > 2GB. For example, I restarted MineMeld yesterday and one of them is back to 863MB with 431 .ldb files.

These are populated from a proprietary feed that I unfortunately am only able to query via the CIF command line tool, so that's why I'm using the generic miners. I built a script running on cron every 3 hours to create the yml output, which is then written out to the appropriate files in /opt/minemeld/local/config/. Then when MineMeld polls, it pulls in any changes.
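For anyone building a similar cron job, here is a rough sketch of the file-writing step. It writes to a temporary file in the same directory and renames it into place, so MineMeld never reads a half-written list. The `- indicator: ...` entry layout is my assumption about what the list Miners expect, and the function name and file name are made up for illustration.

```python
import json
import os
import tempfile


def write_indicator_list(path, indicators, mode=0o644):
    """Atomically write indicators as a YAML list; return the entry count."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-", suffix=".yml")
    count = 0
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as out:
            for value in indicators:
                # json.dumps gives a double-quoted string, which is valid YAML.
                out.write(f"- indicator: {json.dumps(value)}\n")
                count += 1
            if count == 0:
                out.write("[]\n")  # explicit empty list instead of an empty file
        os.chmod(tmp, mode)    # mkstemp creates the file as 0600
        os.replace(tmp, path)  # atomic rename on the same filesystem
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
    return count


if __name__ == "__main__":
    # Hypothetical target file; the real name depends on the node.
    target = os.path.join(tempfile.gettempdir(), "example_indicators.yml")
    print(write_indicator_list(target, ["example.com", "example.org"]))
```

The cron script would call `write_indicator_list` with a path under /opt/minemeld/local/config/ and the values returned by the CIF query.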

The only indications I have about how many are aged out come from the dashboard, and I'm looking at only a handful (< 100) added or removed at each refresh. So not a lot of churn.

I'll see about increasing from 2 to 4 cores for the VM.

Thanks,

Dan

lmori · ‎09-27-2016

Hi Dan,

I am working on adding a CIF Miner to MineMeld to automate the queries. Would you have time for a quick call? I just want to be sure I get your queries covered.

Just send me a message to lmori@paloaltonetworks.com if you are ok with it.

Thanks!

luigi

lmori · ‎09-27-2016

@DanWoodruff my suggestion to lower CPU and disk usage in your case is to create a new prototype based on listIPv4Generic and increase the interval and age_out interval settings from one minute to 12 or 24 hours.

If your script calls mm-console to hup the node after creating the new list, the Miner doesn't really need to monitor the file for changes.
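One way to wire this into a cron script is to send the hup only when the generated list actually changed. The sketch below compares a SHA-256 digest of the list file with the one saved on the previous run. The exact mm-console command line is not given in this thread, so the reload command is passed in by the caller rather than hard-coded; the function names and the state file are illustrative.

```python
import hashlib
import os
import subprocess


def file_digest(path):
    """SHA-256 hex digest of the file contents."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def reload_if_changed(list_path, state_path, reload_cmd):
    """Run reload_cmd only if list_path changed since the last recorded run.

    reload_cmd is an argument list for subprocess.run, for example the
    mm-console call that hups the node (supplied by the caller).
    Returns True if the command was run.
    """
    current = file_digest(list_path)
    previous = None
    if os.path.exists(state_path):
        with open(state_path, encoding="ascii") as f:
            previous = f.read().strip()
    if current == previous:
        return False
    subprocess.run(reload_cmd, check=True)
    # Record the digest only after the reload succeeded, so a failed
    # reload is retried on the next run.
    with open(state_path, "w", encoding="ascii") as f:
        f.write(current + "\n")
    return True
```

With a long polling interval on the Miner, this keeps reloads tied to real changes in the feed instead of a timer.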

DanWoodruff · ‎10-04-2016

Thanks, I will give this a try. Are the intervals defined in seconds? Right now in my install, stdlib.listIPv4Generic has an age_out interval of 67 and an interval of 53.

lmori · ‎10-04-2016

Hi Dan,

Those figures are in seconds. I would increase the interval to something high, like 43200. Are you sending a hup to the node in your scripts to force a reload of the list?
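For reference, 43200 seconds is 12 hours, the lower of the two values suggested earlier in the thread. A tiny sketch of the conversion follows; the dictionary only mirrors the two settings named in this thread (interval and the age_out interval), and its key layout is an assumption about the prototype config, not a copy of it.

```python
def hours_to_seconds(hours):
    """Convert hours to whole seconds."""
    return int(hours * 3600)


def prototype_overrides(hours):
    """Illustrative values for the two interval settings, in seconds."""
    seconds = hours_to_seconds(hours)
    # Assumed layout: a top-level interval plus an interval under age_out.
    return {"interval": seconds, "age_out": {"interval": seconds}}


if __name__ == "__main__":
    print(prototype_overrides(12))
    print(prototype_overrides(24))
```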

DanWoodruff · ‎10-04-2016

Thanks!

I'm not currently, but I will send the hup and follow your advice on the high value.
