Disk usage steadily increases and high CPU usage


L2 Linker

Hi,

I integrated a few custom feeds into MineMeld that have a high number of indicators. The dashboard shows 202.4K indicators across 21 miners. I noticed two things that I'm hoping I can work around:

  1. Disk usage is steadily increasing in the /opt/minemeld/local/data directory. Some of the feed directories are over 2GB and increasing. When I left the office on Friday that directory was at about 6GB after running for a few days, and today it is at 12GB. A restart of the MineMeld service clears them out. Is this expected? Is there a way to mitigate the slow creep or a set of files I can manually remove with a cron job? I can see us not restarting the service for several weeks and the drive filling up.
  2. Also, I believe that because of those large feeds, the 2 CPU cores are pegged at 100% about 75% of the time. Looking through the MineMeld engine log, I'm guessing that's because every feed is automatically polled for changes every minute. Do you have any recommendations for decreasing overall CPU usage? Perhaps a way to change the polling interval? We also might throw more cores at it.

Thanks for the great tool! We're excited about using it.

Dan


L7 Applicator

Hi Dan,

Disk and CPU usage heavily depend on the type of Miner being used. 

 

For disk usage:

  1. Miners run a garbage collection cycle every X minutes. X depends on the specific Miner config.
  2. Aged-out indicators are garbage collected (deleted) and withdrawn from the downstream nodes. If the same indicator has been published by more than one Miner, it is removed from the aggregator nodes only when it has been aged out by all of those Miners.
  3. Disk usage depends on the type of Miner more than on the number. Some Miners/feeds are really "quiet", while others dump 100K indicators per hour. You can age out indicators more aggressively (by changing the prototype config), but that could affect the effectiveness of the feed. Basically, some Miners require more disk space than others.
  4. An additional component affecting disk space is the logging subsystem, stored in the /opt/minemeld/local/trace directory. More indicators => more logs. By default logs are kept for 30 days; you can change this as well.

Could you share more details about the Miners you are running and which ones were using 2GB of compressed DB?
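If it helps to pin down which node is growing, a plain du over the data and trace directories (the same paths you mentioned) should show the biggest per-node databases and how much space the logs are taking:

    # largest per-node databases under the data directory
    du -sh /opt/minemeld/local/data/* | sort -rh | head -20

    # space used by the trace/log subsystem (30-day retention by default)
    du -sh /opt/minemeld/local/trace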

 

For CPU, again it depends on the Miners. The more cores you add (even virtual ones), the better. There are also some hot spots inside MineMeld we are working on optimizing, so stay tuned 🙂

Thanks for some insight into the internal workings!

 

The miners that I am using are all the stdlib.list*Generic miners. I have a few of each type: IPv4, Domain, and URL. The largest feeds are in the Domain and URL categories - one is 10K indicators and another is a little more than 20K. These are the ones that were using > 2GB. For example, I restarted MineMeld yesterday and one of them is already back up to 863MB with 431 .ldb files.

 

These are populated from a proprietary feed that, unfortunately, I am only able to query via the CIF command-line tool, which is why I'm using the generic miners. I built a script that runs from cron every 3 hours to create the yml output, which is then written to the appropriate files in /opt/minemeld/local/config/. Then, when MineMeld polls, it pulls in any changes.
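In case it's useful, the crontab schedule is just 0 */3 * * * pointing at a script shaped roughly like the sketch below; the helper name, the list file name, and the CIF query itself are placeholders, since the real ones are specific to our feed and miner names:

    #!/bin/sh
    # update_minemeld_lists.sh (sketch) -- rebuild one generic miner's list file
    set -e

    OUT=/opt/minemeld/local/config/example_miner_list.yml   # placeholder file name

    # build the new list in the same directory so the final mv is atomic
    TMP=$(mktemp /opt/minemeld/local/config/.newlist.XXXXXX)

    # placeholder step: query the proprietary feed with the cif CLI and write out
    # the YAML list of indicators the generic miner expects
    build_indicator_list > "$TMP"

    # match owner/group to the existing config files, then swap the list in
    chown --reference=/opt/minemeld/local/config "$TMP"
    mv "$TMP" "$OUT"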

 

The only indication I have of how many indicators are aged out comes from the dashboard, and I'm seeing only a handful (< 100) added or removed at each refresh. So not a lot of churn.

 

I'll see about increasing the VM from 2 to 4 cores.

Thanks,

Dan

 

Hi Dan,

I am working on adding a CIF Miner to MineMeld to automate the queries. Would you have time for a quick call? I just want to be sure I get your queries covered.

Just send me a message at lmori@paloaltonetworks.com if you are OK with it.

 

Thanks !

luigi

@DanWoodruff my suggestion to lower CPU and disk usage in your case is to create a new prototype based on listIPv4Generic and increase the interval and age_out interval settings from one minute to 12 or 24 hours.
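Roughly, the new entry would go in a local prototype library under /opt/minemeld/local/prototypes/; the safest route is to copy your existing stdlib.listIPv4Generic prototype and only bump the two intervals. A sketch of the shape (the library file name is arbitrary, and the exact fields should be verified against the stdlib file on your install):

    # /opt/minemeld/local/prototypes/localstdlib.yml  (placeholder library name)
    description: local variants of the generic list Miners with slower polling
    prototypes:
      listIPv4GenericSlow:
        class: minemeld.ft.local.YamlIPv4FT
        node_type: miner
        description: like stdlib.listIPv4Generic, but polled/aged out every 12h
        config:
          interval: 43200        # re-read the list file every 12 hours
          age_out:
            sudden_death: true
            default: null
            interval: 43200      # run the age-out check every 12 hours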

If your script calls mm-console to hup the node after creating the new list, the Miner doesn't really need to monitor the file for changes.
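The hup can be sent with mm-console right after the new list file is in place, along these lines (the engine path is the default install location and the node name is a placeholder, so adjust both):

    # force the miner to re-read its list now instead of waiting for the next poll
    sudo -u minemeld /opt/minemeld/engine/current/bin/mm-console hup dan_urls_miner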

 

Thanks, I will give this a try. Are the intervals defined in seconds? Right now in my install, stdlib.listIPv4Generic has an age_out interval of 67 and an interval of 53.

Hi Dan,

those figures are in seconds. I would increase the interval to something high, like 43200. Are you sending a HUP to the node in your scripts to force a reload of the list?

Thanks!

 

I'm not currently, but I will send the hup and follow your advice on using a high value.
