Minemeld Error After Period

apackard · ‎11-30-2016

We've installed MM on Ubuntu 14.04 and everything starts and seems to work OK initially.

However, after a period of time it seems to crash. Not really sure how long, but as an example I booted it yesterday, used it fine for an hour or so, and this morning it had failed.

A typical error (top right in red box) would be ERROR RETRIEVING MINEMELD CONFIG: Internal Server Error. - see screenshot attachment.

If I restart the minemeld service everything starts and all is good again for a period of time. Nothing jumps out in the logs - is there any advice you can give on things to check?

Thanks

niuk · ‎11-30-2016

I have the same problem, but my MineMeld on Ubuntu 14.04 is running a syslog miner/analyzer with a significant number of logs per second received from the firewall. It crashed every day, after about 20-30h. Luigi advised adding CPU; I now have 4x4 cores (4 GB RAM). It has been up and running for 18h, will see...

apackard · ‎11-30-2016

Thanks - will look into it.

Our deployment is fairly light - 2GB, 2vCPU - but the only processing we're doing over the default config is 2 new IP sources of ~70k addresses, so no in-line syslog processing.

Cheers

lmori · ‎11-30-2016

Hi @apackard,

70k addresses are a really low volume for MineMeld. Would you mind sending me the minemeld-engine.log file from /opt/minemeld/log ? My email address is lmori@paloaltonetworks.com

Additional things:

- could you check memory and disk of the instance to see if they are exhausted ?

- are you using one or more taxii data feed output nodes ? those are memory hungry, next release will cut memory usage of taxii data feeds by more than 75%.
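
The memory and disk check above can be scripted so it is easy to repeat before each restart. A minimal sketch, assuming a Linux host with /proc mounted; the function names are illustrative and not part of MineMeld:

```python
import shutil

def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: number}.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are counts.
    """
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[name.strip()] = int(fields[0])
    return info

def disk_used_percent(path="/"):
    """Return the percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total
```

For example, `parse_meminfo(open("/proc/meminfo").read())["MemFree"]` gives the free memory in kB, and `disk_used_percent("/opt/minemeld")` reports how full the filesystem holding the MineMeld directory is.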

Thanks,

luigi

apackard · ‎11-30-2016

lmori,

I've uploaded a couple of screenshots to show the current setup:-

Resource_Use =>Triggered a reload of the largest IP list, showing the OS level stats (htop) and MM UI reported stats. Probably a little disingenuous as CPU on the OS hits 100% but only for a few seconds (I missed it with the screenshot), and I suspect that the refresh period on the MM UI means it lags a little.

Nodes => Our nodes: we've created 2 new inputs, 1 aggregator and 1 output, plus the default ones. The inputs are based on the minemeld.ft.http.HttpFT prototype

Flows => Connections

Will attach the log to another message, as it looks like 3 attachments is the max...

apackard · ‎11-30-2016

Log file (replaced any sensitive names\IP's with fake strings)

lmori · ‎11-30-2016

Hi @apackard,

the volume of indicators I see from your screenshot should be handled pretty well by MineMeld with those memory and CPU resources. Would you mind uploading also the /opt/minemeld/log/minemeld-web.log file ?

If you prefer you can send it directly to me at lmori@palo...

Thanks!

luigi

niuk · ‎11-30-2016

Mine crashed again, but I monitored CPU and it was pretty low. Most probably I ran out of disk space (see attached telegraf metrics). Should I rotate rsyslog logs more frequently than the default?

niuk · ‎12-01-2016

See the attachment for what happens after reboot (about 8:10 am). Disk and memory are freed, and the server is up and running again. In practice I have to schedule a daily cron reboot of the MM server.

lmori · ‎12-01-2016

Hi @niuk,

do you just reboot the instance or do something more ?

Could you run this command before reboot to check which process is using most of the memory ?

$ top -b -n 1 -o %MEM

About the disk, are you erasing files before reboot ? I am asking because it's strange that a reboot alone could free space from disk.

apackard's problem should be different; his instance is handling a pretty low volume of indicators.

niuk · ‎12-01-2016

Right, there are two different problems. I will run the command before next reboot.

I don't erase any files, which is really strange.

apackard · ‎12-02-2016

Hi - sorry for delay. While arranging to get the file off I noted that it was flooding with these errors:-

Traceback (most recent call last):
  File "/opt/minemeld/engine/0.9.28/local/lib/python2.7/site-packages/gevent/baseserver.py", line 140, in _do_read
  File "/opt/minemeld/engine/0.9.28/local/lib/python2.7/site-packages/gevent/server.py", line 93, in do_read
error: [Errno 24] Too many open files
<StreamServer at 0x7fbffaa0cd90 fileno=5 address=127.0.0.1:5000 handle=<functools.partial object at 0x7fc001c0e8e8>> failed with error

I restarted and they stopped, so may be a good indicator?

Rgds

niuk · ‎12-02-2016

I can see that the number of open files is bigger than the max on my Ubuntu too. I think the limit can be easily increased.

minemeld@minemeld:/opt/minemeld/prototypes/current$ lsof | wc -l
8087
minemeld@minemeld:/opt/minemeld/prototypes/current$ ulimit -a | grep open
open files (-n) 1024
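
One caveat when comparing those two numbers: `lsof | wc -l` counts entries for every process visible to the user (including memory-mapped files and working directories), whereas `ulimit -n` is a per-process limit for the current shell, which may differ from the limit the running service actually has. A minimal sketch, assuming Linux with /proc mounted, that compares one process's open descriptors against its own soft limit; the function names are illustrative and not part of MineMeld:

```python
import os

def parse_soft_limit(limits_text):
    """Return the soft 'Max open files' limit from /proc/<pid>/limits text."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Columns: name (3 words), soft limit, hard limit, units
            value = line.split()[3]
            # A non-numeric value ("unlimited") is reported as None
            return int(value) if value.isdigit() else None
    return None

def fd_usage(pid):
    """Return (open descriptor count, soft limit) for one process."""
    open_fds = len(os.listdir("/proc/%d/fd" % pid))
    with open("/proc/%d/limits" % pid) as handle:
        return open_fds, parse_soft_limit(handle.read())
```

Calling `fd_usage(pid)` for a process owned by another user needs root. A count that keeps climbing towards the soft limit between restarts would point to a descriptor leak rather than a limit that is simply too low.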

lmori · ‎12-02-2016

Hi @apackard,

thanks, that is really helpful. I checked the logs of the engine and everything was normal except for an issue with reaching ransomwaretracker.

Before increasing the limit on open files, I would like to understand whether there is a leak of file descriptors. If you run

$ sudo ps -aef | grep gunicorn

you will find 2 processes. Could you dump the open files with "lsof -p <pid>" for each process and check if most of them are sessions to redis (port 6379) or rabbitmq (port 5672) ?
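
Counting those sessions by eye is tedious when there are hundreds of lines. A minimal sketch that tallies TCP connections in lsof output by remote port; it assumes the output was produced with `lsof -n -P -p <pid>` (the `-n -P` flags are an addition here so that hosts and ports are printed as numbers), and the function name is illustrative:

```python
import re
from collections import Counter

# Matches the remote end of a connection such as 127.0.0.1:54321->127.0.0.1:6379
REMOTE_PORT = re.compile(r"->\S+:(\d+)")
KNOWN_PORTS = {6379: "redis", 5672: "rabbitmq"}

def count_by_remote_port(lsof_text):
    """Count connected sockets in lsof output, grouped by remote service."""
    counts = Counter()
    for line in lsof_text.splitlines():
        match = REMOTE_PORT.search(line)
        if match:
            port = int(match.group(1))
            counts[KNOWN_PORTS.get(port, "other")] += 1
    return counts
```

Feeding it the text from `subprocess.check_output(["lsof", "-n", "-P", "-p", str(pid)]).decode()` for each gunicorn process should show whether redis or rabbitmq sessions dominate.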

Do you have many firewalls/devices retrieving feeds from MM ?

Thanks,

luigi

apackard · ‎12-02-2016

Will do.

In terms of the question:-

We currently have an IP block list provided by a 3rd party. I have some custom PS scripts that I currently run that download this, produce DIFF reports, do some mangling and output a file for serving up on an internal web server for our Internet-facing firewalls (about a dozen).

I'm looking to replace this with MineMeld, so in future it will be supporting at least 10 devices; but until we can work out why it keeps stopping we can't proceed, so right now there aren't actually any client devices.

I'm also hoping to use some dynamic behaviour to get round some limitations in your dynamic blocklist max sizes and block-ip duration. As we can only serve up ~1,200 IPs (out of the 50k plus in the 3rd party IP list), and as we can only block an IP for 1 hour with the THREAT block-ip action, I have a SIEM that triggers a script if it sees any of the "non-served" IPs attacking us, or if it sees repeated block-ip actions from a common source.

This will poke an offending IP to a smaller 'active' attackers list that we can use for a dynamic blocklist with a lifetime of a month (for example). Once that functionality is in place we may serve it up to our full estate of PAs, which is over 30.
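
To make the intended 'active' list behaviour concrete, here is a rough sketch of a list whose entries expire after a fixed lifetime. It is purely illustrative: the class and method names are hypothetical, the 30-day default stands in for "a month", and it is not MineMeld configuration or the actual script.

```python
from datetime import datetime, timedelta

class ActiveAttackers:
    """Hypothetical in-memory list of IPs that expire after a fixed lifetime."""

    def __init__(self, lifetime=timedelta(days=30)):
        self.lifetime = lifetime
        self._last_seen = {}

    def report(self, ip, now):
        # A repeat report refreshes the entry, restarting its lifetime.
        self._last_seen[ip] = now

    def current(self, now):
        # Drop entries older than the lifetime, then return what is left.
        cutoff = now - self.lifetime
        self._last_seen = {
            ip: seen for ip, seen in self._last_seen.items() if seen >= cutoff
        }
        return sorted(self._last_seen)
```

The SIEM-triggered script would call `report()` for each offending IP, and the output of `current()` is what would be served as the blocklist.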
