Tonight I finished building this out in my lab: CentOS 7 and the Ansible 'stable' MineMeld branch. I added the alienvault.reputation feed, and sure enough it killed off most of the four other miners when it hit ~15-18k indicators, about 3-5 minutes after the engine started. Restarting the engine restarts all miners, but the failure mode is consistent. The alienvault.reputation miner never advances past that number (or completes polling), despite the feed having around 60k indicators (as seen on a successful poll in my Ubuntu MM). Interesting. The feed works fine in my Ubuntu MM with just 1 GB RAM and 1 core (and about 20 other feeds).
I noted that this miner uses Class: minemeld.ft.csv.CSVFT, and I wonder if it's an issue with that class on CentOS. To check, I found another miner using this class (bambenekconsulting.c2_ipmasterlist) to see if the behavior is the same, but it completes polling successfully. It has only 437 indicators, so next I'll try a feed of the same class with more indicators, closer to the scale of alienvault. This will be a bit of trial-and-error work.
Another option I found is to try running it in Docker, which contains all the necessary dependencies in the container: https://hub.docker.com/r/jtschichold/minemeld/ but I'm out of time tonight. I'll take a look at this later this week, unless you want to take a first pass @hshawn
Cheers
I did try other miners of the same class, and they worked without incident, but were all small. So perhaps it's scale related. I created a new miner (not a clone) to temporarily mine the alienvault.reputation database from a website I control. This new miner crashed in the same way the included alienvault.reputation one did.
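One way to test the scale theory would be to serve synthetic feeds of increasing size from the same website and note at which size the engine starts to fail. Below is a minimal sketch, assuming a plain feed with one IPv4 address per line (this is not the real alienvault.reputation layout) and a hypothetical helper name; the miner config would still need to be pointed at the generated file.

```python
import ipaddress


def synthetic_feed(count, start="10.0.0.1"):
    """Return `count` consecutive IPv4 addresses, one per line.

    The addresses are private-range placeholders, not real indicators.
    """
    base = int(ipaddress.IPv4Address(start))
    lines = [str(ipaddress.IPv4Address(base + i)) for i in range(count)]
    return "\n".join(lines) + "\n"
```

Writing the output of synthetic_feed(5000), synthetic_feed(20000) and synthetic_feed(60000) to separate files would bracket the ~15-18k point where the miner stalls.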
I noted minemeld-traced.log is non-zero bytes, and upon looking in there it seems the engine crashes due to unhandled exceptions.
2018-07-11T00:43:37 (8424)storage.remove_reference WARNING: Attempt to remove non existing reference: 2e5b0b9c-c47b-44
Traceback (most recent call last):
  File "/opt/minemeld/engine/current/lib/python2.7/site-packages/gevent/greenlet.py", line 327, in run
    result = self._run(*self.args, **self.kwargs)
  File "/opt/minemeld/engine/core/minemeld/comm/amqp.py", line 561, in _ioloop
    conn.drain_events()
  File "/opt/minemeld/engine/current/lib/python2.7/site-packages/amqp/connection.py", line 323, in drain_events
    return amqp_method(channel, args)
  File "/opt/minemeld/engine/current/lib/python2.7/site-packages/amqp/connection.py", line 529, in _close
    (class_id, method_id), ConnectionError)
ConnectionForced: (0, 0): (320) CONNECTION_FORCED - broker forced connection closure with reason 'shutdown'
<Greenlet at 0x7fb4e5de5230: <bound method AMQP._ioloop of <minemeld.comm.amqp.AMQP object at 0x7fb4e5e021d0>>(0)> fai
Everything begins to fall apart in the minemeld-engine.log here (before this timestamp things appear healthy):
2018-07-11T00:38:59 (9727)mgmtbus._merge_status ERROR: old clock: 90 > 74 - dropped
2018-07-11T00:38:59 (9727)mgmtbus._merge_status ERROR: old clock: 21 > 20 - dropped
2018-07-11T00:38:59 (9727)mgmtbus._merge_status ERROR: old clock: 20 > 19 - dropped
2018-07-11T00:38:59 (9727)mgmtbus._merge_status ERROR: old clock: 22 > 20 - dropped
Traceback (most recent call last):
  File "/opt/minemeld/engine/current/lib/python2.7/site-packages/gevent/greenlet.py", line 327, in run
    result = self._run(*self.args, **self.kwargs)
  File "/opt/minemeld/engine/core/minemeld/comm/amqp.py", line 561, in _ioloop
    conn.drain_events()
  File "/opt/minemeld/engine/current/lib/python2.7/site-packages/amqp/connection.py", line 323, in drain_events
    return amqp_method(channel, args)
  File "/opt/minemeld/engine/current/lib/python2.7/site-packages/amqp/channel.py", line 241, in _close
    reply_code, reply_text, (class_id, method_id), ChannelError,
NotFound: Basic.publish: (404) NOT_FOUND - no exchange 'inboundaggregator' in vhost '/'
<Greenlet at 0x7f7939978190: <bound method AMQP._ioloop of <minemeld.comm.amqp.AMQP object at 0x7f793b0d1d10>>(11)> fa
2018-07-11T00:39:17 (9741)amqp._ioloop_failure ERROR: _ioloop_failure: exception in ioloop
Traceback (most recent call last):
  File "/opt/minemeld/engine/core/minemeld/comm/amqp.py", line 567, in _ioloop_failure
    g.get()
  File "/opt/minemeld/engine/current/lib/python2.7/site-packages/gevent/greenlet.py", line 251, in get
    raise self._exception
NotFound: Basic.publish: (404) NOT_FOUND - no exchange 'inboundaggregator' in vhost '/'
2018-07-11T00:39:17 (9741)chassis.stop INFO: chassis stop called
When comparing this to my Ubuntu MM, I noted that there are errors there too, but no 'clock' errors in the logs. I started poking around in the Python to see how clock is used, but am now out of time again. Documenting all of this here before it slips out of my mind.
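To compare the two installs more systematically, a small script could tally the 'old clock' drops per process in minemeld-engine.log. The following is a minimal sketch, assuming the line layout seen in the excerpt above (timestamp, PID in parentheses, logger name, level). The function name and the regular expression are my own and not part of MineMeld.

```python
import re
from collections import Counter

# Pattern inferred from the log excerpt in this thread; hypothetical, not a MineMeld API.
OLD_CLOCK = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}) "
    r"\((?P<pid>\d+)\)mgmtbus\._merge_status ERROR: "
    r"old clock: (?P<old>\d+) > (?P<new>\d+) - dropped"
)


def count_old_clock(text):
    """Return a Counter mapping PID (as a string) to the number of 'old clock' drops."""
    return Counter(m.group("pid") for m in OLD_CLOCK.finditer(text))
```

Reading the whole log file into a string and passing it to count_old_clock() on both the CentOS and Ubuntu hosts would show whether these errors really are unique to the failing install. Because it uses finditer, it also works on excerpts where several entries have been pasted onto one line.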
I gave the Docker image a shot yesterday. It seems to still be a work in progress: it did come up, but logging into MineMeld was not working, so I shelved it and went back to poking at the CentOS setup. It sounds like you are going down some rabbit holes with this, @tyreed. Don't stare at it too long, your vision may blur ;)
Hi @tyreed,
I have fixed this issue in the latest version of minemeld-ansible. Could you try reinstalling the instance with the latest minemeld-ansible playbook, please?
@lmori I took the following steps this morning:
* refresh git repo
* re-run ansible playbook
* reboot server
* add alienvault reputation
Results: it looked like it was going to be happy, but after about a minute all the nodes stopped and the alienvault poll stopped at around 17k (there are usually around 50k in there).
After removing alienvault and committing, everything is happy again (but it takes about 8 minutes to restart the engine when this happens). Is there a better way to update the system? Should I completely wipe out the previous install and redo it from zero?
TIA!
Errors in the log around the time of this event (possibly unrelated):
[2018-07-13 07:41:04 PDT] [887] [ERROR] Exception on /status/minemeld [GET]
Traceback (most recent call last):
  File "/opt/minemeld/engine/current/lib/python2.7/site-packages/flask/app.py", line 1988, in wsgi_app
    response = self.full_dispatch_request()
  File "/opt/minemeld/engine/current/lib/python2.7/site-packages/flask/app.py", line 1641, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/opt/minemeld/engine/current/lib/python2.7/site-packages/flask/app.py", line 1544, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/opt/minemeld/engine/current/lib/python2.7/site-packages/flask/app.py", line 1639, in full_dispatch_request
    rv = self.dispatch_request()
  File "/opt/minemeld/engine/current/lib/python2.7/site-packages/flask/app.py", line 1625, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/opt/minemeld/engine/core/minemeld/flask/aaa.py", line 125, in decorated_view
    return f(*args, **kwargs)
  File "/opt/minemeld/engine/core/minemeld/flask/aaa.py", line 135, in decorated_view
    return f(*args, **kwargs)
  File "/opt/minemeld/engine/core/minemeld/flask/statusapi.py", line 171, in get_minemeld_status
    status = MMMaster.status()
  File "/opt/minemeld/engine/core/minemeld/flask/mmrpc.py", line 51, in status
    return self._send_cmd('status')
  File "/opt/minemeld/engine/core/minemeld/flask/mmrpc.py", line 41, in _send_cmd
    self._open_channel()
  File "/opt/minemeld/engine/core/minemeld/flask/mmrpc.py", line 38, in _open_channel
    self.comm.start()
  File "/opt/minemeld/engine/core/minemeld/comm/amqp.py", line 595, in start
    c = amqp.connection.Connection(**self.amqp_config)
  File "/opt/minemeld/engine/current/lib/python2.7/site-packages/amqp/connection.py", line 165, in __init__
    self.transport = self.Transport(host, connect_timeout, ssl)
  File "/opt/minemeld/engine/current/lib/python2.7/site-packages/amqp/connection.py", line 186, in Transport
    return create_transport(host, connect_timeout, ssl)
  File "/opt/minemeld/engine/current/lib/python2.7/site-packages/amqp/transport.py", line 299, in create_transport
    return TCPTransport(host, connect_timeout)
  File "/opt/minemeld/engine/current/lib/python2.7/site-packages/amqp/transport.py", line 95, in __init__
    raise socket.error(last_err)
error: [Errno 111] Connection refused
[2018-07-13 07:41:05 +0000] [887] [ERROR] Error handling request
Traceback (most recent call last):
  File "/opt/minemeld/engine/current/lib/python2.7/site-packages/gunicorn/workers/async.py", line 52, in handle
    self.handle_request(listener_name, req, client, addr)
  File "/opt/minemeld/engine/current/lib/python2.7/site-packages/gunicorn/workers/ggevent.py", line 159, in handle_request
    super(GeventWorker, self).handle_request(*args)
  File "/opt/minemeld/engine/current/lib/python2.7/site-packages/gunicorn/workers/async.py", line 105, in handle_request
    respiter = self.wsgi(environ, resp.start_response)
  File "/opt/minemeld/engine/current/lib/python2.7/site-packages/flask/app.py", line 2000, in __call__
    return self.wsgi_app(environ, start_response)
  File "/opt/minemeld/engine/current/lib/python2.7/site-packages/flask/app.py", line 1996, in wsgi_app
    ctx.auto_pop(error)
  File "/opt/minemeld/engine/current/lib/python2.7/site-packages/flask/ctx.py", line 387, in auto_pop
    self.pop(exc)
  File "/opt/minemeld/engine/current/lib/python2.7/site-packages/flask/ctx.py", line 376, in pop
    app_ctx.pop(exc)
  File "/opt/minemeld/engine/current/lib/python2.7/site-packages/flask/ctx.py", line 189, in pop
    self.app.do_teardown_appcontext(exc)
  File "/opt/minemeld/engine/current/lib/python2.7/site-packages/flask/app.py", line 1898, in do_teardown_appcontext
    func(exc)
  File "/opt/minemeld/engine/core/minemeld/flask/mmrpc.py", line 207, in teardown
    g.MMMaster.stop()
  File "/opt/minemeld/engine/core/minemeld/flask/mmrpc.py", line 55, in stop
    self.comm.stop()
  File "/opt/minemeld/engine/core/minemeld/comm/amqp.py", line 685, in stop
    self.rpc_out_channel.close()
AttributeError: 'NoneType' object has no attribute 'close'
Update: I tried not adding alienvault to the aggregator until it finished polling. The alienvault node polls to completion as long as it is not connected to the aggregator; the nodes only get killed once it is added.
Might be related but what I am seeing is all nodes will stop instead of just some of them.
OT, but how did you get the Ansible install to work on Ubuntu 18? Mine errors out right away.
That is a conflict with the RabbitMQ version installed by default on Ubuntu 18, which is also why we don't support Ubuntu 18 yet in the Ansible playbook. You should install an older RabbitMQ release (3.2.X).
1)a) Add the repositories to the file /etc/apt/sources.list
deb http://us.archive.ubuntu.com/ubuntu/ bionic universe
deb http://minemeld-updates.panw.io/ubuntu trusty-minemeld main
1) $ sudo apt-get update
2) $ sudo apt-get upgrade # optional
3) $ sudo apt-get install -y gcc git python-minimal python2.7-dev libffi-dev libssl-dev make
4) $ wget https://bootstrap.pypa.io/get-pip.py
5) $ sudo -H python get-pip.py
6) $ sudo -H pip install ansible
7) $ git clone https://github.com/PaloAltoNetworks/minemeld-ansible.git
8) $ cd minemeld-ansible
8)a) After step 8), we modified the configuration
I copied the Ubuntu-16.04.yml files to Ubuntu-18.04.yml in the directory structure; the 18.04 files didn't exist in the setup.
Current structure:
./roles/infrastructure/vars/Ubuntu-14.04.yml
./roles/infrastructure/vars/Ubuntu-16.04.yml
./roles/infrastructure/vars/Ubuntu-18.04.yml
./roles/minemeld/vars/Ubuntu-14.04.yml
./roles/minemeld/vars/Ubuntu-16.04.yml
./roles/minemeld/vars/Ubuntu-18.04.yml
./roles/minemeld/tasks/Ubuntu-18.04-post.yml
./roles/minemeld/tasks/Ubuntu-16.04-post.yml
./roles/minemeld/tasks/Ubuntu-14.04-post.yml
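The copy step described above can be sketched as follows. This is only an illustration, assuming it is pointed at the root of the minemeld-ansible checkout and that the three 16.04 files from the listing exist; the helper name is hypothetical.

```python
import os
import shutil

# Paths taken from the listing above, relative to the minemeld-ansible checkout.
TEMPLATES = [
    "roles/infrastructure/vars/Ubuntu-{}.yml",
    "roles/minemeld/vars/Ubuntu-{}.yml",
    "roles/minemeld/tasks/Ubuntu-{}-post.yml",
]


def clone_release_files(root, src="16.04", dst="18.04"):
    """Copy each <src> file to its <dst> name, skipping targets that already exist."""
    created = []
    for template in TEMPLATES:
        source = os.path.join(root, template.format(src))
        target = os.path.join(root, template.format(dst))
        if os.path.exists(source) and not os.path.exists(target):
            shutil.copyfile(source, target)
            created.append(target)
    return created
```

Calling clone_release_files(".") from inside the minemeld-ansible directory returns the list of files it created. Running it a second time creates nothing, so existing 18.04 files are never overwritten.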
8)b) As indicated at the beginning of the README.MD installation manual, we modified the local.yml file so that it installs the stable version instead of the "dev" development one.
The local.yml file then looks like this:
-----------------------------------------
- name: minemeld playbook
  hosts: 127.0.0.1
  connection: local
  become: true
  vars:
    # minemeld_version: develop
    file_permissions: 'u=rwX,g=rwX,o=rX'
    # uncomment the following to install stable
    minemeld_version: master
    group_permissions: 'u=rwX,g=rX,o=rX'
    # remove comment to set custom repositories
    # core_repo: "https://github.com/jtschichold/minemeld-core.git"
    # prototype_repo: "https://github.com/jtschichold/minemeld-node-prototypes.git"
    # webui_repo: "https://github.com/jtschichold/minemeld-webui.git"
  roles:
    - infrastructure
    - minemeld
-------------------------------------------
9) $ ansible-playbook -K -i 127.0.0.1, local.yml
10) $ usermod -a -G minemeld <your user> # add your user to minemeld group, useful for development
How do I do it?
thank you