Hi,
we have an issue on our Minemeld instance in production. Similar to the issue reported in https://live.paloaltonetworks.com/t5/minemeld-discussions/minemeld-crashing/td-p/289998, minemeld randomly crashes with the following results:
- the green loading bar keeps running across the screen
- the nodes page won't load
- TAXII output prototype is giving a bad gateway 502 to TAXII clients.
- ‘timeout RPC’ exception generated from minemeld-web.log
A reboot of the server seems to resolve the issue just temporarely as it reoccurs approx after 48 hours.
From the logs (and UI interface) it seems that the engine (minemeld-engine.log) is not revealing any issues at all:
2020-10-14T11:35:22 (1784)table._query_by_index INFO: Deleted in scan of _age_out: 0
2020-10-14T11:35:25 (1784)basepoller._actor_loop INFO: ML_URL_PIS_openphish_feed_txt_MCG_30dSD - command: 1602668125764 age_out
2020-10-14T11:35:25 (1784)table._query_by_index INFO: Deleted in scan of _age_out: 0
2020-10-14T11:35:30 (1785)basepoller._huppable_wait INFO: hup is clear: False
2020-10-14T11:35:30 (1785)basepoller._actor_loop INFO: MC_IPv4_PIS_dshield_blocklist_txt_HCG_nilSD_NOVRFY - command: 1602668130987 poll
2020-10-14T11:35:30 (1785)basepoller._polling_loop INFO: Polling MC_IPv4_PIS_dshield_blocklist_txt_HCG_nilSD_NOVRFY
2020-10-14T11:35:31 (1785)basepoller._actor_loop INFO: MC_IPv4_PIS_dshield_blocklist_txt_HCG_nilSD_NOVRFY - command: 1602668130987 sudden_death
2020-10-14T11:35:31 (1785)table._query_by_index INFO: Deleted in scan of _last_run: 20
2020-10-14T11:35:31 (1785)basepoller._actor_loop INFO: MC_IPv4_PIS_dshield_blocklist_txt_HCG_nilSD_NOVRFY - command: 1602668130987 age_out
2020-10-14T11:35:31 (1785)table._query_by_index INFO: Deleted in scan of _age_out: 0
2020-10-14T11:35:31 (1785)basepoller._actor_loop INFO: MC_IPv4_PIS_dshield_blocklist_txt_HCG_nilSD_NOVRFY - command: 1602668130987 gc
2020-10-14T11:35:31 (1785)table._query_by_index INFO: Deleted in scan of _withdrawn: 0
2020-10-14T11:35:44 (1784)basepoller._actor_loop INFO: wl_URL_generic - command: 1602668144137 age_out
while on the web side (minemeld-web.log) it reports the following error:
[2020-10-14 11:35:20 CEST] [1671] [DEBUG] redis session connection pool: in use: 0 available: 5
[2020-10-14 11:35:20 CEST] [1671] [DEBUG] RPC sent to @mbus:master:rpc for method status
[2020-10-14 11:35:29 CEST] [1671] [DEBUG] 0
[2020-10-14 11:35:29 CEST] [1671] [ERROR] Exception on /feeds/OL_domain_MAL [GET]
Traceback (most recent call last):
File "/opt/minemeld/engine/0.9.70.post1/local/lib/python2.7/site-packages/flask/app.py", line 1982, in wsgi_app
response = self.full_dispatch_request()
File "/opt/minemeld/engine/0.9.70.post1/local/lib/python2.7/site-packages/flask/app.py", line 1614, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/opt/minemeld/engine/0.9.70.post1/local/lib/python2.7/site-packages/flask/app.py", line 1517, in handle_user_exception
reraise(exc_type, exc_value, tb)
File "/opt/minemeld/engine/0.9.70.post1/local/lib/python2.7/site-packages/flask/app.py", line 1612, in full_dispatch_request
rv = self.dispatch_request()
File "/opt/minemeld/engine/0.9.70.post1/local/lib/python2.7/site-packages/flask/app.py", line 1598, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "/opt/minemeld/engine/0.9.70.post1/local/lib/python2.7/site-packages/minemeld/flask/aaa.py", line 125, in decorated_view
return f(*args, **kwargs)
File "/opt/minemeld/engine/0.9.70.post1/local/lib/python2.7/site-packages/minemeld/flask/aaa.py", line 135, in decorated_view
return f(*args, **kwargs)
File "/opt/minemeld/engine/0.9.70.post1/local/lib/python2.7/site-packages/minemeld/flask/feedredis.py", line 532, in get_feed_content
status = MMMaster.status()
File "/opt/minemeld/engine/0.9.70.post1/local/lib/python2.7/site-packages/minemeld/flask/mmrpc.py", line 49, in status
return self._send_cmd('status')
File "/opt/minemeld/engine/0.9.70.post1/local/lib/python2.7/site-packages/minemeld/flask/mmrpc.py", line 45, in _send_cmd
timeout=500.0
File "/opt/minemeld/engine/0.9.70.post1/local/lib/python2.7/site-packages/minemeld/comm/zmqredis.py", line 695, in send_rpc
raise RuntimeError('Timeout in RPC')
RuntimeError: Timeout in RPC
[2020-10-14 11:35:29 CEST] [1671] [DEBUG] 0
[2020-10-14 11:35:29 CEST] [1671] [ERROR] Exception on /feeds/OL_IPv4_MAL [GET]
Traceback (most recent call last):
File "/opt/minemeld/engine/0.9.70.post1/local/lib/python2.7/site-packages/flask/app.py", line 1982, in wsgi_app
response = self.full_dispatch_request()
File "/opt/minemeld/engine/0.9.70.post1/local/lib/python2.7/site-packages/flask/app.py", line 1614, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/opt/minemeld/engine/0.9.70.post1/local/lib/python2.7/site-packages/flask/app.py", line 1517, in handle_user_exception
reraise(exc_type, exc_value, tb)
File "/opt/minemeld/engine/0.9.70.post1/local/lib/python2.7/site-packages/flask/app.py", line 1612, in full_dispatch_request
rv = self.dispatch_request()
File "/opt/minemeld/engine/0.9.70.post1/local/lib/python2.7/site-packages/flask/app.py", line 1598, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "/opt/minemeld/engine/0.9.70.post1/local/lib/python2.7/site-packages/minemeld/flask/aaa.py", line 125, in decorated_view
return f(*args, **kwargs)
File "/opt/minemeld/engine/0.9.70.post1/local/lib/python2.7/site-packages/minemeld/flask/aaa.py", line 135, in decorated_view
return f(*args, **kwargs)
File "/opt/minemeld/engine/0.9.70.post1/local/lib/python2.7/site-packages/minemeld/flask/feedredis.py", line 532, in get_feed_content
status = MMMaster.status()
File "/opt/minemeld/engine/0.9.70.post1/local/lib/python2.7/site-packages/minemeld/flask/mmrpc.py", line 49, in status
return self._send_cmd('status')
File "/opt/minemeld/engine/0.9.70.post1/local/lib/python2.7/site-packages/minemeld/flask/mmrpc.py", line 45, in _send_cmd
timeout=500.0
File "/opt/minemeld/engine/0.9.70.post1/local/lib/python2.7/site-packages/minemeld/comm/zmqredis.py", line 695, in send_rpc
raise RuntimeError('Timeout in RPC')
RuntimeError: Timeout in RPC
[2020-10-14 11:35:29 CEST] [1671] [DEBUG] 0
[2020-10-14 11:35:29 CEST] [1671] [ERROR] Exception on /feeds/OL_URL_MAL [GET]
Traceback (most recent call last):
File "/opt/minemeld/engine/0.9.70.post1/local/lib/python2.7/site-packages/flask/app.py", line 1982, in wsgi_app
response = self.full_dispatch_request()
File "/opt/minemeld/engine/0.9.70.post1/local/lib/python2.7/site-packages/flask/app.py", line 1614, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/opt/minemeld/engine/0.9.70.post1/local/lib/python2.7/site-packages/flask/app.py", line 1517, in handle_user_exception
reraise(exc_type, exc_value, tb)
File "/opt/minemeld/engine/0.9.70.post1/local/lib/python2.7/site-packages/flask/app.py", line 1612, in full_dispatch_request
rv = self.dispatch_request()
File "/opt/minemeld/engine/0.9.70.post1/local/lib/python2.7/site-packages/flask/app.py", line 1598, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "/opt/minemeld/engine/0.9.70.post1/local/lib/python2.7/site-packages/minemeld/flask/aaa.py", line 125, in decorated_view
return f(*args, **kwargs)
File "/opt/minemeld/engine/0.9.70.post1/local/lib/python2.7/site-packages/minemeld/flask/aaa.py", line 135, in decorated_view
return f(*args, **kwargs)
File "/opt/minemeld/engine/0.9.70.post1/local/lib/python2.7/site-packages/minemeld/flask/feedredis.py", line 532, in get_feed_content
status = MMMaster.status()
File "/opt/minemeld/engine/0.9.70.post1/local/lib/python2.7/site-packages/minemeld/flask/mmrpc.py", line 49, in status
return self._send_cmd('status')
File "/opt/minemeld/engine/0.9.70.post1/local/lib/python2.7/site-packages/minemeld/flask/mmrpc.py", line 45, in _send_cmd
timeout=500.0
File "/opt/minemeld/engine/0.9.70.post1/local/lib/python2.7/site-packages/minemeld/comm/zmqredis.py", line 695, in send_rpc
raise RuntimeError('Timeout in RPC')
RuntimeError: Timeout in RPC
[2020-10-14 11:35:30 CEST] [1671] [DEBUG] redis session connection pool: in use: 0 available: 5
Doing some debugging it seems that the function returns 0 bytes from the socket.poll method in the file /opt/minemeld/engine/0.9.70.post1/local/lib/python2.7/site-packages/minemeld/comm/zmqredis.py line 686:
if timeout is not None:
# zmq green does not support RCVTIMEO
retcode = socket.poll(flags=zmq.POLLIN, timeout=int(timeout*1000))
LOG.debug("{}".format(retcode)) # i added this one, resulting to 0 output
if retcode != 0:
result = socket.recv_json(flags=zmq.NOBLOCK)
else:
socket.close(linger=0)
raise RuntimeError('Timeout in RPC')
else:
result = socket.recv_json()
the Timeout RPC exception is raised, but it's not actually an RPC issue as i've tried to modify it on to 500 seconds, file /opt/minemeld/engine/0.9.70.post1/local/lib/python2.7/site-packages/minemeld/flask/mmrpc.py line 45.
Maybe the issue is related to the MISP extensions as well as CIF and TAXII prototypes to pull in feeds.
Till now i've tried the following to address the issue:
Allocate more resources to the server,
Set kernel parameter overcommit memory to 1,
Modify the timeout RPC Redis parameter in the code to 500 seconds,
Add confidence scores to our MISP indicators imported by the extension
my next test is to reduce the number of indicators in the taxii output feed since is maxed out (1000000 indicators).
Please, if you have any suggestions or you can help me it would be great,
Thanks for your attention,
V.E.
... View more