Outputs Limit! Service restart loop @ 30+

So the title is a slight misnomer.

I have a dev server with 59 miners, 42 processors, and 32 outputs, and it works fine.

I have a prod server with 58 miners, 41 processors, and 29 outputs, and it does not work fine.

The two devices are set up with "identical" configs, except that the dev server has a few extra test nodes.

On the prod box, as soon as I add a 30th output node, the service goes into a restart loop. This does not happen on the dev box.

If I delete the 30th output node, it functions as intended.

Is there any known limit to the number of outputs? (I assume not, or at least not 30, since the dev box functions perfectly.)

Not sure where to start troubleshooting this one.

@0isac0,

There is no hardcoded limit on the number of nodes you can deploy. The only limit is the capacity of the underlying platform (DRAM, CPU cores, disk space ...). Different engines running the same configuration can hold very different sets of indicators (and therefore use very different amounts of memory) depending on their uptime, because of the indicator age-out policy.

The cause of the engine restart should be in its log (/opt/minemeld/log/minemeld-engine.log). Do you mind sharing it so we can take a quick look?

Thanks for the assist.

It looks like there is a node reset error:

mgmtbus._send_node_cmd ERROR: Timeout in reset to node ETOpen_compromisedIPs
2018-01-25T16:28:44 (10772)launcher.main ERROR: Exception initializing graph
Traceback (most recent call last):
  File "/opt/minemeld/engine/0.9.44/local/lib/python2.7/site-packages/minemeld/run/launcher.py", line 288, in main
    mbusmaster.init_graph(config)
  File "/opt/minemeld/engine/0.9.44/local/lib/python2.7/site-packages/minemeld/mgmtbus.py", line 234, in init_graph
    self._send_node_cmd(node, command)
  File "/opt/minemeld/engine/0.9.44/local/lib/python2.7/site-packages/minemeld/mgmtbus.py", line 210, in _send_node_cmd
    raise RuntimeError(msg)
RuntimeError: Timeout in reset to node ETOpen_compromisedIPs

This is followed by a "not IDLE" error for each node before the service crashes:

2018-01-25T16:28:44 (10785)base.stop ERROR: stop on not IDLE or STARTED FT
2018-01-25T16:28:44 (10785)chassis.stop ERROR: Error stopping office365_officeOnline
Traceback (most recent call last):
  File "/opt/minemeld/engine/0.9.44/local/lib/python2.7/site-packages/minemeld/chassis.py", line 212, in stop
    ft.stop()
  File "/opt/minemeld/engine/0.9.44/local/lib/python2.7/site-packages/minemeld/ft/basepoller.py", line 960, in stop
    super(BasePollerFT, self).stop()
  File "/opt/minemeld/engine/0.9.44/local/lib/python2.7/site-packages/minemeld/ft/base.py", line 763, in stop
    raise AssertionError("stop on not IDLE or STARTED FT")
AssertionError: stop on not IDLE or STARTED FT
2018-01-25T16:28:44 (10785)base.stop ERROR: stop on not IDLE or STARTED FT

During the following startup, each node logs an error about a missing checkpoint file:

2018-01-25T16:29:11 (10824)base.read_checkpoint ERROR: office365_officeOnline - Error reading last checkpoint
Traceback (most recent call last):
  File "/opt/minemeld/engine/0.9.44/local/lib/python2.7/site-packages/minemeld/ft/base.py", line 255, in read_checkpoint
    with open(self.name+'.chkp', 'r') as f:
IOError: [Errno 2] No such file or directory: 'office365_officeOnline.chkp'
2018-01-25T16:29:11 (10824)base.state INFO: office365_officeOnline - transitioning to state 1
2018-01-25T16:29:11 (10824)loader.load INFO: Loading minemeld_nodes:minemeld.ft.redis.RedisSet

The whole cycle repeats about once a minute.
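To confirm the roughly one-minute cadence and see which nodes time out, a short script can scan the engine log. This is a minimal sketch under assumptions: the regular expressions are written against the log excerpts quoted in this post, not against a documented MineMeld log format, and `scan` and `scan_file` are hypothetical helper names. It only uses the standard library and should run on Python 2.7 or 3.

```python
import re
from collections import Counter
from datetime import datetime

# Patterns written against the log excerpts in this thread (an assumption,
# not a documented MineMeld log format).
GRAPH_FAIL = re.compile(
    r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}) \(\d+\)launcher\.main ERROR: "
    r"Exception initializing graph"
)
# Match only the "ERROR:" log line, so the "RuntimeError:" line at the end
# of the traceback is not counted a second time.
NODE_TIMEOUT = re.compile(r"ERROR: Timeout in (\w+) to node (\S+)")


def scan(lines):
    """Return (failure timestamps, seconds between failures, timeouts per node)."""
    failures = []
    timeouts = Counter()
    for line in lines:
        m = GRAPH_FAIL.match(line)
        if m:
            failures.append(datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S"))
        m = NODE_TIMEOUT.search(line)
        if m:
            timeouts[m.group(2)] += 1
    gaps = [(later - earlier).total_seconds()
            for earlier, later in zip(failures, failures[1:])]
    return failures, gaps, timeouts


def scan_file(path="/opt/minemeld/log/minemeld-engine.log"):
    """Scan the engine log (default path taken from the reply above)."""
    with open(path) as f:
        return scan(f)
```

For example, `failures, gaps, timeouts = scan_file()` followed by `print(gaps)` should show values near 60 if the engine really restarts about once a minute, and `timeouts.most_common(5)` lists the nodes whose reset timed out most often.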

 

Hi @0isac0,

It looks like a lack-of-resources issue to me. Are you monitoring the CPU load and memory usage? How many CPU cores have you dedicated to the VM?

Xavi
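One way to answer the CPU and memory question on a Linux VM is to compare the load average with the number of cores and check how much memory is still available. This is an illustrative sketch under assumptions: it reads the standard Linux files /proc/loadavg and /proc/meminfo (the MemAvailable field needs kernel 3.14 or later), the helper names are hypothetical, and treating a sustained load above about 1.0 per core as saturation is a rough rule of thumb, not a MineMeld guideline.

```python
import os


def parse_loadavg(text):
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg content."""
    one, five, fifteen = (float(x) for x in text.split()[:3])
    return one, five, fifteen


def parse_meminfo(text):
    """Return /proc/meminfo fields as integers (most values are in kB)."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def snapshot():
    """Read the live files (Linux only) and summarise CPU and memory headroom."""
    cores = os.cpu_count() or 1
    with open("/proc/loadavg") as f:
        _, load5, _ = parse_loadavg(f.read())
    with open("/proc/meminfo") as f:
        mem = parse_meminfo(f.read())
    return {
        "cores": cores,
        # Rough rule of thumb (assumption): sustained values above ~1.0 mean
        # processes are waiting for CPU (Linux also counts tasks blocked on I/O).
        "load5_per_core": load5 / cores,
        "mem_available_pct": 100.0 * mem["MemAvailable"] / mem["MemTotal"],
    }
```

Running `print(snapshot())` every few seconds while adding the 30th output node would show whether load per core climbs during the restart loop while available memory stays comfortable.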

xhoms,

Thanks again.

The VM had 2 CPUs with 4 GB of RAM.

Bumping the RAM up to 8 GB made no difference.

Bumping it up to 4 CPUs fixed it, and the engine started processing as intended.
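This result (more RAM did nothing, more CPUs fixed it) fits the "Timeout in reset to node" error in the log: if node processes are starved of CPU, they answer management commands too late. The sketch below is a simplified, hypothetical model of that failure mode using Python threads and a queue with a timeout. It is not MineMeld's mgmtbus code; the names, delays and timeout values are made up for illustration, and only the error message text mimics the one in the log.

```python
import queue
import threading
import time


def node_worker(name, inbox, replies, work_delay):
    """Hypothetical node: waits for a command, takes work_delay seconds, then acks."""
    command = inbox.get()
    time.sleep(work_delay)  # stands in for a node that is slow because it is CPU-starved
    replies.put((name, command, "ok"))


def send_node_cmd(name, command, inbox, replies, timeout):
    """Send a command and raise RuntimeError if no ack arrives within timeout seconds."""
    inbox.put(command)
    try:
        return replies.get(timeout=timeout)
    except queue.Empty:
        raise RuntimeError("Timeout in %s to node %s" % (command, name))


def reset_node(work_delay, timeout):
    """Start one hypothetical node and send it a 'reset' command."""
    inbox, replies = queue.Queue(), queue.Queue()
    worker = threading.Thread(
        target=node_worker, args=("nodeA", inbox, replies, work_delay), daemon=True)
    worker.start()
    return send_node_cmd("nodeA", "reset", inbox, replies, timeout)
```

`reset_node(0.0, timeout=1.0)` returns `("nodeA", "reset", "ok")`, while `reset_node(0.5, timeout=0.05)` raises `RuntimeError: Timeout in reset to node nodeA`. Adding memory changes nothing in this model, because the bottleneck is how quickly the worker gets to run.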

Hi,

I have an almost identical case. I run MineMeld (standalone) in a virtual machine with 2 CPUs, 6 GB of RAM and a 40 GB disk. My config has 56 miners (mainly YouTube miners and ransomware trackers), 12 aggregators and 37 output nodes. The miners start the job, but when the count reaches roughly 87k-95k indicators, MineMeld stops mining. The miners keep the status "started". The service doesn't restart every few seconds; it just stops mining. Some time later the dashboard shows 0 indicators. Could someone give me some tips or advice?

Thank you in advance.
