Single Incident/Playbook is killing the whole platform


L2 Linker

Hi,

 

I built a playbook that pulls some nested data (~8 MB in total), which is then used in a looped subplaybook for additional data extraction. The subplaybook is relatively simple: it uses a for each input loop, starts with deletecontext (all=yes), and returns a couple of small parameters through Outputs. It is supposed to run ~600 iterations in total. The subplaybook also has some built-in JSON parsing and some regex conditions.

Once the subplaybook kicks in, the platform at some point slows down significantly and the console becomes unresponsive. Even the SSH shell becomes extremely slow: it took about 5 minutes to log in, and each character I type takes a while to appear on the terminal. At some point I was able to load workers/status, which showed only a few tasks running (including the troubled one); more than half of the workers were still available. The System Diagnostics page (which also took ages to load) showed 80% CPU and 80% memory use (normally CPU is under 10% and memory under 60%). The top command on the backend also showed quite significant CPU usage. The playbook is mostly built on defaults, with not much customized.

 

Eventually SSH stopped responding. After an hour of waiting I cold-rebooted the server and then closed the incident before it could continue.

 

What could be causing this behavior? I believe I have much more complex and intensive playbooks that do not cause such issues. Where should I look for clues?

How is it possible that a single task/playbook kills the whole platform? Why does it take so many resources from the host? Shouldn't resource allocation be restricted to a worker? Isn't that the goal of having podman/docker?

 

I do not have a test environment to experiment with the playbook, so replicating the issue is quite costly.
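One way to reason about the load without touching the platform is a rough standalone Python sketch of the same shape of work: a nested JSON payload, ~600 loop iterations, a JSON parse and a regex match per iteration. It compares re-parsing the full payload on every iteration with parsing it once, and reports time and peak memory. Everything here is an assumption for illustration: the payload shape, the field names (`items`, `meta`, `name`), the regex, and the helper names are hypothetical, and the script does not model how the platform itself handles context or loop inputs.

```python
import json
import re
import time
import tracemalloc

# Hypothetical pattern standing in for the subplaybook's regex conditions.
PATTERN = re.compile(r"^host-(\d+)$")


def build_payload(n_items=600, blob_size=100):
    """Build a made-up nested structure; the real data shape is unknown."""
    return {
        "items": [
            {"id": i, "meta": {"name": f"host-{i}", "notes": "x" * blob_size}}
            for i in range(n_items)
        ]
    }


def extract(item):
    """Return the couple of small values one iteration would hand back."""
    match = PATTERN.match(item["meta"]["name"])
    return {
        "id": item["id"],
        "host_number": int(match.group(1)) if match else None,
    }


def parse_every_iteration(raw, n_items):
    """Each iteration re-parses the whole JSON string, then reads one item."""
    results = []
    for i in range(n_items):
        data = json.loads(raw)
        results.append(extract(data["items"][i]))
    return results


def parse_once(raw):
    """Parse the JSON string a single time and walk the items."""
    data = json.loads(raw)
    return [extract(item) for item in data["items"]]


def measure(fn, *args):
    """Run fn and return (result, elapsed seconds, peak traced bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    result = fn(*args)
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, elapsed, peak


if __name__ == "__main__":
    raw = json.dumps(build_payload())
    for label, fn, args in (
        ("parse every iteration", parse_every_iteration, (raw, 600)),
        ("parse once", parse_once, (raw,)),
    ):
        _, elapsed, peak = measure(fn, *args)
        print(f"{label}: {elapsed:.3f} s, peak {peak / 1e6:.1f} MB")
```

Raising `blob_size` until the serialized payload approaches 8 MB gives a feel for how the per-iteration cost scales. If the real subplaybook receives or re-parses the full data set on each of its ~600 iterations, that pattern would be a candidate to investigate, though this sketch cannot confirm it.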

 

I'd appreciate any ideas.

 

Thanks,

Antanas

Curious Fellow
1 REPLY

L2 Linker

Funnily enough, Support suggested closing the case I raised for this and posting the question here 🙂

Curious Fellow