08-10-2022 04:22 AM
Hello!
Machine Learning analysis is failing. In the GUI, the status shown is "Failed". In the /tmp/error_SecRulesLearn logs I see this error:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 26 in stage 89.0 failed 1 times, most recent failure: Lost task 26.0 in stage 89.0 (TID 6323, localhost, executor driver): java.io.FileNotFoundException: /tmp/blockmgr-9a58c978-4463-4f0c-a09d-e4bcc41c3691/25/temp_shuffle_1ae06377-b4dd-4d10-a389-7e656f2e5fd5 (Too many open files)
Has anybody seen this error? Please help.
Thank you and regards,
Maja
08-11-2022 03:18 AM
Hi,
I have not experienced this issue before.
However, judging by the error description, the process seems to have too many files open, and the open-files limit (ulimit, going technical here) may have been exceeded.
To reduce the number of open files, one thing you could try is decreasing the number of executors. Expedition uses Spark, and you can reduce the number of executors by lowering the number of CPUs that Expedition assigns to Spark.
You can find this value in /home/userSpace/environmentParameters.php
Try decreasing the number of CPUs in use (NumCPUs) a little and run the analysis again.
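If you want to confirm that the open-files limit really is the problem before changing anything, here is a minimal sketch in Python that compares a process's open file count with its limit. It assumes a Linux host and reads the standard /proc/<pid>/fd and /proc/<pid>/limits entries; the function names are only illustrative and are not part of Expedition, and you would need to find the PID of the Spark (Java) process yourself, for example with ps.

```python
import os


def parse_nofile_limits(limits_text):
    """Return (soft, hard) from the 'Max open files' row of /proc/<pid>/limits.

    Values are returned as strings because either one may be 'unlimited'.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Row layout: Max open files <soft> <hard> files
            soft, hard = line.split()[3:5]
            return soft, hard
    raise ValueError("no 'Max open files' line found")


def count_open_fds(pid="self"):
    """Count open file descriptors: one entry per descriptor in /proc/<pid>/fd."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def report(pid="self"):
    """Build a one-line summary of open files versus limits for a process."""
    with open(f"/proc/{pid}/limits") as fh:
        soft, hard = parse_nofile_limits(fh.read())
    return (f"pid {pid}: {count_open_fds(pid)} open files, "
            f"soft limit {soft}, hard limit {hard}")
```

Calling print(report("12345")) while the analysis is running (12345 being a placeholder PID) shows how close the process gets to its soft limit. Inspecting a process owned by another user usually requires root.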
I hope this helps
08-13-2022 03:03 AM
Hello!
I decreased the number of CPUs from 51 to 40, and now Machine Learning is working. Thank you very much for your help.
Regards,
Maja