[ML] Starting trained models with a ridiculously large queue_capacity crashes the node #89555
Pinging @elastic/ml-core (Team:ML)
I suspect the same issue exists for the |
dimitris-athanasiou added a commit to dimitris-athanasiou/elasticsearch that referenced this issue Aug 24, 2022
When starting a trained model deployment, a queue is created. If the queue_capacity is too large, it can lead to OOM and a node crash. This commit adds validation that the queue_capacity cannot be more than 1M. Closes elastic#89555
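The validation described in the commit message could look roughly like the sketch below. This is not the code from the actual commit: the class name, method name, and error messages are hypothetical, and the lower bound of 1 is an assumption. Only the upper limit of 1M comes from the commit message.

```java
// Hypothetical sketch of an upper-bound check on queue_capacity.
// Not the actual Elasticsearch implementation.
final class QueueCapacityValidation {

    // Upper bound taken from the commit message ("cannot be more than 1M").
    static final long MAX_QUEUE_CAPACITY = 1_000_000L;

    private QueueCapacityValidation() {}

    // Accepts the requested value as a long so that oversized inputs such as
    // 9999999999999 can be rejected explicitly instead of overflowing an int.
    static int validateQueueCapacity(long requested) {
        // Assumption: a queue must hold at least one request.
        if (requested < 1) {
            throw new IllegalArgumentException(
                "[queue_capacity] must be a positive integer, got [" + requested + "]");
        }
        if (requested > MAX_QUEUE_CAPACITY) {
            throw new IllegalArgumentException(
                "[queue_capacity] must be at most [" + MAX_QUEUE_CAPACITY
                    + "], got [" + requested + "]");
        }
        // Safe narrowing: the value is known to be within [1, 1_000_000].
        return (int) requested;
    }
}
```

With a check like this, a start request with queue_capacity=9999999999999 would be rejected with an error before any queue is created, so the node would never try to allocate it.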
Elasticsearch Version
Found in 8.4.0
Installed Plugins
No response
Java Version
bundled
OS Version
Darwin Kernel Version 21.6.0
Problem Description
Although unlikely, if a user tries to start a trained model deployment with a ridiculously large queue_capacity (for example 9999999999999), the node that tries to allocate the model will immediately crash. Furthermore, the node will continue to crash when restarted, as soon as it attempts to allocate the trained model deployment.
Steps to Reproduce
queue_capacity=9999999999999
Logs (if relevant)
The Elasticsearch instance crashes immediately after logging the attempt to start the trained model deployment. No logs after that.