ML is causing a scale up when it's actually requesting a scale down #74709
Labels
>bug
:Distributed Coordination/Autoscaling
:ml - Machine learning
Team:Distributed (Obsolete) - Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination.
Team:ML - Meta label for the ML team
Comments
elasticmachine added the Team:ML and Team:Distributed (Obsolete) labels on Jun 29, 2021
Pinging @elastic/ml-core (Team:ML)
Pinging @elastic/es-distributed (Team:Distributed)
benwtrent added a commit that referenced this issue on Jun 30, 2021
… and improve scaling size estimations (#74691)
This commit addresses two problems:
- Our memory estimations are not very exact. Consequently, it's possible to request too much or too little by a handful of KBs; while this is not a large issue in ESS, it may be for custom tier sizes.
- When scaling down, it was possible that part of the scale down was actually a scale up! This was due to some floating point rounding errors and poor estimations.
Even though our estimations are better, it is best to NOT request higher resources in a scale down, no matter what.
One of the ways we improve the calculation is during JVM size calculations: instead of having the knot point be `2gb`, it has been changed to `1.2gb`. This accounts for the "window of uncertainty" for JVM sizes.
closes: #74709
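The "never ask for more resources on a scale down" guard described in this commit message can be sketched roughly as follows. This is an illustrative simplification only, not the actual Elasticsearch autoscaling code; the class and method names are hypothetical.

```java
// Hypothetical sketch of the scale-down guard described above; the names here
// are illustrative and do not correspond to the Elasticsearch source.
public final class ScaleDownGuard {

    /**
     * Cap a scale-down request at the current capacity so that a few KBs of
     * estimation/rounding noise can never turn it into an accidental scale up.
     */
    static long capScaleDownRequest(long currentCapacityBytes, long estimatedRequirementBytes) {
        return Math.min(currentCapacityBytes, estimatedRequirementBytes);
    }

    public static void main(String[] args) {
        long current  = 2L * 1024 * 1024 * 1024;  // 2 GB current ML tier
        long estimate = current + 64L * 1024;     // estimate drifts 64 KB above it
        // Without the cap, the larger estimate would be reported as the required
        // capacity; with it, the request never exceeds what is already provisioned.
        System.out.println(capScaleDownRequest(current, estimate)); // 2147483648
    }
}
```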
benwtrent added a commit to benwtrent/elasticsearch that referenced this issue on Jun 30, 2021
… and improve scaling size estimations (elastic#74691)
This commit addresses two problems:
- Our memory estimations are not very exact. Consequently, it's possible to request too much or too little by a handful of KBs; while this is not a large issue in ESS, it may be for custom tier sizes.
- When scaling down, it was possible that part of the scale down was actually a scale up! This was due to some floating point rounding errors and poor estimations.
Even though our estimations are better, it is best to NOT request higher resources in a scale down, no matter what.
One of the ways we improve the calculation is during JVM size calculations: instead of having the knot point be `2gb`, it has been changed to `1.2gb`. This accounts for the "window of uncertainty" for JVM sizes.
closes: elastic#74709
benwtrent added a commit to benwtrent/elasticsearch that referenced this issue on Jun 30, 2021
… and improve scaling size estimations (elastic#74691)
This commit addresses two problems:
- Our memory estimations are not very exact. Consequently, it's possible to request too much or too little by a handful of KBs; while this is not a large issue in ESS, it may be for custom tier sizes.
- When scaling down, it was possible that part of the scale down was actually a scale up! This was due to some floating point rounding errors and poor estimations.
Even though our estimations are better, it is best to NOT request higher resources in a scale down, no matter what.
One of the ways we improve the calculation is during JVM size calculations: instead of having the knot point be `2gb`, it has been changed to `1.2gb`. This accounts for the "window of uncertainty" for JVM sizes.
closes: elastic#74709
benwtrent added a commit that referenced this issue on Jul 1, 2021
…g down and improve scaling size estimations (#74691) (#74780)
* [ML] prevent accidentally asking for more resources when scaling down and improve scaling size estimations (#74691)
This commit addresses two problems:
- Our memory estimations are not very exact. Consequently, it's possible to request too much or too little by a handful of KBs; while this is not a large issue in ESS, it may be for custom tier sizes.
- When scaling down, it was possible that part of the scale down was actually a scale up! This was due to some floating point rounding errors and poor estimations.
Even though our estimations are better, it is best to NOT request higher resources in a scale down, no matter what.
One of the ways we improve the calculation is during JVM size calculations: instead of having the knot point be `2gb`, it has been changed to `1.2gb`. This accounts for the "window of uncertainty" for JVM sizes.
closes: #74709
benwtrent added a commit that referenced this issue on Jul 1, 2021
…ng down and improve scaling size estimations (#74691) (#74781)
* [ML] prevent accidentally asking for more resources when scaling down and improve scaling size estimations (#74691)
This commit addresses two problems:
- Our memory estimations are not very exact. Consequently, it's possible to request too much or too little by a handful of KBs; while this is not a large issue in ESS, it may be for custom tier sizes.
- When scaling down, it was possible that part of the scale down was actually a scale up! This was due to some floating point rounding errors and poor estimations.
Even though our estimations are better, it is best to NOT request higher resources in a scale down, no matter what.
One of the ways we improve the calculation is during JVM size calculations: instead of having the knot point be `2gb`, it has been changed to `1.2gb`. This accounts for the "window of uncertainty" for JVM sizes.
closes: #74709
benwtrent added a commit that referenced this issue on Jul 1, 2021
…ng down and improve scaling size estimations (#74691) (#74782)
* [ML] prevent accidentally asking for more resources when scaling down and improve scaling size estimations (#74691)
This commit addresses two problems:
- Our memory estimations are not very exact. Consequently, it's possible to request too much or too little by a handful of KBs; while this is not a large issue in ESS, it may be for custom tier sizes.
- When scaling down, it was possible that part of the scale down was actually a scale up! This was due to some floating point rounding errors and poor estimations.
Even though our estimations are better, it is best to NOT request higher resources in a scale down, no matter what.
One of the ways we improve the calculation is during JVM size calculations: instead of having the knot point be `2gb`, it has been changed to `1.2gb`. This accounts for the "window of uncertainty" for JVM sizes.
closes: #74709
Issue
Versions: 7.11-7.13
Fixed in: 7.14+
Due to poor estimations, it is possible for a scale down request to accidentally result in a scale up.
Here is a response that epitomizes the scenario:
Note how the current size is actually 2GB (2147483648 bytes), but ML's estimation is off due to rounding values inappropriately (2520765440 bytes). This actually caused a scale up instead of a scale down.
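As a quick worked illustration of those figures (plain Java, not Elasticsearch code; the variable names are made up):

```java
// Illustration of the figures quoted above; not Elasticsearch code,
// and the variable names are hypothetical.
public class EstimateMismatch {
    public static void main(String[] args) {
        long currentTierBytes = 2_147_483_648L; // 2.00 GB: the tier's actual current size
        long mlEstimateBytes  = 2_520_765_440L; // ~2.35 GB: ML's rounded memory requirement

        double gb = 1024.0 * 1024.0 * 1024.0;
        System.out.printf("current  = %.2f GB%n", currentTierBytes / gb);
        System.out.printf("estimate = %.2f GB%n", mlEstimateBytes / gb);

        // Because the "scale down" estimate exceeds the current capacity,
        // the autoscaler has to provision a larger tier: an accidental scale up.
        System.out.println("accidental scale up: " + (mlEstimateBytes > currentTierBytes));
    }
}
```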
Workaround
If you have an Elasticsearch version that suffers from this and the scenario occurs, it is possible to statically set the minimum and maximum autoscaling sizes for ML inside of Elastic Cloud.