[ML] Write downloaded model parts async #111684
Hi @davidkyle, I've created a changelog YAML for you.
Pinging @elastic/ml-core (Team:ML)
In classic cloud this change has taken the model download and install time down from 30 seconds to 7 to 10 seconds, with the total time to download and deploy ELSER optimised at 14 seconds. In serverless the download and install time is down to 21 seconds, and the total time to download and deploy ELSER optimised is 31 seconds. Those serverless numbers aren't good enough; I will try another approach.
…#111684) Uses the range header to split the model download into multiple streams using a separate thread for each stream
#112859) Uses the range header to split the model download into multiple streams using a separate thread for each stream
…lastic#111684)" This reverts commit 13bd6c0.
…lastic#111684) (elastic#112859)" This reverts commit 4fe2851.
…2992) Restores the changes from #111684, which uses multiple streams to improve the time to download and install the built-in ML models. The first iteration had a problem where the number of in-flight requests was not properly limited, which is fixed here. Additionally, there are now circuit breaker checks on allocating the buffer used to store the model definition.
…2992) (#113514) Restores the changes from #111684, which uses multiple streams to improve the time to download and install the built-in ML models. The first iteration had a problem where the number of in-flight requests was not properly limited, which is fixed here. Additionally, there are now circuit breaker checks on allocating the buffer used to store the model definition.
…2992) (#113710) Restores the changes from #111684, which uses multiple streams to improve the time to download and install the built-in ML models. The first iteration had a problem where the number of in-flight requests was not properly limited, which is fixed here. Additionally, there are now circuit breaker checks on allocating the buffer used to store the model definition. # Conflicts: # x-pack/plugin/ml-package-loader/src/main/java/org/elasticsearch/xpack/ml/packageloader/action/TransportLoadTrainedModelPackage.java # x-pack/plugin/ml-package-loader/src/test/java/org/elasticsearch/xpack/ml/packageloader/action/TransportLoadTrainedModelPackageTests.java
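The restore commits say the first iteration did not properly limit the number of in-flight requests. One common way to bound concurrency in Java is a `java.util.concurrent.Semaphore` with one permit per allowed request; the sketch below assumes that technique and is not necessarily how the PR does it. `BoundedRequests`, `fetchPart` and `fetchAll` are hypothetical names, `fetchPart` only simulates a request with a short sleep, and the circuit breaker checks are not modelled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

final class BoundedRequests {

    private static final AtomicInteger inFlight = new AtomicInteger();
    private static final AtomicInteger maxObserved = new AtomicInteger();

    // Hypothetical stand-in for one ranged HTTP request plus the index write.
    private static void fetchPart(int part) throws InterruptedException {
        int now = inFlight.incrementAndGet();
        maxObserved.accumulateAndGet(now, Math::max);
        try {
            Thread.sleep(5); // simulated network latency
        } finally {
            inFlight.decrementAndGet();
        }
    }

    /** Fetches all parts with at most maxInFlight outstanding; returns the peak concurrency seen. */
    static int fetchAll(int parts, int maxInFlight) {
        maxObserved.set(0);
        Semaphore permits = new Semaphore(maxInFlight);
        ExecutorService pool = Executors.newCachedThreadPool();
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int p = 0; p < parts; p++) {
                int part = p;
                // Blocks the submitting thread while maxInFlight requests are outstanding.
                permits.acquire();
                futures.add(pool.submit(() -> {
                    try {
                        fetchPart(part);
                    } finally {
                        permits.release(); // free the slot for the next request
                    }
                    return null;
                }));
            }
            for (Future<?> f : futures) {
                f.get(); // propagate any failure from a request
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return maxObserved.get();
    }

    public static void main(String[] args) {
        System.out.println("peak in-flight: " + fetchAll(20, 5));
    }
}
```

Acquiring the permit before submitting, rather than inside the task, also stops the submitting loop from queueing unbounded work when requests are slow.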
It has been observed that downloading and installing the built-in `.elser_model_2` and `.multilingual-e5-small` models is much slower than expected. The cause is in the `ModelImporter` class, which downloads the model definition in 1MB chunks and then blocks while each model part is written to the index.

The download server supports the `Range` header, so to speed up the download and install, multiple connections are made to the server, each requesting a separate range. A dedicated thread handles downloading and indexing the parts in each range. This PR uses 5 connections, each reading one 1MB chunk at a time to limit the amount of memory used.
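A minimal sketch of splitting the download across streams, assuming the file size is known up front and that each stream is given a contiguous run of whole 1MB parts. `RangeSplitter` and `ByteRange` are hypothetical names; the HTTP request itself is omitted and only the `Range` header values (`bytes=start-end`, with an inclusive end) are computed.

```java
import java.util.ArrayList;
import java.util.List;

final class RangeSplitter {

    /** An inclusive byte range plus the model parts it covers. */
    record ByteRange(long start, long end, int firstPart, int partCount) {
        String rangeHeader() {
            return "bytes=" + start + "-" + end;
        }
    }

    static List<ByteRange> split(long totalSize, int chunkSize, int numStreams) {
        if (totalSize <= 0 || chunkSize <= 0 || numStreams <= 0) {
            throw new IllegalArgumentException("sizes and stream count must be positive");
        }
        // Round up so a short trailing part is still counted.
        int totalParts = (int) ((totalSize + chunkSize - 1) / chunkSize);
        int streams = Math.min(numStreams, totalParts);
        int base = totalParts / streams;
        int remainder = totalParts % streams;

        List<ByteRange> ranges = new ArrayList<>(streams);
        int part = 0;
        for (int i = 0; i < streams; i++) {
            // Spread the remainder over the first streams so counts differ by at most one.
            int count = base + (i < remainder ? 1 : 0);
            long start = (long) part * chunkSize;
            long end = Math.min((long) (part + count) * chunkSize, totalSize) - 1;
            ranges.add(new ByteRange(start, end, part, count));
            part += count;
        }
        return ranges;
    }

    public static void main(String[] args) {
        int oneMb = 1024 * 1024;
        // An 11.5MB file split across 5 streams of 1MB parts.
        for (ByteRange r : split(11L * oneMb + oneMb / 2, oneMb, 5)) {
            System.out.println(r.rangeHeader() + " -> " + r.partCount() + " parts from part " + r.firstPart());
        }
    }
}
```

With the 5 streams used in this PR, an 11.5MB file has 12 parts, so the first two streams fetch 3 parts each and the remaining three fetch 2.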
The final part of the model definition must be written last because writing it triggers an index refresh that makes the full model definition visible. If the refresh occurs before all the other parts are written, not every part is visible and deploying the model fails. To guarantee the ordering, the final part is indexed only once all the other streams have completed.
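The ordering constraint can be sketched with `CompletableFuture.allOf`: every stream indexes its own parts concurrently, and the final part is written in a continuation that runs only after all streams have completed. This is an illustration under assumptions, not the PR's code; `FinalPartLast`, `indexPart` and the in-memory `written` list are hypothetical stand-ins for the real index writes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class FinalPartLast {

    // Hypothetical stand-in for the index: records the order in which parts are written.
    static final List<Integer> written = Collections.synchronizedList(new ArrayList<>());

    static void indexPart(int part) {
        written.add(part);
    }

    static void importModel(int totalParts, int numStreams) {
        int finalPart = totalParts - 1;
        // Parts 0..finalPart-1 are shared out in contiguous runs; the final part is held back.
        int perStream = (finalPart + numStreams - 1) / numStreams;
        ExecutorService pool = Executors.newFixedThreadPool(numStreams);
        try {
            List<CompletableFuture<Void>> streams = new ArrayList<>();
            for (int s = 0; s < numStreams; s++) {
                int from = s * perStream;
                int to = Math.min(from + perStream, finalPart);
                streams.add(CompletableFuture.runAsync(() -> {
                    for (int p = from; p < to; p++) {
                        indexPart(p);
                    }
                }, pool));
            }
            // The final part (whose write triggers the refresh) runs only after every stream completes.
            CompletableFuture.allOf(streams.toArray(new CompletableFuture<?>[0]))
                .thenRun(() -> indexPart(finalPart))
                .join();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        importModel(12, 5);
        System.out.println(written);
    }
}
```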
There is a problem with calculating the SHA-256 message digest of the downloaded model. First, `MessageDigest` is not thread safe. More problematically, the model parts are not downloaded sequentially, so the resulting digest changes depending on the order in which the parts are downloaded.
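The order dependence is easy to demonstrate with the JDK's `MessageDigest`: feeding the same parts in a different order produces a different SHA-256 digest, while feeding them in part order matches the digest of the whole file. The sketch creates a fresh `MessageDigest` per call because a single instance must not be updated from several threads at once. `DigestOrder` and `sha256` are hypothetical helpers (Java 17+ for `HexFormat`).

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

final class DigestOrder {

    // Hashes the parts in the order given, using a fresh (unshared) MessageDigest.
    static String sha256(byte[]... parts) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            for (byte[] part : parts) {
                md.update(part);
            }
            return HexFormat.of().formatHex(md.digest());
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-256.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] part0 = "part-0".getBytes(StandardCharsets.UTF_8);
        byte[] part1 = "part-1".getBytes(StandardCharsets.UTF_8);
        byte[] whole = "part-0part-1".getBytes(StandardCharsets.UTF_8);

        System.out.println(sha256(part0, part1).equals(sha256(whole))); // true
        System.out.println(sha256(part1, part0).equals(sha256(whole))); // false
    }
}
```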