Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ML] Write downloaded model parts async #111684

Merged
merged 65 commits into from
Sep 13, 2024
Merged

Conversation

davidkyle
Copy link
Member

@davidkyle davidkyle commented Aug 7, 2024

It has been observed that downloading and installing the built in .elser_model_2 and .multilingual-e5-small models is much slower than expected. The cause is in the ModelImporter class which downloads the model definition in 1MB chunks then blocks as the model part is written to the index.

The download server supports the Range header, to speed up the download and install multiple connections are made to the server each asking for a separate range. A dedicated thread handle downloading and index the parts in each range. 5 connections are used in this PR, reading a 1MB chunk at a time to limit the amount of memory used.

The final part of the model definition must be written last as it causes an index refresh making the full model definition visible, if the refresh occurs before all parts are written and not all the parts are visible then deploying the model will fail. This is achieved by indexing the final part only once all the other streams have completed.

There is a problem with calculating the SHA 256 Message Digest of the downloaded model. For one the MessageDigest is not thread safe, more problematically the model parts are not downloaded sequentially and the resulting digest changes depending on the order in which the parts are downloaded.

@davidkyle davidkyle added >enhancement :ml Machine learning cloud-deploy Publish cloud docker image for Cloud-First-Testing v8.16.0 labels Aug 7, 2024
@elasticsearchmachine
Copy link
Collaborator

Hi @davidkyle, I've created a changelog YAML for you.

@davidkyle
Copy link
Member Author

@elasticmachine update branch

@davidkyle davidkyle marked this pull request as ready for review August 14, 2024 15:59
@elasticsearchmachine elasticsearchmachine added the Team:ML Meta label for the ML team label Aug 14, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/ml-core (Team:ML)

@davidkyle
Copy link
Member Author

@elasticmachine update branch

@davidkyle
Copy link
Member Author

In classic cloud this change has taken the model download & install time down from 30 seconds to [7 - 10] seconds with the total time to download and deploy ELSER optimised at 14 seconds.

In serverless the download & install time is down to 21 seconds and the total time to download and deploy ELSER optimised 31 seconds.

Those severless numbers aren't good enough, I will try another approach

@davidkyle
Copy link
Member Author

@elasticmachine update branch

@davidkyle davidkyle merged commit 13bd6c0 into main Sep 13, 2024
17 checks passed
@davidkyle davidkyle deleted the background-download-write branch September 13, 2024 09:30
davidkyle added a commit to davidkyle/elasticsearch that referenced this pull request Sep 13, 2024
…#111684)

Uses the range header to split the model download into multiple streams
using a separate thread for each stream
elasticsearchmachine pushed a commit that referenced this pull request Sep 13, 2024
#112859)

Uses the range header to split the model download into multiple streams
using a separate thread for each stream
davidkyle added a commit to davidkyle/elasticsearch that referenced this pull request Sep 13, 2024
…#111684)

Uses the range header to split the model download into multiple streams
using a separate thread for each stream
# Conflicts:
#	x-pack/plugin/ml-package-loader/src/main/java/org/elasticsearch/xpack/ml/packageloader/action/TransportLoadTrainedModelPackage.java
#	x-pack/plugin/ml-package-loader/src/test/java/org/elasticsearch/xpack/ml/packageloader/action/TransportLoadTrainedModelPackageTests.java
elasticsearchmachine pushed a commit that referenced this pull request Sep 13, 2024
…112869)

Manual backport of 

- [ML] Downloaded and write model parts using multiple streams (#111684)
davidkyle added a commit to davidkyle/elasticsearch that referenced this pull request Sep 16, 2024
davidkyle added a commit that referenced this pull request Sep 17, 2024
davidkyle added a commit to davidkyle/elasticsearch that referenced this pull request Sep 17, 2024
elasticsearchmachine pushed a commit that referenced this pull request Sep 17, 2024
elasticsearchmachine pushed a commit that referenced this pull request Sep 17, 2024
…csearch.xpack.test.rest.XPackRestIT #111944" (#113037)

…csearch.xpack.test.rest.XPackRestIT #111944"

The tests failed because of
#111684 which has since
been reverted
davidkyle added a commit that referenced this pull request Sep 25, 2024
…2992)

Restores the changes from #111684 which uses multiple streams to improve the
time to download and install the built in ml models. The first iteration has a problem
where the number of in-flight requests was not properly limited which is fixed here.
Additionally there are now circuit breaker checks on allocating the buffer used to 
store the model definition.
davidkyle added a commit to davidkyle/elasticsearch that referenced this pull request Sep 25, 2024
…stic#112992)

Restores the changes from elastic#111684 which uses multiple streams to improve the
time to download and install the built in ml models. The first iteration has a problem
where the number of in-flight requests was not properly limited which is fixed here.
Additionally there are now circuit breaker checks on allocating the buffer used to 
store the model definition.
elasticsearchmachine pushed a commit that referenced this pull request Sep 25, 2024
…2992) (#113514)

Restores the changes from #111684 which uses multiple streams to improve the
time to download and install the built in ml models. The first iteration has a problem
where the number of in-flight requests was not properly limited which is fixed here.
Additionally there are now circuit breaker checks on allocating the buffer used to 
store the model definition.
davidkyle added a commit to davidkyle/elasticsearch that referenced this pull request Sep 27, 2024
…stic#112992)

Restores the changes from elastic#111684 which uses multiple streams to improve the
time to download and install the built in ml models. The first iteration has a problem
where the number of in-flight requests was not properly limited which is fixed here.
Additionally there are now circuit breaker checks on allocating the buffer used to
store the model definition.
# Conflicts:
#	x-pack/plugin/ml-package-loader/src/main/java/org/elasticsearch/xpack/ml/packageloader/action/TransportLoadTrainedModelPackage.java
#	x-pack/plugin/ml-package-loader/src/test/java/org/elasticsearch/xpack/ml/packageloader/action/TransportLoadTrainedModelPackageTests.java
elasticsearchmachine pushed a commit that referenced this pull request Sep 27, 2024
…2992) (#113710)

Restores the changes from #111684 which uses multiple streams to improve the
time to download and install the built in ml models. The first iteration has a problem
where the number of in-flight requests was not properly limited which is fixed here.
Additionally there are now circuit breaker checks on allocating the buffer used to
store the model definition.
# Conflicts:
#	x-pack/plugin/ml-package-loader/src/main/java/org/elasticsearch/xpack/ml/packageloader/action/TransportLoadTrainedModelPackage.java
#	x-pack/plugin/ml-package-loader/src/test/java/org/elasticsearch/xpack/ml/packageloader/action/TransportLoadTrainedModelPackageTests.java
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport pending cloud-deploy Publish cloud docker image for Cloud-First-Testing >enhancement :ml Machine learning Team:ML Meta label for the ML team v8.15.2 v8.16.0 v9.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.