[ML] Write downloaded model parts async #111684

davidkyle · 2024-08-07T16:55:45Z

Downloading and installing the built-in .elser_model_2 and .multilingual-e5-small models has been observed to be much slower than expected. The cause is in the ModelImporter class, which downloads the model definition in 1MB chunks and then blocks while each model part is written to the index.

The download server supports the Range header. To speed up the download and install, multiple connections are made to the server, each requesting a separate range. A dedicated thread handles downloading and indexing the parts in each range. This PR uses 5 connections, each reading a 1MB chunk at a time to limit the amount of memory used.
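
As a rough illustration of the range splitting described above (a sketch, not the code in this PR), the model size can be divided into contiguous byte ranges aligned on chunk boundaries, one per connection. The class `RangeSplitter`, the record `ByteRange` and the method `split` are hypothetical names introduced only for this example.

```java
import java.util.ArrayList;
import java.util.List;

public class RangeSplitter {
    /** An inclusive byte range, in the form used by an HTTP Range header. */
    public record ByteRange(long start, long end) {
        public String headerValue() {
            return "bytes=" + start + "-" + end;
        }
    }

    /**
     * Splits [0, totalBytes) into at most numStreams contiguous ranges whose
     * boundaries fall on multiples of chunkSize, so each stream reads whole chunks.
     */
    public static List<ByteRange> split(long totalBytes, int numStreams, long chunkSize) {
        long totalChunks = (totalBytes + chunkSize - 1) / chunkSize; // ceiling division
        long chunksPerStream = totalChunks / numStreams;
        long remainder = totalChunks % numStreams;
        List<ByteRange> ranges = new ArrayList<>();
        long startChunk = 0;
        for (int i = 0; i < numStreams && startChunk < totalChunks; i++) {
            // The first `remainder` streams take one extra chunk each.
            long chunks = chunksPerStream + (i < remainder ? 1 : 0);
            long start = startChunk * chunkSize;
            long end = Math.min((startChunk + chunks) * chunkSize, totalBytes) - 1;
            ranges.add(new ByteRange(start, end));
            startChunk += chunks;
        }
        return ranges;
    }

    public static void main(String[] args) {
        long oneMb = 1024 * 1024;
        // Hypothetical model size: 12MB plus a partial final chunk, 5 streams.
        for (ByteRange range : split(12 * oneMb + 100, 5, oneMb)) {
            System.out.println(range.headerValue());
        }
    }
}
```

Each `headerValue()` would be sent as the `Range` header on its own connection, and the stream handling that range then reads and indexes one chunk at a time.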

The final part of the model definition must be written last because it triggers an index refresh that makes the full model definition visible. If the refresh occurs before all parts are written, some parts will not be visible and deploying the model will fail. This is achieved by indexing the final part only once all the other streams have completed.
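
The ordering constraint can be sketched as follows. This is only an illustration: the commits on this PR mention a ref counting listener, whereas the example below swaps in `CompletableFuture.allOf` from the JDK as a stand-in, and the list `writeOrder` stands in for the index. `FinalPartLast` and `writeAll` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class FinalPartLast {
    /**
     * Writes parts 0..totalParts-2 concurrently, then writes the last part only
     * after every other write has completed. Returns the order of the writes.
     */
    public static List<Integer> writeAll(int totalParts, int threads) {
        if (totalParts < 1) {
            throw new IllegalArgumentException("totalParts must be at least 1");
        }
        List<Integer> writeOrder = new CopyOnWriteArrayList<>(); // stand-in for the index
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<CompletableFuture<Void>> others = new ArrayList<>();
            for (int part = 0; part < totalParts - 1; part++) {
                final int p = part;
                others.add(CompletableFuture.runAsync(() -> writeOrder.add(p), pool));
            }
            // The final part (which would trigger the refresh) runs only once
            // all of the other writes are done.
            CompletableFuture.allOf(others.toArray(new CompletableFuture[0]))
                .thenRun(() -> writeOrder.add(totalParts - 1))
                .join();
        } finally {
            pool.shutdown();
        }
        return writeOrder;
    }

    public static void main(String[] args) {
        System.out.println(writeAll(8, 4)); // the last element is always 7
    }
}
```

The other parts may land in any order; only the position of the final part is fixed.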

There is a problem with calculating the SHA-256 message digest of the downloaded model. For one, MessageDigest is not thread safe. More problematically, the model parts are not downloaded sequentially, and the resulting digest changes depending on the order in which the parts are downloaded.
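
A small standalone example of the order dependence, using only `java.security.MessageDigest` from the JDK (`DigestOrder` and `sha256` are hypothetical names for this sketch): feeding the same parts in a different order yields a different digest, while splitting the same byte sequence at different points does not.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.List;

public class DigestOrder {
    /** Returns the hex SHA-256 of the parts, fed to the digest in list order. */
    public static String sha256(List<byte[]> partsInOrder) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            for (byte[] part : partsInOrder) {
                digest.update(part);
            }
            return HexFormat.of().formatHex(digest.digest());
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] first = "part-one".getBytes(StandardCharsets.UTF_8);
        byte[] second = "part-two".getBytes(StandardCharsets.UTF_8);
        String inOrder = sha256(List.of(first, second));
        String swapped = sha256(List.of(second, first));
        System.out.println(inOrder.equals(swapped)); // false: order changes the digest
    }
}
```

So any digest check over parts downloaded by multiple streams would need the parts fed to a single digest in their original sequence, by one thread at a time.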

elasticsearchmachine · 2024-08-07T16:56:09Z

Hi @davidkyle, I've created a changelog YAML for you.

davidkyle · 2024-08-14T15:35:29Z

@elasticmachine update branch

elasticsearchmachine · 2024-08-14T16:00:11Z

Pinging @elastic/ml-core (Team:ML)

davidkyle · 2024-08-21T19:41:49Z

@elasticmachine update branch

davidkyle · 2024-08-22T12:42:38Z

In classic cloud this change has reduced the model download & install time from 30 seconds to 7-10 seconds, with the total time to download and deploy ELSER optimised at 14 seconds.

In serverless the download & install time is down to 21 seconds, and the total time to download and deploy ELSER optimised is 31 seconds.

Those serverless numbers aren't good enough; I will try another approach.

davidkyle · 2024-09-13T08:08:07Z

@elasticmachine update branch

…csearch.xpack.test.rest.XPackRestIT #111944" (#113037) …csearch.xpack.test.rest.XPackRestIT #111944" The tests failed because of #111684 which has since been reverted

…2992) Restores the changes from #111684, which uses multiple streams to reduce the time to download and install the built-in ML models. The first iteration had a problem where the number of in-flight requests was not properly limited; that is fixed here. Additionally, there are now circuit breaker checks when allocating the buffer used to store the model definition.
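
One common way to bound in-flight requests is a counting semaphore: the producer acquires a permit before handing off each write and the writer releases it on completion. The sketch below illustrates that general technique only and is not taken from #112992. `InFlightLimiter` and `run` are hypothetical names, and the short sleep stands in for an indexing request.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

public class InFlightLimiter {
    /** Runs totalRequests simulated writes and returns the peak number in flight. */
    public static int run(int totalRequests, int maxInFlight) {
        Semaphore permits = new Semaphore(maxInFlight);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger maxObserved = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(totalRequests);
        ExecutorService writers = Executors.newCachedThreadPool();
        try {
            for (int i = 0; i < totalRequests; i++) {
                permits.acquire(); // the producer blocks here once the limit is reached
                writers.execute(() -> {
                    try {
                        int now = inFlight.incrementAndGet();
                        maxObserved.accumulateAndGet(now, Math::max);
                        Thread.sleep(2); // simulated indexing request
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        inFlight.decrementAndGet();
                        permits.release(); // lets the producer submit the next part
                        done.countDown();
                    }
                });
            }
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for writes", e);
        } finally {
            writers.shutdown();
        }
        return maxObserved.get();
    }

    public static void main(String[] args) {
        System.out.println(run(50, 5) <= 5); // true
    }
}
```

Because the permit is acquired before the write is submitted, the number of downloaded chunks held in memory is bounded by the same limit.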

Write model parts async

2e854e4

davidkyle added >enhancement :ml Machine learning cloud-deploy Publish cloud docker image for Cloud-First-Testing v8.16.0 labels Aug 7, 2024

Update docs/changelog/111684.yaml

125c822

jimczi reviewed Aug 8, 2024

davidkyle added 5 commits August 14, 2024 13:16

Pass a listener to import

be39082

Ref counting WIP

5310a08

use ref counting listener

3a57e14

tidying

56f5e1c

add tests

9f98dbe

Merge branch 'main' into background-download-write

e434499

davidkyle marked this pull request as ready for review August 14, 2024 15:59

elasticsearchmachine added the Team:ML Meta label for the ML team label Aug 14, 2024

davidkyle added 3 commits August 19, 2024 17:15

Add download threadpool

9d9a0e6

less blocking

669909b

tidy up

fcc66b4

jimczi reviewed Aug 20, 2024

more tests

1921812

elasticmachine and others added 4 commits August 22, 2024 05:41

Merge branch 'main' into background-download-write

63bdff6

remove unused

84aec83

fix the tests

c4ea146

5 in flight requests

413a3ab

use another threadpool for writes

70ffe07

Merge branch 'main' into background-download-write

81ce3f1

davidkyle added v8.16.0 v8.15.2 auto-backport-and-merge labels Sep 13, 2024

davidkyle merged commit 13bd6c0 into main Sep 13, 2024
17 checks passed

davidkyle deleted the background-download-write branch September 13, 2024 09:30

davidkyle mentioned this pull request Sep 13, 2024

[8.x] [ML] Downloaded and write model parts using multiple streams (#111684) #112859

Merged

elasticsearchmachine added the backport pending label Sep 13, 2024

elasticsearchmachine pushed a commit that referenced this pull request Sep 13, 2024

[ML] Downloaded and write model parts using multiple streams (#111684) (

4fe2851

#112859) Uses the range header to split the model download into multiple streams using a separate thread for each stream

davidkyle mentioned this pull request Sep 13, 2024

[8.15] [ML] Downloaded and write model parts using multiple streams #112869

Merged

elasticsearchmachine pushed a commit that referenced this pull request Sep 13, 2024

[8.15] [ML] Downloaded and write model parts using multiple streams (#…

f48a1c6

…112869) Manual backport of - [ML] Downloaded and write model parts using multiple streams (#111684)

davidkyle added a commit to davidkyle/elasticsearch that referenced this pull request Sep 16, 2024

Revert "[ML] Downloaded and write model parts using multiple streams (e…

5fc5943

…lastic#111684)" This reverts commit 13bd6c0.

davidkyle mentioned this pull request Sep 16, 2024

Revert "[ML] Downloaded and write model parts using multiple streams … #112961

Merged

elasticsearchmachine pushed a commit that referenced this pull request Sep 16, 2024

Revert "[ML] Downloaded and write model parts using multiple streams (#…

7e04d8e

…111684)" (#112961) …(#111684)" This reverts commit 13bd6c0.

davidkyle added a commit that referenced this pull request Sep 17, 2024

Revert "Revert "[ML] Downloaded and write model parts using multiple …

85375f6

…streams (#111684)" (#112961)" This reverts commit 7e04d8e.

davidkyle mentioned this pull request Sep 17, 2024

[ML] Limit in flight requests when indexing model download parts #112992

Merged

davidkyle added a commit to davidkyle/elasticsearch that referenced this pull request Sep 17, 2024

Revert "[ML] Downloaded and write model parts using multiple streams (e…

0ba98b5

…lastic#111684) (elastic#112859)" This reverts commit 4fe2851.

This was referenced Sep 17, 2024

Revert "[ML] Downloaded and write model parts using multiple streams … #113016

Merged

[ML] Revert "Mute org.elasticsearch.xpack.test.rest.XPackRestIT org.elasti… #113021

Merged

elasticsearchmachine pushed a commit that referenced this pull request Sep 17, 2024

Revert "[ML] Downloaded and write model parts using multiple streams (#…

afad2ee

…111684) (#112859)" (#113016) …(#111684) (#112859)" This reverts commit 4fe2851. Manual backport of #112961

davidkyle mentioned this pull request Sep 17, 2024

[ML} Revert "Mute org.elasticsearch.xpack.test.rest.XPackRestIT org.elasti… #113037

Merged
