Simplify Shard Snapshot Upload Code #48155
Conversation
The code here was needlessly complicated when it enqueued all file uploads up-front. Instead, we can go with a cleaner worker + queue pattern by taking the max-parallelism from the threadpool info. Also, I slightly simplified the rethrow and listener handling (a step listener is pointless when you add the callback in the next line), since I noticed that we were needlessly rethrowing in the same code and that wasn't worth a separate PR.
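For illustration, here is a rough sketch of the worker + queue pattern described above; the names (uploadFiles, uploadOneFile) and signatures are hypothetical, not the actual code from this PR:

// Hedged sketch: enqueue all files, then start at most maxWorkers workers
// (bounded by the thread pool size) that each poll the shared queue until empty.
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;

final class UploadSketch {

    static void uploadFiles(List<String> files, int maxWorkers, ExecutorService executor) {
        final Queue<String> queue = new ConcurrentLinkedQueue<>(files);
        // Never start more workers than there are files or threads available.
        final int workers = Math.min(maxWorkers, files.size());
        for (int i = 0; i < workers; i++) {
            executor.execute(() -> {
                String file;
                // Each worker polls until the queue is drained, so parallelism is
                // naturally capped without enqueueing every upload up-front.
                while ((file = queue.poll()) != null) {
                    uploadOneFile(file);
                }
            });
        }
    }

    private static void uploadOneFile(String file) {
        // placeholder for uploading a single snapshot file
    }
}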
Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)
@tlrx as complained about here #47560 (comment) :)
@@ -309,7 +313,7 @@ public ThreadPoolInfo info() {

     @Override
     public Info info(String name) {
-        throw new UnsupportedOperationException();
+        return infos.computeIfAbsent(name, n -> new Info(n, ThreadPoolType.FIXED, random.nextInt(10) + 1));
Motivation for the map complication here: by using a random size for this pool, we still get the same kind of upload-ordering coverage we had before in the SnapshotResiliencyTests.
            } finally {
                store.decRef();
            }
        } else if (snapshotStatus.isAborted()) {
Should we check this before incrementing the store refcount, and avoid having to open the file and read a first byte to detect that the snapshot got aborted?
I would say it doesn't really matter. With the changed execution from this PR it's very unlikely that another upload worker is started and actually polls a task from the queue for an aborted snapshot, since one of the workers will run into the abort anyway and drain the files queue. I don't think it's worth the additional code and complication.
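To illustrate the point, here is a hedged sketch of such a worker loop (not the PR's actual code; filesQueue, aborted, and upload are illustrative names):

// Sketch: the first worker that observes the abort keeps polling but skips the
// uploads, draining the queue so the other workers find it empty and stop.
import java.util.Queue;
import java.util.concurrent.atomic.AtomicBoolean;

final class AbortDrainSketch {

    static void workerLoop(Queue<String> filesQueue, AtomicBoolean aborted) {
        String file;
        while ((file = filesQueue.poll()) != null) {
            if (aborted.get()) {
                // drain the remaining files instead of uploading them
                continue;
            }
            upload(file);
        }
    }

    private static void upload(String file) {
        // placeholder for uploading a single snapshot file
    }
}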
Ok
LGTM, thanks Armin!
Thanks Tanguy!
* elastic/master:
  [Docs] Fix opType options in IndexRequest API example. (elastic#48290)
  Simplify Shard Snapshot Upload Code (elastic#48155)
  Mute ClassificationIT tests (elastic#48338)
  Reenable azure repository tests and remove some randomization in http servers (elastic#48283)
  Use an env var for the classpath of jar hell task (elastic#48240)
  Refactor FIPS BootstrapChecks to simple checks (elastic#47499)
  Add "format" to "range" queries resulted from optimizing a logical AND (elastic#48073)
  [DOCS][Transform] document limitation regarding rolling upgrade with 7.2, 7.3 (elastic#48118)
  Fail with a better error when if there are no ingest nodes (elastic#48272)
  Fix executing enrich policies stats (elastic#48132)
  Use MultiFileTransfer in CCR remote recovery (elastic#44514)
  Make BytesReference an interface (elastic#48171)
  Also validate source index at put enrich policy time. (elastic#48254)
  Add 'javadoc' task to lifecycle check tasks (elastic#48214)
  Remove option to enable direct buffer pooling (elastic#47956)
  [DOCS] Add 'Selecting gateway and seed nodes' section to CCS docs (elastic#48297)
  Add Enrich Origin (elastic#48098)
  fix incorrect comparison (elastic#48208)