Simplify Shard Snapshot Upload Code #48155
Conversation
The code here was needlessly complicated when it enqueued all file uploads up-front. Instead, we can go with a cleaner worker + queue pattern by taking the max-parallelism from the threadpool info. Also, I slightly simplified the rethrow and listener handling (a step listener is pointless when you add the callback in the next line), since I noticed that we were needlessly rethrowing in the same code and that wasn't worth a separate PR.
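For illustration, here is a rough sketch of the worker + queue pattern described above; the names (uploadFiles, uploadOneFile) and signatures are hypothetical, not the actual code from this PR:

// Hedged sketch: enqueue all files, then start at most maxWorkers workers
// (bounded by the thread pool size) that each poll the shared queue until empty.
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;

final class UploadSketch {

    static void uploadFiles(List<String> files, int maxWorkers, ExecutorService executor) {
        final Queue<String> queue = new ConcurrentLinkedQueue<>(files);
        // Never start more workers than there are files or threads available.
        final int workers = Math.min(maxWorkers, files.size());
        for (int i = 0; i < workers; i++) {
            executor.execute(() -> {
                String file;
                // Each worker polls until the queue is drained, so parallelism is
                // naturally capped without enqueueing every upload up-front.
                while ((file = queue.poll()) != null) {
                    uploadOneFile(file);
                }
            });
        }
    }

    private static void uploadOneFile(String file) {
        // placeholder for uploading a single snapshot file
    }
}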
Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)
@tlrx as complained about here #47560 (comment) :)
@@ -309,7 +313,7 @@ public ThreadPoolInfo info() {

     @Override
     public Info info(String name) {
-        throw new UnsupportedOperationException();
+        return infos.computeIfAbsent(name, n -> new Info(n, ThreadPoolType.FIXED, random.nextInt(10) + 1));
Motivation for the map complication here: by using a random size for this pool, we still get the same kind of upload-ordering coverage we had before in the SnapshotResiliencyTests.
            } finally {
                store.decRef();
            }
        } else if (snapshotStatus.isAborted()) {
Should we check this before incrementing the store refcount, and avoid having to open the file and read a first byte to detect that the snapshot got aborted?
I would say it doesn't really matter. With the changed execution from this PR it's very unlikely that another upload worker is started and actually polls a task from the queue for an aborted snapshot, since one of the workers will run into the abort anyway and drain the files queue. I don't think it's worth the additional code and complication.
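To illustrate the point, here is a hedged sketch of such a worker loop (not the PR's actual code; filesQueue, aborted, and upload are illustrative names):

// Sketch: the first worker that observes the abort keeps polling but skips the
// uploads, draining the queue so the other workers find it empty and stop.
import java.util.Queue;
import java.util.concurrent.atomic.AtomicBoolean;

final class AbortDrainSketch {

    static void workerLoop(Queue<String> filesQueue, AtomicBoolean aborted) {
        String file;
        while ((file = filesQueue.poll()) != null) {
            if (aborted.get()) {
                // drain the remaining files instead of uploading them
                continue;
            }
            upload(file);
        }
    }

    private static void upload(String file) {
        // placeholder for uploading a single snapshot file
    }
}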
Ok
LGTM, thanks Armin!
Thanks Tanguy!
* elastic/master:
  [Docs] Fix opType options in IndexRequest API example. (elastic#48290)
  Simplify Shard Snapshot Upload Code (elastic#48155)
  Mute ClassificationIT tests (elastic#48338)
  Reenable azure repository tests and remove some randomization in http servers (elastic#48283)
  Use an env var for the classpath of jar hell task (elastic#48240)
  Refactor FIPS BootstrapChecks to simple checks (elastic#47499)
  Add "format" to "range" queries resulted from optimizing a logical AND (elastic#48073)
  [DOCS][Transform] document limitation regarding rolling upgrade with 7.2, 7.3 (elastic#48118)
  Fail with a better error when if there are no ingest nodes (elastic#48272)
  Fix executing enrich policies stats (elastic#48132)
  Use MultiFileTransfer in CCR remote recovery (elastic#44514)
  Make BytesReference an interface (elastic#48171)
  Also validate source index at put enrich policy time. (elastic#48254)
  Add 'javadoc' task to lifecycle check tasks (elastic#48214)
  Remove option to enable direct buffer pooling (elastic#47956)
  [DOCS] Add 'Selecting gateway and seed nodes' section to CCS docs (elastic#48297)
  Add Enrich Origin (elastic#48098)
  fix incorrect comparison (elastic#48208)