
Simplify Shard Snapshot Upload Code #48155

Conversation

original-brownbear
Member

The code here was needlessly complicated because it enqueued all file uploads up-front. Instead, we can use a cleaner worker + queue pattern here by taking the max parallelism from the thread pool info.

Also, I slightly simplified the rethrow and listener handling (a step listener is pointless when you add the callback on the next line), since I noticed that we were needlessly rethrowing in the same code and that wasn't worth a separate PR.

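For illustration only, a minimal sketch of the worker + queue pattern described above. The names (uploadFiles, uploadOneFile) and types are assumptions for the sketch, not the actual BlobStoreRepository implementation:

import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executor;

final class SnapshotUploadSketch {

    // Instead of submitting one task per file up-front, put the files in a shared
    // queue and start at most maxParallelism workers that drain it.
    static void uploadFiles(List<String> files, Executor executor, int maxParallelism) {
        final Queue<String> queue = new ConcurrentLinkedQueue<>(files);
        final int workers = Math.min(maxParallelism, files.size());
        for (int i = 0; i < workers; i++) {
            executor.execute(() -> {
                String file;
                // Each worker keeps polling until the queue is drained.
                while ((file = queue.poll()) != null) {
                    uploadOneFile(file);
                }
            });
        }
    }

    private static void uploadOneFile(String file) {
        // Placeholder for the actual blob store upload.
        System.out.println("uploading " + file);
    }
}

Here the number of workers is capped by the max parallelism taken from the thread pool info, so no more than that many uploads are ever in flight, while the queue preserves the remaining work.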
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)

@original-brownbear
Member Author

@tlrx as complained about here #47560 (comment) :)

@original-brownbear original-brownbear removed the request for review from tlrx October 16, 2019 18:21
@@ -309,7 +313,7 @@ public ThreadPoolInfo info() {

     @Override
     public Info info(String name) {
-        throw new UnsupportedOperationException();
+        return infos.computeIfAbsent(name, n -> new Info(n, ThreadPoolType.FIXED, random.nextInt(10) + 1));
Member Author


Motivation for the map complication here: by using a random size for this pool we still get the same kind of upload-ordering coverage in the SnapshotResiliencyTests that we had before.

    } finally {
        store.decRef();
    }
} else if (snapshotStatus.isAborted()) {
Member


Should we check this before incrementing the store refcount, and avoid having to open the file and read a first byte to detect that the snapshot was aborted?

Member Author


I would say it doesn't really matter. With the changed execution in this PR it's very unlikely that another upload worker is started and actually polls a task from the queue for an aborted snapshot, since one of the workers will run into the abort anyway and drain the files queue. I don't think it's worth the additional code and complication.
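For illustration, a minimal sketch of the "run into the abort and drain the files queue" behaviour described above; the aborted flag and uploadOneFile are assumed stand-ins for this sketch, not the actual implementation:

import java.util.Queue;
import java.util.concurrent.atomic.AtomicBoolean;

final class AbortDrainSketch {

    // Hypothetical worker loop: once a worker observes the abort flag, it clears
    // the shared queue so no further uploads are started by any worker.
    static void uploadUntilDrainedOrAborted(Queue<String> queue, AtomicBoolean aborted) {
        String file;
        while ((file = queue.poll()) != null) {
            if (aborted.get()) {
                queue.clear(); // remaining workers find the queue empty and exit
                return;
            }
            uploadOneFile(file);
        }
    }

    private static void uploadOneFile(String file) {
        System.out.println("uploading " + file); // placeholder for the real blob upload
    }
}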

Member


Ok

Member

@tlrx tlrx left a comment


LGTM, thanks Armin!

@original-brownbear
Member Author

Thanks Tanguy!

@original-brownbear original-brownbear merged commit 5b3ebea into elastic:master Oct 22, 2019
@original-brownbear original-brownbear deleted the better-concurrency-snapshot-upload branch October 22, 2019 11:15
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Oct 22, 2019
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Oct 22, 2019
* elastic/master:
  [Docs] Fix opType options in IndexRequest API example. (elastic#48290)
  Simplify Shard Snapshot Upload Code (elastic#48155)
  Mute ClassificationIT tests (elastic#48338)
  Reenable azure repository tests and remove some randomization in http servers  (elastic#48283)
  Use an env var for the classpath of jar hell task (elastic#48240)
  Refactor FIPS BootstrapChecks to simple checks (elastic#47499)
  Add "format" to "range" queries resulted from optimizing a logical AND (elastic#48073)
  [DOCS][Transform] document limitation regarding rolling upgrade with 7.2, 7.3 (elastic#48118)
  Fail with a better error when if there are no ingest nodes (elastic#48272)
  Fix executing enrich policies stats (elastic#48132)
  Use MultiFileTransfer in CCR remote recovery (elastic#44514)
  Make BytesReference an interface (elastic#48171)
  Also validate source index at put enrich policy time. (elastic#48254)
  Add 'javadoc' task to lifecycle check tasks (elastic#48214)
  Remove option to enable direct buffer pooling (elastic#47956)
  [DOCS] Add 'Selecting gateway and seed nodes' section to CCS docs (elastic#48297)
  Add Enrich Origin (elastic#48098)
  fix incorrect comparison (elastic#48208)
original-brownbear added a commit that referenced this pull request Oct 22, 2019
@original-brownbear original-brownbear restored the better-concurrency-snapshot-upload branch January 6, 2021 14:00