Staging in parallel #157

Merged
merged 4 commits into from
Dec 12, 2022
Conversation

honnix
Member

@honnix honnix commented Dec 12, 2022

TL;DR

Staging artifacts in parallel.

Type

  • Bug Fix
  • Feature
  • Plugin

Are all requirements met?

  • Code completed
  • Smoke tested
  • Unit tests added
  • Code documentation added
  • Any pending items have an associated Issue

The happy path is covered by integration tests.

Complete description

Currently, artifacts are staged sequentially, which is slow because each upload blocks on I/O. Staging them on multiple threads can improve throughput considerably.

Tracking Issue

Closes flyteorg/flyte#3146

Follow-up issue

NA

Signed-off-by: Hongxin Liang <[email protected]>
@honnix honnix requested a review from narape December 12, 2022 11:06
@@ -73,7 +74,8 @@ public Integer call() {

 try (FlyteAdminClient adminClient =
     FlyteAdminClient.create(config.platformUrl(), config.platformInsecure(), tokenSource)) {
-  Supplier<ArtifactStager> stagerSupplier = () -> ArtifactStager.create(config, modules);
+  Supplier<ArtifactStager> stagerSupplier =
+      () -> ArtifactStager.create(config, modules, new ForkJoinPool());
Member Author

The default configuration of ForkJoinPool seems good enough here.
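
For context on what the default configuration implies: the no-argument ForkJoinPool constructor uses a parallelism equal to the number of available processors. The following is only a minimal sketch of submitting blocking work to such a pool. `DefaultPoolSketch`, `stageOne`, `stageAll` and the artifact names are hypothetical stand-ins, not jflyte code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.Future;

public class DefaultPoolSketch {

  // Hypothetical stand-in for uploading one artifact; the sleep simulates blocking I/O.
  static String stageOne(String name) throws InterruptedException {
    Thread.sleep(50);
    return "staged:" + name;
  }

  // Submits one task per name, then collects the results in submission order.
  static List<String> stageAll(ExecutorService pool, List<String> names)
      throws InterruptedException, ExecutionException {
    List<Future<String>> futures = new ArrayList<>();
    for (String name : names) {
      Callable<String> task = () -> stageOne(name);
      futures.add(pool.submit(task));
    }
    List<String> results = new ArrayList<>();
    for (Future<String> future : futures) {
      results.add(future.get());
    }
    return results;
  }

  public static void main(String[] args) throws Exception {
    // Default configuration: parallelism follows the number of available processors.
    ForkJoinPool pool = new ForkJoinPool();
    try {
      System.out.println("parallelism = " + pool.getParallelism());
      System.out.println(stageAll(pool, List.of("a.jar", "b.jar", "c.jar")));
    } finally {
      pool.shutdown();
    }
  }
}
```

Because the tasks block on I/O rather than compute, a processor-sized pool bounds how many uploads run at once; whether that bound is the right one depends on the workload.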

@@ -116,6 +147,7 @@ void stageArtifact(Artifact artifact, ByteSource content) {
throw new UncheckedIOException(e);
}
} else {
LOG.info("[{}] already staged to [{}]", artifact.name(), artifact.location());
Member Author

This log line avoids giving users the impression that jflyte always stages files regardless of whether they already exist at the destination.

Comment on lines 101 to 113
for (int i = 0; i < futures.size(); ++i) {
  try {
    artifacts.add(futures.get(i).get());
  } catch (InterruptedException | ExecutionException e) {
    if (e instanceof InterruptedException) {
      Thread.currentThread().interrupt();
    }
    for (int j = i; j < futures.size(); ++j) {
      futures.get(j).cancel(true);
    }

    throw new RuntimeException(e);
  }
Contributor

Suggested change
-for (int i = 0; i < futures.size(); ++i) {
-  try {
-    artifacts.add(futures.get(i).get());
-  } catch (InterruptedException | ExecutionException e) {
-    if (e instanceof InterruptedException) {
-      Thread.currentThread().interrupt();
-    }
-    for (int j = i; j < futures.size(); ++j) {
-      futures.get(j).cancel(true);
-    }
-    throw new RuntimeException(e);
-  }
+List<Artifact> artifacts = getAll(futures);

Can we hide all the Future.get calls and exception handling inside an auxiliary method? Maybe something like CompletableFutures.getAll, to mimic the Spotify library.

The concern is that we are mingling low-level threading code with the method that deals with files and artifacts.
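
To make the suggestion concrete, here is a minimal sketch of what such an auxiliary method might look like. `FutureUtil` and `getAll` are hypothetical names used for illustration; this is not the code that was merged, and it simply moves the loop quoted above into a generic helper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

public final class FutureUtil {
  private FutureUtil() {}

  // Hypothetical helper: waits for each future in order. On the first failure it
  // cancels the current and all remaining futures, then rethrows unchecked.
  public static <T> List<T> getAll(List<? extends Future<T>> futures) {
    List<T> results = new ArrayList<>(futures.size());
    for (int i = 0; i < futures.size(); ++i) {
      try {
        results.add(futures.get(i).get());
      } catch (InterruptedException | ExecutionException e) {
        if (e instanceof InterruptedException) {
          // Restore the interrupt flag so callers can still observe it.
          Thread.currentThread().interrupt();
        }
        // Cancelling an already completed future is a no-op, so starting at i is safe.
        for (int j = i; j < futures.size(); ++j) {
          futures.get(j).cancel(true);
        }
        throw new RuntimeException(e);
      }
    }
    return results;
  }
}
```

With a helper along these lines, the calling method would only submit tasks and call `getAll(futures)`, which is the trade-off discussed in the replies.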

Member Author

I tried the getAll approach, but then stageFiles became almost empty, so I didn't think it improved readability.

Member Author

We cannot use CompletableFutures because we want to do cancellation.
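
One possible reading of this remark, assuming it concerns interrupting in-flight work: `CompletableFuture.cancel(true)` in the JDK does not interrupt the thread running the task, whereas cancelling the `Future` returned by a thread-pool executor's `submit` does. The sketch below contrasts the two on a plain single-thread executor; `CancellationSketch` and `workerInterrupted` are hypothetical names, and the sleep stands in for a blocking upload.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class CancellationSketch {

  // Starts a blocking task, cancels it, and reports whether the worker thread
  // observed an interrupt within one second of the cancellation.
  static boolean workerInterrupted(boolean useCompletableFuture) throws InterruptedException {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    CountDownLatch started = new CountDownLatch(1);
    CountDownLatch interrupted = new CountDownLatch(1);
    Runnable blockingTask = () -> {
      started.countDown();
      try {
        Thread.sleep(60_000); // simulates a long blocking upload
      } catch (InterruptedException e) {
        interrupted.countDown();
      }
    };
    try {
      Future<?> future;
      if (useCompletableFuture) {
        future = CompletableFuture.runAsync(blockingTask, executor);
      } else {
        future = executor.submit(blockingTask);
      }
      started.await();
      future.cancel(true);
      return interrupted.await(1, TimeUnit.SECONDS);
    } finally {
      executor.shutdownNow(); // interrupts any still-running worker so the sketch terminates
    }
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("executor Future interrupted worker: " + workerInterrupted(false));
    System.out.println("CompletableFuture interrupted worker: " + workerInterrupted(true));
  }
}
```

The sketch deliberately uses a plain thread-pool executor rather than a ForkJoinPool, since interruption on cancel for ForkJoinPool tasks may differ between JDK versions.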

Signed-off-by: Hongxin Liang <[email protected]>
Signed-off-by: Hongxin Liang <[email protected]>
Signed-off-by: Hongxin Liang <[email protected]>
@honnix honnix merged commit 072aa7b into master Dec 12, 2022
@honnix honnix deleted the staging-in-parallel branch December 12, 2022 15:24
@honnix honnix mentioned this pull request Dec 14, 2022
8 tasks
andresgomezfrr pushed a commit that referenced this pull request Jan 24, 2023
* Staging in parallel

Signed-off-by: Hongxin Liang <[email protected]>
Signed-off-by: Andres Gomez Ferrer <[email protected]>
Development

Successfully merging this pull request may close these issues.

[Core feature] flytekit-java supports staging artifacts in parallel
2 participants