Explicitly configure executorService for s3 multipartuploads in native s3filesystem #22209
Conversation
lib/trino-filesystem-s3/src/main/java/io/trino/filesystem/s3/S3FileSystemConfig.java (outdated, resolved)
I don't see a reason to make this configurable. As you mentioned, the old implementation used a cached thread pool, and to my knowledge this has never been a problem for anyone.
The concurrency of uploads is naturally limited by the number of output streams, which should already be limited to a relatively small number.
@@ -120,12 +126,21 @@ public S3FileSystemFactory(OpenTelemetry openTelemetry, S3FileSystemConfig confi
             config.getSseKmsKeyId(),
             Optional.empty(),
             config.getCannedAcl());
+
+        this.uploadExecutor = new ThreadPoolExecutor(
This doesn't do what you expect. If you very carefully read the Javadoc for `ThreadPoolExecutor`, you'll see that this will result in tasks being rejected when the maximum pool size is reached. An easier way to see why this is true: there is no queue to hold the non-runnable tasks, so the only possible behaviors would be to block the caller or reject the task, and it doesn't block the caller.

If you spend a very long time thinking about `ThreadPoolExecutor` and reading the documentation many times, you may eventually realize that `cached` and `fixed` are the only useful configurations. There's a reason that we don't use it directly in Trino.

If you want to limit concurrency without having a fixed number of threads, the alternative is to use `BoundedExecutor`, which we do use in various places. Once we can safely use virtual threads, all of this complexity goes away: just use a `Semaphore`.
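The rejection behavior described above can be demonstrated directly. This is an illustrative sketch (the class name and latch-based blocker are mine, not from the PR): with a `SynchronousQueue` and a maximum pool size of 2, a third task has no queue slot and no spare thread, so `execute()` throws rather than blocking.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class ThreadPoolRejectionDemo
{
    // With a SynchronousQueue there is nowhere to park a task once all
    // maximumPoolSize threads are busy, so execute() throws
    // RejectedExecutionException rather than blocking the caller.
    static boolean thirdTaskRejected()
            throws InterruptedException
    {
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                0, 2, 60, TimeUnit.SECONDS, new SynchronousQueue<>());
        CountDownLatch release = new CountDownLatch(1);
        Runnable blocker = () -> {
            try {
                release.await();
            }
            catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        boolean rejected = false;
        try {
            executor.execute(blocker); // starts worker 1
            executor.execute(blocker); // starts worker 2 (pool now at max)
            executor.execute(blocker); // no queue slot, no spare thread
        }
        catch (RejectedExecutionException e) {
            rejected = true;
        }
        finally {
            release.countDown();
            executor.shutdown();
        }
        return rejected;
    }

    public static void main(String[] args)
            throws InterruptedException
    {
        System.out.println("third task rejected: " + thirdTaskRejected());
    }
}
```

This is exactly why a fixed pool (bounded threads, unbounded queue) or a cached pool (unbounded threads, direct handoff) are the two configurations that behave sanely out of the box.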
lib/trino-filesystem-s3/src/main/java/io/trino/filesystem/s3/S3FileSystemFactory.java
    }

    @PreDestroy
    public void destroy()
    {
        client.close();
+       uploadExecutor.close();
`close()` will block up to a day before interrupting tasks. Since we probably want to use a shorter time frame, we should use `MoreExecutors.shutdownAndAwaitTermination()`, which does the same thing but lets us provide the time frame. I recommend 10s, a default we commonly use here.
Yeah, `close()` was added for structured concurrency with virtual threads. The behavior makes sense for that use case, but not for server shutdown.
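For reference, the pattern the reviewers are recommending can be sketched in plain JDK terms. This is an approximation of what Guava's `MoreExecutors.shutdownAndAwaitTermination()` does (Guava's actual implementation splits the timeout and waits again after interrupting); the class and method names here are mine:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public final class ShutdownSketch
{
    private ShutdownSketch() {}

    // Roughly the pattern MoreExecutors.shutdownAndAwaitTermination follows:
    // orderly shutdown, bounded wait, then interrupt whatever is left.
    static boolean shutdownAndAwait(ExecutorService executor, long timeout, TimeUnit unit)
    {
        executor.shutdown(); // reject new tasks, let in-flight ones finish
        try {
            if (executor.awaitTermination(timeout, unit)) {
                return true; // clean shutdown within the budget
            }
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        executor.shutdownNow(); // interrupt stragglers
        return executor.isTerminated();
    }

    public static void main(String[] args)
    {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        executor.submit(() -> {});
        System.out.println("terminated: " + shutdownAndAwait(executor, 10, TimeUnit.SECONDS));
    }
}
```

The contrast with `ExecutorService.close()` is the bounded wait: `close()` retries `awaitTermination` for up to a day before calling `shutdownNow()`.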
Good catch. `GcsFileSystemFactory` also uses an executor service, so I followed its existing shutdown pattern: `executorService.shutdownNow()`, which interrupts each thread and moves on without waiting for termination.
@erichwang good point. BTW, this applies to quite a few existing thread pools. We will need static code analysis rules to cope with that.

> we should use `MoreExecutors.shutdownAndAwaitTermination()`

For connector/server shutdown, we should use `shutdownNow()` without delay, so `shutdownAndAwaitTermination` isn't exactly perfect.
Force-pushed from 498de32 to 59317d6 (compare)
Potentially related failure: https://github.com/trinodb/trino/actions/runs/9320114775/job/25656271154?pr=22209
The S3 client is shared within the `S3FileSystemFactory`. By allowing more concurrency with a cached thread pool (instead of the ForkJoin common pool), we could be exhausting the S3 client's max HTTP connections more easily, leading to the above error. I'm not sure if this is actually the case for that test failure, though.

The legacy `TrinoS3FileSystem` had a default of 500 max connections: https://github.com/trinodb/trino/blob/master/lib/trino-hdfs/src/main/java/io/trino/hdfs/s3/HiveS3Config.java#L65. Currently `S3FileSystemConfig` sets no default for max connections. IMO we should increase the default max connections to 500. WDYT @electrum
The test failure in
Increasing the max connections sounds good to me.
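The agreed change can be expressed as a catalog configuration fragment. The property name `s3.max-connections` is an assumption based on `S3FileSystemConfig`'s naming style; the legacy equivalent was `hive.s3.max-connections` in `HiveS3Config`:

```properties
# Assumed property name for the native S3 filesystem's HTTP connection cap.
# 500 matches the legacy TrinoS3FileSystem default from HiveS3Config.
s3.max-connections=500
```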
Force-pushed from 776087c to 2e2a125 (compare)
Previously used forkjoin common pool meant for cpu bound operations
Aligns native S3FileSystem with legacy TrinoS3FileSystem
Force-pushed from 2e2a125 to c7eb64e (compare)
this seems fine to me
@@ -120,12 +124,15 @@ public S3FileSystemFactory(OpenTelemetry openTelemetry, S3FileSystemConfig confi
             config.getSseKmsKeyId(),
             Optional.empty(),
             config.getCannedAcl());
+
+        this.uploadExecutor = newCachedThreadPool(daemonThreadsNamed("s3-upload-%s"));
Since there are no constructor-provided args, you can inline this into the field declaration.
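A sketch of what the suggested inlining looks like. The class name is hypothetical, and `daemonThreadsNamed` here is a stand-in I wrote to mimic Airlift's `io.airlift.concurrent.Threads.daemonThreadsNamed`, which the real code uses:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicLong;

public class S3FactorySketch
{
    // Stand-in for Airlift's daemonThreadsNamed(): daemon threads whose names
    // come from the given format string and an incrementing counter.
    static ThreadFactory daemonThreadsNamed(String nameFormat)
    {
        AtomicLong counter = new AtomicLong();
        return runnable -> {
            Thread thread = new Thread(runnable, nameFormat.formatted(counter.getAndIncrement()));
            thread.setDaemon(true);
            return thread;
        };
    }

    // The initializer needs no constructor-provided arguments, so it can live
    // on the field declaration instead of in the constructor body.
    private final ExecutorService uploadExecutor =
            Executors.newCachedThreadPool(daemonThreadsNamed("s3-upload-%s"));
}
```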
@@ -90,7 +90,7 @@ public static software.amazon.awssdk.core.retry.RetryMode getRetryMode(RetryMode
     private String sseKmsKeyId;
     private DataSize streamingPartSize = DataSize.of(16, MEGABYTE);
     private boolean requesterPays;
-    private Integer maxConnections;
+    private Integer maxConnections = 500;
Let's have a comment explaining why this number. (I guess the reason is more than just "it's what we used to have in the legacy implementation"; otherwise we wouldn't make this change. There are other things we used to have that we chose not to carry over.)
Description

Currently the native S3 filesystem `S3OutputStream` performs multipart uploads (MPU) using `supplyAsync`, which defaults to using the ForkJoin common pool. This common pool is meant for CPU-bound operations and shouldn't be used for IO-bound tasks like S3 MPU.

Additional context and related issues

Approach:
Decided to copy the legacy `TrinoS3FileSystem` approach for the native filesystem; the main difference is allowing the max number of upload threads to be configured per `S3FileSystemConfig` with `s3.streaming.max-upload-threads`.
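The `supplyAsync` issue described above can be sketched in isolation. This is an illustrative demo (class name mine, not from the PR) contrasting the one-arg overload, which uses `ForkJoinPool.commonPool()`, with the two-arg overload that takes an explicit executor:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class SupplyAsyncDemo
{
    public static void main(String[] args)
    {
        // One-arg supplyAsync schedules onto ForkJoinPool.commonPool(), which
        // is sized for CPU-bound work; blocking IO there can starve other users.
        CompletableFuture<String> onCommonPool =
                CompletableFuture.supplyAsync(() -> Thread.currentThread().getName());

        // The two-arg overload takes an explicit executor, keeping IO-bound
        // work (like multipart upload parts) on a dedicated pool.
        ExecutorService uploadExecutor = Executors.newCachedThreadPool();
        CompletableFuture<String> onUploadPool = CompletableFuture.supplyAsync(
                () -> Thread.currentThread().getName(), uploadExecutor);

        System.out.println("default pool thread: " + onCommonPool.join());
        System.out.println("upload pool thread:  " + onUploadPool.join());
        uploadExecutor.shutdown();
    }
}
```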
Considerations:
Release notes
( ) This is not user-visible or is docs only, and no release notes are required.
(x) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text: