Improved threading capabilities of S3+parquet #5451
Merged
malhotrashivam added the parquet (Related to the Parquet integration), DocumentationNeeded, ReleaseNotesNeeded (Release notes are needed), and s3 labels on May 2, 2024
malhotrashivam requested review from chipkent, jmao-denver and rcaudy as code owners on May 2, 2024 22:32
malhotrashivam commented on May 6, 2024
...rc/main/java/io/deephaven/engine/table/impl/locations/local/FileKeyValuePartitionLayout.java
extensions/s3/src/main/java/io/deephaven/extensions/s3/S3SeekableChannelProvider.java
rcaudy previously approved these changes on May 7, 2024
extensions/parquet/table/src/main/java/io/deephaven/parquet/table/ParquetInstructions.java
Util/channel/src/main/java/io/deephaven/util/channel/SeekableChannelsProviderFactory.java
Util/channel/src/main/java/io/deephaven/util/channel/SeekableChannelsProviderFactory.java
extensions/parquet/base/src/main/java/io/deephaven/parquet/base/ParquetFileReader.java
extensions/parquet/base/src/main/java/io/deephaven/parquet/base/ParquetFileReader.java
extensions/s3/src/main/java/io/deephaven/extensions/s3/S3AsyncClientFactory.java
extensions/s3/src/main/java/io/deephaven/extensions/s3/S3AsyncClientFactory.java
extensions/s3/src/main/java/io/deephaven/extensions/s3/S3AsyncClientFactory.java
extensions/s3/src/main/java/io/deephaven/extensions/s3/S3AsyncClientFactory.java
extensions/s3/src/main/java/io/deephaven/extensions/s3/S3AsyncClientFactory.java
malhotrashivam added the NoReleaseNotesNeeded label (No release notes are needed.) and removed the ReleaseNotesNeeded label (Release notes are needed) on May 7, 2024
rcaudy reviewed on May 8, 2024
...ne/table/src/main/java/io/deephaven/engine/table/impl/OperationInitializationThreadPool.java
extensions/s3/src/main/java/io/deephaven/extensions/s3/S3AsyncClientFactory.java
extensions/s3/src/main/java/io/deephaven/extensions/s3/S3SeekableChannelProvider.java
extensions/s3/src/main/java/io/deephaven/extensions/s3/S3SeekableChannelProvider.java
rcaudy reviewed on May 8, 2024
extensions/s3/src/main/java/io/deephaven/extensions/s3/S3SeekableChannelProvider.java
extensions/s3/src/main/java/io/deephaven/extensions/s3/S3SeekableChannelProvider.java
rcaudy approved these changes on May 8, 2024
Labels indicate documentation is required. Issues for documentation have been opened: Community: deephaven/deephaven-docs-community#209
Labels
DocumentationNeeded
NoReleaseNotesNeeded
No release notes are needed.
parquet
Related to the Parquet integration
s3
@devinrsmith pointed out that a very large number of threads was being spawned when reading partitioned parquet data from S3. This happened because the codebase created a new S3AsyncClient instance for each partition file discovered, and each instance by default creates a large number of threads internally.
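To make the problem concrete, here is a minimal sketch of one common way to avoid per-file client creation: hand every caller the same lazily created instance. It uses only the JDK. `FakeAsyncClient` and `SharedClientHolder` are hypothetical names invented for illustration, not classes from this PR or from the AWS SDK, and the text above does not say that the PR is implemented this way.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an async client that owns its own thread pool.
// It is NOT the AWS SDK's S3AsyncClient; it only models "one pool per client".
final class FakeAsyncClient implements AutoCloseable {
    // Counts how many clients (and therefore thread pools) were created.
    static final AtomicInteger INSTANCES = new AtomicInteger();

    private final ExecutorService pool;

    FakeAsyncClient(int threads) {
        INSTANCES.incrementAndGet();
        // Threads are started lazily, as tasks are submitted.
        pool = Executors.newFixedThreadPool(threads);
    }

    @Override
    public void close() {
        pool.shutdown();
    }
}

// Hypothetical holder: creates one client on first use and returns the same
// instance afterwards, so N partition files share one pool instead of N pools.
final class SharedClientHolder {
    private static FakeAsyncClient instance;

    private SharedClientHolder() {}

    static synchronized FakeAsyncClient get() {
        if (instance == null) {
            instance = new FakeAsyncClient(4); // pool size chosen arbitrarily
        }
        return instance;
    }
}
```

With this shape, the number of client-owned threads is bounded by one pool regardless of how many partition files are discovered. The trade-off is that the shared instance needs an explicit lifecycle, since no single reader owns it.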
As part of this PR:
- … File objects in favor of those accepting URIs
Documentation Update:
Added two new config parameters:
- S3.numFutureCompletionThreads: The number of threads used to complete the futures returned by the async AWS S3 client. By default, this is set to the number of processors on the system.
- S3.numScheduledExecutorThreads: The number of threads used for scheduling tasks, such as async retry attempts and timeout tasks, with the AWS S3 client. By default, this is set to 5.
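The defaults described above can be sketched as follows. This is a JDK-only illustration, assuming the two parameters are supplied as Java system properties; how Deephaven actually reads its configuration is not stated in this PR text, and `S3ThreadConfigSketch` and its methods are hypothetical names.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;

// Hypothetical helper; only the two property names come from the PR description.
final class S3ThreadConfigSketch {
    static final String FUTURE_COMPLETION_KEY = "S3.numFutureCompletionThreads";
    static final String SCHEDULED_EXECUTOR_KEY = "S3.numScheduledExecutorThreads";

    private S3ThreadConfigSketch() {}

    // Default: the number of processors on the system.
    static int numFutureCompletionThreads() {
        return Integer.getInteger(FUTURE_COMPLETION_KEY,
                Runtime.getRuntime().availableProcessors());
    }

    // Default: 5.
    static int numScheduledExecutorThreads() {
        return Integer.getInteger(SCHEDULED_EXECUTOR_KEY, 5);
    }

    // Pool that would be used to complete futures returned by the async client.
    static ExecutorService newFutureCompletionPool() {
        return Executors.newFixedThreadPool(numFutureCompletionThreads());
    }

    // Pool that would be used for scheduled work such as retries and timeouts.
    static ScheduledExecutorService newScheduledPool() {
        return Executors.newScheduledThreadPool(numScheduledExecutorThreads());
    }
}
```

Under the system-property assumption, launching the JVM with `-DS3.numScheduledExecutorThreads=8` would make `numScheduledExecutorThreads()` return 8 in this sketch. Sizing both pools once and sharing them is what keeps the thread count independent of the number of files read.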