HADOOP-19354. S3A: S3AInputStream to be created by factory under S3AStore #7214
base: trunk
Conversation
test failure from me pushing the disk allocator down into the store and the test case not setting the store up
Overall I like the design and refactoring.
One thought: can we keep the prefetching changes in this PR minimal, focus only on the interface and ClassicInputStream, and create a separate PR for all the prefetching work?
@@ -993,7 +983,7 @@ private void initThreadPools(Configuration conf) {
     unboundedThreadPool.allowCoreThreadTimeOut(true);
     executorCapacity = intOption(conf,
         EXECUTOR_CAPACITY, DEFAULT_EXECUTOR_CAPACITY, 1);
-    if (prefetchEnabled) {
+    if (requirements.createFuturePool()) {
Change the name to `prefetchRequirements`.
There are more requirements than just prefetching, e.g. if vector IO support is needed then some extra threads are added to the pool passed down.
...doop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/streams/ObjectInputStreamCallbacks.java
I'm just setting this up so it is ready for the analytics stream work. Making sure that prefetch is also covered is my way to validate the factory model, and to confirm that the options need to include things like asking for a shared thread pool and a stream thread pool, with the intent that analytics will use those too. And once I do that, they all need a single base stream class. For my vector IO resilience PR: once I have this PR in, I'm going to go back to #7105 and make it something which works with all object input streams.
> read failure

The read failure stuff is essentially in my PR, so maybe we can rebase onto this, merge in and then pull up. Goal: the analytics stream gets vector IO.
    this.ioStatistics = streamStatistics.getIOStatistics();
    this.inputPolicy = context.getInputPolicy();
    streamStatistics.inputPolicySet(inputPolicy.ordinal());
    this.boundedThreadPool = parameters.getBoundedThreadPool();
I see boundedThreadPool is used in S3AInputStream but not in S3APrefetchingInputStream; can we keep boundedThreadPool local to S3AInputStream?
Each stream can declare what it wants thread-pool-wise, and we will allocate those to them. If they don't want it, they don't get it.

That bounded thread pool passed down is the semaphore pool we also use in uploads. It takes a subset of the shared pool, has its own pending queue, and blocks the caller thread when that pending queue is full.

If the analytics stream doesn't currently need it, don't ask for any.

But I do want the vector IO code to be moved out of S3AInputStream so it can work with the superclass, so all streams get it. These also want a bounded number of threads.
/**
 * A stream of data from an S3 object.
 * The blase class includes common methods, stores
Nit: spelling base
   * This must be re-invoked after replacing the S3Client during test
   * runs.
   * <p>
   * It requires the S3Store to have been instantiated.
   * @param conf configuration.
@param conf is no longer required
   * @param sharedThreads Number of shared threads to included in the bounded pool.
   * @param streamThreads How many threads per stream, ignoring vector IO requirements.
   * @param createFuturePool Flag to enable creation of a future pool around the bounded thread pool.
   */
@param vectorSupported missing
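Read together, the javadoc above implies the shape of the thread-requirements object a factory declares. A minimal sketch, reusing the StreamThreadOptions name this PR later renames; the constructor shape and accessors here are assumptions, not the actual code:

```java
/**
 * Sketch of the thread requirements a stream factory declares,
 * based on the javadoc fields quoted above. Shape is assumed.
 */
public final class StreamThreadOptions {
  private final int sharedThreads;        // threads added to the shared bounded pool
  private final int streamThreads;        // threads each stream instance may claim
  private final boolean createFuturePool; // wrap the bounded pool in a future pool?
  private final boolean vectorSupported;  // does the stream do its own vector IO?

  public StreamThreadOptions(int sharedThreads,
      int streamThreads,
      boolean createFuturePool,
      boolean vectorSupported) {
    this.sharedThreads = sharedThreads;
    this.streamThreads = streamThreads;
    this.createFuturePool = createFuturePool;
    this.vectorSupported = vectorSupported;
  }

  public int sharedThreads() { return sharedThreads; }
  public int streamThreads() { return streamThreads; }
  public boolean createFuturePool() { return createFuturePool; }
  public boolean vectorSupported() { return vectorSupported; }
}
```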
@@ -845,7 +826,7 @@ private S3AFileSystemOperations createFileSystemHandler() {
   @VisibleForTesting
   protected S3AStore createS3AStore(final ClientManager clientManager,
       final int rateLimitCapacity) {
-    return new S3AStoreBuilder()
+    final S3AStore st = new S3AStoreBuilder()
Nit: rename the variable to a meaningful name.
@rajdchak thanks for the comments, will address. I do want to pull up the vector IO support, with integration with prefetch and caching. For the prefetch/caching stream we'd ask for the requested ranges to be split up into

It'd be good to collect stats on cache hit/miss here, to assess integration of vector reads with ranges. When a list of ranges comes down, there is less need to infer the next range and prefetch, and I'm not actually sure how important caching becomes. This is why setting Parquet up to use vector IO already appears to give speedups comparable to the published analytics stream benchmarks. What I want is the best of both worlds: prefetch of row groups from stream inference, and when vector reads come in, satisfy those by returning current/active prefetches, or retrieve new ranges through ranged GET requests. #7105 is where that will go; I've halted that until this is in. And I'll only worry about that integration with prefetched/cached blocks for the analytics stream.
Thanks @steveloughran, looks good to me overall. Just need to allow for the ClientManager to be passed into the factory.
        : 0);
    // create an executor which is a subset of the
    // bounded thread pool.
    final SemaphoredDelegatingExecutor pool = new SemaphoredDelegatingExecutor(
Just a clarifying question: what is the benefit of creating a new SemaphoredDelegatingExecutor per stream vs just creating this once?
OK, I think I get it: this is basically a way to ensure a single stream instance does not use up too many threads.
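For anyone skimming: a concept-only sketch, in plain JDK terms, of what wrapping the shared pool per stream buys. SemaphoredDelegatingExecutor itself lives in hadoop-common; this is not its code, just the pattern:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Semaphore;

/** Concept sketch: cap a single client's share of a shared pool. */
final class BoundedSubmitter {
  private final ExecutorService shared;  // the process-wide bounded pool
  private final Semaphore permits;       // this stream's share of it

  BoundedSubmitter(ExecutorService shared, int maxConcurrent) {
    this.shared = shared;
    this.permits = new Semaphore(maxConcurrent);
  }

  /** Blocks the caller when this stream's permits are exhausted. */
  void submit(Runnable task) throws InterruptedException {
    permits.acquire();
    try {
      shared.execute(() -> {
        try {
          task.run();
        } finally {
          permits.release();  // free the permit when the task finishes
        }
      });
    } catch (RuntimeException e) {
      permits.release();  // execution rejected: return the permit
      throw e;
    }
  }
}
```

One wrapper per stream means a misbehaving stream blocks on its own permits rather than starving every other user of the shared pool.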
  public static ObjectInputStreamFactory createStreamFactory(final Configuration conf) {
    // choose the default input stream type
    InputStreamType defaultStream = InputStreamType.DEFAULT_STREAM_TYPE;
    if (conf.getBoolean(PREFETCH_ENABLED_KEY, false)) {
We're saying PREFETCH_ENABLED_KEY is deprecated, but still setting the stream type to prefetch. Is this something we want? If yes, we should make the message clearer to say "we're going to deprecate this in the future, but it works for now".
I'm trying to say "if you set it, we will tell you not to, but still take the setting as the default", so it can be overridden by the new option.
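A sketch of those semantics, completing the method quoted in the diff above. Everything beyond the quoted lines is an assumption: the INPUT_STREAM_TYPE constant, the Prefetch enum value, the factory() accessor, and the LOG field of the surrounding class.

```java
  public static ObjectInputStreamFactory createStreamFactory(final Configuration conf) {
    // choose the default input stream type
    InputStreamType defaultStream = InputStreamType.DEFAULT_STREAM_TYPE;
    if (conf.getBoolean(PREFETCH_ENABLED_KEY, false)) {
      // deprecated option: warn, but take it as the default,
      // which the new option may still override.
      LOG.warn("{} is deprecated; set {} instead",
          PREFETCH_ENABLED_KEY, INPUT_STREAM_TYPE);
      defaultStream = InputStreamType.Prefetch;
    }
    // an explicit setting of the new option wins
    final InputStreamType streamType =
        conf.getEnum(INPUT_STREAM_TYPE, defaultStream);
    // each enum value carries the function which builds its factory
    return streamType.factory().apply(conf);
  }
```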
 * Each enum value contains the factory function actually used to create
 * the factory.
 */
public enum InputStreamType {
As discussed in #7295, the S3SeekableInputStreamFactory requires a client to be passed in. For this, we need a way to pass in the ClientManager here.
Yeah, will do that. After Service.init() we will pass down a reference to the client manager, though that won't be ready to use until Service.start().

Also, the client manager should declare whether CRT is used or not, even before the client is instantiated (avoids a launch-time performance hit). Then the analytics stream can just fail fast in start() based on that flag alone.
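A sketch of that fail-fast check under the Service lifecycle; the isCrtClient() flag on the client manager is hypothetical, as is the assumption that the analytics stream rejects the CRT client:

```java
  @Override
  protected void serviceStart() throws Exception {
    super.serviceStart();
    // the flag is declared before any client is instantiated,
    // so we can fail fast with no launch-time cost.
    if (clientManager.isCrtClient()) {  // hypothetical flag
      throw new UnsupportedOperationException(
          "this stream factory does not support the CRT client");
    }
  }
```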
/* sorry, had commented back on others but hadn't pressed the submit button. doing it now */
(Just had to rebase as it wouldn't merge with the directory marker changes; that is going to make backporting to branch-3.4 harder. FWIW I'm wondering if we should make the leap to a 3.5.0 release with Java 17 as the baseline and keep 3.4.x as the maintenance branch with CVE and jar updates only. Not discussed that on the mailing lists yet though...)
Thanks @steveloughran, this looks good now. We've just done an initial rebase on this here and we're able to integrate successfully. I will merge this into the feature branch, and then follow up with our changes.
@ahmarsuhail will look at it. Just a rebase and review of this; the last failure seems VM rather than code.
@steveloughran do you want to merge this PR into trunk? Or do you want this to go in via our feature branch?

So either this PR goes into trunk directly, or it can go in as part of the feature branch.
Original contributor: @steveloughran in PR #7214
Do you think we should fall back if a stream factory fails to load? As if they depend on third-party libraries, those libs may not be deployed across the cluster. Good: something works. We can/should add an iostats gauge to indicate which stream is in use, and serve it up in FS and stream.
@steveloughran personally I think we should throw the failure and not have the fallback. Users of both the prefetching input stream and AAL will expect performance benefits from using them, and if the failures are not visible, it'll lead to people thinking those streams aren't any faster.
@ahmarsuhail +1. Now, an unrelated issue: it looks to me like the jersey update's associated JUnit stuff has stopped tests being discovered in hadoop-aws. I'm rebasing this PR onto the commit before that one just so I can make progress. Can you check out and build trunk and tell me if your run of the hadoop-aws unit tests runs any tests, or is it my setup (across both git clones I have of the repo)?
@steveloughran I just hit the same issue on my CRT PR, unable to run tests :(
S3 InputStreams are created by a factory class, with the choice of factory dynamically chosen by the option fs.s3a.input.stream.type. Supported values: classic, prefetching, analytics.

S3AStore:
* Manages the creation and service lifecycle of the chosen factory, as well as forwarding stream construction requests to the chosen factory.
* Provides the callbacks needed by both the factories and input streams.
* StreamCapabilities.hasCapability(), which is relayed to the active factory. This avoids the FS having to know what capabilities are available in the stream.
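For example, selecting the prefetching stream with the new option (option name and values as listed in the commit message above):

```java
import org.apache.hadoop.conf.Configuration;

public class SelectStreamType {
  public static Configuration prefetchingConf() {
    Configuration conf = new Configuration();
    // choose the stream factory; supported values per this PR:
    // classic, prefetching, analytics
    conf.set("fs.s3a.input.stream.type", "prefetching");
    return conf;
  }
}
```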
Ability to create custom streams (type = custom), which reads the class from "fs.s3a.input.stream.custom.factory". This is mainly for testing, especially CNFE and similar; unit test TestStreamFactories covers it. ObjectInputStreams save and export their stream type to assist these tests too, as it enables assertions on the generated stream type.

Simplified the logic related to the old prefetch enabled flag: if fs.s3a.prefetch.enabled is true, the prefetch stream is returned and the stream.type option is not used at all. Simpler logic, simpler docs, fewer support calls.

Parameters supplied to ObjectInputStreamFactory.bind converted to a parameter object. Allows for more parameters to be added later if ever required.

ObjectInputStreamFactory returns more requirements to the store/FS. For this reason StreamThreadOptions threadRequirements() is renamed StreamFactoryRequirements factoryRequirements().

Vector IO context changes:
* Returned in factoryRequirements().
* Existing configuration-reading code moved into StreamIntegration.populateVectoredIOContext().
* Streams which don't have custom vector IO, e.g. prefetching, can return a minimum seek range of 0. This disables range merging on the default PositionedReadable implementation, which ensures that they will only get asked for data which will be read...leaving prefetch/cache code to know exactly what is needed.

Other:
* Draft docs.
* Stream capability declares the stream type and is exported through the FS too. (todo: test, document, add to bucket-info)
* ConfigurationHelper.resolveEnum() supersedes Configuration.getEnum() with:
  - case independence
  - fallback is a Supplier&lt;Enum&gt; rather than a simple value (see the sketch below).

Change-Id: I2e59300af48042df8173de61d0b3d6139a0ae7fe
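A sketch of the resolveEnum() behaviour just described; the exact signature is an assumption, but the case-independent match and the Supplier fallback are as stated:

```java
import java.util.function.Supplier;

import org.apache.hadoop.conf.Configuration;

public final class ConfigurationHelperSketch {
  /** Sketch: case-independent enum lookup with a lazy default. */
  public static <E extends Enum<E>> E resolveEnum(
      Configuration conf,
      String key,
      Class<E> enumClass,
      Supplier<E> defaultValue) {
    final String value = conf.getTrimmed(key, "");
    if (value.isEmpty()) {
      // unset: evaluate the fallback lazily, e.g. a default
      // derived from a deprecated option.
      return defaultValue.get();
    }
    for (E option : enumClass.getEnumConstants()) {
      if (option.name().equalsIgnoreCase(value)) {  // case independence
        return option;
      }
    }
    throw new IllegalArgumentException(
        "Invalid value of " + key + ": " + value);
  }
}
```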
Not fully tested yet. I want to have the stream type passed down as a -D option.
HADOOP-19354

How was this patch tested?

S3 London.

For code changes:

Have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?