HADOOP-18757: Bump corePoolSize of HadoopThreadPoolExecutor in s3a committer #5706
Conversation
@steveloughran Could you take a look at this?
💔 -1 overall
This message was automatically generated.
which s3 endpoint did you test against, and what parameters did you use on the build? For anything related to the committers, -Dscale will include the terasorts, so I'd like that.
Tested in
interesting failures... it's where having everyone test in their own config helps find many config-related issues.

ITestS3AFileSystemStatistic: really bizarre that one, as we are doing two full reads of a 1 KB file.

ITestS3ATemporaryCredentials: looks like it needs to be something which can be disabled, maybe by setting the endpoint to something special like "none". Why not create a new JIRA for that for you to work on later.

ITestStagingCommitProtocolFailure: that shouldn't happen; it looks like a setup expecting a failure is now passing. Probably needs fixing in the test, but we need to understand why the test is failing first. This is the one which needs attention.
Sorry I couldn't get back to this earlier.

ITestS3AFileSystemStatistic: the test succeeds when it's run by itself. I suppose it could be because the

ITestStagingCommitProtocolFailure: also works when running on its own. I suppose this could also be due to some shared objects.
ITestS3AFileSystemStatistic: seems likely. Can you create a new Hadoop JIRA under HADOOP-18477 which can be used to track this?

ITestStagingCommitProtocolFailure: how about this one, as it is committer related, you fix by adding
at the end of createConfiguration()
Fixed https://issues.apache.org/jira/browse/HADOOP-18784 - ITestS3AFileSystemStatistic
🎊 +1 overall
This message was automatically generated.
@steveloughran Could you take a second look? As far as I can tell, this makes both 3.3.5 and 3.3.6 unusable with s3 without providing alternative committer code.
will do; i was offline for a few days
code good, just imports to tune.
+1 pending this change
...a/org/apache/hadoop/fs/s3a/commit/staging/integration/ITestStagingCommitProtocolFailure.java
This reverts commit e2a25b2.
🎊 +1 overall
This message was automatically generated.
LGTM
+1
…read (apache#5706) Contributed by Moditha Hewasinghage
…read (#5706) Contributed by Moditha Hewasinghage
merged to trunk; cherrypicked to branch-3.3 and ran the scale tests against s3 london; all good. thanks!
@@ -236,7 +236,7 @@ private ExecutorService buildThreadPool(
         .setDaemon(true)
         .setNameFormat(THREAD_PREFIX + jobId + "-%d")
         .build();
-    return new HadoopThreadPoolExecutor(0, numThreads,
+    return new HadoopThreadPoolExecutor(numThreads, numThreads,
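The effect of this one-line change can be demonstrated with plain `java.util.concurrent.ThreadPoolExecutor` (which `HadoopThreadPoolExecutor` extends): with `corePoolSize = 0` and an unbounded `LinkedBlockingQueue`, the executor queues every task and never starts more than a single worker thread, whereas setting `corePoolSize` equal to `maximumPoolSize` ramps up to the full thread count. This is a standalone sketch of the JDK behaviour, not the committer's actual code; the class name, task counts, and sleep durations are illustrative.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class CorePoolSizeDemo {

  /**
   * Submit {@code tasks} short sleeping tasks to a pool backed by an
   * unbounded queue and return how many worker threads were created.
   */
  static int poolSizeAfterSubmitting(int corePoolSize, int maxPoolSize, int tasks)
      throws InterruptedException {
    ThreadPoolExecutor pool = new ThreadPoolExecutor(
        corePoolSize, maxPoolSize, 60L, TimeUnit.SECONDS,
        new LinkedBlockingQueue<>());   // unbounded queue, as in the committer
    for (int i = 0; i < tasks; i++) {
      pool.submit(() -> {
        try { Thread.sleep(50); } catch (InterruptedException ignored) { }
      });
    }
    // Workers are registered synchronously inside execute(), so this
    // count is stable once all submissions have returned.
    int size = pool.getPoolSize();
    pool.shutdownNow();
    pool.awaitTermination(5, TimeUnit.SECONDS);
    return size;
  }

  public static void main(String[] args) throws InterruptedException {
    // Before the fix: corePoolSize 0 means the unbounded queue absorbs
    // every task and only one worker thread is ever started.
    System.out.println("core=0, max=8: " + poolSizeAfterSubmitting(0, 8, 100)); // 1

    // After the fix: corePoolSize == maximumPoolSize, so a worker is
    // created per submission until all 8 are running.
    System.out.println("core=8, max=8: " + poolSizeAfterSubmitting(8, 8, 100)); // 8
  }
}
```

This is the documented `ThreadPoolExecutor` contract: new threads beyond `corePoolSize` are created only when the queue rejects a task, which an unbounded `LinkedBlockingQueue` never does.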
Hi, @steveloughran . In this case, it seems that there is no workaround in Apache Hadoop 3.3.5 and 3.3.6. Do you have any recommendation for this issue? Currently, Apache Spark 3.5.0 RC1 tag is using Apache Hadoop 3.3.6.
[SPARK-44197][BUILD] Upgrade Hadoop to 3.3.6
[SPARK-42913][BUILD] Upgrade Hadoop to 3.3.5
you are going to have to go with it for now; I think we should be thinking about a 3.3.7 before long, as we have some other abfs and s3a issues causing pain...
Thank you, @steveloughran .
- a CVE rollup on protobuf, guava etc... not jackson, sadly
### What changes were proposed in this pull request?

This PR aims to downgrade the Apache Hadoop dependency to 3.3.4 in `Apache Spark 3.5` in order to prevent any regression from `Apache Spark 3.4.x`. In other words, although `Apache Spark 3.5.x` will lose many bug fixes of Apache Hadoop 3.3.5 and 3.3.6, it will be in the same situation with `Apache Spark 3.4.x`.

- SPARK-44197 Upgrade Hadoop to 3.3.6 (#41744)
- SPARK-42913 Upgrade Hadoop to 3.3.5 (#39124)
- SPARK-43448 Remove dummy dependency `hadoop-openstack` (#41133)

On top of reverting SPARK-44197 and SPARK-42913, this PR has an additional dependency exclusion change due to the following.

- SPARK-43880 Organize `hadoop-cloud` in standard maven project structure (#41380)

### Why are the changes needed?

There is a community report on S3A committer performance regression. Although it's a one-liner fix, there is no available Hadoop release with that fix at this time.

- HADOOP-18757: Bump corePoolSize of HadoopThreadPoolExecutor in s3a committer (apache/hadoop#5706)

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

Closes #42345 from dongjoon-hyun/SPARK-44678.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
…read (apache#5706) Contributed by Moditha Hewasinghage
…its in a single thread (apache#5706) Contributed by Moditha Hewasinghage
…its in a single thread (apache#5706) (#52) Contributed by Moditha Hewasinghage Co-authored-by: Moditha Hewasinghage <[email protected]>
Description of PR
The ThreadPoolExecutor doesn't create more threads than coreThreads if an unbounded queue is used. This leads to the driver only committing with a single thread.
How was this patch tested?
For code changes:
- [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, `NOTICE-binary` files?