HADOOP-18757: Bump corePoolSize of HadoopThreadPoolExecutor in s3a committer #5706

Merged
merged 5 commits into from
Jul 19, 2023

Contributor

@modithah modithah commented Jun 1, 2023

Description of PR

A ThreadPoolExecutor never creates more threads than its corePoolSize when it is backed by an unbounded queue. With corePoolSize set to 0, this leaves the driver committing with only a single thread.
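
For illustration, here is a minimal standalone sketch of that behaviour. It uses the plain JDK `java.util.concurrent.ThreadPoolExecutor` with a `LinkedBlockingQueue`, not `HadoopThreadPoolExecutor` or the committer code itself. The class name `PoolSizeDemo`, the helper `largestPoolSize`, and the thread and task counts are assumptions made up for this example and are not part of the patch.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; not part of hadoop-aws.
final class PoolSizeDemo {

  /**
   * Submits `tasks` short jobs to a pool backed by an unbounded queue and
   * returns the largest number of threads the pool ever held at once.
   */
  public static int largestPoolSize(int corePoolSize, int maxPoolSize, int tasks) {
    ThreadPoolExecutor pool = new ThreadPoolExecutor(
        corePoolSize, maxPoolSize,
        60L, TimeUnit.SECONDS,
        new LinkedBlockingQueue<>());  // unbounded: offer() always succeeds
    try {
      for (int i = 0; i < tasks; i++) {
        pool.execute(() -> {
          try {
            Thread.sleep(20);  // stand-in for a slow commit operation
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          }
        });
      }
      pool.shutdown();
      if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
        throw new IllegalStateException("pool did not drain in time");
      }
      return pool.getLargestPoolSize();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } finally {
      pool.shutdownNow();
    }
  }

  public static void main(String[] args) {
    // Core size 0: tasks beyond the first are queued, so no extra threads start.
    System.out.println("core=0, max=4 -> " + largestPoolSize(0, 4, 8));
    // Core size equal to max: each of the first four submissions starts a thread.
    System.out.println("core=4, max=4 -> " + largestPoolSize(4, 4, 8));
  }
}
```

Under these assumptions the first call should report 1 thread and the second 4. The executor only grows past corePoolSize when the queue rejects a task, and an unbounded queue never does.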

How was this patch tested?

For code changes:

  • Does the title or this PR starts with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
  • Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

@modithah modithah marked this pull request as ready for review June 1, 2023 11:55
@modithah
Contributor Author

modithah commented Jun 1, 2023

@steveloughran Could you have a look at this?

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 34s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
-1 ❌ test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ trunk Compile Tests _
+1 💚 mvninstall 41m 35s trunk passed
+1 💚 compile 0m 38s trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1
+1 💚 compile 0m 34s trunk passed with JDK Private Build-1.8.0_362-8u372-gaus1-0ubuntu120.04-b09
+1 💚 checkstyle 0m 36s trunk passed
+1 💚 mvnsite 0m 40s trunk passed
+1 💚 javadoc 0m 33s trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1
+1 💚 javadoc 0m 34s trunk passed with JDK Private Build-1.8.0_362-8u372-gaus1-0ubuntu120.04-b09
+1 💚 spotbugs 1m 14s trunk passed
+1 💚 shadedclient 20m 29s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 27s the patch passed
+1 💚 compile 0m 30s the patch passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1
+1 💚 javac 0m 30s the patch passed
+1 💚 compile 0m 25s the patch passed with JDK Private Build-1.8.0_362-8u372-gaus1-0ubuntu120.04-b09
+1 💚 javac 0m 25s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 20s the patch passed
+1 💚 mvnsite 0m 29s the patch passed
+1 💚 javadoc 0m 16s the patch passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1
+1 💚 javadoc 0m 24s the patch passed with JDK Private Build-1.8.0_362-8u372-gaus1-0ubuntu120.04-b09
+1 💚 spotbugs 1m 3s the patch passed
+1 💚 shadedclient 20m 32s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 2m 28s hadoop-aws in the patch passed.
+1 💚 asflicense 0m 39s The patch does not generate ASF License warnings.
97m 14s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5706/1/artifact/out/Dockerfile
GITHUB PR #5706
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux 1df3474dcf51 4.15.0-206-generic #217-Ubuntu SMP Fri Feb 3 19:10:13 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 49d97f0
Default Java Private Build-1.8.0_362-8u372-gaus1-0ubuntu120.04-b09
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_362-8u372-gaus1-0ubuntu120.04-b09
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5706/1/testReport/
Max. process+thread count 656 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5706/1/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@steveloughran
Contributor

Which S3 endpoint did you test against, and what parameters did you use on the build? For anything related to the committers, -Dscale will include the terasorts, so I'd like that.

@modithah
Contributor Author

modithah commented Jun 2, 2023

Tested in us-east-1. The same tests fail on trunk (some due to my setup without STS). Attaching the logs below.

verify-18757.log
verify-trunk.log

@steveloughran
Contributor

interesting failures... this is where having everyone test in their own config helps find many config-related issues

ITestS3AFileSystemStatistic

[ERROR] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 2.393 s <<< FAILURE! - in org.apache.hadoop.fs.s3a.statistics.ITestS3AFileSystemStatistic
[ERROR] testBytesReadWithStream(org.apache.hadoop.fs.s3a.statistics.ITestS3AFileSystemStatistic)  Time elapsed: 2.392 s  <<< FAILURE!
java.lang.AssertionError: Mismatch in number of FS bytes read by InputStreams expected:<2048> but was:<69944985>
	at org.junit.Assert.fail(Assert.java:89)
	at org.junit.Assert.failNotEquals(Assert.java:835)
	at org.junit.Assert.assertEquals(Assert.java:647)
	at org.apache.hadoop.fs.s3a.statistics.ITestS3AFileSystemStatistic.testBytesReadWithStream(ITestS3AFileSystemStatistic.java:72)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

really bizarre, that one, as we are doing two full reads of a 1 KB file.
if it keeps happening when you do a standalone test run, it'd be interesting to see the whole debug-level logs, which will be collected under target/ somewhere.

ITestS3ATemporaryCredentials

looks like it needs to be something which can be disabled, maybe by setting the endpoint to something special like "none". why not create a new JIRA for that for you to work on later...

ITestStagingCommitProtocolFailure

that shouldn't happen; as it looks like a setup expecting a failure is now passing. Probably needs fixing in the test, but we need to understand why the test is failing first.

this is the one which needs attention.

org.apache.hadoop.fs.s3a.commit.staging.integration.ITestStagingCommitProtocolFailure
[ERROR] testCreateCommitter(org.apache.hadoop.fs.s3a.commit.staging.integration.ITestStagingCommitProtocolFailure)  Time elapsed: 0.97 s  <<< FAILURE!
java.lang.AssertionError: Expected a org.apache.hadoop.fs.s3a.commit.PathCommitException to be thrown, but got the result: : StagingCommitter{AbstractS3ACommitter{role=Task committer attempt__0000_r_000000_0, name=staging, outputPath=s3a://hadoop-test-s3a-li/test/testCreateCommitter, workPath=file:/Users/mhewasinghage/Liveintent/repo/hadoop-fork/hadoop-tools/hadoop-aws/target/build/test/job__0000-attempt__0000_r_000000_0/_temporary/0/_temporary/attempt__0000_r_000000_0, uuid='job__0000', uuid source=JobUUIDSource{text='JobID'}}, commitsDirectory=file:/Users/mhewasinghage/Liveintent/repo/hadoop-fork/hadoop-tools/hadoop-aws/tmp/staging/mhewasinghage/job__0000/staging-uploads, uniqueFilenames=true, conflictResolution=APPEND, uploadPartSize=67108864, wrappedCommitter=FileOutputCommitter{PathOutputCommitter{context=TaskAttemptContextImpl{JobContextImpl{jobId=job__0000}; taskId=attempt__0000_r_000000_0, status=''}; org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter@36e80411}; outputPath=file:/Users/mhewasinghage/Liveintent/repo/hadoop-fork/hadoop-tools/hadoop-aws/tmp/staging/mhewasinghage/job__0000/staging-uploads, workPath=null, algorithmVersion=1, skipCleanup=false, ignoreCleanupFailures=false}}
	at org.apache.hadoop.test.LambdaTestUtils.intercept(LambdaTestUtils.java:499)
	at org.apache.hadoop.test.LambdaTestUtils.intercept(LambdaTestUtils.java:384)
	at org.apache.hadoop.fs.s3a.commit.staging.integration.ITestStagingCommitProtocolFailure.testCreateCommitter(ITestStagingCommitProtocolFailure.java:66)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)
	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.lang.Thread.run(Thread.java:829)

@modithah
Contributor Author

Sorry I couldn't get back to this earlier.

ITestS3AFileSystemStatistic

The test succeeds when it's run by itself. I suppose it could be because the fs might be reused between tests and we count fsStats.getBytesRead().

[INFO] -------------------------------------------------------
[INFO]  T E S T S
[INFO] -------------------------------------------------------
[INFO] Running org.apache.hadoop.fs.s3a.statistics.ITestS3AFileSystemStatistic
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 5.51 s - in org.apache.hadoop.fs.s3a.statistics.ITestS3AFileSystemStatistic
[INFO] 
[INFO] Results:
[INFO] 
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0
[INFO] 
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  13.086 s

ITestStagingCommitProtocolFailure

Also works when running on its own. I suppose this could also be due to some shared objects.

[INFO] -------------------------------------------------------
[INFO]  T E S T S
[INFO] -------------------------------------------------------
[INFO] Running org.apache.hadoop.fs.s3a.commit.staging.integration.ITestStagingCommitProtocolFailure
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3.866 s - in org.apache.hadoop.fs.s3a.commit.staging.integration.ITestStagingCommitProtocolFailure
[INFO] 
[INFO] Results:
[INFO] 
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0
[INFO] 
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  11.339 s

@steveloughran
Contributor

ITestS3AFileSystemStatistic

The test succeeds when it's run by itself. I suppose it could be because the fs might be reused between tests and we count fsStats.getBytesRead().

seems likely. can you create a new hadoop JIRA under HADOOP-18477 which can be used to track this?

ITestStagingCommitProtocolFailure

Also works when running on its own. I suppose this could also be due to some shared objects.

as this one is committer related, how about fixing it by adding

  disableFilesystemCaching(conf);

at the end of createConfiguration()

@modithah
Contributor Author

Fixed ITestStagingCommitProtocolFailure. Created the following tasks:

https://issues.apache.org/jira/browse/HADOOP-18784 - ITestS3AFileSystemStatistic
https://issues.apache.org/jira/browse/HADOOP-18785 - ITestS3ATemporaryCredentials

@hadoop-yetus

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 58s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 52m 54s trunk passed
+1 💚 compile 0m 40s trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1
+1 💚 compile 0m 33s trunk passed with JDK Private Build-1.8.0_362-8u372-gaus1-0ubuntu120.04-b09
+1 💚 checkstyle 0m 34s trunk passed
+1 💚 mvnsite 0m 39s trunk passed
+1 💚 javadoc 0m 30s trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1
+1 💚 javadoc 0m 33s trunk passed with JDK Private Build-1.8.0_362-8u372-gaus1-0ubuntu120.04-b09
+1 💚 spotbugs 1m 15s trunk passed
+1 💚 shadedclient 39m 48s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 30s the patch passed
+1 💚 compile 0m 36s the patch passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1
+1 💚 javac 0m 36s the patch passed
+1 💚 compile 0m 26s the patch passed with JDK Private Build-1.8.0_362-8u372-gaus1-0ubuntu120.04-b09
+1 💚 javac 0m 26s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 19s the patch passed
+1 💚 mvnsite 0m 30s the patch passed
+1 💚 javadoc 0m 15s the patch passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1
+1 💚 javadoc 0m 24s the patch passed with JDK Private Build-1.8.0_362-8u372-gaus1-0ubuntu120.04-b09
+1 💚 spotbugs 1m 4s the patch passed
+1 💚 shadedclient 38m 45s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 2m 24s hadoop-aws in the patch passed.
+1 💚 asflicense 0m 36s The patch does not generate ASF License warnings.
147m 19s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5706/2/artifact/out/Dockerfile
GITHUB PR #5706
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux 045a73b50b04 4.15.0-206-generic #217-Ubuntu SMP Fri Feb 3 19:10:13 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / cb1c7b6
Default Java Private Build-1.8.0_362-8u372-gaus1-0ubuntu120.04-b09
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_362-8u372-gaus1-0ubuntu120.04-b09
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5706/2/testReport/
Max. process+thread count 530 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5706/2/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@modithah
Contributor Author

modithah commented Jul 4, 2023

@steveloughran Could you take a second look? As far as I can tell, this makes both 3.3.5 and 3.3.6 unusable with S3 unless you supply alternative committer code.

@steveloughran
Contributor

will do; i was offline for a few days

Contributor

@steveloughran steveloughran left a comment

code good, just imports to tune.

+1 pending this change

@hadoop-yetus

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 1m 41s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 51m 4s trunk passed
+1 💚 compile 0m 42s trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1
+1 💚 compile 0m 39s trunk passed with JDK Private Build-1.8.0_362-8u372-gaus1-0ubuntu120.04-b09
+1 💚 checkstyle 0m 37s trunk passed
+1 💚 mvnsite 0m 45s trunk passed
+1 💚 javadoc 0m 33s trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1
+1 💚 javadoc 0m 39s trunk passed with JDK Private Build-1.8.0_362-8u372-gaus1-0ubuntu120.04-b09
+1 💚 spotbugs 1m 12s trunk passed
+1 💚 shadedclient 33m 49s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 32s the patch passed
+1 💚 compile 0m 32s the patch passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1
+1 💚 javac 0m 32s the patch passed
+1 💚 compile 0m 28s the patch passed with JDK Private Build-1.8.0_362-8u372-gaus1-0ubuntu120.04-b09
+1 💚 javac 0m 28s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 21s the patch passed
+1 💚 mvnsite 0m 30s the patch passed
+1 💚 javadoc 0m 17s the patch passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1
+1 💚 javadoc 0m 27s the patch passed with JDK Private Build-1.8.0_362-8u372-gaus1-0ubuntu120.04-b09
+1 💚 spotbugs 1m 4s the patch passed
+1 💚 shadedclient 33m 48s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 2m 39s hadoop-aws in the patch passed.
+1 💚 asflicense 0m 43s The patch does not generate ASF License warnings.
136m 42s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5706/5/artifact/out/Dockerfile
GITHUB PR #5706
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux dc14e12aa5cc 4.15.0-212-generic #223-Ubuntu SMP Tue May 23 13:09:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 1591e2d
Default Java Private Build-1.8.0_362-8u372-gaus1-0ubuntu120.04-b09
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_362-8u372-gaus1-0ubuntu120.04-b09
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5706/5/testReport/
Max. process+thread count 753 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5706/5/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@modithah modithah requested a review from steveloughran July 18, 2023 10:47
Contributor

@steveloughran steveloughran left a comment

LGTM
+1

@steveloughran steveloughran merged commit b6b2590 into apache:trunk Jul 19, 2023
yuyanlei-8130 added a commit to yuyanlei-8130/hadoop that referenced this pull request Jul 19, 2023
…read (apache#5706)

Contributed by Moditha Hewasinghage

asfgit pushed a commit that referenced this pull request Jul 19, 2023
@steveloughran
Contributor

merged to trunk; cherrypicked to branch-3.3 and ran the scale tests against s3 london; all good.

thanks!

@@ -236,7 +236,7 @@ private ExecutorService buildThreadPool(
         .setDaemon(true)
         .setNameFormat(THREAD_PREFIX + jobId + "-%d")
         .build();
-    return new HadoopThreadPoolExecutor(0, numThreads,
+    return new HadoopThreadPoolExecutor(numThreads, numThreads,
Member

Hi, @steveloughran . In this case, it seems that there is no workaround in Apache Hadoop 3.3.5 and 3.3.6. Do you have any recommendation for this issue? Currently, Apache Spark 3.5.0 RC1 tag is using Apache Hadoop 3.3.6.

[SPARK-44197][BUILD] Upgrade Hadoop to 3.3.6
[SPARK-42913][BUILD] Upgrade Hadoop to 3.3.5

Contributor

you are going to have to go with it for now; i think we should be thinking about a 3.3.7 before long as we have some other abfs and s3a issues causing pain...

Member

Thank you, @steveloughran .

Contributor

  • a cve rollup on protobuf, guava etc... not jackson, sadly

dongjoon-hyun added a commit to apache/spark that referenced this pull request Aug 4, 2023
### What changes were proposed in this pull request?

This PR aims to downgrade the Apache Hadoop dependency to 3.3.4 in `Apache Spark 3.5` in order to prevent any regression from `Apache Spark 3.4.x`. In other words, although `Apache Spark 3.5.x` will lose many bug fixes from Apache Hadoop 3.3.5 and 3.3.6, it will be in the same situation as `Apache Spark 3.4.x`.
- SPARK-44197 Upgrade Hadoop to 3.3.6 (#41744)
- SPARK-42913 Upgrade Hadoop to 3.3.5 (#39124)
- SPARK-43448 Remove dummy dependency `hadoop-openstack` (#41133)

On top of reverting SPARK-44197 and SPARK-42913, this PR has an additional dependency exclusion change due to the following.
- SPARK-43880 Organize `hadoop-cloud` in standard maven project structure (#41380)

### Why are the changes needed?

There is a community report of an S3A committer performance regression. Although it's a one-line fix, there is no available Hadoop release with that fix at this time.
- HADOOP-18757: Bump corePoolSize of HadoopThreadPoolExecutor in s3a committer (apache/hadoop#5706)

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

Closes #42345 from dongjoon-hyun/SPARK-44678.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
jiajunmao pushed a commit to jiajunmao/hadoop-MLEC that referenced this pull request Feb 6, 2024
jbewing pushed a commit to HubSpot/hadoop that referenced this pull request May 28, 2024
…its in a single thread (apache#5706)

Contributed by Moditha Hewasinghage
johnnysohn pushed a commit to HubSpot/hadoop that referenced this pull request May 28, 2024
…its in a single thread (apache#5706) (#52)

Contributed by Moditha Hewasinghage

Co-authored-by: Moditha Hewasinghage <[email protected]>