Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HADOOP-19047: Support InMemory Tracking Of S3A Magic Commits #6468

Merged
merged 6 commits into from
Mar 26, 2024

Conversation

shameersss1
Copy link
Contributor

@shameersss1 shameersss1 commented Jan 19, 2024

Description

The following are the operations which happens within a Task when it uses S3A Magic Committer.

During closing of stream

  1. A 0-byte file with a same name of the original file is uploaded to S3 using PUT operation. Refer here for more information. This is done so that the downstream application like Spark could get the size of the file which is being written.

  2. MultiPartUpload(MPU) metadata is uploaded to S3. Refer here for more information.

During TaskCommit

  1. All the MPU metadata which the task wrote to S3 (There will be 'x' number of metadata file in S3 if a single task writes to 'x' files) are read and rewritten to S3 as a single metadata file. Refer here for more information

Since these operations happens with the Task JVM, We could optimize as well as save cost by storing these information in memory when Task memory usage is not a constraint. Hence the proposal here is to introduce a new MagicCommit Tracker called "InMemoryMagicCommitTracker" which will store the

  1. Metadata of MPU in memory till the Task is committed
  2. Store the size of the file which can be used by the downstream application to get the file size before it is committed/visible to the output path.

This optimization will save 2 PUT S3 calls, 1 LIST S3 call, and 1 GET S3 call given a Task writes only 1 file.

Testing

  1. Ran S3A integration test in us-west-1 region using the following command
    mvn -Dparallel-tests clean verify -Dit.test=ITestMagicCommitProtocol,ITestS3ACommitterMRJob,ITestMagicCommitProtocolFailure,ITestS3AHugeMagicCommits,ITestCommitOperationCost,ITestCommitOperations -Dtest=none -DtestsThreadCount=7

  2. Manual verfication

  3. Added Parameterized UnitTest in ITestMagicCommitProtocol

@shameersss1 shameersss1 marked this pull request as ready for review January 26, 2024 03:07
@shameersss1
Copy link
Contributor Author

@steveloughran - I have converted draft PR to final one. Could you please review the changes.
Thanks

@hadoop-yetus
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 49s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+0 🆗 markdownlint 0m 0s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 3 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 48m 33s trunk passed
+1 💚 compile 0m 42s trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 compile 0m 34s trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 checkstyle 0m 30s trunk passed
+1 💚 mvnsite 0m 39s trunk passed
+1 💚 javadoc 0m 27s trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 32s trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 spotbugs 1m 9s trunk passed
+1 💚 shadedclient 37m 22s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 29s the patch passed
+1 💚 compile 0m 36s the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 javac 0m 36s the patch passed
+1 💚 compile 0m 26s the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 javac 0m 26s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 20s /results-checkstyle-hadoop-tools_hadoop-aws.txt hadoop-tools/hadoop-aws: The patch generated 12 new + 5 unchanged - 0 fixed = 17 total (was 5)
+1 💚 mvnsite 0m 30s the patch passed
+1 💚 javadoc 0m 15s the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
-1 ❌ javadoc 0m 24s /patch-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_392-8u392-ga-1~20.04-b08.txt hadoop-aws in the patch failed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08.
-1 ❌ spotbugs 1m 10s /new-spotbugs-hadoop-tools_hadoop-aws.html hadoop-tools/hadoop-aws generated 3 new + 0 unchanged - 0 fixed = 3 total (was 0)
+1 💚 shadedclient 37m 49s patch has no errors when building and testing our client artifacts.
_ Other Tests _
-1 ❌ unit 2m 54s /patch-unit-hadoop-tools_hadoop-aws.txt hadoop-aws in the patch passed.
+1 💚 asflicense 0m 33s The patch does not generate ASF License warnings.
140m 50s
Reason Tests
SpotBugs module:hadoop-tools/hadoop-aws
org.apache.hadoop.fs.s3a.commit.magic.InMemoryMagicCommitTracker.taskAttemptIdToBytesWritten isn't final but should be At InMemoryMagicCommitTracker.java:be At InMemoryMagicCommitTracker.java:[line 52]
org.apache.hadoop.fs.s3a.commit.magic.InMemoryMagicCommitTracker.taskAttemptIdToMpuMetdadataMap isn't final but should be At InMemoryMagicCommitTracker.java:be At InMemoryMagicCommitTracker.java:[line 49]
org.apache.hadoop.fs.s3a.commit.magic.InMemoryMagicCommitTracker.taskAttemptIdToPath isn't final but should be At InMemoryMagicCommitTracker.java:be At InMemoryMagicCommitTracker.java:[line 55]
Failed junit tests hadoop.fs.s3a.commit.TestMagicCommitTrackerUtils
Subsystem Report/Notes
Docker ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6468/2/artifact/out/Dockerfile
GITHUB PR #6468
JIRA Issue HADOOP-19047
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets markdownlint
uname Linux 9ff01a4a6fb0 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / ac49002
Default Java Private Build-1.8.0_392-8u392-ga-1~20.04-b08
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6468/2/testReport/
Max. process+thread count 528 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6468/2/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@hadoop-yetus
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 53s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+0 🆗 markdownlint 0m 1s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 3 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 45m 57s trunk passed
+1 💚 compile 0m 42s trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 compile 0m 33s trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 checkstyle 0m 31s trunk passed
+1 💚 mvnsite 0m 40s trunk passed
+1 💚 javadoc 0m 25s trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 32s trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 spotbugs 1m 8s trunk passed
+1 💚 shadedclient 37m 58s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 29s the patch passed
+1 💚 compile 0m 36s the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 javac 0m 36s the patch passed
+1 💚 compile 0m 26s the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 javac 0m 26s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 20s the patch passed
+1 💚 mvnsite 0m 33s the patch passed
+1 💚 javadoc 0m 15s the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
-1 ❌ javadoc 0m 24s /results-javadoc-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_392-8u392-ga-1~20.04-b08.txt hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_392-8u392-ga-120.04-b08 with JDK Private Build-1.8.0_392-8u392-ga-120.04-b08 generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)
+1 💚 spotbugs 1m 11s the patch passed
+1 💚 shadedclient 39m 5s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 3m 4s hadoop-aws in the patch passed.
+1 💚 asflicense 0m 34s The patch does not generate ASF License warnings.
140m 2s
Subsystem Report/Notes
Docker ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6468/3/artifact/out/Dockerfile
GITHUB PR #6468
JIRA Issue HADOOP-19047
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets markdownlint
uname Linux 3dde19b6ae38 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / a5b27c5
Default Java Private Build-1.8.0_392-8u392-ga-1~20.04-b08
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6468/3/testReport/
Max. process+thread count 616 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6468/3/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@hadoop-yetus
Copy link

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 47s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+0 🆗 markdownlint 0m 1s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 3 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 46m 42s trunk passed
+1 💚 compile 0m 42s trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 compile 0m 34s trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 checkstyle 0m 31s trunk passed
+1 💚 mvnsite 0m 41s trunk passed
+1 💚 javadoc 0m 26s trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 32s trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 spotbugs 1m 7s trunk passed
+1 💚 shadedclient 37m 40s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 29s the patch passed
+1 💚 compile 0m 35s the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 javac 0m 35s the patch passed
+1 💚 compile 0m 26s the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 javac 0m 26s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 20s the patch passed
+1 💚 mvnsite 0m 31s the patch passed
+1 💚 javadoc 0m 15s the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 24s the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 spotbugs 1m 8s the patch passed
+1 💚 shadedclient 37m 55s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 2m 53s hadoop-aws in the patch passed.
+1 💚 asflicense 0m 33s The patch does not generate ASF License warnings.
138m 54s
Subsystem Report/Notes
Docker ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6468/4/artifact/out/Dockerfile
GITHUB PR #6468
JIRA Issue HADOOP-19047
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets markdownlint
uname Linux 629db3c4380a 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 02fc9f1
Default Java Private Build-1.8.0_392-8u392-ga-1~20.04-b08
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6468/4/testReport/
Max. process+thread count 527 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6468/4/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@shameersss1
Copy link
Contributor Author

@steveloughran - Could you please review the changes?

@steveloughran steveloughran self-assigned this Feb 1, 2024
Copy link
Contributor

@steveloughran steveloughran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like the InMemoryMagicCommitTracker

It is using a static map of path to metadata. This will grow without constraint on a long live process. And you are left with the question "what if two jobs ever use the same path"

I would be happier if the static structures (which I can see why are needed) mapped job id -> task attempt id to something to track all pending files for that TA...the static map would be a weak ref to something held strongly by the actual committer (see WeakReferenceMap). Once the actual task attempt is gc'd, there will be an automatic cleanup. Oh, and the static structures should be per fs instances, so when an fs is cleaned up: everything goes. things like hive to call .closeAllforUGI() to get rid of all filesystems for a given user in a long-lived process.

I'm also worried about how a job could abort a task attempt on a different process which has failed. Before worrying about that too much, why don't you look in spark to see how it calls abort. I'm not worried about MapReduce except for testing -so how do itself calls the committee isn't so important. For example: we don't care about recovery from a failed attempt as spark itself cannot do this.

It seems to me that the key costs of using S3 as as the store are:

  1. file write: extra overhead of probes, need to always use MPU and write of two files
  2. task commit: scan and read of all .pending files, write of .pendingset.
  3. job commit: scan and read of .pendingset files

How important are the operations of phase #1? as writing the .pending file as today would allow for task abort on different process -task commit (the normal path) doesn't need it, though there will be extra list and delete overhead in job commmit.

Why don't you look into the spark code and see how it does it abort and therefore how important being able to support task abort from a separate process is. I think it probably is part of the cleanup.

@@ -52,6 +52,7 @@
import java.util.concurrent.atomic.AtomicBoolean;
import javax.annotation.Nullable;

import org.apache.hadoop.fs.s3a.commit.magic.InMemoryMagicCommitTracker;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to same group as rest of apache imports

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack.

import org.apache.hadoop.fs.statistics.IOStatistics;
import org.apache.hadoop.fs.statistics.IOStatisticsSnapshot;
import org.apache.hadoop.thirdparty.com.google.common.base.Preconditions;
import software.amazon.awssdk.services.s3.model.CompletedPart;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

review import ordering and grouping.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack. I will import Code formatter xml is present here: https://github.com/apache/hadoop/tree/trunk/dev-support/code-formatter . IntelliJ users can directly import hadoop_idea_formatter.xml

return pendingSet;
}

private List<SinglePendingCommit> loadPendingCommitsFromMemory(TaskAttemptContext context)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, javadocs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack.

@@ -3906,6 +3908,21 @@ public void access(final Path f, final FsAction mode)
@Retries.RetryTranslated
public FileStatus getFileStatus(final Path f) throws IOException {
Path path = qualify(f);
if (isTrackMagicCommitsInMemoryEnabled(getConf()) && isMagicCommitPath(path)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a bit of a hack. not saying that's bad, just wondering if there is a more elegant solution.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree this an ugly hack. I couldn't find any better alternative. This will be used by downstream application like Spark which wants to get the file size of the file written by the task. This is supposed to be used in the same process which writes the file/initiated and upload the MPU.

@steveloughran
Copy link
Contributor

I've thought about this some more. Here are some things which I believe we need

  1. Marker files at the end of each path so that spark status reporting on different processes can get an update on an active job.
  2. A way to abort all uploads of a failed task attempt -even from a different process. Probably also a way to abort the entire job.
  3. Confidence that the inner memory store of pending uploads Will not grow it definitely.

Ignoring item number #3 for now, remember that we have #1 solved by adding a 0 byte marker with a header of "final length"; spark has some special handling zero byte files to use getXattr() and fall back to the probe for this -at the expense of a second HEAD request. Generating a modified FileStatus response from a single HEAD/getObjectMetadata() call Wood actually eliminate the need for that I wish I'd thought of it myself. Yes, we do break that guarantee that files listed are the same size as the files opened… but magic paths are, well, magic. We break a lot of guarantees there already.

The existing design should be retained even in memory; the calculation of final length something which can be done for all.

But: we do not need to save the .pending files just for task abort. All we need to do is be able to enumerate the upload IDs of all the files from that task attempt and cancel them. We can do that just by adding another header to the marker file. Task committee uses the memory data; task abort will need a deep scan of the task attempt, and all zero bite files with the proposed new header used to initiate water operations. This is only for task board an outlier case. For normal task commit there is no need to Scan the directory pause the pending files then generate a new pending set file for later pause commit. It is probably the Jason on the marshalling which is as much a performance killer here as the listing operation.

What do you think?

@shameersss1
Copy link
Contributor Author

shameersss1 commented Feb 5, 2024

@steveloughran - Thanks a lot a for a detailed review and some amazing question, The following are my thoughts on the different asks.

  1. Marker files at the end of each path so that spark status reporting on different processes can get an update on an active job.

As far i know (Please correct me if i am wrong)

  1. 0-size marker files was specially added for Spark's use case.
  2. After writing files, Spark tries to get the size of the files written for the statistic purpose (like showing the output bytes written) in the Spark History server UI.
  3. This operation is being done as part of the BasicWriteStatsTracker class in Spark.
  4. As i could see in my experiment, BasicWriteStatsTracker#getFileSize is called in the executor process itself.

That being said, Since the same process is calling BasicWriteStatsTracker#getFileSize is it still required to have 0 marker file? I have solved this by adding a check in FileStatus method by returing the file size corresponding to the magic path/file.


  1. A way to abort all uploads of a failed task attempt -even from a different process. Probably also a way to abort the entire job.

Thinking from Spark's perspective,

  1. When a taskAttempt fails (gracefully), abortTask operation is called. This is operation is called within the same process and hence we can fetch the MPU metadata from the memory itself.

  2. If a taskAttempt fails (ungracefully and all retries) are exhausted, When abortJob operation is called which will internally invoke cleanup which lists all the pending multi part upload and aborts them.

That being said, I am not sure if there is any such use case of abortingTask from another process. In such cases, The abortJob will handle it i guess.


  1. Confidence that the inner memory store of pending uploads Will not grow it definitely.
  1. The static map entry is removed is during taskCommit or abortTask operations and hence it guaranteed that there is no memory leak (unless there is some unexplored corner case).

  2. The only case when it grows large, is when there are large number of concurrent jobs reusing the same executor JVM, Since we don't enable the "inmemory" by default we should be good. That being said, maybe we should call this out in the documentation.


Doe it make sense? Or am i missing anything?

@shameersss1
Copy link
Contributor Author

  1. static map of path to metadata. This will grow without constraint on a long live process.

The entries to the Map are removed during commitTask or abortTask operation to keep memory under control.


  1. Two jobs writing to same path will it corrupt the Map ?

No, The path (complete) is guaranteed to be unique The paths stored here as part of private static Map<String, List<Path>> taskAttemptIdToPath = new ConcurrentHashMap<>(); is the magic path, Eventhough the file name might be same, The magic path for two different jobs will be different since the jobId is included in the path.


  1. the static map would be a weak ref to something held strongly by the actual committer (see WeakReferenceMap). Once the actual task attempt is gc'd,

Since the entries from the HashMap are removed during commitTask or abortTask operation is WeakHashMap still required?


  1. static structures should be per fs instances, so when an fs is cleaned up

I am not sure why it should be scoped under fs object. For a simiar behaviour with storing in s3, Shouldn't the static structure be available to the whole JVM ? I mean shouldn't we able to access static structure irrespective of the fs object.


  1. 'm also worried about how a job could abort a task attempt on a different process which has failed. Before worrying about that too much, why don't you look in spark to see how it calls abort. I'm not worried about MapReduce except for testing -so how do itself calls the committee isn't so important. For example: we don't care about recovery from a failed attempt as spark itself cannot do this.

I have covered this as part of the comment here.


@shameersss1
Copy link
Contributor Author

@steveloughran - Thanks a lot a detailed review as well as amazing follow up question. I have addressed your comments, Please let me know your thoughts.

@hadoop-yetus
Copy link

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 51s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+0 🆗 markdownlint 0m 0s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 3 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 46m 54s trunk passed
+1 💚 compile 0m 44s trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 compile 0m 33s trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 checkstyle 0m 32s trunk passed
+1 💚 mvnsite 0m 43s trunk passed
+1 💚 javadoc 0m 26s trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 32s trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 spotbugs 1m 9s trunk passed
+1 💚 shadedclient 39m 5s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 32s the patch passed
+1 💚 compile 0m 38s the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 javac 0m 38s the patch passed
+1 💚 compile 0m 27s the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 javac 0m 27s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 22s /results-checkstyle-hadoop-tools_hadoop-aws.txt hadoop-tools/hadoop-aws: The patch generated 1 new + 5 unchanged - 0 fixed = 6 total (was 5)
+1 💚 mvnsite 0m 37s the patch passed
+1 💚 javadoc 0m 15s the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 27s the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 spotbugs 1m 12s the patch passed
+1 💚 shadedclient 38m 24s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 2m 55s hadoop-aws in the patch passed.
+1 💚 asflicense 0m 35s The patch does not generate ASF License warnings.
141m 44s
Subsystem Report/Notes
Docker ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6468/5/artifact/out/Dockerfile
GITHUB PR #6468
JIRA Issue HADOOP-19047
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets markdownlint
uname Linux f10aab5720a0 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 7d21300
Default Java Private Build-1.8.0_392-8u392-ga-1~20.04-b08
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6468/5/testReport/
Max. process+thread count 528 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6468/5/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@shameersss1
Copy link
Contributor Author

@steveloughran - Gentle reminder for review

Copy link
Contributor

@steveloughran steveloughran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, minor comments and i've looked at where hadoop mr and spark both look at progress -it does seem to be in the same process.

Now, one more test: ITestTerasortOnS3A

it'd be great to add a new parameter of magic + in memory; this test actually uses a yarn minicluster and so really does run across processes. only runs with -Dscale, but it is the real test. will even let us compare the two options

@@ -20,17 +20,19 @@

import java.util.List;

import org.apache.hadoop.fs.s3a.commit.magic.InMemoryMagicCommitTracker;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

goes into the hadoop block

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack

*/
public class InMemoryMagicCommitTracker extends MagicCommitTracker {

// stores taskAttemptId to commit data mapping
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make javadocs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack

public class InMemoryMagicCommitTracker extends MagicCommitTracker {

// stores taskAttemptId to commit data mapping
private static Map<String, List<SinglePendingCommit>>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and make these all final. I do think they should use weak/soft references, somehow

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack for final.

I do think they should use weak/soft references,

Is this required ? Given that we proactively remove the entries from HashMap when the task commits or aborts. Since it is not referenced any where, when gc happens it reclaims the memory.

* @param bytesWritten bytes written
* @param iostatistics nullable IO statistics
* @return false, indicating that the commit must fail.
* @throws IOException any IO problem.
* @throws IOException any IO problem.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

revert

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack

@shameersss1
Copy link
Contributor Author

@steveloughran - Thanks a lot for review comments, I have addressed the comments with the new commit d8db729e5568df5dc920604eff8167a575a5894c

  1. Added test in ITestTerasortOnS3A
  2. Fixed import ordering
  3. Addressed other minor comments

@hadoop-yetus
Copy link

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 51s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+0 🆗 markdownlint 0m 0s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 4 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 46m 39s trunk passed
+1 💚 compile 0m 42s trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 compile 0m 32s trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 checkstyle 0m 30s trunk passed
+1 💚 mvnsite 0m 40s trunk passed
+1 💚 javadoc 0m 26s trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 33s trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 spotbugs 1m 9s trunk passed
+1 💚 shadedclient 37m 54s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 28s the patch passed
+1 💚 compile 0m 36s the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 javac 0m 36s the patch passed
+1 💚 compile 0m 25s the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 javac 0m 25s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 20s the patch passed
+1 💚 mvnsite 0m 30s the patch passed
+1 💚 javadoc 0m 15s the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 24s the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 spotbugs 1m 5s the patch passed
+1 💚 shadedclient 37m 17s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 2m 48s hadoop-aws in the patch passed.
+1 💚 asflicense 0m 33s The patch does not generate ASF License warnings.
138m 33s
Subsystem Report/Notes
Docker ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6468/6/artifact/out/Dockerfile
GITHUB PR #6468
JIRA Issue HADOOP-19047
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets markdownlint
uname Linux 29f0f609a7d0 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / d8db729
Default Java Private Build-1.8.0_392-8u392-ga-1~20.04-b08
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6468/6/testReport/
Max. process+thread count 526 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6468/6/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@shameersss1
Copy link
Contributor Author

@steveloughran - Gentle reminder to review the changes.
Thanks

@steveloughran
Copy link
Contributor

@shameersss1 I'm working on assisting getting the 3.4.0 release out right now. Anything you can do to assist testing would be wonderful, as I'm only worrying about release blockers.

@shameersss1
Copy link
Contributor Author

@steveloughran - I am glad to assist with the testing. Is there any release candidate branch for the same? Could you please share the wiki on what tests needs to be done?

@shameersss1
Copy link
Contributor Author

@steveloughran - Gentle reminder for review
Thanks

Copy link
Contributor

@steveloughran steveloughran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, just a typo in a method name.

+1 pending that.

note, you will need to a followup in the docs -but we can get this in and tested while you do that...

}


public static Map<String, List<SinglePendingCommit>> getTaskAttemptIdToMpuMetdadata() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: typo in method name

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack.

@shameersss1
Copy link
Contributor Author

@steveloughran - Thanks a lot for the detailed review. I have addressed your comments.

note, you will need to a followup in the docs -but we can get this in and tested while you do that...
What docs are we mentioning here? I have added the details in the committer.md file though.

Copy link
Contributor

@steveloughran steveloughran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. typo to fix on CommitOperations
  2. sorry, and missed the docs; only was looking at the more recent changes.

@@ -584,7 +584,7 @@ public SinglePendingCommit uploadFileToPendingCommit(File localFile,
destKey,
uploadId,
partNumber,
size).build();
size).build();x
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack

@hadoop-yetus
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 50s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+0 🆗 markdownlint 0m 0s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 4 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 49m 35s trunk passed
+1 💚 compile 0m 43s trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
+1 💚 compile 0m 34s trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
+1 💚 checkstyle 0m 33s trunk passed
+1 💚 mvnsite 0m 42s trunk passed
+1 💚 javadoc 0m 27s trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
+1 💚 javadoc 0m 35s trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
+1 💚 spotbugs 1m 8s trunk passed
+1 💚 shadedclient 38m 34s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
-1 ❌ mvninstall 0m 15s /patch-mvninstall-hadoop-tools_hadoop-aws.txt hadoop-aws in the patch failed.
-1 ❌ compile 0m 16s /patch-compile-hadoop-tools_hadoop-aws-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.txt hadoop-aws in the patch failed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.
-1 ❌ javac 0m 16s /patch-compile-hadoop-tools_hadoop-aws-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.txt hadoop-aws in the patch failed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.
-1 ❌ compile 0m 16s /patch-compile-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06.txt hadoop-aws in the patch failed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06.
-1 ❌ javac 0m 16s /patch-compile-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06.txt hadoop-aws in the patch failed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06.
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 15s /buildtool-patch-checkstyle-hadoop-tools_hadoop-aws.txt The patch fails to run checkstyle in hadoop-aws
-1 ❌ mvnsite 0m 17s /patch-mvnsite-hadoop-tools_hadoop-aws.txt hadoop-aws in the patch failed.
+1 💚 javadoc 0m 16s the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
-1 ❌ javadoc 0m 17s /patch-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06.txt hadoop-aws in the patch failed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06.
-1 ❌ spotbugs 0m 16s /patch-spotbugs-hadoop-tools_hadoop-aws.txt hadoop-aws in the patch failed.
+1 💚 shadedclient 40m 7s patch has no errors when building and testing our client artifacts.
_ Other Tests _
-1 ❌ unit 0m 20s /patch-unit-hadoop-tools_hadoop-aws.txt hadoop-aws in the patch failed.
+1 💚 asflicense 0m 36s The patch does not generate ASF License warnings.
139m 53s
Subsystem Report/Notes
Docker ClientAPI=1.45 ServerAPI=1.45 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6468/7/artifact/out/Dockerfile
GITHUB PR #6468
JIRA Issue HADOOP-19047
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets markdownlint
uname Linux 68ca386861b1 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 7239b2f
Default Java Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6468/7/testReport/
Max. process+thread count 603 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6468/7/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@shameersss1
Copy link
Contributor Author

@steveloughran - I have addressed your comments.

Copy link
Contributor

@steveloughran steveloughran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
+1

@steveloughran steveloughran merged commit 032796a into apache:trunk Mar 26, 2024
1 of 2 checks passed
@hadoop-yetus
Copy link

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 51s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+0 🆗 markdownlint 0m 0s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 4 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 49m 2s trunk passed
+1 💚 compile 0m 45s trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
+1 💚 compile 0m 34s trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
+1 💚 checkstyle 0m 32s trunk passed
+1 💚 mvnsite 0m 42s trunk passed
+1 💚 javadoc 0m 28s trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
+1 💚 javadoc 0m 35s trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
+1 💚 spotbugs 1m 8s trunk passed
+1 💚 shadedclient 38m 32s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 30s the patch passed
+1 💚 compile 0m 37s the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
+1 💚 javac 0m 37s the patch passed
+1 💚 compile 0m 27s the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
+1 💚 javac 0m 27s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 20s /results-checkstyle-hadoop-tools_hadoop-aws.txt hadoop-tools/hadoop-aws: The patch generated 1 new + 13 unchanged - 0 fixed = 14 total (was 13)
+1 💚 mvnsite 0m 32s the patch passed
+1 💚 javadoc 0m 16s the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
+1 💚 javadoc 0m 25s the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
+1 💚 spotbugs 1m 9s the patch passed
+1 💚 shadedclient 38m 51s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 2m 49s hadoop-aws in the patch passed.
+1 💚 asflicense 0m 38s The patch does not generate ASF License warnings.
144m 22s
Subsystem Report/Notes
Docker ClientAPI=1.45 ServerAPI=1.45 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6468/8/artifact/out/Dockerfile
GITHUB PR #6468
JIRA Issue HADOOP-19047
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets markdownlint
uname Linux 6a4507cb7348 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 8d739be
Default Java Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6468/8/testReport/
Max. process+thread count 527 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6468/8/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

steveloughran pushed a commit that referenced this pull request Mar 26, 2024
…6468)

If the option fs.s3a.committer.magic.track.commits.in.memory.enabled
is set to true, then rather than save data about in-progress uploads
to S3, this information is cached in memory.

If the number of files being committed is low, this will save network IO
in both the generation of .pending and marker files, and in the scanning
of task attempt directory trees during task commit.

Contributed by Syed Shameerur Rahman
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants