HADOOP-19047: Support InMemory Tracking Of S3A Magic Commits #6468
Conversation
@steveloughran - I have converted the draft PR to a final one. Could you please review the changes?
💔 -1 overall
💔 -1 overall
🎊 +1 overall
@steveloughran - Could you please review the changes?
I don't like the InMemoryMagicCommitTracker.
It is using a static map of path to metadata. This will grow without constraint in a long-lived process. And you are left with the question "what if two jobs ever use the same path?"
I would be happier if the static structures (which I can see why are needed) mapped job id -> task attempt id to something which tracks all pending files for that TA... the static map would be a weak ref to something held strongly by the actual committer (see WeakReferenceMap). Once the actual task attempt is gc'd, there will be an automatic cleanup. Oh, and the static structures should be per fs instance, so when an fs is cleaned up everything goes; this allows things like Hive to call .closeAllForUGI() to get rid of all filesystems for a given user in a long-lived process.
I'm also worried about how a job could abort a task attempt on a different process which has failed. Before worrying about that too much, why don't you look in Spark to see how it calls abort. I'm not worried about MapReduce except for testing, so how it itself calls the committer isn't so important. For example: we don't care about recovery from a failed attempt as Spark itself cannot do this.
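The weak-reference layout suggested above can be sketched in plain Java. This is a hedged illustration, not the PR's code: the class and method names are hypothetical, and it uses the JDK's `WeakReference` directly rather than Hadoop's `WeakReferenceMap` utility mentioned in the review. The point is that the static registry holds only weak references, while the committer owns the strong reference for the lifetime of the task attempt, so a gc'd task attempt leaves nothing pinned in the static map.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class WeakTrackerRegistrySketch {

  /** Per-task-attempt state, held strongly by the committer itself. */
  static final class TaskAttemptCommits {
    final List<String> pendingUploads = new ArrayList<>();
  }

  /** Static map holds only weak refs; entries become collectable once the
   *  committer drops its strong reference to TaskAttemptCommits. */
  private static final Map<String, WeakReference<TaskAttemptCommits>> REGISTRY =
      new ConcurrentHashMap<>();

  static TaskAttemptCommits register(String taskAttemptId) {
    TaskAttemptCommits commits = new TaskAttemptCommits();
    REGISTRY.put(taskAttemptId, new WeakReference<>(commits));
    return commits;  // caller (the committer) must keep this reference alive
  }

  static List<String> pendingUploads(String taskAttemptId) {
    WeakReference<TaskAttemptCommits> ref = REGISTRY.get(taskAttemptId);
    TaskAttemptCommits commits = ref == null ? null : ref.get();
    return commits == null ? List.of() : commits.pendingUploads;
  }

  public static void main(String[] args) {
    TaskAttemptCommits held = register("attempt_001");
    held.pendingUploads.add("s3a://bucket/__magic/job1/attempt_001/part-0000");
    System.out.println(pendingUploads("attempt_001").size());  // 1 while held
  }
}
```

A real version would additionally key the outer map per filesystem instance, as the review asks, so a closed fs drops its whole registry at once.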
It seems to me that the key costs of using S3 as the store are:
- file write: extra overhead of probes, need to always use MPU and write of two files
- task commit: scan and read of all .pending files, write of .pendingset.
- job commit: scan and read of .pendingset files
How important are the operations of phase #1? Writing the .pending file as today would allow for task abort in a different process; task commit (the normal path) doesn't need it, though there will be extra list and delete overhead in job commit.
Why don't you look into the Spark code and see how it does abort, and therefore how important being able to support task abort from a separate process is. I think it probably is part of the cleanup.
@@ -52,6 +52,7 @@
import java.util.concurrent.atomic.AtomicBoolean;
import javax.annotation.Nullable;

import org.apache.hadoop.fs.s3a.commit.magic.InMemoryMagicCommitTracker;
move to same group as rest of apache imports
Ack.
import org.apache.hadoop.fs.statistics.IOStatistics;
import org.apache.hadoop.fs.statistics.IOStatisticsSnapshot;
import org.apache.hadoop.thirdparty.com.google.common.base.Preconditions;
import software.amazon.awssdk.services.s3.model.CompletedPart;
review import ordering and grouping.
Ack. The code formatter xml is present here: https://github.com/apache/hadoop/tree/trunk/dev-support/code-formatter . IntelliJ users can directly import hadoop_idea_formatter.xml.
return pendingSet;
}

private List<SinglePendingCommit> loadPendingCommitsFromMemory(TaskAttemptContext context)
nit, javadocs
ack.
@@ -3906,6 +3908,21 @@ public void access(final Path f, final FsAction mode)
@Retries.RetryTranslated
public FileStatus getFileStatus(final Path f) throws IOException {
Path path = qualify(f);
if (isTrackMagicCommitsInMemoryEnabled(getConf()) && isMagicCommitPath(path)) {
this is a bit of a hack. not saying that's bad, just wondering if there is a more elegant solution.
I agree this is an ugly hack. I couldn't find a better alternative. This will be used by downstream applications like Spark which want to get the size of the file written by the task. It is supposed to be used in the same process which writes the file and initiates/uploads the MPU.
I've thought about this some more. Here are some things which I believe we need
Ignoring item #3 for now, remember that we have #1 solved by adding a 0-byte marker with a header of "final length"; Spark has some special handling of zero-byte files to use getXAttr() and fall back to the probe for this, at the expense of a second HEAD request. Generating a modified FileStatus response from a single HEAD/getObjectMetadata() call would actually eliminate the need for that; I wish I'd thought of it myself. Yes, we do break the guarantee that files listed are the same size as the files opened... but magic paths are, well, magic. We break a lot of guarantees there already.

The existing design should be retained even in memory; the calculation of final length is something which can be done for all. But: we do not need to save the .pending files just for task abort. All we need to do is be able to enumerate the upload IDs of all the files from that task attempt and cancel them. We can do that just by adding another header to the marker file. Task commit uses the in-memory data; task abort will need a deep scan of the task attempt directory, with all zero-byte files carrying the proposed new header used to initiate the abort operations. This is only for task abort, an outlier case.

For normal task commit there is no need to scan the directory, parse the .pending files, then generate a new .pendingset file for the later job commit. It is probably the JSON (un)marshalling which is as much a performance killer here as the listing operation. What do you think?
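The header-based scheme described above can be sketched without any AWS dependencies by modeling the marker object's metadata as a plain map. This is an illustrative sketch only: S3A attaches a final-length header to the zero-byte magic marker (the header name below is an assumption of its exact form), and the upload-id header is the review's *proposal*, not an existing feature. One HEAD request yields both the synthetic file length and, for abort, the MPU to cancel.

```java
import java.util.HashMap;
import java.util.Map;

public class MagicMarkerHeadersSketch {

  // Assumed header name for the final data length on the zero-byte marker.
  static final String LEN_HEADER = "x-hadoop-s3a-magic-data-length";
  // Hypothetical header carrying the MPU id, as proposed in the review.
  static final String UPLOAD_ID_HEADER = "x-hadoop-s3a-magic-upload-id";

  /** Derive the real file length from a single HEAD's metadata,
   *  falling back to the object's own (zero-byte) size. */
  static long finalLength(Map<String, String> headMetadata, long markerObjectSize) {
    String v = headMetadata.get(LEN_HEADER);
    return v != null ? Long.parseLong(v) : markerObjectSize;
  }

  /** Task abort: read the upload id off the marker so the MPU can be cancelled
   *  without ever loading a .pending file. */
  static String uploadIdToAbort(Map<String, String> headMetadata) {
    return headMetadata.get(UPLOAD_ID_HEADER);
  }

  public static void main(String[] args) {
    Map<String, String> meta = new HashMap<>();
    meta.put(LEN_HEADER, "1048576");
    meta.put(UPLOAD_ID_HEADER, "mpu-0001");
    System.out.println(finalLength(meta, 0));   // 1048576
    System.out.println(uploadIdToAbort(meta));  // mpu-0001
  }
}
```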
@steveloughran - Thanks a lot for the detailed review and some great questions. The following are my thoughts on the different asks.
As far as I know (please correct me if I am wrong)
That being said, since the same process is calling BasicWriteStatsTracker#getFileSize, is it still required to have the 0-byte marker file? I have solved this by adding a check in the getFileStatus method, returning the file size corresponding to the magic path/file.
Thinking from Spark's perspective,
That being said, I am not sure if there is any such use case of aborting a task from another process. In such cases, abortJob will handle it, I guess.
Does it make sense? Or am I missing anything?
The entries in the map are removed during the commitTask or abortTask operation to keep memory under control.
No, the (complete) path is guaranteed to be unique. The paths stored here as part of
Since the entries in the HashMap are removed during the commitTask or abortTask operation, is a WeakHashMap still required?
I am not sure why it should be scoped under the fs object. For behaviour similar to storing in S3, shouldn't the static structure be available to the whole JVM? I mean, shouldn't we be able to access the static structure irrespective of the fs object?
I have covered this as part of the comment here.
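The explicit-cleanup scheme described in this reply can be sketched as follows. This is a hedged illustration with hypothetical names, not the PR's actual code: a static, concurrent taskAttemptId-to-pending-commits map whose entry is drained in both commitTask and abortTask, so memory stays bounded without weak references as long as every task attempt terminates through one of those two paths.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

public class ExplicitCleanupSketch {

  /** Static registry: taskAttemptId -> pending commits written by that attempt. */
  private static final Map<String, List<String>> TASK_ATTEMPT_TO_COMMITS =
      new ConcurrentHashMap<>();

  /** Called once per file as its MPU completes (conceptually, in the tracker). */
  static void trackCommit(String taskAttemptId, String pendingCommit) {
    TASK_ATTEMPT_TO_COMMITS
        .computeIfAbsent(taskAttemptId, k -> new CopyOnWriteArrayList<>())
        .add(pendingCommit);
  }

  /** Both commitTask and abortTask drain the entry, removing it from the map. */
  static List<String> drain(String taskAttemptId) {
    List<String> commits = TASK_ATTEMPT_TO_COMMITS.remove(taskAttemptId);
    return commits == null ? List.of() : commits;
  }

  static int trackedAttempts() {
    return TASK_ATTEMPT_TO_COMMITS.size();
  }

  public static void main(String[] args) {
    trackCommit("attempt_01", "part-0000");
    trackCommit("attempt_01", "part-0001");
    System.out.println(drain("attempt_01").size());  // 2
    System.out.println(trackedAttempts());           // 0 after drain
  }
}
```

The review's counterpoint stands: if a process dies between trackCommit and drain, the entry leaks, which is why weak references scoped to the committer were suggested as a belt-and-braces measure.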
@steveloughran - Thanks a lot for the detailed review as well as the great follow-up questions. I have addressed your comments; please let me know your thoughts.
🎊 +1 overall
@steveloughran - Gentle reminder for review
ok, minor comments, and I've looked at where Hadoop MR and Spark both look at progress; it does seem to be in the same process.
Now, one more test: ITestTerasortOnS3A
it'd be great to add a new parameter of magic + in memory; this test actually uses a yarn minicluster and so really does run across processes. only runs with -Dscale, but it is the real test. will even let us compare the two options
@@ -20,17 +20,19 @@

import java.util.List;

import org.apache.hadoop.fs.s3a.commit.magic.InMemoryMagicCommitTracker;
goes into the hadoop block
ack
*/
public class InMemoryMagicCommitTracker extends MagicCommitTracker {

// stores taskAttemptId to commit data mapping
make javadocs
ack
public class InMemoryMagicCommitTracker extends MagicCommitTracker {

// stores taskAttemptId to commit data mapping
private static Map<String, List<SinglePendingCommit>>
and make these all final. I do think they should use weak/soft references, somehow
ack for final.
I do think they should use weak/soft references,
Is this required? Given that we proactively remove the entries from the HashMap when the task commits or aborts, and the data is not referenced anywhere else, gc will reclaim the memory.
* @param bytesWritten bytes written
* @param iostatistics nullable IO statistics
* @return false, indicating that the commit must fail.
* @throws IOException any IO problem.
* @throws IOException any IO problem.
revert
ack
@steveloughran - Thanks a lot for the review comments. I have addressed the comments with the new commit.
🎊 +1 overall
@steveloughran - Gentle reminder to review the changes.
@shameersss1 I'm working on getting the 3.4.0 release out right now. Anything you can do to assist testing would be wonderful, as I'm only worrying about release blockers.
@steveloughran - I am glad to assist with the testing. Is there any release candidate branch for the same? Could you please share the wiki on what tests need to be done?
@steveloughran - Gentle reminder for review |
looks good, just a typo in a method name.
+1 pending that.
note, you will need to do a followup in the docs, but we can get this in and tested while you do that...
}

public static Map<String, List<SinglePendingCommit>> getTaskAttemptIdToMpuMetdadata() {
nit: typo in method name
ack.
@steveloughran - Thanks a lot for the detailed review. I have addressed your comments.
- typo to fix on CommitOperations
- sorry, I also missed the docs; I was only looking at the more recent changes.
@@ -584,7 +584,7 @@ public SinglePendingCommit uploadFileToPendingCommit(File localFile,
destKey,
uploadId,
partNumber,
size).build();
size).build();x
typo
ack
💔 -1 overall
@steveloughran - I have addressed your comments.
LGTM
+1
🎊 +1 overall
…6468) If the option fs.s3a.committer.magic.track.commits.in.memory.enabled is set to true, then rather than save data about in-progress uploads to S3, this information is cached in memory. If the number of files being committed is low, this will save network IO in both the generation of .pending and marker files, and in the scanning of task attempt directory trees during task commit. Contributed by Syed Shameerur Rahman
Description
The following are the operations which happen within a Task when it uses the S3A Magic Committer:
During closing of stream
A 0-byte file with the same name as the original file is uploaded to S3 using a PUT operation. Refer here for more information. This is done so that a downstream application like Spark can get the size of the file being written.
MultiPartUpload(MPU) metadata is uploaded to S3. Refer here for more information.
During TaskCommit
Since these operations happen within the Task JVM, we could optimize as well as save cost by storing this information in memory when Task memory usage is not a constraint. Hence the proposal here is to introduce a new magic commit tracker called "InMemoryMagicCommitTracker" which will store the taskAttemptId to commit metadata mapping in memory.
This optimization will save 2 PUT S3 calls, 1 LIST S3 call, and 1 GET S3 call given a Task writes only 1 file.
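The feature is opt-in. Per the merged commit message, it is enabled through the following configuration property (shown here as a standard Hadoop configuration fragment, e.g. in core-site.xml or the job configuration):

```xml
<!-- Enable in-memory tracking of S3A magic commits (default: false).
     Only safe when task commit runs in the same process that wrote the files. -->
<property>
  <name>fs.s3a.committer.magic.track.commits.in.memory.enabled</name>
  <value>true</value>
</property>
```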
Testing
Ran the S3A integration tests in the us-west-1 region using the following command:
mvn -Dparallel-tests clean verify -Dit.test=ITestMagicCommitProtocol,ITestS3ACommitterMRJob,ITestMagicCommitProtocolFailure,ITestS3AHugeMagicCommits,ITestCommitOperationCost,ITestCommitOperations -Dtest=none -DtestsThreadCount=7
Manual verification
Added Parameterized UnitTest in ITestMagicCommitProtocol