HDFS-13616. Batch listing of multiple directories #1725
Conversation
A rebase of the original patch by @umbrant. There were not too many conflicts during the rebase. IMO the original patch is already in very good shape. I'll start from here and see if people have any comments. cc @jojochuang
Glad that this is being picked back up! Thanks Chao!
💔 -1 overall
This message was automatically generated.
Quick first pass
```java
@@ -697,6 +697,12 @@ boolean mkdirs(String src, FsPermission masked, boolean createParent)
  DirectoryListing getListing(String src, byte[] startAfter,
      boolean needLocation) throws IOException;

  @Idempotent
  BatchedDirectoryListing getBatchedListing(
```
I believe we want to add the annotation
@ReadOnly(isCoordinated = true)
Good catch. Done.
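For reference, a sketch of how the `ClientProtocol` declaration would look with the suggested annotation applied; the signature is the one shown elsewhere in this diff, so treat this as illustrative rather than the final hunk:

```java
@Idempotent
@ReadOnly(isCoordinated = true)
BatchedDirectoryListing getBatchedListing(String[] srcs, byte[] startAfter,
    boolean needLocation) throws IOException;
```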
```java
@Override
public BatchedDirectoryListing getBatchedListing(String[] srcs,
    byte[] startAfter, boolean needLocation) throws IOException {
  throw new UnsupportedOperationException("Not implemented");
```
Let's file a JIRA to support this API in RBF.
Done. Filed https://issues.apache.org/jira/browse/HDFS-15029
```java
public byte[] getSrcPathsHash(String[] srcs) {
  MessageDigest md;
  try {
    md = MessageDigest.getInstance("MD5");
```
We should refine this method (see the sketch below):
- Synchronize this method.
- MessageDigest.getInstance() has a big overhead, so we should reuse the instance (it instantiates a new object each time).
- After use, call MessageDigest#reset().

https://stackoverflow.com/questions/13802627/when-to-use-messagedigest-reset
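A minimal sketch of the suggested refactoring, assuming the method lives on an object whose lifetime spans many calls (the `SrcPathsHasher` wrapper class is hypothetical). Note that `MessageDigest#digest()` already resets the instance, so an explicit `reset()` is only needed if a computation is abandoned midway:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class SrcPathsHasher {
  // Created once and reused; MessageDigest.getInstance() is comparatively
  // expensive, so we avoid calling it on every invocation.
  private final MessageDigest md;

  SrcPathsHasher() {
    try {
      md = MessageDigest.getInstance("MD5");
    } catch (NoSuchAlgorithmException e) {
      // Every compliant JRE ships MD5, so this should be unreachable.
      throw new IllegalStateException(e);
    }
  }

  // Synchronized because MessageDigest instances are not thread-safe.
  synchronized byte[] getSrcPathsHash(String[] srcs) {
    for (String src : srcs) {
      md.update(src.getBytes(StandardCharsets.UTF_8));
    }
    // digest() completes the hash and resets md for the next caller.
    return md.digest();
  }
}
```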
Good point. Will fix this.
```java
    return md.digest();
  }
```
```java
BatchedDirectoryListing getBatchedListing(String[] srcs, byte[] startAfter,
```
This method is longer than I can possibly understand.
I found it a little hard to refactor this given that several variables, such as lastListing, are used throughout the method. Would it be helpful if we added more comments in the method body, explaining what each section is doing?
ok. i'm just being lazy.
💔 -1 overall
This message was automatically generated.
Also looks like it doesn't compile.
- Caches the digest and makes accesses to it synchronized.
- Changes `parent` to `listedPath` according to the CR on the JIRA itself.
- Added documentation for a few public methods.
- Added metrics for the listing operation.
One thing I'm not totally sure about is the case where the input …, which I think is not very accurate. Is …
💔 -1 overall
This message was automatically generated.
💔 -1 overall
This message was automatically generated.
The last one looks good to me.
💔 -1 overall
This message was automatically generated.
Hello, I encountered a bug when using the batch method: when I input a directory with more than 1000 files in it, with 2 replications of each file's data block, only the first 500 files of the directory are returned and then it stops. I think the getBatchedListing() method in hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java should be modified, as follows.
The main reason for this bug is that the result returned by getListingInt(dir, pc, src, indexStartAfter, needLocation) is limited by both the number of files in the directory and the number of data blocks and replications of those files, while getBatchedListing() only exits its loop once the number of returned results exceeds 1000.
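To make the reported mismatch concrete, here is a toy model of the arithmetic, not HDFS code; the per-call budget of 1000 items and the assumption that block locations count toward it are inferred from the report:

```java
public class ListingBudgetToy {
  public static void main(String[] args) {
    // Assumed: the per-call listing budget counts block locations when
    // needLocation is true, not just directory entries.
    int budget = 1000;         // assumed per-call item budget
    int locationsPerFile = 2;  // replication factor from the bug report
    // The directory holds 1000 files, but the budget runs out first:
    int filesReturned = budget / locationsPerFile;
    System.out.println(filesReturned);  // prints 500, matching the report
  }
}
```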
@fanlinqian Best to file an HDFS issue on the Apache JIRA server.
(cherry picked from commit d7c4f8a)
Change-Id: I7b058f0761a66558907401adf0340c86de3a5993
One of the dominant workloads for external metadata services is listing of partition directories. This can end up being bottlenecked on round-trip time (RTT) when partition directories contain a small number of files. This is fairly common, since fine-grained partitioning is used for partition pruning by query engines.
A batched listing API that takes multiple paths amortizes the RTT cost. Initial benchmarks show a 10-20x improvement in metadata loading performance.
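To make the amortization concrete, here is a minimal client-side sketch. The method `batchedListStatusIterator`, the `PartialListing` type, and its `get()`/`getListedPath()` accessors are assumptions inferred from this patch's discussion (including the `listedPath` rename above), and the paths are made up:

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.PartialListing;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class BatchedListingExample {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    // Assumes the default filesystem is HDFS.
    DistributedFileSystem dfs =
        (DistributedFileSystem) new Path("/").getFileSystem(conf);

    // Many small partition directories: one batched listing instead of one
    // getListing RPC per directory amortizes the round-trip cost.
    List<Path> partitions = Arrays.asList(
        new Path("/warehouse/tbl/dt=2019-12-01"),
        new Path("/warehouse/tbl/dt=2019-12-02"),
        new Path("/warehouse/tbl/dt=2019-12-03"));

    RemoteIterator<PartialListing<FileStatus>> it =
        dfs.batchedListStatusIterator(partitions);
    while (it.hasNext()) {
      PartialListing<FileStatus> batch = it.next();
      // getListedPath() identifies which source directory this batch is for.
      for (FileStatus st : batch.get()) {
        System.out.println(batch.getListedPath() + " -> " + st.getPath());
      }
    }
  }
}
```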