HDFS-13616. Batch listing of multiple directories #1725

Merged: jojochuang merged 4 commits into apache:trunk on Jan 16, 2020

sunchao
Member

@sunchao sunchao commented Nov 19, 2019

One of the dominant workloads for external metadata services is listing partition directories. This can end up bottlenecked on round-trip time (RTT) when partition directories contain a small number of files. This is fairly common, since query engines use fine-grained partitioning for partition pruning.

A batched listing API that takes multiple paths amortizes the RTT cost. Initial benchmarks show a 10-20x improvement in metadata loading performance.
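To make the amortization argument concrete, here is a minimal cost model. It is only a sketch: the class, method names, and all numbers are made up for illustration and are not taken from the patch or from the benchmarks mentioned above. It assumes every RPC pays one fixed round trip plus a fixed per-directory server cost.

```java
public class RttAmortizationSketch {

  /** Total latency when each directory is listed with its own RPC. */
  public static long perDirectoryMillis(int numDirs, long rttMillis,
      long serverMillisPerDir) {
    return numDirs * (rttMillis + serverMillisPerDir);
  }

  /** Total latency when up to batchSize directories share one RPC. */
  public static long batchedMillis(int numDirs, int batchSize, long rttMillis,
      long serverMillisPerDir) {
    // Ceiling division: number of RPCs needed to cover all directories.
    long rpcs = (numDirs + batchSize - 1) / batchSize;
    // Round trips are paid once per RPC; server work is still paid per directory.
    return rpcs * rttMillis + numDirs * serverMillisPerDir;
  }

  public static void main(String[] args) {
    // Hypothetical workload: 1000 small partition directories,
    // 10 ms round trip, 1 ms of server work per directory, 100 paths per batch.
    long single = perDirectoryMillis(1000, 10, 1);
    long batched = batchedMillis(1000, 100, 10, 1);
    System.out.println("one RPC per directory: " + single + " ms");
    System.out.println("batched:               " + batched + " ms");
  }
}
```

In this model the benefit grows with the ratio of round-trip time to per-directory server work, which is why directories with few files gain the most.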

@sunchao
Member Author

sunchao commented Nov 19, 2019

A rebase of the original patch by @umbrant. There were not too many conflicts during the rebase. IMO the original patch is already in very good shape. I'll start from here and see if people have any comments.

cc @jojochuang

@jojochuang jojochuang self-requested a review November 19, 2019 20:07
@umbrant
Contributor

umbrant commented Nov 19, 2019

Glad that this is being picked back up! Thanks Chao!

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 35s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+0 🆗 prototool 0m 0s prototool was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 4 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 0m 25s Maven dependency ordering for branch
+1 💚 mvninstall 18m 6s trunk passed
+1 💚 compile 16m 57s trunk passed
+1 💚 checkstyle 3m 8s trunk passed
+1 💚 mvnsite 4m 38s trunk passed
+1 💚 shadedclient 21m 50s branch has no errors when building and testing our client artifacts.
+1 💚 javadoc 4m 52s trunk passed
+0 🆗 spotbugs 1m 20s Used deprecated FindBugs config; considering switching to SpotBugs.
+1 💚 findbugs 8m 38s trunk passed
_ Patch Compile Tests _
+0 🆗 mvndep 0m 24s Maven dependency ordering for patch
+1 💚 mvninstall 3m 11s the patch passed
+1 💚 compile 16m 18s the patch passed
-1 ❌ cc 16m 18s root generated 2 new + 24 unchanged - 2 fixed = 26 total (was 26)
-1 ❌ javac 16m 18s root generated 1 new + 1869 unchanged - 0 fixed = 1870 total (was 1869)
-0 ⚠️ checkstyle 3m 8s root: The patch generated 15 new + 1199 unchanged - 0 fixed = 1214 total (was 1199)
+1 💚 mvnsite 4m 40s the patch passed
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 xml 0m 3s The patch has no ill-formed XML file.
+1 💚 shadedclient 12m 54s patch has no errors when building and testing our client artifacts.
+1 💚 javadoc 4m 59s the patch passed
+1 💚 findbugs 9m 14s the patch passed
_ Other Tests _
+1 💚 unit 9m 10s hadoop-common in the patch passed.
+1 💚 unit 2m 18s hadoop-hdfs-client in the patch passed.
-1 ❌ unit 87m 38s hadoop-hdfs in the patch failed.
+1 💚 unit 7m 33s hadoop-hdfs-rbf in the patch passed.
-1 ❌ asflicense 1m 3s The patch generated 1 ASF License warnings.
239m 30s
Reason Tests
Failed junit tests hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer
hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks
Subsystem Report/Notes
Docker Client=19.03.5 Server=19.03.5 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1725/1/artifact/out/Dockerfile
GITHUB PR #1725
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml cc prototool
uname Linux efb692705f75 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/hadoop.sh
git revision trunk / b033c68
Default Java 1.8.0_222
cc https://builds.apache.org/job/hadoop-multibranch/job/PR-1725/1/artifact/out/diff-compile-cc-root.txt
javac https://builds.apache.org/job/hadoop-multibranch/job/PR-1725/1/artifact/out/diff-compile-javac-root.txt
checkstyle https://builds.apache.org/job/hadoop-multibranch/job/PR-1725/1/artifact/out/diff-checkstyle-root.txt
unit https://builds.apache.org/job/hadoop-multibranch/job/PR-1725/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
Test Results https://builds.apache.org/job/hadoop-multibranch/job/PR-1725/1/testReport/
asflicense https://builds.apache.org/job/hadoop-multibranch/job/PR-1725/1/artifact/out/patch-asflicense-problems.txt
Max. process+thread count 3938 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs-client hadoop-hdfs-project/hadoop-hdfs hadoop-hdfs-project/hadoop-hdfs-rbf U: .
Console output https://builds.apache.org/job/hadoop-multibranch/job/PR-1725/1/console
versions git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1
Powered by Apache Yetus 0.11.1 https://yetus.apache.org

This message was automatically generated.

Contributor

@jojochuang jojochuang left a comment

Quick first pass

@@ -697,6 +697,12 @@ boolean mkdirs(String src, FsPermission masked, boolean createParent)
DirectoryListing getListing(String src, byte[] startAfter,
boolean needLocation) throws IOException;

@Idempotent
BatchedDirectoryListing getBatchedListing(
Contributor

I believe we want to add the annotation
@ReadOnly(isCoordinated = true)

Member Author

Good catch. Done.
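For readers unfamiliar with how such a method-level annotation is consumed, the following is a small reflection sketch. Everything in it is a stand-in: `ReadOnly` and `ListingProtocol` are defined locally for illustration and model only the `isCoordinated` attribute named above, and the return type is simplified to `String[]`. It does not show what Hadoop itself does with the annotation.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

public class ReadOnlyAnnotationSketch {

  /** Local stand-in annotation; only the isCoordinated attribute is modeled. */
  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.METHOD)
  public @interface ReadOnly {
    boolean isCoordinated() default false;
  }

  /** Hypothetical protocol interface with one annotated and one plain method. */
  public interface ListingProtocol {
    @ReadOnly(isCoordinated = true)
    String[] getBatchedListing(String[] srcs, byte[] startAfter,
        boolean needLocation);

    boolean mkdirs(String src);
  }

  /** Returns true if the named method carries @ReadOnly(isCoordinated = true). */
  public static boolean isCoordinatedRead(Class<?> protocol, String methodName) {
    for (Method m : protocol.getMethods()) {
      if (m.getName().equals(methodName)) {
        ReadOnly ro = m.getAnnotation(ReadOnly.class);
        return ro != null && ro.isCoordinated();
      }
    }
    return false; // No method with that name.
  }

  public static void main(String[] args) {
    System.out.println(isCoordinatedRead(ListingProtocol.class, "getBatchedListing"));
    System.out.println(isCoordinatedRead(ListingProtocol.class, "mkdirs"));
  }
}
```

A check of this kind only sees annotations that are actually present on the interface method, so a newly added read-only RPC has to be annotated explicitly.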

@Override
public BatchedDirectoryListing getBatchedListing(String[] srcs,
byte[] startAfter, boolean needLocation) throws IOException {
throw new UnsupportedOperationException("Not implemented");
Contributor

Let's file a JIRA to support this API in RBF.

Member Author

public byte[] getSrcPathsHash(String[] srcs) {
MessageDigest md;
try {
md = MessageDigest.getInstance("MD5");
Contributor

We should refine this method.

  1. Synchronize this method.
  2. MessageDigest.getInstance() has significant overhead, so we should reuse the instance. (It instantiates a new object each time.)
  3. After use, call MessageDigest#reset().

https://stackoverflow.com/questions/13802627/when-to-use-messagedigest-reset

Member Author

Good point. Will fix this.
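The three suggestions above could be combined roughly as follows. This is a sketch under assumptions, not the code that was merged: the class name `SrcPathsHasher`, the UTF-8 encoding of paths, and the zero-byte separator between paths are all choices made here for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class SrcPathsHasher {

  // Created once and reused, so getInstance() is not called on every request.
  private final MessageDigest md;

  public SrcPathsHasher() {
    try {
      md = MessageDigest.getInstance("MD5");
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("MD5 is not available", e);
    }
  }

  /**
   * Hashes the given paths in order. Synchronized because MessageDigest
   * is stateful and not safe for concurrent use.
   */
  public synchronized byte[] getSrcPathsHash(String[] srcs) {
    // Defensive: clear any state left by an interrupted earlier call.
    md.reset();
    for (String src : srcs) {
      md.update(src.getBytes(StandardCharsets.UTF_8));
      // Separator (an assumption of this sketch) so that {"/a", "/b"}
      // and {"/a/b"} do not feed identical bytes to the digest.
      md.update((byte) 0);
    }
    // digest() completes the hash and also resets the instance.
    return md.digest();
  }

  public static void main(String[] args) {
    SrcPathsHasher hasher = new SrcPathsHasher();
    byte[] hash = hasher.getSrcPathsHash(new String[] {"/warehouse/p=1", "/warehouse/p=2"});
    System.out.println("hash length in bytes: " + hash.length);
  }
}
```

Since `digest()` already resets the instance, the explicit `reset()` only matters if a previous call failed partway through; it is kept here because it is cheap.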

return md.digest();
}

BatchedDirectoryListing getBatchedListing(String[] srcs, byte[] startAfter,
Contributor

This method is longer than I can possibly understand.

Member Author

I found it a little hard to refactor this, given that several variables, such as lastListing, are used throughout the method. Would it help if we added more comments in the method body explaining what each section does?

Contributor

OK. I'm just being lazy.

@sunchao sunchao added the HDFS label Nov 28, 2019
@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 35s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+0 🆗 prototool 0m 0s prototool was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 4 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 1m 6s Maven dependency ordering for branch
+1 💚 mvninstall 18m 0s trunk passed
+1 💚 compile 16m 57s trunk passed
+1 💚 checkstyle 3m 8s trunk passed
+1 💚 mvnsite 4m 39s trunk passed
+1 💚 shadedclient 22m 17s branch has no errors when building and testing our client artifacts.
+1 💚 javadoc 4m 55s trunk passed
+0 🆗 spotbugs 1m 18s Used deprecated FindBugs config; considering switching to SpotBugs.
+1 💚 findbugs 8m 43s trunk passed
_ Patch Compile Tests _
+0 🆗 mvndep 0m 37s Maven dependency ordering for patch
+1 💚 mvninstall 3m 10s the patch passed
+1 💚 compile 17m 11s the patch passed
-1 ❌ cc 17m 11s root generated 5 new + 21 unchanged - 5 fixed = 26 total (was 26)
-1 ❌ javac 17m 11s root generated 1 new + 1870 unchanged - 0 fixed = 1871 total (was 1870)
-0 ⚠️ checkstyle 3m 4s root: The patch generated 18 new + 1199 unchanged - 0 fixed = 1217 total (was 1199)
+1 💚 mvnsite 4m 28s the patch passed
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 xml 0m 3s The patch has no ill-formed XML file.
+1 💚 shadedclient 12m 53s patch has no errors when building and testing our client artifacts.
+1 💚 javadoc 4m 52s the patch passed
+1 💚 findbugs 9m 9s the patch passed
_ Other Tests _
-1 ❌ unit 8m 49s hadoop-common in the patch failed.
-1 ❌ unit 0m 50s hadoop-hdfs-client in the patch failed.
-1 ❌ unit 0m 41s hadoop-hdfs in the patch failed.
-1 ❌ unit 0m 31s hadoop-hdfs-rbf in the patch failed.
-1 ❌ asflicense 0m 48s The patch generated 1 ASF License warnings.
145m 11s
Subsystem Report/Notes
Docker Client=19.03.5 Server=19.03.5 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1725/2/artifact/out/Dockerfile
GITHUB PR #1725
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml cc prototool
uname Linux 0b5d40582342 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/hadoop.sh
git revision trunk / 8d82c44
Default Java 1.8.0_222
cc https://builds.apache.org/job/hadoop-multibranch/job/PR-1725/2/artifact/out/diff-compile-cc-root.txt
javac https://builds.apache.org/job/hadoop-multibranch/job/PR-1725/2/artifact/out/diff-compile-javac-root.txt
checkstyle https://builds.apache.org/job/hadoop-multibranch/job/PR-1725/2/artifact/out/diff-checkstyle-root.txt
unit https://builds.apache.org/job/hadoop-multibranch/job/PR-1725/2/artifact/out/patch-unit-hadoop-common-project_hadoop-common.txt
unit https://builds.apache.org/job/hadoop-multibranch/job/PR-1725/2/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-client.txt
unit https://builds.apache.org/job/hadoop-multibranch/job/PR-1725/2/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
unit https://builds.apache.org/job/hadoop-multibranch/job/PR-1725/2/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt
Test Results https://builds.apache.org/job/hadoop-multibranch/job/PR-1725/2/testReport/
asflicense https://builds.apache.org/job/hadoop-multibranch/job/PR-1725/2/artifact/out/patch-asflicense-problems.txt
Max. process+thread count 1574 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs-client hadoop-hdfs-project/hadoop-hdfs hadoop-hdfs-project/hadoop-hdfs-rbf U: .
Console output https://builds.apache.org/job/hadoop-multibranch/job/PR-1725/2/console
versions git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1
Powered by Apache Yetus 0.11.1 https://yetus.apache.org

This message was automatically generated.

Contributor

@jojochuang jojochuang left a comment

Also looks like it doesn't compile.

- Caches the digest and makes access to it synchronized
- Changes `parent` to `listedPath`, per the code review on the JIRA itself
- Adds documentation for a few public methods
- Adds metrics for the listing operation
@sunchao
Member Author

sunchao commented Dec 6, 2019

One thing I'm not totally sure about is the case where the input srcs is empty. Currently it throws the exception:

FileNotFoundException("One or more paths do not exist.")

which I think is not very accurate. Is FileNotFoundException appropriate here?

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 36s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+0 🆗 prototool 0m 0s prototool was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 4 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 1m 6s Maven dependency ordering for branch
+1 💚 mvninstall 18m 10s trunk passed
-1 ❌ compile 5m 42s root in trunk failed.
+1 💚 checkstyle 2m 56s trunk passed
+1 💚 mvnsite 4m 1s trunk passed
+1 💚 shadedclient 20m 33s branch has no errors when building and testing our client artifacts.
+1 💚 javadoc 4m 10s trunk passed
+0 🆗 spotbugs 1m 8s Used deprecated FindBugs config; considering switching to SpotBugs.
+1 💚 findbugs 7m 53s trunk passed
_ Patch Compile Tests _
+0 🆗 mvndep 0m 24s Maven dependency ordering for patch
+1 💚 mvninstall 3m 7s the patch passed
-1 ❌ compile 5m 39s root in the patch failed.
-1 ❌ cc 5m 39s root in the patch failed.
-1 ❌ javac 5m 39s root in the patch failed.
-0 ⚠️ checkstyle 2m 53s root: The patch generated 18 new + 1199 unchanged - 0 fixed = 1217 total (was 1199)
+1 💚 mvnsite 3m 39s the patch passed
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 xml 0m 3s The patch has no ill-formed XML file.
+1 💚 shadedclient 13m 3s patch has no errors when building and testing our client artifacts.
+1 💚 javadoc 4m 1s the patch passed
+1 💚 findbugs 8m 24s the patch passed
_ Other Tests _
+1 💚 unit 8m 58s hadoop-common in the patch passed.
-1 ❌ unit 2m 6s hadoop-hdfs-client in the patch failed.
+1 💚 unit 87m 24s hadoop-hdfs in the patch passed.
+1 💚 unit 7m 22s hadoop-hdfs-rbf in the patch passed.
+1 💚 asflicense 0m 42s The patch does not generate ASF License warnings.
210m 38s
Reason Tests
Failed junit tests hadoop.hdfs.protocol.TestReadOnly
Subsystem Report/Notes
Docker Client=19.03.5 Server=19.03.5 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1725/3/artifact/out/Dockerfile
GITHUB PR #1725
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml cc prototool
uname Linux f784a3824497 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/hadoop.sh
git revision trunk / 76bb297
Default Java 1.8.0_222
compile https://builds.apache.org/job/hadoop-multibranch/job/PR-1725/3/artifact/out/branch-compile-root.txt
compile https://builds.apache.org/job/hadoop-multibranch/job/PR-1725/3/artifact/out/patch-compile-root.txt
cc https://builds.apache.org/job/hadoop-multibranch/job/PR-1725/3/artifact/out/patch-compile-root.txt
javac https://builds.apache.org/job/hadoop-multibranch/job/PR-1725/3/artifact/out/patch-compile-root.txt
checkstyle https://builds.apache.org/job/hadoop-multibranch/job/PR-1725/3/artifact/out/diff-checkstyle-root.txt
unit https://builds.apache.org/job/hadoop-multibranch/job/PR-1725/3/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-client.txt
Test Results https://builds.apache.org/job/hadoop-multibranch/job/PR-1725/3/testReport/
Max. process+thread count 4277 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs-client hadoop-hdfs-project/hadoop-hdfs hadoop-hdfs-project/hadoop-hdfs-rbf U: .
Console output https://builds.apache.org/job/hadoop-multibranch/job/PR-1725/3/console
versions git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1
Powered by Apache Yetus 0.11.1 https://yetus.apache.org

This message was automatically generated.

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 35s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+0 🆗 prototool 0m 0s prototool was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 5 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 1m 6s Maven dependency ordering for branch
+1 💚 mvninstall 18m 2s trunk passed
-1 ❌ compile 5m 42s root in trunk failed.
+1 💚 checkstyle 2m 57s trunk passed
+1 💚 mvnsite 3m 59s trunk passed
+1 💚 shadedclient 20m 23s branch has no errors when building and testing our client artifacts.
+1 💚 javadoc 4m 10s trunk passed
+0 🆗 spotbugs 1m 9s Used deprecated FindBugs config; considering switching to SpotBugs.
-1 ❌ findbugs 2m 45s hadoop-hdfs-project/hadoop-hdfs in trunk has 1 extant findbugs warnings.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 24s Maven dependency ordering for patch
+1 💚 mvninstall 3m 6s the patch passed
-1 ❌ compile 5m 35s root in the patch failed.
-1 ❌ cc 5m 35s root in the patch failed.
-1 ❌ javac 5m 35s root in the patch failed.
-0 ⚠️ checkstyle 2m 55s root: The patch generated 18 new + 1199 unchanged - 0 fixed = 1217 total (was 1199)
+1 💚 mvnsite 3m 41s the patch passed
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 xml 0m 3s The patch has no ill-formed XML file.
+1 💚 shadedclient 13m 2s patch has no errors when building and testing our client artifacts.
+1 💚 javadoc 4m 1s the patch passed
+1 💚 findbugs 8m 19s the patch passed
_ Other Tests _
+1 💚 unit 8m 56s hadoop-common in the patch passed.
+1 💚 unit 2m 5s hadoop-hdfs-client in the patch passed.
-1 ❌ unit 87m 52s hadoop-hdfs in the patch failed.
+1 💚 unit 7m 25s hadoop-hdfs-rbf in the patch passed.
+1 💚 asflicense 0m 44s The patch does not generate ASF License warnings.
210m 41s
Reason Tests
Failed junit tests hadoop.hdfs.server.namenode.TestFsck
Subsystem Report/Notes
Docker Client=19.03.5 Server=19.03.5 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1725/4/artifact/out/Dockerfile
GITHUB PR #1725
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml cc prototool
uname Linux 0019221496f4 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/hadoop.sh
git revision trunk / 11cd5b6
Default Java 1.8.0_222
compile https://builds.apache.org/job/hadoop-multibranch/job/PR-1725/4/artifact/out/branch-compile-root.txt
findbugs https://builds.apache.org/job/hadoop-multibranch/job/PR-1725/4/artifact/out/branch-findbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html
compile https://builds.apache.org/job/hadoop-multibranch/job/PR-1725/4/artifact/out/patch-compile-root.txt
cc https://builds.apache.org/job/hadoop-multibranch/job/PR-1725/4/artifact/out/patch-compile-root.txt
javac https://builds.apache.org/job/hadoop-multibranch/job/PR-1725/4/artifact/out/patch-compile-root.txt
checkstyle https://builds.apache.org/job/hadoop-multibranch/job/PR-1725/4/artifact/out/diff-checkstyle-root.txt
unit https://builds.apache.org/job/hadoop-multibranch/job/PR-1725/4/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
Test Results https://builds.apache.org/job/hadoop-multibranch/job/PR-1725/4/testReport/
Max. process+thread count 3973 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs-client hadoop-hdfs-project/hadoop-hdfs hadoop-hdfs-project/hadoop-hdfs-rbf U: .
Console output https://builds.apache.org/job/hadoop-multibranch/job/PR-1725/4/console
versions git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1
Powered by Apache Yetus 0.11.1 https://yetus.apache.org

This message was automatically generated.

Contributor

@jojochuang jojochuang left a comment

I think the last one looks good to me.

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 11s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 prototool 0m 1s prototool was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 5 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 1m 11s Maven dependency ordering for branch
+1 💚 mvninstall 22m 41s trunk passed
+1 💚 compile 19m 54s trunk passed
+1 💚 checkstyle 3m 23s trunk passed
+1 💚 mvnsite 4m 27s trunk passed
+1 💚 shadedclient 22m 59s branch has no errors when building and testing our client artifacts.
+1 💚 javadoc 4m 59s trunk passed
+0 🆗 spotbugs 1m 27s Used deprecated FindBugs config; considering switching to SpotBugs.
+1 💚 findbugs 9m 48s trunk passed
_ Patch Compile Tests _
+0 🆗 mvndep 0m 24s Maven dependency ordering for patch
+1 💚 mvninstall 3m 28s the patch passed
+1 💚 compile 17m 59s the patch passed
-1 ❌ cc 17m 59s root generated 2 new + 24 unchanged - 2 fixed = 26 total (was 26)
-1 ❌ javac 17m 59s root generated 1 new + 1870 unchanged - 0 fixed = 1871 total (was 1870)
-0 ⚠️ checkstyle 3m 34s root: The patch generated 18 new + 1201 unchanged - 0 fixed = 1219 total (was 1201)
+1 💚 mvnsite 4m 47s the patch passed
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 xml 0m 3s The patch has no ill-formed XML file.
+1 💚 shadedclient 15m 2s patch has no errors when building and testing our client artifacts.
+1 💚 javadoc 5m 27s the patch passed
+1 💚 findbugs 10m 24s the patch passed
_ Other Tests _
+1 💚 unit 9m 33s hadoop-common in the patch passed.
+1 💚 unit 2m 8s hadoop-hdfs-client in the patch passed.
-1 ❌ unit 109m 10s hadoop-hdfs in the patch failed.
-1 ❌ unit 7m 58s hadoop-hdfs-rbf in the patch failed.
+1 💚 asflicense 0m 55s The patch does not generate ASF License warnings.
278m 53s
Reason Tests
Failed junit tests hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots
hadoop.hdfs.TestFileChecksum
hadoop.hdfs.TestFileChecksumCompositeCrc
hadoop.hdfs.server.namenode.TestRedudantBlocks
hadoop.hdfs.TestReconstructStripedFile
hadoop.hdfs.server.federation.router.TestRouterRpcMultiDestination
Subsystem Report/Notes
Docker Client=19.03.5 Server=19.03.5 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1725/5/artifact/out/Dockerfile
GITHUB PR #1725
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml cc prototool
uname Linux 54d09aa0d437 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/hadoop.sh
git revision trunk / ef59ffd
Default Java 1.8.0_232
cc https://builds.apache.org/job/hadoop-multibranch/job/PR-1725/5/artifact/out/diff-compile-cc-root.txt
javac https://builds.apache.org/job/hadoop-multibranch/job/PR-1725/5/artifact/out/diff-compile-javac-root.txt
checkstyle https://builds.apache.org/job/hadoop-multibranch/job/PR-1725/5/artifact/out/diff-checkstyle-root.txt
unit https://builds.apache.org/job/hadoop-multibranch/job/PR-1725/5/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
unit https://builds.apache.org/job/hadoop-multibranch/job/PR-1725/5/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt
Test Results https://builds.apache.org/job/hadoop-multibranch/job/PR-1725/5/testReport/
Max. process+thread count 3288 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs-client hadoop-hdfs-project/hadoop-hdfs hadoop-hdfs-project/hadoop-hdfs-rbf U: .
Console output https://builds.apache.org/job/hadoop-multibranch/job/PR-1725/5/console
versions git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1
Powered by Apache Yetus 0.11.1 https://yetus.apache.org

This message was automatically generated.

@jojochuang jojochuang merged commit d7c4f8a into apache:trunk Jan 16, 2020
RogPodge pushed a commit to RogPodge/hadoop that referenced this pull request Mar 25, 2020
@fanlinqian

Hello, I encountered a bug when using the batched listing method. When I pass in a directory that contains more than 1000 files, each with 2 replicas of its data block, only the first 500 files of the directory are returned, and then the listing stops. I think the getBatchedListing() method in hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java should be modified as follows.
for (; srcsIndex < srcs.length; srcsIndex++) {
  String src = srcs[srcsIndex];
  HdfsPartialListing listing;
  try {
    DirectoryListing dirListing =
        getListingInt(dir, pc, src, indexStartAfter, needLocation);
    if (dirListing == null) {
      throw new FileNotFoundException("Path " + src + " does not exist");
    }
    listing = new HdfsPartialListing(
        srcsIndex, Lists.newArrayList(dirListing.getPartialListing()));
    numEntries += listing.getPartialListing().size();
    lastListing = dirListing;
  } catch (Exception e) {
    if (e instanceof AccessControlException) {
      logAuditEvent(false, operationName, src);
    }
    listing = new HdfsPartialListing(
        srcsIndex,
        new RemoteException(e.getClass().getCanonicalName(), e.getMessage()));
    lastListing = null;
    LOG.info("Exception listing src {}", src, e);
  }
  listings.put(srcsIndex, listing);

  // My modification: stop if the current directory was only partially listed.
  // lastListing is null if listing this path threw an exception.
  if (lastListing != null && lastListing.getRemainingEntries() != 0) {
    break;
  }

  if (indexStartAfter.length != 0) {
    indexStartAfter = new byte[0];
  }
  // Terminate if we've reached the maximum listing size
  if (numEntries >= dir.getListLimit()) {
    break;
  }
}

The main cause of this bug is that the result returned by getListingInt(dir, pc, src, indexStartAfter, needLocation) is limited by both the number of files in the directory and the number of data blocks and replicas of those files. The getBatchedListing() method, however, only exits the loop once the number of returned entries reaches the listing limit (1000 here).
Looking forward to your reply
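The reported behavior can be reproduced in a toy model of the loop. This is a sketch only: `Page`, `listOnePage`, and `listBatch` are hypothetical stand-ins, a single `pageLimit` parameter models whatever cap truncates a single directory's listing, and the idea that the caller resumes from the returned directory index is an assumption of the model rather than something shown in this thread.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class BatchedListingSketch {

  /** Hypothetical stand-in for one page of a single directory's listing. */
  public static final class Page {
    final List<String> entries;
    final int remainingEntries;

    Page(List<String> entries, int remainingEntries) {
      this.entries = entries;
      this.remainingEntries = remainingEntries;
    }
  }

  /** Returns at most pageLimit entries of dir, plus how many were left out. */
  public static Page listOnePage(List<String> dir, int pageLimit) {
    int end = Math.min(dir.size(), pageLimit);
    return new Page(new ArrayList<>(dir.subList(0, end)), dir.size() - end);
  }

  /**
   * Toy version of the batching loop. Appends listed entries to out and
   * returns the directory index at which the loop stopped.
   */
  public static int listBatch(List<List<String>> dirs, int pageLimit,
      int listLimit, boolean stopOnPartialPage, List<String> out) {
    int numEntries = 0;
    int srcsIndex = 0;
    for (; srcsIndex < dirs.size(); srcsIndex++) {
      Page page = listOnePage(dirs.get(srcsIndex), pageLimit);
      out.addAll(page.entries);
      numEntries += page.entries.size();
      // Proposed check: a truncated page means this directory is unfinished.
      if (stopOnPartialPage && page.remainingEntries != 0) {
        break;
      }
      // Existing check: stop once the overall listing limit is reached.
      if (numEntries >= listLimit) {
        break;
      }
    }
    return srcsIndex;
  }

  public static void main(String[] args) {
    // One directory with 1000 files; each call can return only 500 of them.
    List<List<String>> dirs =
        Collections.singletonList(new ArrayList<>(Collections.nCopies(1000, "file")));
    List<String> out = new ArrayList<>();
    int withoutCheck = listBatch(dirs, 500, 1000, false, out);
    int withCheck = listBatch(dirs, 500, 1000, true, new ArrayList<>());
    System.out.println("entries returned: " + out.size());
    System.out.println("stopped at index without check: " + withoutCheck);
    System.out.println("stopped at index with check:    " + withCheck);
  }
}
```

Without the extra check the loop index moves past the only directory after returning 500 of its 1000 entries, so in this model nothing signals that the directory is unfinished. With the check the index stays on that directory.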

@steveloughran
Contributor

@fanlinqian it's best to file an HDFS issue on the Apache JIRA server.

jojochuang pushed a commit to jojochuang/hadoop that referenced this pull request May 23, 2023
(cherry picked from commit d7c4f8a)
Change-Id: I7b058f0761a66558907401adf0340c86de3a5993