HDFS-16180. FsVolumeImpl.nextBlock should consider that the block meta file has been deleted #3315

Merged: 1 commit merged on Aug 24, 2021

Contributor

@Neilxzn Neilxzn commented Aug 19, 2021

Description of PR

FsVolumeImpl.nextBlock should consider that the block meta file has been deleted
https://issues.apache.org/jira/browse/HDFS-16180
In my cluster, we found that when the VolumeScanner runs, the DataNode sometimes logs the errors below:

 
2021-08-19 08:00:11,549 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Deleted BP-1020175758-nnip-1597745872895 blk_1142977964_69237147 URI file:/disk1/dfs/data/current/BP-1020175758- nnip-1597745872895/current/finalized/subdir0/subdir21/blk_1142977964
2021-08-19 08:00:48,368 ERROR org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl: nextBlock(DS-060c8e4c-1ef6-49f5-91ef-91957356891a, BP-1020175758- nnip-1597745872895): I/O error
java.io.IOException: Meta file not found, blockFile=/disk1/dfs/data/current/BP-1020175758- nnip-1597745872895/current/finalized/subdir0/subdir21/blk_1142977964
at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetUtil.findMetaFile(FsDatasetUtil.java:101)
at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl$BlockIteratorImpl.nextBlock(FsVolumeImpl.java:809)
at org.apache.hadoop.hdfs.server.datanode.VolumeScanner.runLoop(VolumeScanner.java:528)
at org.apache.hadoop.hdfs.server.datanode.VolumeScanner.run(VolumeScanner.java:628)
2021-08-19 08:00:48,368 WARN org.apache.hadoop.hdfs.server.datanode.VolumeScanner: VolumeScanner(/disk1/dfs/data, DS-060c8e4c-1ef6-49f5-91ef-91957356891a): nextBlock error on org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl$BlockIteratorImpl@7febc6b4

When the VolumeScanner reaches block blk_1142977964, the DataNode has already deleted it. The scanner cannot find the meta file for blk_1142977964, so it logs these errors.

Maybe we should handle FileNotFoundException in nextBlock to reduce the error logging and the number of nextBlock retries.
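The idea can be sketched in isolation. The following is a minimal, self-contained illustration and not the actual patch: `MetaFileSkipSketch`, its simplified `findMetaFile`, the `scan` loop, and the `warnings` list are all hypothetical stand-ins, and the real `FsDatasetUtil.findMetaFile` and `BlockIteratorImpl.nextBlock` may differ. It assumes the meta file is named `<blockFile>_<genstamp>.meta` and that a missing meta file is reported with `FileNotFoundException`, so the iterator can skip the block with a warning instead of treating it as a generic I/O error.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MetaFileSkipSketch {

  // Hypothetical, simplified lookup: finds "<blockName>_<genstamp>.meta"
  // next to the block file, or throws FileNotFoundException if none exists.
  static File findMetaFile(File blockFile) throws IOException {
    File dir = blockFile.getAbsoluteFile().getParentFile();
    String prefix = blockFile.getName() + "_";
    File[] candidates = dir.listFiles(
        (d, name) -> name.startsWith(prefix) && name.endsWith(".meta"));
    if (candidates == null || candidates.length == 0) {
      throw new FileNotFoundException(
          "Meta file not found, blockFile=" + blockFile);
    }
    return candidates[0];
  }

  // Hypothetical scan loop: a block whose meta file vanished (for example,
  // because the block was deleted concurrently) is skipped with a warning.
  // Any other IOException still propagates to the caller.
  static List<String> scan(List<File> blockFiles, List<String> warnings)
      throws IOException {
    List<String> scanned = new ArrayList<>();
    for (File blockFile : blockFiles) {
      try {
        findMetaFile(blockFile);
        scanned.add(blockFile.getName());
      } catch (FileNotFoundException e) {
        // Expected race with deletion: record at WARN level and move on.
        warnings.add("WARN: skipping " + blockFile.getName()
            + ": " + e.getMessage());
      }
    }
    return scanned;
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("blocks");
    File kept = new File(dir.toFile(), "blk_1");
    Files.createFile(kept.toPath());
    Files.createFile(dir.resolve("blk_1_1001.meta"));
    // blk_2 is never created, which simulates a block deleted before the scan.
    File deleted = new File(dir.toFile(), "blk_2");

    List<String> warnings = new ArrayList<>();
    System.out.println("scanned=" + scan(Arrays.asList(kept, deleted), warnings));
    System.out.println("warnings=" + warnings);
  }
}
```

Because `FileNotFoundException` is a subclass of `IOException`, catching it separately lets the loop treat the deletion race as benign while leaving genuine disk errors on the existing error path.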

How was this patch tested?

No new tests were added.

For code changes:

FsVolumeImpl.nextBlock handles FileNotFoundException.

@hadoop-yetus

💔 -1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|---|---|---|---|---|
| +0 🆗 | reexec | 0m 41s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | | No case conflicting files found. |
| +0 🆗 | codespell | 0m 0s | | codespell was not available. |
| +1 💚 | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 ❌ | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| | | | | _ trunk Compile Tests _ |
| +1 💚 | mvninstall | 30m 57s | | trunk passed |
| +1 💚 | compile | 1m 22s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 💚 | compile | 1m 17s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 💚 | checkstyle | 1m 1s | | trunk passed |
| +1 💚 | mvnsite | 1m 23s | | trunk passed |
| +1 💚 | javadoc | 0m 58s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 💚 | javadoc | 1m 28s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 💚 | spotbugs | 3m 6s | | trunk passed |
| +1 💚 | shadedclient | 16m 13s | | branch has no errors when building and testing our client artifacts. |
| | | | | _ Patch Compile Tests _ |
| +1 💚 | mvninstall | 1m 13s | | the patch passed |
| +1 💚 | compile | 1m 14s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 💚 | javac | 1m 14s | | the patch passed |
| +1 💚 | compile | 1m 8s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 💚 | javac | 1m 8s | | the patch passed |
| +1 💚 | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 ⚠️ | checkstyle | 0m 53s | /results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt | hadoop-hdfs-project/hadoop-hdfs: The patch generated 2 new + 13 unchanged - 0 fixed = 15 total (was 13) |
| +1 💚 | mvnsite | 1m 13s | | the patch passed |
| +1 💚 | javadoc | 0m 47s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 💚 | javadoc | 1m 17s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 💚 | spotbugs | 3m 7s | | the patch passed |
| +1 💚 | shadedclient | 16m 4s | | patch has no errors when building and testing our client artifacts. |
| | | | | _ Other Tests _ |
| -1 ❌ | unit | 247m 14s | /patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | hadoop-hdfs in the patch passed. |
| +1 💚 | asflicense | 0m 46s | | The patch does not generate ASF License warnings. |
| | | 331m 31s | | |

| Reason | Tests |
|---|---|
| Failed junit tests | hadoop.hdfs.TestReadStripedFileWithDecodingDeletedData |
| | hadoop.hdfs.TestSetrepIncreasing |

| Subsystem | Report/Notes |
|---|---|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3315/1/artifact/out/Dockerfile |
| GITHUB PR | #3315 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell |
| uname | Linux 399651e1a04b 4.15.0-136-generic #140-Ubuntu SMP Thu Jan 28 05:20:47 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 5df9846 |
| Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3315/1/testReport/ |
| Max. process+thread count | 3394 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3315/1/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org |

This message was automatically generated.

@Neilxzn
Contributor Author

Neilxzn commented Aug 20, 2021

cc @jojochuang. Thank you for the review. I think we should catch the FileNotFoundException for the meta file in FsVolumeImpl.nextBlock.

@Neilxzn
Contributor Author

Neilxzn commented Aug 20, 2021

The Jenkins checks failed, but I can't find any error related to the patch. Please check it again.

Contributor

@jojochuang jojochuang left a comment

Is the purpose to reduce the ERROR log to WARN log?

@Neilxzn
Contributor Author

Neilxzn commented Aug 20, 2021

> Is the purpose to reduce the ERROR log to WARN log?

Yes. It can reduce the ERROR log.

@jojochuang
Contributor

OK, makes sense. Missing metadata shouldn't be logged at ERROR because the volume scanner or disk balancer can still continue.

@Neilxzn
Contributor Author

Neilxzn commented Aug 23, 2021

Thanks @jojochuang for the review. Unit test failures seem unrelated. Is it ok to merge it?

Contributor

@jojochuang jojochuang left a comment

I'm sorry. I thought I submitted this review but it didn't go out.

Contributor

@jojochuang jojochuang left a comment

Updated my review. +1

@jojochuang jojochuang merged commit 9084c72 into apache:trunk Aug 24, 2021
kiran-maturi pushed a commit to kiran-maturi/hadoop that referenced this pull request Nov 24, 2021
jojochuang pushed a commit to jojochuang/hadoop that referenced this pull request May 23, 2023
…a file has been deleted. (apache#3315)

(cherry picked from commit 9084c72)

 Conflicts:
	hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetUtil.java

(cherry picked from commit 7432804)
(cherry picked from commit 8edc48f6ef9465dced3c96b32b133870215e96c3)
Signed-off-by: Arpit Agarwal <[email protected]>

Change-Id: I5ac63fd46c2166af67e7cc9702d2af9e5cb31d51
3 participants