Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HDFS-16386.Reduce DataNode load when FsDatasetAsyncDiskService is working. #3806

Merged
merged 1 commit into from
Dec 20, 2021

Conversation

jianghuazhu
Copy link
Contributor

Description of PR

When FsDatasetAsyncDiskService is working, if the DataNode has a lot of disks, this will cause a higher load on the DataNode, for example, a lot of memory is used.
This phenomenon will affect the stability of the DataNode.
Details: HDFS-16386

How was this patch tested?

For testing, there is little pressure.

Copy link
Contributor

@sodonnel sodonnel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes LGTM, pending a good CI run.

@jianghuazhu
Copy link
Contributor Author

Thank you @sodonnel for your comments and reviews.

@hadoop-yetus
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 45s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
-1 ❌ test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ trunk Compile Tests _
-1 ❌ mvninstall 33m 54s /branch-mvninstall-root.txt root in trunk failed.
+1 💚 compile 1m 26s trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 compile 1m 17s trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 checkstyle 0m 58s trunk passed
+1 💚 mvnsite 1m 25s trunk passed
+1 💚 javadoc 1m 0s trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javadoc 1m 32s trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 spotbugs 3m 12s trunk passed
+1 💚 shadedclient 22m 28s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 1m 20s the patch passed
+1 💚 compile 1m 19s the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javac 1m 19s the patch passed
+1 💚 compile 1m 10s the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 javac 1m 10s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 52s /results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 203 unchanged - 0 fixed = 204 total (was 203)
+1 💚 mvnsite 1m 19s the patch passed
+1 💚 xml 0m 2s The patch has no ill-formed XML file.
+1 💚 javadoc 0m 52s the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javadoc 1m 26s the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 spotbugs 3m 17s the patch passed
+1 💚 shadedclient 22m 46s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 226m 20s hadoop-hdfs in the patch passed.
+1 💚 asflicense 0m 47s The patch does not generate ASF License warnings.
327m 25s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3806/1/artifact/out/Dockerfile
GITHUB PR #3806
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell xml
uname Linux 919ee09c35c1 4.15.0-156-generic #163-Ubuntu SMP Thu Aug 19 23:31:58 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 82937e661aa457b27af1131ec7d2e56d23efff4b
Default Java Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3806/1/testReport/
Max. process+thread count 3019 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3806/1/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

@sodonnel
Copy link
Contributor

There seems to be one checkstyle violation - could you fix that please and then I think we are good to commit this.

@jianghuazhu
Copy link
Contributor Author

Thank you @sodonnel.
I'll update it later.

@hadoop-yetus
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 46s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
-1 ❌ test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ trunk Compile Tests _
+1 💚 mvninstall 32m 22s trunk passed
+1 💚 compile 1m 24s trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 compile 1m 20s trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 checkstyle 1m 1s trunk passed
+1 💚 mvnsite 1m 28s trunk passed
+1 💚 javadoc 1m 2s trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javadoc 1m 31s trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 spotbugs 3m 11s trunk passed
+1 💚 shadedclient 22m 17s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 1m 16s the patch passed
+1 💚 compile 1m 16s the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javac 1m 16s the patch passed
+1 💚 compile 1m 10s the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 javac 1m 10s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 53s the patch passed
+1 💚 mvnsite 1m 18s the patch passed
+1 💚 xml 0m 2s The patch has no ill-formed XML file.
+1 💚 javadoc 0m 51s the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javadoc 1m 24s the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 spotbugs 3m 14s the patch passed
+1 💚 shadedclient 22m 2s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 223m 49s hadoop-hdfs in the patch passed.
+1 💚 asflicense 0m 44s The patch does not generate ASF License warnings.
321m 59s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3806/2/artifact/out/Dockerfile
GITHUB PR #3806
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell xml
uname Linux fec30cd16ae9 4.15.0-156-generic #163-Ubuntu SMP Thu Aug 19 23:31:58 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / fbb76d29e39dfb1808a8b390fe819a0580999473
Default Java Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3806/2/testReport/
Max. process+thread count 3095 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3806/2/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

@jianghuazhu
Copy link
Contributor Author

jianghuazhu commented Dec 17, 2021

Could you help review this pr again @sodonnel .
Thank you very much.

<name>dfs.datanode.fsdatasetasyncdisk.max.threads.per.volume</name>
<value>4</value>
<description>
The maximum number of threads per volume used to delete blocks on
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this could be used for other asynchronous operations in the future, not just deleting blocks, what do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @tomscut for your comments and review.
I agree to your suggestion. But now FsDatasetAsyncDiskService mainly aimed at removing block.
Do you have a better description of this property?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it could be changed to this: delete blocks -> process async disk operations.

@hadoop-yetus
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 40s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
-1 ❌ test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ trunk Compile Tests _
+1 💚 mvninstall 32m 22s trunk passed
+1 💚 compile 1m 25s trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 compile 1m 18s trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 checkstyle 1m 3s trunk passed
+1 💚 mvnsite 1m 29s trunk passed
+1 💚 javadoc 1m 3s trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javadoc 1m 32s trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 spotbugs 3m 9s trunk passed
+1 💚 shadedclient 22m 8s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 1m 14s the patch passed
+1 💚 compile 1m 17s the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javac 1m 17s the patch passed
+1 💚 compile 1m 9s the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 javac 1m 9s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 53s the patch passed
+1 💚 mvnsite 1m 15s the patch passed
+1 💚 xml 0m 1s The patch has no ill-formed XML file.
+1 💚 javadoc 0m 50s the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javadoc 1m 25s the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 spotbugs 3m 11s the patch passed
+1 💚 shadedclient 21m 56s patch has no errors when building and testing our client artifacts.
_ Other Tests _
-1 ❌ unit 226m 54s /patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs in the patch passed.
+1 💚 asflicense 0m 47s The patch does not generate ASF License warnings.
324m 49s
Reason Tests
Failed junit tests hadoop.hdfs.TestRollingUpgrade
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3806/3/artifact/out/Dockerfile
GITHUB PR #3806
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell xml
uname Linux 659443467b11 4.15.0-156-generic #163-Ubuntu SMP Thu Aug 19 23:31:58 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / dc62588
Default Java Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3806/3/testReport/
Max. process+thread count 3119 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3806/3/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

@jianghuazhu
Copy link
Contributor Author

Could you help review this pr, @jojochuang @ferhui.
Thank you very much.

@jojochuang
Copy link
Contributor

LGTM. For what is worth, we don't need two committers to approve a PR :) Stephen alone is a gold standard.

@sodonnel sodonnel merged commit 746b328 into apache:trunk Dec 20, 2021
asfgit pushed a commit that referenced this pull request Dec 20, 2021
asfgit pushed a commit that referenced this pull request Dec 20, 2021
@brahmareddybattula
Copy link
Contributor

thanks for working on working.. can you guys commit to branch-3.2.3 also..?

@jianghuazhu
Copy link
Contributor Author

Thank you for your reminder and help, @jojochuang .

@jianghuazhu
Copy link
Contributor Author

Thank you for your attention and comments, @brahmareddybattula .
I will continue to work. If necessary, I will create a new jira.

asfgit pushed a commit that referenced this pull request Dec 21, 2021
…rking. (#3806)

(cherry picked from commit 746b328)
(cherry picked from commit b68084a)
@sodonnel
Copy link
Contributor

This will not go onto branch-3.2 cleanly due to HADOOP-17126 (new Preconditions class), however it is a trivial change in the import statement, so I have went ahead and made it and committed to 3.2.3 too.

ashutoshcipher pushed a commit to ashutoshcipher/hadoop that referenced this pull request Dec 22, 2021
sunchao pushed a commit that referenced this pull request Jan 4, 2022
@ZanderXu
Copy link
Contributor

@jianghuazhu I'm so sorry to discuss this issue again.
Setting smaller MAX THREAD can reduce memory usage? HDFS-16386

@ZanderXu
Copy link
Contributor

ThreadPoolExecutor executor = new ThreadPoolExecutor( CORE_THREADS_PER_VOLUME, maxNumThreadsPerVolume, THREADS_KEEP_ALIVE_SECONDS, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>(), threadFactory);

The ThreadPoolExecutor used the unbounded LinkedBlockingQueue, so the actual thread number should be less than or equal to the corePoolSize. When NN needs one DN to delete a large number of blocks, this DN will create a large number of ReplicaFileDeleteTask, and stored all ReplicaFileDeleteTasks in the LinkedBlockingQueue of the ThreadPoolExecutor, resulting in increased memory or even OOM.

Feel free to correct me if there are mistakes.

@jianghuazhu
Copy link
Contributor Author

Thanks @ZanderXu for following.
Here are some explanations:

  1. The main job of FsDatasetAsyncDiskService is to delete the replica files synchronously or asynchronously. The copy files to be deleted here are all files on the local DataNode, and the number is limited. Although the thread pool uses an unbounded queue, it will not be stored all the time, because it will always be consumed. And these copies have been loaded into memory when the DataNode is working, so the probability of OOM here is very low.
  2. If the copy is deleted asynchronously, the thread pool work will be started. Before this, each disk will correspond to a thread pool, and the thread pool will have at most 4 fixed threads to work, and this condition is fixed. In our cluster, DataNodes have different numbers of disks, 12 disks, 36 disks, and 60 disks will exist. Take DataNode with 36 disks or 60 disks as an example, then during peak hours, DataNode needs to start a lot of thread work. Adjusting the number of threads flexibly will reduce the workload of the DataNode.

@ZanderXu
Copy link
Contributor

Thanks @jianghuazhu for you comment.

  • I have a question, if the queue is unbounded, will the number of active thread in the ThreadPool be greater than the number of core thread?
  • I think that we need to support the ability to dynamically adjust the number of core threads, so that we can adjust it in time for different load to archive the best result.

@jianghuazhu
Copy link
Contributor Author

@ZanderXu , nice to communicate with you.
I suggest that the number of active threads here needs to be set reasonably, according to the load capacity of the cluster.

@ZanderXu
Copy link
Contributor

Thanks, and i will create a new PR to do it.

HarshitGupta11 pushed a commit to HarshitGupta11/hadoop that referenced this pull request Nov 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants