HDFS-16898. Remove write lock for processCommandFromActor of DataNode to reduce impact on heartbeat #5330

Merged
merged 1 commit into from
Feb 8, 2023

Conversation

hfutatzhanghb
Contributor

Currently, the processCommandFromActor method contains the following code:

writeLock();
try {
  if (actor == bpServiceToActive) {
    return processCommandFromActive(cmd, actor);
  } else {
    return processCommandFromStandby(cmd, actor);
  }
} finally {
  writeUnlock();
} 

If processCommandFromActive takes a long time, the write lock is held for that entire time.

This may block the updateActorStatesFromHeartbeat call in offerService. As a result, the lastcontact value of the datanode can grow very high, and the datanode can even be marked dead once lastcontact exceeds 600s.

bpos.updateActorStatesFromHeartbeat(
    this, resp.getNameNodeHaState());

We can make the write lock fine-grained in the processCommandFromActor method to address this problem.
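
As a rough illustration of the idea (not the actual patch), the sketch below holds the lock only while checking which actor is active and runs the command with no lock held. The class `FineGrainedLockSketch`, the `Object` actors, and the `Runnable` command are hypothetical stand-ins for BPOfferService, BPServiceActor, and DatanodeCommand; the read lock for the check follows the review discussion in this thread.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for BPOfferService; not the actual Hadoop class.
class FineGrainedLockSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private Object activeActor; // plays the role of bpServiceToActive

  FineGrainedLockSketch(Object initialActive) {
    this.activeActor = initialActive;
  }

  // Heartbeat path: takes the write lock briefly to update the active actor.
  public void updateActorStatesFromHeartbeat(Object actor) {
    lock.writeLock().lock();
    try {
      activeActor = actor;
    } finally {
      lock.writeLock().unlock();
    }
  }

  // Command path: the lock covers only the identity check, not the command.
  public String processCommandFromActor(Runnable command, Object actor) {
    boolean fromActive;
    lock.readLock().lock();
    try {
      fromActive = (actor == activeActor);
    } finally {
      lock.readLock().unlock();
    }
    command.run(); // potentially slow; no lock is held here
    return fromActive ? "active" : "standby";
  }

  private static void awaitQuietly(CountDownLatch latch) {
    try {
      latch.await();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  // Returns true if a heartbeat update finishes while a command is still running.
  public static boolean heartbeatCompletesDuringSlowCommand() {
    Object nn1 = new Object();
    Object nn2 = new Object();
    FineGrainedLockSketch bpos = new FineGrainedLockSketch(nn1);
    CountDownLatch commandStarted = new CountDownLatch(1);
    CountDownLatch releaseCommand = new CountDownLatch(1);
    CountDownLatch heartbeatDone = new CountDownLatch(1);

    Thread commandThread = new Thread(() ->
        bpos.processCommandFromActor(() -> {
          commandStarted.countDown();
          awaitQuietly(releaseCommand); // simulate a long-running command
        }, nn1));
    Thread heartbeatThread = new Thread(() -> {
      bpos.updateActorStatesFromHeartbeat(nn2);
      heartbeatDone.countDown();
    });

    boolean completed = false;
    try {
      commandThread.start();
      commandStarted.await();
      heartbeatThread.start();
      completed = heartbeatDone.await(5, TimeUnit.SECONDS);
      releaseCommand.countDown();
      commandThread.join();
      heartbeatThread.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    } finally {
      releaseCommand.countDown(); // make sure the command thread can exit
    }
    return completed;
  }

  public static void main(String[] args) {
    System.out.println("heartbeat completed during slow command: "
        + heartbeatCompletesDuringSlowCommand());
  }
}
```

In this sketch the heartbeat update does not wait for the command, because the command thread has already released the read lock before the slow work starts.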

@hfutatzhanghb
Contributor Author

Hi @srinivasst, @ayushtkn, @goiri, @tomscut. Could you please help review the code? Is this modification correct?

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 36s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
-1 ❌ test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ trunk Compile Tests _
+1 💚 mvninstall 43m 27s trunk passed
+1 💚 compile 1m 26s trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚 compile 1m 23s trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚 checkstyle 1m 7s trunk passed
+1 💚 mvnsite 1m 31s trunk passed
+1 💚 javadoc 1m 7s trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚 javadoc 1m 33s trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚 spotbugs 3m 25s trunk passed
+1 💚 shadedclient 25m 28s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 1m 23s the patch passed
+1 💚 compile 1m 18s the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚 javac 1m 18s the patch passed
+1 💚 compile 1m 13s the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚 javac 1m 13s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 53s /results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 15 unchanged - 0 fixed = 16 total (was 15)
+1 💚 mvnsite 1m 21s the patch passed
+1 💚 javadoc 0m 49s the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚 javadoc 1m 25s the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚 spotbugs 3m 13s the patch passed
+1 💚 shadedclient 25m 32s patch has no errors when building and testing our client artifacts.
_ Other Tests _
-1 ❌ unit 299m 25s /patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs in the patch passed.
+1 💚 asflicense 0m 49s The patch does not generate ASF License warnings.
416m 26s
Reason Tests
Failed junit tests hadoop.hdfs.TestLeaseRecovery2
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5330/1/artifact/out/Dockerfile
GITHUB PR #5330
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux c83292b63478 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 75f1c4c
Default Java Private Build-1.8.0_352-8u352-ga-1~20.04-b08
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_352-8u352-ga-1~20.04-b08
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5330/1/testReport/
Max. process+thread count 3646 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5330/1/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 39s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 2s codespell was not available.
+0 🆗 detsecrets 0m 2s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
-1 ❌ test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ trunk Compile Tests _
+1 💚 mvninstall 44m 1s trunk passed
+1 💚 compile 1m 28s trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚 compile 1m 21s trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚 checkstyle 1m 6s trunk passed
+1 💚 mvnsite 1m 31s trunk passed
+1 💚 javadoc 1m 8s trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚 javadoc 1m 32s trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚 spotbugs 3m 26s trunk passed
+1 💚 shadedclient 25m 40s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 1m 24s the patch passed
+1 💚 compile 1m 15s the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚 javac 1m 15s the patch passed
+1 💚 compile 1m 12s the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚 javac 1m 12s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 50s the patch passed
+1 💚 mvnsite 1m 18s the patch passed
+1 💚 javadoc 0m 50s the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚 javadoc 1m 23s the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚 spotbugs 3m 13s the patch passed
+1 💚 shadedclient 25m 23s patch has no errors when building and testing our client artifacts.
_ Other Tests _
-1 ❌ unit 291m 26s /patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs in the patch passed.
+1 💚 asflicense 0m 49s The patch does not generate ASF License warnings.
409m 7s
Reason Tests
Failed junit tests hadoop.hdfs.TestLeaseRecovery2
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5330/2/artifact/out/Dockerfile
GITHUB PR #5330
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux d867773c0aa6 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 81cc655
Default Java Private Build-1.8.0_352-8u352-ga-1~20.04-b08
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_352-8u352-ga-1~20.04-b08
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5330/2/testReport/
Max. process+thread count 3286 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5330/2/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@virajjasani
Contributor

Thanks for the PR @hfutatzhanghb.
Curious whether you collected any thread dumps or logs before coming to this conclusion, and whether you would like to share them regarding the issue.

Comment on lines 690 to 696
if (actor == bpServiceToActive) {
return processCommandFromActive(cmd, actor);
} else {
return processCommandFromStandby(cmd, actor);
}
Contributor

Even before HDFS-6788, this part was at least covered by synchronized lock on the actor thread

Contributor Author

Hi @virajjasani, thanks for your careful review. Indeed, before HDFS-6788, this part was covered by a synchronized lock.
However, processCommandFromActive and processCommandFromStandby use the actor parameter only to print log information. The lock here is needed only to decide whether actor is bpServiceToActive, and therefore whether to execute processCommandFromActive or processCommandFromStandby.

When a switchover occurs between the active namenode and the standby namenode, the datanodes are set to stale status. While a datanode is stale, we are not allowed to delete blocks directly; we put those blocks into postponedMisreplicatedBlocks instead. So even if we execute a DatanodeCommand from the previous active namenode (now standby), it is okay.

@hfutatzhanghb
Contributor Author

Thanks for the PR @hfutatzhanghb. Curious whether you collected any thread dumps or logs before coming to this conclusion, and whether you would like to share them regarding the issue.

Hi @virajjasani, thanks for your reply. Some logs are shown below.
First, we added some logging in BPServiceActor.CommandProcessingThread#processCommand:

image

We then grepped the logs and found the following:

image

From these logs we can conclude that the execution time of the processCommandFromActor method can be very high, in some cases more than 119 seconds. processCommandFromActor holds the same write lock that updateActorStatesFromHeartbeat uses. Because updateActorStatesFromHeartbeat is called from the offerService method, this can hang the heartbeat thread.

image

We have deployed this change in our production cluster, and it works well.
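
The contention described in this comment can be reproduced in miniature. The following is only a sketch under stated assumptions: `CoarseLockContentionSketch` is a hypothetical stand-in for BPOfferService, the `Runnable` stands in for a slow DatanodeCommand, and the heartbeat side uses a timed `tryLock` purely so the demo terminates (the real heartbeat path blocks until the lock is free).

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for the pre-patch locking shape; not Hadoop code.
class CoarseLockContentionSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

  // Pre-patch shape: the whole command runs while the write lock is held.
  public void processCommandUnderWriteLock(Runnable command) {
    lock.writeLock().lock();
    try {
      command.run();
    } finally {
      lock.writeLock().unlock();
    }
  }

  // Heartbeat side: tries to take the same write lock, giving up after waitMs.
  // Returns true only if the lock was acquired within the timeout.
  public boolean tryUpdateFromHeartbeat(long waitMs) {
    try {
      if (!lock.writeLock().tryLock(waitMs, TimeUnit.MILLISECONDS)) {
        return false;
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
    lock.writeLock().unlock();
    return true;
  }

  // Returns true if the heartbeat could not get the lock while a command ran.
  public static boolean heartbeatBlockedWhileCommandRuns() {
    CoarseLockContentionSketch bpos = new CoarseLockContentionSketch();
    CountDownLatch commandStarted = new CountDownLatch(1);
    CountDownLatch releaseCommand = new CountDownLatch(1);

    Thread commandThread = new Thread(() ->
        bpos.processCommandUnderWriteLock(() -> {
          commandStarted.countDown();
          try {
            releaseCommand.await(); // simulate a long-running command
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          }
        }));

    boolean acquired = true;
    try {
      commandThread.start();
      commandStarted.await();
      acquired = bpos.tryUpdateFromHeartbeat(200);
      releaseCommand.countDown();
      commandThread.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    } finally {
      releaseCommand.countDown(); // make sure the command thread can exit
    }
    return !acquired;
  }

  public static void main(String[] args) {
    System.out.println("heartbeat blocked while command runs: "
        + heartbeatBlockedWhileCommandRuns());
  }
}
```

Once the command finishes and the lock is released, a later call to `tryUpdateFromHeartbeat` succeeds, which mirrors the heartbeat resuming only after a long command completes.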

@Hexiaoqiao
Contributor

@hfutatzhanghb Thanks for your work here. I totally agree about the issue this causes; however, I am concerned about whether it is safe to move command processing out of the write lock. Do you have any lock analysis for this? Thanks again.

@hfutatzhanghb
Contributor Author

@Hexiaoqiao, thanks for your reply. I will try to draw some diagrams to describe it soon.

@Hexiaoqiao
Contributor

@Hexiaoqiao, thanks for your reply. I will try to draw some diagrams to describe it soon.

Great. That will help push this improvement forward.
cc @zhangshuyan0, would you mind taking another look?

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 42s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
-1 ❌ test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ trunk Compile Tests _
+1 💚 mvninstall 46m 6s trunk passed
+1 💚 compile 1m 29s trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚 compile 1m 20s trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚 checkstyle 1m 5s trunk passed
+1 💚 mvnsite 1m 33s trunk passed
+1 💚 javadoc 1m 9s trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚 javadoc 1m 34s trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚 spotbugs 3m 28s trunk passed
+1 💚 shadedclient 26m 46s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 1m 30s the patch passed
+1 💚 compile 1m 24s the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚 javac 1m 24s the patch passed
+1 💚 compile 1m 13s the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚 javac 1m 13s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 49s the patch passed
+1 💚 mvnsite 1m 25s the patch passed
+1 💚 javadoc 0m 54s the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚 javadoc 1m 29s the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚 spotbugs 3m 37s the patch passed
+1 💚 shadedclient 27m 4s patch has no errors when building and testing our client artifacts.
_ Other Tests _
-1 ❌ unit 288m 25s /patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs in the patch passed.
+1 💚 asflicense 0m 48s The patch does not generate ASF License warnings.
411m 51s
Reason Tests
Failed junit tests hadoop.hdfs.TestLeaseRecovery2
Subsystem Report/Notes
Docker ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5330/3/artifact/out/Dockerfile
GITHUB PR #5330
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux 40a8c60b12b3 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 3d74257
Default Java Private Build-1.8.0_352-8u352-ga-1~20.04-b08
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_352-8u352-ga-1~20.04-b08
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5330/3/testReport/
Max. process+thread count 3711 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5330/3/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@virajjasani
Contributor

Hi @virajjasani, thanks for your careful review. Indeed, before HDFS-6788, this part was covered by a synchronized lock. However, processCommandFromActive and processCommandFromStandby use the actor parameter only to print log information. The lock here is needed only to decide whether actor is bpServiceToActive, and therefore whether to execute processCommandFromActive or processCommandFromStandby.

When a switchover occurs between the active namenode and the standby namenode, the datanodes are set to stale status. While a datanode is stale, we are not allowed to delete blocks directly; we put those blocks into postponedMisreplicatedBlocks instead. So even if we execute a DatanodeCommand from the previous active namenode (now standby), it is okay.

Thank you @hfutatzhanghb.
I was just going to state that we don't need write lock to verify whether the current actor is the one connected to active namenode, read lock would be sufficient. But looks like you already made the change.

I took a quick look, and we have not hit this log line in our clusters so far, but this PR has an interesting fix. I will check further for any other resource contention.

@virajjasani
Contributor

In the meantime, I have two nits if you would like to consider:

  1. For processCommandFromActive and processCommandFromStandby, it would be good to pass only actor.getNNSocketAddress() instead of actor, because it's the namenode address that is logged for BlockRecoveryWorker logs and others in standby.
  2. Would be great to change log level to WARN for this:
        if (processCommandsMs > dnConf.getProcessCommandsThresholdMs()) {
          LOG.info("Took {} ms to process {} commands from NN",
              processCommandsMs, cmds.length);
        }

With WARN level, it will likely come up front while debugging any slowness issues.
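
To illustrate the second nit, here is a minimal sketch of threshold-based slow-command logging at WARN level. It assumes `java.util.logging` only to stay self-contained (the snippet quoted above uses `{}` placeholders with `LOG.info`); `SlowCommandLogSketch`, `exceedsThreshold`, and the `Runnable` commands are hypothetical names, and `thresholdMs` stands in for `dnConf.getProcessCommandsThresholdMs()`.

```java
import java.util.concurrent.TimeUnit;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch; not the actual BPServiceActor code.
class SlowCommandLogSketch {
  private static final Logger LOG =
      Logger.getLogger(SlowCommandLogSketch.class.getName());

  // True when a batch took strictly longer than the configured threshold.
  public static boolean exceedsThreshold(long elapsedMs, long thresholdMs) {
    return elapsedMs > thresholdMs;
  }

  // Runs all commands, logs at WARNING if the batch was slow, and
  // returns the elapsed time in milliseconds.
  public static long processCommands(Runnable[] cmds, long thresholdMs) {
    long startNs = System.nanoTime();
    for (Runnable cmd : cmds) {
      cmd.run();
    }
    long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNs);
    if (exceedsThreshold(elapsedMs, thresholdMs)) {
      LOG.log(Level.WARNING, "Took {0} ms to process {1} commands from NN",
          new Object[] {elapsedMs, cmds.length});
    }
    return elapsedMs;
  }

  public static void main(String[] args) {
    Runnable slow = () -> {
      try {
        Thread.sleep(50); // simulate a slow command
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    };
    processCommands(new Runnable[] {slow}, 10);
  }
}
```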

@zhangshuyan0
Contributor

It is great to prevent the heartbeat from being affected by command processing. I checked that processCommandFromXXX() doesn't access any members inside BPOfferService that can be changed.
The only thing to note is that in the original code, after the switchover, the new ANN can guarantee that the DN will not execute the commands from the old ANN as long as it receives two heartbeats from the DN. After the function is placed outside the lock, this guarantee no longer exists. However, as @hfutatzhanghb said, NN will set the DataNode to stale after the switchover, which means that NN does not rely on this guarantee. So, I think this patch is safe.

@hfutatzhanghb
Contributor Author

LOG.info("Took {} ms to process {} commands from NN"

@virajjasani, I totally agree with your suggestions. I will modify the code later.

@hfutatzhanghb
Contributor Author

image

Contributor

@Hexiaoqiao Hexiaoqiao left a comment

LGTM. +1 from my side. Let's wait and see what Yetus says.

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 37s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
-1 ❌ test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ trunk Compile Tests _
+1 💚 mvninstall 43m 24s trunk passed
+1 💚 compile 1m 24s trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚 compile 1m 19s trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚 checkstyle 1m 6s trunk passed
+1 💚 mvnsite 1m 30s trunk passed
+1 💚 javadoc 1m 8s trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚 javadoc 1m 34s trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚 spotbugs 3m 34s trunk passed
+1 💚 shadedclient 25m 43s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 1m 26s the patch passed
+1 💚 compile 1m 17s the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚 javac 1m 17s the patch passed
+1 💚 compile 1m 15s the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚 javac 1m 15s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 52s the patch passed
+1 💚 mvnsite 1m 18s the patch passed
+1 💚 javadoc 0m 50s the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚 javadoc 1m 25s the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚 spotbugs 3m 15s the patch passed
+1 💚 shadedclient 25m 46s patch has no errors when building and testing our client artifacts.
_ Other Tests _
-1 ❌ unit 308m 46s /patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs in the patch passed.
+1 💚 asflicense 0m 48s The patch does not generate ASF License warnings.
425m 51s
Reason Tests
Failed junit tests hadoop.hdfs.TestLeaseRecovery2
Subsystem Report/Notes
Docker ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5330/4/artifact/out/Dockerfile
GITHUB PR #5330
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux 77e8052a24cf 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / a1cc014
Default Java Private Build-1.8.0_352-8u352-ga-1~20.04-b08
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_352-8u352-ga-1~20.04-b08
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5330/4/testReport/
Max. process+thread count 3141 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5330/4/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

Contributor

@virajjasani virajjasani left a comment

+1 (non-binding). There is no resource contention within processCommandFromActive and processCommandFromStandby, as they don't access namespace info, the BP actor instance that represents the active namenode, registration info, etc. We should be safe here.

Thanks @hfutatzhanghb for addressing previous comment. Would be good to backport to 3.3

@Hexiaoqiao Hexiaoqiao changed the title HDFS-16898. Make write lock fine-grain in method processCommandFromActor HDFS-16898. Remove write lock for processCommandFromActor of DataNode to reduce impact on heartbeat Feb 8, 2023
@Hexiaoqiao Hexiaoqiao merged commit eb04ecd into apache:trunk Feb 8, 2023
@Hexiaoqiao
Contributor

The failed unit test TestLeaseRecovery2 is not related to this PR; it is tracked by HDFS-16853.
Committed to trunk. (I will backport to branch-3.3 in a while; please let me know if we need to backport to other active branches.)
Thanks @hfutatzhanghb for your good work. Thanks @virajjasani @zhangshuyan0 for your reviews.

Hexiaoqiao added a commit that referenced this pull request Feb 8, 2023
…ndFromActor (#5330). Contributed by ZhangHB."

This reverts commit eb04ecd.
Hexiaoqiao added a commit that referenced this pull request Feb 8, 2023
…ndFromActor (#5330). Contributed by ZhangHB."

This reverts commit eb04ecd.
Hexiaoqiao added a commit that referenced this pull request Feb 8, 2023
… to reduce impact on heartbeat (#5330). Contributed by ZhangHB.

Reviewed-by: zhangshuyan <[email protected]>
Reviewed-by: Viraj Jasani <[email protected]>
Signed-off-by: He Xiaoqiao <[email protected]>
@Hexiaoqiao
Contributor

The git commit message was not correct, so I reverted the commit and committed again.

@Hexiaoqiao
Contributor

@hfutatzhanghb This PR could not be cherry-picked to branch-3.3 cleanly. Would you mind submitting another PR for branch-3.3?

@hfutatzhanghb
Contributor Author

@hfutatzhanghb This PR could not be cherry-picked to branch-3.3 cleanly. Would you mind submitting another PR for branch-3.3?

@Hexiaoqiao , done~, please have a look. thanks.

@hfutatzhanghb hfutatzhanghb deleted the trunk-processCMD branch May 22, 2023 03:13
ferdelyi pushed a commit to ferdelyi/hadoop that referenced this pull request May 26, 2023
…tor (apache#5330). Contributed by ZhangHB.

Reviewed-by: zhangshuyan <[email protected]>
Reviewed-by: Viraj Jasani <[email protected]>
Signed-off-by: He Xiaoqiao <[email protected]>
ferdelyi pushed a commit to ferdelyi/hadoop that referenced this pull request May 26, 2023
…ndFromActor (apache#5330). Contributed by ZhangHB."

This reverts commit eb04ecd.
ferdelyi pushed a commit to ferdelyi/hadoop that referenced this pull request May 26, 2023
… to reduce impact on heartbeat (apache#5330). Contributed by ZhangHB.

Reviewed-by: zhangshuyan <[email protected]>
Reviewed-by: Viraj Jasani <[email protected]>
Signed-off-by: He Xiaoqiao <[email protected]>
@hfutatzhanghb
Contributor Author

+1 (non-binding). There is no resource contention within processCommandFromActive and processCommandFromStandby, as they don't access namespace info, the BP actor instance that represents the active namenode, registration info, etc. We should be safe here.

Thanks @hfutatzhanghb for addressing previous comment. Would be good to backport to 3.3

Hi @Hexiaoqiao, the original discussion is here. Please take a look, and thanks a lot for your review.

symious pushed a commit to symious/hadoop that referenced this pull request Nov 21, 2024