HDFS-16898. Remove write lock for processCommandFromActor of DataNode to reduce impact on heartbeat #5330

hfutatzhanghb · 2023-01-29T13:16:53Z

Currently, the method processCommandFromActor contains the following code:

writeLock();
try {
  if (actor == bpServiceToActive) {
    return processCommandFromActive(cmd, actor);
  } else {
    return processCommandFromStandby(cmd, actor);
  }
} finally {
  writeUnlock();
}

If processCommandFromActive takes a long time, the write lock is held for that whole time.

This may block the updateActorStatesFromHeartbeat method in offerService. Furthermore, it can make the DataNode's lastcontact value very high, and the DataNode is even marked dead once lastcontact exceeds 600s.

bpos.updateActorStatesFromHeartbeat(
    this, resp.getNameNodeHaState());

Here we can make the write lock in processCommandFromActor finer-grained to address this problem.
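To illustrate the idea, here is a minimal, self-contained sketch, not the actual patch. The class `LockScopeSketch`, the field `activeActor` (standing in for bpServiceToActive) and the string-returning handlers are hypothetical, and the use of a read lock for the identity check is an assumption. The lock is held only while deciding which handler to call; the potentially slow handler then runs with no lock held, so a heartbeat-side writer is not blocked by it.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Illustrative stand-in for BPOfferService; not the real Hadoop class. */
final class LockScopeSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  // Plays the role of bpServiceToActive; guarded by lock.
  private Object activeActor;

  LockScopeSketch(Object initialActive) {
    this.activeActor = initialActive;
  }

  /** Heartbeat path: takes the write lock, so it should never wait on command execution. */
  void updateActiveActor(Object actor) {
    lock.writeLock().lock();
    try {
      activeActor = actor;
    } finally {
      lock.writeLock().unlock();
    }
  }

  /** Command path: hold the lock only for the identity check, then run the command unlocked. */
  String processCommandFromActor(Object actor, String cmd) {
    boolean fromActive;
    lock.readLock().lock();
    try {
      fromActive = (actor == activeActor);
    } finally {
      lock.readLock().unlock();
    }
    return fromActive ? processFromActive(cmd) : processFromStandby(cmd);
  }

  /** True if the calling thread currently holds the read or write lock. */
  private boolean holdsLock() {
    return lock.isWriteLockedByCurrentThread() || lock.getReadHoldCount() > 0;
  }

  // Hypothetical handlers: they only report whether a lock was held while they ran.
  private String processFromActive(String cmd) {
    return "active:" + cmd + ":locked=" + holdsLock();
  }

  private String processFromStandby(String cmd) {
    return "standby:" + cmd + ":locked=" + holdsLock();
  }
}
```

In this sketch the handlers report `locked=false`, which is the property the change is after: command execution time no longer counts against the lock that the heartbeat path needs.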

hfutatzhanghb · 2023-01-29T13:22:55Z

Hi @srinivasst, @ayushtkn, @goiri, @tomscut. Could you please help review the code? Is this modification correct?

hadoop-yetus · 2023-01-29T20:14:41Z

💔 -1 overall

Vote	Subsystem	Runtime	Logfile	Comment
+0 🆗	reexec	0m 36s		Docker mode activated.
			_ Prechecks _
+1 💚	dupname	0m 0s		No case conflicting files found.
+0 🆗	codespell	0m 0s		codespell was not available.
+0 🆗	detsecrets	0m 0s		detect-secrets was not available.
+1 💚	@author	0m 0s		The patch does not contain any @author tags.
-1 ❌	test4tests	0m 0s		The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
			_ trunk Compile Tests _
+1 💚	mvninstall	43m 27s		trunk passed
+1 💚	compile	1m 26s		trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚	compile	1m 23s		trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚	checkstyle	1m 7s		trunk passed
+1 💚	mvnsite	1m 31s		trunk passed
+1 💚	javadoc	1m 7s		trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚	javadoc	1m 33s		trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚	spotbugs	3m 25s		trunk passed
+1 💚	shadedclient	25m 28s		branch has no errors when building and testing our client artifacts.
			_ Patch Compile Tests _
+1 💚	mvninstall	1m 23s		the patch passed
+1 💚	compile	1m 18s		the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚	javac	1m 18s		the patch passed
+1 💚	compile	1m 13s		the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚	javac	1m 13s		the patch passed
+1 💚	blanks	0m 0s		The patch has no blanks issues.
-0 ⚠️	checkstyle	0m 53s	/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt	hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 15 unchanged - 0 fixed = 16 total (was 15)
+1 💚	mvnsite	1m 21s		the patch passed
+1 💚	javadoc	0m 49s		the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚	javadoc	1m 25s		the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚	spotbugs	3m 13s		the patch passed
+1 💚	shadedclient	25m 32s		patch has no errors when building and testing our client artifacts.
			_ Other Tests _
-1 ❌	unit	299m 25s	/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt	hadoop-hdfs in the patch passed.
+1 💚	asflicense	0m 49s		The patch does not generate ASF License warnings.
		416m 26s

Reason	Tests
Failed junit tests	hadoop.hdfs.TestLeaseRecovery2

Subsystem	Report/Notes
Docker	ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5330/1/artifact/out/Dockerfile
GITHUB PR	#5330
Optional Tests	dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname	Linux c83292b63478 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool	maven
Personality	dev-support/bin/hadoop.sh
git revision	trunk / `75f1c4c`
Default Java	Private Build-1.8.0_352-8u352-ga-1~20.04-b08
Multi-JDK versions	/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_352-8u352-ga-1~20.04-b08
Test Results	https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5330/1/testReport/
Max. process+thread count	3646 (vs. ulimit of 5500)
modules	C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output	https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5330/1/console
versions	git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by	Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

hadoop-yetus · 2023-01-30T16:46:24Z

💔 -1 overall

Vote	Subsystem	Runtime	Logfile	Comment
+0 🆗	reexec	0m 39s		Docker mode activated.
			_ Prechecks _
+1 💚	dupname	0m 0s		No case conflicting files found.
+0 🆗	codespell	0m 2s		codespell was not available.
+0 🆗	detsecrets	0m 2s		detect-secrets was not available.
+1 💚	@author	0m 0s		The patch does not contain any @author tags.
-1 ❌	test4tests	0m 0s		The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
			_ trunk Compile Tests _
+1 💚	mvninstall	44m 1s		trunk passed
+1 💚	compile	1m 28s		trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚	compile	1m 21s		trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚	checkstyle	1m 6s		trunk passed
+1 💚	mvnsite	1m 31s		trunk passed
+1 💚	javadoc	1m 8s		trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚	javadoc	1m 32s		trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚	spotbugs	3m 26s		trunk passed
+1 💚	shadedclient	25m 40s		branch has no errors when building and testing our client artifacts.
			_ Patch Compile Tests _
+1 💚	mvninstall	1m 24s		the patch passed
+1 💚	compile	1m 15s		the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚	javac	1m 15s		the patch passed
+1 💚	compile	1m 12s		the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚	javac	1m 12s		the patch passed
+1 💚	blanks	0m 0s		The patch has no blanks issues.
+1 💚	checkstyle	0m 50s		the patch passed
+1 💚	mvnsite	1m 18s		the patch passed
+1 💚	javadoc	0m 50s		the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚	javadoc	1m 23s		the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚	spotbugs	3m 13s		the patch passed
+1 💚	shadedclient	25m 23s		patch has no errors when building and testing our client artifacts.
			_ Other Tests _
-1 ❌	unit	291m 26s	/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt	hadoop-hdfs in the patch passed.
+1 💚	asflicense	0m 49s		The patch does not generate ASF License warnings.
		409m 7s

Reason	Tests
Failed junit tests	hadoop.hdfs.TestLeaseRecovery2

Subsystem	Report/Notes
Docker	ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5330/2/artifact/out/Dockerfile
GITHUB PR	#5330
Optional Tests	dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname	Linux d867773c0aa6 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool	maven
Personality	dev-support/bin/hadoop.sh
git revision	trunk / `81cc655`
Default Java	Private Build-1.8.0_352-8u352-ga-1~20.04-b08
Multi-JDK versions	/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_352-8u352-ga-1~20.04-b08
Test Results	https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5330/2/testReport/
Max. process+thread count	3286 (vs. ulimit of 5500)
modules	C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output	https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5330/2/console
versions	git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by	Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

virajjasani · 2023-02-03T00:30:00Z

Thanks for the PR @hfutatzhanghb.
Curious if you have any thread dumps or logs collected (before coming to this conclusion) that you would like to share regarding the issue.

virajjasani · 2023-02-03T00:32:59Z

...project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPOfferService.java

+    if (actor == bpServiceToActive) {
+      return processCommandFromActive(cmd, actor);
+    } else {
+      return processCommandFromStandby(cmd, actor);
+    }


Even before HDFS-6788, this part was at least covered by a synchronized lock on the actor thread.

Hi @virajjasani, thanks for your careful review. Indeed, before HDFS-6788 this part was covered by a synchronized lock.
However, processCommandFromActive and processCommandFromStandby only use the actor parameter to print log messages. The lock here only serves to decide whether actor is bpServiceToActive, and therefore whether to execute processCommandFromActive or processCommandFromStandby.

When a switchover between the active and standby namenode occurs, the datanodes are set to stale status. In stale status, blocks are not allowed to be deleted directly; they are put into postponedMisreplicatedBlocks instead. So even if we execute a DatanodeCommand from the previous active namenode (now standby), it is okay.

hfutatzhanghb · 2023-02-03T02:50:46Z

Thanks for the PR @hfutatzhanghb. Curious if you have any thread dumps or logs collected (before coming to this conclusion) that you would like to share regarding the issue.

Hi @virajjasani, thanks for your reply. Some logs are shown below.
First, we added some logging in BPServiceActor.CommandProcessingThread#processCommand:

and we grepped the following logs:

From these we can conclude that the execution time of processCommandFromActor is very high, sometimes more than 119 seconds. processCommandFromActor takes the same write lock that updateActorStatesFromHeartbeat uses. Since updateActorStatesFromHeartbeat is called from the offerService method, this can hang the heartbeat thread.

We already use this change in our production cluster, and it works well.

Hexiaoqiao · 2023-02-03T07:31:45Z

@hfutatzhanghb Thanks for your work here. I totally agree about the issue this causes; however, I am concerned about whether it is safe to move command processing out of the write lock. Do you have any lock analysis here? Thanks again.

hfutatzhanghb · 2023-02-03T08:05:02Z

@Hexiaoqiao, thanks for your reply. I will try to draw some diagrams to describe it soon.

Hexiaoqiao · 2023-02-03T08:11:55Z

@Hexiaoqiao, thanks for your reply. I will try to draw some diagrams to describe it soon.

Great. That will help push this improvement forward.
cc @zhangshuyan0, would you mind taking another look?

hadoop-yetus · 2023-02-03T13:56:20Z

💔 -1 overall

Vote	Subsystem	Runtime	Logfile	Comment
+0 🆗	reexec	0m 42s		Docker mode activated.
			_ Prechecks _
+1 💚	dupname	0m 0s		No case conflicting files found.
+0 🆗	codespell	0m 0s		codespell was not available.
+0 🆗	detsecrets	0m 0s		detect-secrets was not available.
+1 💚	@author	0m 0s		The patch does not contain any @author tags.
-1 ❌	test4tests	0m 0s		The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
			_ trunk Compile Tests _
+1 💚	mvninstall	46m 6s		trunk passed
+1 💚	compile	1m 29s		trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚	compile	1m 20s		trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚	checkstyle	1m 5s		trunk passed
+1 💚	mvnsite	1m 33s		trunk passed
+1 💚	javadoc	1m 9s		trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚	javadoc	1m 34s		trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚	spotbugs	3m 28s		trunk passed
+1 💚	shadedclient	26m 46s		branch has no errors when building and testing our client artifacts.
			_ Patch Compile Tests _
+1 💚	mvninstall	1m 30s		the patch passed
+1 💚	compile	1m 24s		the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚	javac	1m 24s		the patch passed
+1 💚	compile	1m 13s		the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚	javac	1m 13s		the patch passed
+1 💚	blanks	0m 0s		The patch has no blanks issues.
+1 💚	checkstyle	0m 49s		the patch passed
+1 💚	mvnsite	1m 25s		the patch passed
+1 💚	javadoc	0m 54s		the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚	javadoc	1m 29s		the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚	spotbugs	3m 37s		the patch passed
+1 💚	shadedclient	27m 4s		patch has no errors when building and testing our client artifacts.
			_ Other Tests _
-1 ❌	unit	288m 25s	/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt	hadoop-hdfs in the patch passed.
+1 💚	asflicense	0m 48s		The patch does not generate ASF License warnings.
		411m 51s

Reason	Tests
Failed junit tests	hadoop.hdfs.TestLeaseRecovery2

Subsystem	Report/Notes
Docker	ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5330/3/artifact/out/Dockerfile
GITHUB PR	#5330
Optional Tests	dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname	Linux 40a8c60b12b3 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool	maven
Personality	dev-support/bin/hadoop.sh
git revision	trunk / `3d74257`
Default Java	Private Build-1.8.0_352-8u352-ga-1~20.04-b08
Multi-JDK versions	/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_352-8u352-ga-1~20.04-b08
Test Results	https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5330/3/testReport/
Max. process+thread count	3711 (vs. ulimit of 5500)
modules	C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output	https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5330/3/console
versions	git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by	Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

virajjasani · 2023-02-03T17:50:51Z

Hi @virajjasani, thanks for your careful review. Indeed, before HDFS-6788 this part was covered by a synchronized lock. However, processCommandFromActive and processCommandFromStandby only use the actor parameter to print log messages. The lock here only serves to decide whether actor is bpServiceToActive, and therefore whether to execute processCommandFromActive or processCommandFromStandby.

When a switchover between the active and standby namenode occurs, the datanodes are set to stale status. In stale status, blocks are not allowed to be deleted directly; they are put into postponedMisreplicatedBlocks instead. So even if we execute a DatanodeCommand from the previous active namenode (now standby), it is okay.

Thank you @hfutatzhanghb.
I was just going to say that we don't need the write lock to verify whether the current actor is the one connected to the active namenode; a read lock would be sufficient. But it looks like you already made the change.

I took a quick look, and we have not hit this log line in our clusters so far, but this PR has an interesting fix. I will check further for any other resource contention.

virajjasani · 2023-02-03T17:53:11Z

In the meantime, I have two nits if you would like to consider:

For processCommandFromActive and processCommandFromStandby, it would be good to pass only actor.getNNSocketAddress() instead of actor, because it's the namenode address that is logged for BlockRecoveryWorker logs and others in standby.
Would be great to change log level to WARN for this:

        if (processCommandsMs > dnConf.getProcessCommandsThresholdMs()) {
          LOG.info("Took {} ms to process {} commands from NN",
              processCommandsMs, cmds.length);
        }

With WARN level, it will likely come up front while debugging any slowness issues.
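As a rough illustration of both nits, here is a minimal sketch under stated assumptions. `SlowCommandLogSketch` and `slowCommandWarning` are hypothetical names, the threshold is passed in as a plain parameter instead of coming from dnConf, and the message is returned as a string with a "WARN" prefix rather than written through the real logger. The helper receives only the namenode address, not the actor, and produces a message only when the elapsed time exceeds the threshold.

```java
import java.net.InetSocketAddress;
import java.util.Optional;

/** Illustrative only; the real code logs through the DataNode's logger. */
final class SlowCommandLogSketch {
  private SlowCommandLogSketch() {
  }

  /**
   * Returns a WARN-style message when command processing was slower than the
   * threshold, or an empty Optional otherwise. Only the namenode address is
   * needed here, so the caller does not have to pass the whole actor.
   */
  static Optional<String> slowCommandWarning(
      InetSocketAddress nnAddr, int cmdCount, long elapsedMs, long thresholdMs) {
    if (elapsedMs <= thresholdMs) {
      return Optional.empty();
    }
    return Optional.of(String.format(
        "WARN Took %d ms to process %d commands from NN %s:%d",
        elapsedMs, cmdCount, nnAddr.getHostString(), nnAddr.getPort()));
  }
}
```

For example, calling `slowCommandWarning(InetSocketAddress.createUnresolved("nn1.example.com", 8020), 3, 119000L, 2000L)` yields a message, while an elapsed time at or below the threshold yields nothing. The host name, port and threshold value are made up for the example.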

zhangshuyan0 · 2023-02-06T03:06:52Z

It is great to prevent the heartbeat from being affected by command processing. I checked that processCommandFromXXX() doesn't access any members inside BPOfferService that can be changed.
The only thing to note is that in the original code, after the switchover, the new ANN can guarantee that the DN will not execute the commands from the old ANN as long as it receives two heartbeats from the DN. After the function is placed outside the lock, this guarantee no longer exists. However, as @hfutatzhanghb said, NN will set the DataNode to stale after the switchover, which means that NN does not rely on this guarantee. So, I think this patch is safe.
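The lost guarantee can be sketched in a few lines. This is a single-threaded illustration with hypothetical names (`StaleDecisionSketch`, `isActive`, `failover`), not DataNode code: the "is this actor active?" decision is taken under the lock, but once the lock is released a switchover can happen before the command finishes, so the command may still run as if it came from the active namenode.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Illustrative only: shows that a decision made under a lock can be stale after release. */
final class StaleDecisionSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  // Name of the namenode currently considered active; guarded by lock.
  private String active = "nn1";

  boolean isActive(String nn) {
    lock.readLock().lock();
    try {
      return nn.equals(active);
    } finally {
      lock.readLock().unlock();
    }
  }

  /** Stands in for the heartbeat path updating the active namenode after a switchover. */
  void failover(String newActive) {
    lock.writeLock().lock();
    try {
      active = newActive;
    } finally {
      lock.writeLock().unlock();
    }
  }

  static String demo() {
    StaleDecisionSketch s = new StaleDecisionSketch();
    boolean decided = s.isActive("nn1");     // command path: snapshot taken, lock released
    s.failover("nn2");                       // switchover happens in the gap
    boolean stillActive = s.isActive("nn1"); // the snapshot no longer matches reality
    return "decided=" + decided + ", stillActive=" + stillActive;
  }
}
```

Here `demo()` returns `decided=true, stillActive=false`: the command would be processed as an active-namenode command even though nn1 is no longer active. As noted above, this is considered acceptable because the NN marks the DataNode stale after a switchover and does not rely on the old guarantee.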

hfutatzhanghb · 2023-02-06T06:37:30Z

LOG.info("Took {} ms to process {} commands from NN"

@virajjasani, I totally agree with your suggestions. I will modify the code later.

hfutatzhanghb · 2023-02-07T08:30:39Z

Hexiaoqiao

LGTM. +1 from my side. Let's wait and see what Yetus says.

hadoop-yetus · 2023-02-07T14:53:13Z

💔 -1 overall

Vote	Subsystem	Runtime	Logfile	Comment
+0 🆗	reexec	0m 37s		Docker mode activated.
			_ Prechecks _
+1 💚	dupname	0m 0s		No case conflicting files found.
+0 🆗	codespell	0m 1s		codespell was not available.
+0 🆗	detsecrets	0m 1s		detect-secrets was not available.
+1 💚	@author	0m 0s		The patch does not contain any @author tags.
-1 ❌	test4tests	0m 0s		The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
			_ trunk Compile Tests _
+1 💚	mvninstall	43m 24s		trunk passed
+1 💚	compile	1m 24s		trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚	compile	1m 19s		trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚	checkstyle	1m 6s		trunk passed
+1 💚	mvnsite	1m 30s		trunk passed
+1 💚	javadoc	1m 8s		trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚	javadoc	1m 34s		trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚	spotbugs	3m 34s		trunk passed
+1 💚	shadedclient	25m 43s		branch has no errors when building and testing our client artifacts.
			_ Patch Compile Tests _
+1 💚	mvninstall	1m 26s		the patch passed
+1 💚	compile	1m 17s		the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚	javac	1m 17s		the patch passed
+1 💚	compile	1m 15s		the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚	javac	1m 15s		the patch passed
+1 💚	blanks	0m 0s		The patch has no blanks issues.
+1 💚	checkstyle	0m 52s		the patch passed
+1 💚	mvnsite	1m 18s		the patch passed
+1 💚	javadoc	0m 50s		the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚	javadoc	1m 25s		the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚	spotbugs	3m 15s		the patch passed
+1 💚	shadedclient	25m 46s		patch has no errors when building and testing our client artifacts.
			_ Other Tests _
-1 ❌	unit	308m 46s	/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt	hadoop-hdfs in the patch passed.
+1 💚	asflicense	0m 48s		The patch does not generate ASF License warnings.
		425m 51s

Reason	Tests
Failed junit tests	hadoop.hdfs.TestLeaseRecovery2

Subsystem	Report/Notes
Docker	ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5330/4/artifact/out/Dockerfile
GITHUB PR	#5330
Optional Tests	dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname	Linux 77e8052a24cf 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool	maven
Personality	dev-support/bin/hadoop.sh
git revision	trunk / `a1cc014`
Default Java	Private Build-1.8.0_352-8u352-ga-1~20.04-b08
Multi-JDK versions	/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_352-8u352-ga-1~20.04-b08
Test Results	https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5330/4/testReport/
Max. process+thread count	3141 (vs. ulimit of 5500)
modules	C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output	https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5330/4/console
versions	git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by	Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

virajjasani

+1 (non-binding), there is no resource contention within processCommandFromActive and processCommandFromStandby as they don't access Namespace info, the BP actor instance that represents the active namenode, registration info etc. We should be safe here.

Thanks @hfutatzhanghb for addressing the previous comment. Would be good to backport to 3.3.

Hexiaoqiao · 2023-02-08T02:36:12Z

The failed unit test TestLeaseRecovery2 is not related to this PR; it is tracked by HDFS-16853.
Committed to trunk. (I will backport to branch-3.3 in a while; please let me know if we need to backport to other active branches.)
Thanks @hfutatzhanghb for your good work. Thanks @virajjasani @zhangshuyan0 for your reviews.

…ndFromActor (#5330). Contributed by ZhangHB." This reverts commit eb04ecd.

… to reduce impact on heartbeat (#5330). Contributed by ZhangHB. Reviewed-by: zhangshuyan <[email protected]> Reviewed-by: Viraj Jasani <[email protected]> Signed-off-by: He Xiaoqiao <[email protected]>

Hexiaoqiao · 2023-02-08T03:21:26Z

The git commit message is not correct. I will try to revert it and commit again.

Hexiaoqiao · 2023-02-09T02:18:56Z

@hfutatzhanghb This PR could not be cherry-picked to branch-3.3 cleanly. Would you mind submitting another PR for branch-3.3?

hfutatzhanghb · 2023-02-16T03:38:17Z

@hfutatzhanghb This PR could not be cherry-picked to branch-3.3 cleanly. Would you mind submitting another PR for branch-3.3?

@Hexiaoqiao, done. Please have a look, thanks.

hfutatzhanghb · 2023-06-12T05:11:01Z

+1 (non-binding), there is no resource contention within processCommandFromActive and processCommandFromStandby as they don't access Namespace info, the BP actor instance that represents the active namenode, registration info etc. We should be safe here.

Thanks @hfutatzhanghb for addressing the previous comment. Would be good to backport to 3.3.

Hi @Hexiaoqiao, the original discussion is here. Please take a look; thanks a lot for reviewing.

hfutatzhanghb force-pushed the trunk-processCMD branch from 75f1c4c to 81cc655 Compare January 30, 2023 09:55

virajjasani reviewed Feb 3, 2023

hfutatzhanghb force-pushed the trunk-processCMD branch from 81cc655 to 3d74257 Compare February 3, 2023 07:03

zhangshuyan0 approved these changes Feb 6, 2023

HDFS-16898. Make write lock fine-grain in method processCommandFromActor

a1cc014

hfutatzhanghb force-pushed the trunk-processCMD branch from 3d74257 to a1cc014 Compare February 7, 2023 07:46

hfutatzhanghb requested a review from virajjasani February 7, 2023 07:46

Hexiaoqiao approved these changes Feb 7, 2023

virajjasani approved these changes Feb 7, 2023

Hexiaoqiao changed the title ~~HDFS-16898. Make write lock fine-grain in method processCommandFromActor~~ HDFS-16898. Remove write lock for processCommandFromActor of DataNode to reduce impact on heartbeat Feb 8, 2023

Hexiaoqiao merged commit eb04ecd into apache:trunk Feb 8, 2023

Hexiaoqiao added a commit that referenced this pull request Feb 8, 2023

Revert "HDFS-16898. Make write lock fine-grain in method processComma…

101050f

…ndFromActor (#5330). Contributed by ZhangHB." This reverts commit eb04ecd.

Hexiaoqiao added a commit that referenced this pull request Feb 8, 2023

Revert "HDFS-16898. Make write lock fine-grain in method processComma…

7e91921

…ndFromActor (#5330). Contributed by ZhangHB." This reverts commit eb04ecd.

hfutatzhanghb mentioned this pull request Feb 16, 2023

HDFS-16898. Remove write lock for processCommandFromActor of DataNode to reduce impact on heartbeat. #5408

Merged

hfutatzhanghb deleted the trunk-processCMD branch May 22, 2023 03:13

ferdelyi pushed a commit to ferdelyi/hadoop that referenced this pull request May 26, 2023

Revert "HDFS-16898. Make write lock fine-grain in method processComma…

3ba42d6

…ndFromActor (apache#5330). Contributed by ZhangHB." This reverts commit eb04ecd.

symious pushed a commit to symious/hadoop that referenced this pull request Nov 21, 2024

SPDI-112945. HDFS-16898. Remove write lock for processCommandFromActo…

b7cca7f

…r of DataNode to reduce impact on heartbeat (apache#5330)
