-
Notifications
You must be signed in to change notification settings - Fork 8.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HDFS-16477. [SPS]: Add metric PendingSPSPaths for getting the number of paths to be processed by SPS #4009
Conversation
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
Hi @jojochuang @Hexiaoqiao , PTAL. Thanks. |
Hi @tasanuma @ayushtkn @virajith @virajjasani , could you please also take a look at this? Thanks. |
💔 -1 overall
This message was automatically generated. |
Unit tests failed because of OOM, regardless of changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to have this metric is Router as well, as we do have for other FSN related metrics. The value there would be the sum of values from all other namespaces.
...fs/src/main/java/org/apache/hadoop/hdfs/server/namenode/sps/StoragePolicySatisfyManager.java
Outdated
Show resolved
Hide resolved
...s-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
Outdated
Show resolved
Hide resolved
Thank you @ayushtkn very much for the review. I will update it ASAP. |
💔 -1 overall
This message was automatically generated. |
Hi @tasanuma @virajjasani , could you please also take a look. Thanks. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
Hi @ayushtkn , could you please take a look at this? Thank you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just had a very quick look, mostly everything looks good to me. Minor comments.
I will try to go through this once more when I have some more time.
...bf/src/main/java/org/apache/hadoop/hdfs/server/federation/resolver/NamenodeStatusReport.java
Show resolved
Hide resolved
dfs.satisfyStoragePolicy(new Path(FILE)); | ||
// Assert metrics. | ||
assertEquals(1, hdfsCluster.getNamesystem().getPendingSPSPaths()); | ||
// Wait till namenode notified about the block location details | ||
DFSTestUtil.waitExpectedStorageType(FILE, StorageType.ARCHIVE, 3, 35000, | ||
dfs); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a race condition possible, if the storage policy gets satisfied before we get the pending sps paths? in that case the assertion shall fail I suppose?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a race condition possible, if the storage policy gets satisfied before we get the pending sps paths? in that case the assertion shall fail I suppose?
Thanks for your review. The method doTestWhenStoragePolicySetToCOLD is only called in one place. I think there should be no race condition? What do you think of this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Wait till namenode notified about the block location details
DFSTestUtil.waitExpectedStorageType(FILE, StorageType.ARCHIVE, 3, 35000,
dfs);
Here you are waiting for SPS to process the path and move the blocks to the correct place, once this is done, whether getPendingSPSPaths
will still return 1? I suppose no, right? the path got processed so the count should reduce to 0.
So, my take is you don't have a control on DFSTestUtil.waitExpectedStorageType(FILE, StorageType.ARCHIVE, 3, 35000, dfs);
, if by chance SPS process that path before your assertion then the test will fail.
I haven't gone through the code, but that is what I felt in my initial pass, if it doesn't work this way do lemme know
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Wait till namenode notified about the block location details DFSTestUtil.waitExpectedStorageType(FILE, StorageType.ARCHIVE, 3, 35000, dfs);
Here you are waiting for SPS to process the path and move the blocks to the correct place, once this is done, whether
getPendingSPSPaths
will still return 1? I suppose no, right? the path got processed so the count should reduce to 0.So, my take is you don't have a control on
DFSTestUtil.waitExpectedStorageType(FILE, StorageType.ARCHIVE, 3, 35000, dfs);
, if by chance SPS process that path before your assertion then the test will fail.I haven't gone through the code, but that is what I felt in my initial pass, if it doesn't work this way do lemme know
Thank you @ayushtkn for your detailed explanation. I made a mistake and you are right. I'll add a new unit test to assert metrics without running SPS.
💔 -1 overall
This message was automatically generated. |
Hi @ayushtkn , the failed unit test is unrelated to the change. Please have a look again. Thank you very much. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor stuff with the test else the prod changes LGTM.
Give a check to the Jenkins warning. The cc warnings if they aren't related.
TestRouterDistCpProcedure seems to be failing with timeout. Doesn't look related. If you have bandwidth can track & increase the timeout for it in a separate jira
// Assert metrics. | ||
assertEquals(1, hdfsCluster.getNamesystem().getPendingSPSPaths()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is still there, you added a new test as well to get rid of this only right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your comments. I'm sorry I left that out.
@@ -432,7 +443,6 @@ public void testTraverseWhenParentDeleted() throws Exception { | |||
public void testTraverseWhenRootParentDeleted() throws Exception { | |||
} | |||
|
|||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
avoid this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. Let me roll back it.
Thank you very much, I will open a new JIRA to do this. |
💔 -1 overall
This message was automatically generated. |
Hi @ayushtkn , the cc warnings seem unrelated to the changes. |
I have retriggered the build. Lets wait. If all clean will push this post that |
Thanks @ayushtkn for your help. |
💔 -1 overall
This message was automatically generated. |
The cc warnings seem related to |
…of paths to be processed by SPS
💔 -1 overall
This message was automatically generated. |
Hi @ayushtkn , the jenkins log seems compiled failed. But I can compiled successfully locally.
And the cc warings seem related to mapreduce-client, but not the changes. Do we need to retrigger the build? Thanks. |
Hi @aajisaka , do you know the cause of these cc warnings? This PR does not change the c++ code associated with mapreduce-client. Could you please take a look? Thank you very much. /results-compile-cc-root-jdkPrivateBuild-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07.txt |
💔 -1 overall
This message was automatically generated. |
Anyway it says If by chance it isn't will revert |
Thanks @ayushtkn . |
…of paths to be processed by SPS (apache#4009). Contributed by tomscut. Signed-off-by: Ayush Saxena <[email protected]>
JIRA: HDFS-16477.
Currently we have no idea how many paths are waiting to be processed when using the SPS feature. We should add metric
PendingSPSPaths
for getting the number of paths to be processed by SPS in NameNode.