Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hadoop-16899. Update HdfsDesign.md to reduce ambiguity. #1871

Merged
merged 2 commits into from
Mar 4, 2020
Merged

Hadoop-16899. Update HdfsDesign.md to reduce ambiguity. #1871

merged 2 commits into from
Mar 4, 2020

Conversation

invincible-akshay
Copy link
Contributor

Proposed change is in 2nd last sentence of the affected paragraph.
Considering the statement segmented in 3 parts by the commas:

  1. the first part talks about "one thirds of replicas";
  2. the second part talks about "two thirds of replicas"
  3. the third part talking about "the other third" is leading to ambiguity when one thirds and two thirds have already accounted for the whole.
    Possible solution is to either get rid of the third part or rephrase entire sentence to capture the overall essence of the sentence.
    Please suggest.

NOTICE

Please create an issue in ASF JIRA before opening a pull request,
and you need to set the title of the pull request which starts with
the corresponding JIRA issue number. (e.g. HADOOP-XXXXX. Fix a typo in YYY.)
For more details, please see https://cwiki.apache.org/confluence/display/HADOOP/How+To+Contribute

Considering the statement segmented in 3 parts by the commas: 
1. the first part talks about "one thirds of replicas"; 
2. the second part talks about "two thirds of replicas"  
3. the third part talking about "the other third" is leading to ambiguity when one thirds and two thirds have already accounted for the whole.
Possible solution is to either get rid of the third part or rephrase entire sentence to capture the overall essence of the sentence.
Please suggest.
@invincible-akshay
Copy link
Contributor Author

Closing because the procedure to raise a JIRA before pull request wasn't followed, will raise again with right steps.

@invincible-akshay
Copy link
Contributor Author

Re-opened on creating JIRA.

@invincible-akshay invincible-akshay changed the title Update HdfsDesign.md Hadoop-16899. Update HdfsDesign.md to reduce ambiguity Mar 2, 2020
@invincible-akshay invincible-akshay changed the title Hadoop-16899. Update HdfsDesign.md to reduce ambiguity Hadoop-16899. Update HdfsDesign.md to reduce ambiguity. Mar 2, 2020
@hadoop-yetus
Copy link

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 26s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 markdownlint 0m 0s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
_ trunk Compile Tests _
+1 💚 mvninstall 20m 57s trunk passed
+1 💚 mvnsite 1m 13s trunk passed
+1 💚 shadedclient 37m 36s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 1m 8s the patch passed
+1 💚 mvnsite 1m 8s the patch passed
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 shadedclient 15m 3s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 asflicense 0m 28s The patch does not generate ASF License warnings.
57m 28s
Subsystem Report/Notes
Docker Client=19.03.6 Server=19.03.6 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1871/1/artifact/out/Dockerfile
GITHUB PR #1871
Optional Tests dupname asflicense mvnsite markdownlint
uname Linux 01eccb3eecf9 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/hadoop.sh
git revision trunk / edc2e9d
Max. process+thread count 344 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://builds.apache.org/job/hadoop-multibranch/job/PR-1871/1/console
versions git=2.7.4 maven=3.3.9
Powered by Apache Yetus 0.11.1 https://yetus.apache.org

This message was automatically generated.

@aajisaka
Copy link
Member

aajisaka commented Mar 3, 2020

Thank you for your contribution.

The sentence seems still ambiguous to me

One third of replicas are on one node, two thirds of replicas are on one rack.

-> Two replicas are on one rack, and the remaining replica is on one of the other racks.

  • The replication factor is 3 in this sentence, so 'one' seems clearer than 'one third'.
  • This sentence should tell that a replica is on a rack (instead of node) and the other two replicas are on one of 'the other' racks.

The following sentence is not directly related to your PR, however, it can be fixed at the same time.

However, it does reduce the aggregate network bandwidth used when reading data since a block is placed in only two unique racks rather than three. With this policy, the replicas of a file do not evenly distribute across the racks.

  • it does reduce -> it does not reduce

If a block is placed in three unique racks, the probability of rack-local read will increase and the network bandwidth will be reduced when reading the data. Therefore I think 'does' should be changed to 'does not'.

@invincible-akshay
Copy link
Contributor Author

Hi @aajisaka , thank you for your feedback. I agree with you, talk about fractions made me think about multiple blocks of file. I will replace the sentence with the one you suggested, it will make it clear.

I am happy to include the 2nd recommended update as well.

I will update and raise the PR again.
Or is it appropriate to update the code in same branch and let the PR get updated automatically? This is my first time so not very sure about the conventions.

@aajisaka
Copy link
Member

aajisaka commented Mar 3, 2020

Thanks.

Or is it appropriate to update the code in same branch and let the PR get updated automatically? This is my first time so not very sure about the conventions.

You can add commits in the same branch and let the PR get updated automatically :)

@invincible-akshay
Copy link
Contributor Author

Should we also update the following:

With this policy, the replicas of a file do not evenly distribute across the racks.

  • file -> block

And for the previous discussion I'm considering the statement as follows:

Two replicas are on different nodes of one rack and the remaining replica is on a node of one of the other racks.

Updated 3 sentences to correct some and improve clarity in others.
@hadoop-yetus
Copy link

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 37s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 markdownlint 0m 0s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
_ trunk Compile Tests _
+1 💚 mvninstall 19m 14s trunk passed
+1 💚 mvnsite 1m 14s trunk passed
+1 💚 shadedclient 34m 24s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 1m 6s the patch passed
+1 💚 mvnsite 1m 7s the patch passed
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 shadedclient 13m 45s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 asflicense 0m 31s The patch does not generate ASF License warnings.
53m 15s
Subsystem Report/Notes
Docker Client=19.03.6 Server=19.03.6 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1871/2/artifact/out/Dockerfile
GITHUB PR #1871
Optional Tests dupname asflicense mvnsite markdownlint
uname Linux cdbaa9a58979 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/hadoop.sh
git revision trunk / c0d0842
Max. process+thread count 414 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://builds.apache.org/job/hadoop-multibranch/job/PR-1871/2/console
versions git=2.7.4 maven=3.3.9
Powered by Apache Yetus 0.11.1 https://yetus.apache.org

This message was automatically generated.

@aajisaka aajisaka self-requested a review March 4, 2020 01:51
@aajisaka aajisaka merged commit bbd704b into apache:trunk Mar 4, 2020
aajisaka pushed a commit that referenced this pull request Mar 4, 2020
bilaharith pushed a commit to bilaharith/hadoop that referenced this pull request Mar 19, 2020
RogPodge pushed a commit to RogPodge/hadoop that referenced this pull request Mar 25, 2020
zhangxiping1 pushed a commit to zhangxiping1/hadoop that referenced this pull request Dec 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants