Hadoop-16899. Update HdfsDesign.md to reduce ambiguity. #1871

invincible-akshay · 2020-03-02T20:16:13Z

The proposed change is in the second-to-last sentence of the affected paragraph.
Consider the statement as three parts separated by commas:

1. the first part talks about "one third of replicas";
2. the second part talks about "two thirds of replicas";
3. the third part talks about "the other third", which is ambiguous because one third and two thirds already account for the whole.

A possible solution is to either drop the third part or rephrase the entire sentence to capture its overall meaning.
Please suggest.

NOTICE

Please create an issue in ASF JIRA before opening a pull request,
and you need to set the title of the pull request which starts with
the corresponding JIRA issue number. (e.g. HADOOP-XXXXX. Fix a typo in YYY.)
For more details, please see https://cwiki.apache.org/confluence/display/HADOOP/How+To+Contribute

invincible-akshay · 2020-03-02T20:17:57Z

Closing because I did not follow the procedure of raising a JIRA before opening the pull request. I will raise it again with the right steps.

invincible-akshay · 2020-03-02T20:57:12Z

Re-opened after creating the JIRA.

hadoop-yetus · 2020-03-02T21:58:06Z

🎊 +1 overall

Vote	Subsystem	Runtime	Comment
+0 🆗	reexec	0m 26s	Docker mode activated.
		_ Prechecks _
+1 💚	dupname	0m 0s	No case conflicting files found.
+0 🆗	markdownlint	0m 0s	markdownlint was not available.
+1 💚	@author	0m 0s	The patch does not contain any @author tags.
		_ trunk Compile Tests _
+1 💚	mvninstall	20m 57s	trunk passed
+1 💚	mvnsite	1m 13s	trunk passed
+1 💚	shadedclient	37m 36s	branch has no errors when building and testing our client artifacts.
		_ Patch Compile Tests _
+1 💚	mvninstall	1m 8s	the patch passed
+1 💚	mvnsite	1m 8s	the patch passed
+1 💚	whitespace	0m 0s	The patch has no whitespace issues.
+1 💚	shadedclient	15m 3s	patch has no errors when building and testing our client artifacts.
		_ Other Tests _
+1 💚	asflicense	0m 28s	The patch does not generate ASF License warnings.
		57m 28s

Subsystem	Report/Notes
Docker	Client=19.03.6 Server=19.03.6 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1871/1/artifact/out/Dockerfile
GITHUB PR	#1871
Optional Tests	dupname asflicense mvnsite markdownlint
uname	Linux 01eccb3eecf9 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool	maven
Personality	personality/hadoop.sh
git revision	trunk / `edc2e9d`
Max. process+thread count	344 (vs. ulimit of 5500)
modules	C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output	https://builds.apache.org/job/hadoop-multibranch/job/PR-1871/1/console
versions	git=2.7.4 maven=3.3.9
Powered by	Apache Yetus 0.11.1 https://yetus.apache.org

This message was automatically generated.

aajisaka · 2020-03-03T03:03:32Z

Thank you for your contribution.

The sentence still seems ambiguous to me:

One third of replicas are on one node, two thirds of replicas are on one rack.

-> Two replicas are on one rack, and the remaining replica is on one of the other racks.

The replication factor is 3 in this sentence, so 'one' seems clearer than 'one third'.
This sentence should say that a replica is on a rack (instead of a node) and the other two replicas are on one of 'the other' racks.

The following sentence is not directly related to your PR, however, it can be fixed at the same time.

However, it does reduce the aggregate network bandwidth used when reading data since a block is placed in only two unique racks rather than three. With this policy, the replicas of a file do not evenly distribute across the racks.

it does reduce -> it does not reduce

If a block is placed in three unique racks, the probability of a rack-local read increases and the network bandwidth used when reading the data is reduced. Therefore I think 'does' should be changed to 'does not'.
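
To make the argument above concrete, here is a minimal sketch of the rack-local read probability. It is only an illustration under stated assumptions: the reader is assumed to sit on a rack chosen uniformly at random, the cluster size of 10 racks is made up, and `rack_local_read_probability` is a hypothetical helper, not part of Hadoop.

```python
from fractions import Fraction


def rack_local_read_probability(replica_racks: int, total_racks: int) -> Fraction:
    """Chance that a reader on a uniformly random rack shares a rack with a replica.

    Assumption (not from HDFS): readers are spread evenly across racks.
    """
    if not 0 < replica_racks <= total_racks:
        raise ValueError("replica_racks must be between 1 and total_racks")
    return Fraction(replica_racks, total_racks)


# Hypothetical cluster of 10 racks.
two_racks = rack_local_read_probability(2, 10)    # block placed in two unique racks
three_racks = rack_local_read_probability(3, 10)  # block placed in three unique racks
print(two_racks, three_racks)
```

Under these assumptions, spreading a block over three racks gives a higher chance of a rack-local read than two racks, which is why placing it in only two racks cannot be what reduces read bandwidth.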

invincible-akshay · 2020-03-03T03:22:25Z

Hi @aajisaka, thank you for your feedback. I agree with you; the talk about fractions made me think about multiple blocks of a file. I will replace the sentence with the one you suggested, which will make it clear.

I am happy to include the second recommended update as well.

I will update and raise the PR again.
Or is it appropriate to update the code in the same branch and let the PR get updated automatically? This is my first time, so I am not very sure about the conventions.

aajisaka · 2020-03-03T03:51:54Z

Thanks.

Or is it appropriate to update the code in the same branch and let the PR get updated automatically? This is my first time, so I am not very sure about the conventions.

You can add commits in the same branch and let the PR get updated automatically :)

invincible-akshay · 2020-03-03T04:15:47Z

Should we also update the following:

With this policy, the replicas of a file do not evenly distribute across the racks.

file -> block

And for the previous discussion I'm considering the statement as follows:

Two replicas are on different nodes of one rack and the remaining replica is on a node of one of the other racks.
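
A minimal sketch of what the proposed sentence describes for a replication factor of 3. This is an illustration only, not HDFS's actual placement code: `place_three_replicas`, the rack and node names, and the random choice of racks are all assumptions made for the example.

```python
import random


def place_three_replicas(topology, rng):
    """Return three (rack, node) pairs: two on different nodes of one rack,
    and one on a node of a different rack.

    topology maps a rack name to its list of node names (hypothetical layout).
    """
    # The rack holding two replicas needs at least two nodes.
    pair_racks = sorted(rack for rack, nodes in topology.items() if len(nodes) >= 2)
    if not pair_racks:
        raise ValueError("need a rack with at least two nodes")
    pair_rack = rng.choice(pair_racks)

    # The remaining replica goes to any other non-empty rack.
    other_racks = sorted(
        rack for rack, nodes in topology.items() if rack != pair_rack and nodes
    )
    if not other_racks:
        raise ValueError("need a second non-empty rack")
    other_rack = rng.choice(other_racks)

    first, second = rng.sample(topology[pair_rack], 2)  # two distinct nodes
    third = rng.choice(topology[other_rack])
    return [(pair_rack, first), (pair_rack, second), (other_rack, third)]


# Made-up cluster layout for illustration.
topology = {
    "rack-a": ["a1", "a2", "a3"],
    "rack-b": ["b1", "b2"],
    "rack-c": ["c1"],
}
placement = place_three_replicas(topology, random.Random(0))
racks_used = {rack for rack, _ in placement}
nodes_used = {node for _, node in placement}
print(placement)
```

Whatever racks are picked, the result uses exactly two racks and three distinct nodes, which is the property the reworded sentence is meant to state.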

Updated 3 sentences to correct some and improve clarity in others.

hadoop-yetus · 2020-03-03T17:48:57Z

🎊 +1 overall

Vote	Subsystem	Runtime	Comment
+0 🆗	reexec	0m 37s	Docker mode activated.
		_ Prechecks _
+1 💚	dupname	0m 0s	No case conflicting files found.
+0 🆗	markdownlint	0m 0s	markdownlint was not available.
+1 💚	@author	0m 0s	The patch does not contain any @author tags.
		_ trunk Compile Tests _
+1 💚	mvninstall	19m 14s	trunk passed
+1 💚	mvnsite	1m 14s	trunk passed
+1 💚	shadedclient	34m 24s	branch has no errors when building and testing our client artifacts.
		_ Patch Compile Tests _
+1 💚	mvninstall	1m 6s	the patch passed
+1 💚	mvnsite	1m 7s	the patch passed
+1 💚	whitespace	0m 0s	The patch has no whitespace issues.
+1 💚	shadedclient	13m 45s	patch has no errors when building and testing our client artifacts.
		_ Other Tests _
+1 💚	asflicense	0m 31s	The patch does not generate ASF License warnings.
		53m 15s

Subsystem	Report/Notes
Docker	Client=19.03.6 Server=19.03.6 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1871/2/artifact/out/Dockerfile
GITHUB PR	#1871
Optional Tests	dupname asflicense mvnsite markdownlint
uname	Linux cdbaa9a58979 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool	maven
Personality	personality/hadoop.sh
git revision	trunk / c0d0842
Max. process+thread count	414 (vs. ulimit of 5500)
modules	C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output	https://builds.apache.org/job/hadoop-multibranch/job/PR-1871/2/console
versions	git=2.7.4 maven=3.3.9
Powered by	Apache Yetus 0.11.1 https://yetus.apache.org

This message was automatically generated.

invincible-akshay closed this Mar 2, 2020

invincible-akshay reopened this Mar 2, 2020

invincible-akshay changed the title ~~Update HdfsDesign.md~~ Hadoop-16899. Update HdfsDesign.md to reduce ambiguity Mar 2, 2020

invincible-akshay changed the title ~~Hadoop-16899. Update HdfsDesign.md to reduce ambiguity~~ Hadoop-16899. Update HdfsDesign.md to reduce ambiguity. Mar 2, 2020

Update HdfcDesign.md

f6864c4

Updated 3 sentences to correct some and improve clarity in others.

aajisaka self-requested a review March 4, 2020 01:51

aajisaka approved these changes Mar 4, 2020

aajisaka merged commit bbd704b into apache:trunk Mar 4, 2020

aajisaka pushed a commit that referenced this pull request Mar 4, 2020

HADOOP-16899. Update HdfsDesign.md to reduce ambiguity. (#1871)

369f4f9

(cherry picked from commit bbd704b)

bilaharith pushed a commit to bilaharith/hadoop that referenced this pull request Mar 19, 2020

HADOOP-16899. Update HdfsDesign.md to reduce ambiguity. (apache#1871)

afc14f7

RogPodge pushed a commit to RogPodge/hadoop that referenced this pull request Mar 25, 2020

HADOOP-16899. Update HdfsDesign.md to reduce ambiguity. (apache#1871)

934b4bd

zhangxiping1 pushed a commit to zhangxiping1/hadoop that referenced this pull request Dec 13, 2022

HADOOP-16899. Update HdfsDesign.md to reduce ambiguity. (apache#1871)

d8e0293
