[Segment Replication] Introduce primary weight factor for primary shards distribution #6017

Merged
merged 10 commits into from
Feb 2, 2023

@dreamer-89 dreamer-89 commented Jan 25, 2023

Description

This change introduces a mechanism to balance primary shards across the cluster.

  • Introduces a new weight factor setting, PRIMARY_BALANCE_FACTOR_SETTING, which defines the constant applied to primary shards when calculating the weight of a node. The factor controls the tendency to balance primary shards: the higher the value, the greater the chance of an even distribution. The default is initially 0.0f to avoid rebalancing disruption in existing clusters. The other settings used in the weight calculation are INDEX_BALANCE_FACTOR_SETTING, which governs shards per node for a specific index, and SHARD_BALANCE_FACTOR_SETTING, which governs total shards per node.
  • This required changes to the utility classes ModelIndex and ModelNode, which are used for weight calculation.
  • Changes the existing TriConsumer to conform to the existing Java Consumer interface definition, with the related unit test changes. It is used as the consumer that propagates settings changes to WeightFunction.
  • Adds unit tests in BalanceConfigurationTests to verify shard distribution with the new setting.
  • Adds integration tests.
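
To make the role of the three factors concrete, here is a minimal sketch of a weight calculation of this kind. It is not the merged code: the class name PrimaryWeightSketch, the method signature, and the normalization of the factors by their sum are assumptions made for illustration. Each term measures how far a node is from the cluster average, scaled by its factor.

```java
/**
 * Illustrative sketch only. The class name, method signature, and the
 * normalization step are assumptions, not the OpenSearch implementation.
 */
final class PrimaryWeightSketch {
    private final float shardFactor;
    private final float indexFactor;
    private final float primaryFactor;

    PrimaryWeightSketch(float shardBalance, float indexBalance, float primaryBalance) {
        float sum = shardBalance + indexBalance + primaryBalance;
        if (sum <= 0.0f) {
            throw new IllegalArgumentException("Balance factors must sum to a positive value");
        }
        // Normalize (assumed) so that the three factors add up to 1.
        this.shardFactor = shardBalance / sum;
        this.indexFactor = indexBalance / sum;
        this.primaryFactor = primaryBalance / sum;
    }

    /**
     * Weight of a node for one index: the deviation of each count from its
     * cluster-wide average, scaled by the matching factor. A higher weight
     * means the node is more loaded than average.
     */
    float weight(int nodeShards, int nodeShardsOfIndex, int nodePrimaries,
                 float avgShards, float avgShardsOfIndex, float avgPrimaries) {
        return shardFactor * (nodeShards - avgShards)
            + indexFactor * (nodeShardsOfIndex - avgShardsOfIndex)
            + primaryFactor * (nodePrimaries - avgPrimaries);
    }

    public static void main(String[] args) {
        // Two nodes with equal shard counts but uneven primaries (3 vs 1).
        PrimaryWeightSketch disabled = new PrimaryWeightSketch(1.0f, 1.0f, 0.0f);
        PrimaryWeightSketch enabled = new PrimaryWeightSketch(1.0f, 1.0f, 2.0f);
        System.out.println("factor 0, node A: " + disabled.weight(4, 2, 3, 4f, 2f, 2f));
        System.out.println("factor 2, node A: " + enabled.weight(4, 2, 3, 4f, 2f, 2f));
        System.out.println("factor 2, node B: " + enabled.weight(4, 2, 1, 4f, 2f, 2f));
    }
}
```

With a primary factor of 0 the primary term vanishes, so nodes that differ only in their primary count get the same weight. That matches the stated goal of leaving existing clusters undisturbed by default.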

Default setting consideration

PRIMARY_BALANCE_FACTOR_SETTING is initially set to 0 to prevent unnecessary disruption for document replication (docrep) indices. One possible improvement is to use a sane non-zero default when segment replication (segrep) is enabled on indices, so that the cluster balances automatically based on primary shards. This is difficult because:

  1. No single default works for every cluster configuration. A task was added to "[Segment Replication] Update shard allocation to evenly distribute primaries" (#5240) to analyze shard distribution for different configurations so that we can come up with some guidance.
  2. Giving control to the customer is better than automatic primary shard allocation. Primary rebalancing can then be performed when needed, based on resource consumption and with the customer's consent.
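
Since the factor is left under operator control, a change has to reach the weight calculation at runtime. The description mentions a three-argument consumer for that purpose. The following sketch shows the general shape, assuming an interface whose single method is named accept, by analogy with java.util.function.Consumer. The interface is redeclared locally, and the BalanceFactors holder and its initial values are hypothetical. None of this is the OpenSearch code.

```java
/**
 * Illustrative sketch only. TriConsumer is redeclared here and BalanceFactors
 * is a hypothetical stand-in for the allocator state.
 */
final class TriConsumerSketch {

    /** Three-argument analogue of java.util.function.Consumer. */
    @FunctionalInterface
    interface TriConsumer<S, T, U> {
        void accept(S s, T t, U u);
    }

    /** Holds the current balance factors. Initial values are illustrative. */
    static final class BalanceFactors {
        private volatile float index = 0.55f;
        private volatile float shard = 0.45f;
        private volatile float primary = 0.0f;

        /** Applies all three factors together, rejecting negative values. */
        void set(float index, float shard, float primary) {
            if (index < 0.0f || shard < 0.0f || primary < 0.0f) {
                throw new IllegalArgumentException("Balance factors must be non-negative");
            }
            this.index = index;
            this.shard = shard;
            this.primary = primary;
        }

        float index() { return index; }
        float shard() { return shard; }
        float primary() { return primary; }
    }

    public static void main(String[] args) {
        BalanceFactors factors = new BalanceFactors();
        // The updater a settings framework would invoke when any of the
        // three values changes.
        TriConsumer<Float, Float, Float> updater = factors::set;
        updater.accept(0.55f, 0.45f, 1.0f);
        System.out.println("primary factor is now " + factors.primary());
    }
}
```

Passing all three values in one call lets the receiver recompute anything derived from them in a single step, instead of reacting to three separate updates.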

Issues Resolved

#5240

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions

Gradle Check (Jenkins) Run Completed with:

@codecov-commenter
codecov-commenter commented Jan 30, 2023

Codecov Report

Merging #6017 (11b0c80) into main (c5a1bdf) will decrease coverage by 0.15%.
The diff coverage is 72.22%.

@@             Coverage Diff              @@
##               main    #6017      +/-   ##
============================================
- Coverage     70.87%   70.72%   -0.15%     
+ Complexity    58830    58752      -78     
============================================
  Files          4776     4778       +2     
  Lines        280993   281074      +81     
  Branches      40598    40617      +19     
============================================
- Hits         199141   198794     -347     
- Misses        65488    65852     +364     
- Partials      16364    16428      +64     
Impacted Files Coverage Δ
...r/routing/allocation/allocator/ShardsBalancer.java 25.00% <0.00%> (-8.34%) ⬇️
...c/main/java/org/opensearch/common/TriConsumer.java 0.00% <0.00%> (ø)
...rg/opensearch/common/settings/ClusterSettings.java 91.89% <ø> (ø)
...search/common/settings/AbstractScopedSettings.java 86.91% <25.00%> (-1.25%) ⬇️
...in/java/org/opensearch/common/collect/Triplet.java 53.33% <53.33%> (ø)
...va/org/opensearch/cluster/routing/RoutingNode.java 78.78% <85.71%> (+0.25%) ⬆️
.../allocation/allocator/BalancedShardsAllocator.java 90.57% <85.71%> (-1.78%) ⬇️
...n/java/org/opensearch/common/settings/Setting.java 90.09% <95.23%> (+0.16%) ⬆️
...ting/allocation/allocator/LocalShardsBalancer.java 85.01% <100.00%> (+0.32%) ⬆️
...egations/support/MultiTermsValuesSourceConfig.java 80.32% <100.00%> (ø)
... and 483 more

@github-actions
github-actions bot commented Feb 2, 2023

Gradle Check (Jenkins) Run Completed with:

@mch2 mch2 left a comment

LGTM, thanks @dreamer-89 for this change!

@dreamer-89 dreamer-89 merged commit aa21b5f into opensearch-project:main Feb 2, 2023
@dreamer-89 dreamer-89 added the backport 2.x Backport to 2.x branch label Feb 2, 2023
opensearch-trigger-bot bot pushed a commit that referenced this pull request Feb 2, 2023
…rds distribution (#6017)

* Add integration test to show shard allocation

Signed-off-by: Suraj Singh <[email protected]>

* Update WeightFunction to consider primary shards for uniform primary distribution

Signed-off-by: Suraj Singh <[email protected]>

* Include primary shard weight for all shard types

Signed-off-by: Suraj Singh <[email protected]>

* Update integration test to show docrep & segrep indices

Signed-off-by: Suraj Singh <[email protected]>

* Add settings updater and update TriConsumer functional interface declaration

Signed-off-by: Suraj Singh <[email protected]>

* Add balance configuration test

Signed-off-by: Suraj Singh <[email protected]>

* Fix failing unit tests and merge conflicts

Signed-off-by: Suraj Singh <[email protected]>

* Rename primary balance factor

Signed-off-by: Suraj Singh <[email protected]>

* Update integration tests to avoid green state timeouts

Signed-off-by: Suraj Singh <[email protected]>

* PR feedback

Signed-off-by: Suraj Singh <[email protected]>

---------

Signed-off-by: Suraj Singh <[email protected]>
(cherry picked from commit aa21b5f)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
dreamer-89 pushed a commit that referenced this pull request Feb 2, 2023
…rds distribution (#6017) (#6161)

(cherry picked from commit aa21b5f)

Signed-off-by: Suraj Singh <[email protected]>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
dreamer-89 added a commit to dreamer-89/OpenSearch that referenced this pull request Feb 22, 2023
dreamer-89 added a commit that referenced this pull request Feb 22, 2023
* Revert "[Segment Replication] Add changelog entry and fix allocation test (#6250)"

This reverts commit f8102d1.

* Revert "[Segment Replication] Introduce primary weight factor for primary shards distribution (#6017)"

This reverts commit aa21b5f.
dreamer-89 added a commit that referenced this pull request Feb 22, 2023
* Revert "[Segment Replication] Add changelog entry and fix allocation test (#6250) (#6270)"

This reverts commit 726c068.

* Revert "[Segment Replication] Introduce primary weight factor for primary shards distribution (#6017) (#6161)"

This reverts commit 2e3e1ed.
opensearch-trigger-bot bot pushed a commit that referenced this pull request Feb 22, 2023
* Revert "[Segment Replication] Add changelog entry and fix allocation test (#6250) (#6270)"

This reverts commit 726c068.

* Revert "[Segment Replication] Introduce primary weight factor for primary shards distribution (#6017) (#6161)"

This reverts commit 2e3e1ed.

(cherry picked from commit e316959)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
dreamer-89 pushed a commit that referenced this pull request Feb 22, 2023
…6444)

* Revert "[Segment Replication] Add changelog entry and fix allocation test (#6250) (#6270)"

This reverts commit 726c068.

* Revert "[Segment Replication] Introduce primary weight factor for primary shards distribution (#6017) (#6161)"

This reverts commit 2e3e1ed.

(cherry picked from commit e316959)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
mch2 pushed a commit to mch2/OpenSearch that referenced this pull request Mar 4, 2023
…rds distribution (opensearch-project#6017)

Labels
backport 2.x Backport to 2.x branch skip-changelog