
Followup on comments from #91612 #91700

Merged: 5 commits merged into elastic:main from idegtiarenko:test_comments_followup, Nov 21, 2022

Conversation

@idegtiarenko (Contributor) commented Nov 18, 2022

Followup on comments from #91612

@idegtiarenko idegtiarenko added >non-issue :Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. v8.7.0 labels Nov 18, 2022
@idegtiarenko idegtiarenko changed the title Followup on comments in #91612 Followup on comments from #91612 Nov 18, 2022
@elasticsearchmachine (Collaborator): Pinging @elastic/es-distributed (Team:Distributed)


-        var settings = Settings.EMPTY;
+        var settings = Settings.builder()
+            .put(ThrottlingAllocationDecider.CLUSTER_ROUTING_ALLOCATION_NODE_CONCURRENT_RECOVERIES_SETTING.getKey(), 10)
Contributor: Could you add a comment explaining why those settings are necessary here?
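For context on the question above: `ThrottlingAllocationDecider` caps how many shard recoveries a node may run concurrently, and `cluster.routing.allocation.node_concurrent_recoveries` defaults to 2, so a plausible reading is that the test raises the cap to 10 so rebalancing finishes in one reroute round. The toy model below is not Elasticsearch code; it only illustrates how the per-node cap turns shard moves into multiple rounds.

```java
// Toy model of recovery throttling (not Elasticsearch code): with a per-node
// cap on concurrent recoveries, relocating N shards onto one node takes
// ceil(N / cap) reroute rounds. Raising the cap from the default 2 to 10
// lets 10 moves complete in a single round.
public class RecoveryThrottleSketch {
    static int roundsToRelocate(int shards, int concurrentRecoveries) {
        // ceiling division
        return (shards + concurrentRecoveries - 1) / concurrentRecoveries;
    }

    public static void main(String[] args) {
        System.out.println(roundsToRelocate(10, 2));  // default cap: 5 rounds
        System.out.println(roundsToRelocate(10, 10)); // raised cap: 1 round
    }
}
```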

        var metadataBuilder = Metadata.builder();
        var routingTableBuilder = RoutingTable.builder();
        for (var index : indices) {
            var inSyncId = UUIDs.randomBase64UUID(random());
-           var build = index.putInSyncAllocationIds(0, Set.of(inSyncId)).build();
+           var build = index.settings(settings(Version.CURRENT).put("index.routing.allocation.exclude._id", "node-3"))
Contributor: Is this setting used intentionally here?

Contributor Author: Yes, it forces the shards to be rebalanced by a followup allocate call. I will add a comment.
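The mechanism described in the reply can be sketched in plain Java. This is a hypothetical stand-in for the effect of `index.routing.allocation.exclude._id`, not the real `FilterAllocationDecider` API: a shard may not remain on an excluded node, so shards started on "node-3" are all forced to move, and the followup allocate call places them according to the balancer weights.

```java
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for "index.routing.allocation.exclude._id": shards on
// an excluded node cannot remain there, so every shard started on "node-3"
// must relocate on the next allocate call.
public class ExcludeFilterSketch {
    static boolean canRemain(String nodeId, Set<String> excludedIds) {
        return !excludedIds.contains(nodeId);
    }

    public static void main(String[] args) {
        Set<String> excluded = Set.of("node-3");
        List<String> shardLocations = List.of("node-3", "node-3", "node-3");
        long forcedMoves = shardLocations.stream()
            .filter(node -> !canRemain(node, excluded))
            .count();
        System.out.println(forcedMoves + " shards must move"); // 3 shards must move
    }
}
```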

@@ -1024,7 +1024,6 @@ public void testDoNotRebalanceToTheNodeThatNoLongerExists() {
                .put(SETTING_NUMBER_OF_REPLICAS, 0)
                .put(SETTING_INDEX_VERSION_CREATED.getKey(), Version.CURRENT)
        )
-       .system(randomBoolean())
Contributor: Is this change intentional?

Contributor Author: Yes, this is unnecessary in that test.

@idegtiarenko idegtiarenko requested a review from fcofdez November 18, 2022 15:31
@fcofdez (Contributor) left a comment: LGTM 👍. Thanks for adding the extra comments!

@DaveCTurner (Contributor) left a comment: I left a few small comments, but overall this looks better. Thanks.

@@ -95,93 +88,74 @@ public void testDecideShardAllocation() {

    public void testBalanceByShardLoad() {

Contributor: Naming nit:

Suggested change:
-    public void testBalanceByShardLoad() {
+    public void testBalanceByWriteLoad() {

Contributor: NB ^

            }
        }
-       assertThat(nodeIngestLoad, lessThanOrEqualTo(8.0 + 1.5));
+       assertThat(nodeIngestLoad, lessThanOrEqualTo(8.0));
Contributor: Implicitly this asserts that heavy-index is on a different node from light-index-*, but could we be more explicit about that? Can we also assert that zero-write-load-index and no-write-load-index end up on the same node as heavy-index because of shard balance?
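The explicit assertions the reviewer asks for could look roughly like the sketch below. The allocation map and the index names are illustrative (the names come from the review comment, the 8.0/1.5 loads from the assertion bound above); none of this is taken from the actual test code.

```java
import java.util.Map;

// Sketch of explicit placement assertions: write-load balancing keeps
// heavy-index (assumed load 8.0) away from the light indices (assumed 1.5
// each), while shard-count balance pulls the loadless indices onto the
// heavy index's node. The allocation below is a hand-picked example.
public class ExplicitBalanceSketch {
    public static void main(String[] args) {
        Map<String, String> indexToNode = Map.of(
            "heavy-index", "node-1",
            "light-index-1", "node-2",
            "light-index-2", "node-2",
            "light-index-3", "node-2",
            "zero-write-load-index", "node-1",
            "no-write-load-index", "node-1"
        );
        // Explicit form of the implicit 8.0 bound: heavy-index shares no node
        // with any light index ...
        for (int i = 1; i <= 3; i++) {
            if (indexToNode.get("heavy-index").equals(indexToNode.get("light-index-" + i))) {
                throw new AssertionError("heavy-index must not share a node with light-index-" + i);
            }
        }
        // ... and the loadless indices balance onto heavy-index's node.
        if (!indexToNode.get("zero-write-load-index").equals(indexToNode.get("heavy-index"))
            || !indexToNode.get("no-write-load-index").equals(indexToNode.get("heavy-index"))) {
            throw new AssertionError("loadless indices should land on heavy-index's node");
        }
        System.out.println("explicit placement checks pass");
    }
}
```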

            var indexMetadata = clusterState.metadata().index(shardRouting.index());
            nodeDiskUsage += indexMetadata.getForecastedShardSizeInBytes().orElse(0L);
        }
    }
-   assertThat(nodeDiskUsage, lessThanOrEqualTo(ByteSizeValue.ofGb(8 + 1).getBytes()));
+   assertThat(nodeDiskUsage, lessThanOrEqualTo(ByteSizeValue.ofGb(8).getBytes()));
Contributor: Similarly here.

@DaveCTurner (Contributor) left a comment: LGTM except for a couple of nits.

@@ -95,93 +88,74 @@ public void testDecideShardAllocation() {

    public void testBalanceByShardLoad() {

Contributor: NB ^

Comment on lines 349 to 361:

//            IndexRoutingTable.builder(build.getIndex())
//                .addShard(
//                    TestShardRouting.newShardRouting(
//                        new ShardId(build.getIndex(), 0),
//                        // allocate indices on excluded node so that they are rebalanced according to the weights later on in test
//                        "node-3",
//                        null,
//                        true,
//                        ShardRoutingState.STARTED,
//                        AllocationId.newInitializing(inSyncId)
//                    )
//                );

Contributor: Could we remove this commented-out code?

@idegtiarenko idegtiarenko merged commit 642a77f into elastic:main Nov 21, 2022
@idegtiarenko idegtiarenko deleted the test_comments_followup branch November 21, 2022 15:11
idegtiarenko added a commit to idegtiarenko/elasticsearch that referenced this pull request Nov 21, 2022
elasticsearchmachine pushed a commit that referenced this pull request Nov 21, 2022
* New balancer unit tests (#91612)

(cherry picked from commit 7c39880)

* Add tests to BalancedShardsAllocator.Balancer#getIndexDiskUsageInBytes (#91642)

This commit includes minor cleanups

(cherry picked from commit 6cff05b)

* Simulate balancer in realistic cluster (#91404)

This commit adds a test suite which creates a somewhat realistic cluster
and runs the shards allocator until convergence, then reports on the
balance quality of the resulting allocation. This is useful for
exploring changes to the balancer (although such experiments may want to
increase the size of the cluster somewhat) and by running it in CI we
can at least be sure that the balancer doesn't throw anything unexpected
and does eventually converge in these situations.
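The "run the allocator until convergence" loop described above can be sketched as a toy in plain Java. This is purely illustrative: the real test drives `BalancedShardsAllocator` against a simulated cluster state, whereas this sketch just moves unit-weight shards from the heaviest to the lightest node until the spread cannot shrink further.

```java
import java.util.Arrays;

// Toy convergence loop: repeatedly move one shard from the heaviest node to
// the lightest node until the shard counts differ by at most one. Returns the
// number of moves; throws if the loop fails to converge within the budget.
public class ConvergenceSketch {
    static int rebalance(int[] shardsPerNode, int maxIterations) {
        for (int i = 0; i < maxIterations; i++) {
            int hi = 0, lo = 0;
            for (int n = 0; n < shardsPerNode.length; n++) {
                if (shardsPerNode[n] > shardsPerNode[hi]) hi = n;
                if (shardsPerNode[n] < shardsPerNode[lo]) lo = n;
            }
            if (shardsPerNode[hi] - shardsPerNode[lo] <= 1) return i; // converged
            shardsPerNode[hi]--;
            shardsPerNode[lo]++;
        }
        throw new AssertionError("did not converge");
    }

    public static void main(String[] args) {
        int[] cluster = {12, 0, 0, 0};
        int moves = rebalance(cluster, 100);
        // prints [3, 3, 3, 3] after 9 moves
        System.out.println(Arrays.toString(cluster) + " after " + moves + " moves");
    }
}
```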

* Followup on comments from #91612 (#91700)

Co-authored-by: Francisco Fernández Castaño <[email protected]>
Co-authored-by: David Turner <[email protected]>
4 participants