Followup on comments from #91612 (#91700)
Conversation
Pinging @elastic/es-distributed (Team:Distributed)
-    var settings = Settings.EMPTY;
+    var settings = Settings.builder()
+        .put(ThrottlingAllocationDecider.CLUSTER_ROUTING_ALLOCATION_NODE_CONCURRENT_RECOVERIES_SETTING.getKey(), 10)
Could you add a comment explaining why those settings are necessary here?
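For example, the requested comment might look something like this (a sketch only; the stated rationale is an assumption, not necessarily the author's — the default per-node recoveries limit is 2):

    // Raise the per-node concurrent recoveries limit (default: 2) so that
    // allocation throttling does not interfere with the shard movements
    // this test is exercising.
    var settings = Settings.builder()
        .put(ThrottlingAllocationDecider.CLUSTER_ROUTING_ALLOCATION_NODE_CONCURRENT_RECOVERIES_SETTING.getKey(), 10)
        .build();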
    var metadataBuilder = Metadata.builder();
    var routingTableBuilder = RoutingTable.builder();
    for (var index : indices) {
        var inSyncId = UUIDs.randomBase64UUID(random());
-       var build = index.putInSyncAllocationIds(0, Set.of(inSyncId)).build();
+       var build = index.settings(settings(Version.CURRENT).put("index.routing.allocation.exclude._id", "node-3"))
Is this setting used intentionally here?
Yes, it forces the shards to be rebalanced by a follow-up allocate call. I will add a comment.
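Something like the following could work (a sketch; the comment wording is assumed, and settings(Version.CURRENT) is the existing test helper):

    var build = index.settings(
            settings(Version.CURRENT)
                // The shards are initially allocated on node-3, which this setting
                // excludes, so the follow-up allocate call is forced to rebalance
                // them onto the remaining nodes according to the weights.
                .put("index.routing.allocation.exclude._id", "node-3")
        )
        .putInSyncAllocationIds(0, Set.of(inSyncId))
        .build();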
@@ -1024,7 +1024,6 @@ public void testDoNotRebalanceToTheNodeThatNoLongerExists() {
         .put(SETTING_NUMBER_OF_REPLICAS, 0)
         .put(SETTING_INDEX_VERSION_CREATED.getKey(), Version.CURRENT)
     )
-    .system(randomBoolean())
Is this change intentional?
Yes, this is unnecessary in that test.
LGTM 👍. Thanks for adding the extra comments!
I left a few small comments, but overall this looks better. Thanks!
test/framework/src/main/java/org/elasticsearch/cluster/ESAllocationTestCase.java
@@ -95,93 +88,74 @@ public void testDecideShardAllocation() {

    public void testBalanceByShardLoad() {
Naming nit:

-    public void testBalanceByShardLoad() {
+    public void testBalanceByWriteLoad() {
NB ^
        }
    }
-   assertThat(nodeIngestLoad, lessThanOrEqualTo(8.0 + 1.5));
+   assertThat(nodeIngestLoad, lessThanOrEqualTo(8.0));
Implicitly this asserts that heavy-index is on a different node from light-index-*, but could we be more explicit about that? Can we also assert that zero-write-load-index and no-write-load-index end up on the same node as heavy-index because of shard balance?
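Such assertions might look like this (a sketch; currentNodeOf is a hypothetical helper returning the node ID that holds the given index's shard in the routing table):

    var heavyNode = currentNodeOf(clusterState, "heavy-index");
    // The write-heavy index must not share a node with the light indices ...
    assertThat(currentNodeOf(clusterState, "light-index-1"), not(equalTo(heavyNode)));
    // ... while the indices without write load should be placed next to it,
    // since only shard-count balance applies to them.
    assertThat(currentNodeOf(clusterState, "zero-write-load-index"), equalTo(heavyNode));
    assertThat(currentNodeOf(clusterState, "no-write-load-index"), equalTo(heavyNode));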
            var indexMetadata = clusterState.metadata().index(shardRouting.index());
            nodeDiskUsage += indexMetadata.getForecastedShardSizeInBytes().orElse(0L);
        }
    }
-   assertThat(nodeDiskUsage, lessThanOrEqualTo(ByteSizeValue.ofGb(8 + 1).getBytes()));
+   assertThat(nodeDiskUsage, lessThanOrEqualTo(ByteSizeValue.ofGb(8).getBytes()));
Similarly here.
LGTM except for a couple of nits.
    // IndexRoutingTable.builder(build.getIndex())
    //     .addShard(
    //         TestShardRouting.newShardRouting(
    //             new ShardId(build.getIndex(), 0),
    //             // allocate indices on excluded node so that they are rebalanced according to the weights later on in test
    //             "node-3",
    //             null,
    //             true,
    //             ShardRoutingState.STARTED,
    //             AllocationId.newInitializing(inSyncId)
    //         )
    //     )
    // );
Could we remove this commented-out code?
* New balancer unit tests (#91612)

  (cherry picked from commit 7c39880)

* Add tests to BalancedShardsAllocator.Balancer#getIndexDiskUsageInBytes (#91642)

  This commit includes minor cleanups.

  (cherry picked from commit 6cff05b)

* Simulate balancer in realistic cluster (#91404)

  This commit adds a test suite which creates a somewhat realistic cluster and runs the shards allocator until convergence, then reports on the balance quality of the resulting allocation. This is useful for exploring changes to the balancer (although such experiments may want to increase the size of the cluster somewhat), and by running it in CI we can at least be sure that the balancer doesn't throw anything unexpected and does eventually converge in these situations.

* Followup on comments from #91612 (#91700)

Co-authored-by: Francisco Fernández Castaño <[email protected]>
Co-authored-by: David Turner <[email protected]>
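As a rough illustration of the run-until-convergence idea (a sketch modelled on the startInitializingShardsAndReroute helper from ESAllocationTestCase; the real suite additionally reports on balance quality):

    // Repeatedly start INITIALIZING shards and reroute until the allocator
    // proposes no further changes, i.e. the allocation has converged.
    ClusterState state = initialClusterState;
    while (true) {
        ClusterState next = startInitializingShardsAndReroute(allocationService, state);
        if (next.equals(state)) {
            break; // converged: no shard movements remain
        }
        state = next;
    }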