-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix disk computation when initializing unassigned shards in desired balance computation #102207
Fix disk computation when initializing unassigned shards in desired balance computation #102207
Conversation
Pinging @elastic/es-distributed (Team:Distributed) |
Hi @idegtiarenko, I've created a changelog YAML for you. |
...main/java/org/elasticsearch/cluster/routing/allocation/allocator/DesiredBalanceComputer.java
Outdated
Show resolved
Hide resolved
...main/java/org/elasticsearch/cluster/routing/allocation/allocator/DesiredBalanceComputer.java
Outdated
Show resolved
Hide resolved
...java/org/elasticsearch/cluster/routing/allocation/allocator/DesiredBalanceComputerTests.java
Outdated
Show resolved
Hide resolved
...java/org/elasticsearch/cluster/routing/allocation/allocator/DesiredBalanceComputerTests.java
Outdated
Show resolved
Hide resolved
...java/org/elasticsearch/cluster/routing/allocation/allocator/DesiredBalanceComputerTests.java
Outdated
Show resolved
Hide resolved
# Conflicts: # server/src/test/java/org/elasticsearch/cluster/routing/allocation/allocator/DesiredBalanceComputerTests.java
# Conflicts: # server/src/test/java/org/elasticsearch/cluster/routing/allocation/allocator/DesiredBalanceComputerTests.java
@idegtiarenko is this ready for review now? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense to me, I think there's a little bit of missing test coverage tho (see inline comments)
} | ||
|
||
/** | ||
* Must be called all shards that are in progress of initializations are processed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't totally follow this comment, although I think I see what's going on. Could you add some more detail about how/why we only use the reservedSpace
when setting up the desired-balance computation and not while it's iterating?
@@ -254,6 +257,7 @@ public DesiredBalance compute( | |||
final long computationStartedTime = threadPool.relativeTimeInMillis(); | |||
long nextReportTime = computationStartedTime + timeWarningInterval; | |||
|
|||
clusterInfoSimulator.discardReservedSpace(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The DesiredBalanceComputerTests
all pass if this line is removed, which seems suspicious. Could we have a test that exercises this?
final long expectedShardSize = getExpectedShardSize(shardRouting, 0L, routingAllocation); | ||
final var shardToInitialize = unassignedReplicaIterator.initialize(nodeId, null, expectedShardSize, changes); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likewise, reverting this change doesn't make any DesiredBalanceComputerTests
fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, this and below is no longer necessary. Now we rely on getExpectedShardSize
rather then on expected shard size set on shard itself.
final long expectedShardSize = getExpectedShardSize(shardRouting, 0L, routingAllocation); | ||
final var shardToInitialize = unassignedPrimaryIterator.initialize(nodeId, null, expectedShardSize, changes); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likewise, reverting this change doesn't make any DesiredBalanceComputerTests
fail.
// ensure new value is within bounds | ||
leastAvailableSpaceUsage.put(nodeId, updateWithFreeBytes(leastUsage, delta)); | ||
private void updateDiskUsage(Map<String, DiskUsage> availableSpaceUsage, String nodeId, String path, ShardId shardId, long delta) { | ||
if (reservedSpace.getOrDefault(new NodeAndPath(nodeId, path), ReservedSpace.EMPTY).containsShardId(shardId)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you clarify which shards have reserved space already?
); | ||
} | ||
|
||
public void testInitializeShardFromSearchableSnapshot() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit
public void testInitializeShardFromSearchableSnapshot() { | |
public void testInitializeShardFromPartialSearchableSnapshot() { |
since the simple (non-partial) searchable snapshot case is tested in the test above with a randomBool.
); | ||
} | ||
|
||
public void testRelocateSearchableSnapshotShard() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit
public void testRelocateSearchableSnapshotShard() { | |
public void testRelocatePartialSearchableSnapshotShard() { |
new ClusterInfoTestBuilder() // | ||
.withNode(fromNodeId, new DiskUsageBuilder(1000, 1000)) | ||
.withNode(toNodeId, new DiskUsageBuilder(1000, 1000)) | ||
.withShard(shard, 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit Could also have the comment
// partial searchable snapshot uses DiskThresholdDecider.SETTING_IGNORE_DISK_WATERMARKS resulting
// in a 0 size reported
here as well
|
||
var snapshot = new Snapshot("repository", new SnapshotId("snapshot", randomUUID())); | ||
|
||
var shardSizeInfo = Maps.<String, Long>newHashMapWithExpectedSize(5); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit Why 5 here and below? I only see 3 potential .put() operations below.
shardSizeInfo.put(shardIdentifierFromRouting(shardIdFrom(indexMetadata1, 0), true), ByteSizeValue.ofGb(8).getBytes()); | ||
shardSizeInfo.put(shardIdentifierFromRouting(shardIdFrom(indexMetadata1, 1), true), ByteSizeValue.ofGb(8).getBytes()); | ||
|
||
// index-2 is restored earlier but not allocated according to the desired balance |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but not allocated according to the desired balance
How is that signified? E.g., in case 1
below it says it's initializing on desired node.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, it is initializing, but not started yet. Clarifying that.
I am going to split the reserved space handling into a separate pr to keep this one simpler and as that aspect turned out to be more complex then originally anticipated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM2 as long as CI is happy
This change fixes a bug of a node disk usage computation when starting an unassigned shard that already have desired node assignment. Previously such shard assumed to use no space. This is not true when such shard is restored from snapshot.
Related to: #91386