Fix disk computation when initializing unassigned shards in desired balance computation #102207

idegtiarenko · 2023-11-15T10:19:48Z

This change fixes a bug in the node disk usage computation when starting an unassigned shard that already has a desired node assignment. Previously, such a shard was assumed to use no disk space. That is not true when the shard is being restored from a snapshot.
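The following is a minimal sketch of the accounting difference described above. It assumes a toy model in which each node has a single free-bytes counter. `SimulatedNodeDisk` and `initializeUnassignedShard` are hypothetical names, not Elasticsearch APIs. In this model, an empty new shard is charged 0 bytes, and a shard restored from a snapshot is charged its snapshot size.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, heavily simplified model of per-node free disk space during a
// desired-balance simulation. These names are illustrative, not Elasticsearch APIs.
class SimulatedNodeDisk {
    private final Map<String, Long> freeBytesByNode = new HashMap<>();

    SimulatedNodeDisk(Map<String, Long> initialFreeBytes) {
        freeBytesByNode.putAll(initialFreeBytes);
    }

    long freeBytes(String nodeId) {
        return freeBytesByNode.getOrDefault(nodeId, 0L);
    }

    // Simulates starting a previously unassigned shard on nodeId.
    // An empty new shard is charged 0 bytes; a shard restored from a snapshot is
    // charged its expected (snapshot) size, since it will occupy that much space.
    void initializeUnassignedShard(String nodeId, boolean restoreFromSnapshot, long snapshotShardSizeBytes) {
        long expectedSize = restoreFromSnapshot ? snapshotShardSizeBytes : 0L;
        // Clamp at zero so simulated free space never goes negative.
        freeBytesByNode.put(nodeId, Math.max(0L, freeBytes(nodeId) - expectedSize));
    }
}
```

For example, starting a restored 300-byte shard on a node with 1000 free bytes leaves 700 simulated free bytes. Under the old "assume no space" behaviour, the node would still appear to have 1000 free bytes.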

Related to: #91386

elasticsearchmachine · 2023-11-15T10:20:12Z

Pinging @elastic/es-distributed (Team:Distributed)

elasticsearchmachine · 2023-11-15T10:20:12Z

Hi @idegtiarenko, I've created a changelog YAML for you.

...main/java/org/elasticsearch/cluster/routing/allocation/allocator/DesiredBalanceComputer.java

...java/org/elasticsearch/cluster/routing/allocation/allocator/DesiredBalanceComputerTests.java

kingherc · 2023-12-18T12:38:24Z

@idegtiarenko is this ready for review now?

DaveCTurner

Makes sense to me. I think there's a little bit of missing test coverage, though (see inline comments).

DaveCTurner · 2024-01-03T09:57:02Z

server/src/main/java/org/elasticsearch/cluster/ClusterInfoSimulator.java

+    }
+
+    /**
+     * Must be called all shards that are in progress of initializations are processed


I don't totally follow this comment, although I think I see what's going on. Could you add some more detail about how/why we only use the reservedSpace when setting up the desired-balance computation and not while it's iterating?

DaveCTurner · 2024-01-03T10:01:42Z

...main/java/org/elasticsearch/cluster/routing/allocation/allocator/DesiredBalanceComputer.java

@@ -254,6 +257,7 @@ public DesiredBalance compute(
        final long computationStartedTime = threadPool.relativeTimeInMillis();
        long nextReportTime = computationStartedTime + timeWarningInterval;

+        clusterInfoSimulator.discardReservedSpace();


The DesiredBalanceComputerTests all pass if this line is removed, which seems suspicious. Could we have a test that exercises this?

DaveCTurner · 2024-01-03T10:03:32Z

...main/java/org/elasticsearch/cluster/routing/allocation/allocator/DesiredBalanceComputer.java

+                        final long expectedShardSize = getExpectedShardSize(shardRouting, 0L, routingAllocation);
+                        final var shardToInitialize = unassignedReplicaIterator.initialize(nodeId, null, expectedShardSize, changes);


Likewise, reverting this change doesn't make any DesiredBalanceComputerTests fail.

Okay, this change and the one below are no longer necessary. We now rely on getExpectedShardSize rather than on the expected shard size set on the shard itself.
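The diff calls `getExpectedShardSize(shardRouting, 0L, routingAllocation)`, where `0L` appears to be the fallback when no size is known. Below is a minimal sketch of that lookup-with-default pattern. It assumes sizes are kept in a map keyed by a shard identifier string. `ExpectedShardSizeLookup` and its key format are hypothetical and made up for illustration.

```java
import java.util.Map;

// Hypothetical sketch: a central lookup of expected shard sizes with an explicit
// default, as opposed to reading a size stored on each shard object.
// The key format below is made up for illustration.
final class ExpectedShardSizeLookup {
    private final Map<String, Long> sizesByShardKey;

    ExpectedShardSizeLookup(Map<String, Long> sizesByShardKey) {
        this.sizesByShardKey = Map.copyOf(sizesByShardKey);
    }

    static String shardKey(String index, int shardNumber, boolean primary) {
        return "[" + index + "][" + shardNumber + "][" + (primary ? "p" : "r") + "]";
    }

    // Returns the known size, or defaultValue when no size information exists.
    long expectedShardSize(String index, int shardNumber, boolean primary, long defaultValue) {
        return sizesByShardKey.getOrDefault(shardKey(index, shardNumber, primary), defaultValue);
    }
}
```

Keeping the size in one lookup means every code path that initializes a shard sees the same estimate. Stale values copied onto individual shard objects cannot diverge from it.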

DaveCTurner · 2024-01-03T10:03:43Z

...main/java/org/elasticsearch/cluster/routing/allocation/allocator/DesiredBalanceComputer.java

+                        final long expectedShardSize = getExpectedShardSize(shardRouting, 0L, routingAllocation);
+                        final var shardToInitialize = unassignedPrimaryIterator.initialize(nodeId, null, expectedShardSize, changes);


Likewise, reverting this change doesn't make any DesiredBalanceComputerTests fail.

kingherc · 2024-01-03T13:00:08Z

server/src/main/java/org/elasticsearch/cluster/ClusterInfoSimulator.java

-            // ensure new value is within bounds
-            leastAvailableSpaceUsage.put(nodeId, updateWithFreeBytes(leastUsage, delta));
+    private void updateDiskUsage(Map<String, DiskUsage> availableSpaceUsage, String nodeId, String path, ShardId shardId, long delta) {
+        if (reservedSpace.getOrDefault(new NodeAndPath(nodeId, path), ReservedSpace.EMPTY).containsShardId(shardId)) {


Could you clarify which shards have reserved space already?
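The diff excerpt does not show what happens when a shard already has reserved space on a node/path. The following is a minimal sketch under one plausible reading: such a shard is already recovering there, so its footprint is skipped rather than counted twice, and `discardReservedSpace()` drops the reservations once in-progress initializations are processed. All names are hypothetical, apart from mirroring `NodeAndPath` and `discardReservedSpace` from the diff. As noted later in the thread, the reserved-space handling was split out of this PR.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of reserved-space-aware disk accounting. Assumption: a shard that
// already has space reserved on a node/path is recovering there, so its footprint is
// treated as already counted and is not subtracted a second time.
final class ReservedSpaceAwareSimulator {
    record NodeAndPath(String nodeId, String path) {}

    private final Map<NodeAndPath, Long> freeBytes = new HashMap<>();
    private final Map<NodeAndPath, Set<String>> reservedShardIds = new HashMap<>();

    void setFreeBytes(String nodeId, String path, long bytes) {
        freeBytes.put(new NodeAndPath(nodeId, path), bytes);
    }

    void reserve(String nodeId, String path, String shardId) {
        reservedShardIds.computeIfAbsent(new NodeAndPath(nodeId, path), k -> new HashSet<>()).add(shardId);
    }

    long freeBytes(String nodeId, String path) {
        return freeBytes.getOrDefault(new NodeAndPath(nodeId, path), 0L);
    }

    // Charges shardSizeBytes to nodeId/path unless the shard's space is already reserved there.
    void addShard(String nodeId, String path, String shardId, long shardSizeBytes) {
        NodeAndPath key = new NodeAndPath(nodeId, path);
        if (reservedShardIds.getOrDefault(key, Set.of()).contains(shardId)) {
            return; // already accounted for by the reservation
        }
        freeBytes.put(key, Math.max(0L, freeBytes(nodeId, path) - shardSizeBytes));
    }

    // Drops all reservations so that later simulated moves are charged in full.
    void discardReservedSpace() {
        reservedShardIds.clear();
    }
}
```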

kingherc · 2024-01-03T13:14:21Z

...t/java/org/elasticsearch/cluster/routing/allocation/allocator/ClusterInfoSimulatorTests.java

+        );
+    }
+
+    public void testInitializeShardFromSearchableSnapshot() {


nit

Suggested change

-    public void testInitializeShardFromSearchableSnapshot() {
+    public void testInitializeShardFromPartialSearchableSnapshot() {

since the simple (non-partial) searchable snapshot case is tested in the test above with a randomBool.

kingherc · 2024-01-03T13:15:09Z

...t/java/org/elasticsearch/cluster/routing/allocation/allocator/ClusterInfoSimulatorTests.java

+        );
+    }
+
+    public void testRelocateSearchableSnapshotShard() {


nit

Suggested change

-    public void testRelocateSearchableSnapshotShard() {
+    public void testRelocatePartialSearchableSnapshotShard() {

kingherc · 2024-01-03T13:15:51Z

...t/java/org/elasticsearch/cluster/routing/allocation/allocator/ClusterInfoSimulatorTests.java

+                new ClusterInfoTestBuilder() //
+                    .withNode(fromNodeId, new DiskUsageBuilder(1000, 1000))
+                    .withNode(toNodeId, new DiskUsageBuilder(1000, 1000))
+                    .withShard(shard, 0)


nit: Could also have the comment

// partial searchable snapshot uses DiskThresholdDecider.SETTING_IGNORE_DISK_WATERMARKS resulting
// in a 0 size reported

here as well.
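The comment above explains why the test registers the partial searchable snapshot shard with size 0. The following is a minimal sketch of how a simulator could charge local disk by shard storage type, assuming a partially mounted shard contributes 0 bytes and a fully mounted one its full size. `ShardStorageKind` and `SimulatedShardFootprint` are hypothetical names.

```java
// Hypothetical sketch: how much local disk a simulated shard is charged, by storage type.
// Per the review comment, partially mounted searchable snapshot shards are tested
// with a reported size of 0; the enum and method names are illustrative only.
enum ShardStorageKind { REGULAR, FULLY_MOUNTED_SNAPSHOT, PARTIALLY_MOUNTED_SNAPSHOT }

final class SimulatedShardFootprint {
    static long localDiskBytes(ShardStorageKind kind, long shardSizeBytes) {
        return switch (kind) {
            case REGULAR, FULLY_MOUNTED_SNAPSHOT -> shardSizeBytes;
            // Partial searchable snapshot shards ignore disk watermarks and report 0 bytes.
            case PARTIALLY_MOUNTED_SNAPSHOT -> 0L;
        };
    }

    // Free bytes left on a node after the shard is placed there, clamped at zero.
    static long freeAfterPlacement(long freeBytes, ShardStorageKind kind, long shardSizeBytes) {
        return Math.max(0L, freeBytes - localDiskBytes(kind, shardSizeBytes));
    }
}
```

With nodes that each start at 1000 free bytes, as in the test setup, placing a partially mounted shard leaves the target at 1000. A fully mounted 400-byte shard would leave 600.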

kingherc · 2024-01-03T13:19:42Z

...java/org/elasticsearch/cluster/routing/allocation/allocator/DesiredBalanceComputerTests.java

+
+        var snapshot = new Snapshot("repository", new SnapshotId("snapshot", randomUUID()));
+
+        var shardSizeInfo = Maps.<String, Long>newHashMapWithExpectedSize(5);


nit: Why 5 here and below? I only see 3 potential .put() operations below.

kingherc · 2024-01-03T13:20:44Z

...java/org/elasticsearch/cluster/routing/allocation/allocator/DesiredBalanceComputerTests.java

+        shardSizeInfo.put(shardIdentifierFromRouting(shardIdFrom(indexMetadata1, 0), true), ByteSizeValue.ofGb(8).getBytes());
+        shardSizeInfo.put(shardIdentifierFromRouting(shardIdFrom(indexMetadata1, 1), true), ByteSizeValue.ofGb(8).getBytes());
+
+        // index-2 is restored earlier but not allocated according to the desired balance


but not allocated according to the desired balance

How is that signified? E.g., in case 1 below it says it's initializing on the desired node.

Correct, it is initializing but has not started yet. I'll clarify that.

idegtiarenko · 2024-01-03T13:58:21Z

I am going to split the reserved space handling into a separate PR to keep this one simpler, as that aspect turned out to be more complex than originally anticipated.

DaveCTurner

LGTM

kingherc

LGTM2 as long as CI is happy

idegtiarenko added 3 commits November 15, 2023 08:43

cleanup the test

ab93080

fix disk computation when initializing unassigned shards

eb5f041

cleanup

1f65f81

idegtiarenko requested review from DaveCTurner and kingherc November 15, 2023 10:19

Update docs/changelog/102207.yaml

80a02d9

idegtiarenko commented Nov 15, 2023

...main/java/org/elasticsearch/cluster/routing/allocation/allocator/DesiredBalanceComputer.java (outdated)

kingherc reviewed Nov 15, 2023

Merge branch 'main' into fix_disk_usage_computation

1348d98

# Conflicts:
#	server/src/test/java/org/elasticsearch/cluster/routing/allocation/allocator/DesiredBalanceComputerTests.java

brianseeders added v8.13.0 and removed v8.12.0 labels Dec 6, 2023

Merge branch 'main' into fix_disk_usage_computation

0f692d3

# Conflicts:
#	server/src/test/java/org/elasticsearch/cluster/routing/allocation/allocator/DesiredBalanceComputerTests.java

idegtiarenko added 5 commits December 19, 2023 14:24

upd

63979bd

upd

fb5ec20

upd

63e329c

correctly initialize searchable snapshot cluster info size

bff873f

account for half recovered shards

7d7bf51

idegtiarenko requested a review from kingherc December 20, 2023 11:49

idegtiarenko added 2 commits December 20, 2023 12:50

Merge branch 'main' into fix_disk_usage_computation

c3e641d

resolve todos

55fb2b7

arteam self-requested a review December 21, 2023 15:52

arteam removed their request for review January 2, 2024 09:26

add comment

9e9a083

idegtiarenko added 2 commits January 2, 2024 16:45

Merge branch 'main' into fix_disk_usage_computation

5d0df83

fix test

6536734

DaveCTurner reviewed Jan 3, 2024

Merge branch 'main' into fix_disk_usage_computation

6f171d9

kingherc reviewed Jan 3, 2024

idegtiarenko added 3 commits January 3, 2024 14:31

test with partially initialized shards

100f5b1

remove unnecessary shard size estimation

b0c26e9

clarify comments

5e7adc9

idegtiarenko added 2 commits January 3, 2024 15:05

revert reserved space handling

b278266

revert reserved space handling

b6fdd55

idegtiarenko requested review from DaveCTurner and kingherc January 3, 2024 14:09

do not copy map if nothing has changed

c8697c8

DaveCTurner approved these changes Jan 4, 2024

Merge branch 'main' into fix_disk_usage_computation

7268c0a

kingherc approved these changes Jan 4, 2024

update comment

146ca34

idegtiarenko merged commit 3ba017e into elastic:main Jan 4, 2024
15 checks passed

idegtiarenko deleted the fix_disk_usage_computation branch January 4, 2024 10:24

idegtiarenko mentioned this pull request Jan 5, 2024

Fix disk usage computation during desired balance calculation #102013

Closed

This was referenced Jan 15, 2024

[CI] DesiredBalanceComputerTests testDesiredBalanceShouldConvergeInABigCluster failing #104343

Closed

Fix testDesiredBalanceShouldConvergeInABigCluster #104442

Merged
