
storage controller: use proper ScheduleContext when evacuating a node #9908

Draft: jcsp wants to merge 2 commits into main from jcsp/storcon-context-iterator

Conversation

@jcsp (Collaborator) commented on Nov 27, 2024

Problem

When picking locations for a shard, we should use a ScheduleContext that includes all the other shards in the tenant, so that we apply proper anti-affinity between shards. If we don't do this, then it can lead to unstable scheduling, where we place a shard somewhere that the optimizer will then immediately move it away from.
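To make the anti-affinity point concrete, here is a minimal, self-contained sketch; the names (the affinity map, pick_node, the node strings) are illustrative rather than the storage controller's real API. With an empty per-shard context every node scores the same, so sibling shards can pile onto one node and the optimizer immediately wants to undo that; a tenant-wide context steers each shard away from its siblings up front.

    use std::collections::HashMap;

    // Illustrative stand-in for the real ScheduleContext: it only tracks how many
    // shards of the tenant are already attached to each node.
    #[derive(Default)]
    struct ScheduleContext {
        affinity: HashMap<&'static str, u32>,
    }

    // Anti-affinity scoring: prefer the node hosting the fewest sibling shards.
    fn pick_node(nodes: &[&'static str], ctx: &ScheduleContext) -> &'static str {
        *nodes
            .iter()
            .min_by_key(|n| ctx.affinity.get(*n).copied().unwrap_or(0))
            .unwrap()
    }

    fn main() {
        let nodes = ["pageserver-1", "pageserver-2"];

        // Scheduling each shard with an empty context: both shards score the
        // nodes identically and can land on the same pageserver.
        let empty = ScheduleContext::default();
        assert_eq!(pick_node(&nodes, &empty), pick_node(&nodes, &empty));

        // Scheduling with a tenant-wide context: shard 1 sees where shard 0 went
        // and is steered to the other node.
        let mut ctx = ScheduleContext::default();
        let shard0 = pick_node(&nodes, &ctx);
        *ctx.affinity.entry(shard0).or_default() += 1;
        let shard1 = pick_node(&nodes, &ctx);
        assert_ne!(shard0, shard1);
    }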

We didn't always do this, because it was a bit awkward to accumulate the context for a tenant rather than just walking tenants.

This was a TODO in handle_node_availability_transition:

                        // TODO: populate a ScheduleContext including all shards in the same tenant_id (only matters
                        // for tenants without secondary locations: if they have a secondary location, then this
                        // schedule() call is just promoting an existing secondary)

This is a precursor to #8264, where the current imperfect scheduling during node evacuation hampers testing.

Summary of changes

  • Add an iterator type that yields each shard along with a ScheduleContext that includes all the other shards from the same tenant (a rough sketch of the idea follows this list).
  • Use the iterator to replace hand-crafted logic in optimize_all_plan (functionally identical).
  • Use the iterator in handle_node_availability_transition to apply proper anti-affinity during node evacuation.
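A rough sketch of the iterator idea, using simplified stand-in types (ShardId, Shard, and a ScheduleContext that just counts shards are placeholders, not the storage controller's real definitions): walk the shard map in key order, accumulate a per-tenant ScheduleContext, and yield each tenant's shards together with that context. The actual change detects the tenant boundary by comparing the shard number against the shard count (see the diff excerpt in the review below); this sketch uses a Peekable iterator instead to stay self-contained.

    use std::collections::BTreeMap;
    use std::iter::Peekable;

    type TenantId = u32;

    // Simplified stand-ins for the storage controller's types.
    #[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
    struct ShardId {
        tenant: TenantId,
        number: u8,
    }

    struct Shard;

    #[derive(Default)]
    struct ScheduleContext {
        // The real context carries per-node affinity scores; a count is enough here.
        attached_count: usize,
    }

    /// Groups a shard map by tenant and yields each tenant's shards together with
    /// a ScheduleContext accumulated from every shard of that tenant.
    struct TenantShardContextIterator<'a> {
        inner: Peekable<std::collections::btree_map::Iter<'a, ShardId, Shard>>,
    }

    impl<'a> Iterator for TenantShardContextIterator<'a> {
        type Item = (TenantId, ScheduleContext, Vec<(ShardId, &'a Shard)>);

        fn next(&mut self) -> Option<Self::Item> {
            let (&first_id, first_shard) = self.inner.next()?;
            let tenant = first_id.tenant;
            let mut ctx = ScheduleContext::default();
            ctx.attached_count += 1;
            let mut shards = vec![(first_id, first_shard)];
            // Keep pulling entries while they belong to the same tenant.
            while let Some((next_id, _)) = self.inner.peek() {
                if next_id.tenant != tenant {
                    break;
                }
                let (&id, shard) = self.inner.next().unwrap();
                ctx.attached_count += 1;
                shards.push((id, shard));
            }
            Some((tenant, ctx, shards))
        }
    }

    fn main() {
        let mut shards = BTreeMap::new();
        for tenant in [1u32, 2] {
            for number in 0u8..4 {
                shards.insert(ShardId { tenant, number }, Shard);
            }
        }
        let iter = TenantShardContextIterator { inner: shards.iter().peekable() };
        for (tenant, ctx, group) in iter {
            println!("tenant {tenant}: {} shards in context", ctx.attached_count);
            assert_eq!(group.len(), ctx.attached_count);
        }
    }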

@jcsp requested a review from @VladLazar on November 27, 2024 13:09
@jcsp force-pushed the jcsp/storcon-context-iterator branch from 66f7e23 to d70d741 on November 27, 2024 13:09
@VladLazar (Contributor) left a comment

Nice!


if tenant_shard_id.shard_number.0 == tenant_shard_id.shard_count.count() - 1 {
    let tenant_id = tenant_shard_id.tenant_id;
    let tenant_shards = std::mem::take(&mut tenant_shards);
Contributor

nit: you could break tenant_shards instead which is a bit more idiomatic imo
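For reference, the suggested idiom in a tiny generic form (this is not the PR's code, just an illustration): a bare loop can hand its accumulator out directly via break, instead of swapping it out with std::mem::take.

    // Generic illustration only: `break <value>` makes the loop itself evaluate
    // to the accumulated vector.
    fn collect_until_zero(mut nums: impl Iterator<Item = u32>) -> Vec<u32> {
        let mut acc = Vec::new();
        loop {
            match nums.next() {
                Some(0) | None => break acc,
                Some(n) => acc.push(n),
            }
        }
    }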

/// When making scheduling decisions, it is useful to have the ScheduleContext for a whole
/// tenant while considering the individual shards within it. This iterator is a helper
/// that gathers all the shards in a tenant and then yields them together with a ScheduleContext
/// for the tenant.
struct TenantShardContextIterator<'a> {
Contributor

nit: I'd move this into a separate module. service is already plenty big.

@@ -305,7 +305,7 @@ impl std::ops::Add for AffinityScore {

 /// Hint for whether this is a sincere attempt to schedule, or a speculative
 /// check for where we _would_ schedule (done during optimization)
-#[derive(Debug)]
+#[derive(Debug, Clone)]
Contributor

nit: why not Copy?
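For context, if the hint type is a fieldless enum (which its doc comment suggests), it could derive Copy on top of Clone, removing the need for explicit .clone() calls; the name and variants below are hypothetical stand-ins, not the actual type from the diff.

    // Hypothetical stand-in, not the real type from the diff: a fieldless enum
    // can derive Copy on top of Clone at no cost.
    #[derive(Debug, Clone, Copy)]
    enum ScheduleIntent {
        // A sincere attempt to schedule now.
        Sincere,
        // A speculative check of where we would schedule (during optimization).
        Speculative,
    }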

Comment on lines 7097 to 7100
/// When making scheduling decisions, it is useful to have the ScheduleContext for a whole
/// tenant while considering the individual shards within it. This iterator is a helper
/// that gathers all the shards in a tenant and then yields them together with a ScheduleContext
/// for the tenant.
Contributor

Did you mean to delete this?


6941 tests run: 6633 passed, 0 failed, 308 skipped (full report)


Flaky tests (1)

Postgres 14

Code coverage* (full report)

  • functions: 30.7% (7982 of 26018 functions)
  • lines: 48.6% (63406 of 130479 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
d70d741 at 2024-11-27T15:18:39.899Z :recycle:
