
storage controller: use proper ScheduleContext when evacuating a node #9908

Draft: jcsp wants to merge 2 commits into main from jcsp/storcon-context-iterator

Conversation

@jcsp (Collaborator) commented on Nov 27, 2024

Problem

When picking locations for a shard, we should use a ScheduleContext that includes all the other shards in the tenant, so that we apply proper anti-affinity between shards. If we don't do this, then it can lead to unstable scheduling, where we place a shard somewhere that the optimizer will then immediately move it away from.
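To make the anti-affinity point concrete, here is a minimal, self-contained sketch; the names (the affinity map, pick_node, the node strings) are illustrative rather than the storage controller's real API. With an empty per-shard context every node scores the same, so sibling shards can pile onto one node and the optimizer immediately wants to undo that; a tenant-wide context steers each shard away from its siblings up front.

    use std::collections::HashMap;

    // Illustrative stand-in for the real ScheduleContext: it only tracks how many
    // shards of the tenant are already attached to each node.
    #[derive(Default)]
    struct ScheduleContext {
        affinity: HashMap<&'static str, u32>,
    }

    // Anti-affinity scoring: prefer the node hosting the fewest sibling shards.
    fn pick_node(nodes: &[&'static str], ctx: &ScheduleContext) -> &'static str {
        *nodes
            .iter()
            .min_by_key(|n| ctx.affinity.get(*n).copied().unwrap_or(0))
            .unwrap()
    }

    fn main() {
        let nodes = ["pageserver-1", "pageserver-2"];

        // Scheduling each shard with an empty context: both shards score the
        // nodes identically and can land on the same pageserver.
        let empty = ScheduleContext::default();
        assert_eq!(pick_node(&nodes, &empty), pick_node(&nodes, &empty));

        // Scheduling with a tenant-wide context: shard 1 sees where shard 0 went
        // and is steered to the other node.
        let mut ctx = ScheduleContext::default();
        let shard0 = pick_node(&nodes, &ctx);
        *ctx.affinity.entry(shard0).or_default() += 1;
        let shard1 = pick_node(&nodes, &ctx);
        assert_ne!(shard0, shard1);
    }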

We didn't always do this, because it was a bit awkward to accumulate the context for a tenant rather than just walking tenants.

This was a TODO in handle_node_availability_transition:

                        // TODO: populate a ScheduleContext including all shards in the same tenant_id (only matters
                        // for tenants without secondary locations: if they have a secondary location, then this
                        // schedule() call is just promoting an existing secondary)

This is a precursor to #8264, where the current imperfect scheduling during node evacuation hampers testing.

Summary of changes

  • Add an iterator type that yields each shard along with a ScheduleContext that includes all the other shards from the same tenant (a rough sketch of the idea follows this list).
  • Use the iterator to replace hand-crafted logic in optimize_all_plan (functionally identical).
  • Use the iterator in handle_node_availability_transition to apply proper anti-affinity during node evacuation.
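A rough sketch of the iterator idea, using simplified stand-in types (ShardId, Shard, and a ScheduleContext that just counts shards are placeholders, not the storage controller's real definitions): walk the shard map in key order, accumulate a per-tenant ScheduleContext, and yield each tenant's shards together with that context. The actual change detects the tenant boundary by comparing the shard number against the shard count (see the diff excerpt in the review below); this sketch uses a Peekable iterator instead to stay self-contained.

    use std::collections::BTreeMap;
    use std::iter::Peekable;

    type TenantId = u32;

    // Simplified stand-ins for the storage controller's types.
    #[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
    struct ShardId {
        tenant: TenantId,
        number: u8,
    }

    struct Shard;

    #[derive(Default)]
    struct ScheduleContext {
        // The real context carries per-node affinity scores; a count is enough here.
        attached_count: usize,
    }

    /// Groups a shard map by tenant and yields each tenant's shards together with
    /// a ScheduleContext accumulated from every shard of that tenant.
    struct TenantShardContextIterator<'a> {
        inner: Peekable<std::collections::btree_map::Iter<'a, ShardId, Shard>>,
    }

    impl<'a> Iterator for TenantShardContextIterator<'a> {
        type Item = (TenantId, ScheduleContext, Vec<(ShardId, &'a Shard)>);

        fn next(&mut self) -> Option<Self::Item> {
            let (&first_id, first_shard) = self.inner.next()?;
            let tenant = first_id.tenant;
            let mut ctx = ScheduleContext::default();
            ctx.attached_count += 1;
            let mut shards = vec![(first_id, first_shard)];
            // Keep pulling entries while they belong to the same tenant.
            while let Some((next_id, _)) = self.inner.peek() {
                if next_id.tenant != tenant {
                    break;
                }
                let (&id, shard) = self.inner.next().unwrap();
                ctx.attached_count += 1;
                shards.push((id, shard));
            }
            Some((tenant, ctx, shards))
        }
    }

    fn main() {
        let mut shards = BTreeMap::new();
        for tenant in [1u32, 2] {
            for number in 0u8..4 {
                shards.insert(ShardId { tenant, number }, Shard);
            }
        }
        let iter = TenantShardContextIterator { inner: shards.iter().peekable() };
        for (tenant, ctx, group) in iter {
            println!("tenant {tenant}: {} shards in context", ctx.attached_count);
            assert_eq!(group.len(), ctx.attached_count);
        }
    }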

@jcsp requested a review from @VladLazar on November 27, 2024 13:09
@jcsp force-pushed the jcsp/storcon-context-iterator branch from 66f7e23 to d70d741 on November 27, 2024 13:09
@VladLazar (Contributor) left a comment

Nice!


if tenant_shard_id.shard_number.0 == tenant_shard_id.shard_count.count() - 1 {
    let tenant_id = tenant_shard_id.tenant_id;
    let tenant_shards = std::mem::take(&mut tenant_shards);
Contributor

nit: you could break tenant_shards instead which is a bit more idiomatic imo
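For reference, the suggested idiom in a tiny generic form (this is not the PR's code, just an illustration): a bare loop can hand its accumulator out directly via break, instead of swapping it out with std::mem::take.

    // Generic illustration only: `break <value>` makes the loop itself evaluate
    // to the accumulated vector.
    fn collect_until_zero(mut nums: impl Iterator<Item = u32>) -> Vec<u32> {
        let mut acc = Vec::new();
        loop {
            match nums.next() {
                Some(0) | None => break acc,
                Some(n) => acc.push(n),
            }
        }
    }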

/// When making scheduling decisions, it is useful to have the ScheduleContext for a whole
/// tenant while considering the individual shards within it. This iterator is a helper
/// that gathers all the shards in a tenant and then yields them together with a ScheduleContext
/// for the tenant.
struct TenantShardContextIterator<'a> {
Contributor

nit: I'd move this into a separate module. service is already plenty big.

@@ -305,7 +305,7 @@ impl std::ops::Add for AffinityScore {

 /// Hint for whether this is a sincere attempt to schedule, or a speculative
 /// check for where we _would_ schedule (done during optimization)
-#[derive(Debug)]
+#[derive(Debug, Clone)]
Contributor

nit: why not Copy?
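For context, if the hint type is a fieldless enum (which its doc comment suggests), it could derive Copy on top of Clone, removing the need for explicit .clone() calls; the name and variants below are hypothetical stand-ins, not the actual type from the diff.

    // Hypothetical stand-in, not the real type from the diff: a fieldless enum
    // can derive Copy on top of Clone at no cost.
    #[derive(Debug, Clone, Copy)]
    enum ScheduleIntent {
        // A sincere attempt to schedule now.
        Sincere,
        // A speculative check of where we would schedule (during optimization).
        Speculative,
    }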

Comment on lines 7097 to 7100
/// When making scheduling decisions, it is useful to have the ScheduleContext for a whole
/// tenant while considering the individual shards within it. This iterator is a helper
/// that gathers all the shards in a tenant and then yields them together with a ScheduleContext
/// for the tenant.
Contributor

Did you mean to delete this?


6941 tests run: 6633 passed, 0 failed, 308 skipped (full report)


Flaky tests (1)

Postgres 14

Code coverage* (full report)

  • functions: 30.7% (7982 of 26018 functions)
  • lines: 48.6% (63406 of 130479 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
d70d741 at 2024-11-27T15:18:39.899Z :recycle:
