storage controller: use proper ScheduleContext when evacuating a node #9908
base: main
Conversation
66f7e23 to d70d741
Nice!
if tenant_shard_id.shard_number.0 == tenant_shard_id.shard_count.count() - 1 {
    let tenant_id = tenant_shard_id.tenant_id;
    let tenant_shards = std::mem::take(&mut tenant_shards);
nit: you could break `tenant_shards` instead, which is a bit more idiomatic IMO.
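For illustration, a minimal standalone sketch of the accumulator pattern in the quoted code (the `ShardId` type below is a simplified stand-in, not the storage controller's real type):

```rust
/// Simplified stand-in for a shard identity; not the controller's real type.
#[derive(Clone, Copy, Debug)]
struct ShardId {
    tenant: u32,
    shard_number: u8,
    shard_count: u8,
}

/// Group shards by tenant, assuming the input lists each tenant's shards
/// contiguously and in order.
fn group_by_tenant(shards: &[ShardId]) -> Vec<(u32, Vec<ShardId>)> {
    let mut grouped = Vec::new();
    let mut tenant_shards = Vec::new();
    for &shard in shards {
        tenant_shards.push(shard);
        // The quoted code detects the last shard of a tenant by its index and
        // then drains the accumulator with `std::mem::take`, which leaves an
        // empty Vec behind for the next tenant. One reading of the nit is that
        // `break`-ing out of an inner gathering loop (or labeled block) at
        // this point reads a bit more idiomatically than taking the buffer in
        // place.
        if shard.shard_number == shard.shard_count - 1 {
            grouped.push((shard.tenant, std::mem::take(&mut tenant_shards)));
        }
    }
    grouped
}
```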
/// tenant while considering the individual shards within it. This iterator is a helper
/// that gathers all the shards in a tenant and then yields them together with a ScheduleContext
/// for the tenant.
struct TenantShardContextIterator<'a> {
nit: I'd move this into a separate module. `service` is already plenty big.
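As a rough illustration of what such a module could contain, here is a simplified, self-contained sketch of the helper's shape (type and field names are illustrative stand-ins, not the storage controller's actual definitions):

```rust
mod tenant_shard_iterator {
    use std::collections::{btree_map, BTreeMap};
    use std::iter::Peekable;

    pub type TenantId = u64;
    pub type NodeId = u64;
    /// (tenant, shard number) -- ordered so all shards of a tenant are adjacent.
    pub type TenantShardId = (TenantId, u8);

    #[derive(Default, Clone, Debug)]
    pub struct ScheduleContext {
        /// How many of this tenant's shards are attached to each node; used as
        /// an anti-affinity penalty when placing further shards of the tenant.
        pub attached: BTreeMap<NodeId, usize>,
    }

    #[derive(Debug)]
    pub struct TenantShard {
        pub attached_to: Option<NodeId>,
    }

    /// Walks a map ordered by shard id and yields, per tenant, the accumulated
    /// ScheduleContext together with all of that tenant's shards.
    pub struct TenantShardContextIterator<'a> {
        inner: Peekable<btree_map::Iter<'a, TenantShardId, TenantShard>>,
    }

    impl<'a> TenantShardContextIterator<'a> {
        pub fn new(shards: &'a BTreeMap<TenantShardId, TenantShard>) -> Self {
            Self { inner: shards.iter().peekable() }
        }
    }

    impl<'a> Iterator for TenantShardContextIterator<'a> {
        type Item = (TenantId, ScheduleContext, Vec<&'a TenantShard>);

        fn next(&mut self) -> Option<Self::Item> {
            let tenant_id = (self.inner.peek()?.0).0;
            let mut context = ScheduleContext::default();
            let mut shards = Vec::new();
            // Consume entries while they still belong to the same tenant.
            while let Some((_, shard)) = self.inner.next_if(|(id, _)| id.0 == tenant_id) {
                if let Some(node) = shard.attached_to {
                    *context.attached.entry(node).or_default() += 1;
                }
                shards.push(shard);
            }
            Some((tenant_id, context, shards))
        }
    }
}
```

Usage would then look like `for (tenant_id, context, shards) in TenantShardContextIterator::new(&map) { ... }`, keeping the gathering logic out of `service`.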
@@ -305,7 +305,7 @@ impl std::ops::Add for AffinityScore {
 
 /// Hint for whether this is a sincere attempt to schedule, or a speculative
 /// check for where we _would_ schedule (done during optimization)
-#[derive(Debug)]
+#[derive(Debug, Clone)]
nit: why not `Copy`?
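As a tiny illustration of the nit (the enum name and variants below are hypothetical stand-ins; only the derive is the point), a small fieldless hint enum can derive `Copy` on top of `Clone`, so call sites pass it by value without `.clone()`:

```rust
/// Hypothetical stand-in for the scheduling-mode hint discussed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ScheduleIntent {
    /// A sincere attempt to schedule the shard somewhere.
    Sincere,
    /// A speculative check for where we _would_ schedule (during optimization).
    Speculative,
}

fn plan(intent: ScheduleIntent) -> bool {
    // Because ScheduleIntent is Copy, passing it here never moves it out of
    // the caller and no .clone() is needed at call sites.
    intent == ScheduleIntent::Speculative
}
```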
/// When making scheduling decisions, it is useful to have the ScheduleContext for a whole
/// tenant while considering the individual shards within it. This iterator is a helper
/// that gathers all the shards in a tenant and then yields them together with a ScheduleContext
/// for the tenant.
Did you mean to delete this?
6941 tests run: 6633 passed, 0 failed, 308 skipped (full report)

Code coverage* (full report)

* collected from Rust tests only

The comment gets automatically updated with the latest test results.

d70d741 at 2024-11-27T15:18:39.899Z
Problem
When picking locations for a shard, we should use a ScheduleContext that includes all the other shards in the tenant, so that we apply proper anti-affinity between shards. If we don't do this, then it can lead to unstable scheduling, where we place a shard somewhere that the optimizer will then immediately move it away from.
We didn't always do this, because it was a bit awkward to accumulate the context for a tenant rather than just walking tenants.
This was a TODO in `handle_node_availability_transition`.

This is a precursor to #8264, where the current imperfect scheduling during node evacuation hampers testing.
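To make the anti-affinity point concrete, here is a minimal sketch of the idea (the types, the per-node shard totals, and the scoring rule are simplified assumptions, not the controller's real scheduler): with a context that knows where the tenant's other shards already live, the picker avoids co-locating them; with an empty context it would happily pick an already-used node, which the optimizer would then undo.

```rust
use std::collections::HashMap;

type NodeId = u64;

/// Simplified stand-in for a per-tenant scheduling context: how many of this
/// tenant's shards are already attached to each node.
#[derive(Default)]
struct ScheduleContext {
    attached: HashMap<NodeId, usize>,
}

/// Pick a target node for one more shard of the tenant, preferring nodes that
/// hold none of the tenant's shards (anti-affinity), tie-breaking on each
/// node's total shard count, then on node id for determinism.
fn pick_node(node_totals: &HashMap<NodeId, usize>, ctx: &ScheduleContext) -> Option<NodeId> {
    node_totals
        .iter()
        .min_by_key(|(node, total)| {
            let same_tenant = ctx.attached.get(*node).copied().unwrap_or(0);
            (same_tenant, **total, **node)
        })
        .map(|(node, _)| *node)
}
```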
Summary of changes
Change `handle_node_availability_transition` to apply proper anti-affinity during node evacuation.
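A rough sketch of how the per-tenant view can be applied when draining a node (again with simplified stand-in types; not the controller's real code): for each tenant, build the context once from all of its shards, then reschedule only the shards attached to the draining node against that context.

```rust
use std::collections::{BTreeMap, HashMap};

type TenantId = u64;
type NodeId = u64;

/// Simplified stand-in for a tenant shard; not the controller's real type.
struct Shard {
    tenant: TenantId,
    attached_to: NodeId,
}

/// Move every shard off `draining`, considering each tenant as a whole so
/// anti-affinity between its shards is respected.
fn evacuate(shards: &mut [Shard], nodes: &[NodeId], draining: NodeId) {
    // Group shard indices by tenant.
    let mut by_tenant: BTreeMap<TenantId, Vec<usize>> = BTreeMap::new();
    for (i, shard) in shards.iter().enumerate() {
        by_tenant.entry(shard.tenant).or_default().push(i);
    }

    for (_tenant, indices) in by_tenant {
        // Build the tenant-wide context up front: per-node count of this
        // tenant's shards. Rescheduling each shard against an empty context
        // instead is the behaviour the PR fixes.
        let mut attached: HashMap<NodeId, usize> = HashMap::new();
        for &i in &indices {
            *attached.entry(shards[i].attached_to).or_default() += 1;
        }

        for &i in &indices {
            if shards[i].attached_to != draining {
                continue;
            }
            // Prefer a non-draining node holding the fewest shards of this tenant.
            let target = nodes
                .iter()
                .copied()
                .filter(|&n| n != draining)
                .min_by_key(|&n| (attached.get(&n).copied().unwrap_or(0), n));
            if let Some(target) = target {
                *attached.entry(draining).or_default() -= 1;
                *attached.entry(target).or_default() += 1;
                shards[i].attached_to = target;
            }
        }
    }
}
```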