-
Notifications
You must be signed in to change notification settings - Fork 485
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
storage controller: make chaos less disruptive to AZ locality #10438
Conversation
@problame this aims to avoid future conflicts between wanting to exercise migrations and wanting to avoid scrambling AZ locality |
7348 tests run: 6971 passed, 0 failed, 377 skipped (full report)Flaky tests (6)Postgres 17
Postgres 16
Postgres 15
Postgres 14
Code coverage* (full report)
* collected from Rust tests only The comment gets automatically updated with the latest test results
acbfb8c at 2025-01-17T16:12:18.433Z :recycle: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is better than what we have now.
Some nits wrt determinism and randomness.
I'm going to merge this to let the staging environment stabilize its scheduling, and do a small followup to make it a hashset etc. |
## Problem In #10438 I had got the function for picking tenants backwards, and it was preferring to move things _away_ from their preferred AZ. ## Summary of changes - Fix condition in `is_attached_outside_preferred_az`
## Problem In #10438 it was pointed out that it would be good to avoid picking tenants in ID order, and also to avoid situations where we might double-select the same tenant. There was an initial swing at this in #10443, where Chi suggested a simpler approach which is done in this PR ## Summary of changes - Split total set of tenants into in and out of home AZ - Consume out of home AZ first, and if necessary shuffle + consume from out of home AZ
Problem
Since #9916 , the chaos code is actively fighting the optimizer: tenants tend to be attached in their preferred AZ, so most chaos migrations were moving them to a non-preferred AZ.
Summary of changes
The resulting behavior should be an alternating "back and forth" where the chaos code migrates thiings away from home, and then migrates them back on the next iteration.
The side effect will be that the chaos code actively helps to push things into their home AZ. That's not contrary to its purpose though: we mainly just want it to continuously migrate things to exercise migration+notification code.