pageserver: generation number fetch on startup and use in /attach #5163
1640 tests run: 1567 passed, 0 failed, 73 skipped (as of 562e47e at 2023-09-06T13:41:27.664Z).
When we refactor mgr.rs to implement a Mgr type, this arg count will go down.
Was going to do that later with the validation bits, but there's no time like the present :-) This is now `control_plane_client.rs`.
Right now what I care most about is not breaking the legacy mode (so still running all tests without generations). When it's time to check everything works in the new mode (i.e. before we think about cutting over in staging, then prod), it'll make sense to flip it back and forth globally.
Co-authored-by: Joonas Koivunen <[email protected]>
This was wrongly assuming the generations should be the same: the local metadata will always have the current generation set, so in this situation we want to `UseLocal` if the sizes match, but take the metadata (including the generation) from the remote metadata.
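A minimal sketch of that decision, assuming simplified types: `LayerMetadata`, `reconcile`, and the `Download` variant are hypothetical names chosen for illustration; only `UseLocal` comes from the comment. The size comparison decides the outcome, and the generation is always taken from the remote side.

```rust
/// Hypothetical, simplified per-layer metadata (not the pageserver's type).
#[derive(Debug, Clone, PartialEq)]
struct LayerMetadata {
    generation: u32,
    file_size: u64,
}

#[derive(Debug, PartialEq)]
enum Decision {
    /// Keep the local file, but adopt the remote metadata (incl. generation).
    UseLocal(LayerMetadata),
    /// Hypothetical variant: the local copy does not match, fetch the remote one.
    Download(LayerMetadata),
}

/// Compare a local layer against the remote index entry for the same layer.
fn reconcile(local: &LayerMetadata, remote: &LayerMetadata) -> Decision {
    // Generations are deliberately not compared: the local metadata always
    // carries the current generation, so it would rarely equal the remote one.
    if local.file_size == remote.file_size {
        Decision::UseLocal(remote.clone())
    } else {
        Decision::Download(remote.clone())
    }
}
```

For example, a local layer tagged generation 7 and a remote entry tagged generation 5 with the same size yields `UseLocal` carrying generation 5.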
A few more nits, but this is looking good. I'd like to focus on `filter_map` vs. keeping the error: it sounds like this should never happen, and if there was a parsing error, it feels like we should not continue.
I don't see how that would be safe to continue.
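To make the `filter_map` concern concrete, here is a hypothetical sketch contrasting the two approaches; the input (strings holding generation numbers) and both function names are assumptions, not the PR's code. Collecting an iterator of `Result` into `Result<Vec<_>, _>` stops at the first error and returns it, whereas `filter_map(... .ok())` silently discards malformed entries.

```rust
use std::num::ParseIntError;

/// Propagates the first parse error, so a malformed entry aborts the operation.
fn parse_generations(raw: &[&str]) -> Result<Vec<u32>, ParseIntError> {
    raw.iter().map(|s| s.parse::<u32>()).collect()
}

/// The lossy variant: malformed entries vanish without a trace.
fn parse_generations_lossy(raw: &[&str]) -> Vec<u32> {
    raw.iter().filter_map(|s| s.parse::<u32>().ok()).collect()
}
```

With input `["1", "x", "3"]`, the first function returns an error while the second quietly returns `[1, 3]`, which is exactly the "continue anyway" behaviour the reviewers consider unsafe.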
## Problem

Pageservers must not delete objects or advertise updates to remote_consistent_lsn without checking that they hold the latest generation for the tenant in question (see [the RFC](https://github.com/neondatabase/neon/blob/main/docs/rfcs/025-generation-numbers.md)).

In this PR:
- A new "deletion queue" subsystem is introduced, through which deletions flow
- `RemoteTimelineClient` is modified to send deletions through the deletion queue:
  - For GC & compaction, deletions flow through the full generation verifying process
  - For timeline deletions, deletions take a fast path that bypasses generation verification
- The `last_uploaded_consistent_lsn` value in `UploadQueue` is replaced with a mechanism that maintains a "projected" lsn (equivalent to the previous property), and a "visible" LSN (which is the one that we may share with safekeepers).
- Until `control_plane_api` is set, all deletions skip generation validation
- Tests are introduced for the new functionality in `test_pageserver_generations.py`

Once this lands, if a pageserver is configured with the `control_plane_api` configuration added in #5163, it becomes safe to attach a tenant to multiple pageservers concurrently.

---------

Co-authored-by: Joonas Koivunen <[email protected]>
Co-authored-by: Christian Schwarz <[email protected]>
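A toy model of the deletion-queue idea, under heavy simplifying assumptions: tenants, generations and LSNs are plain integers, `latest` stands in for the control plane's validation answer, and `None` models legacy mode where `control_plane_api` is unset. `DeletionQueue`, `PendingDeletion` and `validate_and_execute` are hypothetical names, not the pageserver's types, and the timeline-deletion fast path is not modelled. Deletions issued under a stale generation are dropped, and the "visible" LSN only advances to a batch's "projected" LSN once that batch has been validated.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for the pageserver's real ID and LSN types.
type TenantId = u64;
type Generation = u32;
type Lsn = u64;

/// A batch of object deletions, tagged with the generation that issued it.
#[derive(Debug, Clone)]
struct PendingDeletion {
    tenant: TenantId,
    generation: Generation,
    objects: Vec<String>,
    /// "Projected" LSN: becomes visible only once this batch is validated.
    projected_lsn: Lsn,
}

#[derive(Debug, Default)]
struct DeletionQueue {
    pending: Vec<PendingDeletion>,
    /// "Visible" LSN per tenant: the value we may share with safekeepers.
    visible_lsn: HashMap<TenantId, Lsn>,
}

impl DeletionQueue {
    fn push(&mut self, deletion: PendingDeletion) {
        self.pending.push(deletion);
    }

    /// Drains the queue and returns the object keys that would be deleted.
    /// `latest` maps each tenant to its current generation; `None` skips
    /// validation entirely (legacy mode).
    fn validate_and_execute(
        &mut self,
        latest: Option<&HashMap<TenantId, Generation>>,
    ) -> Vec<String> {
        let mut deleted = Vec::new();
        for d in self.pending.drain(..) {
            let valid = match latest {
                None => true,
                Some(map) => map.get(&d.tenant) == Some(&d.generation),
            };
            if !valid {
                // A newer generation exists elsewhere: skip the deletion and
                // leave the visible LSN where it is.
                continue;
            }
            let visible = self.visible_lsn.entry(d.tenant).or_insert(0);
            *visible = (*visible).max(d.projected_lsn);
            deleted.extend(d.objects);
        }
        deleted
    }
}
```

Tying the LSN advance to the same validation step as the deletion is the design point: a pageserver that has lost its generation can neither delete objects nor tell safekeepers that data is durable.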
Problem

Closes: #5136

Summary of changes

- `control_plane_api` controls the other functionality in this PR: if it is unset (the default), everything still works as it does today.
- If `control_plane_api` is set, then on startup we call out to the control plane `/re-attach` endpoint to discover our attachments and their generations. If an attachment is missing from the response, we implicitly detach the tenant.
- The `/attach` API may include a `generation` parameter. If `control_plane_api` is set, this parameter is mandatory.
- The `neon_local` testing environment now includes a new binary, `attachment_service`, which implements the endpoints that the pageserver requires to operate. This is on by default when running `cargo neon` by hand. In `test_runner/` tests it is off by default: existing tests continue to run in the legacy generation-less mode.

Caveats:

- The `/re-attach` API doesn't tell us which timelines we should attach; we still use local disk state for that. Ref: pageserver: enable arbitrary attachments during `/re-attach` #5173
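The startup and `/attach` rules in the summary can be sketched roughly as follows. This is an illustration under stated assumptions, not the pageserver's implementation: `plan_startup`, `check_attach`, `StartupAction` and `AttachError` are hypothetical names, IDs and generations are plain integers, and only tenants already present on local disk are considered (consistent with the caveat that local state still drives what gets attached).

```rust
use std::collections::{BTreeSet, HashMap};

// Hypothetical stand-ins for the real ID types.
type TenantId = u64;
type Generation = u32;

/// What to do with each locally present tenant after `/re-attach`.
#[derive(Debug, PartialEq)]
enum StartupAction {
    Attach { tenant: TenantId, generation: Generation },
    /// Missing from the `/re-attach` response: implicitly detach.
    Detach { tenant: TenantId },
}

/// Only called when `control_plane_api` is set; `reattach` models the
/// response as a tenant -> generation map.
fn plan_startup(
    local: &BTreeSet<TenantId>,
    reattach: &HashMap<TenantId, Generation>,
) -> Vec<StartupAction> {
    local
        .iter()
        .map(|t| match reattach.get(t) {
            Some(&generation) => StartupAction::Attach { tenant: *t, generation },
            None => StartupAction::Detach { tenant: *t },
        })
        .collect()
}

#[derive(Debug, PartialEq)]
enum AttachError {
    MissingGeneration,
}

/// `/attach` request check: `generation` is mandatory only when
/// `control_plane_api` is configured; otherwise it stays optional.
fn check_attach(
    control_plane_api_set: bool,
    generation: Option<Generation>,
) -> Result<Option<Generation>, AttachError> {
    match (control_plane_api_set, generation) {
        (true, None) => Err(AttachError::MissingGeneration),
        (_, g) => Ok(g),
    }
}
```

Keeping the "unset means legacy behaviour" branch explicit, as `check_attach` does, mirrors the PR's goal of leaving today's behaviour untouched until `control_plane_api` is configured.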