pageserver: include generations in attachment, re-attach on startup #5136

Closed
Tracked by #5050
jcsp opened this issue Aug 29, 2023 · 0 comments · Fixed by #5163
Assignees
Labels
c/storage/pageserver Component: storage: pageserver t/feature Issue type: feature, for new features or requests

jcsp commented Aug 29, 2023

Pageserver needs control plane input in two places:

  • During attachment, where a generation should be included in the request
  • During startup, to re-attach and get fresh generations

This change also includes implementing a tiny version of the API for use in neon_local and tests.

Pageserver will be configured with a control_plane_api URL -- if this is unset, it implicitly disables generation number awareness and enables pageserver to start without calling out to the control plane.
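As a rough illustration of the switch described above, the optional setting could be mapped to a startup mode roughly as follows. This is only a sketch: `PageServerConfSketch` and `StartupMode` are hypothetical names invented for this example, and the URL is held as a plain `String` to keep it dependency-free.

```rust
/// Hypothetical stand-in for the relevant part of the pageserver config.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PageServerConfSketch {
    /// Base URL of the control plane API; `None` means the option is unset.
    pub control_plane_api: Option<String>,
}

/// Hypothetical summary of how startup behaves (not a real pageserver type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StartupMode {
    /// No calls to the control plane; generation numbers are not used.
    Legacy,
    /// Call out to the control plane on startup to obtain fresh generations.
    GenerationAware,
}

impl PageServerConfSketch {
    /// An unset `control_plane_api` implicitly disables generation awareness.
    pub fn startup_mode(&self) -> StartupMode {
        match self.control_plane_api {
            Some(_) => StartupMode::GenerationAware,
            None => StartupMode::Legacy,
        }
    }
}
```

Deriving the mode from a single `Option` keeps the "unset means legacy" rule in one place instead of scattering checks across call sites.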

As part of this issue, we must also implement generation-aware index_part.json loading, because we will start writing out objects with the generation suffix as soon as control_plane_api is set.
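One way to picture generation-aware loading is as a selection over the candidate object names: pick the newest index whose generation does not exceed our own. The sketch below is an assumption-heavy illustration, not the actual `RemoteTimelineClient` code. In particular, the `-<hex generation>` suffix format, the treatment of an un-suffixed `index_part.json` as the oldest candidate, and the names `IndexGeneration`, `parse_index_key` and `select_index_key` are all hypothetical.

```rust
const INDEX_PART_NAME: &str = "index_part.json";

/// Generation of an index object. The derived ordering places `None`
/// (legacy, un-suffixed) below every `Valid` generation.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum IndexGeneration {
    None,
    Valid(u32),
}

/// Parse an object name into its generation.
/// ASSUMPTION: suffixed objects are named `index_part.json-<hex generation>`.
/// Returns `None` (the Option) for names that are not index objects.
pub fn parse_index_key(key: &str) -> Option<IndexGeneration> {
    let rest = key.strip_prefix(INDEX_PART_NAME)?;
    if rest.is_empty() {
        return Some(IndexGeneration::None);
    }
    let hex = rest.strip_prefix('-')?;
    u32::from_str_radix(hex, 16).ok().map(IndexGeneration::Valid)
}

/// Choose the index object with the most recent generation <= `own`.
/// Objects written by a later generation than ours are ignored.
pub fn select_index_key<'a>(keys: &[&'a str], own: u32) -> Option<&'a str> {
    keys.iter()
        .copied()
        .filter_map(|k| parse_index_key(k).map(|g| (g, k)))
        .filter(|(g, _)| *g <= IndexGeneration::Valid(own))
        .max_by_key(|(g, _)| *g)
        .map(|(_, k)| k)
}
```

For example, with candidates at generations 2 and 5 plus a legacy object, a pageserver at generation 3 would select the generation-2 object, and a pageserver at generation 1 would fall back to the legacy one.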

We will continue to deploy with control_plane_api unset, so nothing changes in production yet.

@jcsp jcsp self-assigned this Aug 29, 2023
@jcsp jcsp added t/feature Issue type: feature, for new features or requests c/storage/pageserver Component: storage: pageserver labels Aug 29, 2023
@jcsp jcsp closed this as completed in #5163 Sep 6, 2023
jcsp added a commit that referenced this issue Sep 6, 2023

## Problem

- #5050 

Closes: #5136

## Summary of changes

- A new configuration property `control_plane_api` controls other
functionality in this PR: if it is unset (default) then everything still
works as it does today.
- If `control_plane_api` is set, then on startup we call out to control
plane `/re-attach` endpoint to discover our attachments and their
generations. If an attachment is missing from the response we implicitly
detach the tenant.
- Calls to pageserver `/attach` API may include a `generation`
parameter. If `control_plane_api` is set, then this parameter is
mandatory.
- RemoteTimelineClient's loading of index_part.json is generation-aware,
and will try to load the index_part with the most recent generation <=
its own generation.
- The `neon_local` testing environment now includes a new binary
`attachment_service` which implements the endpoints that the pageserver
requires to operate. This is on by default if running `cargo neon` by
hand. In `test_runner/` tests, it is off by default: existing tests
continue to run in the legacy generation-less mode.
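
The startup re-attach and `/attach` rules from the list above can be sketched as two small pure functions. This is a simplified illustration under stated assumptions, not the code in this PR: tenant IDs are plain strings, generations are `u32`, the `/re-attach` response is modelled as a map from tenant ID to generation, and `StartupPlan`, `plan_startup` and `check_attach_generation` are hypothetical names.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical result of reconciling local state with the control plane.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct StartupPlan {
    /// Tenants to keep attached, with their fresh generation.
    pub attach: BTreeMap<String, u32>,
    /// Tenants present locally but missing from the response.
    pub detach: BTreeSet<String>,
}

/// Reconcile tenants found on local disk with a `/re-attach` response.
/// A local tenant missing from the response is implicitly detached.
/// ASSUMPTION: tenants in the response that are not on local disk are
/// skipped here; the control plane is expected to retry its attach call.
pub fn plan_startup(
    local_tenants: &BTreeSet<String>,
    reattach_response: &BTreeMap<String, u32>,
) -> StartupPlan {
    let mut plan = StartupPlan::default();
    for tenant in local_tenants {
        match reattach_response.get(tenant) {
            Some(generation) => {
                plan.attach.insert(tenant.clone(), *generation);
            }
            None => {
                plan.detach.insert(tenant.clone());
            }
        }
    }
    plan
}

/// Validate the optional `generation` parameter of an attach request:
/// it is mandatory only when `control_plane_api` is set.
pub fn check_attach_generation(
    control_plane_api_set: bool,
    generation: Option<u32>,
) -> Result<Option<u32>, String> {
    match (control_plane_api_set, generation) {
        (true, None) => {
            Err("generation is required when control_plane_api is set".to_string())
        }
        (_, g) => Ok(g),
    }
}
```

Keeping both rules free of I/O makes them easy to unit test separately from the HTTP client and the tenant manager.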

Caveats:
- The re-attachment during startup assumes that we are only re-attaching
tenants that have previously been attached, and not totally new tenants
-- this relies on the control plane's attachment logic to keep retrying
so that we should eventually see the attach API call. That's important
because the `/re-attach` API doesn't tell us which timelines we should
attach -- we still use local disk state for that. Ref:
#5173
- Testing: generations are only enabled for one integration test right
now (test_pageserver_restart), as a smoke test that all the machinery
basically works. Writing fuller tests that stress tenant migration will
come later, and involve extending our test fixtures to deal with
multiple pageservers.
- I'm not in love with "attachment_service" as a name for the neon_local
component, but it's not very important because we can easily rename
these test bits whenever we want.
- Limited observability during re-attach on startup: when I add
generation validation for deletions in a later PR, I want to wrap up the
control plane API calls in some small client class that will expose
metrics for things like errors calling the control plane API, which will
act as a strong red signal that something is not right.

Co-authored-by: Christian Schwarz <[email protected]>
Co-authored-by: Joonas Koivunen <[email protected]>