Skip to content

Commit

Permalink
pageserver: generation number fetch on startup and use in /attach (#5163
Browse files Browse the repository at this point in the history
)

## Problem

- #5050 

Closes: #5136

## Summary of changes

- A new configuration property `control_plane_api` controls other
functionality in this PR: if it is unset (default) then everything still
works as it does today.
- If `control_plane_api` is set, then on startup we call out to control
plane `/re-attach` endpoint to discover our attachments and their
generations. If an attachment is missing from the response we implicitly
detach the tenant.
- Calls to pageserver `/attach` API may include a `generation`
parameter. If `control_plane_api` is set, then this parameter is
mandatory.
- RemoteTimelineClient's loading of index_part.json is generation-aware,
and will try to load the index_part with the most recent generation <=
its own generation.
- The `neon_local` testing environment now includes a new binary
`attachment_service` which implements the endpoints that the pageserver
requires to operate. This is on by default if running `cargo neon` by
hand. In `test_runner/` tests, it is off by default: existing tests
continue to run with in the legacy generation-less mode.

Caveats:
- The re-attachment during startup assumes that we are only re-attaching
tenants that have previously been attached, and not totally new tenants
-- this relies on the control plane's attachment logic to keep retrying
so that we should eventually see the attach API call. That's important
because the `/re-attach` API doesn't tell us which timelines we should
attach -- we still use local disk state for that. Ref:
#5173
- Testing: generations are only enabled for one integration test right
now (test_pageserver_restart), as a smoke test that all the machinery
basically works. Writing fuller tests that stress tenant migration will
come later, and involve extending our test fixtures to deal with
multiple pageservers.
- I'm not in love with "attachment_service" as a name for the neon_local
component, but it's not very important because we can easily rename
these test bits whenever we want.
- Limited observability when in re-attach on startup: when I add
generation validation for deletions in a later PR, I want to wrap up the
control plane API calls in some small client class that will expose
metrics for things like errors calling the control plane API, which will
act as a strong red signal that something is not right.

Co-authored-by: Christian Schwarz <[email protected]>
Co-authored-by: Joonas Koivunen <[email protected]>
  • Loading branch information
3 people authored Sep 6, 2023
1 parent da60f69 commit 61d661a
Show file tree
Hide file tree
Showing 30 changed files with 1,264 additions and 84 deletions.
3 changes: 3 additions & 0 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

3 changes: 3 additions & 0 deletions control_plane/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,8 @@ git-version.workspace = true
nix.workspace = true
once_cell.workspace = true
postgres.workspace = true
hex.workspace = true
hyper.workspace = true
regex.workspace = true
reqwest = { workspace = true, features = ["blocking", "json"] }
serde.workspace = true
Expand All @@ -20,6 +22,7 @@ serde_with.workspace = true
tar.workspace = true
thiserror.workspace = true
toml.workspace = true
tokio.workspace = true
url.workspace = true
# Note: Do not directly depend on pageserver or safekeeper; use pageserver_api or safekeeper_api
# instead, so that recompile times are better.
Expand Down
106 changes: 106 additions & 0 deletions control_plane/src/attachment_service.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,106 @@
use crate::{background_process, local_env::LocalEnv};
use anyhow::anyhow;
use serde::{Deserialize, Serialize};
use serde_with::{serde_as, DisplayFromStr};
use std::{path::PathBuf, process::Child};
use utils::id::{NodeId, TenantId};

pub struct AttachmentService {
env: LocalEnv,
listen: String,
path: PathBuf,
}

const COMMAND: &str = "attachment_service";

#[serde_as]
#[derive(Serialize, Deserialize)]
pub struct AttachHookRequest {
#[serde_as(as = "DisplayFromStr")]
pub tenant_id: TenantId,
pub pageserver_id: Option<NodeId>,
}

#[derive(Serialize, Deserialize)]
pub struct AttachHookResponse {
pub gen: Option<u32>,
}

impl AttachmentService {
pub fn from_env(env: &LocalEnv) -> Self {
let path = env.base_data_dir.join("attachments.json");

// Makes no sense to construct this if pageservers aren't going to use it: assume
// pageservers have control plane API set
let listen_url = env.pageserver.control_plane_api.clone().unwrap();

let listen = format!(
"{}:{}",
listen_url.host_str().unwrap(),
listen_url.port().unwrap()
);

Self {
env: env.clone(),
path,
listen,
}
}

fn pid_file(&self) -> PathBuf {
self.env.base_data_dir.join("attachment_service.pid")
}

pub fn start(&self) -> anyhow::Result<Child> {
let path_str = self.path.to_string_lossy();

background_process::start_process(
COMMAND,
&self.env.base_data_dir,
&self.env.attachment_service_bin(),
["-l", &self.listen, "-p", &path_str],
[],
background_process::InitialPidFile::Create(&self.pid_file()),
// TODO: a real status check
|| Ok(true),
)
}

pub fn stop(&self, immediate: bool) -> anyhow::Result<()> {
background_process::stop_process(immediate, COMMAND, &self.pid_file())
}

/// Call into the attach_hook API, for use before handing out attachments to pageservers
pub fn attach_hook(
&self,
tenant_id: TenantId,
pageserver_id: NodeId,
) -> anyhow::Result<Option<u32>> {
use hyper::StatusCode;

let url = self
.env
.pageserver
.control_plane_api
.clone()
.unwrap()
.join("attach_hook")
.unwrap();
let client = reqwest::blocking::ClientBuilder::new()
.build()
.expect("Failed to construct http client");

let request = AttachHookRequest {
tenant_id,
pageserver_id: Some(pageserver_id),
};

let response = client.post(url).json(&request).send()?;
if response.status() != StatusCode::OK {
return Err(anyhow!("Unexpected status {}", response.status()));
}

let response = response.json::<AttachHookResponse>()?;
Ok(response.gen)
}
}
Loading

1 comment on commit 61d661a

@github-actions
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1708 tests run: 1628 passed, 0 failed, 80 skipped (full report)


Code coverage full report

  • functions: 52.6% (7526 of 14318 functions)
  • lines: 81.3% (44408 of 54629 lines)

The comment gets automatically updated with the latest test results
61d661a at 2023-09-06T14:22:02.054Z :recycle:

Please sign in to comment.