pageserver: generation number support in keys and indices (#5140)
## Problem

To implement split-brain protection, tenants and timelines need to be aware of their current generation and use it when composing S3 keys.

## Summary of changes

- A `Generation` type is introduced in the `utils` crate. It lives in this broadly visible location because it will later be used from `control_plane/` as well as `pageserver/`. A generation can be a number, None, or Broken: None supports legacy content, and Broken supports Tenants in the broken state.
- Tenant, Timeline, and RemoteTimelineClient all gain a generation attribute.
- IndexPart's IndexLayerMetadata has a new `generation` attribute. Legacy layers' metadata deserializes to `Generation::none()`.
- Remote paths are composed with a trailing generation suffix. If the generation equals `Generation::none()` (as it currently always does), the suffix is an empty string.
- Functions for composing remote storage paths are added in remote_timeline_client. They replace the current pattern of composing a local path and then stripping its prefix, and they no longer require a PageserverConf reference on functions that only build remote paths (the conf is only needed for local paths). These functions are less DRY than the old ones, but remote storage paths change very rarely, so it is better to write each path out clearly than to compose timeline paths from tenant paths, etc.
- Code paths that construct a Tenant take a `generation` argument, in anticipation of loading generations on startup before the Tenant is constructed. Until the whole feature is done we don't want any keys with a generation suffix, so initially the special `Generation::none()` value is carried everywhere.

Closes: #5135

Co-authored-by: Christian Schwarz <[email protected]>
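As a rough illustration of the path-composition change, the sketch below appends a generation suffix to a remote layer key. It is a std-only sketch under stated assumptions: the `tenants/{tenant}/timelines/{timeline}/{layer}` layout and the `remote_layer_path` helper are hypothetical stand-ins, not the functions actually added in `remote_timeline_client`, and `Generation` is trimmed to the variants and `get_suffix` logic from the diff (serde support omitted).

```rust
/// Trimmed-down copy of the `Generation` type from this commit (no serde).
#[derive(Copy, Clone, Eq, PartialEq)]
pub enum Generation {
    None,
    Broken,
    Valid(u32),
}

impl Generation {
    pub fn none() -> Self {
        Self::None
    }

    pub fn new(v: u32) -> Self {
        Self::Valid(v)
    }

    /// Legacy generations yield an empty suffix; broken ones must never reach S3.
    pub fn get_suffix(&self) -> String {
        match self {
            Self::Valid(v) => format!("-{:08x}", v),
            Self::None => String::new(),
            Self::Broken => panic!("Tried to use a broken generation"),
        }
    }
}

/// Hypothetical helper: the key layout is an assumption for illustration.
/// The remote path is written out in full rather than derived from a local path,
/// so no PageserverConf-like configuration object is needed.
pub fn remote_layer_path(
    tenant_id: &str,
    timeline_id: &str,
    layer_name: &str,
    generation: Generation,
) -> String {
    format!(
        "tenants/{tenant_id}/timelines/{timeline_id}/{layer_name}{}",
        generation.get_suffix()
    )
}

fn main() {
    // Generation::none() leaves legacy keys unchanged.
    println!("{}", remote_layer_path("t1", "tl1", "layer", Generation::none()));
    // A valid generation appends a zero-padded, 8-digit hex suffix.
    println!("{}", remote_layer_path("t1", "tl1", "layer", Generation::new(3)));
}
```

Because the legacy value produces an empty suffix, carrying `Generation::none()` everywhere keeps every key byte-for-byte identical to the pre-generation layout until real generation numbers are loaded.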
13 changed files with 433 additions and 159 deletions.
New file in the `utils` crate defining `Generation` (hunk `@@ -0,0 +1,113 @@`):
```rust
use std::fmt::Debug;

use serde::{Deserialize, Serialize};

/// Tenant generations are used to provide split-brain safety and allow
/// multiple pageservers to attach the same tenant concurrently.
///
/// See docs/rfcs/025-generation-numbers.md for detail on how generation
/// numbers are used.
#[derive(Copy, Clone, Eq, PartialEq, PartialOrd, Ord)]
pub enum Generation {
    // Generations with this magic value will not add a suffix to S3 keys, and will not
    // be included in persisted index_part.json. This value is only to be used
    // during migration from pre-generation metadata to generation-aware metadata,
    // and should eventually go away.
    //
    // A special Generation is used rather than always wrapping Generation in an Option,
    // so that code handling generations doesn't have to be aware of the legacy
    // case everywhere it touches a generation.
    None,
    // Generations with this magic value may never be used to construct S3 keys:
    // we will panic if someone tries to. This is for Tenants in the "Broken" state,
    // so that we can satisfy their constructor with a Generation without risking
    // a code bug using it in an S3 write (broken tenants should never write).
    Broken,
    // A real generation number, rendered as a zero-padded hex suffix on S3 keys.
    Valid(u32),
}

/// The Generation type represents a number associated with a Tenant, which
/// increments every time the tenant is attached to a new pageserver, or
/// an attached pageserver restarts.
///
/// It is included as a suffix in S3 keys, as a protection against split-brain
/// scenarios where pageservers might otherwise issue conflicting writes to
/// remote storage.
impl Generation {
    /// Create a new Generation that represents a legacy key format with
    /// no generation suffix.
    pub fn none() -> Self {
        Self::None
    }

    /// Create a new Generation that will panic if you try to use `get_suffix`.
    pub fn broken() -> Self {
        Self::Broken
    }

    /// Create a valid generation with the given number.
    pub fn new(v: u32) -> Self {
        Self::Valid(v)
    }

    /// True for the legacy (no-suffix) generation.
    pub fn is_none(&self) -> bool {
        matches!(self, Self::None)
    }

    /// Suffix to append to S3 keys: `-` plus 8 lowercase hex digits for a valid
    /// generation, empty for the legacy value. Panics for a broken generation.
    pub fn get_suffix(&self) -> String {
        match self {
            Self::Valid(v) => {
                format!("-{:08x}", v)
            }
            Self::None => "".into(),
            Self::Broken => {
                panic!("Tried to use a broken generation");
            }
        }
    }
}

impl Serialize for Generation {
    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
    where
        S: serde::Serializer,
    {
        if let Self::Valid(v) = self {
            v.serialize(serializer)
        } else {
            // We should never be asked to serialize a None or Broken. Structures
            // that include an optional generation should convert None to an
            // Option<Generation>::None.
            // Generation has no Display impl (see below), so format with Debug.
            Err(serde::ser::Error::custom(format!(
                "Tried to serialize invalid generation ({:?})",
                self
            )))
        }
    }
}

impl<'de> Deserialize<'de> for Generation {
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
    where
        D: serde::Deserializer<'de>,
    {
        // Only valid generations are ever persisted, so any value read back is a number.
        Ok(Self::Valid(u32::deserialize(deserializer)?))
    }
}

// We intentionally do not implement Display for Generation, to reduce the
// risk of a bug where the generation is used in a format!() string directly
// instead of using get_suffix().
impl Debug for Generation {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        match self {
            Self::Valid(v) => {
                write!(f, "{:08x}", v)
            }
            Self::None => {
                write!(f, "<none>")
            }
            Self::Broken => {
                write!(f, "<broken>")
            }
        }
    }
}
```
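The behaviour the rest of the commit relies on can be pinned down with a few checks. This is a sketch, not part of the commit: it repeats the non-serde parts of `Generation` so that it compiles with only `std`, and the `main` function is illustrative. It exercises the zero-padded hex suffix, the empty legacy suffix, the panic on `Broken`, the `Debug` output, and the ordering produced by the derived `Ord`, which compares variants in declaration order (`None` < `Broken` < `Valid(_)`) and then compares `Valid` payloads numerically.

```rust
use std::fmt::Debug;
use std::panic;

#[derive(Copy, Clone, Eq, PartialEq, PartialOrd, Ord)]
pub enum Generation {
    None,
    Broken,
    Valid(u32),
}

impl Generation {
    pub fn none() -> Self {
        Self::None
    }
    pub fn broken() -> Self {
        Self::Broken
    }
    pub fn new(v: u32) -> Self {
        Self::Valid(v)
    }

    pub fn get_suffix(&self) -> String {
        match self {
            Self::Valid(v) => format!("-{:08x}", v),
            Self::None => String::new(),
            Self::Broken => panic!("Tried to use a broken generation"),
        }
    }
}

impl Debug for Generation {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        match self {
            Self::Valid(v) => write!(f, "{:08x}", v),
            Self::None => write!(f, "<none>"),
            Self::Broken => write!(f, "<broken>"),
        }
    }
}

fn main() {
    // Valid generations become an 8-digit, zero-padded lowercase hex suffix.
    assert_eq!(Generation::new(1).get_suffix(), "-00000001");
    // The legacy value leaves keys unchanged.
    assert_eq!(Generation::none().get_suffix(), "");
    // Debug output omits the leading dash used in keys.
    assert_eq!(format!("{:?}", Generation::new(255)), "000000ff");

    // Building a key from a Broken generation panics instead of writing to S3.
    let result = panic::catch_unwind(|| Generation::broken().get_suffix());
    assert!(result.is_err());

    // Derived Ord: variant order first, then the numeric payload.
    assert!(Generation::none() < Generation::broken());
    assert!(Generation::broken() < Generation::new(0));
    assert!(Generation::new(1) < Generation::new(2));
}
```

Panicking on `Broken`, rather than returning an empty suffix, makes an accidental S3 write from a broken tenant fail loudly instead of silently producing a legacy-format key.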
No tests were run or test report is not available (83ae2bd at 2023-08-31T08:24:56.381Z).