
[sled-agent] Conform datasets to RFD 118 format, but still use ramdisks #2919

Merged
@smklein merged 11 commits into main from m2-datasets on Apr 27, 2023

Conversation

@smklein (Collaborator) commented Apr 24, 2023

This PR is split off of #2902, which also tries to use these datasets; the goal here is a subset of that change with a smaller diff.

This PR:

  • Formats U.2 and M.2s with expected datasets
  • Adds /pool/ext/<UUID>/zone, and places storage-based zone filesystems there
  • Adds "oxi_" internal zpools to the "virtual hardware" scripts, emulating M.2s.

Part of #2888
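
For reference, a rough sketch of the "expected datasets" idea above. This is illustrative only, not the actual sled-hardware code (the real provisioning logic lives in sled-hardware/src/disk.rs); it just shows the shape: U.2s carry a zone dataset, M.2s carry config/install datasets.

```rust
// Illustrative sketch only -- not the actual sled-hardware definitions.
#[derive(Clone, Copy, Debug)]
enum DiskVariant {
    U2, // external storage (zone filesystems)
    M2, // internal storage (config, install)
}

fn expected_datasets(variant: DiskVariant) -> &'static [&'static str] {
    match variant {
        DiskVariant::U2 => &["zone"],
        DiskVariant::M2 => &["config", "install"],
    }
}

fn main() {
    for variant in [DiskVariant::U2, DiskVariant::M2] {
        println!("{:?}: {:?}", variant, expected_datasets(variant));
    }
}
```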

@smklein changed the title from "M2 datasets" to "[sled-agent] Conform datasets to RFD 118 format, but still use ramdisks" on Apr 24, 2023
Comment on lines +443 to +445
// Filesystem path of the zone
zonepath: PathBuf,

@smklein (Collaborator, Author) commented Apr 24, 2023

Within this PR, this is always /zone/<zone name>, which resides in the ramdisk. However, this will change in the future, as some zones move to:

/pool/ext/<UUID>/zone/<zone name>
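
As a small illustration of the two layouts, here is a minimal sketch; the helper names are hypothetical, not the sled agent's API.

```rust
use std::path::PathBuf;

// Hypothetical helpers: today's ramdisk layout versus the future
// U.2-backed layout for a zone's filesystem root.
fn ramdisk_zonepath(zone_name: &str) -> PathBuf {
    PathBuf::from("/zone").join(zone_name)
}

fn u2_zonepath(pool_uuid: &str, zone_name: &str) -> PathBuf {
    PathBuf::from(format!("/pool/ext/{pool_uuid}/zone")).join(zone_name)
}
```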

Comment on lines +11 to +12
pub const ZONE_ZFS_RAMDISK_DATASET_MOUNTPOINT: &str = "/zone";
pub const ZONE_ZFS_RAMDISK_DATASET: &str = "rpool/zone";
@smklein (Collaborator, Author):

I went ahead and renamed these constants to make it a little more clear that "anything you put here is on a ramdisk, and will not last across reboots!"

@@ -137,15 +137,15 @@ impl Zfs {
}

/// Creates a new ZFS filesystem named `name`, unless one already exists.
pub fn ensure_zoned_filesystem(
pub fn ensure_filesystem(
@smklein (Collaborator, Author):

When provisioning filesystems from sled-hardware/src/disk.rs we don't want them to be zoned, so the zoned property is now configurable.

This function was also renamed to make its intent a bit clearer.
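
For illustration, a sketch of roughly what making zoned configurable amounts to at the zfs level. The function and parameter names here are assumptions, not the actual Zfs helper's signature; the sketch only builds the arguments rather than running the command.

```rust
// Sketch only: `zoned` becomes a caller choice rather than being hard-coded,
// since the disk-formatting path does not want zoned datasets.
fn zfs_create_args(name: &str, mountpoint: &str, zoned: bool) -> Vec<String> {
    let mut args = vec!["create".to_string()];
    if zoned {
        // `zoned` is a real ZFS dataset property on illumos.
        args.extend(["-o".to_string(), "zoned=on".to_string()]);
    }
    args.extend(["-o".to_string(), format!("mountpoint={mountpoint}")]);
    args.push(name.to_string());
    args
}
```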

if pool.kind() == crate::zpool::ZpoolKind::Internal {
continue;
}
let internal = pool.kind() == crate::zpool::ZpoolKind::Internal;
@smklein (Collaborator, Author):

Although this PR doesn't actually migrate any config data out of the ramdisk, eventually we'll move things from /var/oxide into /pool/int/<UUID>/config on the M.2s.

As a result, we will want to destroy internal datasets to "uninstall".

Comment on lines +81 to +86
sled_hardware::HardwareUpdate::DiskAdded(disk) => {
self.storage.upsert_disk(disk).await;
}
sled_hardware::HardwareUpdate::DiskRemoved(disk) => {
self.storage.delete_disk(disk).await;
}
@smklein (Collaborator, Author):

The bootstrap agent now uses a StorageManager, so it can create the necessary datasets on the M.2s before the sled agents launch. This will be especially important as we move the sled agent's configuration data out of /var/oxide -- these datasets must exist before the sled agent is created by RSS.
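
To make the ordering concrete, here is a rough sketch of the flow described above. The types and the bootstrap function are stand-ins; only upsert_disk mirrors the method shown in the diff.

```rust
// Stand-in types to illustrate the ordering constraint; the real code uses
// the sled agent's StorageManager and hardware monitor.
struct Disk;
struct StorageManager;

impl StorageManager {
    async fn upsert_disk(&self, _disk: Disk) {
        // In the real code this ensures the disk's expected datasets exist
        // (e.g. the M.2 config/install datasets).
    }
}

async fn start_sled_agent() {
    // Placeholder for the real sled-agent startup path.
}

async fn bootstrap(storage: &StorageManager, m2s: Vec<Disk>) {
    // 1. The bootstrap agent sees the M.2s and ensures their datasets exist...
    for disk in m2s {
        storage.upsert_disk(disk).await;
    }
    // 2. ...and only then is the sled agent created (by RSS), so it can
    //    assume those datasets are present.
    start_sled_agent().await;
}
```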

Comment on lines +866 to +871
// TODO: Don't we need to do some accounting, e.g. for all the information
// that's no longer accessible? Or is that up to Nexus to figure out at
// a later point-in-time?
//
// If we're storing zone images on the M.2s for internal services, how
// do we reconcile them?
@smklein (Collaborator, Author):

We definitely needed to do this before this PR too, but I'm at least labeling it now. This will become more and more relevant as we move configs from the ramdisks onto U.2s.

/// A sled-local view of all attached storage.
#[derive(Clone)]
pub struct StorageManager {
inner: Arc<StorageManagerInner>,
@smklein (Collaborator, Author):

This is now Arc-wrapped so that the ServiceManager can also reference it.

In the future, the ServiceManager will be asking the StorageManager questions like:

  • "I have a zone I want to install - what U.2s are available for me to use for the backing filesystem?"

@andrewjstone (Contributor):

I guess this is part of your thoughts about merging the two above. As it stands I also find it unfortunate to share an inner resource of one manager with that of another. I'd love to collaborate on how we can separate the abstractions entirely or merge them, although I admit a preference for separation at this point.

@smklein (Collaborator, Author):

If it helps, the dependency is "real", in a sense -- provisioning services does (and will increasingly) depend on access to storage as we migrate the location of zone filesystems. Sharing this reference makes the sled agent coherent.

I think the unfortunate part is that the StorageManager and ServiceManager are both in the business of provisioning services. If anything, the service-management side of StorageManager -- which currently handles the CRDB, Clickhouse, and Crucible zones -- could be migrated out.

@andrewjstone (Contributor):

> If it helps, the dependency is "real", in a sense -- provisioning services does (and will increasingly) depend on access to storage as we migrate the location of zone filesystems. Sharing this reference makes the sled agent coherent.

Yep, I get that part. I was thinking more that it would be nice to create a Handle to the StorageManager rather than sharing StorageManagerInner. It would probably look similar in shape and contents, especially for talking to the storage worker, but it would express the purpose better, I think.

> I think the unfortunate part is that the StorageManager and ServiceManager are both in the business of provisioning services. If anything, the service-management side of StorageManager -- which currently handles the CRDB, Clickhouse, and Crucible zones -- could be migrated out.

That's a good idea!
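
Going back to the Handle idea above, a tiny sketch of what that shape could look like; all names are hypothetical.

```rust
use std::sync::Arc;

// Hypothetical: instead of cloning StorageManager (and sharing
// StorageManagerInner directly), the ServiceManager gets a purpose-built
// handle that exposes only the queries it needs.
#[allow(dead_code)]
struct StorageManagerInner {
    // worker channel, pool state, etc.
}

#[allow(dead_code)]
struct StorageHandle {
    inner: Arc<StorageManagerInner>,
}

impl StorageHandle {
    fn new(inner: Arc<StorageManagerInner>) -> Self {
        Self { inner }
    }
    // Read-side queries (e.g. "which U.2 pools are usable?") would live
    // here, forwarding to the storage worker.
}
```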

Comment on lines +203 to +206
// Stores software images.
//
// Should be duplicated to both M.2s.
INSTALL_DATASET,
@smklein (Collaborator, Author):

This gives us a landing spot for the update process: we should be placing non-ramdisk control plane zone images within

/pool/int/<UUID>/install

on each of the M.2s.
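
As a sketch of that intent, a hypothetical helper enumerating the install mountpoints across the internal pools (the helper and its shape are illustrative, not the actual update code):

```rust
use std::path::PathBuf;

// Hypothetical: the install dataset exists on every internal (M.2) pool,
// so an updated zone image would be written to each of these paths.
fn install_dataset_mountpoints(internal_pool_uuids: &[&str]) -> Vec<PathBuf> {
    internal_pool_uuids
        .iter()
        .map(|uuid| PathBuf::from(format!("/pool/int/{uuid}/install")))
        .collect()
}
```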

@smklein marked this pull request as ready for review on April 24, 2023 19:54
@andrewjstone self-requested a review on April 25, 2023 19:59
@andrewjstone (Contributor) left a comment:

I have a few comments about abstraction, but nothing you don't already know. Other than that this is a good step forward. Let's merge it.

@@ -164,6 +177,11 @@ impl HardwareMonitor {
let hardware = HardwareManager::new(log, sled_mode)
.map_err(|e| Error::Hardware(e))?;

// TODO: The coupling between the storage and service manager is growing
@andrewjstone (Contributor):

Either combine them or create an API between the two. It's kinda strange seeing an etherstub being passed into the StorageManager. I'm happy to discuss a plan and help with this.

}
}
}

// A wrapper around `ZoneRequest`, which allows it to be serialized
@andrewjstone (Contributor):

Is this configuration that is only used during development? Is it something that goes away with self-assembling zones or is this actually part of making self-assembling zones work?

@smklein (Collaborator, Author):

This is unfortunately not a development-only piece of metadata.

Here's my primer on "old vs new" zone management:

Old

  • All zone filesystems are provisioned in rpool/zone (the ramdisk)
  • Zones are not self-assembling, meaning that they need tweaking from the sled agent after booting
  • The list of zones under the purview of the sled is stored in a config file in /var/oxide (... unfortunately, this is also in the ramdisk)

New (not totally done in this PR, but we're heading this direction)

  • Zone filesystems are provisioned in /pool/ext/<UUID>/zone on particular U.2s
  • Zones are self-assembling, meaning that once they are created, they can be booted
    • NOTE: The sled agent may still need to supply some metadata prior to running the zone, such as creating VNICs!
  • The list of zones under the purview of the sled agent is stored in /pool/int/<UUID>/config on both M.2s

So this metadata...

Yeah, so this piece of metadata is here because we're breaking the previous standard that "every zone filesystem lives in /zone". By shuffling that around, the sled agent actually needs to know where to find zones to launch them.

I admittedly could store this information alongside the zone filesystem on the U.2, but by storing it within the config directory of the M.2, I figured we'd have a better ability for sleds to eventually notice / report "hey, I should launch a particular zone, but it hasn't come up yet because that disk is gone".
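
For illustration, the kind of record this implies, as a hedged sketch assuming serde is available. The type and field names here are mine, not the actual ZoneRequest wrapper in this PR.

```rust
use serde::{Deserialize, Serialize};

// Hypothetical shape: enough persisted metadata for the sled agent to know
// both which zone to launch and where its filesystem root lives, now that
// zones are no longer guaranteed to sit under /zone.
#[derive(Serialize, Deserialize)]
struct PersistedZone {
    // The zone to launch.
    zone_name: String,
    // Where its filesystem root lives, e.g. "/zone/<zone name>" today, or
    // "/pool/ext/<UUID>/zone/<zone name>" once it moves to a U.2.
    root: String,
}
```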

@andrewjstone (Contributor):

Thanks for the great explanation! Much appreciated.

@smklein (Collaborator, Author) commented Apr 25, 2023

Thanks for the review! One last thing that I need to figure out before merging, and which (I believe?) is breaking the deploy job -- this PR is perhaps a bit too excited about using the "dataset mountpoint" for zones (e.g., /pool/ext/<UUID>/zone). For non-gimlet systems using synthetic disks, this mountpoint is not actually being created, which is a problem.

@andrewjstone (Contributor) commented:

> Thanks for the review! One last thing that I need to figure out before merging, and which (I believe?) is breaking the deploy job -- this PR is perhaps a bit too excited about using the "dataset mountpoint" for zones (e.g., /pool/ext/<UUID>/zone). For non-gimlet systems using synthetic disks, this mountpoint is not actually being created, which is a problem.

That makes sense. You are welcome!

@smklein requested a review from faithanalog on April 26, 2023 03:36
@smklein (Collaborator, Author) commented Apr 26, 2023

@faithanalog - I'm adding you as an FYI here - if you're busy, you can skim through this, but figured you might be interested since you went through some of this code recently!

@smklein merged commit 789fac4 into main on Apr 27, 2023
@smklein deleted the m2-datasets branch on April 27, 2023 16:14