[sled-agent] Conform datasets to RFD 118 format, but still use ramdisks #2919
Conversation
// Filesystem path of the zone
zonepath: PathBuf,
Within this PR, this is always `/zone/<zone name>`, which resides in the ramdisk. However, this will change in the future, as some zones move to `/pool/ext/<UUID>/zone/<zone name>`.
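For orientation, a minimal sketch of how the two layouts map to a zonepath; the helper and its arguments are hypothetical, not part of this PR:

```rust
use std::path::PathBuf;

/// Illustrative only (helper and argument names are hypothetical): where a
/// zone's filesystem lands under the current ramdisk layout versus the
/// RFD 118-style layout on a specific U.2 pool.
fn zonepath(zone_name: &str, u2_pool_uuid: Option<&str>) -> PathBuf {
    match u2_pool_uuid {
        // This PR: every zone still lives on the ramdisk.
        None => PathBuf::from(format!("/zone/{zone_name}")),
        // Future: zones whose filesystems move onto an external (U.2) pool.
        Some(uuid) => PathBuf::from(format!("/pool/ext/{uuid}/zone/{zone_name}")),
    }
}
```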
pub const ZONE_ZFS_RAMDISK_DATASET_MOUNTPOINT: &str = "/zone";
pub const ZONE_ZFS_RAMDISK_DATASET: &str = "rpool/zone";
I went ahead and renamed these constants to make it a little more clear that "anything you put here is on a ramdisk, and will not last across reboots!"
@@ -137,15 +137,15 @@ impl Zfs {
     }

     /// Creates a new ZFS filesystem named `name`, unless one already exists.
-    pub fn ensure_zoned_filesystem(
+    pub fn ensure_filesystem(
When provisioning filesystems in `sled-hardware/src/disk.rs`, this doesn't want `zoned`, so it's configurable.
This function was also renamed to make the intent a bit more clear.
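Roughly, the shape this takes is sketched below. The parameter order, the existence check, and the error handling are assumptions rather than the real `Zfs` helper; the sketch just assumes the helper shells out to the `zfs` CLI:

```rust
use std::process::Command;

/// A rough sketch of `ensure_filesystem` with a configurable `zoned` flag.
/// Signature and behavior are assumptions, not the code touched by this hunk.
pub fn ensure_filesystem(name: &str, mountpoint: &str, zoned: bool) -> std::io::Result<()> {
    // If the dataset already exists, there is nothing to do.
    let exists = Command::new("zfs")
        .args(["list", "-Hpo", "name", name])
        .output()?
        .status
        .success();
    if exists {
        return Ok(());
    }

    let mut cmd = Command::new("zfs");
    cmd.arg("create");
    cmd.arg("-o").arg(format!("mountpoint={mountpoint}"));
    if zoned {
        // Zone-backing datasets get the ZFS `zoned` property; the datasets
        // created in sled-hardware/src/disk.rs do not.
        cmd.arg("-o").arg("zoned=on");
    }
    cmd.arg(name).status()?;
    Ok(())
}

// Usage with the ramdisk constants above (contents do not survive a reboot):
// ensure_filesystem(ZONE_ZFS_RAMDISK_DATASET, ZONE_ZFS_RAMDISK_DATASET_MOUNTPOINT, true)?;
```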
-            if pool.kind() == crate::zpool::ZpoolKind::Internal {
-                continue;
-            }
+            let internal = pool.kind() == crate::zpool::ZpoolKind::Internal;
Although this PR doesn't actually migrate any config data out of the ramdisk, eventually we'll move things from `/var/oxide` into `/pool/int/<UUID>/config` on the M.2s.
As a result, we will want to destroy internal datasets to "uninstall".
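As a concrete anchor for that layout, a tiny illustrative helper (not in this PR) for where that config would live:

```rust
use std::path::PathBuf;

/// Illustrative helper (not in this PR): the eventual location of sled-agent
/// config on an internal (M.2) pool. On uninstall, datasets like this one are
/// why internal pools can no longer be skipped.
fn internal_config_dir(pool_uuid: &str) -> PathBuf {
    PathBuf::from(format!("/pool/int/{pool_uuid}/config"))
}
```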
sled_hardware::HardwareUpdate::DiskAdded(disk) => {
    self.storage.upsert_disk(disk).await;
}
sled_hardware::HardwareUpdate::DiskRemoved(disk) => {
    self.storage.delete_disk(disk).await;
}
The bootstrap agent now uses a `StorageManager`, so it can create the necessary datasets on the M.2s before the sled agents launch. This will be especially important as we move the sled agent's configuration data out of `/var/oxide` -- these datasets must exist before the sled agent is created by RSS.
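A minimal sketch of that ordering constraint, with stand-in types rather than the real sled-agent API:

```rust
// Every type here is a stand-in; only the ordering is the point.
struct StorageManager;
struct SledAgent;

impl StorageManager {
    /// Ensure the datasets on the M.2s (config, install, ...) exist.
    fn ensure_m2_datasets(&self) { /* elided */ }
}

fn bootstrap() -> SledAgent {
    // The bootstrap agent owns the StorageManager and prepares storage first...
    let storage = StorageManager;
    storage.ensure_m2_datasets();
    // ...so that the sled agent created later by RSS can rely on
    // /pool/int/<UUID>/config existing instead of /var/oxide on the ramdisk.
    SledAgent
}
```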
// TODO: Don't we need to do some accounting, e.g. for all the information
// that's no longer accessible? Or is that up to Nexus to figure out at
// a later point-in-time?
//
// If we're storing zone images on the M.2s for internal services, how
// do we reconcile them?
We definitely needed to do this before this PR, but I'm labeling it. This will be more and more relevant as we move configs from the ramdisks onto U.2s.
/// A sled-local view of all attached storage.
#[derive(Clone)]
pub struct StorageManager {
    inner: Arc<StorageManagerInner>,
This is now `Arc`-wrapped so that the `ServiceManager` can also reference it.
In the future, the `ServiceManager` will be asking the `StorageManager` questions like:
- "I have a zone I want to install - what U.2s are available for me to use for the backing filesystem?"
I guess this is part of your thoughts about merging the two above. As it stands I also find it unfortunate to share an inner resource of one manager with that of another. I'd love to collaborate on how we can separate the abstractions entirely or merge them, although I admit a preference for separation at this point.
If it helps, the dependency is "real", in a sense -- provisioning services does (and will increasingly) depend on access to storage as we migrate the location of zone filesystems. Sharing this reference makes the sled agent coherent.
I think the unfortunate part is that the `StorageManager` and `ServiceManager` are both in the business of provisioning services. If anything, the service management side of `StorageManager` -- which currently handles the CRDB, Clickhouse, and Crucible zones -- could be migrated out.
> If it helps, the dependency is "real", in a sense -- provisioning services does (and will increasingly) depend on access to storage as we migrate the location of zone filesystems. Sharing this reference makes the sled agent coherent.

Yep, I get that part. I was thinking more that it would be nice to create a `Handle` to the `StorageManager` rather than sharing `StorageManagerInner`. It would probably look similar in shape and contents - especially talking to the storage worker, but would express the purpose better, I think.
> I think the unfortunate part is that the `StorageManager` and `ServiceManager` are both in the business of provisioning services. If anything, the service management side of `StorageManager` -- which currently handles the CRDB, Clickhouse, and Crucible zones -- could be migrated out.
That's a good idea!
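A sketch of that `Handle` idea, with all names hypothetical: the `ServiceManager` would hold a small handle that talks to the storage worker over a channel instead of sharing `StorageManagerInner`:

```rust
use tokio::sync::{mpsc, oneshot};

// All names here are hypothetical; only the shape is the point.
enum StorageRequest {
    AllU2Pools { reply: oneshot::Sender<Vec<String>> },
}

#[derive(Clone)]
pub struct StorageHandle {
    tx: mpsc::Sender<StorageRequest>,
}

impl StorageHandle {
    /// Ask the storage worker which U.2 pools are available.
    pub async fn all_u2_pools(&self) -> Vec<String> {
        let (reply, rx) = oneshot::channel();
        // Error handling elided; the real code would surface send failures.
        let _ = self.tx.send(StorageRequest::AllU2Pools { reply }).await;
        rx.await.unwrap_or_default()
    }
}
```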
// Stores software images.
//
// Should be duplicated to both M.2s.
INSTALL_DATASET,
This gives us a landing spot for the update process - we should be placing non-ramdisk control plane zones within `/pool/int/<UUID>/install` on each of the M.2s.
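As a small illustration (helper name assumed), the update process would fan images out to that dataset on every internal pool:

```rust
use std::path::PathBuf;

/// Illustrative only: the update process would stage non-ramdisk control
/// plane zone images in the install dataset of each internal (M.2) pool.
fn install_dirs(m2_pool_uuids: &[&str]) -> Vec<PathBuf> {
    m2_pool_uuids
        .iter()
        .map(|uuid| PathBuf::from(format!("/pool/int/{uuid}/install")))
        .collect()
}
```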
I have a few comments about abstraction, but nothing you don't already know. Other than that this is a good step forward. Let's merge it.
@@ -164,6 +177,11 @@ impl HardwareMonitor {
     let hardware = HardwareManager::new(log, sled_mode)
         .map_err(|e| Error::Hardware(e))?;

+    // TODO: The coupling between the storage and service manager is growing
Either combine them or create an API between the two. It's kinda strange seeing an etherstub being passed into the `StorageManager`. I'm happy to discuss a plan and help with this.
        }
    }
}

// A wrapper around `ZoneRequest`, which allows it to be serialized
Is this configuration that is only used during development? Is it something that goes away with self-assembling zones or is this actually part of making self-assembling zones work?
This is unfortunately not a development-only piece of metadata.
Here's my primer on "old vs new" zone management:
Old
- All zone filesystems are provisioned in `rpool/zone` (the ramdisk)
- Zones are not self-assembling, meaning that they need tweaking from the sled agent after booting
- The list of zones under the purview of the sled agent is stored in a config file in `/var/oxide` (... unfortunately, this is also in the ramdisk)
New (not totally done in this PR, but we're heading this direction)
- Zone filesystems are provisioned in `/pool/ext/<UUID>/zone` on particular U.2s
- Zones are self-assembling, meaning that once they are created, they can be booted
  - NOTE: The sled agent may still need to supply some metadata prior to running the zone, such as creating VNICs!
- The list of zones under the purview of the sled agent is stored in `/pool/int/<UUID>/config` on both M.2s
So this metadata...
Yeah, so this piece of metadata is here because we're breaking the previous standard that "every zone filesystem lives in `/zone`". By shuffling that around, the sled agent actually needs to know where to find zones to launch them.
I admittedly could store this information alongside the zone filesystem on the U.2, but by storing it within the `config` directory of the M.2, I figured we'd have a better ability for sleds to eventually notice / report "hey, I should launch a particular zone, but it hasn't come up yet because that disk is gone".
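A sketch of what that persisted metadata could look like; the field names are illustrative, and `ZoneRequest` stands in for the type introduced in this diff:

```rust
use serde::{Deserialize, Serialize};
use std::path::PathBuf;

// Once zone filesystems can live on different U.2s, the persisted record
// must say *where* each zone's root is, not just which services it runs.
#[derive(Serialize, Deserialize)]
struct PersistedZoneRequest {
    // e.g. /pool/ext/<UUID>/zone - the dataset chosen to back this zone.
    root: PathBuf,
    // The request describing the zone itself.
    zone: ZoneRequest,
}
```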
Thanks for the great explanation! Much appreciated.
Thanks for the review! One last thing that I need to figure out before merging, and which (I believe?) is breaking the

That makes sense. You are welcome!

@faithanalog - I'm adding you as an FYI here - if you're busy, you can skim through this, but figured you might be interested since you went through some of this code recently!
This PR is split off of #2902, which also tries to use those datasets, and attempts to make a subset of that change with a smaller diff.

This PR:
- Creates `/pool/ext/<UUID>/zone`, and places storage-based zone filesystems there

Part of #2888