
[sled-agent] Conform datasets to RFD 118 format, but still use ramdisks #2919

Merged
@smklein merged 11 commits into main from m2-datasets on Apr 27, 2023

Conversation

@smklein (Collaborator) commented Apr 24, 2023

This PR is split off of #2902, which also tries to use these datasets; the goal here is a subset of that change with a smaller diff.

This PR:

  • Formats U.2 and M.2s with expected datasets
  • Adds /pool/ext/<UUID>/zone, and places storage-based zone filesystems there
  • Adds "oxi_" internal zpools to the "virtual hardware" scripts, emulating M.2s.

Part of #2888
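
For reference, a rough sketch of the "expected datasets" idea above. This is illustrative only, not the actual sled-hardware code (the real provisioning logic lives in sled-hardware/src/disk.rs); it just shows the shape: U.2s carry a zone dataset, M.2s carry config/install datasets.

```rust
// Illustrative sketch only -- not the actual sled-hardware definitions.
#[derive(Clone, Copy, Debug)]
enum DiskVariant {
    U2, // external storage (zone filesystems)
    M2, // internal storage (config, install)
}

fn expected_datasets(variant: DiskVariant) -> &'static [&'static str] {
    match variant {
        DiskVariant::U2 => &["zone"],
        DiskVariant::M2 => &["config", "install"],
    }
}

fn main() {
    for variant in [DiskVariant::U2, DiskVariant::M2] {
        println!("{:?}: {:?}", variant, expected_datasets(variant));
    }
}
```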

@smklein changed the title from "M2 datasets" to "[sled-agent] Conform datasets to RFD 118 format, but still use ramdisks" on Apr 24, 2023
Comment on lines +443 to +445
// Filesystem path of the zone
zonepath: PathBuf,

@smklein (Collaborator, Author) commented Apr 24, 2023

Within this PR, this is always /zone/<zone name>, which resides in the ramdisk. However, this will change in the future, as some zones move to:

/pool/ext/<UUID>/zone/<zone name>
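
As a small illustration of the two layouts, here is a minimal sketch; the helper names are hypothetical, not the sled agent's API.

```rust
use std::path::PathBuf;

// Hypothetical helpers: today's ramdisk layout versus the future
// U.2-backed layout for a zone's filesystem root.
fn ramdisk_zonepath(zone_name: &str) -> PathBuf {
    PathBuf::from("/zone").join(zone_name)
}

fn u2_zonepath(pool_uuid: &str, zone_name: &str) -> PathBuf {
    PathBuf::from(format!("/pool/ext/{pool_uuid}/zone")).join(zone_name)
}
```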

Comment on lines +11 to +12
pub const ZONE_ZFS_RAMDISK_DATASET_MOUNTPOINT: &str = "/zone";
pub const ZONE_ZFS_RAMDISK_DATASET: &str = "rpool/zone";
@smklein (Collaborator, Author):

I went ahead and renamed these constants to make it a little more clear that "anything you put here is on a ramdisk, and will not last across reboots!"

@@ -137,15 +137,15 @@ impl Zfs {
}

/// Creates a new ZFS filesystem named `name`, unless one already exists.
pub fn ensure_zoned_filesystem(
pub fn ensure_filesystem(
@smklein (Collaborator, Author):

When provisioning filesystems from sled-hardware/src/disk.rs we don't want them to be zoned, so the zoned property is now configurable.

This function was also renamed to make its intent a bit clearer.
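
For illustration, a sketch of roughly what making zoned configurable amounts to at the zfs level. The function and parameter names here are assumptions, not the actual Zfs helper's signature; the sketch only builds the arguments rather than running the command.

```rust
// Sketch only: `zoned` becomes a caller choice rather than being hard-coded,
// since the disk-formatting path does not want zoned datasets.
fn zfs_create_args(name: &str, mountpoint: &str, zoned: bool) -> Vec<String> {
    let mut args = vec!["create".to_string()];
    if zoned {
        // `zoned` is a real ZFS dataset property on illumos.
        args.extend(["-o".to_string(), "zoned=on".to_string()]);
    }
    args.extend(["-o".to_string(), format!("mountpoint={mountpoint}")]);
    args.push(name.to_string());
    args
}
```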

if pool.kind() == crate::zpool::ZpoolKind::Internal {
continue;
}
let internal = pool.kind() == crate::zpool::ZpoolKind::Internal;
@smklein (Collaborator, Author):

Although this PR doesn't actually migrate any config data out of the ramdisk, eventually we'll move things from /var/oxide into /pool/int/<UUID>/config on the M.2s.

As a result, we will want to destroy internal datasets to "uninstall".

Comment on lines +81 to +86
sled_hardware::HardwareUpdate::DiskAdded(disk) => {
self.storage.upsert_disk(disk).await;
}
sled_hardware::HardwareUpdate::DiskRemoved(disk) => {
self.storage.delete_disk(disk).await;
}
@smklein (Collaborator, Author):

The bootstrap agent now uses a StorageManager, so it can create the necessary datasets on the M.2s before the sled agents launch. This will be especially important as we move the sled agent's configuration data out of /var/oxide -- these datasets must exist before the sled agent is created by RSS.
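
To make the ordering concrete, here is a rough sketch of the flow described above. The types and the bootstrap function are stand-ins; only upsert_disk mirrors the method shown in the diff.

```rust
// Stand-in types to illustrate the ordering constraint; the real code uses
// the sled agent's StorageManager and hardware monitor.
struct Disk;
struct StorageManager;

impl StorageManager {
    async fn upsert_disk(&self, _disk: Disk) {
        // In the real code this ensures the disk's expected datasets exist
        // (e.g. the M.2 config/install datasets).
    }
}

async fn start_sled_agent() {
    // Placeholder for the real sled-agent startup path.
}

async fn bootstrap(storage: &StorageManager, m2s: Vec<Disk>) {
    // 1. The bootstrap agent sees the M.2s and ensures their datasets exist...
    for disk in m2s {
        storage.upsert_disk(disk).await;
    }
    // 2. ...and only then is the sled agent created (by RSS), so it can
    //    assume those datasets are present.
    start_sled_agent().await;
}
```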

Comment on lines +866 to +871
// TODO: Don't we need to do some accounting, e.g. for all the information
// that's no longer accessible? Or is that up to Nexus to figure out at
// a later point-in-time?
//
// If we're storing zone images on the M.2s for internal services, how
// do we reconcile them?
@smklein (Collaborator, Author):

We definitely needed to do this before this PR too, but I'm at least labeling it now. This will become more and more relevant as we move configs from the ramdisks onto U.2s.

/// A sled-local view of all attached storage.
#[derive(Clone)]
pub struct StorageManager {
inner: Arc<StorageManagerInner>,
@smklein (Collaborator, Author):

This is now Arc-wrapped so that the ServiceManager can also reference it.

In the future, the ServiceManager will be asking the StorageManager questions like:

  • "I have a zone I want to install - what U.2s are available for me to use for the backing filesystem?"

@andrewjstone (Contributor):

I guess this is part of your thoughts about merging the two above. As it stands I also find it unfortunate to share an inner resource of one manager with that of another. I'd love to collaborate on how we can separate the abstractions entirely or merge them, although I admit a preference for separation at this point.

@smklein (Collaborator, Author):

If it helps, the dependency is "real", in a sense -- provisioning services does (and will increasingly) depend on access to storage as we migrate the location of zone filesystems. Sharing this reference makes the sled agent coherent.

I think the unfortunate part is that the StorageManager and ServiceManager are both in the business of provisioning services. If anything, the service-management side of StorageManager -- which currently handles the CRDB, Clickhouse, and Crucible zones -- could be migrated out.

@andrewjstone (Contributor):

> If it helps, the dependency is "real", in a sense -- provisioning services does (and will increasingly) depend on access to storage as we migrate the location of zone filesystems. Sharing this reference makes the sled agent coherent.

Yep, I get that part. I was thinking more that it would be nice to create a Handle to the StorageManager rather than sharing StorageManagerInner. It would probably look similar in shape and contents, especially for talking to the storage worker, but it would express the purpose better, I think.

> I think the unfortunate part is that the StorageManager and ServiceManager are both in the business of provisioning services. If anything, the service-management side of StorageManager -- which currently handles the CRDB, Clickhouse, and Crucible zones -- could be migrated out.

That's a good idea!
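
Going back to the Handle idea above, a tiny sketch of what that shape could look like; all names are hypothetical.

```rust
use std::sync::Arc;

// Hypothetical: instead of cloning StorageManager (and sharing
// StorageManagerInner directly), the ServiceManager gets a purpose-built
// handle that exposes only the queries it needs.
#[allow(dead_code)]
struct StorageManagerInner {
    // worker channel, pool state, etc.
}

#[allow(dead_code)]
struct StorageHandle {
    inner: Arc<StorageManagerInner>,
}

impl StorageHandle {
    fn new(inner: Arc<StorageManagerInner>) -> Self {
        Self { inner }
    }
    // Read-side queries (e.g. "which U.2 pools are usable?") would live
    // here, forwarding to the storage worker.
}
```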

Comment on lines +203 to +206
// Stores software images.
//
// Should be duplicated to both M.2s.
INSTALL_DATASET,
@smklein (Collaborator, Author):

This gives us a landing spot for the update process: we should be placing non-ramdisk control plane zone images within

/pool/int/<UUID>/install

on each of the M.2s.
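
As a sketch of that intent, a hypothetical helper enumerating the install mountpoints across the internal pools (the helper and its shape are illustrative, not the actual update code):

```rust
use std::path::PathBuf;

// Hypothetical: the install dataset exists on every internal (M.2) pool,
// so an updated zone image would be written to each of these paths.
fn install_dataset_mountpoints(internal_pool_uuids: &[&str]) -> Vec<PathBuf> {
    internal_pool_uuids
        .iter()
        .map(|uuid| PathBuf::from(format!("/pool/int/{uuid}/install")))
        .collect()
}
```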

@smklein marked this pull request as ready for review on April 24, 2023 19:54
@andrewjstone self-requested a review on April 25, 2023 19:59
@andrewjstone (Contributor) left a comment:

I have a few comments about abstraction, but nothing you don't already know. Other than that this is a good step forward. Let's merge it.

@@ -164,6 +177,11 @@ impl HardwareMonitor {
let hardware = HardwareManager::new(log, sled_mode)
.map_err(|e| Error::Hardware(e))?;

// TODO: The coupling between the storage and service manager is growing
@andrewjstone (Contributor):

Either combine them or create an API between the two. It's kinda strange seeing an etherstub being passed into the StorageManager. I'm happy to discuss a plan and help with this.

}
}
}

// A wrapper around `ZoneRequest`, which allows it to be serialized
@andrewjstone (Contributor):

Is this configuration that is only used during development? Is it something that goes away with self-assembling zones or is this actually part of making self-assembling zones work?

@smklein (Collaborator, Author):

This is unfortunately not a development-only piece of metadata.

Here's my primer on "old vs new" zone management:

Old

  • All zone filesystems are provisioned in rpool/zone (the ramdisk)
  • Zones are not self-assembling, meaning that they need tweaking from the sled agent after booting
  • The list of zones under the purview of the sled is stored in a config file in /var/oxide (... unfortunately, this is also in the ramdisk)

New (not totally done in this PR, but we're heading this direction)

  • Zone filesystems are provisioned in /pool/ext/<UUID>/zone on particular U.2s
  • Zones are self-assembling, meaning that once they are created, they can be booted
    • NOTE: The sled agent may still need to supply some metadata prior to running the zone, such as creating VNICs!
  • The list of zones under the purview of the sled agent is stored in /pool/int/<UUID>/config on both M.2s

So this metadata...

Yeah, so this piece of metadata is here because we're breaking the previous standard that "every zone filesystem lives in /zone". By shuffling that around, the sled agent actually needs to know where to find zones to launch them.

I admittedly could store this information alongside the zone filesystem on the U.2, but by storing it within the config directory of the M.2, I figured we'd have a better ability for sleds to eventually notice / report "hey, I should launch a particular zone, but it hasn't come up yet because that disk is gone".
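
For illustration, the kind of record this implies, as a hedged sketch assuming serde is available. The type and field names here are mine, not the actual ZoneRequest wrapper in this PR.

```rust
use serde::{Deserialize, Serialize};

// Hypothetical shape: enough persisted metadata for the sled agent to know
// both which zone to launch and where its filesystem root lives, now that
// zones are no longer guaranteed to sit under /zone.
#[derive(Serialize, Deserialize)]
struct PersistedZone {
    // The zone to launch.
    zone_name: String,
    // Where its filesystem root lives, e.g. "/zone/<zone name>" today, or
    // "/pool/ext/<UUID>/zone/<zone name>" once it moves to a U.2.
    root: String,
}
```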

@andrewjstone (Contributor):

Thanks for the great explanation! Much appreciated.

@smklein (Collaborator, Author) commented Apr 25, 2023

Thanks for the review! One last thing that I need to figure out before merging, and which (I believe?) is breaking the deploy job -- this PR is perhaps a bit too excited about using the "dataset mountpoint" for zones (e.g., /pool/ext/<UUID>/zone). For non-gimlet systems using synthetic disks, this mountpoint is not actually being created, which is a problem.

@andrewjstone (Contributor) commented:

> Thanks for the review! One last thing that I need to figure out before merging, and which (I believe?) is breaking the deploy job -- this PR is perhaps a bit too excited about using the "dataset mountpoint" for zones (e.g., /pool/ext/<UUID>/zone). For non-gimlet systems using synthetic disks, this mountpoint is not actually being created, which is a problem.

That makes sense. You are welcome!

@smklein requested a review from faithanalog on April 26, 2023 03:36
@smklein (Collaborator, Author) commented Apr 26, 2023

@faithanalog - I'm adding you as an FYI here - if you're busy, you can skim through this, but figured you might be interested since you went through some of this code recently!

@smklein merged commit 789fac4 into main on Apr 27, 2023
@smklein deleted the m2-datasets branch on April 27, 2023 16:14