Sled Agent must manage durable storage for configs, zones, explicitly #2888
Comments
…ks (#2919) This PR is split off of #2902, which also tries to *use* those datasets, and attempts to make a subset of that change with a smaller diff. This PR:

- Formats U.2s and M.2s with expected datasets
- Adds `/pool/ext/<UUID>/zone`, and places storage-based zone filesystems there
- Adds "oxi_" internal zpools to the "virtual hardware" scripts, emulating M.2s

Part of #2888
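For illustration, here is a minimal Rust sketch of the layout described above -- not the sled agent's actual code; the helper name, pool UUID, and zone name are all made up -- deriving a zone's filesystem root from a U.2 pool's `/pool/ext/<UUID>/zone` dataset:

```rust
use std::path::PathBuf;

/// Hypothetical helper (not the sled agent's actual API): given a U.2
/// zpool's UUID and a zone name, derive the zone filesystem root following
/// the `/pool/ext/<UUID>/zone` layout described in this PR.
fn zone_filesystem_root(pool_uuid: &str, zone_name: &str) -> PathBuf {
    // Each U.2-backed pool is mounted under /pool/ext/<UUID>; the `zone`
    // dataset beneath it holds one child filesystem per zone.
    PathBuf::from("/pool/ext")
        .join(pool_uuid)
        .join("zone")
        .join(zone_name)
}

fn main() {
    // Made-up UUID and zone name for illustration.
    let root = zone_filesystem_root(
        "e4b4dc87-ab46-49fb-a4b4-d361ae214c03",
        "oxz_crucible",
    );
    println!("{}", root.display()); // /pool/ext/<UUID>/zone/oxz_crucible
}
```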
## History

The Sled Agent has historically had two different "managers" responsible for zones:

1. `ServiceManager`, which presided over zones that do not operate on datasets
2. `StorageManager`, which manages disks, but also manages zones which operate on those disks

This separation is even reflected in the sled agent API exposed to Nexus - the Sled Agent exposes:

- `PUT /services`
- `PUT /filesystem`

for "add a service (within a zone) to this sled" vs "add a dataset (and corresponding zone) to this sled within a particular zpool".

This has been kinda handy for Nexus, since "provision CRDB on this dataset" and "start the CRDB service on that dataset" don't need to be separate operations. Within the Sled Agent, however, it has been a pain-in-the-butt from the perspective of diverging implementations. The `StorageManager` and `ServiceManager` have evolved their own mechanisms for storing configs, identifying filesystems on which to place zones, etc., even though their responsibilities (managing running zones) overlap quite a lot.

## This PR

This PR migrates the responsibility for "service management" entirely into the `ServiceManager`, leaving the `StorageManager` responsible for monitoring disks. In detail, this means:

- The responsibility for launching Clickhouse, CRDB, and Crucible zones has moved from `storage_manager.rs` into `services.rs`
  - Unfortunately, this also means we're taking a somewhat hacky approach to formatting CRDB. This is fixed in #2954.
- The `StorageManager` no longer requires an Etherstub device during construction
- The `ServiceZoneRequest` can operate on an optional `dataset` argument
- The "config management" for dataset-based zones is now much more aligned with non-dataset zones. Each sled stores `/var/oxide/services.toml` and `/var/oxide/storage-services.toml` for each group.
  - These still need to be fixed with #2888, but it should be simpler now.
- `filesystem_ensure` - which previously asked the `StorageManager` to format a dataset and also launch a zone - now asks the `StorageManager` to format a dataset, and separately asks the `ServiceManager` to launch a zone.
  - In the future, this may become vectorized ("ensure the sled has *all* the datasets we want...") to have parity with service management, but this would require a more invasive change in Nexus.
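To make the optional `dataset` argument concrete, here is a heavily simplified Rust sketch. It is not the real `ServiceZoneRequest` from `services.rs` (whose fields differ); it only illustrates how a single request type can describe both dataset-backed zones (CRDB, Clickhouse, Crucible) and service-only zones:

```rust
use std::path::PathBuf;

/// Illustrative only -- the real `ServiceZoneRequest` has more fields;
/// the point is that the dataset is optional.
struct ServiceZoneRequest {
    /// Name of the zone to launch (e.g. "oxz_cockroachdb").
    zone_name: String,
    /// Dataset backing the zone, if it is a storage-based zone.
    /// `None` for zones that don't operate on a dataset.
    dataset: Option<DatasetRequest>,
}

struct DatasetRequest {
    /// Zpool on which the dataset lives (e.g. "oxp_<UUID>").
    pool_name: String,
    /// Mountpoint of the dataset within the zone.
    mountpoint: PathBuf,
}

fn launch_zone(req: &ServiceZoneRequest) {
    match &req.dataset {
        // Dataset-backed zones get their filesystem from the named zpool...
        Some(d) => println!(
            "launching {} on {} at {}",
            req.zone_name,
            d.pool_name,
            d.mountpoint.display()
        ),
        // ...while service-only zones are launched without one.
        None => println!("launching {} without a dataset", req.zone_name),
    }
}

fn main() {
    // Made-up zone and pool names for illustration.
    launch_zone(&ServiceZoneRequest {
        zone_name: "oxz_cockroachdb".to_string(),
        dataset: Some(DatasetRequest {
            pool_name: "oxp_e4b4dc87-ab46-49fb-a4b4-d361ae214c03".to_string(),
            mountpoint: PathBuf::from("/data"),
        }),
    });
    launch_zone(&ServiceZoneRequest {
        zone_name: "oxz_internal_dns".to_string(),
        dataset: None,
    });
}
```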
I just remembered that #3007 adds some support for persisting zone filesystems under crypt/zone. This code has been ready to go for a few weeks now and has been tested numerous times. I've done cold boot testing on paris with it, but since there is no persistent install of software on the M.2s like there is on dogfood, I have to re-run omicron-package install after reboot to rediscover and mount the U.2 encrypted datasets. The PR has the details about this. While I'd like to test with a proper install on dogfood, I'm at about the point where I'd like to just merge this and see what happens.
FYI: I've been punting this a bit, because we clear these zones out on reboot, and the system is functional with self-managed storage of these zones. It's possible to put Nexus more explicitly in control of this "zone filesystem management", but not urgent.
Going to mark this as largely "done", as Nexus is now largely in charge of zone dataset allocation. The other sub-issues that aren't complete can remain open and be tracked separately.
See also: RFD 118
Sled Agent currently configures a few pieces of data outside datasets explicitly allocated in pools:

- `/var/oxide` contains a variety of configuration information, including...
- `/zone` contains the "filesystem for all zones"
- `/opt/oxide` contains all the "latest installed system images", and is used to update control plane software which exists outside the ramdisk.

Q: So, why is this bad?

A: All those paths are currently backed by a ramdisk -- specifically `rpool` -- on gimlets.

This means that when we reboot, a significant portion of the necessary configuration information to launch the sled will be lost. Furthermore, for the `zonepath` filesystems, a significant portion of user RAM will be dedicated to zone-based filesystems, which we'd prefer to distribute to disk-backed file storage.

Here's a list of some of the work we need to accomplish to mitigate this in a production environment:

- Allocate `zonepath`s from U.2s
- Move `/opt/oxide` into `/pool/int/<UUID>/install`. #2971
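As a companion to the zonepath sketch above, here is an equally hypothetical Rust sketch of the second checklist item -- the helper name and pool UUID are made up, and #2971 tracks the real work -- showing where `/opt/oxide` content would land on an internal M.2 pool's `install` dataset:

```rust
use std::path::PathBuf;

/// Hypothetical helper: durable location for installed control plane images
/// once `/opt/oxide` moves onto an internal (M.2) pool, per #2971.
fn install_dataset_root(int_pool_uuid: &str) -> PathBuf {
    // Internal pools are mounted under /pool/int/<UUID>; the `install`
    // dataset beneath them would replace the ramdisk-backed /opt/oxide.
    PathBuf::from("/pool/int")
        .join(int_pool_uuid)
        .join("install")
}

fn main() {
    // Made-up M.2 pool UUID for illustration.
    let root = install_dataset_root("8d9a63b5-2d57-4f0e-9b32-0c5a1a2f77aa");
    println!("{}", root.display()); // /pool/int/<UUID>/install
}
```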