Sled Agent must manage durable storage for configs, zones, explicitly #2888

Closed
6 of 10 tasks
smklein opened this issue Apr 20, 2023 · 3 comments
Assignees
smklein

Labels
Sled Agent (Related to the Per-Sled Configuration and Management), storage (Related to storage)

Comments

@smklein
Collaborator

smklein commented Apr 20, 2023

See also: RFD 118

Sled Agent currently stores several pieces of data outside of datasets explicitly allocated within zpools:

  • /var/oxide contains a variety of configuration information, including...
    • RSS setup information
    • The "Sled Request" used to launch the sled-agent (including underlay information)
    • A list of "All services which should be launched on this sled"
    • A list of "All services which should be launched on u.2 zpools"
  • /zone contains the "filesystem for all zones"
  • /opt/oxide contains all the "latest installed system images", and is used to update control plane software which exists outside the ramdisk.

Q: So, why is this bad?
A: All those paths are currently backed by a ramdisk -- specifically rpool -- on gimlets.

This means that when we reboot, a significant portion of the configuration needed to launch the sled is lost. Furthermore, because zonepaths currently live on the ramdisk, a significant portion of RAM is dedicated to zone filesystems that we'd prefer to move to disk-backed storage.
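To make the direction concrete, here is a minimal sketch, using entirely hypothetical type names and an illustrative `/pool/int/<pool>/config` layout, of resolving a sled config location against an M.2-backed zpool instead of the ramdisk-backed `/var/oxide`:

```rust
use std::path::PathBuf;

/// Hypothetical handle to an internal (M.2) zpool, e.g. "oxi_<uuid>".
struct InternalZpool {
    name: String,
}

impl InternalZpool {
    /// A durable home for sled configs, instead of the ramdisk-backed
    /// `/var/oxide`. The `/pool/int/<pool>/config` layout is illustrative only.
    fn config_dir(&self) -> PathBuf {
        PathBuf::from("/pool/int").join(&self.name).join("config")
    }
}

fn main() {
    let pool = InternalZpool { name: "oxi_example-uuid".to_string() };
    // e.g. /pool/int/oxi_example-uuid/config/services.toml
    println!("{}", pool.config_dir().join("services.toml").display());
}
```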

Here's a list of some of the work we need to accomplish to mitigate this in a production environment:

@smklein smklein added this to the MVP milestone Apr 20, 2023
@smklein smklein added storage Related to storage. Sled Agent Related to the Per-Sled Configuration and Management labels Apr 20, 2023
@smklein smklein self-assigned this Apr 20, 2023
smklein added a commit that referenced this issue Apr 27, 2023
…ks (#2919)

This PR is split off of #2902, which also tries to *use* those datasets, and attempts to make a subset of that change with a smaller diff.

This PR:
- Formats U.2s and M.2s with expected datasets
- Adds `/pool/ext/<UUID>/zone`, and places storage-based zone filesystems there (see the sketch after this PR summary)
- Adds "oxi_" internal zpools to the "virtual hardware" scripts, emulating M.2s

Part of #2888
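
As a rough picture of the `/pool/ext/<UUID>/zone` layout described in that PR summary (the helper below is a hypothetical sketch, not the actual sled-agent code):

```rust
use std::path::PathBuf;

/// Illustrative only: build a zonepath under a U.2 pool
/// (`/pool/ext/<UUID>/zone/<zone name>`) rather than the ramdisk-backed `/zone`.
fn zonepath_on_pool(pool_uuid: &str, zone_name: &str) -> PathBuf {
    PathBuf::from("/pool/ext")
        .join(pool_uuid)
        .join("zone")
        .join(zone_name)
}

fn main() {
    // e.g. /pool/ext/example-pool-uuid/zone/oxz_crucible
    println!("{}", zonepath_on_pool("example-pool-uuid", "oxz_crucible").display());
}
```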
smklein added a commit that referenced this issue May 1, 2023
## History

The Sled Agent has historically had two different "managers" responsible
for Zones:

1. `ServiceManager`, which presided over zones that do not operate on datasets
2. `StorageManager`, which manages disks, but also manages zones which operate on those disks

This separation is even reflected in the sled agent API exposed to Nexus; the Sled Agent exposes:

- `PUT /services`
- `PUT /filesystem`

For "add a service (within a zone) to this sled" vs "add a dataset (and
corresponding zone) to this sled within a particular zpool".
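
Roughly, the two endpoints carry different kinds of requests. The structs below are simplified, hypothetical illustrations of that split, not the real omicron types:

```rust
/// Hypothetical body for `PUT /services`: "run these service zones on this sled".
struct ServiceEnsureBody {
    services: Vec<String>, // e.g. ["oxz_ntp", "oxz_nexus"]
}

/// Hypothetical body for `PUT /filesystem`: "ensure this dataset, and the zone
/// that serves it, exists on a particular U.2 zpool".
struct FilesystemEnsureBody {
    zpool_id: String,     // which U.2 pool to place the dataset on
    dataset_kind: String, // e.g. "cockroachdb", "clickhouse", "crucible"
    address: String,      // where the service in the zone should listen
}

fn main() {
    let services = ServiceEnsureBody { services: vec!["oxz_nexus".into()] };
    let filesystem = FilesystemEnsureBody {
        zpool_id: "example-pool-uuid".into(),
        dataset_kind: "cockroachdb".into(),
        address: "[fd00::5]:32221".into(),
    };
    println!("PUT /services with {} service(s)", services.services.len());
    println!(
        "PUT /filesystem: {} on pool {} at {}",
        filesystem.dataset_kind, filesystem.zpool_id, filesystem.address
    );
}
```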

This has been kinda handy for Nexus, since "provision CRDB on this dataset" and "start the CRDB service on that dataset" don't need to be separate operations. Within the Sled Agent, however, it has been a pain in the butt because the implementations have diverged: the `StorageManager` and `ServiceManager` have evolved their own mechanisms for storing configs, identifying zpools on which to place filesystems, etc., even though their responsibilities (managing running zones) overlap quite a lot.

## This PR

This PR migrates the responsibility for "service management" entirely
into the `ServiceManager`, leaving the `StorageManager` responsible for
monitoring disks.

In detail, this means:

- The responsibility for launching Clickhouse, CRDB, and Crucible zones
has moved from `storage_manager.rs` into `services.rs`
- Unfortunately, this also means we're taking a somewhat hacky approach
to formatting CRDB. This is fixed in
#2954.
- The `StorageManager` no longer requires an Etherstub device during
construction
- The `ServiceZoneRequest` can operate on an optional `dataset` argument
- The "config management" for datastore-based zones is now much more
aligned with non-dataset zones. Each sled stores
`/var/oxide/services.toml` and `/var/oxide/storage-services.toml` for
each group.
- These still need to be fixed with
#2888 , but it should be
simpler now.
- `filesystem_ensure` - which previously asked the `StorageManager` to format a dataset and also launch a zone - now asks the `StorageManager` to format a dataset, and separately asks the `ServiceManager` to launch a zone (see the sketch after this list).
- In the future, this may become vectorized ("ensure the sled has *all*
the datasets we want...") to have parity with the service management,
but this would require a more invasive change in Nexus.
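
Here is a minimal sketch of the `filesystem_ensure` split mentioned in the list above, assuming hypothetical manager types with stubbed bodies rather than the real omicron implementations:

```rust
struct StorageManager;
struct ServiceManager;

impl StorageManager {
    /// Ensure the dataset exists on the given U.2 zpool (stubbed).
    fn ensure_dataset(&self, zpool_id: &str, dataset: &str) -> Result<(), String> {
        println!("ensure dataset exists: {zpool_id}/{dataset}");
        Ok(())
    }
}

impl ServiceManager {
    /// Launch the zone that serves this dataset (stubbed).
    fn ensure_zone(&self, zone_name: &str, dataset: Option<&str>) -> Result<(), String> {
        println!("launch zone {zone_name} (dataset: {dataset:?})");
        Ok(())
    }
}

/// Hypothetical handler: two independent steps, rather than one combined
/// StorageManager call.
fn filesystem_ensure(
    storage: &StorageManager,
    services: &ServiceManager,
    zpool_id: &str,
    dataset: &str,
) -> Result<(), String> {
    storage.ensure_dataset(zpool_id, dataset)?;
    services.ensure_zone(&format!("oxz_{dataset}"), Some(dataset))
}

fn main() {
    filesystem_ensure(&StorageManager, &ServiceManager, "example-pool-uuid", "cockroachdb")
        .expect("ensure failed");
}
```

The point of the split is that the storage side stops knowing anything about zones: it only ensures datasets exist, and zone lifecycle lives entirely in the `ServiceManager`.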
@askfongjojo askfongjojo modified the milestones: MVP, FCS May 11, 2023
@andrewjstone
Contributor

I just remembered that #3007 adds some support for persisting zone filesystems under `crypt/zone`.

This code has been ready to go for a few weeks now and has been tested numerous times. I've done cold boot testing on paris with it, but since there is no persistent install of software on the M.2s like there is on dogfood, I have to re-run omicron-package install after reboot to rediscover and mount the U.2 encrypted datasets. The PR has the details about this. While I'd like to test with a proper install on dogfood, I'm at about the point where I'd like to just merge this and see what happens.

@morlandi7 morlandi7 modified the milestones: FCS, 1.0.1 Jul 28, 2023
@morlandi7 morlandi7 modified the milestones: 1.0.1, 1.0.2 Aug 15, 2023
@smklein smklein modified the milestones: 1.0.2, 1.0.3 Aug 22, 2023
@morlandi7 morlandi7 modified the milestones: 1.0.3, 3, 4 Oct 2, 2023
@morlandi7 morlandi7 modified the milestones: 4, 5 Nov 14, 2023
@smklein smklein modified the milestones: 5, 6 Nov 28, 2023
@smklein smklein modified the milestones: 6, 7 Jan 23, 2024
@smklein
Collaborator Author

smklein commented Jan 23, 2024

FYI: I've been punting on this a bit, because we clear these zones out on reboot and the system is functional with self-managed storage of these zones. It's possible to put Nexus more explicitly in control of this "zone filesystem management", but it's not urgent.

@morlandi7 morlandi7 modified the milestones: 7, 8 Apr 2, 2024
@morlandi7 morlandi7 modified the milestones: 8, 9 May 13, 2024
@morlandi7 morlandi7 modified the milestones: 9, 10 Jun 27, 2024
@morlandi7 morlandi7 modified the milestones: 10, 11 Aug 14, 2024
@morlandi7 morlandi7 modified the milestones: 11, 12 Oct 11, 2024
@morlandi7 morlandi7 removed this from the 12 milestone Nov 22, 2024
@smklein
Collaborator Author

smklein commented Nov 22, 2024

Going to mark this as largely "done", since Nexus is now in charge of zone dataset allocation. The sub-issues that aren't complete can remain open and be tracked separately.

@smklein smklein closed this as completed Nov 22, 2024