-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
instance specs could be improved #735
Comments
All of the recommendations described here seem good to me! |
Another improvement for the pile: Footnotes
|
The problem(s)
The instance spec types in the
propolis_api_types
crate are an important conceptual part of the live migration protocol, but there are several things about the way I implemented them that leave things to be desired:It's easy to construct an instance spec that type-checks but can't be used to start an instance. For example, it's trivial to have a spec with a storage device with no valid backend.
Instance specs are composed of device specs and backend specs, each of which are further divided into their own fields and sub-collections. This makes adding new components while respecting the spec versioning rules something of a hassle (and the extra structure doesn't even guarantee the specs are valid!).
Live migration assumes that all (migratory) VM components have names. This allows Propolis to send a source component's saved state to the correct counterpart on the target. While many spec components are stored in a
HashMap<String, Component>
, some are not, so Propolis has to generate names for them, which then have to be kept consistent between Propolis versions.One lives in
propolis_api_types
.propolis-server
then wraps that in its own builder, which can also convert config TOML entries andInstanceEnsureRequest
components to spec components. Yet a third builder lives inpropolis-client
; it's basically the same as the builder inpropolis_api_types
, except that it operates on Progenitor-generated types instead of the raw types.propolis_api_types
contains a lot more than just typesFor a crate that's ostensibly just a bag of types, this crate has a lot of code, most of which is only ever used by Propolis servers. Other Propolis components that want Just The Types, Please shouldn't need to pull in all the server-only methods.
The SoftNPU component definitions in
propolis_api_types
are behind Falcon feature guards. This forces us to maintain two separate OpenAPI documents, one with Falcon enabled and one without, which can lead to annoying rake-steps in CI if one forgets to check the build for one of the two variants. (In fact, as I write this, PHD builds are broken when Falcon is enabled because of a missingpatch
directive in the Falcon client generator...)Proposed solutions
I propose the following changes to fix these problems:
Spec
type for propolis-serverThis type is the VM component manifest type that Propolis server uses internally (in particular during VM setup). This spec isn't
Serialize
, isn't part of the API, and isn't used in the live migration protocol. Because this spec can be structured however we like without concern for versioning it, it can (and will) be structured so that illegal states are unrepresentable.propolis_api_types::v0::InstanceSpecV0
Instead of a complex hierarchy of component types, the "on the wire" spec used in the API and LM protocol is just a mainboard and a
HashMap<String, Component>
. Adding a new component type is very easy: just define a struct for it and add it toenum Component
. Because all components are now in a map, they all have names.propolis-server implements
From<InternalSpec> for VersionedInstanceSpec>
andTryFrom<VersionedInstanceSpec> for InternalSpec
and converts to/from its internal type at the API and LM protocol boundaries. TheTryFrom
impl gives us a "parse, don't validate" discipline for incoming API specs.propolis-server understands how to convert config TOML elements and
InstanceEnsureRequest
fields into spec elements and feeds them into a singleInternalSpec
builder. TheTryFrom<VersionedInstanceSpec>
impl also operates in terms of this builder. Thepropolis_api_types
builder andpropolis_client
builders are removed. The only user of the latter (PHD) builds API specs simply by adding components to a new V0 spec's map. (This plan assumes there won't eventually be other Propolis clients who want to initialize VMs using raw specs. I had originally planned for the control plane to do this, but have changed my mind and now think it should continue using the existing instance-ensure API. See the headnote to RFD 344 for more details.)The migration compatibility traits that live in
propolis_api_types
are only useful in the LM protocol. They should live in propolis-server instead (which can then implement them in terms of its internal spec representation, if it's convenient to do so).#[cfg(feature)]
guards from API typesInstead of compiling components out and having different clients built with different feature sets, there's just one client that defines all possible component types. If a server doesn't support a component a client has asked for, it rejects the spec (once again made easier by having an internal spec type to parse into--if a feature is compiled out of the server, then the internal type won't have fields for its components).
I have drafted a set of changes that achieves these goals and am pretty happy with how it looks; indeed, I've mostly filed this issue to have a wall of text to link the PRs to. I'll start putting those up for review presently.
Doing this now lays the groundwork for adding CPUID information to instance specs and initializing migration targets from sources' specifications. (It's also much easier to correct foundational mistakes like these while the API and migration protocol can still change without strictly requiring new versions of everything.)
The text was updated successfully, but these errors were encountered: