[nexus] add support for ingesting TUF repos #4690
Conversation
Created using spr 1.3.5
DB structure looks good to me, couple comments but nothing blocking!
nexus/db-model/src/tuf_repo.rs (Outdated)

/// This is equivalent to [`external::TufRepoMeta`], but is more
/// Diesel-friendly.
I welcome feedback on this, but the structures of the "external" and "db-model" objects are intentionally distinct. This lets us make database changes without breaking the API.
(They used to all be identical, but this meant that DB refactors -- even column re-ordering -- would result in a breaking API change)
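A minimal sketch of this pattern (type and field names here are illustrative, not the actual omicron definitions): keep the DB-facing type and the API-facing type separate, with an explicit conversion, so reordering or renaming columns in the model can't silently change the API:

```rust
/// External (API-facing) representation. Its shape is part of the
/// serialized contract and should stay stable.
#[derive(Debug, PartialEq)]
pub struct TufRepoMeta {
    pub system_version: String,
    pub sha256: String,
}

/// DB model. Columns can be reordered or renamed freely, as long as the
/// `From` conversion below is kept in sync.
#[derive(Debug)]
pub struct DbTufRepo {
    // Note: field order intentionally differs from the API type.
    pub sha256: String,
    pub system_version: String,
}

impl From<DbTufRepo> for TufRepoMeta {
    fn from(db: DbTufRepo) -> Self {
        TufRepoMeta { system_version: db.system_version, sha256: db.sha256 }
    }
}

fn main() {
    let db = DbTufRepo { sha256: "abc".into(), system_version: "1.0.0".into() };
    let api: TufRepoMeta = db.into();
    assert_eq!(api.system_version, "1.0.0");
    println!("{:?}", api);
}
```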
Ah yeah that makes sense.
-- Identified by both a random uuid and its SHA256 hash. The hash could be the
-- primary key, but it seems unnecessarily large and unwieldy.
First and foremost, I think the choice to use a UUID PK is totally reasonable here. We do so for many other objects that still have unique constraints elsewhere.
That being said: I had thought that the authz macros were the spot where a non-UUID PK was the most painful, but it seems like you've been able to work around that.
Isn't the "real" PK basically the system_version? As in, for a single "system version", shouldn't we have a single TUF repo, or am I misunderstanding this relationship?
Yeah, the real primary key is the system version. But I think I was a bit concerned about an arbitrary-length DB primary key.
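A minimal sketch of the shape being discussed (table and column names are illustrative, not the actual omicron schema): a UUID surrogate primary key, with the natural key still enforced unique via a constraint:

```sql
-- Illustrative only: surrogate UUID PK keeps joins and authz lookups small,
-- while UNIQUE constraints preserve the natural-key invariants.
CREATE TABLE IF NOT EXISTS example_tuf_repo (
    id UUID PRIMARY KEY,
    sha256 STRING(64) NOT NULL,
    system_version STRING(64) NOT NULL,
    CONSTRAINT unique_system_version UNIQUE (system_version),
    CONSTRAINT unique_checksum UNIQUE (sha256)
);
```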
-- Switch artifacts
'switch_sp',
'switch_rot'

CONSTRAINT unique_checksum UNIQUE (sha256),
Why must the checksum be unique? I don't think it's likely, but if we decided to publish a "new version, with no updated software", would the TUF repo have the same contents?
Or does this content also contain version info, so we'd always expect it to be unique, and this is just additional validation?
My understanding is that the content would also contain version info, yeah.
To confirm -- yes, artifacts.json contains the system version. When that changes, the hash will also change.
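The point above can be illustrated with a tiny sketch: any content-addressed hash changes whenever artifacts.json changes, and the system version is part of the hashed bytes. (The real repo uses SHA-256; std's DefaultHasher stands in here only so the sketch needs no external crates.)

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Illustrative stand-in for a content hash over a manifest's raw bytes.
fn digest(manifest: &str) -> u64 {
    let mut h = DefaultHasher::new();
    manifest.hash(&mut h);
    h.finish()
}

fn main() {
    // Hypothetical manifest fragments: bumping the system version changes
    // the hashed bytes, so the digest changes too.
    let v1 = r#"{"system_version":"1.0.0"}"#;
    let v2 = r#"{"system_version":"1.0.1"}"#;
    assert_ne!(digest(v1), digest(v2));
    println!("ok");
}
```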
…x serialization (#4781) This is prep work for #4690. I'd like to add support for making changes to a DeserializedManifest, then serializing it. However, it turned out that the bytesize crate does not serialize bytes correctly (bytesize-rs/bytesize#40). To address this, just use a u64 and write our own deserializer, using the alternative parse-size crate. Also add a test to ensure that serialization of the fake manifest roundtrips correctly.
In #4690 I'd like to be able to generate a tufaceous repo twice and expect that each artifact has the same hash. Writing tests for this is a bit annoying at the moment because the overall repo has a different hash when generated again (TUF bakes timestamps into its format), even though each artifact has the same hash. #4690 has other work going on which makes writing this test quite simple, so I'll include the test there.
/// Test that the archive generated by running `tufaceous assemble` twice
/// has the same artifacts and hashes.
#[tokio::test(flavor = "multi_thread", worker_threads = 2)]
This is the promised test for #4823.
In #4690 I was dealing with some of these paths, and conversion to and fro is a bit annoying. There is no loss of generality here because all of these paths are read from a config file -- and are therefore expected to be UTF-8 anyway.
CREATE INDEX IF NOT EXISTS lookup_deployment_by_creation on omicron.public.update_deployment (
    time_created

PRIMARY KEY (
    tuf_repo_id, tuf_artifact_name, tuf_artifact_version, tuf_artifact_kind
This makes me tempted to propose a surrogate UUID primary key for tuf_artifact. Happy to defer to other folks with more DB experience though. Is avoiding this relatively big (3 x STRING(63)) foreign key worth the pain of a surrogate key?
According to this post, it seems like the main thing to look for with natural keys is the ability to distribute well. In this case the first entry is a tuf_repo_id, which is a UUID, so this key should be okay.
@@ -127,7 +127,7 @@ impl UpdateManager {
     let response = nexus
         .cpapi_artifact_download(
-            nexus_client::types::KnownArtifactKind::ControlPlane,
+            &KnownArtifactKind::ControlPlane.to_string(),
I can't decide if the change from KnownArtifactKind to String here (and on the server side and in the db schema) is a good defense for future unknown artifact kinds, or us giving up some type safety unnecessarily, because we will have to be able to evolve a variety of HTTP endpoints that have enums (including adding new variants). If we're going the route of oxidecomputer/dropshot#869, could we use exclusively KnownArtifactKind and know that we will be able to add to that in the future?
There are two parts to this:
- For the DB schema, I do think we must use arbitrary strings. In the case of an update, an old version of Nexus must be comfortable doing something reasonable with artifacts present in newer TUF repos. I don't really see a good way out of this.
- For the HTTP endpoints: assuming that the DB schema must be this way, is there any path to providing something like KnownArtifactKind at all?
I don't think I'm ready to concede that the DB schema has to use strings instead of enums. We use enums in the DB schema many places that may grow over time (metric producers, network interface kind, zone types, etc.), and we're going to have to be able to add variants to those. I don't know exactly how we're doing that yet, but I don't see why artifact kinds would be any different. If we require schema migrations to follow the same kind of rules as what we've talked about for API versioning, I assume it'd be something like "break into two steps: step one just updates the enum and must complete before step two uses it".
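The two-step approach sketched above might look roughly like this (the type name and variant are hypothetical, not the actual omicron schema; CockroachDB supports ALTER TYPE ... ADD VALUE):

```sql
-- Step one (ships in update N): add the new variant. This migration must be
-- fully applied before any code writes the value.
ALTER TYPE omicron.public.update_artifact_kind ADD VALUE IF NOT EXISTS 'new_kind';

-- Step two (ships in update N+1): code paths may now read and write
-- 'new_kind', because every deployed schema already contains it.
```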
(Sorry I'm just coming up to speed on this and I'm kind of thinking out loud!)
What would an old version of Nexus be able to sensibly do with an artifact kind that it doesn't know about? It could distribute it, but how would it know who to distribute it to or when? Or is it that it can't do anything (yet), but just needs to store the metadata until the upgrade?
I assume the risks we're worried about (at least, the ones I'm worried about) are that using strings makes it super easy to (1) accidentally put an actually invalid value into the field (like control_plane_zone instead of control_plane) and (2) write code that operates on input that it doesn't correctly support without realizing it. Using an enum is a nice way for any hunk of code to represent that it at least in principle understands what it's working with.
Would it help to store both a "raw" artifact kind as a string and an enum that includes an "unknown" variant? Use the enum where possible. But after any upgrade, after the schema update, go through and update the enum value for anything that's "unknown" based on the raw value. This is a little more work, but I think not much? and gives consumers of this information a way to use the well-typed thing.
Another approach would be to store just a string in the database, but have the model type provide a Rust enum of known values. One option here would be to include an "Unknown" variant with the raw string value. But maybe a better one is that we provide two model types for this table: one uses a Rust enum without the "Unknown" variant and the other provides the string. Hopefully most consumers would use the first one, but in the rare cases where we need to be able to talk about the data we don't understand, we can do that.
I'm not sure -- just brainstorming here.
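A minimal sketch of the "string in the DB, typed view in Rust" idea being brainstormed here (names and kind strings are illustrative; the actual omicron types differ): store the raw kind, and expose a parsed view that yields None for kinds this version doesn't know:

```rust
#[derive(Debug, PartialEq)]
enum KnownArtifactKind {
    ControlPlane,
    SwitchSp,
    SwitchRot,
}

/// Raw kind exactly as stored in the database: always representable,
/// even when this Nexus predates the kind.
#[derive(Debug)]
struct ArtifactKind(String);

impl ArtifactKind {
    /// Typed view. `None` means "store and pass through, but don't act".
    fn to_known(&self) -> Option<KnownArtifactKind> {
        match self.0.as_str() {
            "control_plane" => Some(KnownArtifactKind::ControlPlane),
            "switch_sp" => Some(KnownArtifactKind::SwitchSp),
            "switch_rot" => Some(KnownArtifactKind::SwitchRot),
            _ => None,
        }
    }
}

fn main() {
    let known = ArtifactKind("switch_sp".to_string());
    let unknown = ArtifactKind("future_kind".to_string());
    assert_eq!(known.to_known(), Some(KnownArtifactKind::SwitchSp));
    // Unknown kinds are preserved as raw strings rather than rejected.
    assert_eq!(unknown.to_known(), None);
    println!("ok");
}
```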
I don't think I'm ready to concede that the DB schema has to use strings instead of enums. We use enums in the DB schema many places that may grow over time (metric producers, network interface kind, zone types, etc.), and we're going to have to be able to add variants to those. I don't know exactly how we're doing that yet, but I don't see why artifact kinds would be any different. If we require schema migrations to follow the same kind of rules as what we've talked about for API versioning, I assume it'd be something like "break into two steps: step one just updates the enum and must complete before step two uses it".
I think one major difference is unlike all the other examples, artifact kinds are part of the update process. In your proposed model, step one to update the enum must somehow happen before the artifacts are ingested. This seems like a circular problem.
There are probably ways to resolve it. For example, one thing that came to mind as I was writing this is that we could imagine shipping a "pre-ingest" schema update within the TUF repo -- and then the current version of Nexus is responsible for applying that schema update before processing the TUF repo. But that seems like it blows up the complexity of the process quite significantly, for a benefit that doesn't (at first) appear commensurate.
What would an old version of Nexus be able to sensibly do with an artifact kind that it doesn't know about? It could distribute it, but how would it know who to distribute it to or when? Or is it that it can't do anything (yet), but just needs to store the metadata until the upgrade?
I think at the very least the old version of Nexus should store the artifact, so that if the new version of Nexus can use it somehow it'll know what to do with it.
One of the main consumers of artifacts is sled-agent -- a possible state where a new version of sled-agent hits an old version of Nexus, asking for an artifact that it (but not the old Nexus) knows about.
Would it help to store both a "raw" artifact kind as a string and an enum that includes an "unknown" variant? Use the enum where possible. But after any upgrade, after the schema update, go through and update the enum value for anything that's "unknown" based on the raw value. This is a little more work, but I think not much? and gives consumers of this information a way to use the well-typed thing.
Hmm, what would the benefit be of doing this in persistent storage, versus doing it in code? We already have roughly this conversion between ArtifactKind and KnownArtifactKind.

But maybe a better one is that we provide two model types for this table: one uses a Rust enum without the "Unknown" variant and the other provides the string.

Right, this is roughly ArtifactKind vs KnownArtifactKind.
I think one major difference is unlike all the other examples, artifact kinds are part of the update process. In your proposed model, step one to update the enum must somehow happen before the artifacts are ingested. This seems like a circular problem.
We could require this to be split over two updates; the first update adds the new case(s) to the enum, and the second uses them. This seems similar to the plan for backwards-incompatible API changes, where the first update is still compatible with previous versions, and the second update is only compatible with the immediately-prior version (which brings the schema in line with what that second update expects to be there). But maybe this is more onerous than it's worth, unless we end up needing to split updates like this anyway.
I'm left with a kind of uneasy feeling here - if the right thing to do is use strings, I'm worried there are other places where we're using serde (especially outside of API interfaces for which we have a plan) that are going to be more painful than we realize. But I don't really disagree with any of your arguments, and I don't want to hold up this PR on an uneasy feeling, so I'm happy to move forward with what you have.
I've addressed most of the review comments here -- think I've covered everything!
Thanks! Sorry for being a slowpoke on the review followup.
Ah crap, this is going to conflict with #4890.
Implement basic support for uploading TUF repos via an endpoint. The PR looks pretty big but most of it is fairly mechanical addition and removal (and much of it has to be done in one go because of internal dependencies.)
Also include a few more changes:
- Move the by_hash and by_id maps to update-common's UpdatePlanBuilder.

This PR does not include actually storing TUF repos and replicating them among sleds. That work has been deprioritized for now, to instead focus on sled addition and removal.
TODO:
Depends on #4659.
Depends on #4707.