
[nexus] add support for ingesting TUF repos #4690

Merged

Conversation


@sunshowers sunshowers commented Dec 13, 2023

Implement basic support for uploading TUF repos via an endpoint. The PR looks pretty big, but most of it is fairly mechanical addition and removal (and much of it has to be done in one go because of internal dependencies).

Also include a few more changes:

  • Move more code to update-common.
  • Move the by_hash and by_id maps to update-common's UpdatePlanBuilder.
  • Remove old update-related code and database migrations that will be replaced by the newer blueprint design. (This is the vast majority of the deleted code, and would ideally be a separate PR, except that it's a bit inconvenient to have a PR stack with multiple schema migrations.)

This PR does not include actually storing TUF repos and replicating them among sleds. That work has been deprioritized for now, to instead focus on sled addition and removal.

TODO:

  • Finish hooking up endpoint.
  • DB migrations.
  • Tests.

Depends on #4659.

Depends on #4707.

Created using spr 1.3.5
@smklein smklein left a comment


DB structure looks good to me; a couple of comments, but nothing blocking!

Comment on lines 71 to 72
/// This is equivalent to [`external::TufRepoMeta`], but is more
/// Diesel-friendly.
Collaborator

I welcome feedback on this, but the split between "external" and "db-model" objects is intentional. It lets us make database changes without breaking the API.

(They used to all be identical, but this meant that DB refactors -- even column re-ordering -- would result in a breaking API change)

Contributor Author

Ah yeah that makes sense.

Comment on lines +1890 to +1891
-- Identified by both a random uuid and its SHA256 hash. The hash could be the
-- primary key, but it seems unnecessarily large and unwieldy.
Collaborator

First and foremost, I think the choice to use a UUID PK is totally reasonable here. We do so for many other objects that still have unique constraints elsewhere.

That being said: I had thought that the authz macros were the spot where a non-UUID PK was the most painful, but it seems like you've been able to work around that.

Isn't the "real" PK basically the system_version? As in: for a single "system version", shouldn't we have a single TUF repo, or am I misunderstanding this relationship?

Contributor Author

Yeah, the real primary key is the system version. But I think I was a bit concerned about an arbitrary-length DB primary key.
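The trade-off discussed here could be sketched as follows (a hypothetical table, not the actual omicron schema): a compact UUID surrogate primary key, with the natural keys still enforced through UNIQUE constraints.

```sql
-- Hypothetical sketch, not the actual omicron schema: use a fixed-width
-- UUID as the primary key, while the "real" natural keys (system version
-- and repo hash) remain unique via separate constraints.
CREATE TABLE IF NOT EXISTS tuf_repo_sketch (
    id UUID PRIMARY KEY,
    sha256 STRING(64) NOT NULL,
    system_version STRING(64) NOT NULL,
    time_created TIMESTAMPTZ NOT NULL,
    CONSTRAINT unique_system_version UNIQUE (system_version),
    CONSTRAINT unique_checksum UNIQUE (sha256)
);
```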

-- Switch artifacts
'switch_sp',
'switch_rot'
CONSTRAINT unique_checksum UNIQUE (sha256),
Collaborator

Why must the checksum be unique? I don't think it's likely, but if we decided to publish a "new version, with no updated software", would the TUF repo have the same contents?

Or does this content also contain version info, so we'd always expect it to be unique, and this is just additional validation?

Contributor Author

My understanding is that the content would also contain version info, yeah.

Contributor Author

To confirm -- yes, artifacts.json contains the system version. When that changes, the hash will also change.

sunshowers added a commit that referenced this pull request Jan 9, 2024
…x serialization (#4781)

This is prep work for #4690. I'd like to add support for making changes
to a DeserializedManifest, then serializing it.

However, it turned out that the bytesize crate does not serialize bytes
correctly (bytesize-rs/bytesize#40). To address
this, just use a u64 and write our own deserializer, using the alternative
parse-size crate.

Also add a test to ensure that serialization of the fake manifest
roundtrips correctly.
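The idea behind the fix can be sketched with a toy stdlib-only parser (the actual PR uses the parse-size crate; names here are illustrative, not omicron's):

```rust
// Toy sketch: store plain u64 byte counts and parse human-readable sizes
// ourselves, instead of relying on bytesize's serialization. Only a few
// binary units are handled; the real parse-size crate is far more complete.
fn parse_byte_size(s: &str) -> Option<u64> {
    let s = s.trim();
    // Split the numeric prefix from the unit suffix.
    let split = s.find(|c: char| !c.is_ascii_digit()).unwrap_or(s.len());
    let (num, unit) = s.split_at(split);
    let num: u64 = num.parse().ok()?;
    let multiplier: u64 = match unit.trim() {
        "" | "B" => 1,
        "KiB" => 1 << 10,
        "MiB" => 1 << 20,
        "GiB" => 1 << 30,
        _ => return None,
    };
    num.checked_mul(multiplier)
}

fn main() {
    assert_eq!(parse_byte_size("1024"), Some(1024));
    assert_eq!(parse_byte_size("4 MiB"), Some(4 * 1024 * 1024));
    assert_eq!(parse_byte_size("4 XB"), None);
}
```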
david-crespo and others added 2 commits January 16, 2024 13:40
@sunshowers sunshowers changed the base branch from sunshowers/spr/main.wip-nexus-add-support-for-ingesting-tuf-repos to main January 16, 2024 21:40
@sunshowers sunshowers marked this pull request as draft January 16, 2024 22:08
sunshowers added a commit that referenced this pull request Jan 17, 2024
In #4690 I'd like to be able to generate a tufaceous repo twice and
expect that each artifact has the same hash.

Writing tests for this is a bit annoying right now because the overall
repo has a different hash when generated again (since TUF bakes
timestamps into its format), but each artifact has the same hash. #4690
has other work going on which makes writing this test quite simple, so
I'll include the test there.
@sunshowers sunshowers changed the title [WIP] [nexus] add support for ingesting TUF repos [nexus] add support for ingesting TUF repos Jan 17, 2024
@sunshowers sunshowers marked this pull request as ready for review January 17, 2024 22:33

/// Test that the archive generated by running `tufaceous assemble` twice
/// has the same artifacts and hashes.
#[tokio::test(flavor = "multi_thread", worker_threads = 2)]
Contributor Author

This is the promised test for #4823.

sunshowers added a commit that referenced this pull request Jan 18, 2024
In #4690 I was dealing with some of these paths, and converting back and
forth is a bit annoying.

There is no loss of generality here because all of these paths are read
from a config file -- and are therefore expected to be UTF-8 anyway.
PRIMARY KEY (
tuf_repo_id, tuf_artifact_name, tuf_artifact_version, tuf_artifact_kind
Contributor

This makes me tempted to propose a surrogate UUID primary key for tuf_artifact. Happy to defer to other folks with more DB experience, though. Is the pain of a surrogate key worth not having this relatively big (3x STRING(63)) foreign key?

Contributor Author

According to this post, it seems like the main thing to look for with natural keys is the ability to distribute well. In this case the first entry is tuf_repo_id, which is a UUID, so this key should be okay.

@@ -127,7 +127,7 @@ impl UpdateManager {

let response = nexus
.cpapi_artifact_download(
nexus_client::types::KnownArtifactKind::ControlPlane,
&KnownArtifactKind::ControlPlane.to_string(),
Contributor

I can't decide if the change from KnownArtifactKind to String here (and on the server side and in the db schema) is a good defense against future unknown artifact kinds, or if we're giving up some type safety unnecessarily, since we will have to be able to evolve a variety of HTTP endpoints that have enums (including adding new variants) anyway. If we're going the route of oxidecomputer/dropshot#869, could we use exclusively KnownArtifactKind and know that we will be able to add to it in the future?

Contributor Author

There are two parts to this:

  1. For the DB schema, I do think we must use arbitrary strings. In the case of an update, an old version of Nexus must be comfortable doing something reasonable with artifacts present in newer TUF repos. I don't really see a good way out of this.
  2. For the HTTP endpoints: assuming that the DB schema must be this way, is there any path to providing something like KnownArtifactKind at all?

Contributor

I don't think I'm ready to concede that the DB schema has to use strings instead of enums. We use enums in the DB schema many places that may grow over time (metric producers, network interface kind, zone types, etc.), and we're going to have to be able to add variants to those. I don't know exactly how we're doing that yet, but I don't see why artifact kinds would be any different. If we require schema migrations to follow the same kind of rules as what we've talked about for API versioning, I assume it'd be something like "break into two steps: step one just updates the enum and must complete before step two uses it".

Collaborator

(Sorry I'm just coming up to speed on this and I'm kind of thinking out loud!)

What would an old version of Nexus be able to sensibly do with an artifact kind that it doesn't know about? It could distribute it, but how would it know who to distribute it to or when? Or is it that it can't do anything (yet), but just needs to store the metadata until the upgrade?

I assume the risks we're worried about (at least, the ones I'm worried about) are that using strings makes it super easy to (1) accidentally put an actually invalid value into the field (like control_plane_zone instead of control_plane) and (2) write code that operates on input that it doesn't correctly support without realizing it. Using an enum is a nice way for any hunk of code to assert that it at least in principle understands what it's working with.

Would it help to store both a "raw" artifact kind as a string and an enum that includes an "unknown" variant? Use the enum where possible. But after any upgrade, after the schema update, go through and update the enum value for anything that's "unknown" based on the raw value. This is a little more work, but I think not much? and gives consumers of this information a way to use the well-typed thing.

Another approach would be to store just a string in the database, but have the model type provide a Rust enum of known values. One option here would be to include an "Unknown" variant with the raw string value. But maybe a better one is that we provide two model types for this table: one uses a Rust enum without the "Unknown" variant and the other provides the string. Hopefully most consumers would use the first one, but in the rare cases where we need to be able to talk about the data we don't understand, we can do that.

I'm not sure -- just brainstorming here.

Contributor Author

@jgallagher:

I don't think I'm ready to concede that the DB schema has to use strings instead of enums. We use enums in the DB schema many places that may grow over time (metric producers, network interface kind, zone types, etc.), and we're going to have to be able to add variants to those. I don't know exactly how we're doing that yet, but I don't see why artifact kinds would be any different. If we require schema migrations to follow the same kind of rules as what we've talked about for API versioning, I assume it'd be something like "break into two steps: step one just updates the enum and must complete before step two uses it".

I think one major difference is unlike all the other examples, artifact kinds are part of the update process. In your proposed model, step one to update the enum must somehow happen before the artifacts are ingested. This seems like a circular problem.

There are probably ways to resolve it. For example, one thing that came to mind as I was writing this is that we could imagine shipping a "pre-ingest" schema update within the TUF repo -- and then the current version of Nexus is responsible for applying that schema update before processing the TUF repo. But that seems like it blows up the complexity of the process quite significantly, for a benefit that doesn't (at first) appear commensurate.

@davepacheco:

What would an old version of Nexus be able to sensibly do with an artifact kind that it doesn't know about? It could distribute it, but how would it know who to distribute it to or when? Or is it that it can't do anything (yet), but just needs to store the metadata until the upgrade?

I think at the very least the old version of Nexus should store the artifact, so that if the new version of Nexus can use it somehow it'll know what to do with it.

One of the main consumers of artifacts is sled-agent -- there's a possible state where a new version of sled-agent hits an old version of Nexus, asking for an artifact that it (but not the old Nexus) knows about.

Would it help to store both a "raw" artifact kind as a string and an enum that includes an "unknown" variant? Use the enum where possible. But after any upgrade, after the schema update, go through and update the enum value for anything that's "unknown" based on the raw value. This is a little more work, but I think not much? and gives consumers of this information a way to use the well-typed thing.

Hmm, what would the benefit be of doing this in persistent storage, versus doing it in code? We already have roughly this conversion between ArtifactKind and KnownArtifactKind.

But maybe a better one is that we provide two model types for this table: one uses a Rust enum without the "Unknown" variant and the other provides the string.

Right, this is roughly ArtifactKind vs KnownArtifactKind.
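The two-type pattern being discussed could be sketched like this (hypothetical names and variants, not the actual omicron types): an open type that preserves whatever string was stored, plus a closed enum of kinds this version understands.

```rust
// Hypothetical sketch of the ArtifactKind / KnownArtifactKind split:
// the DB stores an arbitrary string; code converts to a closed enum
// where it can, and falls back to the raw string otherwise.
use std::str::FromStr;

/// Closed set of artifact kinds this version of the software understands.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum KnownArtifactKind {
    GimletSp,
    ControlPlane,
    SwitchRot,
}

impl FromStr for KnownArtifactKind {
    type Err = ();
    fn from_str(s: &str) -> Result<Self, ()> {
        match s {
            "gimlet_sp" => Ok(Self::GimletSp),
            "control_plane" => Ok(Self::ControlPlane),
            "switch_rot" => Ok(Self::SwitchRot),
            _ => Err(()),
        }
    }
}

/// Open representation: whatever string was stored in the database.
#[derive(Debug, Clone, PartialEq, Eq)]
struct ArtifactKind(String);

impl ArtifactKind {
    /// Returns the well-typed kind if this version recognizes it.
    fn to_known(&self) -> Option<KnownArtifactKind> {
        self.0.parse().ok()
    }
}

fn main() {
    let known = ArtifactKind("control_plane".to_string());
    assert_eq!(known.to_known(), Some(KnownArtifactKind::ControlPlane));

    // A kind added by a future release still round-trips as a raw string.
    let future = ArtifactKind("mystery_kind".to_string());
    assert_eq!(future.to_known(), None);
    assert_eq!(future.0, "mystery_kind");
}
```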

Contributor

I think one major difference is unlike all the other examples, artifact kinds are part of the update process. In your proposed model, step one to update the enum must somehow happen before the artifacts are ingested. This seems like a circular problem.

We could require this to be split over two updates; the first update adds the new case(s) to the enum, and the second uses them. This seems similar to the plan for backwards-incompatible API changes, where the first update is still compatible with previous versions, and the second update is only compatible with the immediately-prior version (which brings the schema in line with what that second update expects to be there). But maybe this is more onerous than it's worth, unless we end up needing to split updates like this anyway.

I'm left with a kind of uneasy feeling here - if the right thing to do is use strings, I'm worried there are other places where we're using serde (especially outside of API interfaces for which we have a plan) that are going to be more painful than we realize. But I don't really disagree with any of your arguments, and I don't want to hold up this PR on an uneasy feeling, so I'm happy to move forward with what you have.
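The two-step enum migration discussed above could look something like this (hypothetical type and value names, not an actual omicron migration):

```sql
-- Hypothetical sketch of the two-step scheme: release N ships a schema
-- migration that only extends the enum.
ALTER TYPE omicron.public.artifact_kind_sketch ADD VALUE IF NOT EXISTS 'new_kind';
-- Release N+1, which runs only once every node understands the new value,
-- may then start writing rows that use 'new_kind'.
```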

sunshowers added a commit that referenced this pull request Jan 19, 2024
@sunshowers
Contributor Author

I've addressed most of the review comments here -- think I've covered everything!


@jgallagher jgallagher left a comment


Thanks! Sorry for being a slowpoke on the review followup.

@sunshowers sunshowers enabled auto-merge (squash) January 24, 2024 21:27
@sunshowers
Contributor Author

Ah crap, this is going to conflict with #4890.

@sunshowers sunshowers merged commit e69e6f6 into main Jan 25, 2024
22 checks passed
@sunshowers sunshowers deleted the sunshowers/spr/wip-nexus-add-support-for-ingesting-tuf-repos branch January 25, 2024 04:58
5 participants