
[nexus] add support for ingesting TUF repos #4690

Merged

Conversation


@sunshowers sunshowers commented Dec 13, 2023

Implement basic support for uploading TUF repos via an endpoint. The PR looks pretty big, but most of it is fairly mechanical addition and removal (and much of it has to be done in one go because of internal dependencies).

Also include a few more changes:

  • Move more code to update-common.
  • Move the by_hash and by_id maps to update-common's UpdatePlanBuilder.
  • Remove old update-related code and database migrations that will be replaced by the newer blueprint design. (This is the vast majority of the deleted code, and would ideally be a separate PR, except that it's a bit inconvenient to have a PR stack with multiple schema migrations.)

This PR does not include actually storing TUF repos and replicating them among sleds. That work has been deprioritized for now, to instead focus on sled addition and removal.

TODO:

  • Finish hooking up endpoint.
  • DB migrations.
  • Tests.

Depends on #4659.

Depends on #4707.

Created using spr 1.3.5
@smklein smklein left a comment


DB structure looks good to me; a couple of comments, but nothing blocking!

Comment on lines 71 to 72
/// This is equivalent to [`external::TufRepoMeta`], but is more
/// Diesel-friendly.
Collaborator

I welcome feedback on this, but the split between "external" and "db-model" objects is intentional. It lets us make database changes without breaking the API.

(They used to all be identical, but this meant that DB refactors -- even column re-ordering -- would result in a breaking API change)

Contributor Author

Ah yeah that makes sense.

Comment on lines +1890 to +1891
-- Identified by both a random uuid and its SHA256 hash. The hash could be the
-- primary key, but it seems unnecessarily large and unwieldy.
Collaborator

First and foremost, I think the choice to use a UUID PK is totally reasonable here. We do so for many other objects that still have unique constraints elsewhere.

That being said: I had thought that the authz macros were the spot where a non-UUID PK was the most painful, but it seems like you've been able to work around that.

Isn't the "real" PK basically the system_version? As in: for a single "system version", shouldn't we have a single TUF repo, or am I misunderstanding this relationship?

Contributor Author

Yeah, the real primary key is the system version. But I think I was a bit concerned about an arbitrary-length DB primary key.
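The trade-off discussed here could be sketched as follows (a hypothetical table, not the actual omicron schema): a compact UUID surrogate primary key, with the natural keys still enforced through UNIQUE constraints.

```sql
-- Hypothetical sketch, not the actual omicron schema: use a fixed-width
-- UUID as the primary key, while the "real" natural keys (system version
-- and repo hash) remain unique via separate constraints.
CREATE TABLE IF NOT EXISTS tuf_repo_sketch (
    id UUID PRIMARY KEY,
    sha256 STRING(64) NOT NULL,
    system_version STRING(64) NOT NULL,
    time_created TIMESTAMPTZ NOT NULL,
    CONSTRAINT unique_system_version UNIQUE (system_version),
    CONSTRAINT unique_checksum UNIQUE (sha256)
);
```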

-- Switch artifacts
'switch_sp',
'switch_rot'
CONSTRAINT unique_checksum UNIQUE (sha256),
Collaborator

Why must the checksum be unique? I don't think it's likely, but if we decided to publish a "new version, with no updated software", would the TUF repo have the same contents?

Or does this content also contain version info, so we'd always expect it to be unique, and this is just additional validation?

Contributor Author

My understanding is that the content would also contain version info, yeah.

Contributor Author

To confirm -- yes, artifacts.json contains the system version. When that changes, the hash will also change.

sunshowers added a commit that referenced this pull request Jan 9, 2024
…x serialization (#4781)

This is prep work for #4690. I'd like to add support for making changes
to a DeserializedManifest, then serializing it.

However, it turned out that the bytesize crate does not serialize bytes
correctly (bytesize-rs/bytesize#40). To address
this, just use a u64 and write our own deserializer, using the alternative
parse-size crate.

Also add a test to ensure that serialization of the fake manifest
roundtrips correctly.
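The idea behind the fix can be sketched with a toy stdlib-only parser (the actual PR uses the parse-size crate; names here are illustrative, not omicron's):

```rust
// Toy sketch: store plain u64 byte counts and parse human-readable sizes
// ourselves, instead of relying on bytesize's serialization. Only a few
// binary units are handled; the real parse-size crate is far more complete.
fn parse_byte_size(s: &str) -> Option<u64> {
    let s = s.trim();
    // Split the numeric prefix from the unit suffix.
    let split = s.find(|c: char| !c.is_ascii_digit()).unwrap_or(s.len());
    let (num, unit) = s.split_at(split);
    let num: u64 = num.parse().ok()?;
    let multiplier: u64 = match unit.trim() {
        "" | "B" => 1,
        "KiB" => 1 << 10,
        "MiB" => 1 << 20,
        "GiB" => 1 << 30,
        _ => return None,
    };
    num.checked_mul(multiplier)
}

fn main() {
    assert_eq!(parse_byte_size("1024"), Some(1024));
    assert_eq!(parse_byte_size("4 MiB"), Some(4 * 1024 * 1024));
    assert_eq!(parse_byte_size("4 XB"), None);
}
```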
david-crespo and others added 2 commits January 16, 2024 13:40
@sunshowers sunshowers changed the base branch from sunshowers/spr/main.wip-nexus-add-support-for-ingesting-tuf-repos to main January 16, 2024 21:40
@sunshowers sunshowers marked this pull request as draft January 16, 2024 22:08
sunshowers added a commit that referenced this pull request Jan 17, 2024
In #4690 I'd like to be able to generate a tufaceous repo twice and
expect that each artifact has the same hash.

Writing tests for this is a bit annoying right now because the overall
repo has a different hash when generated again (since TUF bakes
timestamps into its format), but each artifact has the same hash. #4690
has other work going on which makes writing this test quite simple, so
I'll include the test there.
@sunshowers sunshowers changed the title [WIP] [nexus] add support for ingesting TUF repos [nexus] add support for ingesting TUF repos Jan 17, 2024
@sunshowers sunshowers marked this pull request as ready for review January 17, 2024 22:33

/// Test that the archive generated by running `tufaceous assemble` twice
/// has the same artifacts and hashes.
#[tokio::test(flavor = "multi_thread", worker_threads = 2)]
Contributor Author

This is the promised test for #4823.

sunshowers added a commit that referenced this pull request Jan 18, 2024
In #4690 I was dealing with some of these paths, and converting back and
forth is a bit annoying.

There is no loss of generality here because all of these paths are read
from a config file -- and are therefore expected to be UTF-8 anyway.
PRIMARY KEY (
tuf_repo_id, tuf_artifact_name, tuf_artifact_version, tuf_artifact_kind
Contributor

This makes me tempted to propose a surrogate UUID primary key for tuf_artifact. Happy to defer to other folks with more DB experience, though. Is the pain of a surrogate key worth not having this relatively big (3x STRING(63)) foreign key?

Contributor Author

According to this post, it seems like the main thing to look for with natural keys is the ability to distribute well. In this case the first entry is tuf_repo_id, which is a UUID, so this key should be okay.

@@ -127,7 +127,7 @@ impl UpdateManager {

let response = nexus
.cpapi_artifact_download(
nexus_client::types::KnownArtifactKind::ControlPlane,
&KnownArtifactKind::ControlPlane.to_string(),
Contributor

I can't decide if the change from KnownArtifactKind to String here (and on the server side and in the db schema) is a good defense against future unknown artifact kinds, or if we're giving up some type safety unnecessarily, since we will have to be able to evolve a variety of HTTP endpoints that have enums (including adding new variants) anyway. If we're going the route of oxidecomputer/dropshot#869, could we use exclusively KnownArtifactKind and know that we will be able to add to it in the future?

Contributor Author

There are two parts to this:

  1. For the DB schema, I do think we must use arbitrary strings. In the case of an update, an old version of Nexus must be comfortable doing something reasonable with artifacts present in newer TUF repos. I don't really see a good way out of this.
  2. For the HTTP endpoints: assuming that the DB schema must be this way, is there any path to providing something like KnownArtifactKind at all?

Contributor

I don't think I'm ready to concede that the DB schema has to use strings instead of enums. We use enums in the DB schema many places that may grow over time (metric producers, network interface kind, zone types, etc.), and we're going to have to be able to add variants to those. I don't know exactly how we're doing that yet, but I don't see why artifact kinds would be any different. If we require schema migrations to follow the same kind of rules as what we've talked about for API versioning, I assume it'd be something like "break into two steps: step one just updates the enum and must complete before step two uses it".

Collaborator

(Sorry I'm just coming up to speed on this and I'm kind of thinking out loud!)

What would an old version of Nexus be able to sensibly do with an artifact kind that it doesn't know about? It could distribute it, but how would it know who to distribute it to or when? Or is it that it can't do anything (yet), but just needs to store the metadata until the upgrade?

I assume the risks we're worried about (at least, the ones I'm worried about) are that using strings makes it super easy to (1) accidentally put an actually invalid value into the field (like control_plane_zone instead of control_plane) and (2) write code that operates on input that it doesn't correctly support without realizing it. Using an enum is a nice way for any hunk of code to assert that it at least in principle understands what it's working with.

Would it help to store both a "raw" artifact kind as a string and an enum that includes an "unknown" variant? Use the enum where possible. But after any upgrade, after the schema update, go through and update the enum value for anything that's "unknown" based on the raw value. This is a little more work, but I think not much? and gives consumers of this information a way to use the well-typed thing.

Another approach would be to store just a string in the database, but have the model type provide a Rust enum of known values. One option here would be to include an "Unknown" variant with the raw string value. But maybe a better one is that we provide two model types for this table: one uses a Rust enum without the "Unknown" variant and the other provides the string. Hopefully most consumers would use the first one, but in the rare cases where we need to be able to talk about the data we don't understand, we can do that.

I'm not sure -- just brainstorming here.

Contributor Author

@jgallagher:

I don't think I'm ready to concede that the DB schema has to use strings instead of enums. We use enums in the DB schema many places that may grow over time (metric producers, network interface kind, zone types, etc.), and we're going to have to be able to add variants to those. I don't know exactly how we're doing that yet, but I don't see why artifact kinds would be any different. If we require schema migrations to follow the same kind of rules as what we've talked about for API versioning, I assume it'd be something like "break into two steps: step one just updates the enum and must complete before step two uses it".

I think one major difference is unlike all the other examples, artifact kinds are part of the update process. In your proposed model, step one to update the enum must somehow happen before the artifacts are ingested. This seems like a circular problem.

There are probably ways to resolve it. For example, one thing that came to mind as I was writing this is that we could imagine shipping a "pre-ingest" schema update within the TUF repo -- and then the current version of Nexus is responsible for applying that schema update before processing the TUF repo. But that seems like it blows up the complexity of the process quite significantly, for a benefit that doesn't (at first) appear commensurate.

@davepacheco:

What would an old version of Nexus be able to sensibly do with an artifact kind that it doesn't know about? It could distribute it, but how would it know who to distribute it to or when? Or is it that it can't do anything (yet), but just needs to store the metadata until the upgrade?

I think at the very least the old version of Nexus should store the artifact, so that if the new version of Nexus can use it somehow it'll know what to do with it.

One of the main consumers of artifacts is sled-agent -- there's a possible state where a new version of sled-agent hits an old version of Nexus, asking for an artifact that it (but not the old Nexus) knows about.

Would it help to store both a "raw" artifact kind as a string and an enum that includes an "unknown" variant? Use the enum where possible. But after any upgrade, after the schema update, go through and update the enum value for anything that's "unknown" based on the raw value. This is a little more work, but I think not much? and gives consumers of this information a way to use the well-typed thing.

Hmm, what would the benefit be of doing this in persistent storage, versus doing it in code? We already have roughly this conversion between ArtifactKind and KnownArtifactKind.

But maybe a better one is that we provide two model types for this table: one uses a Rust enum without the "Unknown" variant and the other provides the string.

Right, this is roughly ArtifactKind vs KnownArtifactKind.
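The two-type pattern being discussed could be sketched like this (hypothetical names and variants, not the actual omicron types): an open type that preserves whatever string was stored, plus a closed enum of kinds this version understands.

```rust
// Hypothetical sketch of the ArtifactKind / KnownArtifactKind split:
// the DB stores an arbitrary string; code converts to a closed enum
// where it can, and falls back to the raw string otherwise.
use std::str::FromStr;

/// Closed set of artifact kinds this version of the software understands.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum KnownArtifactKind {
    GimletSp,
    ControlPlane,
    SwitchRot,
}

impl FromStr for KnownArtifactKind {
    type Err = ();
    fn from_str(s: &str) -> Result<Self, ()> {
        match s {
            "gimlet_sp" => Ok(Self::GimletSp),
            "control_plane" => Ok(Self::ControlPlane),
            "switch_rot" => Ok(Self::SwitchRot),
            _ => Err(()),
        }
    }
}

/// Open representation: whatever string was stored in the database.
#[derive(Debug, Clone, PartialEq, Eq)]
struct ArtifactKind(String);

impl ArtifactKind {
    /// Returns the well-typed kind if this version recognizes it.
    fn to_known(&self) -> Option<KnownArtifactKind> {
        self.0.parse().ok()
    }
}

fn main() {
    let known = ArtifactKind("control_plane".to_string());
    assert_eq!(known.to_known(), Some(KnownArtifactKind::ControlPlane));

    // A kind added by a future release still round-trips as a raw string.
    let future = ArtifactKind("mystery_kind".to_string());
    assert_eq!(future.to_known(), None);
    assert_eq!(future.0, "mystery_kind");
}
```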

Contributor

I think one major difference is unlike all the other examples, artifact kinds are part of the update process. In your proposed model, step one to update the enum must somehow happen before the artifacts are ingested. This seems like a circular problem.

We could require this to be split over two updates; the first update adds the new case(s) to the enum, and the second uses them. This seems similar to the plan for backwards-incompatible API changes, where the first update is still compatible with previous versions, and the second update is only compatible with the immediately-prior version (which brings the schema in line with what that second update expects to be there). But maybe this is more onerous than it's worth, unless we end up needing to split updates like this anyway.

I'm left with a kind of uneasy feeling here - if the right thing to do is use strings, I'm worried there are other places where we're using serde (especially outside of API interfaces for which we have a plan) that are going to be more painful than we realize. But I don't really disagree with any of your arguments, and I don't want to hold up this PR on an uneasy feeling, so I'm happy to move forward with what you have.
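The two-step enum migration discussed above could look something like this (hypothetical type and value names, not an actual omicron migration):

```sql
-- Hypothetical sketch of the two-step scheme: release N ships a schema
-- migration that only extends the enum.
ALTER TYPE omicron.public.artifact_kind_sketch ADD VALUE IF NOT EXISTS 'new_kind';
-- Release N+1, which runs only once every node understands the new value,
-- may then start writing rows that use 'new_kind'.
```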

sunshowers added a commit that referenced this pull request Jan 19, 2024
@sunshowers
Contributor Author

I've addressed most of the review comments here -- think I've covered everything!


@jgallagher jgallagher left a comment


Thanks! Sorry for being a slowpoke on the review followup.

@sunshowers sunshowers enabled auto-merge (squash) January 24, 2024 21:27
@sunshowers
Contributor Author

Ah crap, this is going to conflict with #4890.

@sunshowers sunshowers merged commit e69e6f6 into main Jan 25, 2024
22 checks passed
@sunshowers sunshowers deleted the sunshowers/spr/wip-nexus-add-support-for-ingesting-tuf-repos branch January 25, 2024 04:58
5 participants