
set up Recovery Silo and user during rack initialization #2943

Merged: 9 commits merged into main on May 3, 2023

Conversation

@davepacheco (Collaborator) commented Apr 27, 2023:

This change plumbs configuration for the initial (Recovery) Silo (see RFD 234), the initial (privileged) user in that Silo, and the delegated DNS name for external DNS all the way from RSS to Nexus. Nexus now creates that Silo and user, grants the user Fleet Admin privileges, and so on.

Fixes #2304.
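For context, a rough sketch of the shape of the configuration handed from RSS to Nexus. The silo_name and user_name fields appear in a test snippet later in this review; the password-hash field and the helper types are assumptions based on the review discussion below, not the exact struct:

// Rough sketch only -- field and type names other than silo_name and
// user_name are stand-ins inferred from the review discussion.
pub struct RecoverySiloConfig {
    pub silo_name: Name,
    pub user_name: UserId,
    // PHC-format argon2id hash of the initial user's password; the
    // cleartext password is never carried over the RSS -> Nexus handoff.
    pub user_password_hash: PasswordHashString,
}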

@davepacheco marked this pull request as ready for review on April 28, 2023, 20:23
@davepacheco requested a review from smklein on April 28, 2023, 21:17

@smklein (Collaborator) left a comment:

Looks great, thanks for adding this!

    let conn = self.pool_authorized(opctx).await?;
    self.silo_create_conn(
        conn,
        opctx,

@smklein (Collaborator):

I know this isn't totally new in this PR, but what's the justification for separately passing an opctx and a group_opctx?

I've always considered OpContext a representation of "who called this function, and with what authn/authz", so seeing two of them in a single function is a little jarring. After all, isn't only one "caller" asking for the silo to be created?

(I think this has something to do with "automatically giving admin access to a particular group within a newly created silo". I guess I'm just a little surprised to see that take the form of an OpContext, since I would have expected this group name to just be a "name" parameter. Passing an OpContext may be totally fine here, but it's probably worth clarifying the distinction between two parameters to a function that happen to have the same type.)

@davepacheco (Collaborator, Author):

Yeah, the problem here is that two different actors perform this operation together: (1) the end user creating the Silo, and (2) Nexus itself, acting on behalf of that same user. The reason is that while the person is probably a Fleet Admin, they do not generally have the privilege to create a group inside the Silo (see #1580, #1681), and they certainly don't have permission to read and modify the fleet-wide DNS configuration. But we do want them to be able to create the group and read/modify the DNS configuration in this specific, highly controlled case. This is similar to how, during provisioning, Nexus needs access to the list of Sleds in the system even though the user doing the provisioning doesn't have permission to see that. In both cases, Nexus assumes its own identity to do those operations. This one is a little more complicated because we need both OpContexts: we can't just use Nexus's, because we want to use the user's OpContext to decide whether they're allowed to do other things as part of this operation, like create the Silo itself.

There's a comment about this in the caller:

// Silo group creation happens as Nexus's "external authn" context,
// not the user's context here. The user may not have permission to
// create arbitrary groups in the Silo, but we allow them to create
// this one in this case.

I will update that comment to talk about DNS and add some comments here to explain this.
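
To make the division of responsibilities concrete, here is a minimal sketch of how the two contexts might be used. This is not the actual silo_create_conn() code; the datastore method names and details are hypothetical stand-ins:

// Hypothetical sketch -- not the real datastore code. `opctx` is the end
// user's context and authorizes what they explicitly asked for (creating
// the Silo); `nexus_opctx` is Nexus's own "external authn" identity, used
// only for the steps the user is not normally allowed to perform.
async fn silo_create_sketch(
    datastore: &DataStore,
    opctx: &OpContext,       // the end user's context
    nexus_opctx: &OpContext, // Nexus's privileged context
    new_silo: SiloCreateParams,
) -> Result<Silo, Error> {
    // The user themselves must be authorized to create the Silo.
    let silo = datastore.silo_insert(opctx, &new_silo).await?;

    // Group creation and fleet-wide DNS changes run under Nexus's own
    // identity, because the caller typically lacks those privileges.
    datastore.silo_group_ensure(nexus_opctx, &silo, "admin").await?;
    datastore.dns_update(nexus_opctx, &silo).await?;

    Ok(silo)
}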

@davepacheco (Collaborator, Author):

Fixed in the latest commit.

        HashMap::new(),
    )
}
/// returns a

@smklein (Collaborator):

?

@davepacheco (Collaborator, Author):

Oops!

@davepacheco (Collaborator, Author):

Fixed in the latest commit.

Comment on lines +712 to +721
// It should *not* show up in the list because it's not discoverable.
assert_eq!(silos.len(), 0);
let (authz_silo, db_silo) = LookupPath::new(&opctx, &datastore)
    .silo_name(&nexus_db_model::Name(
        rack_init.recovery_silo.identity.name.clone(),
    ))
    .fetch()
    .await
    .expect("Failed to lookup Silo");
assert!(!db_silo.discoverable);

@smklein (Collaborator):

Do we confirm that it's not possible for an operator to create a new silo with the same name as the recovery silo?

@davepacheco (Collaborator, Author):

Only by way of existing tests that verify that you cannot create a Silo with a name that conflicts with an existing Silo. I could add an explicit test here.

@smklein (Collaborator):

Yeah, I basically just wanted to confirm that all the otherwise-valid CRUD operations are unable to touch this one.

@davepacheco (Collaborator, Author):

I filed #2985 for part of this. I'm also adding a test that we cannot re-use the name while it's in use. (Turns out there was a bug here where we would fail with a 500 because we were not handling the database error correctly. This is not specific to the recovery Silo.)
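
Roughly, the kind of check being added (a hypothetical sketch; the helper try_create_silo and the test-harness types are stand-ins, not the actual nexus-test-utils API):

// Hypothetical sketch of the test being described, not the real test code.
async fn check_recovery_silo_name_is_reserved(client: &TestClient) {
    // The recovery Silo already exists from rack initialization, so creating
    // another Silo with the same name should fail with a client error (an
    // "already exists" 400), not a 500 from an unhandled database error.
    let err = try_create_silo(client, "recovery")
        .await
        .expect_err("creating a Silo with a conflicting name should fail");
    assert_eq!(err.status(), 400);
    assert!(err.message().contains("already exists"));
}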

@davepacheco (Collaborator, Author):

Test added in the latest commit.

nexus/passwords/src/lib.rs (resolved review thread)
let recovery_silo = RecoverySiloConfig {
    silo_name: silo_name.clone(),
    user_name: user_name.clone(),
    // The test suite's password is "oxide". This password is only used by

@smklein (Collaborator):

Suppose we need to change this value, maybe because we're using a new hash function. How would I go about updating this value?

I noticed the string "oxide" hard-coded in some tests, so this seems a little subtle to change.

@davepacheco (Collaborator, Author):

Good question. What belongs here is a password hash in PHC format, using the argon2id algorithm and parameters at least as strong as what we currently use to generate password hashes for new users. The API documentation for this type explains this. It's a little vague about what "secure enough parameters" means because (1) I didn't want to encode that information in multiple places and have it go out of date, (2) it's an internal API anyway, and (3) if you get it wrong, the failure mode is very explicit (e.g., "password hash: parameter 'm': expected at least 98304 (KiB), found 4096").

As you pointed out, it also needs to be consistent with any places that use the password. There shouldn't be many of these. Presumably tests will fail (somewhat explicitly -- with a login failure) if that happens. There are a few different places that use the same password (e.g., omicron-dev run-all), but that's only for convenience. If you changed this one and "forgot" to update that one, that's fine.

Given these constraints, you can construct this hash with any tool that can generate Argon2 hashes. I use argon2.online for test stuff like this.

Background: I started from the position that it's safer to transfer the password hash in the APIs (from Wicket -> RSS and from RSS -> Nexus) rather than the password itself. That said, for the test suite and omicron-dev run-all, we could define a Rust constant that is the password itself, hash it ourselves, and put the hash into the API calls. That might require more things outside of Nexus to depend on nexus-passwords, but maybe that's okay, and it probably has to happen eventually if Wicket is to hash whatever password the user provides.


I'll think a bit more about whether to commonize some of the password and hash uses in the test suite, run-all, and the simulated sled agent. (I would probably not commonize these with the two config-rss.toml files, even though they use the same password, because (1) nothing depends on those being the same as each other or anything else, (2) those files are supposed to contain exactly what would come in via RSS, so I don't want to put a cleartext password there, and (3) I don't think there's a way to refer to a Rust constant in them.)
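
For illustration, a minimal sketch of producing such a PHC-format hash with the RustCrypto argon2 crate. The cost parameters are only examples keyed off the error message quoted above; the authoritative minimums live in nexus-passwords:

// Sketch only: the cost parameters below are illustrative (m taken from the
// error message above); the real minimums are enforced by nexus-passwords.
use argon2::{
    password_hash::{rand_core::OsRng, PasswordHasher, SaltString},
    Algorithm, Argon2, Params, Version,
};

fn example_phc_hash(password: &str) -> String {
    // m = 98304 KiB of memory, t = 13 iterations, p = 1 lane (example values).
    let params = Params::new(98304, 13, 1, None).expect("valid Argon2 params");
    let argon2 = Argon2::new(Algorithm::Argon2id, Version::V0x13, params);
    let salt = SaltString::generate(&mut OsRng);
    argon2
        .hash_password(password.as_bytes(), &salt)
        .expect("hashing failed")
        .to_string() // e.g. "$argon2id$v=19$m=98304,t=13,p=1$<salt>$<hash>"
}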

@davepacheco (Collaborator, Author):

Updated in the latest commit. With each hardcoded hash I added a comment explaining what depends on it and referring people to the relevant docs. I also had the test suite itself encode the password rather than the hash so that it can generate its own hash and we don't need this hardcoded there.

@davepacheco (Collaborator, Author) commented:

I think I've addressed all of the review feedback (and it was approved anyway), so I'm going to enable auto-merge. Feel free to disable if I've missed something, and of course I'm happy to iterate post-integration as needed.

@davepacheco enabled auto-merge (squash) on May 2, 2023, 23:32
@davepacheco merged commit 3072e20 into main on May 3, 2023
@davepacheco deleted the dap/initial-setup branch on May 3, 2023, 01:09