new internal lookup API #798

davepacheco · 2022-03-22T03:15:06Z

This has been a long time coming: this PR prototypes a new API for looking up resources. This would replace the zillion $foo_lookup_by_path, $foo_lookup_by_id, and $foo_fetch functions in datastore.rs.

I got pretty far into an implementation using macro_rules!, but I think I've reached the limit and need to dive into a proc macro. Before I do that, I want to get feedback on the API itself. Please take a look at the changes in nexus.rs, where I've changed a handful of cases to use the new API. If folks like this, I will dig into the proc macro approach. (I wouldn't recommend looking too hard at lookup.rs -- it's an undocumented mess that I would almost entirely replace if we like this API.)

I'd suggest starting with the summary below, then read through the changes to nexus.rs (it's pretty small, as I only updated a few cases), then take a look at the notes below.

Synopsis: lookups follow a builder pattern. Say you want to fetch an Instance. Previously, you'd use project_lookup_by_path(org_name, project_name) followed by instance_fetch(authz_project, instance_name). Now:

let (authz_org, authz_project, authz_instance) =
    LookupPath::new(opctx, &self.db_datastore)
        .organization_name(org_name)
        .project_name(project_name)
        .instance_name(instance_name)
        .fetch()
        .await?;

As a bonus: you get the authz_org and authz_project, too.

Now, say you already have a project_id (maybe using the authz_project.id() from above). You can look up the Project by id, then look up the Disk in that Project by name:

let (_, _, authz_disk, db_disk) =
    LookupPath::new(opctx, &self.db_datastore)
        .project_id(project_id)
        .disk_name(disk_name)
        .fetch()
        .await?;

The general pattern is that you start at the root (with LookupPath::new), at which point you can feed it (1) any object's id, or (2) the name of a top-level object (currently just organizations, but this will include users and other things). At each level, you'll have functions that let you descend to the next level. At each level, you can call either fetch() (which acts like "fetch" today, which does an authz check and fetches the database row for the object you've selected) or you can call lookup_for(authz::Action) (which does an authz check for the given action and returns the authz object for whatever you've selected, but not the whole database row). More on this below.

If you're still reading, at this point I'd recommend looking at the nexus.rs changes to see more examples. Then come back here.

Here's what I like about the new API/possible implementation:

Ergonomics: today, the only way to fetch a whole database row is to look up the parent container first, then fetch the child inside that container. The new API uses a builder pattern to let you say exactly what you want -- you can start with any kind of id (to jump straight to that object) or else an organization name (to walk down the tree by names). You can mix and match too: start with, say, a project_id, then get an Instance in that Project by name.
Security/maintainability: It's harder to accidentally expose functions that access the database without doing access control because they're private to a separate module. This was a big motivator, though it could also be achieved with a less invasive change that would not need to update all callers.
Security: There are no longer any cases where you're looking up a resource without doing any access check. (This doesn't mean you've done all required access checks.) This deserves some explanation! There are a few common cases for resource lookup:
- You're fetching the contents of a database row. Before this change, this required using foo_fetch(), which always did an access check. That's not very different in this change.
- You're looking up a container solely to fetch something else inside it. For example, you call organization_lookup_by_path in order to call project_fetch() to look up a Project in this Organization After this change, what you'd do instead is directly fetch the Project, without having to look up the Project first. Here's an example.
- You're looking up a container because you're going to wind up fetching several things inside it. Suppose you want to look up an Instance and Disk in the same Project. Before this change, it would be easy (but incorrect) to do a full instance_lookup_by_path and disk_lookup_by_path. These aren't transactional, so you could wind up selecting things in different Projects. Instead, existing code chains the lookups: first look up the Project, then do two fetches to get the Instance and Disk. The new code fetches one of the objects, but the new fetch API returns all the authz objects along the path. So if you look up the disk, you also get back the authz::Project, and you can start a new lookup from there to get to the Instance. Here's an example.
- You're looking up a container in order to create something inside it. Since there's no "fetch" happening, there didn't used to be any authz check here. But rather than keep the existing general-purpose un-authz-checked lookup() function, I've replaced that with lookup_for() which requires an authz::Action. This function does the lookup and an authz check for whatever action you give it. Here's an example. This isn't bulletproof of course, but it should make it harder to accidentally get this wrong.
It's worth restating that: previously, the lookup() functions were not authz-protected at all because you needed to call lookup() on, say, an Organization in order to do a fetch on a Project, but we didn't know at the time of the first lookup whether it would be allowed. By combining the lookup of the Organization with the fetch of the Project, we cover that case and no longer need to expose a general-purpose lookup function that does no authz check. Again, it's not bulletproof: you could still fetch an Instance, get the authz::Project, and then do something you shouldn't be able to do with the authz::Project. But this eliminates a pretty common case that'd be very easy to get wrong.
Maintainability: All the low-level lookup/fetch functions will be generated by macros, meaning that I expect this to be a significant net reduction in lines of code, and the implementations will be guaranteed to be uniform.
We have an OpContext when doing all lookups.
- This is going to be a necessary step for Silos anyway. That means even if we don't do this, we're going to wind up updating all callers of the datastore functions that this API would replace.
- This make it very easy to start using pool_authorized() instead of pool() everywhere, which has the advantage of skipping the database altogether for unauthenticated requests .
(Future) Performance: a consequence of the first point is that where callers used to have two explicit calls into the datastore, there's now only one call to the lookup API. That kind of fetch can be done with a single database query. The current implementation still makes multiple queries, but by having the API better match what the caller's trying to do, we can change the implementation in the future to make only one query.

Obviously there's a lot that I like about this. But why not do this? That's what I want feedback on. It's a different enough API that I think there's some risk we get pretty far into using it before realizing it doesn't handle some important case at all, which would be pretty sad. Two concrete reasons I can think why this might not be good:

Like I said, I changed lookup()/fetch() to return the entire path of authz objects. So if you fetch, say, a VpcSubnet, you'll get back (authz::Organization, authz::Project, authz::Vpc, authz::VpcSubnet, db::model::VpcSubnet). If you only really wanted the subnet, you'll wind up doing a lot of let (_, _, _, authz_subnet, db_subnet) stuff. We could make this a little neater if we reverse the order of these, though I think the reverse order is less intuitive.
This is going to have a huge blast radius (all callers of all lookup/fetch functions). But that needs to happen for Silos anyway. And this can be done incrementally, similar to what we did for authz. (I would initially keep the legacy lookup functions around and replace callers one-by-one with calls to this API.)

Anyway, I'd love to know if folks think this is a promising direction, in which case I'll proceed with fixing up the implementation. (note: there's a working version of this that doesn't return all the ancestors in the path. That's just a few commits back. It's returning the ancestors in a flat tuple where I think I've run into the limits of macro_rules!.)

…les!

nexus/src/db/lookup.rs

smklein · 2022-03-22T15:50:46Z

nexus/src/db/lookup.rs

+    pub fn organization_name<'b, 'c>(self, name: &'b Name) -> Organization<'c>
+    where
+        'a: 'c,
+        'b: 'c,
+    {
+        Organization { key: Key::Name(self, name) }
+    }
+
+    pub fn organization_id(self, id: Uuid) -> Organization<'a> {
+        Organization { key: Key::Id(self, id) }
+    }
+
+    pub fn project_id(self, id: Uuid) -> Project<'a> {
+        Project { key: Key::Id(self, id) }
+    }
+
+    pub fn instance_id(self, id: Uuid) -> Instance<'a> {
+        Instance { key: Key::Id(self, id) }
+    }
+
+    pub fn disk_id(self, id: Uuid) -> Disk<'a> {
+        Disk { key: Key::Id(self, id) }
+    }


So to clarify, if we extend this model to other types, we need to:

Add functions to LookupPath for each resource that can be looked up (generally "by ID")

Add a define_lookup_with_parent macro call

Add manual impls to look up names, for the auto-generated structs?

With the new proc macro implementation, the steps are:

add one function to LookupPath -- fn ${new_resource_name}_id()

add a call to the new lookup_resource macro to define everything else for the new resource

nexus/src/nexus.rs

smklein · 2022-03-22T15:56:09Z

I think the TL;DR of my comments are:

I like the API!
I would be thrilled if there was a way to do this by defining the structs which represent resources, and describing the relationships between them via traits. The scariest part of this patch (to me) is the combination of "manually-implemented" + "macro-implemented" structures.

davepacheco · 2022-03-22T18:33:21Z

For the record, I was able to make this work with macro_rules! and the latest version is fully working. I'm still not sure it's a great approach though.

davepacheco · 2022-03-30T23:01:46Z

I changed lookup()/fetch() to return the entire path of authz objects. So if you fetch, say, a VpcSubnet, you'll get back (authz::Organization, authz::Project, authz::Vpc, authz::VpcSubnet, db::model::VpcSubnet). If you only really wanted the subnet, you'll wind up doing a lot of let (_, _, _, authz_subnet, db_subnet) stuff. We could make this a little neater if we reverse the order of these, though I think the reverse order is less intuitive.

I do like having the full path of objects there. Maybe it should be a struct? You would lose the ordering, though. Reverse order is definitely worth considering. I have a feeling we'll know pretty quickly whether we want it.

I realized that you can do let (.., authz_subnet, db_subnet) just as easily as let (db_subnet, authz_subnet, ..), so the ordering doesn't matter, so I'm sticking with the one I find intuitive. I think a struct would add a lot more boilerplate to most callers.

davepacheco · 2022-03-30T23:06:05Z

Okay, I've made a lot of changes since the first round of reviews, but the API has not changed. I completely replaced the implementation with a proc macro. I think I've addressed all the feedback up to this point. Thanks @smklein @david-crespo and @andrewjstone for the reviews!

@smklein in particular would you mind taking a look at nexus/src/db/lookup.rs, since I know you found the previous version kind of icky? The new version is function-like proc macro that does nearly everything. To add a new resource, the only thing you have to do aside from calling the macro is add the ${resource}_id(&self, id: UUid) function on LookupPath, which lets you do a global lookup/fetch by id.

david-crespo · 2022-03-31T02:57:20Z

nexus/src/db/lookup.rs

+    ancestors = [ "Organization", "Project" ],
+    children = [],
+    authz_kind = Generic
+}


this is so great!

david-crespo · 2022-03-31T03:09:09Z

nexus/src/db/db-macros/src/lookup.rs

+    // For example: `authz_organization.project()`.  For "generic" resources
+    // (`AuthzKind::Generic`), the parent has a function called `child_generic`
+    // that's used to construct all child resources, and there's an extra
+    // `ResourceType` argument to say what resource it is.


FWIW in order to understand why there are two patterns, I had to read this comment, the one on enum AuthzKind on line 42, and then also go look at authz/api_resources.rs. It sounds like "typed" is resources that you can attach roles to (org and project) and generic is resources that have no permissions of their own, instead inheriting them from their parent. The reason they're called "typed" is you need to use the constructor specific to that resource to get an Authz object, whereas for generic children, we don't care what kind of child they are (e.g., disk or instance), we only care that they are children of the parent. (Summarizing to make sure I get it.)

I don't think the comments necessarily need to be changed, but it doesn't seem like there's a single main spot where this is explained. Just giving my experience of where the mental friction is as someone who did not write the code.

omicron/nexus/src/authz/api_resources.rs

Lines 141 to 146 in 52dfddb

/// Aside from Organizations (which you create with

/// [`Fleet::organization()`] instead), all instances of all types of Fleet

/// children are treated interchangeably by the authz subsystem. That's

/// because we do not currently support assigning roles to these resources,

/// so all that matters for authz is that they are a child of the Fleet.

pub fn child_generic(

That's basically right. Part of what's confusing is that I think of this as two different properties that happen to be correlated:

a resource cannot have roles directly attached to it (like anything under a Project)

a resource's authz:: type is constructed by a typed constructor vs. a generic one

Those are the same, in that the typed constructor is used iff the resource can have roles directly attached to it, and that's not really a coincidence. But it seemed like a pretty big abstraction leak to have the macro know anything about roles.

For what it's worth, another change I want to make soon is to have all of these types use a typed constructor. I realized recently that there's nothing to stop you from passing an authz::Instance where an authz::Disk is expected because they're both type aliases for authz::ProjectChild. That's dangerous, so I want to instead use the typed approach (which may mean another macro :-/)

Right, it's confusing to use names that are about the code to make a substantive distinction about roles. Typed for everything would make a lot of sense and eliminate a whole thing to understand.

I want to echo @david-crespo 's sentiments - I bumped into this when reading the lookup_resource arguments, trying to understand when I'd want to supply one AuthzKind vs another, and had to do a lot of digging to understand when to supply one vs the other.

The terms "typed" and "generic" seem to refer a lot more to "how we have implemented these objects" rather than the properties of the objects themselves.

Yeah, I get that this is more annoying than we'd like, and as I said, that's why I want to simplify the authz types to have a more uniform interface. Do you have a suggestion for the meantime?

I think it’s fine to aim to fix later.

david-crespo

yes

smklein · 2022-03-31T18:17:15Z

nexus/src/db/db-macros/src/lookup.rs

+// NOTE: this is only "pub" for the `cargo doc` link on [`lookup_resource!`].
+#[derive(serde::Deserialize)]
+pub struct Input {
+    /// Name of the resource (PascalCase)


The "PascalCase" part of this is kinda coincidental - the actual expectation is that:

name is the exact name of a type in nexus::db::model

name is the exact name of a type in nexus::authz

The resource exists in db::schema::<snake_case_name>::dsl

I tried declaring lookup_resource for a resource named MyNewResource - I wanted to use "compiler-driven development" just to see where the dependencies are, and it seems like there are quite a handful of these that are implied by the implementation details of the macro.

That's absolutely true. Since a bunch of those constraints are useful for developers anyway, it seemed prudent to take advantage of those. An alternative would be adding a bunch of extra configuration properties that you pass into the macro that we expect to be exactly the same. That feels like it would make it easy to get wrong, and also easy to violate the conventions that I do think are useful for developers (e.g., that the type in authz:: has the same name as the one in model::). Is there something else you'd propose?

In a599546 I've updated this comment to emphasize the assumptions that you mentioned. Thanks for pointing that out!

Re: the use of "PascalCase" here: I get what you mean by coincidental, but I'm not sure that's true. To be explicit, there are two versions of the name of a resource:

the Pascal case version is used for Rust type names and enum variant names (including authz::$Resource, db::model::$Resource, the ResourceType variant, and the struct created by this macro)

the snake case version is used for database table names, database column names, Rust module names, and Rust struct fields

As a developer, when I need to put the name of a resource somewhere, which one do I use? It depends on the (arbitrary) convention of the place where you're using it. This macro winds up needing to use it in almost all of the above places. So for the input form we just pick one and document it. As a reader, I find it much clearer to say "the PascalCase version of the name goes here" than "the version of the name that matches the db::model and authz types". Maybe that's a personal preference?

Anyway, I agree that it's useful to document the assumptions, so I've done that and led with those. I just wanted to explain why I've left "PascalCase" in the docs, both here and the other places where it appears.

FWIW, I do think documenting the expected behavior is a strong step in the right direction.

I personally find associated types ideal for this sort of association - we're effectively claiming that there exists a resource...

... with a model struct

... with a schema in the DB

... with a corresponding authz resource

but the connection between these pieces of code is implicit.

Like, that's why we're supplying AuthzKind by type to the macro, but all the other arguments are strings - it's because they aren't types, they must actually correspond to multiple types simultaneously.

To be clear - I don't think this should be blocking - I think the new lookup functions are great, and I love the new API, but I just view this as an area where cleanup could potentially make this more readable by depending on types rather than string-ified tokens.

Okay. I think I'll land this as it is now. I'm definitely open to cleanup here. Ideally, we wouldn't need a macro at all to generate all these functions (per our earlier conversation).

smklein · 2022-03-31T18:23:12Z

nexus/src/db/db-macros/src/lookup.rs

+    path_authz_names: Vec<syn::Ident>,
+
+    // Child resources
+    /// list of names of child resources (PascalCase, raw input to the macro)


Again, I don't think the "PascalCase" bit is the piece that matters - these are expected to be the names of types / structures which exist in specific modules, and need specific functions.

See above -- in a599546 I've updated this to reference the place where it's explained more fully, and left "PascalCase" there because I find it very helpful as a reader.

smklein · 2022-03-31T18:26:25Z

nexus/src/db/db-macros/src/lookup.rs

+    // For example: `authz_organization.project()`.  For "generic" resources
+    // (`AuthzKind::Generic`), the parent has a function called `child_generic`
+    // that's used to construct all child resources, and there's an extra
+    // `ResourceType` argument to say what resource it is.


I want to echo @david-crespo 's sentiments - I bumped into this when reading the lookup_resource arguments, trying to understand when I'd want to supply one AuthzKind vs another, and had to do a lot of digging to understand when to supply one vs the other.

The terms "typed" and "generic" seem to refer a lot more to "how we have implemented these objects" rather than the properties of the objects themselves.

davepacheco added 22 commits March 19, 2022 19:57

first cut

4610d1a

rename Lookup

ef932a1

formatting nit

9cc52b0

formatting

5845149

fix

49a10e3

some notes, PoC diesel impl

491f30c

get rid of Root

037b78f

make LookupPath reachable through the whole path

ebaefb1

some cleanup

ce4b362

flesh out macros

aae2da4

more macro cleanup

73f0f4b

more cleanup

68b51cf

more commonizing

6d265bb

cleanup

e4a5844

more commonizing

0e73f25

lookup done enough to start trying it out

a414c1f

prototype a few call sites in Nexus

c7941b3

some basic tests are working by accident

5690db3

cleanup

295182b

fix up prototyping

bcf57a6

replace one XXX with a comment

43d0102

in-progress update to return entire path -- cannot work with macro_ru…

0a494ae

…les!

davepacheco requested review from david-crespo, zephraph and bnaecker March 22, 2022 03:15

davepacheco mentioned this pull request Mar 22, 2022

Enable retrieving and modifying a resource by id #485

Closed

smklein reviewed Mar 22, 2022

View reviewed changes

it works with macro_rules!

013e981

davepacheco added 5 commits March 30, 2022 15:35

macro impl: authz type need not be first-classed

07eaab8

clippy

ea67303

macro impl: add comment

62e5eea

Merge branch 'main' into new-lookup

32eab6e

fix mismerge

fa0f1d6

davepacheco marked this pull request as ready for review March 30, 2022 23:06

davepacheco mentioned this pull request Mar 31, 2022

convert route endpoints to new lookup API #839

Merged

david-crespo reviewed Mar 31, 2022

View reviewed changes

davepacheco mentioned this pull request Mar 31, 2022

convert router endpoints to new lookup API #840

Merged

david-crespo approved these changes Mar 31, 2022

View reviewed changes

smklein reviewed Mar 31, 2022

View reviewed changes

davepacheco added 2 commits March 31, 2022 11:44

better explain use of input.name

a599546

cargo doc nit

f7ff93a

davepacheco enabled auto-merge (squash) March 31, 2022 19:21

davepacheco changed the title ~~prototype new lookup API~~ new lookup API Mar 31, 2022

davepacheco changed the title ~~new lookup API~~ new internal lookup API Mar 31, 2022

davepacheco disabled auto-merge March 31, 2022 19:21

davepacheco enabled auto-merge (squash) March 31, 2022 19:22

davepacheco merged commit 1c37c62 into main Mar 31, 2022

davepacheco deleted the new-lookup branch March 31, 2022 19:27

zephraph mentioned this pull request Jun 24, 2022

Add routes to lookup resources by id #1266

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

new internal lookup API #798

new internal lookup API #798

davepacheco commented Mar 22, 2022

smklein Mar 22, 2022

davepacheco Mar 30, 2022

smklein commented Mar 22, 2022

davepacheco commented Mar 22, 2022

davepacheco commented Mar 30, 2022

davepacheco commented Mar 30, 2022

david-crespo Mar 31, 2022

david-crespo Mar 31, 2022

davepacheco Mar 31, 2022

david-crespo Mar 31, 2022

smklein Mar 31, 2022

davepacheco Mar 31, 2022

david-crespo Mar 31, 2022

david-crespo left a comment

smklein Mar 31, 2022

davepacheco Mar 31, 2022

davepacheco Mar 31, 2022

smklein Mar 31, 2022

davepacheco Mar 31, 2022

smklein Mar 31, 2022

davepacheco Mar 31, 2022

smklein Mar 31, 2022

	/// Aside from Organizations (which you create with
	/// [`Fleet::organization()`] instead), all instances of all types of Fleet
	/// children are treated interchangeably by the authz subsystem. That's
	/// because we do not currently support assigning roles to these resources,
	/// so all that matters for authz is that they are a child of the Fleet.
	pub fn child_generic(

new internal lookup API #798

new internal lookup API #798

Conversation

davepacheco commented Mar 22, 2022

Choose a reason for hiding this comment

Choose a reason for hiding this comment

smklein commented Mar 22, 2022

davepacheco commented Mar 22, 2022

davepacheco commented Mar 30, 2022

davepacheco commented Mar 30, 2022

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

david-crespo left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment