Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

new internal lookup API #798

Merged
merged 57 commits into from
Mar 31, 2022
Merged

new internal lookup API #798

merged 57 commits into from
Mar 31, 2022

Conversation

davepacheco
Copy link
Collaborator

This has been a long time coming: this PR prototypes a new API for looking up resources. This would replace the zillion $foo_lookup_by_path, $foo_lookup_by_id, and $foo_fetch functions in datastore.rs.

I got pretty far into an implementation using macro_rules!, but I think I've reached the limit and need to dive into a proc macro. Before I do that, I want to get feedback on the API itself. Please take a look at the changes in nexus.rs, where I've changed a handful of cases to use the new API. If folks like this, I will dig into the proc macro approach. (I wouldn't recommend looking too hard at lookup.rs -- it's an undocumented mess that I would almost entirely replace if we like this API.)

I'd suggest starting with the summary below, then read through the changes to nexus.rs (it's pretty small, as I only updated a few cases), then take a look at the notes below.


Synopsis: lookups follow a builder pattern. Say you want to fetch an Instance. Previously, you'd use project_lookup_by_path(org_name, project_name) followed by instance_fetch(authz_project, instance_name). Now:

let (authz_org, authz_project, authz_instance) =
    LookupPath::new(opctx, &self.db_datastore)
        .organization_name(org_name)
        .project_name(project_name)
        .instance_name(instance_name)
        .fetch()
        .await?;

As a bonus: you get the authz_org and authz_project, too.

Now, say you already have a project_id (maybe using the authz_project.id() from above). You can look up the Project by id, then look up the Disk in that Project by name:

let (_, _, authz_disk, db_disk) =
    LookupPath::new(opctx, &self.db_datastore)
        .project_id(project_id)
        .disk_name(disk_name)
        .fetch()
        .await?;

The general pattern is that you start at the root (with LookupPath::new), at which point you can feed it (1) any object's id, or (2) the name of a top-level object (currently just organizations, but this will include users and other things). At each level, you'll have functions that let you descend to the next level. At each level, you can call either fetch() (which acts like "fetch" today, which does an authz check and fetches the database row for the object you've selected) or you can call lookup_for(authz::Action) (which does an authz check for the given action and returns the authz object for whatever you've selected, but not the whole database row). More on this below.

If you're still reading, at this point I'd recommend looking at the nexus.rs changes to see more examples. Then come back here.


Here's what I like about the new API/possible implementation:

  • Ergonomics: today, the only way to fetch a whole database row is to look up the parent container first, then fetch the child inside that container. The new API uses a builder pattern to let you say exactly what you want -- you can start with any kind of id (to jump straight to that object) or else an organization name (to walk down the tree by names). You can mix and match too: start with, say, a project_id, then get an Instance in that Project by name.
  • Security/maintainability: It's harder to accidentally expose functions that access the database without doing access control because they're private to a separate module. This was a big motivator, though it could also be achieved with a less invasive change that would not need to update all callers.
  • Security: There are no longer any cases where you're looking up a resource without doing any access check. (This doesn't mean you've done all required access checks.) This deserves some explanation! There are a few common cases for resource lookup:
    • You're fetching the contents of a database row. Before this change, this required using foo_fetch(), which always did an access check. That's not very different in this change.
    • You're looking up a container solely to fetch something else inside it. For example, you call organization_lookup_by_path in order to call project_fetch() to look up a Project in this Organization After this change, what you'd do instead is directly fetch the Project, without having to look up the Project first. Here's an example.
    • You're looking up a container because you're going to wind up fetching several things inside it. Suppose you want to look up an Instance and Disk in the same Project. Before this change, it would be easy (but incorrect) to do a full instance_lookup_by_path and disk_lookup_by_path. These aren't transactional, so you could wind up selecting things in different Projects. Instead, existing code chains the lookups: first look up the Project, then do two fetches to get the Instance and Disk. The new code fetches one of the objects, but the new fetch API returns all the authz objects along the path. So if you look up the disk, you also get back the authz::Project, and you can start a new lookup from there to get to the Instance. Here's an example.
    • You're looking up a container in order to create something inside it. Since there's no "fetch" happening, there didn't used to be any authz check here. But rather than keep the existing general-purpose un-authz-checked lookup() function, I've replaced that with lookup_for() which requires an authz::Action. This function does the lookup and an authz check for whatever action you give it. Here's an example. This isn't bulletproof of course, but it should make it harder to accidentally get this wrong.
  • It's worth restating that: previously, the lookup() functions were not authz-protected at all because you needed to call lookup() on, say, an Organization in order to do a fetch on a Project, but we didn't know at the time of the first lookup whether it would be allowed. By combining the lookup of the Organization with the fetch of the Project, we cover that case and no longer need to expose a general-purpose lookup function that does no authz check. Again, it's not bulletproof: you could still fetch an Instance, get the authz::Project, and then do something you shouldn't be able to do with the authz::Project. But this eliminates a pretty common case that'd be very easy to get wrong.
  • Maintainability: All the low-level lookup/fetch functions will be generated by macros, meaning that I expect this to be a significant net reduction in lines of code, and the implementations will be guaranteed to be uniform.
  • We have an OpContext when doing all lookups.
    • This is going to be a necessary step for Silos anyway. That means even if we don't do this, we're going to wind up updating all callers of the datastore functions that this API would replace.
    • This make it very easy to start using pool_authorized() instead of pool() everywhere, which has the advantage of skipping the database altogether for unauthenticated requests .
  • (Future) Performance: a consequence of the first point is that where callers used to have two explicit calls into the datastore, there's now only one call to the lookup API. That kind of fetch can be done with a single database query. The current implementation still makes multiple queries, but by having the API better match what the caller's trying to do, we can change the implementation in the future to make only one query.

Obviously there's a lot that I like about this. But why not do this? That's what I want feedback on. It's a different enough API that I think there's some risk we get pretty far into using it before realizing it doesn't handle some important case at all, which would be pretty sad. Two concrete reasons I can think why this might not be good:

  • Like I said, I changed lookup()/fetch() to return the entire path of authz objects. So if you fetch, say, a VpcSubnet, you'll get back (authz::Organization, authz::Project, authz::Vpc, authz::VpcSubnet, db::model::VpcSubnet). If you only really wanted the subnet, you'll wind up doing a lot of let (_, _, _, authz_subnet, db_subnet) stuff. We could make this a little neater if we reverse the order of these, though I think the reverse order is less intuitive.
  • This is going to have a huge blast radius (all callers of all lookup/fetch functions). But that needs to happen for Silos anyway. And this can be done incrementally, similar to what we did for authz. (I would initially keep the legacy lookup functions around and replace callers one-by-one with calls to this API.)

Anyway, I'd love to know if folks think this is a promising direction, in which case I'll proceed with fixing up the implementation. (note: there's a working version of this that doesn't return all the ancestors in the path. That's just a few commits back. It's returning the ancestors in a flat tuple where I think I've run into the limits of macro_rules!.)

Comment on lines 129 to 151
pub fn organization_name<'b, 'c>(self, name: &'b Name) -> Organization<'c>
where
'a: 'c,
'b: 'c,
{
Organization { key: Key::Name(self, name) }
}

pub fn organization_id(self, id: Uuid) -> Organization<'a> {
Organization { key: Key::Id(self, id) }
}

pub fn project_id(self, id: Uuid) -> Project<'a> {
Project { key: Key::Id(self, id) }
}

pub fn instance_id(self, id: Uuid) -> Instance<'a> {
Instance { key: Key::Id(self, id) }
}

pub fn disk_id(self, id: Uuid) -> Disk<'a> {
Disk { key: Key::Id(self, id) }
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So to clarify, if we extend this model to other types, we need to:

  • Add functions to LookupPath for each resource that can be looked up (generally "by ID")
  • Add a define_lookup_with_parent macro call
  • Add manual impls to look up names, for the auto-generated structs?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the new proc macro implementation, the steps are:

  • add one function to LookupPath -- fn ${new_resource_name}_id()
  • add a call to the new lookup_resource macro to define everything else for the new resource

@smklein
Copy link
Collaborator

smklein commented Mar 22, 2022

I think the TL;DR of my comments are:

  • I like the API!
  • I would be thrilled if there was a way to do this by defining the structs which represent resources, and describing the relationships between them via traits. The scariest part of this patch (to me) is the combination of "manually-implemented" + "macro-implemented" structures.

@davepacheco
Copy link
Collaborator Author

For the record, I was able to make this work with macro_rules! and the latest version is fully working. I'm still not sure it's a great approach though.

@davepacheco
Copy link
Collaborator Author

I changed lookup()/fetch() to return the entire path of authz objects. So if you fetch, say, a VpcSubnet, you'll get back (authz::Organization, authz::Project, authz::Vpc, authz::VpcSubnet, db::model::VpcSubnet). If you only really wanted the subnet, you'll wind up doing a lot of let (_, _, _, authz_subnet, db_subnet) stuff. We could make this a little neater if we reverse the order of these, though I think the reverse order is less intuitive.

I do like having the full path of objects there. Maybe it should be a struct? You would lose the ordering, though. Reverse order is definitely worth considering. I have a feeling we'll know pretty quickly whether we want it.

I realized that you can do let (.., authz_subnet, db_subnet) just as easily as let (db_subnet, authz_subnet, ..), so the ordering doesn't matter, so I'm sticking with the one I find intuitive. I think a struct would add a lot more boilerplate to most callers.

@davepacheco
Copy link
Collaborator Author

Okay, I've made a lot of changes since the first round of reviews, but the API has not changed. I completely replaced the implementation with a proc macro. I think I've addressed all the feedback up to this point. Thanks @smklein @david-crespo and @andrewjstone for the reviews!

@smklein in particular would you mind taking a look at nexus/src/db/lookup.rs, since I know you found the previous version kind of icky? The new version is function-like proc macro that does nearly everything. To add a new resource, the only thing you have to do aside from calling the macro is add the ${resource}_id(&self, id: UUid) function on LookupPath, which lets you do a global lookup/fetch by id.

@davepacheco davepacheco marked this pull request as ready for review March 30, 2022 23:06
ancestors = [ "Organization", "Project" ],
children = [],
authz_kind = Generic
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is so great!

// For example: `authz_organization.project()`. For "generic" resources
// (`AuthzKind::Generic`), the parent has a function called `child_generic`
// that's used to construct all child resources, and there's an extra
// `ResourceType` argument to say what resource it is.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW in order to understand why there are two patterns, I had to read this comment, the one on enum AuthzKind on line 42, and then also go look at authz/api_resources.rs. It sounds like "typed" is resources that you can attach roles to (org and project) and generic is resources that have no permissions of their own, instead inheriting them from their parent. The reason they're called "typed" is you need to use the constructor specific to that resource to get an Authz object, whereas for generic children, we don't care what kind of child they are (e.g., disk or instance), we only care that they are children of the parent. (Summarizing to make sure I get it.)

I don't think the comments necessarily need to be changed, but it doesn't seem like there's a single main spot where this is explained. Just giving my experience of where the mental friction is as someone who did not write the code.

/// Aside from Organizations (which you create with
/// [`Fleet::organization()`] instead), all instances of all types of Fleet
/// children are treated interchangeably by the authz subsystem. That's
/// because we do not currently support assigning roles to these resources,
/// so all that matters for authz is that they are a child of the Fleet.
pub fn child_generic(

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's basically right. Part of what's confusing is that I think of this as two different properties that happen to be correlated:

  1. a resource cannot have roles directly attached to it (like anything under a Project)
  2. a resource's authz:: type is constructed by a typed constructor vs. a generic one

Those are the same, in that the typed constructor is used iff the resource can have roles directly attached to it, and that's not really a coincidence. But it seemed like a pretty big abstraction leak to have the macro know anything about roles.

For what it's worth, another change I want to make soon is to have all of these types use a typed constructor. I realized recently that there's nothing to stop you from passing an authz::Instance where an authz::Disk is expected because they're both type aliases for authz::ProjectChild. That's dangerous, so I want to instead use the typed approach (which may mean another macro :-/)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, it's confusing to use names that are about the code to make a substantive distinction about roles. Typed for everything would make a lot of sense and eliminate a whole thing to understand.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to echo @david-crespo 's sentiments - I bumped into this when reading the lookup_resource arguments, trying to understand when I'd want to supply one AuthzKind vs another, and had to do a lot of digging to understand when to supply one vs the other.

The terms "typed" and "generic" seem to refer a lot more to "how we have implemented these objects" rather than the properties of the objects themselves.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I get that this is more annoying than we'd like, and as I said, that's why I want to simplify the authz types to have a more uniform interface. Do you have a suggestion for the meantime?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it’s fine to aim to fix later.

Copy link
Contributor

@david-crespo david-crespo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

// NOTE: this is only "pub" for the `cargo doc` link on [`lookup_resource!`].
#[derive(serde::Deserialize)]
pub struct Input {
/// Name of the resource (PascalCase)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "PascalCase" part of this is kinda coincidental - the actual expectation is that:

  • name is the exact name of a type in nexus::db::model
  • name is the exact name of a type in nexus::authz
  • The resource exists in db::schema::<snake_case_name>::dsl

I tried declaring lookup_resource for a resource named MyNewResource - I wanted to use "compiler-driven development" just to see where the dependencies are, and it seems like there are quite a handful of these that are implied by the implementation details of the macro.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's absolutely true. Since a bunch of those constraints are useful for developers anyway, it seemed prudent to take advantage of those. An alternative would be adding a bunch of extra configuration properties that you pass into the macro that we expect to be exactly the same. That feels like it would make it easy to get wrong, and also easy to violate the conventions that I do think are useful for developers (e.g., that the type in authz:: has the same name as the one in model::). Is there something else you'd propose?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a599546 I've updated this comment to emphasize the assumptions that you mentioned. Thanks for pointing that out!

Re: the use of "PascalCase" here: I get what you mean by coincidental, but I'm not sure that's true. To be explicit, there are two versions of the name of a resource:

  • the Pascal case version is used for Rust type names and enum variant names (including authz::$Resource, db::model::$Resource, the ResourceType variant, and the struct created by this macro)
  • the snake case version is used for database table names, database column names, Rust module names, and Rust struct fields

As a developer, when I need to put the name of a resource somewhere, which one do I use? It depends on the (arbitrary) convention of the place where you're using it. This macro winds up needing to use it in almost all of the above places. So for the input form we just pick one and document it. As a reader, I find it much clearer to say "the PascalCase version of the name goes here" than "the version of the name that matches the db::model and authz types". Maybe that's a personal preference?

Anyway, I agree that it's useful to document the assumptions, so I've done that and led with those. I just wanted to explain why I've left "PascalCase" in the docs, both here and the other places where it appears.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, I do think documenting the expected behavior is a strong step in the right direction.

I personally find associated types ideal for this sort of association - we're effectively claiming that there exists a resource...

  • ... with a model struct
  • ... with a schema in the DB
  • ... with a corresponding authz resource

but the connection between these pieces of code is implicit.

Like, that's why we're supplying AuthzKind by type to the macro, but all the other arguments are strings - it's because they aren't types, they must actually correspond to multiple types simultaneously.

To be clear - I don't think this should be blocking - I think the new lookup functions are great, and I love the new API, but I just view this as an area where cleanup could potentially make this more readable by depending on types rather than string-ified tokens.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. I think I'll land this as it is now. I'm definitely open to cleanup here. Ideally, we wouldn't need a macro at all to generate all these functions (per our earlier conversation).

path_authz_names: Vec<syn::Ident>,

// Child resources
/// list of names of child resources (PascalCase, raw input to the macro)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, I don't think the "PascalCase" bit is the piece that matters - these are expected to be the names of types / structures which exist in specific modules, and need specific functions.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above -- in a599546 I've updated this to reference the place where it's explained more fully, and left "PascalCase" there because I find it very helpful as a reader.

// For example: `authz_organization.project()`. For "generic" resources
// (`AuthzKind::Generic`), the parent has a function called `child_generic`
// that's used to construct all child resources, and there's an extra
// `ResourceType` argument to say what resource it is.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to echo @david-crespo 's sentiments - I bumped into this when reading the lookup_resource arguments, trying to understand when I'd want to supply one AuthzKind vs another, and had to do a lot of digging to understand when to supply one vs the other.

The terms "typed" and "generic" seem to refer a lot more to "how we have implemented these objects" rather than the properties of the objects themselves.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants