figure out where we should store gateway-assigned content commitments (and implement it) #138

Closed
Tracked by #135
travis opened this issue Sep 20, 2024 · 4 comments

@travis

travis commented Sep 20, 2024

We don't currently have a solid plan for storing and indexing the "content commitments" that get delegated to the gateway and used to "prove" that an invocation requesting a given CID's data is valid.

I'd propose we use roughly the same pattern we use for access/claim-related delegations, but in Cloudflare - store the raw delegation data in R2 and create some rough indexes in KV to map from "auth token" to delegations.

I think the accounting service may be the right place for this to live - uploaders can tell it about new commitments to the gateway and the gateway can query it for commitments.
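As a rough illustration of the storage pattern proposed above, here is a minimal in-memory sketch. It assumes one dictionary standing in for R2 (raw delegation bytes) and another standing in for KV (auth token to delegation keys); the class name `CommitmentStore`, its method names, and the choice of a SHA-256 blob key are all hypothetical, not part of any existing service.

```python
import hashlib


class CommitmentStore:
    """In-memory stand-in: `blobs` plays the role of R2, `index` the role of KV."""

    def __init__(self):
        self.blobs = {}  # blob key -> raw delegation bytes
        self.index = {}  # auth token -> list of blob keys

    def put(self, auth_token: str, delegation: bytes) -> str:
        """Called by an uploader to tell the service about a new commitment."""
        # Key the blob by its hash so storing the same delegation twice is a no-op.
        key = hashlib.sha256(delegation).hexdigest()
        self.blobs[key] = delegation
        keys = self.index.setdefault(auth_token, [])
        if key not in keys:
            keys.append(key)
        return key

    def query(self, auth_token: str) -> list:
        """Called by the gateway to fetch every delegation filed under a token."""
        return [self.blobs[k] for k in self.index.get(auth_token, [])]


store = CommitmentStore()
store.put("token-a", b"delegation-1")
store.put("token-a", b"delegation-2")
store.put("token-a", b"delegation-1")  # duplicate, ignored by the index
found = store.query("token-a")
```

A real deployment would replace the two dictionaries with R2 and KV bindings; the split between raw storage and a coarse index is the only part taken from the proposal.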

@travis travis changed the title figure out where we store gateway-assigned location commitments figure out where we store gateway-assigned location commitments (and implement it) Sep 20, 2024
@travis travis changed the title figure out where we store gateway-assigned location commitments (and implement it) figure out where we should store gateway-assigned location commitments (and implement it) Sep 20, 2024
@hannahhoward hannahhoward moved this to Sprint Backlog in Storacha Project Planning Sep 23, 2024
@travis travis changed the title figure out where we should store gateway-assigned location commitments (and implement it) figure out where we should store gateway-assigned content commitments (and implement it) Oct 4, 2024
@Peeja

Peeja commented Oct 4, 2024

What started as a comment and clarification became a whole proposal.


A Location Claim is a specific kind of Content Claim. A Content Claim is a verifiable (i.e., signed) claim about a piece of content which can be distributed so that entities other than the original claimer can publish it and prove its validity. A Location Claim specifically claims that a piece of content may be located and retrieved via some method, one which does not need to know about anything web3-y; most commonly, an HTTP URL. A Location Claim tells anyone where to find the content.

A Location Claim does not authorize anyone to retrieve the content. That's a separate concern. Moreover, a Location Claim is an attestation made by a Storage Node that it can provide the content. It's not the Storage Node's responsibility to determine if a request is authorized, so the Location Claim can't implement that.

Who does decide if a request is authorized? A moment ago, I would have said the Gateway does. But after thinking it through a bit, I think that's not in line with how Storacha works. Storacha is built on self-sovereignty. Who decides who is authorized to access a piece of content? The owner of the content: the Space.

Who is the gatekeeper? Who looks at the authorization to decide who actually gets to access the content? It could be the Storage Node, as they're closest to the stored content, but we'd like to leave them out of the authorization system, at least for now: we want them to be able to act like an FTP server we keep stuff on. So the duty falls to the next step away from the content: the Gateway. The Gateway will be looking at UCANs and deciding whether or not to fulfill requests based on authority delegated, ultimately, from the Space.

And so here's where the UCANs finally come in. The root authority to access content within a Space (call it /retrieve/content for now) lies with the Space; the Space is the subject (more on this in a moment). The Space must then delegate that authority, and it starts by delegating it (as *, or / in UCAN 1.0) to an Account, which eventually delegates it to an Agent. The Agent must then delegate it to…whom? Ask the question in English: "Who is allowed to retrieve the content of the Space?" And answer it in English: "The Downloader is allowed to retrieve the content of the Space." Thus, we should logically delegate to the Downloader.
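The delegation chain described here (Space to Account to Agent to the final audience) can be sketched as a simple link check. This is a simplified illustration only: it ignores signatures, expiry, and policies, the DIDs are placeholders, and the function names `covers` and `chain_is_valid` are hypothetical.

```python
def covers(parent_cmd: str, child_cmd: str) -> bool:
    """True if a delegation for parent_cmd may be narrowed to child_cmd.

    Assumes slash-delimited commands where "/" covers everything.
    """
    if parent_cmd == "/":
        return True
    return child_cmd == parent_cmd or child_cmd.startswith(parent_cmd + "/")


def chain_is_valid(chain: list) -> bool:
    """Check that delegations link root-to-leaf, starting at the Space."""
    if not chain:
        return False
    root = chain[0]
    # The root delegation must be issued by the subject itself (the Space).
    if root["iss"] != root["sub"]:
        return False
    for parent, child in zip(chain, chain[1:]):
        # Each link must be issued by the previous link's audience...
        if child["iss"] != parent["aud"]:
            return False
        # ...be about the same subject...
        if child["sub"] != root["sub"]:
            return False
        # ...and ask for no more than the parent granted.
        if not covers(parent["cmd"], child["cmd"]):
            return False
    return True


chain = [
    {"iss": "did:key:zSpace", "aud": "did:mailto:example.com:alice",
     "sub": "did:key:zSpace", "cmd": "/"},
    {"iss": "did:mailto:example.com:alice", "aud": "did:key:zAgent",
     "sub": "did:key:zSpace", "cmd": "/"},
    {"iss": "did:key:zAgent", "aud": "did:key:zDownloader",
     "sub": "did:key:zSpace", "cmd": "/retrieve/content"},
]
ok = chain_is_valid(chain)
```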

Now, what's a Downloader, again? The Downloader is the entity which retrieves content, with a notable caveat: the Downloader, much like the Storage Node at the other end of the chain, doesn't speak UCAN. It has only an HTTP URL, which we'd like to contain two things: a reference to the content (CID? Path?) and a Token. The Token is enough to authorize retrieval. The Token may have a lifetime, and the Token should be revocable, but while the Token is valid, simply presenting it is enough to authorize retrieval. It is thus a Bearer Token.
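To make the "URL carrying a content reference and a Token" idea concrete, here is a small sketch. The URL shape is an assumption for illustration only: the thread leaves open whether the reference is a CID or a path, and the `/ipfs/<cid>?token=<token>` layout, the host `gateway.example`, and the placeholder CID are all hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def parse_retrieval_url(url: str) -> tuple:
    """Split a retrieval URL into (content reference, bearer token).

    Assumes the hypothetical shape https://<host>/ipfs/<cid>?token=<token>.
    """
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) < 2 or segments[0] != "ipfs":
        raise ValueError("URL does not contain a content reference")
    tokens = parse_qs(parts.query).get("token", [])
    if len(tokens) != 1:
        raise ValueError("URL must carry exactly one token")
    return segments[1], tokens[0]


cid, token = parse_retrieval_url(
    "https://gateway.example/ipfs/bafyexamplecid"
    "?token=f78b5c9e-4913-4b6b-9600-1b28bde09ef6"
)
```

Because the Downloader holds nothing but this URL, anyone who obtains the URL obtains the authorization, which is what makes the Token a bearer token.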

So the Downloader doesn't speak UCAN, but does have a form of authentication: the Token. Anyone holding the Token is authorized, so it must authenticate them as an entity which is authorized. Who is that entity?

I posit that the entity is the Token itself! Just as a keypair can be a UCAN principal, so, I propose, can a Token. I propose a new DID method, bearer, which takes the form did:bearer:<bearer-token>. Thus, creating a Token might look like:

{
  "iss": "did:key:zAgent",
  "aud": "did:bearer:f78b5c9e-4913-4b6b-9600-1b28bde09ef6",
  "sub": "did:key:zSpace",
  "cmd": "/retrieve/content",
  "pol": [
    ["==", ".cid", "bafy...data"]
  ]
}

This communicates to the Gateway the intended authorization, by delegated authority of the Space. In English, "Whoever bears this Token may retrieve from the Space this piece of content." This is enforceable by the Gateway. If a request bears the token, it finds the content by way of Location Claim and proxies it to the Downloader. If it does not, or if the delegation has become invalid, the request is unauthorized.
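The Gateway-side check described above might be sketched as follows. This is an assumption-laden illustration, not the Gateway's actual logic: it skips signature and proof-chain verification entirely, understands only the single policy form shown in the example delegation, and the `exp` handling, the `revoked_tokens` set, and the function names are hypothetical.

```python
import time


def statement_holds(stmt, cid: str) -> bool:
    # Only the statement form used above is understood: ["==", ".cid", <cid>].
    # Anything else fails closed.
    return len(stmt) == 3 and stmt[0] == "==" and stmt[1] == ".cid" and stmt[2] == cid


def authorize(token, cid, delegations, revoked_tokens=frozenset(), now=None) -> bool:
    """Return True if some stored delegation lets the bearer of `token` fetch `cid`."""
    now = time.time() if now is None else now
    if token in revoked_tokens:
        return False
    audience = "did:bearer:" + token
    for d in delegations:
        if d["aud"] != audience or d["cmd"] != "/retrieve/content":
            continue
        exp = d.get("exp")
        if exp is not None and exp <= now:
            continue  # the delegation has expired
        if all(statement_holds(s, cid) for s in d.get("pol", [])):
            return True
    return False


delegation = {
    "iss": "did:key:zAgent",
    "aud": "did:bearer:f78b5c9e-4913-4b6b-9600-1b28bde09ef6",
    "sub": "did:key:zSpace",
    "cmd": "/retrieve/content",
    "pol": [["==", ".cid", "bafyexamplecid"]],  # placeholder CID
}
allowed = authorize("f78b5c9e-4913-4b6b-9600-1b28bde09ef6", "bafyexamplecid", [delegation])
```

On success the Gateway would then resolve the content through a Location Claim and proxy it; on failure the request is unauthorized.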

Some notes:

  • I've proposed the Space as the subject here. Potentially, the content itself makes more sense, since it's the ultimate Resource in question, but currently a piece of content can't be a Principal. (I'm still not clear on how any Resource that's not a Principal can ever be a subject, since the delegation chain couldn't end at a self-issued delegation.) Using the Space as the subject has some advantages, though: notably, every token-authenticated request is bound to a Space, and thus a billing account, while the CID itself may be shared among Spaces; also we potentially have the full power of UCAN 1.0 policies available to let the customer authorize an entire Space, or some subset. [E: Ah, reading closer, in UCAN 1.0, a Subject must be a Principal, for just this reason. The Resource, if it's different from the Subject should be described using a policy, just like this. Despite being a section in the spec, this appears to be a best practice rather than its own distinct feature.]

  • It's unfortunate that this sticks the Agent in the delegation chain for the lifetime of the Token, since Agents are meant to be ephemeral. But that's a problem we already have elsewhere.

  • Note that anyone who knows the contents of the UCAN delegation knows the Token, and thus is themselves the audience. The delegation becomes, in essence, a "bearer delegation". But we don't need to ship these outside the service, so they become exactly as sensitive as the Token itself. The Token should not be terribly sensitive; it's meant to attach requests to billing, not secure sensitive data. (? Is this true? Will this continue to be true? Will we implement true private data with a Token, or something more secure?)
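The first note's point about policies (authorize an entire Space, or some subset) can be illustrated with a tiny evaluator. It handles only a small subset of what UCAN 1.0 policies allow, namely `==`, `and`, and `or` over dotted selectors, and the function names and placeholder CIDs are hypothetical.

```python
def select(selector: str, args):
    """Resolve a dotted selector such as ".cid" against the invocation args."""
    value = args
    for part in selector.split(".")[1:]:
        if part == "":
            continue  # a bare "." selects the whole value
        value = value[part]
    return value


def holds(stmt, args) -> bool:
    op = stmt[0]
    if op == "==":
        try:
            return select(stmt[1], args) == stmt[2]
        except (KeyError, TypeError):
            return False  # a missing field never matches
    if op == "and":
        return all(holds(s, args) for s in stmt[1])
    if op == "or":
        return any(holds(s, args) for s in stmt[1])
    raise ValueError(f"unsupported operator: {op}")


def policy_allows(policy, args) -> bool:
    """A policy is a list of statements that must all hold; empty means unrestricted."""
    return all(holds(s, args) for s in policy)


whole_space = []  # no constraints: any content in the Space
subset = [["or", [["==", ".cid", "bafyone"], ["==", ".cid", "bafytwo"]]]]

in_subset = policy_allows(subset, {"cid": "bafytwo"})
outside_subset = policy_allows(subset, {"cid": "bafythree"})
```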

@hannahhoward

hannahhoward commented Oct 4, 2024

Currently, I agree with @Peeja, but I'd propose going a step further -- I've been meaning to write this up but:
a. the thing we call "location claim" isn't a claim at all. It's a delegation from the storage node to the space that authorizes the space to download from it.
b. as @Peeja points out, currently the storage node has no UCAN authorization -- this will not stay true. They are already working on putting UCAN in curio.
c. the actual "downloader" is whoever is downloading from the storage node. In this case, the downloader is ACTUALLY the gateway, on behalf of the space. This is why it makes sense for the audience to be the gateway. In the future, the downloader might be a direct user wanting to avoid the fees of going through our egress pipeline. The point being that this delegation describes a relationship between the storage node and the person downloading data from that storage node.
d. This somewhat breaks the bearer token idea, but I do think Petra is onto something there. We could have a double link, where the space (agent) authorizes the gateway to download from the storage node, and then the gateway issues a further delegation to download for the bearer token.

@hannahhoward

The only downside is that this puts the generation of tokens on the gateway, taking some power out of the hands of the space.
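The "double link" variant could be sketched like this: the space (via its agent) delegates to the gateway, and the gateway mints the token and re-delegates to it. Everything here is illustrative; the gateway DID `did:web:gateway.example`, the function name, and the use of a UUID as the token are assumptions, and no signing is performed.

```python
import uuid


def mint_bearer_delegation(gateway_did: str, space_to_gateway: dict) -> tuple:
    """Second link of the chain: the gateway re-delegates to a fresh bearer token."""
    if space_to_gateway["aud"] != gateway_did:
        raise ValueError("the first link must be addressed to this gateway")
    token = str(uuid.uuid4())  # the gateway, not the space, generates the token
    delegation = {
        "iss": gateway_did,
        "aud": f"did:bearer:{token}",
        "sub": space_to_gateway["sub"],
        "cmd": space_to_gateway["cmd"],
        "pol": list(space_to_gateway.get("pol", [])),
    }
    return token, delegation


first_link = {
    "iss": "did:key:zAgent",
    "aud": "did:web:gateway.example",
    "sub": "did:key:zSpace",
    "cmd": "/retrieve/content",
    "pol": [],
}
token, second_link = mint_bearer_delegation("did:web:gateway.example", first_link)
```

The sketch makes the trade-off visible: the token originates inside `mint_bearer_delegation`, so the space authorizes the gateway but no longer creates tokens itself.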

@travis

travis commented Oct 7, 2024

we had a video chat about this and @Peeja is currently writing up an RFC describing our latest thinking about how this should work - will link here when that's ready!
