figure out where we should store gateway-assigned content commitments (and implement it) #138

travis · 2024-09-20T21:45:22Z

We don't currently have a solid plan for storing and indexing the "content commitments" that get delegated to the gateway and used to "prove" that an invocation requesting a given CID's data is valid.

I'd propose we use roughly the same pattern we use for access/claim-related delegations, but in Cloudflare - store the raw delegation data in R2 and create some rough indexes in KV to map from "auth token" to delegations.
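A minimal sketch of that pattern, using plain Python dictionaries as stand-ins for an R2 bucket and a KV namespace. The class name, the key layout, and the choice of a SHA-256 digest as the object key are all assumptions made for illustration, not the actual service design.

```python
import hashlib
import json


class CommitmentStore:
    """In-memory stand-in: `blobs` plays the role of R2, `index` the role of KV."""

    def __init__(self):
        self.blobs = {}   # object key -> raw delegation bytes (R2 stand-in)
        self.index = {}   # "auth token" -> list of object keys (KV stand-in)

    def put(self, auth_token, delegation_bytes):
        # Content-address the raw delegation so repeated uploads are idempotent.
        key = hashlib.sha256(delegation_bytes).hexdigest()
        self.blobs[key] = delegation_bytes
        keys = self.index.setdefault(auth_token, [])
        if key not in keys:
            keys.append(key)
        return key

    def find(self, auth_token):
        # Resolve the index entry, then fetch each raw delegation.
        return [self.blobs[k] for k in self.index.get(auth_token, [])]


store = CommitmentStore()
raw = json.dumps({"cmd": "/retrieve/content", "cid": "bafy...data"}).encode()
store.put("token-123", raw)
store.put("token-123", raw)  # second write of the same bytes is a no-op
```

Keeping the index separate from the raw bytes means the index can stay "rough": it only has to point at candidates, and the gateway can still validate each delegation it loads.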

I think the accounting service may be the right place for this to live - uploaders can tell it about new commitments to the gateway and the gateway can query it for commitments.

Peeja · 2024-10-04T15:22:02Z

What started as a comment and clarification became a whole proposal.

A Location Claim is a specific kind of Content Claim. A Content Claim is a verifiable (i.e., signed) claim about a piece of content, which can be distributed so that entities other than the original claimer can publish it and prove its validity. A Location Claim specifically claims that a piece of content may be located and retrieved via some method that does not need to know about anything web3-y, most commonly an HTTP URL. A Location Claim tells anyone where to find the content.

A Location Claim does not authorize anyone to retrieve the content. That's a separate concern. Moreover, a Location Claim is an attestation made by a Storage Node that it can provide the content. It's not the Storage Node's responsibility to determine if a request is authorized, so the Location Claim can't implement that.

Who does decide if a request is authorized? A moment ago, I would have said the Gateway does. But after thinking it through a bit, I think that's not in line with how Storacha works. Storacha is built on self-sovereignty. Who decides who is authorized to access a piece of content? The owner of the content: the Space.

Who is the gatekeeper? Who looks at the authorization to decide who actually gets to access the content? It could be the Storage Node, as they're closest to the stored content, but we'd like to leave them out of the authorization system, at least for now: we want them to be able to act like an FTP server we keep stuff on. So the duty falls to the next step away from the content: the Gateway. The Gateway will be looking at UCANs and deciding whether or not to fulfill requests based on authority delegated, ultimately, from the Space.

And so here's where the UCANs finally come in. The root authority to access content within a Space (call it /retrieve/content for now) lies with the Space; the Space is the subject (more on this in a moment). The Space must then delegate that authority, and it starts by delegating it (as *, or / in UCAN 1.0) to an Account, which eventually delegates it to an Agent. The Agent must then delegate it to…whom? In plain English, the question is "Who is allowed to retrieve the content of the Space?" and the answer is "The Downloader is allowed to retrieve the content of the Space." Thus, we should logically delegate to the Downloader.
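To make the chain concrete, here is a rough sketch of the principal-alignment check a verifier might run over Space → Account → Agent → Downloader. The function name and all DIDs are placeholders, every link is assumed to name the Space as its subject, and signatures, expiry, and revocation are deliberately left out.

```python
def chain_is_valid(chain, subject):
    """Check principal alignment of a delegation chain, root first.

    Only checks who delegated to whom; signatures, expiry, and
    revocation are deliberately left out of this sketch.
    """
    if not chain:
        return False
    # The root delegation must be issued by the subject itself (the Space).
    if chain[0]["iss"] != subject:
        return False
    for parent, child in zip(chain, chain[1:]):
        # Each link must be issued by the audience of the previous link.
        if child["iss"] != parent["aud"]:
            return False
    # Every link must be about the same subject.
    return all(link["sub"] == subject for link in chain)


# Placeholder DIDs, for illustration only.
SPACE = "did:key:zSpace"
ACCOUNT = "did:mailto:example.com:alice"
AGENT = "did:key:zAgent"
DOWNLOADER = "did:key:zDownloader"

chain = [
    {"iss": SPACE, "aud": ACCOUNT, "sub": SPACE, "cmd": "/"},
    {"iss": ACCOUNT, "aud": AGENT, "sub": SPACE, "cmd": "/"},
    {"iss": AGENT, "aud": DOWNLOADER, "sub": SPACE, "cmd": "/retrieve/content"},
]
```

Dropping the first link, or skipping the Account link, makes the check fail, which is the property that ties every retrieval back to the Space.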

Now, what's a Downloader, again? The Downloader is the entity which retrieves content, with a notable caveat: the Downloader, much like the Storage Node at the other end of the chain, doesn't speak UCAN. It has only an HTTP URL, which we'd like to contain two things: a reference to the content (CID? Path?) and a Token. The Token is enough to authorize retrieval. The Token may have a lifetime, and the Token should be revocable, but while the Token is valid, simply presenting it is enough to authorize retrieval. It is thus a Bearer Token.
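The URL shape is still an open question here, so the following is only one possible reading: a sketch that assumes a URL of the form `https://<gateway>/ipfs/<cid>[/<path>]?token=<token>`. The `/ipfs/` prefix, the `token` query parameter, the hostname, and the function name are all assumptions.

```python
from urllib.parse import urlsplit, parse_qs


def parse_retrieval_url(url):
    """Return (content_ref, token) for the assumed URL shape, or None."""
    parts = urlsplit(url)
    prefix = "/ipfs/"
    if not parts.path.startswith(prefix):
        return None
    # Everything after the prefix is the content reference (CID plus optional path).
    content_ref = parts.path[len(prefix):]
    tokens = parse_qs(parts.query).get("token", [])
    # Require a content reference and exactly one token.
    if not content_ref or len(tokens) != 1:
        return None
    return content_ref, tokens[0]


parsed = parse_retrieval_url(
    "https://gateway.example/ipfs/bafyexample?token=f78b5c9e-4913-4b6b-9600-1b28bde09ef6"
)
```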

So the Downloader doesn't speak UCAN, but does have a form of authentication: the Token. Anyone holding the Token is authorized, so it must authenticate them as an entity which is authorized. Who is that entity?

I posit that the entity is the Token itself! Just as a keypair can be a UCAN principal, so, I propose, can a Token. I propose a new DID method, bearer, which takes the form did:bearer:<bearer-token>. Thus, creating a Token might look like:

{
  "iss": "did:key:zAgent",
  "aud": "did:bearer:f78b5c9e-4913-4b6b-9600-1b28bde09ef6",
  "sub": "did:key:zSpace",
  "cmd": "/retrieve/content",
  "pol": [
    ["==", ".cid", "bafy...data"]
  ]
}

This communicates to the Gateway the intended authorization, by delegated authority of the Space. In English, "Whoever bears this Token may retrieve from the Space this piece of content." This is enforceable by the Gateway. If a request bears the Token, the Gateway finds the content by way of a Location Claim and proxies it to the Downloader. If the request does not bear the Token, or if the delegation has become invalid, the request is unauthorized.
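The Gateway-side check could be sketched roughly as follows. This is an illustration under stated assumptions, not the Gateway's implementation: `authorize` and `policy_allows` are hypothetical names, `revoked` is assumed to be a set of revoked tokens, `exp` is assumed to be seconds since the epoch, only the `==` policy operator on top-level fields is handled, and verification of signatures and of the delegation chain is omitted.

```python
import time


def policy_allows(policy, args):
    """Evaluate the tiny policy subset used above: ["==", ".field", value]."""
    for op, selector, expected in policy:
        if op != "==":
            raise ValueError(f"unsupported operator: {op}")
        # A selector like ".cid" names a top-level field of the arguments.
        if args.get(selector.lstrip(".")) != expected:
            return False
    return True


def authorize(delegations, revoked, token, cid, now=None):
    """Return True if a stored delegation lets the bearer of `token` retrieve `cid`."""
    now = time.time() if now is None else now
    if token in revoked:
        return False
    audience = f"did:bearer:{token}"
    for d in delegations:
        if d["aud"] != audience or d["cmd"] != "/retrieve/content":
            continue
        if d.get("exp") is not None and d["exp"] <= now:
            continue  # expired
        if policy_allows(d.get("pol", []), {"cid": cid}):
            return True
    return False


TOKEN = "f78b5c9e-4913-4b6b-9600-1b28bde09ef6"
delegation = {
    "iss": "did:key:zAgent",
    "aud": f"did:bearer:{TOKEN}",
    "sub": "did:key:zSpace",
    "cmd": "/retrieve/content",
    "pol": [["==", ".cid", "bafy...data"]],
    "exp": 2000,  # hypothetical expiry, added for the example
}
```

Only after this check passes would the Gateway go on to resolve a Location Claim and proxy the bytes.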

Some notes:

- I've proposed the Space as the subject here. Potentially, the content itself makes more sense, since it's the ultimate Resource in question, but currently a piece of content can't be a Principal. (~~I'm still not clear on how any Resource that's not a Principal can ever be a subject, since the delegation chain couldn't end at a self-issued delegation.~~) Using the Space as the subject has some advantages, though: notably, every token-authenticated request is bound to a Space, and thus a billing account, while the CID itself may be shared among Spaces; also, we potentially have the full power of UCAN 1.0 policies available to let the customer authorize an entire Space, or some subset. [E: Ah, reading closer, in UCAN 1.0, a Subject must be a Principal, for just this reason. The Resource, if it's different from the Subject, should be described using a policy, just like this. Despite being a section in the spec, this appears to be a best practice rather than its own distinct feature.]
- It's unfortunate that this sticks the Agent in the delegation chain for the lifetime of the Token, since Agents are meant to be ephemeral. But that's a problem we already have elsewhere.
- Note that anyone who knows the contents of the UCAN delegation knows the Token, and thus is themselves the audience. The delegation becomes, in essence, a "bearer delegation". But we don't need to ship these outside the service, so they become exactly as sensitive as the Token itself. The Token should not be terribly sensitive; it's meant to attach requests to billing, not secure sensitive data. (? Is this true? Will this continue to be true? Will we implement true private data with a Token, or something more secure?)

hannahhoward · 2024-10-04T16:01:11Z

Currently, I agree with @Peeja, but I'd propose going a step further. I've been meaning to write this up, but:
a. The thing we call a "location claim" isn't a claim at all. It's a delegation from the storage node to the space that authorizes the space to download from it.
b. As @Peeja points out, the storage node currently has no UCAN authorization. This will not stay true: they are already working on putting UCAN in curio.
c. The actual "downloader" is whoever is downloading from the storage node. In this case, the downloader is ACTUALLY the gateway, on behalf of the space. This is why it makes sense for the audience to be the gateway. In the future, the downloader might be a direct user wanting to avoid the fees of going through our egress pipeline. The point is that this delegation describes a relationship between the storage node and whoever is downloading data from that storage node.
d. This somewhat breaks the bearer token idea, but I do think Petra is onto something there. We could have a double link, where the space (agent) authorizes the gateway to download from the storage node, and then the gateway issues a further delegation to download for the bearer token.
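The "double link" in (d) might be sketched like this: the space (via its agent) delegates retrieval to the gateway, and the gateway then mints a token and issues the second link to it. The function name, the gateway DID, and the field layout are assumptions for illustration; nothing here is signed or verified.

```python
import uuid


def mint_bearer_delegation(space_to_gateway, cid):
    """Gateway side: re-delegate retrieval of one CID to a fresh bearer token."""
    token = str(uuid.uuid4())
    return token, {
        "iss": space_to_gateway["aud"],   # the gateway issues the second link
        "aud": f"did:bearer:{token}",
        "sub": space_to_gateway["sub"],   # still ultimately about the space
        "cmd": space_to_gateway["cmd"],
        "pol": [["==", ".cid", cid]],     # narrowed to a single piece of content
    }


# First link: the space (agent) authorizes the gateway. DIDs are placeholders.
space_to_gateway = {
    "iss": "did:key:zAgent",
    "aud": "did:web:gateway.example",
    "sub": "did:key:zSpace",
    "cmd": "/retrieve/content",
    "pol": [],
}
token, gateway_to_bearer = mint_bearer_delegation(space_to_gateway, "bafy...data")
```

Because the gateway is the issuer of the second link, it is also the party that generates the token.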

hannahhoward · 2024-10-04T16:02:29Z

The only downside is that this puts the generation of tokens on the gateway, taking some power out of the hands of the space.

travis · 2024-10-07T10:58:12Z

we had a video chat about this and @Peeja is currently writing up an RFC describing our latest thinking about how this should work - will link here when that's ready!

travis mentioned this issue Sep 20, 2024

🗂️ Infra to bill users for egress #135

Closed

8 tasks

travis changed the title ~~figure out where we store gateway-assigned location commitments~~ figure out where we store gateway-assigned location commitments (and implement it) Sep 20, 2024

travis mentioned this issue Sep 20, 2024

Gateway should only serve content when UCAN-authorized to #140

Open

travis changed the title ~~figure out where we store gateway-assigned location commitments (and implement it)~~ figure out where we should store gateway-assigned location commitments (and implement it) Sep 20, 2024

hannahhoward added this to Storacha Project Planning Sep 23, 2024

hannahhoward moved this to Sprint Backlog in Storacha Project Planning Sep 23, 2024

travis changed the title ~~figure out where we should store gateway-assigned location commitments (and implement it)~~ figure out where we should store gateway-assigned content commitments (and implement it) Oct 4, 2024

travis mentioned this issue Oct 7, 2024

feat: return Claim from ContentClaimsLocator#locate storacha/blob-fetcher#12

Closed

travis assigned travis and Peeja and unassigned travis Oct 9, 2024

hannahhoward closed this as completed Oct 21, 2024
