integrate blobs with read pipeline #49
Unfortunately no... not yet. It uses partition and inclusion claims to do the work that dudewhere/satnav were providing. When it knows the parts (shards) and the index info, it assumes the location is the R2 bucket. In local.freeway I tweaked it to consider location claims, so we have the code to do what we need; it's just not in freeway yet.
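For illustration, that fallback might look roughly like this (a sketch only: the claim-reading helper, key scheme, and bucket URL are assumptions, not freeway's actual API):

```ts
import { CID } from 'multiformats/cid'

// Assumed public endpoint for the R2 bucket (illustrative only).
const R2_PUBLIC_URL = 'https://carpark.example.com'

// `readLocationClaims` stands in for whatever claim-reading helper
// freeway would use; it is not a real freeway API.
async function locateShard (
  shard: CID,
  readLocationClaims: (c: CID) => Promise<URL[]>
): Promise<URL> {
  const locations = await readLocationClaims(shard)
  // Prefer an explicit location claim when one exists...
  if (locations.length > 0) return locations[0]
  // ...otherwise assume the shard lives in the R2 bucket under the
  // historical `carCid/carCid.car` key convention.
  return new URL(`${R2_PUBLIC_URL}/${shard}/${shard}.car`)
}
```
|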
Synced with @alanshaw earlier today on this, in order to understand the planned path for integrating blobs with the reads pipeline.

Write side
Open questions:
Read side

On the reads side there are multiple interfaces, so let's see what the plan would be for each:
Open questions:
Compromises

Other "clients" of blobs that are not read interfaces per se, but application-layer or Filecoin dependencies, will need compromises in the first iteration. Tickets should be created to address these issues.
|
I have attempted to capture all the read points in the following document https://hackmd.io/@gozala/idx-publishing/edit I also made a visual map of the read pipeline here https://www.tldraw.com/s/v2_c_rU_WdpZ_BFEhY5VOspdE7?v=-564%2C-230%2C3028%2C2127&p=page |
I would propose implementing a unified index read/write interface so that we could switch over all of the read interfaces. This should reduce the complexity of the system and require less contextual knowledge from contributors. In terms of execution I suggest the following plan (a rough sketch of the interface follows below):
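For concreteness, here is a minimal sketch of what such a unified index interface could look like. All names and shapes are hypothetical; the point is only that every read interface (gateway, hoverboard, roundabout, claims API) would share one contract:

```ts
import type { CID } from 'multiformats/cid'
import type { MultihashDigest } from 'multiformats/hashes/interface'

// Where a block lives: the shard (blob) that contains it plus a byte range.
interface BlockLocation {
  shard: MultihashDigest
  offset: number
  length: number
}

interface IndexReader {
  /** Resolve the shards a DAG was partitioned into. */
  shards (root: CID): Promise<MultihashDigest[]>
  /** Resolve where a block lives within the stored shards. */
  locate (block: MultihashDigest): Promise<BlockLocation | undefined>
}

interface IndexWriter {
  /** Record the index for a DAG: block -> location entries. */
  publish (root: CID, entries: Iterable<[MultihashDigest, BlockLocation]>): Promise<void>
}
```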
|
Missing: write to the SQS multihashes queue for adding entries into IPNI
Interested to know what we'd write here (see further down in this comment for a proposal). Also, I'd encourage not writing directly to the backing store but going via the ucanto handler.
Can you clarify? Do you mean we should write an entry for the blob itself, as well as for all the blocks it contains? If yes, then yes I agree 😄 .
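As a hedged sketch of that write (queue URL, region, and message encoding are assumptions, not the deployed configuration), enqueueing the blob's multihash together with the multihashes of the blocks it contains might look like:

```ts
import { SQSClient, SendMessageBatchCommand } from '@aws-sdk/client-sqs'
import { base58btc } from 'multiformats/bases/base58'
import type { MultihashDigest } from 'multiformats/hashes/interface'

const sqs = new SQSClient({ region: 'us-west-2' }) // assumed region

async function queueForIPNI (
  queueUrl: string,
  blob: MultihashDigest,
  blocks: MultihashDigest[]
) {
  // One entry for the blob itself, plus one per contained block.
  const digests = [blob, ...blocks]
  // SQS batches are limited to 10 entries, so send in chunks.
  for (let i = 0; i < digests.length; i += 10) {
    const Entries = digests.slice(i, i + 10).map((d, j) => ({
      Id: String(i + j),
      MessageBody: base58btc.encode(d.bytes)
    }))
    await sqs.send(new SendMessageBatchCommand({ QueueUrl: queueUrl, Entries }))
  }
}
```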
We have Ucanto handlers for publishing all existing kinds of content claims.
I'd love this to run via the existing read interface. I put a lot of work into building it and it would be a shame to not see it being used (or adapted).
The HTTP API for content claims does this already. i.e. falls back to DynamoDB if no other claims exist. It'll need small tweaks.
Note that the spec'd DAG index doesn't really fit well with existing claims. It is more akin to a relation claim. Proposal for what claims to publish, just fitting into existing claims:
It's kinda weird that the inclusion claims all point to the same CID but 🤷♂️ . Alternatively, we could create a new claim that is an "index" claim?:

```js
{
  content: CID /* DAG root */,
  index: CID /* w3-index CID */,
}
```

So then you'd publish:
So then you publish fewer claims. Note: you still need the partition claim so you can have location claims for the shards included in the response (see just below)
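To make that concrete, here is roughly the claim set for a DAG split into two shards under the proposed index claim. The shapes are illustrative payloads only (real claims are signed UCAN invocations), and `assert/index` is the hypothetical new claim:

```ts
import type { CID } from 'multiformats/cid'

// Placeholders for the CIDs involved (illustrative only).
declare const dagRoot: CID    // root of the uploaded DAG
declare const w3IndexCid: CID // CID of the spec'd sharded DAG index
declare const shard1: CID, shard2: CID // the blob shards

const claims = [
  // one index claim pointing at the w3-index for the whole DAG
  { type: 'assert/index', content: dagRoot, index: w3IndexCid },
  // a partition claim so readers can discover the shards...
  { type: 'assert/partition', content: dagRoot, parts: [shard1, shard2] },
  // ...and a location claim per shard
  { type: 'assert/location', content: shard1, location: ['https://…'] },
  { type: 'assert/location', content: shard2, location: ['https://…'] }
]
```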
It already does this, and even provides options to get related claims via the
It already does this, needs minor tweaks.
If the existing content claims HTTP API is the read interface then there is nothing to do here 😄 |
Good point, I have added storacha/w3up#1406 for this
Yes, although we identified a problem with the current spec that we'll need to address to do it, see
Awesome! I did discover that it is even used by the filecoin pipeline already 😍
I want the same thing 🤩
We covered this in the call, but the short version is that the sharded DAG index is supposed to cover both partition and index claims; if something is missing we can extend it as needed to cover it all. |
Part of storacha/project-tracking#49

Note that currently Roundabout is used in production traffic for SPs to download Piece bytes, and is planned to be used by the w3filecoin storefront to validate a Piece CID.

## SP reads

1. An SP request comes with a PieceCID, where we get an equivalency claim for this Piece to some content.
2. In the current world (`store/*` protocol), it will in most cases be a CAR CID that we can get from R2 `carpark-prod-0` as `carCid/carCid.car`. However, `store/add` does not really require this to be a CAR, so it could end up being other CIDs that are still stored with the same key format in the R2 bucket.
3. In the new world (`blob/*` protocol), it will be a RAW CID that we can get from R2 `carpark-prod-0` as `b58btc(multihash)/b58btc(multihash).blob`.

## w3filecoin reads

1. `filecoin/offer` is performed with a given content CID.
2. In the current client world, a `CarCID` is provided on `filecoin/offer`. This CID is used to get the bytes for the content, in order to derive the Piece for validation. In addition, an equivalency claim is issued with the `CarCID`.
3. In the new world, we aim to have `filecoin/offer` rely on RAW CIDs, which will be used for both reading content and issuing equivalency claims.

## This PR

We need a transition period where we support both worlds. This PR enables Roundabout to attempt to distinguish between a Blob and a CAR when it gets a retrieval request. If the CID requested is a CAR (or a Piece that equals a CAR), we can assume the old path and key format immediately. On the other hand, if the CID requested is RAW, we may need to give back a Blob object or a "CAR"-like stored object.

For the transition period, this PR proposes that if we have RAW content to locate, we MUST do a HEAD request to see if a Blob exists, and if so redirect to a presigned URL for it. Otherwise, we need to fall back to the old key formats. As an alternative, we could decide to make the `store/add` handler no longer accept non-CAR CIDs, even though we would lose the ability to retrieve old things from Roundabout (which may be fine as well 🤔).

Please note that this is still not hooked up with content claims to figure out which bucket to use, and still relies on the assumption of CF R2 `carpark-prod-0`. It just uses equivalency claims to map PieceCID to ContentCID.
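A sketch of the transition logic described above (the codec checks use real multicodec codes; the bucket name and fallback follow this PR's proposal, while the helper signature and client config are assumptions):

```ts
import { CID } from 'multiformats/cid'
import { base58btc } from 'multiformats/bases/base58'
import { S3Client, HeadObjectCommand } from '@aws-sdk/client-s3'

const CAR_CODE = 0x0202 // CAR multicodec
const RAW_CODE = 0x55   // raw multicodec

const s3 = new S3Client({ region: 'auto' }) // R2 is S3-compatible
const Bucket = 'carpark-prod-0'

/** Resolve the object key for a requested content CID. */
async function resolveKey (cid: CID): Promise<string> {
  if (cid.code === CAR_CODE) {
    // Old world: CAR CIDs keep the `carCid/carCid.car` key format.
    return `${cid}/${cid}.car`
  }
  if (cid.code === RAW_CODE) {
    // New world: blobs are keyed by base58btc multihash. HEAD to check a
    // Blob exists (then redirect to a presigned URL for it); otherwise
    // fall back to the old key format.
    const digest = base58btc.encode(cid.multihash.bytes)
    const blobKey = `${digest}/${digest}.blob`
    try {
      await s3.send(new HeadObjectCommand({ Bucket, Key: blobKey }))
      return blobKey
    } catch {
      return `${cid}/${cid}.car` // legacy `store/add` object with a non-CAR CID
    }
  }
  throw new Error(`unsupported codec: 0x${cid.code.toString(16)}`)
}
```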
Pending deployment:
In progress PRs:
|
We need to make the gateway and hoverboard support blobs stored in the system. Here is the current rough plan:
- `ipni/offer` handler that will perform writes to bridge with the current system
- `ipni/offer`
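As a very rough sketch of what the `ipni/offer` handler could look like when wired up with ucanto (the capability shape, context fields, and helper names are assumptions, not the spec):

```ts
import * as Server from '@ucanto/server'
import { capability, Schema } from '@ucanto/validator'
import type { UnknownLink } from 'multiformats'

// Hypothetical capability shape; the real `ipni/offer` spec may differ.
const IPNIOffer = capability({
  can: 'ipni/offer',
  with: Schema.did({ method: 'key' }),
  nb: Schema.struct({
    // link to the index whose multihashes should be advertised
    index: Schema.link()
  })
})

// `queueMultihashes` is an illustrative stand-in for the bridge into the
// current system (e.g. the SQS multihashes queue feeding IPNI).
export const createHandler = (context: {
  queueMultihashes: (index: UnknownLink) => Promise<void>
}) =>
  Server.provide(IPNIOffer, async ({ capability }) => {
    await context.queueMultihashes(capability.nb.index)
    return { ok: {} }
  })
```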
Tasks