feat: roundabout gets raw cids as blobs (#359)
Part of storacha/project-tracking#49

Note that Roundabout currently serves production traffic for SPs
downloading Piece bytes, and the w3filecoin storefront is planned to
use it to validate Piece CIDs.

## SP reads

1. An SP's request comes with a PieceCID, for which we look up the
equivalency claim mapping this Piece to some content.
2. In the current world (`store/*` protocol), this will in most cases be
a CAR CID that we can get from R2 `carpark-prod-0` as
`carCid/carCid.car`. However, `store/add` does not actually require the
content to be a CAR, so it could be some other CID that is still stored
with the same key format in the R2 bucket.
3. In the new world (`blob/*` protocol), it will be a RAW CID that we can
get from R2 `carpark-prod-0` as
`b58btc(multihash)/b58btc(multihash).blob`.
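
As a rough illustration of the two key formats above, here is a minimal dependency-free sketch. `bucketKey` and `b58btc` are hypothetical helpers written for this example (the PR itself uses `base58btc` from `multiformats`), and `cid` is assumed to be any object shaped like a `multiformats` CID.

```javascript
// Bitcoin base58 alphabet; the leading `z` is the multibase prefix for base58btc.
const ALPHABET = '123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz'

// Hypothetical stand-in for `base58btc.encode` so the sketch has no dependencies.
function b58btc (bytes) {
  let n = 0n
  for (const byte of bytes) n = (n << 8n) | BigInt(byte)
  let out = ''
  while (n > 0n) {
    out = ALPHABET[Number(n % 58n)] + out
    n /= 58n
  }
  // Each leading zero byte is encoded as a leading '1'.
  for (const byte of bytes) {
    if (byte !== 0) break
    out = '1' + out
  }
  return 'z' + out
}

const RAW_CODE = 0x55

// Hypothetical helper: picks the bucket key layout from the CID codec.
function bucketKey (cid) {
  if (cid.code === RAW_CODE) {
    const encodedMultihash = b58btc(cid.multihash.bytes)
    return `${encodedMultihash}/${encodedMultihash}.blob`
  }
  // `store/*` content is keyed by the CID string itself.
  return `${cid}/${cid}.car`
}
```

The blob layout depends only on the multihash, so the same bytes resolve to the same key whichever codec the requesting CID carries.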

## w3filecoin reads

1. `filecoin/offer` is performed with a given content CID.
2. In the current client world, a `CarCID` is provided on
`filecoin/offer`. This CID is used to fetch the bytes of the content in
order to derive the Piece for validation. In addition, the equivalency
claim is issued with the `CarCID`.
3. In the new world, we aim to have `filecoin/offer` rely on RAW CIDs,
which will be used both for reading content and for issuing equivalency
claims.

## This PR

We need a transition period where we support both worlds. 

This PR enables Roundabout to distinguish between a Blob and a CAR when
it gets a retrieval request. If the requested CID is a CAR (or a Piece
that is equivalent to a CAR), we can assume the old path and key format
immediately. If, on the other hand, the requested CID is RAW, we may
need to return either a Blob object or a "CAR"-like stored object.

For the transition period, this PR proposes that when we have RAW
content to locate, we MUST do a HEAD request to see whether a Blob
exists and, if so, redirect to a presigned URL for it. Otherwise, we
need to fall back to the old key format. As an alternative, we could
decide to make the `store/add` handler stop accepting non-CAR CIDs,
even though we would lose the ability to retrieve old content from
Roundabout (which may be fine as well 🤔).

Please note that this is still not hooked up to content claims to figure
out which bucket to use, and still relies on the assumption of CF R2
`carpark-prod-0`. It only uses equivalency claims to map a PieceCID to a
ContentCID.
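
The HEAD-then-fallback idea described above could look roughly like the sketch below. It is only an illustration under assumptions: `locateRawContent`, `headObject` and `sign` are hypothetical names, with `headObject` standing in for something like a wrapped `HeadObjectCommand` that resolves to `true` or `false`, and `sign` standing in for the presigner.

```javascript
/**
 * Hypothetical sketch: prefer the Blob key when it exists, otherwise
 * fall back to the old `store/*` key format.
 *
 * @param {object} deps
 * @param {(key: string) => Promise<boolean>} deps.headObject assumed existence check
 * @param {(key: string) => Promise<string>} deps.sign assumed presigner
 * @param {{ blobKey: string, carKey: string }} keys
 */
async function locateRawContent ({ headObject, sign }, { blobKey, carKey }) {
  // HEAD the blob key first: content written via `blob/*` wins when present.
  if (await headObject(blobKey)) {
    return sign(blobKey)
  }
  // Otherwise assume the content was written via `store/*`.
  return sign(carKey)
}
```

Injecting the two dependencies keeps the decision logic testable without a bucket; the cost of this approach is one extra HEAD round trip per RAW request during the transition period.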
vasco-santos authored Apr 29, 2024
1 parent 5779161 commit cdb7c65
Showing 7 changed files with 147 additions and 69 deletions.
15 changes: 15 additions & 0 deletions roundabout/constants.js
@@ -0,0 +1,15 @@
import * as Raw from 'multiformats/codecs/raw'

export const RAW_CODE = Raw.code

/** https://github.com/multiformats/multicodec/blob/master/table.csv#L140 */
export const CAR_CODE = 0x02_02

/** https://github.com/multiformats/multicodec/blob/master/table.csv#L520 */
export const PIECE_V1_CODE = 0xf1_01

/** https://github.com/multiformats/multicodec/blob/master/table.csv#L151 */
export const PIECE_V1_MULTIHASH = 0x10_12

/** https://github.com/multiformats/multicodec/pull/331/files */
export const PIECE_V2_MULTIHASH = 0x10_11
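
To show how these codes can be combined to tell request kinds apart, here is a small sketch. `classify` is a hypothetical helper, not part of this commit; the Piece v2 check mirrors `asPieceCidV2`, while the Piece v1 check (codec plus multihash code) is an assumption, since the body of `asPieceCidV1` is not shown in this diff.

```javascript
const RAW_CODE = 0x55
const CAR_CODE = 0x02_02
const PIECE_V1_CODE = 0xf1_01
const PIECE_V1_MULTIHASH = 0x10_12
const PIECE_V2_MULTIHASH = 0x10_11

/**
 * Hypothetical helper: label a CID-like object by codec and multihash code.
 *
 * @param {{ code: number, multihash: { code: number } }} cid
 */
function classify (cid) {
  if (cid.code === CAR_CODE) return 'car'
  // Assumed shape of the Piece v1 check (not shown in this diff).
  if (cid.code === PIECE_V1_CODE && cid.multihash.code === PIECE_V1_MULTIHASH) {
    return 'piece-v1'
  }
  // Piece v2 also uses the raw codec, so it must be tested before plain RAW.
  if (cid.code === RAW_CODE && cid.multihash.code === PIECE_V2_MULTIHASH) {
    return 'piece-v2'
  }
  if (cid.code === RAW_CODE) return 'raw'
  return 'unknown'
}
```

The ordering matters: a Piece v2 CID would otherwise be mistaken for a Blob request.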
24 changes: 2 additions & 22 deletions roundabout/functions/redirect.js
@@ -2,7 +2,7 @@ import * as Sentry from '@sentry/serverless'
import { S3Client } from '@aws-sdk/client-s3'
import { CID } from 'multiformats/cid'

-import { getSigner } from '../index.js'
+import { getSigner, contentLocationResolver } from '../index.js'
import { findEquivalentCids, asPieceCidV1, asPieceCidV2 } from '../piece.js'
import { getEnv, parseQueryStringParameters } from '../utils.js'

@@ -13,7 +13,7 @@ Sentry.AWSLambda.init({
})

/**
-* AWS HTTP Gateway handler for GET /{cid} by CID or Piece CID
+* AWS HTTP Gateway handler for GET /{cid} by CAR CID, RAW CID or Piece CID
*
* @param {import('aws-lambda').APIGatewayProxyEventV2} request
*/
@@ -86,26 +86,6 @@ async function resolvePiece (cid, locateContent) {
return { statusCode: 404, body: 'No content found for Piece CID' }
}

-/**
-* Creates a helper function that returns signed bucket url for a car cid,
-* or undefined if the CAR does not exist in the bucket.
-*
-* @param {object} config
-* @param {S3Client} config.s3Client
-* @param {string} config.bucket
-* @param {number} config.expiresIn
-*/
-function contentLocationResolver ({ s3Client, bucket, expiresIn }) {
-const signer = getSigner(s3Client, bucket)
-/**
-* @param {CID} cid
-*/
-return async function locateContent (cid) {
-const key = `${cid}/${cid}.car`
-return signer.getUrl(key, { expiresIn })
-}
-}
-
/**
* AWS HTTP Gateway handler for GET /key/{key} by bucket key
*
38 changes: 36 additions & 2 deletions roundabout/index.js
@@ -1,7 +1,11 @@
-import { getSignedUrl as getR2SignedUrl } from "@aws-sdk/s3-request-presigner"
-import { GetObjectCommand } from "@aws-sdk/client-s3"
+import { getSignedUrl as getR2SignedUrl } from '@aws-sdk/s3-request-presigner'
+import { GetObjectCommand } from '@aws-sdk/client-s3'
+import { base58btc } from 'multiformats/bases/base58'
+
+import { RAW_CODE } from './constants.js'

/**
+* @typedef {import('multiformats').CID} CID
* @typedef {import('@aws-sdk/client-s3').S3Client} S3Client
* @typedef {import('@aws-sdk/types').RequestPresigningArguments} RequestPresigningArguments
*/
@@ -31,3 +35,33 @@ export function getSigner (s3Client, bucketName) {
}
}
}

/**
* Creates a helper function that returns signed bucket url for content requested.
* It currently supports both `store/*` and `blob/*` protocol written content.
* Blobs are stored as `b58btc(multihash)/b58btc(multihash).blob` and requested to
* Roundabout via a RAW CID.
* Store protocol SHOULD receive CAR files that are stored as
* `carCid/carCid.car`.
*
* @param {object} config
* @param {S3Client} config.s3Client
* @param {string} config.bucket
* @param {number} config.expiresIn
*/
export function contentLocationResolver ({ s3Client, bucket, expiresIn }) {
const signer = getSigner(s3Client, bucket)
/**
* @param {CID} cid
*/
return async function locateContent (cid) {
const carKey = `${cid}/${cid}.car`

if (cid.code === RAW_CODE) {
const encodedMultihash = base58btc.encode(cid.multihash.bytes)
const blobKey = `${encodedMultihash}/${encodedMultihash}.blob`
return signer.getUrl(blobKey, { expiresIn })
}
return signer.getUrl(carKey, { expiresIn })
}
}
26 changes: 2 additions & 24 deletions roundabout/piece.js
@@ -3,19 +3,8 @@
import './globals.js'

import { read } from '@web3-storage/content-claims/client'
-import * as Raw from 'multiformats/codecs/raw'

-/** https://github.com/multiformats/multicodec/blob/master/table.csv#L140 */
-export const CAR_CODE = 0x02_02
-
-/** https://github.com/multiformats/multicodec/blob/master/table.csv#L520 */
-export const PIECE_V1_CODE = 0xf1_01
-
-/** https://github.com/multiformats/multicodec/blob/master/table.csv#L151 */
-export const PIECE_V1_MULTIHASH = 0x10_12
-
-/** https://github.com/multiformats/multicodec/pull/331/files */
-export const PIECE_V2_MULTIHASH = 0x10_11
+import { PIECE_V1_CODE, PIECE_V1_MULTIHASH, PIECE_V2_MULTIHASH, RAW_CODE } from './constants.js'

/**
* @typedef {import('multiformats/cid').CID} CID
@@ -28,7 +17,7 @@ export const PIECE_V2_MULTIHASH = 0x10_11
* @param {CID} cid
*/
export function asPieceCidV2 (cid) {
-if (cid.multihash.code === PIECE_V2_MULTIHASH && cid.code === Raw.code) {
+if (cid.multihash.code === PIECE_V2_MULTIHASH && cid.code === RAW_CODE) {
return cid
}
}
@@ -44,17 +33,6 @@ export function asPieceCidV1 (cid) {
}
}

-/**
-* Return the cid if it is a CAR CID or undefined if not
-*
-* @param {CID} cid
-*/
-export function asCarCid(cid) {
-if (cid.code === CAR_CODE) {
-return cid
-}
-}
-
/**
* Find the set of CIDs that are claimed to be equivalent to the Piece CID.
*
91 changes: 71 additions & 20 deletions roundabout/test/index.test.js
@@ -1,17 +1,17 @@
import { test } from './helpers/context.js'

-import {
-PutObjectCommand,
-} from '@aws-sdk/client-s3'
-
+import { PutObjectCommand } from '@aws-sdk/client-s3'
import { encode } from 'multiformats/block'
import { CID } from 'multiformats/cid'
+import { base58btc } from 'multiformats/bases/base58'
import { identity } from 'multiformats/hashes/identity'
import { sha256 as hasher } from 'multiformats/hashes/sha2'
import * as pb from '@ipld/dag-pb'
import { CarBufferWriter } from '@ipld/car'
+import * as CAR from '@ucanto/transport/car'

-import { getSigner } from '../index.js'
+import { RAW_CODE } from '../constants.js'
+import { getSigner, contentLocationResolver } from '../index.js'
import {
parseQueryStringParameters,
MAX_EXPIRES_IN,
@@ -27,17 +27,18 @@ test.before(async t => {
t.context.s3Client = client
})

-test('can create signed url for object in bucket', async t => {
+test('can create signed url for CAR in bucket and get it', async t => {
const bucketName = await createBucket(t.context.s3Client)
const carCid = await putCarToBucket(t.context.s3Client, bucketName)
const expiresIn = 3 * 24 * 60 * 60 // 3 days in seconds

-const signer = getSigner(t.context.s3Client, bucketName)
-const key = `${carCid}/${carCid}.car`
-const signedUrl = await signer.getUrl(key, {
+const locateContent = contentLocationResolver({
+bucket: bucketName,
+s3Client: t.context.s3Client,
expiresIn
})

+const signedUrl = await locateContent(carCid)
if (!signedUrl) {
throw new Error('presigned url must be received')
}
@@ -48,6 +49,30 @@ test('can create signed url for object in bucket', async t => {
t.assert(fetchResponse.ok)
})

test('can create signed url for Blob in bucket and get it', async t => {
const bucketName = await createBucket(t.context.s3Client)
const blobCid = await putBlobToBucket(t.context.s3Client, bucketName)
const expiresIn = 3 * 24 * 60 * 60 // 3 days in seconds

const locateContent = contentLocationResolver({
bucket: bucketName,
s3Client: t.context.s3Client,
expiresIn
})

const signedUrl = await locateContent(blobCid)
if (!signedUrl) {
throw new Error('presigned url must be received')
}
t.truthy(signedUrl?.includes(`X-Amz-Expires=${expiresIn}`))

const encodedMultihash = base58btc.encode(blobCid.multihash.bytes)
t.truthy(signedUrl?.includes(`${encodedMultihash}/${encodedMultihash}.blob`))

const fetchResponse = await fetch(signedUrl)
t.assert(fetchResponse.ok)
})

test('fails to fetch from signed url for object not in bucket', async t => {
const bucketName = await createBucket(t.context.s3Client)
const carCid = CID.parse('bagbaiera222226db4v4oli5fldqghzgbv5rqv3n4ykyfxk7shfr42bfnqwua')
@@ -116,30 +141,56 @@ test('fails to parse expires query parameter when not acceptable value', t => {
t.throws(() => parseQueryStringParameters(queryParamsSmaller))
})

-/**
-* @param {import('@aws-sdk/client-s3').S3Client} s3Client
-* @param {string} bucketName
-*/
-async function putCarToBucket (s3Client, bucketName) {
-// Write original car to origin bucket
+async function getContent () {
const id = await encode({
value: pb.prepare({ Data: 'a red car on the street!' }),
codec: pb,
hasher: identity,
})
-const parent = await encode({
+return await encode({
value: pb.prepare({ Links: [id.cid] }),
codec: pb,
hasher,
})
+}
+
+/**
+* @param {import('@aws-sdk/client-s3').S3Client} s3Client
+* @param {string} bucketName
+*/
+async function putBlobToBucket (s3Client, bucketName) {
+// Write blob to bucket, keyed by its base58btc-encoded multihash
+const content = await getContent()
+const encodedMultihash = base58btc.encode(content.cid.multihash.bytes)
+const key = `${encodedMultihash}/${encodedMultihash}.blob`
+await s3Client.send(
+new PutObjectCommand({
+Bucket: bucketName,
+Key: key,
+Body: content.bytes,
+})
+)
+
+// Return RAW CID (createV1 derives the CID bytes from codec and multihash)
+return CID.createV1(RAW_CODE, content.cid.multihash)
+}
+
+/**
+* @param {import('@aws-sdk/client-s3').S3Client} s3Client
+* @param {string} bucketName
+*/
+async function putCarToBucket (s3Client, bucketName) {
+// Write original car to origin bucket
+const content = await getContent()
const car = CarBufferWriter.createWriter(Buffer.alloc(1000), {
-roots: [parent.cid],
+roots: [content.cid],
})
-car.write(parent)
+car.write(content)

const Body = car.close()
+const carCid = await CAR.codec.link(Body)

-const key = `${parent.cid.toString()}/${parent.cid.toString()}.car`
+const key = `${carCid.toString()}/${carCid.toString()}.car`
await s3Client.send(
new PutObjectCommand({
Bucket: bucketName,
@@ -148,5 +199,5 @@ async function putCarToBucket (s3Client, bucketName) {
})
)

-return parent.cid
+return carCid
}
4 changes: 3 additions & 1 deletion roundabout/test/piece.test.js
@@ -4,7 +4,9 @@ import * as Raw from 'multiformats/codecs/raw'
import { sha256 } from 'multiformats/hashes/sha2'
import * as Digest from 'multiformats/hashes/digest'
import { Piece, MIN_PAYLOAD_SIZE } from '@web3-storage/data-segment'
-import { findEquivalentCids, asCarCid, asPieceCidV1, asPieceCidV2, CAR_CODE } from '../piece.js'
+import { asCarCid } from '../utils.js'
+import { CAR_CODE } from '../constants.js'
+import { findEquivalentCids, asPieceCidV1, asPieceCidV2 } from '../piece.js'

test('findEquivalentCids', async t => {
const bytes = new Uint8Array(MIN_PAYLOAD_SIZE)
18 changes: 18 additions & 0 deletions roundabout/utils.js
@@ -1,10 +1,16 @@
import { CAR_CODE } from './constants.js'

// Per https://developers.cloudflare.com/r2/api/s3/presigned-urls/
export const MAX_EXPIRES_IN = 3 * 24 * 60 * 60 // 3 days in seconds (R2 allows presigned URLs of up to 7 days)
export const MIN_EXPIRES_IN = 1
export const DEFAULT_EXPIRES_IN = 3 * 24 * 60 * 60 // 3 days in seconds by default

export const VALID_BUCKETS = ['dagcargo']

/**
* @typedef {import('multiformats/cid').CID} CID
**/

/**
* @param {import('aws-lambda').APIGatewayProxyEventPathParameters | undefined} queryStringParameters
*/
@@ -51,3 +57,15 @@ function mustGetEnv (name) {
if (!value) throw new Error(`Missing env var: ${name}`)
return value
}

/**
* Return the cid if it is a CAR CID or undefined if not
*
* @param {CID} cid
*/
export function asCarCid(cid) {
if (cid.code === CAR_CODE) {
return cid
}
}
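
The body of `parseQueryStringParameters` is not part of this diff, so the following is only a sketch of how an `expires` value could be bounded by constants like `MIN_EXPIRES_IN` and `MAX_EXPIRES_IN`. `parseExpiresIn` and its options object are hypothetical names, and the bounds are passed in explicitly rather than taken from `utils.js`.

```javascript
/**
 * Hypothetical sketch: parse an `expires` query value and enforce bounds.
 *
 * @param {string | undefined} value raw query string value
 * @param {{ min: number, max: number, fallback: number }} bounds in seconds
 */
function parseExpiresIn (value, { min, max, fallback }) {
  // No value given: use the default expiration.
  if (value === undefined) return fallback

  const expiresIn = Number.parseInt(value, 10)
  // Reject non-numeric input and anything outside the allowed window.
  if (!Number.isInteger(expiresIn) || expiresIn < min || expiresIn > max) {
    throw new Error(`expires must be an integer between ${min} and ${max} seconds`)
  }
  return expiresIn
}
```

Throwing on out-of-range values, rather than clamping, matches the tests above, which expect parsing to fail for unacceptable `expires` values.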
