add CidFromReader #126

rvagg · 2021-06-21T05:50:20Z

We now have 2 CID decoder functions in go-car that really belong here and we should move (and properly test) them. The basic functionality is that I have either an io.Reader or just a []byte but I don't know how big the CID is but I'm pretty sure I know where the start of it is. I should be able to extract the CID and get an offset to the end of the parsed CID bytes.

ReadCid(buf []byte) (cid.Cid, int, error) - read a Cid from buf and tell me the offset after read: https://github.com/ipld/go-car/blob/71cfa2fc2a619d646606373c5946282934270bd4/util/util.go#L22

ReadCid(store io.ReaderAt, at int64) (cid.Cid, int, error) - read a Cid from store and tell me the offset after read: https://github.com/ipld/go-car/blob/wip/v2/v2/internal/io/cid.go (wip/v2 branch, not yet in master).

We have decodeFirst(bytes) in js-multiformats to serve a very similar purpose. Having it in the core library has uncovered some other uses outside of CAR decoding too.

Suggestions for more explicit naming of these functions welcome!

The text was updated successfully, but these errors were encountered:

mvdan · 2021-06-21T09:24:45Z

I think we should just expose the []byte based API; one can always write the ReaderAt version in terms of it, assuming we can somewhat reliably predict how many bytes the CID will be. I guess technically the hash could be many many bytes, but in practice I imagine one could read a chunk of bytes (like 1024) and be pretty sure that the whole CID is contained within that read buffer.

rvagg · 2021-06-21T11:14:32Z

True, although the increasing use of identity CIDs makes this a bit messy. But, I now recall that I coupled decodeFirst() with another utility function in js-multiformats, inspectBytes() which could tell you how many you needed if you're in a situation where you might need to be concerned: https://github.com/multiformats/js-multiformats/blob/2fb9b6815d4b9cbb8212e2f1975b355d2d370e4b/src/cid.js#L312-L324

It only needs the first few bytes, a maximum of maybe 10. So it's a nice combo that could be used to safely deal with the flexibility challenges.

A maximal implementation could look like this and could also be used internally by a ReadCid() to do part of the decoding since it needs all of these things which are gleaned from the first few bytes anyway (at least this is what we do in JS now):

InspectBytes(buf []byte) (version int, codec int, multihashCode int, digestSize int, multihashSize int, size int)

That's probably too many return values and probably should be a struct and then I guess there'll be arguments about too many allocations, so a minimal form could just return the size I suppose.

rvagg · 2021-06-21T12:24:39Z

Thanks to @ribasushi for reminding me of a relevant thread on identity lengths just now! multiformats/multihash#130

mvdan · 2021-06-21T12:32:58Z

What you say makes sense; an "inspect" API for the cases where one wants to support potentially huge CIDs, and a "read" API for the cases where one doesn't need to support them - so erroring with e.g. io.ErrUnexpectedEOF would be fine.

arguments about too many allocations

Returning a non-pointer struct shouldn't allocate; it should be practically equivalent to returning the list of parameters. It's more of a question of which is nicer in terms of API. You could also deduplicate the types in the return parameters, like:

InspectBytes(buf []byte) (version, codec, multihashCode, digestSize, multihashSize, size int)

Stebalien · 2021-06-25T16:06:18Z

We already have CidFromBytes which I believe is what you're looking for. We could also add a CidFromReader if you need it to work with an io.Reader.

rvagg · 2021-06-26T01:08:20Z

mm, you're right, I thought that was just for a strictly correct length number of bytes but that seems to do mostly what's needed here. We should try and use it in go-car to see what the limits are and what else we need to add.

mvdan · 2021-07-01T16:00:24Z

Confirming that CidFromBytes works for the cases where we're decoding from a buffer: ipld/go-car#131

That said, I think a CidFromReader would still be very useful. We need that for stream-like reading of CARv1 files, for example: https://ipld.io/specs/transport/car/carv1/#cid

Right now we implement that in an internal package inside go-car, which is not ideal. Plus, we didn't implement CIDv0 decoding in that copy, so that's biting @raulk when he tries to use our carv2 module 🤦

I'll send a PR for CidFromReader shortly.

mvdan · 2021-07-01T16:07:08Z

I'm also not discarding having an "inspect" API in the future, but it's not useful for go-car right now, and it doesn't consume a fixed number of bytes, so... It's not clear to me that it's a clear win and needed right now. I'd put it off until someone actually needs it.

And reuse a CidFromBytes test for it, which includes both CIDv0 and CIDv1 cases as inputs. Fixes #126.

mvdan · 2021-07-02T10:01:41Z

Now at #127.

And reuse a CidFromBytes test for it, which includes both CIDv0 and CIDv1 cases as inputs. Fixes #126.

And reuse two CidFromBytes tests for it, which includes both CIDv0 and CIDv1 cases as inputs, as well as some inputs that should error. Fixes #126.

rvagg added the need/triage Needs initial labeling and prioritization label Jun 21, 2021

rvagg mentioned this issue Jun 21, 2021

Implement initial read-only blockstore ipld/go-car#88

Merged

ipfs deleted a comment from welcome bot Jun 21, 2021

masih mentioned this issue Jun 21, 2021

Move CID decoding functionality to go-cid ipld/go-car#91

Closed

aschmahmann added status/ready Ready to be worked need/analysis Needs further analysis before proceeding and removed need/triage Needs initial labeling and prioritization status/ready Ready to be worked labels Jun 25, 2021

mvdan changed the title ~~TODO: implement ReadCid(bytes) and ReadCid(io.ReaderAt)~~ add CidFromReader Jul 1, 2021

mvdan added a commit that referenced this issue Jul 2, 2021

implement CidFromReader

e3b3357

And reuse a CidFromBytes test for it, which includes both CIDv0 and CIDv1 cases as inputs. Fixes #126.

mvdan added a commit that referenced this issue Jul 2, 2021

implement CidFromReader

f0fa7c0

And reuse a CidFromBytes test for it, which includes both CIDv0 and CIDv1 cases as inputs. Fixes #126.

mvdan added a commit that referenced this issue Jul 2, 2021

implement CidFromReader

793bca4

And reuse a CidFromBytes test for it, which includes both CIDv0 and CIDv1 cases as inputs. Fixes #126.

mvdan added a commit that referenced this issue Jul 2, 2021

implement CidFromReader

41f2377

And reuse a CidFromBytes test for it, which includes both CIDv0 and CIDv1 cases as inputs. Fixes #126.

mvdan added a commit that referenced this issue Jul 14, 2021

implement CidFromReader

a00a1e5

And reuse two CidFromBytes tests for it, which includes both CIDv0 and CIDv1 cases as inputs, as well as some inputs that should error. Fixes #126.

mvdan added a commit that referenced this issue Jul 14, 2021

implement CidFromReader

8f4ec9e

And reuse two CidFromBytes tests for it, which includes both CIDv0 and CIDv1 cases as inputs, as well as some inputs that should error. Fixes #126.

mvdan closed this as completed in c4c8760 Jul 15, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add CidFromReader #126

add CidFromReader #126

rvagg commented Jun 21, 2021

mvdan commented Jun 21, 2021

rvagg commented Jun 21, 2021

rvagg commented Jun 21, 2021

mvdan commented Jun 21, 2021 •

edited

Loading

Stebalien commented Jun 25, 2021

rvagg commented Jun 26, 2021

mvdan commented Jul 1, 2021

mvdan commented Jul 1, 2021

mvdan commented Jul 2, 2021

add CidFromReader #126

add CidFromReader #126

Comments

rvagg commented Jun 21, 2021

mvdan commented Jun 21, 2021

rvagg commented Jun 21, 2021

rvagg commented Jun 21, 2021

mvdan commented Jun 21, 2021 • edited Loading

Stebalien commented Jun 25, 2021

rvagg commented Jun 26, 2021

mvdan commented Jul 1, 2021

mvdan commented Jul 1, 2021

mvdan commented Jul 2, 2021

mvdan commented Jun 21, 2021 •

edited

Loading