-
Notifications
You must be signed in to change notification settings - Fork 3
wip: migrate to latest multiformats #16
base: master
Are you sure you want to change the base?
Conversation
It's so hard to review this when most of it is just indentation changes! It seems that the major changes are really just in hoisting the |
@rvagg There is this really handy thing in case you were not aware |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @mikeal for updating this and other libraries to changes in multiformats. I'm hesitant to express strong opinions here. I personally think that compositional approach illustrated in dag.js
from multiformats/js-multiformats#38 is a better alternative to this, but I am obviously biased.
For better clarity in review communications I use following iconography in my reviews
- 🚨 Problem that needs to be addressed (please change)
- 💣 Could be a problem (please use your own judgment to decide)
- 💭 Just a though or on opinion (take it or leave it)
- 📝 Note (do what you want with it)
- ❓Question (likely a signal that code comment would be good there)
index.js
Outdated
const b = coerce(value) | ||
return coerce(b.buffer.slice(b.byteOffset, b.byteOffset + b.byteLength)) | ||
} | ||
const reader = createReader(CID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❓Any reason to pass CID in vs just let createReader
import it instead ? If there is a the reason it would be good to have a comment explaining.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point, i was just trying to get it working again but there’s no need for the dep injection anymore.
index.js
Outdated
get codec () { | ||
if (this.opts.code) { | ||
this.opts.codec = multicodec.get(this.opts.code).name | ||
get hasher () { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💭 I think it would make things a lot cleaner if there were dedicated Block
so that when you create it with CID an appropriate hasher would be pulled and set as a property etc... That way if invariants don't hold (nor cid nor codec was provided, or cid is provided and hasher is not available) error is thrown at construction rather than later on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, there’s a bigger refactor that I’m going to want to do to the construct and class methods that i haven’t done yet. I wanted to first see the existing implementation ported over without much API changes, but I’m still debating with myself whether or not i’m going to tackle the bigger constructor refactor before i get this merged or just do it now so that I can combine the breaking changes to all the APIs.
index.js
Outdated
this.opts.codec = multicodec.get(this.opts.cid.code).name | ||
} | ||
return this.opts.codec | ||
if (!this.opts.hasher) throw new Error('Do not have hash implementation') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💣 I think throwing error from property accessory is a bad pattern, APIs that do that tend to be really painful to work with (e.g. some DOM bindings that do it). I would suggest return null
here and and do the throw from the method that needs to use hasher instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, this is a bit of a hack that I plan on factoring out.
index.js
Outdated
if (this.opts.cid) return this.opts.cid.code | ||
if (!this.opts.code) { | ||
this.opts.code = multicodec.get(this.codec).code | ||
get codec () { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❓Why not return codec from here just like hasher
returns hasher
? User can lookup .name
if that's what the y need.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn’t want to take that breaking change yet because it’s going to cause a lot of churn in the tests. I also hate to change the value of existing properties rather than just migrating to new properties because the errors during migration tend to be much more painful, but maybe it’s worth it here since codec
is really the only good name for this.
index.js
Outdated
const data = this.encodeUnsafe() | ||
const hash = await this.hasher.digest(data) | ||
if (bytes.equals(cid.multihash.bytes, hash.bytes)) return true | ||
throw new Error('Bytes do not match') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💭 I think this another case where dedicated constructors would make more sense. E.g. you could have a Block.createUnsafe(cid, bytes)
and Block.createSafe(cid, bytes)
where later delegates to former after validation. That would also get rid of this in cases where you know no validation is needed like when you create block e.g from JS value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was planning on moving to dedicated constructors but hadn’t identified these as separate methods, but I like it. Although I think I’m just going to go with create()
and createUnsafe()
to match the other unsafe API forms.
index.js
Outdated
async equals (block) { | ||
if (block === this) return true | ||
const cid = await this.cid() | ||
if (block.asCID === block) return cid.equals(block) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💭 I find the choice to return true
when comparing block to cid questionable, but it was the already so 🤷♂️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be cleaned up by just renaming the variable. The API is meant to take either a CID or a Block, which you may not like, but that’s the intention and the fact that the variable is named block
makes this very confusing.
I don’t want to diverge too far from the current approach in terms of presenting a class with along with some methods to help you create instances, along with a registry of codecs. It’s not that I’m adamant that this is the right approach for every case, but I do think that this approach works quite well in some cases and I want to make sure the libraries underneath However, I’d prefer to have this library presenting and producing block instances that conform to a lower level block interface defined by |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mikeal per your request I have provided some feedback and made several suggestions. I do still however feel that this Block
class tries to combine following:
- Encode JS value
- Decode block from bytes
- Decode block from CID + bytes
I understand that this allows recipient of the Block
not to care which from the above 3 scenarios took place (which is great), but unfortunately it is forced upon, meaning recipient has no way of choosing alternative code paths optimized for specific scenario because there is no way to distinguish between them (without probing internal properties). I think it is possible to both:
- Free recipient from having to know which scenario took place (by providing common interface)
- Allow scenario specific optimizations when recipient cares.
That is why implementation I have been proposing attempted to separate those use case from each other while putting those under the same Block
base class.
Another thing that I find to be questionable is that laziness is weaved into all this. Don't get me wrong lazy data structures are great, but unpredictable performance characteristics aren't (as many haskellers would testify). So I would highly recommend further separating concerns there by having:
Block
as in materialized instance that has bothdata
andbytes
, so it no longer needscodec
(but may still need a hasher).BlockDecoder
that is a lazyBlock
that performs decode on demand. It could even subclassBlock
and overridetoValue()
method (or something along those lines).BlockEncoder
that is a lazyBlock
that performs encode on demand. It could also can subclassBlock
and overridetoBytes()
method (or something along those lines).
That would allow consumer not to care when they don't need to, but would enable appropriate treatment when that makes sense. Furthermore this creates an opportunity to do non-lazy decode / encode where that is more appropriate.
if (!codec) throw new Error('Missing codec') | ||
return new Block({ data, codec, hasher }) | ||
} | ||
Block.createUnsafe = (data, cid, { hasher, codec } = {}) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think createUnsafeDecoder
is more descriptive name, and also makes it clear why codec
needs to be provided. I think it should not take hasher
because it does not need it.
Block.createUnsafe = (data, cid, { hasher, codec } = {}) => { | |
Block.createUnsafeDecoder = (data, cid, { codec } = {}) => { |
if (!codec) throw new Error(`Missing codec ${cid.code}`) | ||
return new Block({ data, cid, codec, hasher: hasher || null }) | ||
} | ||
Block.create = async (data, cid, { hasher, codec } = {}) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think calling this decode
would be more descriptive and explain why it's async.
Block.create = async (data, cid, { hasher, codec } = {}) => { | |
Block.decode = async (data, cid, { hasher, codec } = {}) => { |
BlockWithIs.multiformats = multiformats | ||
BlockWithIs.CID = CID | ||
return BlockWithIs | ||
return Block.createUnsafe(data, cid, { hasher, codec }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this this is a reason why createUnsafe
to takes hasher
. Just using constructor here can simplify createUnsafe
.
return Block.createUnsafe(data, cid, { hasher, codec }) | |
return new Block({ cid, data, codec, hasher }) |
index.js
Outdated
Block.createUnsafe = (data, cid, { hasher, codec } = {}) => { | ||
codec = codec || Block.codecs.get(cid.code) | ||
if (!codec) throw new Error(`Missing codec ${cid.code}`) | ||
return new Block({ data, cid, codec, hasher: hasher || null }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Requires change above.
return new Block({ data, cid, codec, hasher: hasher || null }) | |
return new Block({ data, cid, codec }) |
As broader question, I’m curios what are the use cases for lazy block encode and lazy block decode. Are there cases where you'd want to create a Block but never materialize it (not talking about CID, but turning bytes to JS value or other way round) ? I am inclined to think that if you do never end up materializing a block you're probably choosing a wrong interface for your library or a program. If you do end up materializing it later on in the program than all you achieve by deferring is making performance characteristics less deterministic, because bunch of computations can (and usually will) accumulate and then will be forced in one large batch that tends to create spikes on load. That is just to say that it's best to make conscious choice when deferring computations. It also makes reasoning about code (as in when reading it) a lot easier (less context is required to infer it's behavior) |
Co-authored-by: Irakli Gozalishvili <[email protected]>
Co-authored-by: Irakli Gozalishvili <[email protected]>
Co-authored-by: Irakli Gozalishvili <[email protected]>
Co-authored-by: Irakli Gozalishvili <[email protected]>
Creating blocks you never encode is surprisingly common in multi-block data structures during mutation. You often have interfaces that create a bunch of blocks on mutation, but many of those are then orphaned by another mutation operation before they are ever written to the block store. Ideally you could factor this out using bulk operations but we don’t have reasonable bulk operations for some data structures (like our HAMT). Creating blocks you never decode is very common because that’s what happens during a lot of replication operations. Whenever you’re moving data from one store to another you can avoid a decode. |
I wonder if this indicative of the other gap multiformats/js-multiformats#37
|
This sounds similar to what I tried to represent via |
Please note that current approach complicates things for both cases:
Note: That I fully understand that when building up dags you may have some pre-existing (already materialized) blocks interleaved with new (not yet materialized) blocks. But argument I'm making is that actor doing the dag building needs to be able to be able to distinguish so it can choose the best strategy to transfer the recipe in the most effective way to another actor (which may be on the separate thread, process or even machine). In fact you'd likely even want some way to represent remote blocks so that they don't need to roundtrip and having to do all this through lazy |
@Gozala latest push splits the BlockEncoder and BlockDecoder classes out. |
class BlockEncoder extends Block { | ||
encodeUnsafe () { | ||
if (this._data) return this._data | ||
if (!this._codec) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This check is obsolete because it's validated at the construction site.
class BlockDecoder extends Block { | ||
decodeUnsafe () { | ||
if (typeof this.opts._source !== 'undefined') return this._source | ||
if (!this._codec) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This check is obsolete because validation occurs at construction site.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
turns out, there is a case in which you want to create a Block without a codec attached. i actually have to drop the constructor error and move that check to the encoder/decoder methods, but it’ll still be necessary here so that you get a good error if you try to decode a block that doesn’t have a codec attached
if (cid) setImmutable(this, '_cid', cid) | ||
if (data) setImmutable(this, '_data', data) | ||
if (!source && (!data || !cid)) throw new Error('Missing required argument') | ||
if (source && (!codec || !hasher)) throw new Error('Missing required argument') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think things would be a lot cleaner if Block.encoder
and Block.decoder
would do above invariant checks instead of deferring that to shared constructor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed, i’m going to move all of these in the new patch to js-multiformats
.
Still need to port some codecs and fill in some coverage, but all the tests are passing against
dag-cbor
.This gets the
Block
interface up to date with @Gozala’s latest refactor. I’d like to take a moment to note how good thework has been on that refactor. You can see it here in the diff, everywhere that we touch multiformats it’s a little bit
cleaner and also a little more explicit and even adds some features. All of this while getting rid of a lot of the harder to work
with parts of this API that came from the prior multiformats dependency injection, and we still get to keep all the same features and codec registry just hoisted to the Block interface. This was a some really great work on @Gozala’s part and it really shows.