wip: migrate to latest multiformats #16

mikeal · 2020-10-01T04:56:19Z

Still need to port some codecs and fill in some coverage, but all the tests are passing against dag-cbor.

This gets the Block interface up to date with @Gozala’s latest refactor. I’d like to take a moment to note how good the
work has been on that refactor. You can see it here in the diff, everywhere that we touch multiformats it’s a little bit
cleaner and also a little more explicit and even adds some features. All of this while getting rid of a lot of the harder to work
with parts of this API that came from the prior multiformats dependency injection, and we still get to keep all the same features and codec registry, just hoisted to the Block interface. This was some really great work on @Gozala’s part and it really shows.

rvagg · 2020-10-01T05:11:54Z

It's so hard to review this when most of it is just indentation changes! It seems that the major changes are really just in hoisting the Block.codecs registry up here, with add() and friends, but otherwise it's the same. Correct?

Gozala · 2020-10-01T17:37:17Z

It's so hard to review this when most of it is just indentation changes!

@rvagg There is this really handy thing in case you were not aware

Gozala

Thanks @mikeal for updating this and other libraries to changes in multiformats. I'm hesitant to express strong opinions here. I personally think that compositional approach illustrated in dag.js from multiformats/js-multiformats#38 is a better alternative to this, but I am obviously biased.

For better clarity in review communications I use the following iconography in my reviews:

🚨 Problem that needs to be addressed (please change)

💣 Could be a problem (please use your own judgment to decide)

💭 Just a thought or an opinion (take it or leave it)

📝 Note (do what you want with it)

❓ Question (likely a signal that a code comment would be good there)

Gozala · 2020-10-01T17:39:58Z

index.js

+  const b = coerce(value)
+  return coerce(b.buffer.slice(b.byteOffset, b.byteOffset + b.byteLength))
+}
+const reader = createReader(CID)


❓ Any reason to pass CID in vs. just letting createReader import it instead? If there is a reason, it would be good to have a comment explaining it.

good point, i was just trying to get it working again but there’s no need for the dep injection anymore.

Gozala · 2020-10-01T17:52:02Z

index.js

-    get codec () {
-      if (this.opts.code) {
-        this.opts.codec = multicodec.get(this.opts.code).name
+  get hasher () {


💭 I think it would make things a lot cleaner if there were dedicated Block constructors, so that when you create one with a CID an appropriate hasher would be pulled and set as a property, etc. That way, if the invariants don't hold (neither cid nor codec was provided, or a cid is provided and the hasher is not available), the error is thrown at construction rather than later on.

I agree, there’s a bigger refactor that I’m going to want to do to the constructor and class methods that i haven’t done yet. I wanted to first see the existing implementation ported over without many API changes, but I’m still debating with myself whether i’m going to get this merged before tackling the bigger constructor refactor, or just do it now so that I can combine the breaking changes to all the APIs.

Gozala · 2020-10-01T17:55:57Z

index.js

-        this.opts.codec = multicodec.get(this.opts.cid.code).name
-      }
-      return this.opts.codec
+      if (!this.opts.hasher) throw new Error('Do not have hash implementation')


💣 I think throwing an error from a property accessor is a bad pattern; APIs that do that tend to be really painful to work with (e.g. some DOM bindings that do it). I would suggest returning null here and doing the throw from the method that needs to use the hasher instead.

I agree, this is a bit of a hack that I plan on factoring out.
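
A minimal sketch of the pattern suggested above, assuming a simplified Block whose options may or may not carry a hasher. The class shape, the `digest` method name and `toyHasher` are illustrative assumptions, not the code from this PR.

```javascript
// Hypothetical, simplified Block: not the implementation from this PR.
class Block {
  constructor (opts = {}) {
    this.opts = opts
  }

  // The accessor never throws; a missing hasher is reported as null.
  get hasher () {
    return this.opts.hasher || null
  }

  // The method that actually needs the hasher is the one that throws.
  async digest (bytes) {
    const hasher = this.hasher
    if (hasher === null) throw new Error('Do not have hash implementation')
    return hasher.digest(bytes)
  }
}

// Stand-in hasher for illustration only; it just returns the byte length.
const toyHasher = { digest: async bytes => bytes.length }
```

With this shape, code that only inspects `block.hasher` can branch on null, and only the call that needs a digest has to handle an exception.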

Gozala · 2020-10-01T18:01:42Z

index.js

-      if (this.opts.cid) return this.opts.cid.code
-      if (!this.opts.code) {
-        this.opts.code = multicodec.get(this.codec).code
+  get codec () {


❓ Why not return the codec from here, just like hasher returns the hasher? The user can look up .name if that's what they need.

I didn’t want to take that breaking change yet because it’s going to cause a lot of churn in the tests. I also hate to change the value of existing properties rather than just migrating to new properties because the errors during migration tend to be much more painful, but maybe it’s worth it here since codec is really the only good name for this.

Gozala · 2020-10-01T18:08:18Z

index.js

+    const data = this.encodeUnsafe()
+    const hash = await this.hasher.digest(data)
+    if (bytes.equals(cid.multihash.bytes, hash.bytes)) return true
+    throw new Error('Bytes do not match')


💭 I think this is another case where dedicated constructors would make more sense. E.g. you could have a Block.createUnsafe(cid, bytes) and Block.createSafe(cid, bytes), where the latter delegates to the former after validation. That would also get rid of this check in cases where you know no validation is needed, like when you create a block from a JS value.

I was planning on moving to dedicated constructors but hadn’t identified these as separate methods, but I like it. Although I think I’m just going to go with create() and createUnsafe() to match the other unsafe API forms.
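
A rough sketch of the create() / createUnsafe() split discussed above. It uses the (cid, bytes) argument order from the review comment; the toy FNV-1a digest and the plain-object "CID" with a `digest` field are assumptions standing in for a real multihash hasher and CID.

```javascript
// Illustrative sketch only: toy hasher and plain-object "CIDs", not multiformats.

// Toy, non-cryptographic digest (FNV-1a, 32 bit) standing in for a real hasher.
const toyHasher = {
  digest: async bytes => {
    let h = 0x811c9dc5
    for (const b of bytes) {
      h ^= b
      h = Math.imul(h, 0x01000193) >>> 0
    }
    return h
  }
}

class Block {
  constructor ({ cid, bytes }) {
    this.cid = cid
    this.bytes = bytes
  }

  // No validation: the caller vouches that bytes match the cid.
  static createUnsafe (cid, bytes) {
    return new Block({ cid, bytes })
  }

  // Validates first, then delegates to createUnsafe.
  static async create (cid, bytes, { hasher = toyHasher } = {}) {
    const digest = await hasher.digest(bytes)
    if (digest !== cid.digest) throw new Error('Bytes do not match')
    return Block.createUnsafe(cid, bytes)
  }
}
```

Only the validating path needs to be async, since hashing is the asynchronous step; createUnsafe stays synchronous.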

Gozala · 2020-10-01T18:14:38Z

index.js

+  async equals (block) {
+    if (block === this) return true
+    const cid = await this.cid()
+    if (block.asCID === block) return cid.equals(block)


💭 I find the choice to return true when comparing a block to a cid questionable, but it was there already so 🤷‍♂️

This can be cleaned up by just renaming the variable. The API is meant to take either a CID or a Block, which you may not like, but that’s the intention and the fact that the variable is named block makes this very confusing.

mikeal · 2020-10-01T19:30:16Z

Thanks @mikeal for updating this and other libraries to changes in multiformats. I'm hesitant to express strong opinions here. I personally think that compositional approach illustrated in dag.js from multiformats/js-multiformats#38 is a better alternative to this, but I am obviously biased.

I don’t want to diverge too far from the current approach in terms of presenting a class along with some methods to help you create instances, plus a registry of codecs. It’s not that I’m adamant that this is the right approach for every case, but I do think that this approach works quite well in some cases and I want to make sure the libraries underneath Block (codecs and multiformats) support this approach.

However, I’d prefer to have this library presenting and producing block instances that conform to a lower level block interface defined by multiformats. And that interface should be designed to support this approach as well as the approach you’re exploring in dag.js. That obviously isn’t what is here yet, but I wanted to see what this looks like on top of the latest multiformats types before I tried to bring design considerations back into a multiformats/block interface.

Gozala

@mikeal per your request I have provided some feedback and made several suggestions. I do still, however, feel that this Block class tries to combine the following:

Encode JS value
Decode block from bytes
Decode block from CID + bytes

I understand that this allows the recipient of the Block not to care which of the above 3 scenarios took place (which is great), but unfortunately that is forced upon the recipient: there is no way to choose alternative code paths optimized for a specific scenario, because there is no way to distinguish between them (without probing internal properties). I think it is possible to both:

Free recipient from having to know which scenario took place (by providing common interface)
Allow scenario specific optimizations when recipient cares.

That is why the implementation I have been proposing attempted to separate those use cases from each other while putting them under the same Block base class.

Another thing that I find questionable is that laziness is woven into all this. Don't get me wrong, lazy data structures are great, but unpredictable performance characteristics aren't (as many haskellers would testify). So I would highly recommend further separating concerns there by having:

Block as in a materialized instance that has both data and bytes, so it no longer needs a codec (but may still need a hasher).
BlockDecoder that is a lazy Block that performs decode on demand. It could even subclass Block and override the toValue() method (or something along those lines).
BlockEncoder that is a lazy Block that performs encode on demand. It could also subclass Block and override the toBytes() method (or something along those lines).

That would allow consumer not to care when they don't need to, but would enable appropriate treatment when that makes sense. Furthermore this creates an opportunity to do non-lazy decode / encode where that is more appropriate.
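
The three-way split proposed above could look roughly like the following. The class and method names come from the comment; the JSON stand-in codec, the constructor arguments and the caching fields are assumptions made for illustration.

```javascript
// Illustrative sketch: JSON stands in for a real IPLD codec.
const jsonCodec = {
  encode: value => new TextEncoder().encode(JSON.stringify(value)),
  decode: bytes => JSON.parse(new TextDecoder().decode(bytes))
}

// Materialized block: holds both the value and the bytes, so it needs no codec.
class Block {
  constructor ({ value, bytes }) {
    this._value = value
    this._bytes = bytes
  }

  toValue () { return this._value }
  toBytes () { return this._bytes }
}

// Lazy block created from bytes; decodes on the first toValue() call.
class BlockDecoder extends Block {
  constructor ({ bytes, codec }) {
    super({ bytes })
    this._codec = codec
    this._decoded = false
  }

  toValue () {
    if (!this._decoded) {
      this._value = this._codec.decode(this._bytes)
      this._decoded = true
    }
    return this._value
  }
}

// Lazy block created from a value; encodes on the first toBytes() call.
class BlockEncoder extends Block {
  constructor ({ value, codec }) {
    super({ value })
    this._codec = codec
  }

  toBytes () {
    if (!this._bytes) this._bytes = this._codec.encode(this._value)
    return this._bytes
  }
}
```

A consumer that does not care can call toValue() or toBytes() on any of the three, while one that does care can branch on `instanceof BlockDecoder` or `instanceof BlockEncoder` to learn whether work is still pending.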

Gozala · 2020-10-05T21:00:04Z

index.js

+  if (!codec) throw new Error('Missing codec')
+  return new Block({ data, codec, hasher })
+}
+Block.createUnsafe = (data, cid, { hasher, codec } = {}) => {


I think createUnsafeDecoder is a more descriptive name, and it also makes it clear why a codec needs to be provided. I think it should not take a hasher because it does not need one.

Suggested change

-Block.createUnsafe = (data, cid, { hasher, codec } = {}) => {
+Block.createUnsafeDecoder = (data, cid, { codec } = {}) => {

Gozala · 2020-10-05T21:03:16Z

index.js

+  if (!codec) throw new Error(`Missing codec ${cid.code}`)
+  return new Block({ data, cid, codec, hasher: hasher || null })
+}
+Block.create = async (data, cid, { hasher, codec } = {}) => {


I think calling this decode would be more descriptive and explain why it's async.

Suggested change

-Block.create = async (data, cid, { hasher, codec } = {}) => {
+Block.decode = async (data, cid, { hasher, codec } = {}) => {

Gozala · 2020-10-05T21:04:23Z

index.js

-  BlockWithIs.multiformats = multiformats
-  BlockWithIs.CID = CID
-  return BlockWithIs
+  return Block.createUnsafe(data, cid, { hasher, codec })


I think this is the reason why createUnsafe takes a hasher. Just using the constructor here can simplify createUnsafe.

Suggested change

-  return Block.createUnsafe(data, cid, { hasher, codec })
+  return new Block({ cid, data, codec, hasher })

Gozala · 2020-10-05T21:07:01Z

index.js

+Block.createUnsafe = (data, cid, { hasher, codec } = {}) => {
+  codec = codec || Block.codecs.get(cid.code)
+  if (!codec) throw new Error(`Missing codec ${cid.code}`)
+  return new Block({ data, cid, codec, hasher: hasher || null })


Requires change above.

Suggested change

-  return new Block({ data, cid, codec, hasher: hasher || null })
+  return new Block({ data, cid, codec })

Gozala · 2020-10-05T21:46:53Z

As a broader question, I’m curious what the use cases are for lazy block encode and lazy block decode. Are there cases where you'd want to create a Block but never materialize it (not talking about CID, but turning bytes to JS value or other way round)? I am inclined to think that if you never end up materializing a block, you're probably choosing the wrong interface for your library or program. If you do end up materializing it later on in the program, then all you achieve by deferring is making performance characteristics less deterministic, because a bunch of computations can (and usually will) accumulate and then be forced in one large batch, which tends to create load spikes.

That is just to say that it's best to make a conscious choice when deferring computations. It also makes reasoning about code (as in when reading it) a lot easier (less context is required to infer its behavior).

mikeal · 2020-10-05T21:53:46Z

Are there cases where you'd want to create a Block but never materialize it (not talking about CID, but turning bytes to JS value or other way round)

Creating blocks you never encode is surprisingly common in multi-block data structures during mutation. You often have interfaces that create a bunch of blocks on mutation, but many of those are then orphaned by another mutation operation before they are ever written to the block store. Ideally you could factor this out using bulk operations but we don’t have reasonable bulk operations for some data structures (like our HAMT).

Creating blocks you never decode is very common because that’s what happens during a lot of replication operations. Whenever you’re moving data from one store to another you can avoid a decode.
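
To make the replication case concrete, here is a small sketch in which Maps stand in for block stores and a counter shows that decode is never invoked. The store shape, `LazyBlock` and `countingCodec` are hypothetical names used only for this illustration.

```javascript
// Illustrative sketch: Maps keyed by a CID string stand in for block stores.
let decodeCalls = 0
const countingCodec = {
  decode: bytes => {
    decodeCalls++
    return JSON.parse(new TextDecoder().decode(bytes))
  }
}

class LazyBlock {
  constructor ({ cid, bytes, codec }) {
    this.cid = cid
    this.bytes = bytes
    this._codec = codec
  }

  // Decoding only happens if someone asks for the value.
  decode () {
    if (this._value === undefined) this._value = this._codec.decode(this.bytes)
    return this._value
  }
}

// Replication moves cid + bytes between stores and never calls decode().
const replicate = (source, target, codec) => {
  for (const [cid, bytes] of source) {
    const block = new LazyBlock({ cid, bytes, codec })
    target.set(block.cid, block.bytes)
  }
}
```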

Gozala · 2020-10-05T21:56:43Z

Creating blocks you never decode is very common because that’s what happens during a lot of replication operations. Whenever you’re moving data from one store to another you can avoid a decode.

I wonder if this is indicative of the other gap, multiformats/js-multiformats#37

It is also something I tried to represent via BlockView type.

Gozala · 2020-10-05T22:03:58Z

Creating blocks you never encode is surprisingly common in multi-block data structures during mutation. You often have interfaces that create a bunch of blocks on mutation, but many of those are then orphaned by another mutation operation before they are ever written to the block store. Ideally you could factor this out using bulk operations but we don’t have reasonable bulk operations for some data structures (like our HAMT).

This sounds similar to what I tried to represent via the BlockDraft interface. I think you could have a layer for building up dags and materializing or writing them in one operation, but that does not necessarily mean that what you build up with should be a block. In fact, the reason my BlockDraft just holds a value, a codec code and a hasher code is so that you could build things up on one thread and materialize them on another one (where you do have codecs and hashers available). If you choose to represent that via Blocks as well, doing this on different threads (or processes) becomes a lot trickier.
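
A sketch of the draft idea described above, under stated assumptions: BlockDraft is named in the comment, but the field names, the registries and `materialize()` are hypothetical. The codes used are 0x0200 (JSON in the multicodec table) and 0x00 (the identity multihash, whose digest is the input itself).

```javascript
// Illustrative sketch; the draft shape and materialize() are assumptions.

// A draft is plain data: a value plus numeric codes, so it can be handed to
// another thread (for example via postMessage) without carrying functions.
const createDraft = (value, codecCode, hasherCode) => ({ value, codecCode, hasherCode })

// Registries live only where materialization happens.
const codecs = new Map([
  // 0x0200 is the multicodec code for JSON.
  [0x0200, { encode: value => new TextEncoder().encode(JSON.stringify(value)) }]
])
const hashers = new Map([
  // 0x00 is the multihash identity code: the digest is the input itself.
  [0x00, { digest: async bytes => bytes }]
])

const materialize = async draft => {
  const codec = codecs.get(draft.codecCode)
  if (!codec) throw new Error(`Missing codec ${draft.codecCode}`)
  const hasher = hashers.get(draft.hasherCode)
  if (!hasher) throw new Error(`Missing hasher ${draft.hasherCode}`)
  const bytes = codec.encode(draft.value)
  const digest = await hasher.digest(bytes)
  return { code: draft.codecCode, bytes, digest }
}
```

Because the draft carries codes rather than codec or hasher objects, the side that builds the dag needs no registry at all; only the side that calls `materialize()` does.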

Gozala · 2020-10-05T22:22:05Z

Please note that the current approach complicates things for both cases:

An actor that does replication becomes concerned with codecs and hashers, which is incidental. It could just replicate CID+Block|Code+Block pairs; it's just that those pairs do not have a canonical representation, a gap which Block then attempts to fill.
An actor that is building up a dag structure does not really need to be concerned with codecs or hashers. It just needs to create a compact recipe so that another actor could materialize the dag.

Note that I fully understand that when building up dags you may have some pre-existing (already materialized) blocks interleaved with new (not yet materialized) blocks. But the argument I'm making is that the actor doing the dag building needs to be able to distinguish between them, so it can choose the most effective strategy for transferring the recipe to another actor (which may be on a separate thread, process or even machine). In fact you'd likely even want some way to represent remote blocks so that they don't need to round-trip, and having to do all this through a lazy Block representation isn't necessarily helpful.

mikeal · 2020-10-06T02:44:46Z

@Gozala latest push splits the BlockEncoder and BlockDecoder classes out.

Gozala · 2020-10-06T23:01:17Z

index.js

+class BlockEncoder extends Block {
+  encodeUnsafe () {
+    if (this._data) return this._data
+    if (!this._codec) {


This check is obsolete because it's validated at the construction site.

Gozala · 2020-10-06T23:05:41Z

index.js

+class BlockDecoder extends Block {
+  decodeUnsafe () {
+    if (typeof this.opts._source !== 'undefined') return this._source
+    if (!this._codec) {


This check is obsolete because validation occurs at construction site.

turns out, there is a case in which you want to create a Block without a codec attached. i actually have to drop the constructor error and move that check to the encoder/decoder methods, but it’ll still be necessary here so that you get a good error if you try to decode a block that doesn’t have a codec attached

Gozala · 2020-10-06T23:11:33Z

index.js

+    if (cid) setImmutable(this, '_cid', cid)
+    if (data) setImmutable(this, '_data', data)
+    if (!source && (!data || !cid)) throw new Error('Missing required argument')
+    if (source && (!codec || !hasher)) throw new Error('Missing required argument')


I think things would be a lot cleaner if Block.encoder and Block.decoder did the above invariant checks instead of deferring them to the shared constructor.

agreed, i’m going to move all of these in the new patch to js-multiformats.
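
One way the suggestion above might look, as a sketch rather than the code that landed: each factory checks the invariants of its own scenario and the shared constructor does no validation. The argument shapes and error messages are assumptions, `Object.freeze` stands in for the PR's `setImmutable` helper, and the decoder tolerates a missing codec because, as noted in the thread, a Block may be created without one.

```javascript
// Illustrative sketch, not the code from this PR.
class Block {
  // The shared constructor only stores fields; it performs no checks.
  constructor (fields) {
    Object.assign(this, fields)
    Object.freeze(this)
  }

  // Encoder scenario: starts from a JS value, needs a codec and a hasher.
  static encoder (source, { codec, hasher } = {}) {
    if (typeof source === 'undefined') throw new Error('Missing source')
    if (!codec) throw new Error('Missing codec')
    if (!hasher) throw new Error('Missing hasher')
    return new Block({ source, codec, hasher })
  }

  // Decoder scenario: starts from bytes plus a CID; the codec is optional.
  static decoder (data, cid, { codec } = {}) {
    if (!data) throw new Error('Missing data')
    if (!cid) throw new Error('Missing cid')
    return new Block({ data, cid, codec: codec || null })
  }
}
```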

mikeal · 2020-10-06T23:44:30Z

multiformats/js-multiformats#40

062a284  wip: migrate to latest multiformats

Gozala reviewed Oct 1, 2020

fed6eb6  wip: refactor constructor

Gozala reviewed Oct 5, 2020

mikeal and others added 4 commits October 5, 2020 14:48 (each co-authored by Irakli Gozalishvili)

b86ea53  fix: better cid
f0c2e44  fix: cid reference
c8749a5  fix: remove support for string cids
22eb756  fix: don’t throw on comparison

a94c4d4  feat: separate encoder and decoder classes

Gozala reviewed Oct 6, 2020
