Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update the Manager interface to correctly support new Swarm content hash #113

Closed
acud opened this issue Mar 19, 2019 · 28 comments
Closed

Update the Manager interface to correctly support new Swarm content hash #113

acud opened this issue Mar 19, 2019 · 28 comments

Comments

@acud
Copy link

acud commented Mar 19, 2019

Hi there,

We have requested changes on the multicodecs repository in order to neatly support EIP-1577.

Swarm CIDs are using bmt hash (0xd6) and the default codec we would like to encode at this point is the swarm-manifest codec (0xfa).

We know this is also linked to how the js dependencies behave, and so supporting PRs have been already opened:
pldespaigne/content-hash#14
multiformats/js-multicodec#40
multiformats/js-multihash#64

Related:
ethersphere/swarm#1232
ethereum/EIPs#1577
ethersphere/swarm#940

@mcdee
Copy link

mcdee commented Mar 19, 2019

@justelad I didn't see the multicodecs PR but I'm a little confused as to what bmt represents. From my reading of multihash types they define the hashing algorithm that created the hash. Is bmt a hashing algorithm or a marker to represent what the hash represents? If the latter then I think you're overloading the purpose of multihash.

@Arachnid
Copy link
Member

@justelad What is it that you need us to do? Based on our conversation in Gitter, I assume we need to change how we handle translation from text format Swarm hashes to CIDs, but I don't see a specification for how to do that (or library code) anywhere. Can you point to one?

@acud
Copy link
Author

acud commented Mar 20, 2019

@justelad I didn't see the multicodecs PR but I'm a little confused as to what bmt represents. From my reading of multihash types they define the hashing algorithm that created the hash. Is bmt a hashing algorithm or a marker to represent what the hash represents? If the latter then I think you're overloading the purpose of multihash.

@mcdee BMT uses Keccak256 under the hood, however it is not a classic Keccak256 hash per-se. As the name suggests it is a binary merkle tree built for future inclusion proofs.

@justelad What is it that you need us to do? Based on our conversation in Gitter, I assume we need to change how we handle translation from text format Swarm hashes to CIDs, but I don't see a specification for how to do that (or library code) anywhere. Can you point to one?

The specification, AFAIK and understand is in the EIP itself. Just that right now the manager encodes the CID with a keccak256 multihash and a dag-pb codec which is irrelevant to Swarm, and should be changed to be swarm-manifest as the linked PR suggests.

@mcdee
Copy link

mcdee commented Mar 20, 2019

@mcdee BMT uses Keccak256 under the hood, however it is not a classic Keccak256 hash per-se. As the name suggests it is a binary merkle tree built for future inclusion proofs.

@justelad My understanding is that multihash is there to provide specific information about method and length of hash, not about the purpose of the hash or anything about the underlying data from which the hash was calculated. Note that every other entry in https://github.com/multiformats/multicodec/blob/master/table.csv with a tag of multihash is a specification of the hash and nothing to do with its purpose.

The codec swarm-manifest should be enough for implementers to work out what to do with the hash, so in my opinion the codec should be swarm-manifest and the multihash remain keccak-256 (and bmt should be removed/deprecated, but that's a different issue).

@nolash
Copy link

nolash commented Mar 20, 2019

method and length of hash

@mcdee so a hash resulting from keccak256(keccak256(data)) would still be keccak256 in multiformat, in your opinion?

@mcdee
Copy link

mcdee commented Mar 20, 2019 via email

@acud
Copy link
Author

acud commented Mar 20, 2019

I see your point, and specifically I have raised this exact question in different forums. The thing is that usage of keccak256 would be misleading in this case as people might assume that this is a plain keccak256 sum of their content, which is not the case.

@nolash
Copy link

nolash commented Mar 20, 2019

@justelad But it is, kinda...

@mcdee
Copy link

mcdee commented Mar 20, 2019

The thing is that usage of keccak256 would be misleading in this case as people might assume that this is a plain keccak256 sum of their content, which is not the case.

@justelad stating that the hash is of a Merkle tree rather than the content should be part of the definition of the swarm-manifest codec.

Taking a different approach: imagine that a year down the line Swarm moves to a different hashing algorithm with different output length. There are now two different types of hash, legacy and current, so what does bmt mean? Both types of hash are of Merkle trees, but have different derivations and sizes so it can't apply to both of them.

If you use swarm-manifest:keccak256 then it's easy to move to swarm-manifest:newhash512, where newhash512 is known and supported generically by the multihash implementations. Moving from swarm-manifest:bmt to swarm-manifest2:bmt or swarm-manifest:bmt2 or whatever variation you come up with is more work for every implementation of multihash and/or multicodec, and has no gain that I can see.

@Arachnid
Copy link
Member

The specification, AFAIK and understand is in the EIP itself. Just that right now the manager encodes the CID with a keccak256 multihash and a dag-pb codec which is irrelevant to Swarm, and should be changed to be swarm-manifest as the linked PR suggests.

At present, EIP-1577 doesn't specify an encoding for Swarm hashes. It would be much quicker and simpler if you can just provide an example text and binary hash, such as exists in EIP-1577 for IPFS, or a specification for how Swarm hashes are represented and encoded.

@acud
Copy link
Author

acud commented Mar 25, 2019

@Arachnid the way we encode the hashes is thoroughly shown in the PR I asked you to review (about the EIP implementation). Namely in:

https://github.com/ethereum/go-ethereum/blob/2f5b6cb442dd1ab20279603de938e5b60e3122fb/contracts/ens/cid.go#L103-L121

Please note that now I'm PRing the change as per the discussion above to revert bmt to keccak256. There's also a Swarm command to convert a Swarm hash to an ENS contentHash in hex which I also introduced in that PR: swarm hash ens contenthash <swarm-hash>.

@Arachnid
Copy link
Member

Arachnid commented Mar 27, 2019

@justelad You can't possibly expect everyone implementing support for Swarm identifiers to ENS to look there though, can you?

We're happy to modify EIP-1577 if you don't want to document the format anywhere else.

@Arachnid
Copy link
Member

@justelad This encoding doesn't appear to be correct. The multicodec ID is uvarint encoded, meaning that an ID of 0xe3 needs to be encoded as 0xe301. The same applies for the encoding of the multihash values 0xfa and 0xd6. Sorry I missed that in my review of the PR.

@acud
Copy link
Author

acud commented Mar 27, 2019

This encoding doesn't appear to be correct. The multicodec ID is uvarint encoded, meaning that an ID of 0xe3 needs to be encoded as 0xe301.

Not sure what you mean here. It is indeed being encoded as 0xe401, the padding is being handled by the go binary.PutUvarint. Also, as you would have noticed in that exact PR, there is a unit test that checks and decodes the exact hash you put in the EIP spec.

https://github.com/ipfs/go-cid/blob/e7e67e08cfba888a4297224402e12832bdc15ea0/cid.go#L162

An EIP-1577-style spec on how to encode Swarm Hashes with an example ENS contenthash encoded by swarm hash:

Input data:

storage system: Swarm (0xe4)
CID version: 1 (0x01)
content type: swarm-manifest (0xfa)
hash function: keccak256 (0x1b)
hash length: 32 bytes (0x20)
hash: d1de9994b4d039f6548d191eb26786769f580809256b4685ef316805265ea162
Binary format:

0xe40101fa011b20d1de9994b4d039f6548d191eb26786769f580809256b4685ef316805265ea162

Text format:

bzz://d1de9994b4d039f6548d191eb26786769f580809256b4685ef316805265ea162

Example usage with swarm hash:

$ swarm hash ens contenthash d1de9994b4d039f6548d191eb26786769f580809256b4685ef316805265ea162                                                                                                                                     
> e40101fa011b20d1de9994b4d039f6548d191eb26786769f580809256b4685ef316805265ea162

Let me know if something still doesn't make sense. If it does - please update the EIP with the example above, because

@justelad You can't possibly expect everyone implementing support for Swarm identifiers to ENS to look there though, can you?

I can't expect them to look for it in the ENS manager app github issues section either.

@Arachnid
Copy link
Member

Not sure what you mean here. It is indeed being encoded as 0xe401, the padding is being handled by the go binary.PutUvarint. Also, as you would have noticed in that exact PR, there is a unit test that checks and decodes the exact hash you put in the EIP spec.

Sorry, I missed that too. That looks good; I was misled by the use of a byte array rather than an array of ints.

I can't expect them to look for it in the ENS manager app github issues section either.

Right, which is why I wanted a spec I could include in the EIP, or link to from it. Thanks for providing one, I'll add it to 1577.

@acud
Copy link
Author

acud commented Mar 27, 2019

Cheers 🎊 🎉 🍾
What else needs to be done from our side for this to get shipped?
I already requested the removal of bmt from multicodecs and hopefully will get the supporting libraries to update the underlying libraries in the coming days.

@pldespaigne
Copy link
Contributor

Hi ✋ ! So are we ok or not to use swarm-manifest as the default multicodec for Swarm instead of dag-pb ?
Just to know if I should update the content-hash lib as requested here?

@acud
Copy link
Author

acud commented Apr 1, 2019

Yes please

@pldespaigne
Copy link
Contributor

Sorry but for now I will not update to swarm-manifest because the EIP explicitly say wich protocode to use :

The encoding of the value depends on the content type specified by the protoCode. Values with protocodes of 0xe3 and 0xe4 represent IPFS and Swarm content;

@nonsense
Copy link

nonsense commented Apr 9, 2019

@pldespaigne as far as I understand the EIP, the only thing that is mentioned in it related to protocode is the namespace for each system - IPFS (0xe3) and Swarm (0xe4) as example.

However you need more parameters than just the namespace in order to generate a multihash, and an example is given for IPFS:

storage system: IPFS (0xe3)
CID version: 1 (0x01)
content type: dag-pb (0x70)
hash function: sha2-256 (0x12)
hash length: 32 bytes (0x20)
hash: 29f2d17be6139079dc48696d1f582a8530eb9805b561eda517e22a892c7e3f1f

This PR you asked about (pldespaigne/content-hash#14) is about updating the content type, which currently is hardcoded to dag-pb.

I don't really know what dag-pb is beyond MerkleDAG protobuf - whether MerkleDAG is the same data structure as a Swarm manifest is out of scope for this PR I think.

As of now there is no dag-pb content type in Swarm. In Swarm there is swarm-manifest, and it is already part of the multicodec repo.

Bottom-line the way I understand the spec is that each namespace contains different content types and it is not necessary that a content type present in IPFS should be implemented in Swarm, or the other way around. Therefore I think dag-pb is IPFS-specific, whereas swarm-manifest is Swarm specific.

I hope that makes sense.

@nonsense
Copy link

nonsense commented Apr 9, 2019

@Arachnid @mcdee @justelad feel free to correct me if I said something wrong :)

@acud
Copy link
Author

acud commented Apr 9, 2019

@Arachnid said he will update the EIP and that was 2 weeks ago.
@pldespaigne if you could please read the messages on this specific issue you'll see all of the resolutions we've made in detail. I'm not sure what is there more to add here. There's a clear spec just the way Nick asked for and his agreement on this. What else do you guys want?

@mcdee
Copy link

mcdee commented Apr 9, 2019

Agree with @nonsense here.

The standard isn't changing; the storage system remains Swarm (0xe4). But what forms a Swarm identifier is up to the Swarm guys, and as per @justelad they want to use swarm-manifest in place of dag-pb. As such @pldespaigne it makes sense to change your implementation otherwise it isn't going to be compatible with Swarm identifiers generated by other systems.

@pldespaigne
Copy link
Contributor

it's merged 😀

@nonsense
Copy link

Now that the content-hash repo is updated, on next update of the ENS Manager (since content-hash is a dependency for it), the generated multihashes should work and be recognised on Swarm. Thank you all!

@makoto
Copy link
Member

makoto commented Apr 17, 2019

@pldespaigne npm module does not seem to be updated https://www.npmjs.com/package/content-hash

@nonsense
Copy link

nonsense commented Apr 17, 2019

@makoto yes, my comment is confusing. content-hash npm module update is blocked by its dependencies - see last comment at pldespaigne/content-hash#14

@pldespaigne
Copy link
Contributor

@makoto content-hash 2.3.1 is available on npm 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants