-
Notifications
You must be signed in to change notification settings - Fork 5
Major API changes #50
Changes from all commits
128decd
190de47
930398b
888cd27
3e79d97
55585d4
f04424d
bf2184d
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -14,16 +14,12 @@ | |
- [Badge](#badge) | ||
- [Definitions](#definitions) | ||
- [API](#api) | ||
- [IPLD format utils](#ipld-format-utils) | ||
- [`util.serialize(dagNode, callback)`](#utilserializedagnode-callback) | ||
- [`util.deserialize(binaryBlob, callback)`](#utildeserializebinaryblob-callback) | ||
- [`util.cid(binaryBlob[, options], callback)`](#utilcidbinaryblob-options-callback) | ||
- [Local resolver methods](#local-resolver-methods) | ||
- [`resolver.resolve(binaryBlob, path, callback)`](#resolverresolvebinaryblob-path-callback) | ||
- [`resolver.tree(binaryBlob, callback)`](#resolvertreebinaryblob-callback) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Large naming changes might be out of the scope here, but if it isn't I would favor following terminology: |
||
- [`serialize(IpldNode)`](#serializeipldnode) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
|
||
- [`deserialize(binaryBlob)`](#deserializebinaryblob) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
|
||
- [`cid(binaryBlob, [options])`](#cidbinaryblob-options) | ||
- [Properties](#properties) | ||
- [`defaultHashAlg`](#defaulthashalg) | ||
- [`multicodec`](#multicodec) | ||
- [`format`](#format) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
|
||
- [Maintainers](#maintainers) | ||
- [Contribute](#contribute) | ||
- [License](#license) | ||
|
@@ -52,87 +48,62 @@ Include this badge in your readme if you make a new module that implements inter | |
|
||
## Definitions | ||
|
||
- **dagNode**: The implementation specific representation of a deserialized block. | ||
- **dagNode**: The implementation specific representation of a deserialized blob. | ||
|
||
## API | ||
|
||
A valid (read: that follows this interface) IPLD format implementation the following API. | ||
|
||
### IPLD format utils | ||
IPLD Format APIs are restricted to a single IPLD Node, they never access any linked IPLD Nodes. | ||
|
||
#### `util.serialize(dagNode, callback)` | ||
|
||
> serializes a dagNode of an IPLD format into its binary format | ||
### `serialize(IpldNode)` | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe |
||
|
||
`callback` must have the signature `function (err, binaryBlob)`, where `err` is an Error is the function fails and `binaryBlob` is a Buffer containing the serialized version. | ||
> Serializes the internal representation of an IPLD Node into a binary blob. | ||
|
||
#### `util.deserialize(binaryBlob, callback)` | ||
`IpldNode` is a previously deserialized binary blob. | ||
|
||
> deserializes a binary blob into the instance | ||
Returns a Promise containing a `Buffer` with the serialized version of the given IPLD Node. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I would really encourage to remove requirement to return Buffer in favor of Uint8Array. This requirement makes browser use case unnecessarily complicated, and in practice isn't really necessary. Other IPFS libraries can do the casting where necessary. P.S. node Buffer is also a valid Uint8Array so node case will require no change. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I would love if there was some word about errors as well. Specifically how decoder can return error and how that error propagates. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Moving from Node.js to Browser based primitives is certainly on my todo list, but out of scope for these API changes. I don't want to do everything at once. |
||
|
||
`callback` must have the signature `function (err, dagNode)`, where `err` is an Error if the function fails and `dagNode` is the dagNode that got deserialized in the process. | ||
|
||
#### `util.cid(binaryBlob[, options], callback)` | ||
### `deserialize(binaryBlob)` | ||
|
||
> get the CID of a binary blob | ||
> Deserialize into internal representation. | ||
|
||
Options include: | ||
- version - the CID version to be used (defaults to 1) | ||
- hashAlg - the hash algorithm to be used (default to the one set by the format) | ||
The result is a JavaScript object, which may also be a Proxy object in case the data shouldn’t be deserialized as a whole. Its fields are the public API that can be resolved through. It’s up to the format to add convenient methods for manipulating the data. The returned object may also be a Proxy object in case the data shouldn’t be deserialized as a whole. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I did wrote more about this above but I think I want add one more note here - Requirement for implementation to provide a proxy just to avoid full decoding makes a high bar, in fact I would argue that custom There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's not a requirement, "may also be a Proxy object", but I'm very negative on Proxies regardless. Aside from performance concerns, I think we've learned from many previous attempts in other languages that things like Proxies should be reserved for metaprogramming and other reflection use-cases rather than general magical data structures. Pitfalls abound. Alternatively go the container route which is much nicer--I'm not sure let data = await format.deserialize(blob)
data.get('foo') // could be lazy
data.has('/deeply/A0/01/FC-nested/path.txt') // could be lazy
data.toObject() // skip laziness and recursively instantiate
// { foo: 'bar', deeply: { A0: { '01': { 'FC-nested': { 'path.txt': UInt8Array<00, 00, 00, 00> } } } } } It would require specification, and even provision of a custom class, maybe just a simple extension of For deserialization where you know the result is cheap and you want it all you could just instantiate it in the same call: let copy = Object.extend((await format.deserialize(blob)).toObject(), { lastUpdate: new Date() })
Are there possible use-cases where getters might need to be async or is it safe to frontload that in There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think you may have misunderstood what I meant by There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
The whole point of IPLD is to provide common interface for traversing linked data no ? If each implementation is free to design one, the common interface is gone & I think that’s a step in a wrong direction - I much rather have common imperfect interface that can be used across arbitrary formats than have to learn each one There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
But this then won't be lazy for nested objects, will it? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
For implementing eager case this will make no difference. For lazy case avoiding proxies and lazy getters would be significantly simpler than needing to have them There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
When I wrote my initial response in this thread I was trying to imagine a use-case where the properties would need to be asynchronously accessed and I couldn't see an obvious one except for very large blobs that need to be stored on disk--but those wouldn't be passed in to a So again, it looks like premature optimisation in search of a use-case.
Agreed, and it'll help make quicker progress on this PR! There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I care about JS implementation in browser context (on server go is likely a better choice anyway). In browser crypto APIs are async (for good reasons) ignoring that seems strange to me. And honestly it’s not about optimizations, just about not making currently well supported use cases second class. P.S. I am working on crypto heavy use case for browsers right now and this API channge will require me to switch to handrolled solution. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @Gozala ok this is good context, excuse my ignorance. In your use case, is it necessary that the crypto be handled at the format layer or is it reasonable to push it up to ipld itself? I was imagining ipld/ipld#64 being something in ipld and the blocks just providing the encrypted binary data. Perhaps an encrypted block coming from a format gets transposed into an unencrypted ipld block where the key is provided but this happens at the ipld layer, not in the format. Or are you seeing a different mechanism? Again, I'm fairly new to thinking about this stuff so additional insight is appreciated! There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I've created #51 which hopefully accommodates all the points brought up here. It's a smaller change to the current API. Let's have a discussion over there. |
||
|
||
`callback` must have the signature `function (err, cid)`, where `err` is an Error if the function fails and `cid` is a CID instance of the binary blob. | ||
All enumerable properties (the ones that are returned by a `Object.keys()` call) of the deserialized object are considered for resolving IPLD Paths. They must only return values whose type is one of the [IPLD Data Model](https://github.com/ipld/specs/blob/master/IPLD-Data-Model-v1.md). | ||
|
||
### Local resolver methods | ||
Returns a Promise containing the Javascript object. This object must be able to be serialized with a `serialize()` call. | ||
|
||
#### `resolver.resolve(binaryBlob, path, callback)` | ||
|
||
> resolves a path in block, returns the value and or a link and the partial missing path. This way the IPLD Resolver can fetch the link and continue to resolve. | ||
### `cid(binaryBlob, [options])` | ||
|
||
`callback` must have the signature `function (err, result)`, where `err` is an Error if the function fails and `result` is an object with the following keys: | ||
> Calculate the CID of the binary blob. | ||
|
||
- value: <> - The value resolved or an IPLD link if it was unable to resolve it through. | ||
- remainderPath: <> - The remaining path that was not resolved under block scope. | ||
Possible `options` are: | ||
- `version` (`number`, default: 1): the CID version to be used | ||
- `hashAlg` (`Multicodec`, default: the one the format specifies): the hash algorithm to be used | ||
|
||
If `path` is the root `/`, the result is a nested object that contains all paths that `tree()` returns. The values are the same as accessing them directly with the full path. Example: | ||
This can be used to verify that some data actually has a certain CID. | ||
|
||
`tree()` returns: | ||
Returns a Promise containing the calculated CID of the given binary blob. | ||
|
||
```JSON | ||
["author/name", "author/email"] | ||
``` | ||
|
||
`resolve(binaryblob, "/", callback)` would then have as a result: | ||
|
||
```JSON | ||
{ | ||
"author": { | ||
"name": "vmx", | ||
"email": "[email protected]" | ||
} | ||
} | ||
``` | ||
|
||
Numbers within a path are interpreted as an array. | ||
|
||
#### `resolver.tree(binaryBlob, callback)` | ||
|
||
> returns all the paths available in this block. | ||
|
||
`callback` must have the signature `function (err, result)`, where `err` is an Error if the function fails and `result` is a list of path such as `["/foo", "/bar", "/author/name", ...]`. | ||
|
||
### Properties | ||
|
||
#### `defaultHashAlg` | ||
|
||
> Default hash algorithm of the format | ||
> Default hash algorithm of the format, | ||
|
||
Most formats have one specific hash algorithm, e.g. Bitcoin’s is `dbl-sha2-256`. CBOR can be used with any hash algorithm, though the default in the IPFS world is `sha256`. `defaultHashAlg` is used in the `cid()` call if no hash algorithm is given. The value of `defaultHashAlg` is of type `Multicodec` should be one code defined in the [Multihash Table](https://github.com/multiformats/multihash#table-for-multihash). | ||
|
||
Most formats have one specific hash algorithm, e.g. Bitcoin’s is `dbl-sha2-256`. CBOR can be used with any hash algorithm, though the default in the IPFS world is `sha256`. `defaultHashAlg` is used in the `util.cid()` call if no hash algorithm was given. The value of `defaultHashAlg` must be one defined in the [Multihash Table](https://github.com/multiformats/multihash#table-for-multihash-v100-rc-semver). | ||
#### `format` | ||
|
||
#### `multicodec` | ||
> Identifier for the format implementation. | ||
|
||
> Identifier for the format implementation | ||
The `format` property of type `Multicodec` is used to register a format implementation in IPLD. It should be one of the codes specified in the [Multicodec Table](https://github.com/multiformats/multicodec#multicodec-table). | ||
|
||
The `multicodec` property is used to register a format implementation in IPLD. It needs to be one specified in the [Multicodec Table](https://github.com/multiformats/multicodec#multicodec-table). | ||
|
||
## Maintainers | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where did these methods go?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to see the new
js-ipld
with the formats updated and resolving through things with all of these updates. It seems that we are preemptively removing things but I might be missing something.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the commit message:
So instead of
resolve.resolve(binaryBlob, '/deeply/nested/path)' you deserialize the node first (which resolve also needs to do) and then on the resolved node, let's cal it
ipldNodeyou do
ipldNode.deeply.nested.path`.For
tree()
-like functionality (if needed) you do what dag-json is doing.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So now it is the user that needs to convert an IPLD Path into JS notation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that dag-pb supported resolve as in
cid/name-of-the-link-to-the-other-node
andcid/links/0/hash/
. It was basically packing a transformation inside the format. It will be an extra breaking change.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, could we make it better?
What's the browser support for that like? I found babel-plugin-proxy but the README says it's got really bad performance and don't recommend using it in production environments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would get rid of the named links traversal (I call it "IPFS traversal") or put it under a field (e.g.
ipfs
) so it won't collide. Though I think being just as broken as we are currently is good enough for now, especially since this is again dag-pb specific and shouldn't be a problem with other formats.Pretty good: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Proxy#Browser_compatibility
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not a crazy idea. It's only of limited use for IPFS because there isn't a one-to-one mapping between IPFS paths and IPLD paths for HAMT shards.
E.g.
Might end up needing to be
..and you don't know what the prefix for each segment is going to be until you descend into that level of the trie as it's dependent on the siblings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mostly kept the named links to show that such things are still possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally do think that move to JS based traversal has following disadvantages:
As @achingbrain pointed out that implies that that whole node is de-serialized in one step. Sure JS proxies could be utilized but they are very slow and are really difficult to get right as in ensure that they you don't end up creating memory leaks (we used them a lot for cross compartment boundaries in gecko & in my opinion for non trivial cases e.g
process.env
they're not worth it).I have being wanted to implement ipld resolver for flatbuffers who's primary advantage over proto-buffs is laziness - only relevant fields can be decoded without instantiating objects for intermediaries. It seems like with this move taking advantage of this would be impossible.
Proxies only work if you can decode everything synchronously, if that is not the case then you wind up with bunch of promises you have to await on to get to the nested data, which IMO is less convenient than it is to provide a path.