-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add non-dag json codec #152
Conversation
and follow-up question is if you have any concerns with doing the same thing (a parallel variant that skips the link logic) for dagcbor, @warpfork |
codec/dagjson/multicodec.go
Outdated
@@ -17,13 +17,23 @@ var ( | |||
|
|||
func init() { | |||
multicodec.RegisterEncoder(0x0129, Encode) | |||
multicodec.RegisterEncoder(0x0200, Encode) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm inclined to think that for the json codec, if we encounter a Link in the middle of a structure, we should have the codec error explicitly. It's not going to roundtrip via that same codec, and explicit errors are better than unstated incorrect behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it produces a reasonable serialization into json of a dag. making the caller traverse the dag to understand there's no links in it, rather than just flatting them isn't clear to me as a win
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think callers generally have to do anything. They would just... get an error if they try to encode data that's unencodable with this format.
In general, things that are unencodable should result in errors. That's a rule we'd follow anywhere[‡] that isn't JSON, and I think we should follow it for JSON too.
Failure to have bijections for data that we emit tends to cause unhappiness and confusion. The problems are magnified by the fact they often sweep under the rug for a while and then often reappear at points in time distant from when the data was first written.
I'd say the direction of burden-of-proof should be overwhelming weighted such that doing something looseygoosey that's non-bijective is the case that needs support. Maintaining bijectiveness, even if it comes at the cost of more errors, should be the default design bias.
[‡] - except where floating point numbers are involved. But that's the exception that really proves the reason for the rule, isn't it -- float consistency is a horrific minefield that we would deeply prefer didn't exist, and there are horrific amounts of time wasted by this seemingly small problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@warpfork my understanding was that people should be able to take DAG-JSON data and by changing the codec to JSON they'd just have an object where there are no links. This is similar to how we can take an existing CID and slap a Raw codec on it and get the data in that block back as raw bytes.
This case is mostly covered by the decode case, it seems a bit weird that I wouldn't be able to roundtrip data since Encode(Decode(dagJSONDataAsJSON)) would error. Am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when you decode as json you don't get a link. when you re-encode the map with a string in it you'd get the same data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
{"/":"bafyquux"}
is a map with one key in JSON. When you deserialize that block with a JSON codec, you get a map with one key; when you serialize it with a JSON codec, you get the same block.
(@aschmahmann , the reason I thumbs-down'd your comment is that there's no error in this case above. One just emits this as a map. That's normal.)
{"/":"bafyquux"}
is a link in DAG-JSON. When you deserialize that block with a DAG-JSON codec, you get a link; when you serialize it with a DAG-JSON codec, you get the same block.
Now, I have a link, and I attempt serialize it with a JSON codec, I want that to yield an error, because that's not really encodable in JSON. As a user, if I attempt to do this, I have made a serious mistake, and I want to be informed immediately.
If we didn't do this, and instead smash the link into a map with one entry during the encode process, we've changed the data. When we deserialize it later with the same codec we serialized it with, we would get different data. This would not be good. And as a user, one might not notice what had happened until much later, making it very much not gooder.
I don't really care what happens when we serialize something with a JSON codec and attempt to later deserialize it with a DAG-JSON codec. If that gives different data... well, yes. A different codec was used. That's gonna happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may help to be reminded that neither DAG-JSON nor JSON are total supersets of each other. This sucks, and we often wish it wasn't so, but wishes... well. Wishes. They have limited power.
DAG-JSON can't encode maps with single entries that happen to have the key "/". JSON can. This makes JSON larger.
In the other direction, DAG-JSON can encode links. JSON can't. This makes DAG-JSON larger.
This is generally a headache. But I'm fairly certain we make the headache bigger rather than smaller if we were to add more silent coercions to the picture.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
np, just wanted to clarify what you were actually intending/trying to do. I have no problem with a codec's Encode
function erroring when trying to convert an IPLD data model object to bytes when that codec doesn't support part of the data model used in that object (e.g. JSON doesn't support links).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DAG-JSON can't encode maps with single entries that happen to have the key "/".
@warpfork maybe I'm missing it, but it doesn't look like DAG-JSON's Encode
function errors with these map entries. Should that be fixed in another PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... 😩 yes, you're quite correct
Added an issue for this: #155
Yeah, this seems mostly fine. In a perfect world there's a lot of DRY work I want to do between both this and the cbor codecs, but it would be silly to hold this up on that. |
Thoughts about adding a (very small) The code is largely shared, but I think having a separate package per codec is feels clean and consistent and generally reasonable. It would also make it possible to implement the |
@warpfork is it okay to change the public |
I almost objected to that, but caught myself in time (there's a convention that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm good with this.
There are some specks here and there I'd clean up in a bigger refactor (e.g. separate decoder state vs config; both are now present in one struct), but there's plenty of other things in that same heading, so I'm fine leaving these things on the wishlist until they can all be handled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait, hang on, where'd the conditional branch for encode end up?
(i didn't do that yet 😁 ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this acceptable as a first-pass at the 'non-dag' json?