add non-dag json codec #152

Merged · 3 commits · Mar 23, 2021
Changes from 1 commit
13 changes: 11 additions & 2 deletions — codec/dagjson/multicodec.go

```diff
@@ -17,13 +17,23 @@ var (
 
 func init() {
 	multicodec.RegisterEncoder(0x0129, Encode)
+	multicodec.RegisterEncoder(0x0200, Encode)
```
Collaborator (@warpfork):
I'm inclined to think that for the json codec, if we encounter a Link in the middle of a structure, we should have the codec error explicitly. It's not going to roundtrip via that same codec, and explicit errors are better than unstated incorrect behavior.

Member Author:

It produces a reasonable serialization of a dag into json. Making the caller traverse the dag to discover there are no links in it, rather than just flattening them, isn't clearly a win to me.

Collaborator (@warpfork):

I don't think callers generally have to do anything. They would just... get an error if they try to encode data that's unencodable with this format.

In general, things that are unencodable should result in errors. That's a rule we'd follow anywhere[‡] that isn't JSON, and I think we should follow it for JSON too.

Failure to have bijections for data that we emit tends to cause unhappiness and confusion. The problems are magnified by the fact that they often get swept under the rug for a while, and then reappear at points in time distant from when the data was first written.

I'd say the burden of proof should be overwhelmingly weighted such that doing something loosey-goosey and non-bijective is the case that needs justification. Maintaining bijectiveness, even if it comes at the cost of more errors, should be the default design bias.

[‡] - except where floating point numbers are involved. But that's the exception that really proves the reason for the rule, isn't it -- float consistency is a horrific minefield that we would deeply prefer didn't exist, and there are horrific amounts of time wasted by this seemingly small problem.

@aschmahmann:

@warpfork my understanding was that people should be able to take DAG-JSON data and by changing the codec to JSON they'd just have an object where there are no links. This is similar to how we can take an existing CID and slap a Raw codec on it and get the data in that block back as raw bytes.

This case is mostly covered by the decode case, it seems a bit weird that I wouldn't be able to roundtrip data since Encode(Decode(dagJSONDataAsJSON)) would error. Am I missing something?

Member Author:

When you decode as json you don't get a link; when you re-encode the map with a string in it, you get the same data.

Collaborator (@warpfork), Mar 23, 2021:

{"/":"bafyquux"} is a map with one key in JSON. When you deserialize that block with a JSON codec, you get a map with one key; when you serialize it with a JSON codec, you get the same block.

(@aschmahmann , the reason I thumbs-down'd your comment is that there's no error in this case above. One just emits this as a map. That's normal.)

{"/":"bafyquux"} is a link in DAG-JSON. When you deserialize that block with a DAG-JSON codec, you get a link; when you serialize it with a DAG-JSON codec, you get the same block.

Now, if I have a link and I attempt to serialize it with a JSON codec, I want that to yield an error, because a link isn't really encodable in JSON. As a user, if I attempt to do this, I have made a serious mistake, and I want to be informed immediately.

If we didn't do this, and instead smashed the link into a map with one entry during the encode process, we'd have changed the data. When we deserialized it later with the same codec we serialized it with, we would get different data. This would not be good. And as a user, one might not notice what had happened until much later, making it very much not gooder.

I don't really care what happens when we serialize something with a JSON codec and attempt to later deserialize it with a DAG-JSON codec. If that gives different data... well, yes. A different codec was used. That's gonna happen.
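The decode-side contrast above can be sketched in stdlib Go. The `Link` type and `decode` helper below are hypothetical stand-ins, not go-ipld-prime's API; the point is that the same bytes yield a different kind of value depending on whether link parsing is enabled:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Link is a toy stand-in for an IPLD link value.
type Link struct{ Cid string }

// decode unmarshals JSON and, when parseLinks is set, rewrites any
// single-entry {"/": "<cid>"} map into a Link — a toy version of the
// DAG-JSON link lookahead.
func decode(data []byte, parseLinks bool) (interface{}, error) {
	var v interface{}
	if err := json.Unmarshal(data, &v); err != nil {
		return nil, err
	}
	if !parseLinks {
		return v, nil
	}
	return rewriteLinks(v), nil
}

func rewriteLinks(v interface{}) interface{} {
	m, ok := v.(map[string]interface{})
	if !ok {
		return v
	}
	if s, ok := m["/"].(string); ok && len(m) == 1 {
		return Link{Cid: s}
	}
	for k, child := range m {
		m[k] = rewriteLinks(child)
	}
	return v
}

func main() {
	data := []byte(`{"/":"bafyquux"}`)

	asJSON, _ := decode(data, false)
	fmt.Printf("%T\n", asJSON) // map[string]interface {}

	asDagJSON, _ := decode(data, true)
	fmt.Printf("%T\n", asDagJSON) // main.Link
}
```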

Collaborator (@warpfork):

It may help to be reminded that neither DAG-JSON nor JSON are total supersets of each other. This sucks, and we often wish it wasn't so, but wishes... well. Wishes. They have limited power.

DAG-JSON can't encode maps with single entries that happen to have the key "/". JSON can. This makes JSON larger.

In the other direction, DAG-JSON can encode links. JSON can't. This makes DAG-JSON larger.

This is generally a headache. But I'm fairly certain we make the headache bigger rather than smaller if we were to add more silent coercions to the picture.

@aschmahmann:

np, just wanted to clarify what you were actually intending/trying to do. I have no problem with a codec's Encode function erroring when trying to convert an IPLD data model object to bytes when that codec doesn't support part of the data model used in that object (e.g. JSON doesn't support links).
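The encode-side behavior agreed on here — erroring on unencodable values rather than silently coercing them — can be sketched as follows. `Link`, `encodePlainJSON`, and `containsLink` are illustrative names, not the library's API:

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Link is a toy stand-in for an IPLD link value.
type Link struct{ Cid string }

var ErrUnencodable = errors.New("json: link values cannot be encoded in plain JSON")

// encodePlainJSON refuses link values outright instead of silently
// coercing them into {"/": ...} maps, preserving bijectiveness.
func encodePlainJSON(v interface{}) ([]byte, error) {
	if containsLink(v) {
		return nil, ErrUnencodable
	}
	return json.Marshal(v)
}

func containsLink(v interface{}) bool {
	switch t := v.(type) {
	case Link:
		return true
	case map[string]interface{}:
		for _, child := range t {
			if containsLink(child) {
				return true
			}
		}
	case []interface{}:
		for _, child := range t {
			if containsLink(child) {
				return true
			}
		}
	}
	return false
}

func main() {
	_, err := encodePlainJSON(map[string]interface{}{"parent": Link{Cid: "bafyquux"}})
	fmt.Println(err != nil) // true: links are reported, not coerced
}
```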

@aschmahmann:

> DAG-JSON can't encode maps with single entries that happen to have the key "/".

@warpfork maybe I'm missing it, but it doesn't look like DAG-JSON's Encode function errors with these map entries. Should that be fixed in another PR?

Collaborator (@warpfork), Mar 25, 2021:

... 😩 yes, you're quite correct

Added an issue for this: #155

```diff
 	multicodec.RegisterDecoder(0x0129, Decode)
+	multicodec.RegisterDecoder(0x0200, decodeNonDagJSON)
 }
 
 func Decode(na ipld.NodeAssembler, r io.Reader) error {
+	return decode(na, r, true)
+}
+
+func decodeNonDagJSON(na ipld.NodeAssembler, r io.Reader) error {
+	return decode(na, r, false)
+}
+
+func decode(na ipld.NodeAssembler, r io.Reader, parseLinks bool) error {
 	// Shell out directly to generic builder path.
 	// (There's not really any fastpaths of note for json.)
-	err := Unmarshal(na, json.NewDecoder(r))
+	err := unmarshal(na, json.NewDecoder(r), parseLinks)
 	if err != nil {
 		return err
 	}
@@ -49,7 +59,6 @@ func Decode(na ipld.NodeAssembler, r io.Reader) error {
 			return err
 		}
 	}
-	return err
 }
 
 func Encode(n ipld.Node, w io.Writer) error {
```
26 changes: 17 additions & 9 deletions — codec/dagjson/unmarshal.go

```diff
@@ -20,7 +20,12 @@ import (
 // tokens before deciding what kind of value to create).
 
 func Unmarshal(na ipld.NodeAssembler, tokSrc shared.TokenSource) error {
+	return unmarshal(na, tokSrc, true)
+}
+
+func unmarshal(na ipld.NodeAssembler, tokSrc shared.TokenSource, parseLinks bool) error {
 	var st unmarshalState
+	st.parseLinks = parseLinks
 	done, err := tokSrc.Step(&st.tk[0])
 	if err != nil {
 		return err
@@ -32,8 +37,9 @@ func Unmarshal(na ipld.NodeAssembler, tokSrc shared.TokenSource) error {
 }
 
 type unmarshalState struct {
-	tk    [4]tok.Token // mostly, only 0'th is used... but [1:4] are used during lookahead for links.
-	shift int          // how many times to slide something out of tk[1:4] instead of getting a new token.
+	tk         [4]tok.Token // mostly, only 0'th is used... but [1:4] are used during lookahead for links.
+	shift      int          // how many times to slide something out of tk[1:4] instead of getting a new token.
+	parseLinks bool
 }
 
 // step leaves a "new" token in tk[0],
@@ -120,7 +126,7 @@ func (st *unmarshalState) linkLookahead(na ipld.NodeAssembler, tokSrc shared.TokenSource) (bool, error) {
 	if err != nil {
 		return false, err
 	}
-	if err := na.AssignLink(cidlink.Link{elCid}); err != nil {
+	if err := na.AssignLink(cidlink.Link{Cid: elCid}); err != nil {
 		return false, err
 	}
 	return true, nil
@@ -135,12 +141,14 @@ func (st *unmarshalState) unmarshal(na ipld.NodeAssembler, tokSrc shared.TokenSource) error {
 	case tok.TMapOpen:
 		// dag-json has special needs: we pump a few tokens ahead to look for dag-json's "link" pattern.
 		// We can't actually call BeginMap until we're sure it's not gonna turn out to be a link.
-		gotLink, err := st.linkLookahead(na, tokSrc)
-		if err != nil { // return in error if any token peeks failed or if structure looked like a link but failed to parse as CID.
-			return err
-		}
-		if gotLink {
-			return nil
+		if st.parseLinks {
+			gotLink, err := st.linkLookahead(na, tokSrc)
+			if err != nil { // return in error if any token peeks failed or if structure looked like a link but failed to parse as CID.
+				return err
+			}
+			if gotLink {
+				return nil
+			}
 		}
 
 		// Okay, now back to regularly scheduled map logic.
```