-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add fastpath cbor marshalers #88
Conversation
Note: go-ipld-cbor does not use this (yet) and this will not speed up any existing applications as is. Once things start using these methods all over, we should notice improvements. |
"strings" | ||
|
||
mbase "github.com/multiformats/go-multibase" | ||
mh "github.com/multiformats/go-multihash" | ||
|
||
cbg "github.com/whyrusleeping/cbor-gen" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😭
But go doesn't make this easy, does it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The alternative was to have the codegen stuff spit out the methods it needs into the generated file, which i'm not opposed to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm. Honestly, that may be a better approach. That way, we can optimize everything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @Stebalien but even more intensely: I don't think this is viable at all. Having go-cid import cbor packages seems almost backwards. It would put cbor things on the transitive import requirements for not-at-all-cbor things and that's not okay. An import like this from one of our most imported libraries would be a huge lock-in on that package as well, and I don't think that's something we should go into lightly, and probably isn't even intentional.
This has to go in the generated file. Not here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My only blocking complaint is that the minimum CID size needs to be reliably enforced and stored in a constant.
cid.go
Outdated
@@ -522,6 +525,74 @@ func (c Cid) Prefix() Prefix { | |||
} | |||
} | |||
|
|||
func (c Cid) MarshalCBOR(w io.Writer) error { | |||
tag := cbg.CborEncodeMajorType(6, 42) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're going to be manually parsing CBOR, let's at least get these into constants.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally, we'd have one function per major type:
- WriteTag
- WriteInt
- ReadTag
- ReadInt
- WriteBytes
...
) | ||
|
||
// UnsupportedVersionString just holds an error message | ||
const UnsupportedVersionString = "<unsupported cid version>" | ||
|
||
const CidMaxLen = 256 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will this affect cids using identity multihashes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, do you make identity CIDs that are longer than 256 bytes? If so, that seems rather obscene... but I could be convinced to raise this to 512 bytes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is where we use it:
https://github.com/Peergos/Peergos/blob/master/src/peergos/shared/crypto/FragmentedPaddedCipherText.java#L71
Non huge directories and files < 4096 bytes are inlined to save a network round trip.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That won't be a root object, but inside another ipld object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any restriction here that is not also enforced on identity multihashes would also mean we could construct Cids that would look fine, but just fail to serialize at runtime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually the value is 4112 bytes for that example. Here is an example of a 4096 byte file being encrypted and wrapped in some other cbor (so it includes a 4112 byte identity multihash inlined) - in hex in the file here:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ianopolous please don't do that. We set the default max to 32 in go-ipfs for a reason.
Any restriction here that is not also enforced on identity multihashes would also mean we could construct Cids that would look fine, but just fail to serialize at runtime.
You're right, we should be checking this in the NewCidV1
function as well (and probably when we decode CIDs).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Stebalien I asked back in March, when I implemented this, and was told inlining 4 KiB identity mulithashes was fine. Now all our users are on it, so breaking that (it is a breaking change in go-ipfs) would break every Peergos user. Current go-ipfs happily accepts much larger values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where did I say that that? Go-ipfs doesn't explicitly forbid this anywhere, it's just a bad idea. I had thought we left something somewhere saying inlining anything over 100 bytes was a bad idea but I can't find that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was @whyrusleeping I asked. It makes a bunch of stuff a lot faster for us.
Actually, question: How should we handle 'empty' Cids? Right now, its not handling that case properly. I think currently refmt errors out on this too, saying you can't have an undefined Cid serialized. Do we want to retain that behavior? |
We should return null, that's what we're currently using empty CIDs for. |
@warpfork (who has so far abstained from commented) is really against having a cbor method on cids. I lean towards agreeing, even though it makes my life a bit harder. The solution i'm going to move towards now is to have a separate function that serialized a cid in a different package, and then anything wanting to cbor serialize a CID will have to know to use that method. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should really, really, really be doing this from the outside. If the constructors exported by this package aren't good enough to do this from outside this package, that's a bug we should fix, and then do this feature from outside this package.
There are a host of reasons for this:
-
go-cid is a very foundational package for us. Its transitive imports should be as minimal as possible, to make it easier to adopt, rather than being perceived as "heavy". (Even if the dependency is "small": the sheer count should go down, not up from its already-too-high number.)
-
anything go-cid depends on becomes a highly-depended upon thing instantly, and becomes therefore very difficult to change. This means adding a dependency on a library that is itself new and has partial feature coverage should be regarded with extreme trepidation.
-
CIDs don't depend on CBOR and having the code do so when the concept doesn't should set off all sorts of alarm bells.
-
the sheer use of the word "CBOR" is memetically problematic here. Having the word "CBOR" appear in this library implies that there's one possible way that go-cid relates to CBOR, and that's just not correct.
-
CBOR isn't even a codec (in the sense of multicodec) -- "dag-cbor" is! Seeing the number "42" appear in the go-cid package is incorrect; and seeing the multibase byte appear in the go-cid package is also incorrect. These are details of a codec. Someday we might have another multicodec which is still visibly cbor-ish -- 'dag-cbor-harder' or something -- and it could choose to omit the multiformat null byte, for example; and this would be unambiguous and a valid design choice by virtue of being a different codec. Therefore we should not have details in go-cid that are dag-cbor specific, nor should we have concepts attached to the word cbor, because both are extremely likely to suggest inaccurate relationships and confuse in the future.
We should not pursue this patch.
Instead: let's do this, but do it from other packages that use go-cid. This should be especially approachable because one of the main reasons we're looking at this is because we're also exploring code-gen tactics across our other libraries and consuming applications.
For example: supposing we have even one other structure which surrounds the use of the CID:
type MyProtocol struct {
sauce cid.Cid
}
func (x *MyProtocol) DecodeDagCborFastPath(io.Reader) { /*...*/ }
Since from this first structure, we already have a control flow defined by MyProtocol.DecodeDagCborFastPath
, it can already call another custom function on the cid field -- and this can be done without that function being a method on cid.Cid
.
This from-the-outside approach will lead to much better outcomes, because it neither adds a dependency to go-cid, nor locks us in to any codecs nor particular implementations of them, nor (most importantly) introduces suggestions of conceptual relationships that are semantically backwards.
EDIT: ahheh, sorry. I'm a slow typer.
This is good stuff and the code generator looks great. Is the plan to have go-ipld-cbor do an interface check for MarshalCBOR and use that if it exists? If so, is there a timeline from that and are PRs welcome? |
@whyrusleeping what's the state of this? |
Closing because stale. But, there could be a case to pick this up again and just implement the |
This is the first PR as part of a broader effort to improve the speed of cbor-ipld.
I implemented the benchmark here elsewhere using the existing go-ipld-cbor, and get the following results:
The benchmarks here give me:
I'm sure it can be improved upon, but CIDs arent even the slow part that i'm trying to optimize.