Add fastpath cbor marshalers #88

whyrusleeping · 2019-08-20T00:15:32Z

This is the first PR as part of a broader effort to improve the speed of cbor-ipld.

I implemented the same benchmark elsewhere using the existing go-ipld-cbor, and got the following results:

why@why-MS-7B46 ~/c/c/cidtest> go test -bench=.
goos: linux
goarch: amd64
pkg: github.com/whyrusleeping/cbor-gen/cidtest
BenchmarkCBORMarshal-16      	 2000000	       712 ns/op	     449 B/op	       9 allocs/op
BenchmarkCBORUnmarshal-16    	 1000000	      1080 ns/op	     480 B/op	       9 allocs/op

The benchmarks here give me:

why@why-MS-7B46 ~/g/s/g/i/go-cid> go test -run=XXX -bench=.
goos: linux
goarch: amd64
pkg: github.com/ipfs/go-cid
BenchmarkCBORMarshal-16      	20000000	       165 ns/op	     128 B/op	       5 allocs/op
BenchmarkCBORUnmarshal-16    	 5000000	       317 ns/op	     208 B/op	       4 allocs/op

I'm sure it can be improved upon, but CIDs aren't even the slow part that I'm trying to optimize.

whyrusleeping · 2019-08-20T00:16:26Z

Note: go-ipld-cbor does not use this (yet) and this will not speed up any existing applications as is. Once things start using these methods all over, we should notice improvements.

Stebalien · 2019-08-20T00:18:01Z

cid.go

 	"strings"

 	mbase "github.com/multiformats/go-multibase"
 	mh "github.com/multiformats/go-multihash"
+
+	cbg "github.com/whyrusleeping/cbor-gen"


😭

But go doesn't make this easy, does it.

whyrusleeping

The alternative was to have the codegen stuff spit out the methods it needs into the generated file, which I'm not opposed to.

Stebalien

Hm. Honestly, that may be a better approach. That way, we can optimize everything.

warpfork

I agree with @Stebalien but even more intensely: I don't think this is viable at all. Having go-cid import cbor packages seems almost backwards. It would put cbor things on the transitive import requirements for not-at-all-cbor things and that's not okay. An import like this from one of our most imported libraries would be a huge lock-in on that package as well, and I don't think that's something we should go into lightly, and probably isn't even intentional.

This has to go in the generated file. Not here.

Stebalien

My only blocking complaint is that the minimum CID size needs to be reliably enforced and stored in a constant.

Stebalien · 2019-08-20T00:23:02Z

cid.go

@@ -522,6 +525,74 @@ func (c Cid) Prefix() Prefix {
 	}
 }

+func (c Cid) MarshalCBOR(w io.Writer) error {
+	tag := cbg.CborEncodeMajorType(6, 42)


If we're going to be manually parsing CBOR, let's at least get these into constants.

Ideally, we'd have one function per major type:

WriteTag

WriteInt

ReadTag

ReadInt

WriteBytes

...
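
A minimal sketch of what the constants and per-major-type helpers suggested above could look like. The names `writeHeader`, `headerBytes`, and the `Maj*` constants are hypothetical and are not taken from cbor-gen; only the header layout (3-bit major type, 5-bit additional info, big-endian argument) comes from the CBOR specification. Only the write side is shown; the read helpers would mirror it.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"io"
)

// CBOR major types used in this sketch (hypothetical constant names).
const (
	MajUnsignedInt = 0
	MajByteString  = 2
	MajTag         = 6
)

// TagCid is the CBOR tag number that dag-cbor uses for CID links.
const TagCid = 42

// writeHeader writes the initial byte for the given major type and,
// when the value does not fit in the low 5 bits, its big-endian argument.
func writeHeader(w io.Writer, major byte, val uint64) error {
	var buf [9]byte
	n := 1
	switch {
	case val < 24:
		buf[0] = major<<5 | byte(val)
	case val <= 0xff:
		buf[0] = major<<5 | 24
		buf[1] = byte(val)
		n = 2
	case val <= 0xffff:
		buf[0] = major<<5 | 25
		binary.BigEndian.PutUint16(buf[1:], uint16(val))
		n = 3
	case val <= 0xffffffff:
		buf[0] = major<<5 | 26
		binary.BigEndian.PutUint32(buf[1:], uint32(val))
		n = 5
	default:
		buf[0] = major<<5 | 27
		binary.BigEndian.PutUint64(buf[1:], val)
		n = 9
	}
	_, err := w.Write(buf[:n])
	return err
}

// WriteTag writes a CBOR tag header.
func WriteTag(w io.Writer, tag uint64) error { return writeHeader(w, MajTag, tag) }

// WriteInt writes an unsigned integer.
func WriteInt(w io.Writer, v uint64) error { return writeHeader(w, MajUnsignedInt, v) }

// WriteBytes writes a byte-string header followed by the bytes themselves.
func WriteBytes(w io.Writer, b []byte) error {
	if err := writeHeader(w, MajByteString, uint64(len(b))); err != nil {
		return err
	}
	_, err := w.Write(b)
	return err
}

// headerBytes is a small convenience for inspecting a header in memory.
func headerBytes(major byte, val uint64) []byte {
	var buf bytes.Buffer
	_ = writeHeader(&buf, major, val) // writes to a bytes.Buffer do not fail
	return buf.Bytes()
}
```

With helpers like these, the call in the diff would read `WriteTag(w, TagCid)` rather than passing the literals 6 and 42.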

ianopolous · 2019-08-20T01:52:55Z

cid.go

 )

 // UnsupportedVersionString just holds an error message
 const UnsupportedVersionString = "<unsupported cid version>"

+const CidMaxLen = 256


will this affect cids using identity multihashes?

whyrusleeping

Yes, do you make identity CIDs that are longer than 256 bytes? If so, that seems rather obscene... but I could be convinced to raise this to 512 bytes.

ianopolous

This is where we use it:
https://github.com/Peergos/Peergos/blob/master/src/peergos/shared/crypto/FragmentedPaddedCipherText.java#L71
Directories that aren't huge and files smaller than 4096 bytes are inlined to save a network round trip.

That won't be a root object, but inside another ipld object.

Any restriction here that is not also enforced on identity multihashes would also mean we could construct Cids that would look fine, but just fail to serialize at runtime.

Actually the value is 4112 bytes for that example. Here is an example of a 4096 byte file being encrypted and wrapped in some other cbor (so it includes a 4112 byte identity multihash inlined) - in hex in the file here:

https://alpha.peergos.net/#{"secretLink":true,"link":"#6MDZhRRPT4ugkJuUfceM6bPnpQKEj5dB2NqLxD1RxFn3oA3CusXayN8RReauEh/6MDZhRRPT4ugkJuUfcRzRbPpFimcBNJx2N9TJDnL4W3ETYhwdsWdvgCkXkwipF/JAMW71tpHDbgcrYrb7SkzeoeD4u9KWLR5PpUvPiBkVuK/5Pf7SupJAsP3FoAbmb5rakBBPZDz8zA9K8siNbbZLKnKijW3pWf"}

Stebalien

@ianopolous please don't do that. We set the default max to 32 in go-ipfs for a reason.

> Any restriction here that is not also enforced on identity multihashes would also mean we could construct Cids that would look fine, but just fail to serialize at runtime.

You're right, we should be checking this in the NewCidV1 function as well (and probably when we decode CIDs).
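
A sketch of how the limit could be enforced consistently, assuming a single hypothetical helper (`checkCidLen`) that both the constructor and the decoder call. `CidMaxLen = 256` is the value from the diff; the helper and error names are illustrative only and are not part of go-cid.

```go
package main

import (
	"errors"
	"fmt"
)

// CidMaxLen mirrors the constant proposed in the diff under review.
const CidMaxLen = 256

// ErrCidTooLong is a hypothetical sentinel error for oversized CIDs.
var ErrCidTooLong = errors.New("cid exceeds maximum length")

// checkCidLen rejects binary CIDs longer than CidMaxLen. Calling it from
// both construction and decoding would avoid CIDs that can be built but
// then fail to serialize.
func checkCidLen(raw []byte) error {
	if len(raw) > CidMaxLen {
		return fmt.Errorf("%w: %d > %d bytes", ErrCidTooLong, len(raw), CidMaxLen)
	}
	return nil
}
```

Under this check, the 4112-byte identity multihash mentioned above would be rejected at construction time instead of at serialization time.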

ianopolous

@Stebalien I asked back in March, when I implemented this, and was told inlining 4 KiB identity multihashes was fine. Now all our users are on it, so breaking that (it is a breaking change in go-ipfs) would break every Peergos user. Current go-ipfs happily accepts much larger values.

Stebalien

Where did I say that? Go-ipfs doesn't explicitly forbid this anywhere, it's just a bad idea. I had thought we left something somewhere saying inlining anything over 100 bytes was a bad idea but I can't find that.

ianopolous

It was @whyrusleeping I asked. It makes a bunch of stuff a lot faster for us.

whyrusleeping · 2019-08-20T05:03:25Z

Actually, question: How should we handle 'empty' Cids?

Right now, it's not handling that case properly.

I think currently refmt errors out on this too, saying you can't have an undefined Cid serialized.

Do we want to retain that behavior?

Stebalien · 2019-08-20T15:06:20Z

> Do we want to retain that behavior?

We should return null, that's what we're currently using empty CIDs for.
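
One reading of "return null" is that an undefined CID is written as the CBOR null value (the single byte 0xf6) and that null decodes back to an undefined CID. A minimal sketch under that assumption follows; `encodeLinkOrNull` and `decodeLinkOrNull` are hypothetical names, and the real link encoder and decoder are passed in as parameters so the sketch does not commit to any particular implementation.

```go
package main

// cborNull is the CBOR encoding of null (major type 7, simple value 22).
const cborNull = 0xf6

// encodeLinkOrNull returns CBOR null for an undefined (empty) CID and
// otherwise defers to the supplied link encoder.
func encodeLinkOrNull(raw []byte, encode func([]byte) []byte) []byte {
	if len(raw) == 0 {
		return []byte{cborNull}
	}
	return encode(raw)
}

// decodeLinkOrNull treats a lone CBOR null as an undefined CID and
// otherwise defers to the supplied link decoder.
func decodeLinkOrNull(data []byte, decode func([]byte) ([]byte, error)) (raw []byte, defined bool, err error) {
	if len(data) == 1 && data[0] == cborNull {
		return nil, false, nil
	}
	raw, err = decode(data)
	return raw, err == nil, err
}
```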

whyrusleeping · 2019-08-20T18:12:52Z

@warpfork (who has so far abstained from commenting) is really against having a cbor method on cids. I lean towards agreeing, even though it makes my life a bit harder.

The solution I'm going to move towards now is to have a separate function, in a different package, that serializes a CID; anything wanting to CBOR-serialize a CID will then have to know to use that function.

warpfork

We should really, really, really be doing this from the outside. If the constructors exported by this package aren't good enough to do this from outside this package, that's a bug we should fix, and then do this feature from outside this package.

There are a host of reasons for this:

- go-cid is a very foundational package for us. Its transitive imports should be as minimal as possible, to make it easier to adopt, rather than being perceived as "heavy". (Even if the dependency is "small": the sheer count should go down, not up from its already-too-high number.)
- anything go-cid depends on becomes a highly-depended upon thing instantly, and becomes therefore very difficult to change. This means adding a dependency on a library that is itself new and has partial feature coverage should be regarded with extreme trepidation.
- CIDs don't depend on CBOR and having the code do so when the concept doesn't should set off all sorts of alarm bells.
- the sheer use of the word "CBOR" is memetically problematic here. Having the word "CBOR" appear in this library implies that there's one possible way that go-cid relates to CBOR, and that's just not correct.
- CBOR isn't even a codec (in the sense of multicodec) -- "dag-cbor" is! Seeing the number "42" appear in the go-cid package is incorrect; and seeing the multibase byte appear in the go-cid package is also incorrect. These are details of a codec. Someday we might have another multicodec which is still visibly cbor-ish -- 'dag-cbor-harder' or something -- and it could choose to omit the multiformat null byte, for example; and this would be unambiguous and a valid design choice by virtue of being a different codec. Therefore we should not have details in go-cid that are dag-cbor specific, nor should we have concepts attached to the word cbor, because both are extremely likely to suggest inaccurate relationships and confuse in the future.

We should not pursue this patch.

Instead: let's do this, but do it from other packages that use go-cid. This should be especially approachable because one of the main reasons we're looking at this is because we're also exploring code-gen tactics across our other libraries and consuming applications.

For example: supposing we have even one other structure which surrounds the use of the CID:

type MyProtocol struct {
    sauce cid.Cid
}
func (x *MyProtocol) DecodeDagCborFastPath(io.Reader) { /*...*/ }

Since from this first structure, we already have a control flow defined by MyProtocol.DecodeDagCborFastPath, it can already call another custom function on the cid field -- and this can be done without that function being a method on cid.Cid.

This from-the-outside approach will lead to much better outcomes, because it neither adds a dependency to go-cid, nor locks us in to any codecs nor particular implementations of them, nor (most importantly) introduces suggestions of conceptual relationships that are semantically backwards.
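
The from-the-outside approach could be sketched roughly as follows. This shows the encode direction only; the `Cid` type is a stand-in for `cid.Cid` so the example compiles on its own, `WriteCidDagCbor` is a hypothetical free function that would live in a codec package rather than in go-cid, and encoding the struct as a one-element array is an arbitrary choice made for the sketch.

```go
package main

import (
	"bytes"
	"errors"
	"io"
)

// Cid is a stand-in for cid.Cid; only a Bytes() accessor is assumed.
type Cid struct{ raw []byte }

// Bytes returns the binary form of the CID.
func (c Cid) Bytes() []byte { return c.raw }

// WriteCidDagCbor writes a CID as a dag-cbor link: tag 42, then a byte
// string holding a 0x00 prefix followed by the CID bytes. The tag number
// and the prefix are codec details, so they stay in the codec package.
func WriteCidDagCbor(w io.Writer, c Cid) error {
	raw := c.Bytes()
	n := len(raw) + 1         // +1 for the leading 0x00 byte
	hdr := []byte{0xd8, 0x2a} // CBOR tag 42
	switch {
	case n < 24:
		hdr = append(hdr, 0x40|byte(n))
	case n <= 0xff:
		hdr = append(hdr, 0x58, byte(n))
	case n <= 0xffff:
		hdr = append(hdr, 0x59, byte(n>>8), byte(n))
	default:
		return errors.New("cid too long for this sketch")
	}
	hdr = append(hdr, 0x00)
	if _, err := w.Write(hdr); err != nil {
		return err
	}
	_, err := w.Write(raw)
	return err
}

// MyProtocol is the surrounding structure from the example above.
type MyProtocol struct {
	sauce Cid
}

// EncodeDagCborFastPath writes the struct as a one-element CBOR array and
// calls the free function for the CID field; no method on Cid is needed.
func (x *MyProtocol) EncodeDagCborFastPath(w io.Writer) error {
	if _, err := w.Write([]byte{0x81}); err != nil { // array of length 1
		return err
	}
	return WriteCidDagCbor(w, x.sauce)
}

// encodeToBytes is a convenience wrapper for inspecting the output.
func encodeToBytes(x *MyProtocol) ([]byte, error) {
	var buf bytes.Buffer
	err := x.EncodeDagCborFastPath(&buf)
	return buf.Bytes(), err
}
```

Generated code for other structures can call the same free function, so go-cid itself gains neither a dependency nor any dag-cbor-specific detail.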

EDIT: ahheh, sorry. I'm a slow typer.

tobowers · 2019-09-01T12:12:28Z

This is good stuff and the code generator looks great. Is the plan to have go-ipld-cbor do an interface check for MarshalCBOR and use that if it exists? If so, is there a timeline for that, and are PRs welcome?
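
The interface check asked about here could look roughly like the sketch below. It is not go-ipld-cbor's actual code: `cborMarshaler`, `encodeValue`, and the `fallback` parameter (standing in for the existing reflection-based encoder) are all hypothetical names, and `flag` is a toy type used only to exercise the fast path.

```go
package main

import (
	"bytes"
	"io"
)

// cborMarshaler is the method a generated type would provide.
type cborMarshaler interface {
	MarshalCBOR(w io.Writer) error
}

// encodeValue uses the fast path when v implements cborMarshaler and
// otherwise defers to the supplied fallback encoder.
func encodeValue(v interface{}, fallback func(interface{}) ([]byte, error)) ([]byte, error) {
	if m, ok := v.(cborMarshaler); ok {
		var buf bytes.Buffer
		if err := m.MarshalCBOR(&buf); err != nil {
			return nil, err
		}
		return buf.Bytes(), nil
	}
	return fallback(v)
}

// flag is a toy type with a hand-written MarshalCBOR method.
type flag bool

// MarshalCBOR writes the CBOR simple value for true or false.
func (f flag) MarshalCBOR(w io.Writer) error {
	b := byte(0xf4) // CBOR false
	if f {
		b = 0xf5 // CBOR true
	}
	_, err := w.Write([]byte{b})
	return err
}
```

Types without the method keep going through the fallback, so adopting the fast path type by type would not break existing callers.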

Stebalien · 2020-01-10T14:24:39Z

@whyrusleeping what's the state of this?

rvagg · 2023-04-04T04:22:13Z

Closing because stale. But there could be a case for picking this up again and just implementing the MarshalCBOR and UnmarshalCBOR methods directly, without the dependency or the need for codegen; it would be pretty straightforward.

Add fastpath cbor marshalers

aa319eb

whyrusleeping requested review from Stebalien and Kubuxu August 20, 2019 00:15

Stebalien reviewed Aug 20, 2019

Stebalien requested changes Aug 20, 2019

use constants and enforce max length elsewhere

a5acc01

ianopolous reviewed Aug 20, 2019

whyrusleeping mentioned this pull request Aug 20, 2019

Add cbor marshaling fastpath methods on several types filecoin-project/lotus#150

Merged

ci: remove gx support

02253f5

whyrusleeping mentioned this pull request Aug 20, 2019

use protobufs quorumcontrol/go-hamt-ipld#6

Merged

warpfork requested changes Aug 20, 2019

ianopolous mentioned this pull request Jul 9, 2020

Size limit of identity hash multiformats/multihash#130

Open

rvagg closed this Apr 4, 2023

rvagg deleted the feat/cbor-marshal branch April 4, 2023 04:22

rvagg restored the feat/cbor-marshal branch April 4, 2023 04:22
