-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create a lot simpler sorted uint64 codec #2716
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 4 files reviewed, 4 unresolved discussions (waiting on @golangcibot)
binary/codec.go, line 7 at r1 (raw file):
Previously, golangcibot (Bot from GolangCI) wrote…
func
packBlock
is unused
Done.
binary/codec.go, line 27 at r1 (raw file):
Previously, golangcibot (Bot from GolangCI) wrote…
func
Encode
is unused
Done.
binary/codec.go, line 43 at r1 (raw file):
Previously, golangcibot (Bot from GolangCI) wrote…
func
NumUids
is unused
Done.
binary/codec.go, line 51 at r1 (raw file):
Previously, golangcibot (Bot from GolangCI) wrote…
func
Decode
is unused
Done.
This PR contains a simple encoder/decoder using concepts from binary packing building on protobuf's default varint encoding. We currently use bp128 package, a modification on a 3rd party library, which uses a complex piece of code, that hasn't been touched since it was written. It is hard to understand and debug, given a lot of it is using SIMD instructions generated via a python library. We have been seeing panics during rollups, and reads which indicate bugs in this implementation. Instead of trying to fix that up, I wrote this PR, which uses concepts from Binary Packing and the fact that protobufs use Varint encoding to write a very simple encoder/decoder for posting lists. This PR just contains the implementation, tests and benchmarks. Next PR would modify posting lists to replace the current bp128 with the new binary package. Benchmark output with artificial dataset list: ``` goos: linux goarch: amd64 pkg: github.com/dgraph-io/dgraph/binary BenchmarkGzip-4 10 112428082 ns/op --- BENCH: BenchmarkGzip-4 codec_test.go:77: Dataset Len=1000000. Size: 8.0 MB codec_test.go:95: Output size: 2.6 MB. Compression: 0.33 codec_test.go:77: Dataset Len=1000000. Size: 8.0 MB codec_test.go:95: Output size: 2.6 MB. Compression: 0.33 BenchmarkUidPack/encode/128-4 100 14213068 ns/op --- BENCH: BenchmarkUidPack/encode/128-4 codec_test.go:105: Dataset Len=1000000. Size: 8.0 MB codec_test.go:115: Output size: 1.1 MB. Compression: 0.13 codec_test.go:105: Dataset Len=1000000. Size: 8.0 MB codec_test.go:115: Output size: 1.1 MB. Compression: 0.13 BenchmarkUidPack/encode/256-4 100 12399005 ns/op --- BENCH: BenchmarkUidPack/encode/256-4 codec_test.go:105: Dataset Len=1000000. Size: 8.0 MB codec_test.go:115: Output size: 1.0 MB. Compression: 0.13 codec_test.go:105: Dataset Len=1000000. Size: 8.0 MB codec_test.go:115: Output size: 1.0 MB. Compression: 0.13 BenchmarkUidPack/decode/128-4 1000 2213527 ns/op --- BENCH: BenchmarkUidPack/decode/128-4 codec_test.go:140: Dataset Len=1000000. Size: 8.0 MB codec_test.go:145: Output size: 1.1 MB. Compression: 0.13 codec_test.go:140: Dataset Len=1000000. Size: 8.0 MB codec_test.go:145: Output size: 1.1 MB. Compression: 0.13 codec_test.go:140: Dataset Len=1000000. Size: 8.0 MB codec_test.go:145: Output size: 1.1 MB. Compression: 0.13 BenchmarkUidPack/decode/256-4 1000 2014434 ns/op --- BENCH: BenchmarkUidPack/decode/256-4 codec_test.go:140: Dataset Len=1000000. Size: 8.0 MB codec_test.go:145: Output size: 1.0 MB. Compression: 0.13 codec_test.go:140: Dataset Len=1000000. Size: 8.0 MB codec_test.go:145: Output size: 1.0 MB. Compression: 0.13 codec_test.go:140: Dataset Len=1000000. Size: 8.0 MB codec_test.go:145: Output size: 1.0 MB. Compression: 0.13 PASS ok github.com/dgraph-io/dgraph/binary 17.786s ```
We currently use bp128 package, a modification on a 3rd party library, which uses a complex piece of code, that hasn't been touched since it was written. It is hard to understand and debug, given a lot of it is using SIMD instructions, generated via a python library.
We have been seeing panics during rollups, and reads which indicate bugs in this implementation. Instead of trying to fix that up, I wrote this PR, which uses concepts from Binary Packing and the fact that protobufs use Varint encoding, to write a very simple encoder/decoder for posting lists.
This PR just contains the implementation, tests and benchmarks. Next PR would modify posting lists to replace the current bp128 with the new binary package.
This change is