Reduce encoding and decoding allocations #16
Conversation
Force-pushed c48fb8e to e50eac4
I reduced the allocation count by:

1. Adding a scratch buffer to the Encoder/Decoder, used as a temporary buffer. Previously, temporary buffers were allocated (on the heap, since they escaped the stack) for basic type encoding/decoding (e.g. `DecodeInt()`, `EncodeInt()`).
2. Making `DecodeFixedOpaque()` decode in place instead of allocating the result. Apart from reducing allocations, this removes the need for copying (as shown by `decodeFixedArray()`).

The pre-existing (and admittedly very limited) benchmarks show a sizeable improvement.

Before:

```
goos: darwin
goarch: amd64
pkg: github.com/stellar/go-xdr/xdr3
cpu: Intel(R) Core(TM) i7-1068NG7 CPU @ 2.30GHz
BenchmarkUnmarshal
BenchmarkUnmarshal-8    1445005    853.5 ns/op    18.75 MB/s    152 B/op    11 allocs/op
BenchmarkMarshal
BenchmarkMarshal-8      1591068    643.0 ns/op    24.88 MB/s    104 B/op    10 allocs/op
PASS
```

After:

```
goos: darwin
goarch: amd64
pkg: github.com/stellar/go-xdr/xdr3
cpu: Intel(R) Core(TM) i7-1068NG7 CPU @ 2.30GHz
BenchmarkUnmarshal
BenchmarkUnmarshal-8    1753954    661.4 ns/op    24.19 MB/s    144 B/op    8 allocs/op
BenchmarkMarshal
BenchmarkMarshal-8      2103310    537.1 ns/op    29.79 MB/s    104 B/op    8 allocs/op
PASS
```

Both decoding and encoding go down to 8 allocations per operation, from 11 (decoding) and 10 (encoding).

More context at stellar/go#4022 (comment)
Force-pushed e50eac4 to 4590e56
Force-pushed 2d3a0ed to 4bdb086
Very nice! I have one ask (❗), a suggestion (💡), some questions (❔), and some thoughts (💭). Other than the ask, it looks good to me. I think @bartekn should review this too; he may have thoughts on the testing method.
```go
b := enc.scratchBuf[:pad]
for i := 0; i < pad; i++ {
	b[i] = 0x0
}
```
💭 I wonder what bugs using a shared scratch buffer will open up. Would it impact performance to access the buffer through a function that zeroes it every time? Maybe not worth it.
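The accessor the comment above suggests might look like the sketch below. This is purely illustrative; `zeroedScratch` and the `Encoder` shape are hypothetical names, not part of the library.

```go
package main

import "fmt"

type Encoder struct {
	scratchBuf [8]byte
}

// zeroedScratch returns the first n bytes of the scratch buffer,
// cleared before use, so callers never observe stale data from a
// previous encode. The clearing loop is the performance cost in question.
func (e *Encoder) zeroedScratch(n int) []byte {
	b := e.scratchBuf[:n]
	for i := range b {
		b[i] = 0
	}
	return b
}

func main() {
	e := &Encoder{}
	copy(e.scratchBuf[:], []byte{1, 2, 3, 4}) // simulate stale data
	fmt.Println(e.zeroedScratch(4))           // [0 0 0 0]
}
```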
It would surely impact performance, but I don't know to what extent.
Concurrency problems came to mind when I added the buffer, but that would require multiple goroutines to share the same encoder/decoder, which I don't think makes sense at all.
+1 I don't think this needs to support concurrency. It's unlikely a reader would support that anyway.
Looks great!
Force-pushed 94b38a1 to 87cb50a
Force-pushed 87cb50a to d53a34f
LGTM!
🎉 Awesome work.