-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add an alternative serialization method #58
Conversation
Backed by an array of bytes
if (pos > endPos - 8) { | ||
throw new EOFException(); | ||
} | ||
long value = 0; | ||
value |= Byte.toUnsignedLong(array[pos]); | ||
value |= Byte.toUnsignedLong(array[pos + 1]) << 8; | ||
value |= Byte.toUnsignedLong(array[pos + 2]) << 16; | ||
value |= Byte.toUnsignedLong(array[pos + 3]) << 24; | ||
value |= Byte.toUnsignedLong(array[pos + 4]) << 32; | ||
value |= Byte.toUnsignedLong(array[pos + 5]) << 40; | ||
value |= Byte.toUnsignedLong(array[pos + 6]) << 48; | ||
value |= Byte.toUnsignedLong(array[pos + 7]) << 56; | ||
pos += 8; | ||
return value; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While these get coalesced into a single load in go this is expensive on the JVM. Consider going multi-release for this - use Unsafe
on JDK8 and MethodHandles.byteArrayViewVarHandle
- javadoc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another note - if you use ByteBuffer.putLong
you get the right thing automatically
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another note - if you use ByteBuffer.putLong you get the right thing automatically
I may be mistaken, but looking at the implementation, I don't think Java 8 does it. Later versions do it however.
I'd like to keep those optimizations outside of this PR (so as not to block it), but here is something I think we could do based on your feedback. Let me know if that's in line with what you have in mind. I'm also wondering about possible drawbacks of actually using Unsafe
(I can see a couple of warnings when building).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it was fixed since JDK8, precisely because assembling this operation can be slow.
As did DataDog/sketches-go#42, this implements a serialization method that is generally faster and produces smaller serialized sketches than the protobuf method.
Here are things that we could work on and that I did not implement as part of this PR:
Input
andOutput
that are backed by (1)ByteBuffer
, (2)java.io.InputStream
/java.io.OutputStream
Store.decodeAndMergeWith()
, especially inBufferedPaginatedStore
(it should make encoding and decoding more performant).Benchmarks
Serialized size
From DataDog/sketches-go#42.
Computational cost
There is a significant gain when using the paginated store, likely because the encoding format is closer to the internal representation of the paginated stores (pages can be independently encoded as contiguous counts).