Add an alternative serialization method #58

Merged (15 commits) on Jul 22, 2021

@CharlesMasson (contributor) commented on Jul 13, 2021:

Like DataDog/sketches-go#42, this PR implements a custom serialization method that is generally faster than the protobuf method and produces smaller serialized sketches.
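For intuition on why a custom binary format can be smaller than protobuf, here is a minimal sketch of unsigned LEB128 variable-length integer encoding, a technique compact formats commonly use so that small values take fewer bytes. It is an illustrative assumption, not the encoder from this PR; the names VarLongSketch, writeVarLong and readVarLong are hypothetical.

```java
import java.io.ByteArrayOutputStream;

/**
 * Hypothetical sketch of unsigned LEB128 variable-length integers.
 * Not the encoder added in this PR.
 */
public final class VarLongSketch {

    /** Writes value 7 bits at a time, least significant group first. */
    static void writeVarLong(ByteArrayOutputStream out, long value) {
        // Loop while more than 7 significant bits remain (treating value as unsigned).
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80)); // low 7 bits plus continuation flag
            value >>>= 7;
        }
        out.write((int) value); // final group, continuation flag cleared
    }

    /** Decodes a value written by writeVarLong, starting at bytes[0]. */
    static long readVarLong(byte[] bytes) {
        long value = 0;
        int shift = 0;
        for (byte b : bytes) {
            value |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return value; // continuation flag cleared: last byte
            }
            shift += 7;
        }
        throw new IllegalArgumentException("truncated varint");
    }

    public static void main(String[] args) {
        for (long v : new long[] {1L, 300L, 1L << 40, -1L}) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            writeVarLong(out, v);
            byte[] bytes = out.toByteArray();
            System.out.println(v + " -> " + bytes.length + " bytes, decoded " + readVarLong(bytes));
        }
    }
}
```

With this scheme, 300 takes 2 bytes instead of the 8 bytes of a fixed-width long, while a value that uses all 64 bits (such as -1) takes 10 bytes, so it pays off mainly for small non-negative integers such as integer counts.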

Possible follow-ups that are not implemented in this PR:

  • add implementations of Input and Output that are backed by (1) ByteBuffer and (2) java.io.InputStream/java.io.OutputStream
  • add specialized implementations of Store.decodeAndMergeWith(), especially in BufferedPaginatedStore, which should improve encoding and decoding performance.
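The first follow-up could look roughly like the sketch below for the java.io.OutputStream case. SimpleOutput and OutputStreamOutput are hypothetical names with a deliberately reduced set of methods; they are not the Input/Output interfaces added in this PR.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical minimal output abstraction (NOT the Output interface from this PR). */
interface SimpleOutput {
    void writeByte(byte value) throws IOException;

    void writeLongLE(long value) throws IOException;
}

/** Hypothetical output backed by a java.io.OutputStream. */
final class OutputStreamOutput implements SimpleOutput {
    private final OutputStream out;

    OutputStreamOutput(OutputStream out) {
        this.out = out;
    }

    @Override
    public void writeByte(byte value) throws IOException {
        out.write(value); // OutputStream.write(int) keeps only the low 8 bits
    }

    @Override
    public void writeLongLE(long value) throws IOException {
        // Emit the 8 bytes least significant first (little-endian).
        for (int i = 0; i < 8; i++) {
            out.write((int) (value >>> (8 * i)));
        }
    }
}
```

In practice the stream would typically be wrapped in a java.io.BufferedOutputStream, since otherwise each write(int) call goes straight to the underlying stream.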

Benchmarks

Serialized size

From DataDog/sketches-go#42 (sizes in bytes; custom_no_mapping is the custom encoding without the index mapping):

    test case                                      proto custom custom_no_mapping
    dense/empty                                       17     17                 0
    dense/small_int_count                           1116    161               144
    dense/small_non_int_count                       1116    192               175
    sparse/single_value                               31     22                 5
    sparse/small_int_count                            66     30                13
    sparse/log_normal_int_count                    10124   2600              2598
    buffered_paginated/empty                          17     17                 0
    buffered_paginated/single_value                   31     21                 4
    buffered_paginated/small_int_count                66     27                10
    buffered_paginated/small_non_int_count            66    156               139
    buffered_paginated/int_count_linear             6100    866               849
    buffered_paginated/log_normal_int_count        10250   1531              1514
    buffered_paginated/log_normal_non_int_count    10054   6285              6268

Computational cost

Benchmark                (count)                (generator)  (sketchOption)  Mode  Cnt   Score   Error  Units
Serialize.toProto         100000                    POISSON       PAGINATED  avgt    3   8.973 ± 3.771  us/op
Serialize.serialize       100000                    POISSON       PAGINATED  avgt    3   3.158 ± 0.164  us/op
Serialize.encode          100000                    POISSON       PAGINATED  avgt    3   2.007 ± 0.067  us/op
Serialize.encodeReusing   100000                    POISSON       PAGINATED  avgt    3   1.581 ± 0.148  us/op
Serialize.toProto         100000  COMPOSITE_POISSON_EXTREME       PAGINATED  avgt    3  44.719 ± 4.861  us/op
Serialize.serialize       100000  COMPOSITE_POISSON_EXTREME       PAGINATED  avgt    3  13.049 ± 0.956  us/op
Serialize.encode          100000  COMPOSITE_POISSON_EXTREME       PAGINATED  avgt    3   5.067 ± 0.301  us/op
Serialize.encodeReusing   100000  COMPOSITE_POISSON_EXTREME       PAGINATED  avgt    3   4.082 ± 0.782  us/op
Serialize.toProto         100000            TRIMODAL_NORMAL       PAGINATED  avgt    3  14.577 ± 2.183  us/op
Serialize.serialize       100000            TRIMODAL_NORMAL       PAGINATED  avgt    3   6.033 ± 0.837  us/op
Serialize.encode          100000            TRIMODAL_NORMAL       PAGINATED  avgt    3   2.278 ± 0.280  us/op
Serialize.encodeReusing   100000            TRIMODAL_NORMAL       PAGINATED  avgt    3   1.502 ± 0.159  us/op
Serialize.toProto         100000                    POISSON        BALANCED  avgt    3   3.655 ± 4.965  us/op
Serialize.serialize       100000                    POISSON        BALANCED  avgt    3   0.999 ± 0.125  us/op
Serialize.encode          100000                    POISSON        BALANCED  avgt    3   1.673 ± 0.213  us/op
Serialize.encodeReusing   100000                    POISSON        BALANCED  avgt    3   1.011 ± 0.155  us/op
Serialize.toProto         100000  COMPOSITE_POISSON_EXTREME        BALANCED  avgt    3   4.986 ± 1.923  us/op
Serialize.serialize       100000  COMPOSITE_POISSON_EXTREME        BALANCED  avgt    3   1.660 ± 0.164  us/op
Serialize.encode          100000  COMPOSITE_POISSON_EXTREME        BALANCED  avgt    3   3.551 ± 0.597  us/op
Serialize.encodeReusing   100000  COMPOSITE_POISSON_EXTREME        BALANCED  avgt    3   2.296 ± 0.145  us/op
Serialize.toProto         100000            TRIMODAL_NORMAL        BALANCED  avgt    3   3.518 ± 1.629  us/op
Serialize.serialize       100000            TRIMODAL_NORMAL        BALANCED  avgt    3   1.102 ± 0.173  us/op
Serialize.encode          100000            TRIMODAL_NORMAL        BALANCED  avgt    3   1.789 ± 0.126  us/op
Serialize.encodeReusing   100000            TRIMODAL_NORMAL        BALANCED  avgt    3   1.273 ± 0.058  us/op

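The gain is largest with the paginated store (for example, with the POISSON generator, from 8.973 us/op for toProto to 2.007 us/op for encode), likely because the encoding format is closer to the internal representation of paginated stores: each page can be encoded independently as a run of contiguous counts. With the balanced store, encode is still faster than toProto but slower than serialize.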
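To make the contiguous-counts argument concrete, the hypothetical sketch below compares two layouts for one page of bins, assuming fixed-width 4-byte indexes and 8-byte counts for simplicity (the actual format presumably uses more compact encodings). A contiguous run stores a start index and a length once, then the counts; index/count pairs repeat an index for every non-empty bin. BinEncodingSketch and its methods are illustrative names, not code from this PR.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical comparison of two bin layouts; neither is this PR's actual format. */
public final class BinEncodingSketch {

    /** Contiguous run: start index, number of bins, then one count per bin. */
    static ByteBuffer encodeContiguous(int startIndex, double[] counts) {
        ByteBuffer buf = ByteBuffer.allocate(4 + 4 + 8 * counts.length)
                .order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(startIndex).putInt(counts.length);
        for (double c : counts) {
            buf.putDouble(c); // no per-bin index needed
        }
        buf.flip();
        return buf;
    }

    /** Index/count pairs: one explicit index per non-empty bin. */
    static ByteBuffer encodePairs(int startIndex, double[] counts) {
        int nonEmpty = 0;
        for (double c : counts) {
            if (c != 0) {
                nonEmpty++;
            }
        }
        ByteBuffer buf = ByteBuffer.allocate(4 + (4 + 8) * nonEmpty)
                .order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(nonEmpty);
        for (int i = 0; i < counts.length; i++) {
            if (counts[i] != 0) {
                buf.putInt(startIndex + i).putDouble(counts[i]);
            }
        }
        buf.flip();
        return buf;
    }

    public static void main(String[] args) {
        double[] page = new double[128];
        java.util.Arrays.fill(page, 1.0); // a fully populated page
        System.out.println("contiguous: " + encodeContiguous(0, page).remaining() + " bytes");
        System.out.println("pairs:      " + encodePairs(0, page).remaining() + " bytes");
    }
}
```

For a fully populated 128-bin page, the contiguous layout takes 1032 bytes and the pair layout 1540; the pair layout only wins when most bins in the range are empty. Assuming pages are stored as arrays of counts, the contiguous form can also be written straight from the page without computing an index per bin.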

Benchmark                  (count)                (generator)  (sketchOption)  Mode  Cnt   Score    Error  Units
Deserialize.fromProto       100000                    POISSON       PAGINATED  avgt    3   8.207 ±  4.893  us/op
Deserialize.decode          100000                    POISSON       PAGINATED  avgt    3   2.990 ±  0.963  us/op
Deserialize.decodeReusing   100000                    POISSON       PAGINATED  avgt    3   1.936 ±  0.344  us/op
Deserialize.fromProto       100000  COMPOSITE_POISSON_EXTREME       PAGINATED  avgt    3  34.745 ± 17.140  us/op
Deserialize.decode          100000  COMPOSITE_POISSON_EXTREME       PAGINATED  avgt    3   5.773 ±  1.660  us/op
Deserialize.decodeReusing   100000  COMPOSITE_POISSON_EXTREME       PAGINATED  avgt    3   5.274 ±  9.821  us/op
Deserialize.fromProto       100000            TRIMODAL_NORMAL       PAGINATED  avgt    3  16.349 ±  4.144  us/op
Deserialize.decode          100000            TRIMODAL_NORMAL       PAGINATED  avgt    3   2.980 ±  0.240  us/op
Deserialize.decodeReusing   100000            TRIMODAL_NORMAL       PAGINATED  avgt    3   2.333 ±  0.335  us/op
Deserialize.fromProto       100000                    POISSON        BALANCED  avgt    3   6.017 ±  2.188  us/op
Deserialize.decode          100000                    POISSON        BALANCED  avgt    3   2.830 ±  1.243  us/op
Deserialize.decodeReusing   100000                    POISSON        BALANCED  avgt    3   1.904 ±  1.352  us/op
Deserialize.fromProto       100000  COMPOSITE_POISSON_EXTREME        BALANCED  avgt    3  13.687 ± 10.720  us/op
Deserialize.decode          100000  COMPOSITE_POISSON_EXTREME        BALANCED  avgt    3   6.769 ±  1.126  us/op
Deserialize.decodeReusing   100000  COMPOSITE_POISSON_EXTREME        BALANCED  avgt    3   4.346 ±  4.982  us/op
Deserialize.fromProto       100000            TRIMODAL_NORMAL        BALANCED  avgt    3   6.772 ±  6.477  us/op
Deserialize.decode          100000            TRIMODAL_NORMAL        BALANCED  avgt    3   3.535 ±  2.058  us/op
Deserialize.decodeReusing   100000            TRIMODAL_NORMAL        BALANCED  avgt    3   2.395 ±  0.628  us/op

Comment on lines +48 to +61:

    // Reads the next 8 bytes as a little-endian long and advances pos.
    if (pos > endPos - 8) {
        // Fewer than 8 bytes remain before endPos.
        throw new EOFException();
    }
    long value = 0;
    value |= Byte.toUnsignedLong(array[pos]);
    value |= Byte.toUnsignedLong(array[pos + 1]) << 8;
    value |= Byte.toUnsignedLong(array[pos + 2]) << 16;
    value |= Byte.toUnsignedLong(array[pos + 3]) << 24;
    value |= Byte.toUnsignedLong(array[pos + 4]) << 32;
    value |= Byte.toUnsignedLong(array[pos + 5]) << 40;
    value |= Byte.toUnsignedLong(array[pos + 6]) << 48;
    value |= Byte.toUnsignedLong(array[pos + 7]) << 56;
    pos += 8;
    return value;
Contributor:

While Go coalesces these byte reads into a single load, this is expensive on the JVM. Consider a multi-release JAR for this: use Unsafe on JDK 8 and MethodHandles.byteArrayViewVarHandle (see its Javadoc) on later versions.
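As a sketch of the JDK 9+ half of that suggestion, a byte-array view VarHandle reads all 8 bytes in one access. VarHandleLongReader and readLongLE are hypothetical names; in a multi-release JAR this class would presumably sit in the versioned section, with a JDK 8 fallback (Unsafe, or the byte-by-byte code above) in the base section.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

/** Hypothetical JDK 9+ reader: little-endian long from a byte array via a VarHandle. */
public final class VarHandleLongReader {

    // Views a byte[] as longs in little-endian order; the coordinate is a byte offset.
    private static final VarHandle LONG_LE =
            MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.LITTLE_ENDIAN);

    /** Returns the 8 bytes starting at pos as a little-endian long. */
    static long readLongLE(byte[] array, int pos) {
        // Plain get mode works at any offset; throws IndexOutOfBoundsException past the end.
        return (long) LONG_LE.get(array, pos);
    }
}
```

Unlike the original snippet, which checks against endPos and throws EOFException, this only checks the array bounds, so a drop-in replacement would keep the endPos check before the call.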

Contributor:
Another note: if you use ByteBuffer.putLong, you get the right thing automatically.
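For reference, the ByteBuffer route looks roughly like this: wrapping the array and setting little-endian order lets getLong and putLong do the byte assembly. The class and method names are hypothetical. Whether this becomes a single 8-byte access depends on the JDK version, which is what the reply below is about.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helpers: little-endian long access through a heap ByteBuffer. */
public final class ByteBufferLongAccess {

    static long readLongLE(byte[] array, int pos) {
        // wrap() shares the array (no copy); order() sets the byte order for getLong.
        return ByteBuffer.wrap(array).order(ByteOrder.LITTLE_ENDIAN).getLong(pos);
    }

    static void writeLongLE(byte[] array, int pos, long value) {
        // Absolute putLong writes 8 bytes at pos without moving the buffer position.
        ByteBuffer.wrap(array).order(ByteOrder.LITTLE_ENDIAN).putLong(pos, value);
    }
}
```

A real Input/Output implementation would normally keep one ByteBuffer for its whole lifetime rather than calling wrap on every access.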

@CharlesMasson (contributor, PR author) commented on Jul 21, 2021:

In reply to: "Another note - if you use ByteBuffer.putLong you get the right thing automatically"

I may be mistaken, but looking at the implementation, I don't think Java 8 does this; later versions do, however.

I'd like to keep these optimizations out of this PR so as not to block it, but here is something I think we could do based on your feedback. Let me know if that's in line with what you have in mind. I'm also wondering about the possible drawbacks of actually using Unsafe (I can see a couple of warnings when building).

Contributor:

Yes, it was fixed after JDK 8, precisely because assembling a long from individual bytes can be slow.

@CharlesMasson CharlesMasson merged commit e19ada9 into master Jul 22, 2021
@CharlesMasson CharlesMasson deleted the cmasson/serialization branch July 22, 2021 09:18
2 participants