tree: Improve performance of tree.AsJSON #87968

Merged
merged 2 commits into from
Sep 17, 2022

miretskiy
Contributor

Improve performance of tree.AsJSON method.

These improvements are important for any query that produces
a large number of JSON objects, as well as for changefeeds, which
rely on this function when producing a JSON-encoded feed.

Most of the changes revolve around modifying underlying types
(such as date/timestamp types, box2d, etc.) to favor functions
that append to a byte buffer instead of relying on slower
functions such as fmt.Sprintf. Conversion performance
improved by around 5-10% for most of the affected types, and
by 50-55% for the time type:

Benchmark            old t/op      new t/op    delta
AsJSON/box2d-10    578ns ± 3%    414ns ± 2%   -28.49%  (p=0.000 n=10+9)
AsJSON/box2d[]-10  1.64µs ± 3%   1.19µs ± 4%  -27.14%  (p=0.000 n=10+10)
AsJSON/time-10     232ns ± 2%    103ns ± 1%   -55.61%  (p=0.000 n=10+10)
AsJSON/time[]-10   687ns ± 4%    342ns ± 4%   -50.17%  (p=0.000 n=10+10)
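
To illustrate the append-style approach described above, here is a minimal sketch. The `point` type and both function names are hypothetical and are not taken from this PR; only `strconv.AppendFloat` and `fmt.Sprintf` are real standard-library calls.

```go
package main

import (
	"fmt"
	"strconv"
)

// point is a hypothetical type used only for this illustration.
type point struct{ X, Y float64 }

// formatSprintf builds the string through fmt.Sprintf, which parses the
// format string at run time and passes each argument as an interface value.
func formatSprintf(p point) string {
	return fmt.Sprintf("POINT(%s %s)",
		strconv.FormatFloat(p.X, 'f', -1, 64),
		strconv.FormatFloat(p.Y, 'f', -1, 64))
}

// appendFormat writes the same representation directly into buf and returns
// the extended slice, so a caller can reuse one buffer across many values.
func appendFormat(buf []byte, p point) []byte {
	buf = append(buf, "POINT("...)
	buf = strconv.AppendFloat(buf, p.X, 'f', -1, 64)
	buf = append(buf, ' ')
	buf = strconv.AppendFloat(buf, p.Y, 'f', -1, 64)
	return append(buf, ')')
}

func main() {
	p := point{X: 1.5, Y: -2}
	buf := make([]byte, 0, 64) // reusable buffer owned by the caller
	buf = appendFormat(buf, p)
	fmt.Println(string(buf), formatSprintf(p))
}
```

Both functions produce the same text; the append variant avoids the intermediate strings when the caller already has a buffer to write into.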

Note: Some types in the local benchmark show a slight slowdown.
No changes were made to those types; in general, their encoding
is probably too fast for the benchmark to reliably detect changes:

Benchmark            old t/op      new t/op       delta
AsJSON/bool[]-10    65.9ns ± 1%   67.7ns ± 2%    +2.79%  (p=0.001 n=8+9)

Emphasis was also placed on reducing allocations.
By relying more heavily on a pooled FmtCtx, which contains a
bytes buffer, some conversions (for example, timestamps) saw
their allocations eliminated on an amortized basis:

Benchmark               old B/op      new B/op    delta
AsJSON/timestamp-10    42.1B ± 3%      0.0B      -100.00%  (p=0.000 n=10+10)
AsJSON/timestamp[]-10  174B ± 4%      60B ± 1%   -65.75%  (p=0.000 n=10+10)

Release Note: None
Release Justification: performance improvement

@miretskiy miretskiy requested a review from a team as a code owner September 14, 2022 21:19

@miretskiy
Contributor Author

Full benchmark stat:
json_delta.txt

@miretskiy
Contributor Author

name                                                old time/op    new time/op    delta
AsJSON/"char"-10                                      17.8ns ± 1%    17.9ns ± 2%      ~     (p=0.127 n=10+10)
AsJSON/"char"[]-10                                     109ns ± 1%     112ns ± 2%    +3.22%  (p=0.000 n=9+10)
AsJSON/anyelement[]-10                                10.1µs ±27%    10.0µs ±15%      ~     (p=0.796 n=10+10)
AsJSON/bit-10                                          132ns ± 2%     132ns ± 3%      ~     (p=0.565 n=10+10)
AsJSON/bit[]-10                                        411ns ± 5%     414ns ± 8%      ~     (p=0.684 n=10+10)
AsJSON/bool-10                                        4.31ns ± 4%    4.35ns ± 2%      ~     (p=0.218 n=10+10)
AsJSON/bool[]-10                                      65.9ns ± 1%    67.7ns ± 2%    +2.79%  (p=0.001 n=8+9)
AsJSON/box2d-10                                        578ns ± 3%     414ns ± 2%   -28.49%  (p=0.000 n=10+9)
AsJSON/box2d[]-10                                     1.64µs ± 3%    1.19µs ± 4%   -27.14%  (p=0.000 n=10+10)
AsJSON/bytes-10                                       80.0ns ± 2%    79.0ns ± 1%    -1.36%  (p=0.009 n=10+10)
AsJSON/bytes[]-10                                      274ns ± 4%     274ns ± 3%      ~     (p=1.000 n=10+10)
AsJSON/char-10                                        17.9ns ± 0%    17.9ns ± 1%      ~     (p=0.162 n=9+10)
AsJSON/char[]-10                                       109ns ± 4%     111ns ± 3%      ~     (p=0.128 n=10+10)
AsJSON/date-10                                         205ns ± 2%     204ns ± 1%      ~     (p=0.510 n=10+9)
AsJSON/date[]-10                                       613ns ± 1%     614ns ± 5%      ~     (p=0.780 n=10+10)
AsJSON/decimal-10                                     25.4ns ± 1%    25.3ns ± 1%      ~     (p=0.101 n=10+10)
AsJSON/decimal[]-10                                    131ns ± 2%     132ns ± 3%      ~     (p=0.284 n=8+10)
AsJSON/float-10                                        417ns ± 3%     416ns ± 1%      ~     (p=0.396 n=10+8)
AsJSON/float[]-10                                     1.16µs ± 3%    1.17µs ± 4%      ~     (p=0.725 n=10+10)
AsJSON/float4-10                                       419ns ± 2%     413ns ± 1%    -1.50%  (p=0.001 n=9+9)
AsJSON/float4[]-10                                    1.17µs ± 3%    1.18µs ± 3%      ~     (p=0.412 n=10+9)
AsJSON/inet-10                                         169ns ± 1%     168ns ± 2%      ~     (p=0.557 n=8+10)
AsJSON/inet[]-10                                       521ns ± 4%     515ns ± 4%      ~     (p=0.190 n=10+10)
AsJSON/int-10                                         32.9ns ± 1%    32.3ns ± 1%    -1.83%  (p=0.000 n=10+9)
AsJSON/int[]-10                                        150ns ± 5%     151ns ± 2%      ~     (p=0.643 n=10+10)
AsJSON/int2-10                                        34.0ns ± 8%    32.4ns ± 1%    -4.68%  (p=0.000 n=10+10)
AsJSON/int2[]-10                                       151ns ± 3%     152ns ± 4%      ~     (p=0.436 n=10+10)
AsJSON/int2vector-10                                   150ns ± 2%     152ns ± 3%      ~     (p=0.254 n=10+10)
AsJSON/int4-10                                        32.6ns ± 1%    32.5ns ± 2%      ~     (p=0.143 n=10+10)
AsJSON/int4[]-10                                       150ns ± 3%     151ns ± 5%      ~     (p=0.516 n=10+10)
AsJSON/interval-10                                     459ns ± 2%     462ns ± 2%      ~     (p=0.286 n=10+9)
AsJSON/interval[]-10                                  1.29µs ± 4%    1.29µs ± 3%      ~     (p=1.000 n=10+10)
AsJSON/jsonb-10                                       4.38ns ± 4%    4.43ns ± 3%      ~     (p=0.105 n=10+10)
AsJSON/jsonb[]-10                                     69.5ns ± 2%    72.1ns ± 2%    +3.70%  (p=0.000 n=10+10)
AsJSON/name-10                                        18.7ns ± 1%    18.8ns ± 2%      ~     (p=0.447 n=10+10)
AsJSON/name[]-10                                       109ns ± 5%     112ns ± 1%    +3.07%  (p=0.002 n=10+9)
AsJSON/oid-10                                         76.8ns ± 1%    69.6ns ± 2%    -9.33%  (p=0.000 n=10+10)
AsJSON/oid[]-10                                        275ns ± 4%     250ns ± 2%    -9.15%  (p=0.000 n=10+9)
AsJSON/oidvector-10                                    271ns ± 4%     252ns ± 4%    -6.86%  (p=0.000 n=10+10)
AsJSON/string-10                                      17.3ns ± 1%    17.3ns ± 1%      ~     (p=0.321 n=10+9)
AsJSON/string[]-10                                     106ns ± 4%     108ns ± 2%    +2.02%  (p=0.020 n=10+10)
AsJSON/time-10                                         232ns ± 2%     103ns ± 1%   -55.61%  (p=0.000 n=10+10)
AsJSON/time[]-10                                       687ns ± 4%     342ns ± 4%   -50.17%  (p=0.000 n=10+10)
AsJSON/timestamp-10                                    171ns ± 2%     166ns ± 2%    -2.95%  (p=0.000 n=10+10)
AsJSON/timestamp[]-10                                  523ns ± 4%     510ns ± 2%    -2.42%  (p=0.029 n=10+10)
AsJSON/timestamptz-10                                  177ns ± 1%     174ns ± 2%    -2.05%  (p=0.000 n=10+10)
AsJSON/timestamptz[]-10                                539ns ± 6%     526ns ± 4%      ~     (p=0.062 n=10+9)
AsJSON/timetz-10                                       435ns ± 2%     401ns ± 2%    -7.69%  (p=0.000 n=10+10)
AsJSON/timetz[]-10                                    1.23µs ± 5%    1.13µs ± 4%    -8.23%  (p=0.000 n=10+10)
AsJSON/uuid-10                                        92.2ns ± 1%    55.1ns ± 1%   -40.24%  (p=0.000 n=9+10)
AsJSON/uuid[]-10                                       315ns ± 3%     211ns ± 2%   -33.06%  (p=0.000 n=10+10)
AsJSON/varbit-10                                       131ns ± 3%     130ns ± 2%      ~     (p=0.127 n=10+10)
AsJSON/varbit[]-10                                     412ns ± 4%     413ns ± 2%      ~     (p=0.529 n=10+10)
AsJSON/varchar-10                                     17.3ns ± 0%    17.4ns ± 1%      ~     (p=0.055 n=9+10)
AsJSON/varchar[]-10                                    106ns ± 3%     107ns ± 2%      ~     (p=0.148 n=10+10)
AsJSON/void-10                                        28.1ns ± 1%    27.0ns ± 0%    -3.88%  (p=0.000 n=10+8)
AsJSON/tuple{bit,_bit,_varchar}-10                     505ns ± 2%     501ns ± 1%      ~     (p=0.190 n=10+10)
AsJSON/tuple{bit,_bit,_varchar}[]-10                  1.39µs ± 0%    1.40µs ± 3%      ~     (p=0.529 n=6+9)
AsJSON/tuple-10                                       60.3ns ± 2%    61.5ns ± 1%    +2.03%  (p=0.000 n=10+10)
AsJSON/tuple[]-10                                      216ns ± 3%     221ns ± 4%    +2.13%  (p=0.024 n=10+10)
AsJSON/tuple{oid,_int2,_char}-10                       385ns ± 2%     379ns ± 2%    -1.37%  (p=0.009 n=10+10)
AsJSON/tuple{oid,_int2,_char}[]-10                    1.09µs ± 5%    1.06µs ± 3%    -2.66%  (p=0.014 n=10+9)
AsJSON/tuple{char,_oid,_char,_jsonb,_decimal}-10       520ns ± 2%     522ns ± 2%      ~     (p=0.425 n=10+10)
AsJSON/tuple{char,_oid,_char,_jsonb,_decimal}[]-10    1.46µs ± 4%    1.49µs ± 5%    +2.21%  (p=0.050 n=10+10)
AsJSON/tuple{string,_string,_varchar}-10               302ns ± 1%     306ns ± 1%    +1.34%  (p=0.003 n=9+10)
AsJSON/tuple{string,_string,_varchar}[]-10             848ns ± 4%     870ns ± 6%    +2.52%  (p=0.022 n=9+10)

name                                                old alloc/op   new alloc/op   delta
AsJSON/"char"-10                                       13.6B ± 4%     13.6B ± 4%      ~     (p=1.000 n=10+10)
AsJSON/"char"[]-10                                     97.6B ± 1%     97.6B ± 1%      ~     (p=1.000 n=8+8)
AsJSON/anyelement[]-10                                5.27kB ±26%    5.19kB ±17%      ~     (p=0.796 n=10+10)
AsJSON/bit-10                                          64.8B ± 2%     64.1B ± 3%      ~     (p=0.195 n=9+10)
AsJSON/bit[]-10                                         231B ± 4%      231B ± 5%      ~     (p=0.985 n=10+10)
AsJSON/bool-10                                         0.00B          0.00B           ~     (all equal)
AsJSON/bool[]-10                                       59.4B ± 3%     59.8B ± 2%      ~     (p=0.474 n=9+9)
AsJSON/box2d-10                                         391B ± 3%      373B ± 3%    -4.50%  (p=0.000 n=10+10)
AsJSON/box2d[]-10                                     1.11kB ± 4%    1.06kB ± 3%    -4.43%  (p=0.000 n=10+10)
AsJSON/bytes-10                                        25.0B ± 0%     25.0B ± 0%      ~     (all equal)
AsJSON/bytes[]-10                                       126B ± 5%      127B ± 3%      ~     (p=0.357 n=10+10)
AsJSON/char-10                                         14.0B ± 0%     13.7B ± 5%      ~     (p=0.137 n=8+10)
AsJSON/char[]-10                                       96.9B ± 4%     96.5B ± 1%      ~     (p=0.366 n=10+8)
AsJSON/date-10                                         35.0B ± 0%     35.0B ± 0%      ~     (all equal)
AsJSON/date[]-10                                        154B ± 1%      154B ± 6%      ~     (p=0.899 n=10+10)
AsJSON/decimal-10                                      28.4B ± 2%     28.0B ± 0%      ~     (p=0.065 n=10+9)
AsJSON/decimal[]-10                                     136B ± 2%      135B ± 4%      ~     (p=0.733 n=9+10)
AsJSON/float-10                                         155B ± 2%      156B ± 2%      ~     (p=0.445 n=10+10)
AsJSON/float[]-10                                       471B ± 2%      474B ± 3%      ~     (p=0.565 n=10+10)
AsJSON/float4-10                                        155B ± 0%      154B ± 1%    -1.10%  (p=0.001 n=8+9)
AsJSON/float4[]-10                                      470B ± 3%      470B ± 2%      ~     (p=0.792 n=10+7)
AsJSON/inet-10                                         99.1B ± 2%     99.3B ± 4%      ~     (p=1.000 n=9+10)
AsJSON/inet[]-10                                        327B ± 3%      322B ± 4%      ~     (p=0.060 n=9+10)
AsJSON/int-10                                          28.3B ± 2%     28.3B ± 2%      ~     (p=1.000 n=10+10)
AsJSON/int[]-10                                         135B ± 5%      136B ± 3%      ~     (p=0.790 n=10+10)
AsJSON/int2-10                                         28.0B ± 0%     28.0B ± 0%      ~     (all equal)
AsJSON/int2[]-10                                        136B ± 5%      135B ± 4%      ~     (p=0.608 n=10+10)
AsJSON/int2vector-10                                    137B ± 2%      136B ± 4%      ~     (p=0.856 n=10+10)
AsJSON/int4-10                                         28.0B ± 0%     28.0B ± 0%      ~     (all equal)
AsJSON/int4[]-10                                        136B ± 4%      136B ± 4%      ~     (p=0.838 n=10+10)
AsJSON/interval-10                                      116B ± 2%      116B ± 2%      ~     (p=0.917 n=10+9)
AsJSON/interval[]-10                                    372B ± 4%      369B ± 3%      ~     (p=0.423 n=10+10)
AsJSON/jsonb-10                                        0.00B          0.00B           ~     (all equal)
AsJSON/jsonb[]-10                                      58.9B ± 3%     59.4B ± 2%      ~     (p=0.321 n=10+10)
AsJSON/name-10                                         12.4B ± 5%     12.4B ± 5%      ~     (p=1.000 n=10+10)
AsJSON/name[]-10                                       93.8B ± 5%     95.0B ± 4%      ~     (p=0.489 n=10+10)
AsJSON/oid-10                                          39.6B ± 2%     26.5B ± 2%   -33.08%  (p=0.000 n=10+10)
AsJSON/oid[]-10                                         168B ± 4%      131B ± 3%   -21.81%  (p=0.000 n=10+9)
AsJSON/oidvector-10                                     165B ± 4%      132B ± 3%   -19.70%  (p=0.000 n=10+10)
AsJSON/string-10                                       12.3B ± 6%     12.0B ± 0%      ~     (p=0.173 n=10+9)
AsJSON/string[]-10                                     93.0B ± 4%     93.7B ± 2%      ~     (p=0.570 n=10+9)
AsJSON/time-10                                         47.7B ± 3%     27.6B ± 2%   -42.14%  (p=0.000 n=10+10)
AsJSON/time[]-10                                        188B ± 4%      135B ± 4%   -28.18%  (p=0.000 n=10+10)
AsJSON/timestamp-10                                    42.1B ± 3%      0.0B       -100.00%  (p=0.000 n=10+10)
AsJSON/timestamp[]-10                                   174B ± 4%       60B ± 1%   -65.75%  (p=0.000 n=10+10)
AsJSON/timestamptz-10                                  42.0B ± 0%      0.0B       -100.00%  (p=0.000 n=9+10)
AsJSON/timestamptz[]-10                                 172B ± 6%       59B ± 2%   -65.60%  (p=0.000 n=10+9)
AsJSON/timetz-10                                        272B ± 3%      254B ± 2%    -6.84%  (p=0.000 n=10+10)
AsJSON/timetz[]-10                                      785B ± 5%      728B ± 4%    -7.29%  (p=0.000 n=10+10)
AsJSON/uuid-10                                          100B ± 2%        0B       -100.00%  (p=0.000 n=10+10)
AsJSON/uuid[]-10                                        327B ± 4%       60B ± 3%   -81.80%  (p=0.000 n=10+10)
AsJSON/varbit-10                                       64.2B ± 6%     64.1B ± 3%      ~     (p=0.879 n=10+10)
AsJSON/varbit[]-10                                      232B ± 4%      234B ± 2%      ~     (p=0.208 n=10+10)
AsJSON/varchar-10                                      12.0B ± 0%     12.4B ± 5%      ~     (p=0.137 n=8+10)
AsJSON/varchar[]-10                                    93.0B ± 2%     92.6B ± 3%      ~     (p=0.602 n=10+10)
AsJSON/void-10                                         0.00B          0.00B           ~     (all equal)
AsJSON/tuple{bit,_bit,_varchar}-10                      283B ± 2%      280B ± 1%      ~     (p=0.089 n=10+10)
AsJSON/tuple{bit,_bit,_varchar}[]-10                    802B ± 1%      800B ± 5%      ~     (p=0.908 n=6+10)
AsJSON/tuple-10                                        78.5B ± 2%     78.7B ± 2%      ~     (p=0.532 n=10+10)
AsJSON/tuple[]-10                                       269B ± 4%      269B ± 4%      ~     (p=0.985 n=10+10)
AsJSON/tuple{oid,_int2,_char}-10                        262B ± 2%      252B ± 2%    -4.08%  (p=0.000 n=10+10)
AsJSON/tuple{oid,_int2,_char}[]-10                      761B ± 4%      726B ± 5%    -4.61%  (p=0.000 n=10+10)
AsJSON/tuple{char,_oid,_char,_jsonb,_decimal}-10        357B ± 2%      349B ± 2%    -2.30%  (p=0.000 n=10+10)
AsJSON/tuple{char,_oid,_char,_jsonb,_decimal}[]-10    1.01kB ± 4%    1.00kB ± 4%      ~     (p=0.436 n=10+10)
AsJSON/tuple{string,_string,_varchar}-10                226B ± 2%      227B ± 1%      ~     (p=0.779 n=10+10)
AsJSON/tuple{string,_string,_varchar}[]-10              653B ± 4%      660B ± 7%      ~     (p=0.345 n=9+10)

name                                                old allocs/op  new allocs/op  delta
AsJSON/"char"-10                                        0.00           0.00           ~     (all equal)
AsJSON/"char"[]-10                                      4.00 ± 0%      4.00 ± 0%      ~     (all equal)
AsJSON/anyelement[]-10                                   121 ±26%       118 ±17%      ~     (p=0.670 n=10+10)
AsJSON/bit-10                                           1.00 ± 0%      1.00 ± 0%      ~     (all equal)
AsJSON/bit[]-10                                         6.00 ± 0%      6.00 ± 0%      ~     (all equal)
AsJSON/bool-10                                          0.00           0.00           ~     (all equal)
AsJSON/bool[]-10                                        1.00 ± 0%      1.00 ± 0%      ~     (all equal)
AsJSON/box2d-10                                         13.0 ± 0%       6.0 ± 0%   -53.85%  (p=0.000 n=10+8)
AsJSON/box2d[]-10                                       37.5 ± 4%      19.7 ± 4%   -47.47%  (p=0.000 n=10+10)
AsJSON/bytes-10                                         1.00 ± 0%      1.00 ± 0%      ~     (all equal)
AsJSON/bytes[]-10                                       6.00 ± 0%      6.00 ± 0%      ~     (all equal)
AsJSON/char-10                                          0.00           0.00           ~     (all equal)
AsJSON/char[]-10                                        4.00 ± 0%      4.00 ± 0%      ~     (all equal)
AsJSON/date-10                                          2.00 ± 0%      2.00 ± 0%      ~     (all equal)
AsJSON/date[]-10                                        8.00 ± 0%      8.30 ± 8%      ~     (p=0.211 n=10+10)
AsJSON/decimal-10                                       0.00           0.00           ~     (all equal)
AsJSON/decimal[]-10                                     4.00 ± 0%      4.00 ± 0%      ~     (all equal)
AsJSON/float-10                                         6.50 ± 8%      7.00 ± 0%      ~     (p=0.059 n=10+8)
AsJSON/float[]-10                                       20.0 ± 0%      20.0 ± 0%      ~     (all equal)
AsJSON/float4-10                                        6.70 ±10%      6.00 ± 0%   -10.45%  (p=0.001 n=10+8)
AsJSON/float4[]-10                                      20.0 ± 0%      20.0 ± 0%      ~     (all equal)
AsJSON/inet-10                                          3.00 ± 0%      3.00 ± 0%      ~     (all equal)
AsJSON/inet[]-10                                        11.0 ± 0%      11.0 ± 0%      ~     (all equal)
AsJSON/int-10                                           0.00           0.00           ~     (all equal)
AsJSON/int[]-10                                         4.00 ± 0%      4.00 ± 0%      ~     (all equal)
AsJSON/int2-10                                          0.00           0.00           ~     (all equal)
AsJSON/int2[]-10                                        4.00 ± 0%      4.00 ± 0%      ~     (all equal)
AsJSON/int2vector-10                                    4.00 ± 0%      4.00 ± 0%      ~     (all equal)
AsJSON/int4-10                                          0.00           0.00           ~     (all equal)
AsJSON/int4[]-10                                        4.00 ± 0%      3.70 ±19%      ~     (p=0.211 n=10+10)
AsJSON/interval-10                                      7.00 ± 0%      7.00 ± 0%      ~     (all equal)
AsJSON/interval[]-10                                    20.7 ± 6%      20.7 ± 3%      ~     (p=1.000 n=10+10)
AsJSON/jsonb-10                                         0.00           0.00           ~     (all equal)
AsJSON/jsonb[]-10                                       1.00 ± 0%      1.00 ± 0%      ~     (all equal)
AsJSON/name-10                                          0.00           0.00           ~     (all equal)
AsJSON/name[]-10                                        3.00 ± 0%      3.00 ± 0%      ~     (all equal)
AsJSON/oid-10                                           2.00 ± 0%      1.00 ± 0%   -50.00%  (p=0.000 n=10+10)
AsJSON/oid[]-10                                         8.00 ± 0%      6.00 ± 0%   -25.00%  (p=0.000 n=10+10)
AsJSON/oidvector-10                                     8.00 ± 0%      6.00 ± 0%   -25.00%  (p=0.000 n=10+10)
AsJSON/string-10                                        0.00           0.00           ~     (all equal)
AsJSON/string[]-10                                      3.00 ± 0%      3.00 ± 0%      ~     (all equal)
AsJSON/time-10                                          3.00 ± 0%      1.00 ± 0%   -66.67%  (p=0.000 n=10+10)
AsJSON/time[]-10                                        10.6 ± 6%       6.0 ± 0%   -43.40%  (p=0.000 n=10+10)
AsJSON/timestamp-10                                     1.00 ± 0%      0.00       -100.00%  (p=0.000 n=10+10)
AsJSON/timestamp[]-10                                   6.00 ± 0%      1.00 ± 0%   -83.33%  (p=0.000 n=10+10)
AsJSON/timestamptz-10                                   1.00 ± 0%      0.00       -100.00%  (p=0.000 n=10+10)
AsJSON/timestamptz[]-10                                 6.00 ± 0%      1.00 ± 0%   -83.33%  (p=0.000 n=10+10)
AsJSON/timetz-10                                        10.0 ± 0%       8.0 ± 0%   -20.00%  (p=0.000 n=9+8)
AsJSON/timetz[]-10                                      29.9 ± 6%      24.9 ± 4%   -16.72%  (p=0.000 n=10+10)
AsJSON/uuid-10                                          2.00 ± 0%      0.00       -100.00%  (p=0.000 n=10+10)
AsJSON/uuid[]-10                                        8.00 ± 0%      1.00 ± 0%   -87.50%  (p=0.002 n=8+10)
AsJSON/varbit-10                                        1.00 ± 0%      1.00 ± 0%      ~     (all equal)
AsJSON/varbit[]-10                                      6.00 ± 0%      6.00 ± 0%      ~     (all equal)
AsJSON/varchar-10                                       0.00           0.00           ~     (all equal)
AsJSON/varchar[]-10                                     3.00 ± 0%      3.00 ± 0%      ~     (all equal)
AsJSON/void-10                                          0.00           0.00           ~     (all equal)
AsJSON/tuple{bit,_bit,_varchar}-10                      10.0 ± 0%      10.0 ± 0%      ~     (all equal)
AsJSON/tuple{bit,_bit,_varchar}[]-10                    28.0 ± 0%      28.3 ± 5%      ~     (p=0.231 n=6+10)
AsJSON/tuple-10                                         1.00 ± 0%      1.00 ± 0%      ~     (all equal)
AsJSON/tuple[]-10                                       6.00 ± 0%      6.00 ± 0%      ~     (all equal)
AsJSON/tuple{oid,_int2,_char}-10                        10.0 ± 0%       9.0 ± 0%   -10.00%  (p=0.000 n=10+10)
AsJSON/tuple{oid,_int2,_char}[]-10                      28.5 ± 5%      26.2 ± 5%    -8.07%  (p=0.000 n=10+10)
AsJSON/tuple{char,_oid,_char,_jsonb,_decimal}-10        12.0 ± 0%      11.7 ± 6%      ~     (p=0.173 n=9+10)
AsJSON/tuple{char,_oid,_char,_jsonb,_decimal}[]-10      35.1 ± 5%      33.6 ± 5%    -4.27%  (p=0.004 n=10+10)
AsJSON/tuple{string,_string,_varchar}-10                8.00 ± 0%      8.00 ± 0%      ~     (all equal)
AsJSON/tuple{string,_string,_varchar}[]-10              23.6 ± 6%      23.6 ± 6%      ~     (p=1.000 n=10+10)

@miretskiy miretskiy force-pushed the json branch 5 times, most recently from 1ca0ef9 to 10c662e Compare September 15, 2022 13:06
Contributor

@HonoreDB HonoreDB left a comment

Reviewed 2 of 2 files at r1, 2 of 14 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @HonoreDB and @miretskiy)


pkg/geo/bbox.go line 46 at r2 (raw file):

// AppendFormat appends string representation of the CartesianBoundingBox
// to the buffer, and returns modified buffer.
func (b *CartesianBoundingBox) AppendFormat(buf []byte) []byte {

You might want to look into having this method take a *bytes.Buffer instead and then calling strconv.AppendFloat(buf.Bytes(),.... Might be more trouble than it's worth though since you probably need to check capacity.


pkg/geo/bbox.go line 49 at r2 (raw file):

	// fmt.Sprintf with %f does not truncate leading zeroes, so use
	// AppendFloat instead.
	buf = append(buf, "BOX("...)

Nit: I think a constant performs better than a literal here?


pkg/util/strutil/util.go line 24 at r2 (raw file):

	}

	var scratch [16]byte

Maybe take this as an argument instead of allocating it here?


pkg/util/timetz/timetz.go line 158 at r2 (raw file):

// AppendFormat appends TimeTZ to the buffer, and returns modified buffer.
func (t *TimeTZ) AppendFormat(buf []byte) []byte {
	if len(buf) < 32 {

Use cap rather than len, I think? Or this also looks like another argument for taking a bytes.Buffer in these methods.
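
The len-versus-cap distinction raised here can be sketched as follows. `ensureCap` is a hypothetical helper written for this illustration and is not code from the PR (the check in question was later removed).

```go
package main

import "fmt"

// ensureCap returns buf with room for at least n more bytes, preserving its
// contents. It checks spare capacity, not length: an empty slice can already
// have plenty of room to append into.
func ensureCap(buf []byte, n int) []byte {
	if cap(buf)-len(buf) >= n {
		return buf
	}
	grown := make([]byte, len(buf), len(buf)+n)
	copy(grown, buf)
	return grown
}

func main() {
	buf := make([]byte, 0, 64) // empty, but with 64 bytes of capacity

	// A length-based check would wrongly conclude there is no room.
	fmt.Println("len check says grow:", len(buf) < 32)
	fmt.Println("cap check says grow:", cap(buf)-len(buf) < 32)

	buf = ensureCap(buf, 32)
	fmt.Println(len(buf), cap(buf))
}
```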


pkg/util/timetz/timetz.go line 168 at r2 (raw file):

		buf = append(buf, "24:00:00"...)
	} else {
		buf = tTime.AppendFormat(buf, "15:04:05.999999")

Looks like placeholder code here

@miretskiy miretskiy requested a review from HonoreDB September 15, 2022 13:44
Contributor Author

@miretskiy miretskiy left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @HonoreDB)


pkg/geo/bbox.go line 46 at r2 (raw file):

Previously, HonoreDB (Aaron Zinger) wrote…

You might want to look into having this method take a *bytes.Buffer instead and then calling strconv.AppendFloat(buf.Bytes(),.... Might be more trouble than it's worth though since you probably need to check capacity.

I tried various approaches; in the end, a lot of the code could use bytes.Buffer, but a lot of it also needs []byte (e.g. strconv), so I settled on the lowest common denominator.
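
A rough sketch of why a []byte-based signature works as the common denominator: it serves callers that hold a slice and callers that hold a bytes.Buffer. `appendRatio` is a hypothetical formatter used only for illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"strconv"
)

// appendRatio is a hypothetical []byte-based formatter. strconv's Append*
// functions take and return []byte, so this signature composes with them
// directly.
func appendRatio(buf []byte, num, den float64) []byte {
	buf = strconv.AppendFloat(buf, num, 'f', -1, 64)
	buf = append(buf, '/')
	return strconv.AppendFloat(buf, den, 'f', -1, 64)
}

func main() {
	// A caller that already owns a byte slice appends in place.
	raw := appendRatio([]byte("ratio="), 1, 4)

	// A caller that owns a bytes.Buffer formats into scratch space and
	// then copies the bytes into the buffer.
	var b bytes.Buffer
	var scratch [32]byte
	b.WriteString("ratio=")
	b.Write(appendRatio(scratch[:0], 1, 4))

	fmt.Println(string(raw), b.String())
}
```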


pkg/geo/bbox.go line 49 at r2 (raw file):

Previously, HonoreDB (Aaron Zinger) wrote…

Nit: I think a constant performs better than a literal here?

Wdym? like const prefix = "BOX("? Why would it be better?


pkg/util/strutil/util.go line 24 at r2 (raw file):

Previously, HonoreDB (Aaron Zinger) wrote…

Maybe take this as an argument instead of allocating it here?

This is actually similar to how appendInt is implemented in time.Time (part of format); I initially
cribbed this method from there, but then realized that I could make it simpler (and more efficient) by using strconv directly (something time.Time wants to avoid). At any rate, using scratch space on the stack is very common and very fast.
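
For readers following along, a zero-padded append helper built on strconv with stack scratch space might look roughly like this. It is a sketch based only on the fragments quoted in this review (the `width--` on negative input and the 16-byte scratch array); the rest is assumed and may differ from the actual strutil code.

```go
package main

import (
	"fmt"
	"strconv"
)

// appendInt appends the decimal form of x to b, left-padding with zeros up to
// width. The sign, if any, counts toward width. Values longer than width are
// written in full. Sketch limitation: the minimum int value is not handled,
// because negating it overflows.
func appendInt(b []byte, x int, width int) []byte {
	if x < 0 {
		b = append(b, '-')
		width--
		x = -x
	}

	// Format the digits into a fixed-size array; if a value needs more than
	// 16 digits, strconv.AppendInt simply allocates a larger slice.
	var scratch [16]byte
	digits := strconv.AppendInt(scratch[:0], int64(x), 10)

	for w := len(digits); w < width; w++ {
		b = append(b, '0')
	}
	return append(b, digits...)
}

func main() {
	fmt.Println(string(appendInt(nil, 7, 2)))     // padded to two digits
	fmt.Println(string(appendInt(nil, -5, 3)))    // sign counts toward width
	fmt.Println(string(appendInt(nil, 12345, 2))) // longer than width
}
```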


pkg/util/timetz/timetz.go line 158 at r2 (raw file):

Previously, HonoreDB (Aaron Zinger) wrote…

Use cap rather than len, I think? Or this also looks like another argument for taking a bytes.Buffer in these methods.

I removed this code.


pkg/util/timetz/timetz.go line 168 at r2 (raw file):

Previously, HonoreDB (Aaron Zinger) wrote…

Looks like placeholder code here

No, it's not a placeholder -- 24:00:00 is what the master version returns, and it's the same thing I return.
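
A small sketch of why the literal is needed, under the assumption (not stated in this thread) that a time of day is stored as microseconds since midnight and that 24:00:00 is a legal value. `appendTimeOfDay` and `microsPerDay` are hypothetical names.

```go
package main

import (
	"fmt"
	"time"
)

// microsPerDay is the assumed representation of 24:00:00.
const microsPerDay = 24 * 60 * 60 * 1_000_000

// appendTimeOfDay appends a time of day, given as microseconds since
// midnight, to buf. A time.Time cannot print hour 24 (it would wrap to
// 00:00:00 of the next day), so that value is emitted as a literal.
func appendTimeOfDay(buf []byte, micros int64) []byte {
	if micros == microsPerDay {
		return append(buf, "24:00:00"...)
	}
	t := time.Unix(0, 0).UTC().Add(time.Duration(micros) * time.Microsecond)
	// The .999999 layout prints up to six fractional digits and drops
	// trailing zeros.
	return t.AppendFormat(buf, "15:04:05.999999")
}

func main() {
	fmt.Println(string(appendTimeOfDay(nil, microsPerDay)))
	fmt.Println(string(appendTimeOfDay(nil, (13*3600+5*60+7)*1_000_000+250_000)))
}
```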

Contributor

@HonoreDB HonoreDB left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @HonoreDB and @miretskiy)


pkg/util/timetz/timetz.go line 168 at r2 (raw file):

Previously, miretskiy (Yevgeniy Miretskiy) wrote…

No, it's not a placeholder -- 24:00:00 is what the master version returns, and it's the same thing I return.

Sorry I meant to leave this comment on timeComponent := tTime.Format("15:04:05.999999")

@miretskiy
Contributor Author

Sorry I meant to leave this comment on timeComponent := tTime.Format("15:04:05.999999")
Ack.

@HonoreDB
Contributor

Wdym? like const prefix = "BOX("? Why would it be better?

Withdrawn, I saw that done in fmt and thought it was for performance but I think I was wrong and it makes no difference.

@miretskiy
Contributor Author

CI failed due to a flaky test that is being fixed in #87997

Member

@yuzefovich yuzefovich left a comment

Nice improvements!

Reviewed 1 of 2 files at r1, 15 of 15 files at r3, 13 of 14 files at r4, 1 of 1 files at r5, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @HonoreDB and @miretskiy)


-- commits line 2 at r3:
nit: s/benchmar./benchmark/


pkg/geo/bbox.go line 47 at r5 (raw file):

// to the buffer, and returns modified buffer.
func (b *CartesianBoundingBox) AppendFormat(buf []byte) []byte {
	// fmt.Sprintf with %f does not truncate leading zeroes, so use

nit: we no longer use fmt.Sprintf so the comment should be updated.


pkg/sql/sem/tree/datum.go line 1786 at r5 (raw file):

	}

	var buf [uuid.RFC4122StrSize]byte

nit: should we reuse the newly-introduced ctx.scratch here? It should be a stack allocation so it might not matter much.


pkg/sql/sem/tree/datum.go line 3738 at r5 (raw file):

		// with certain JSON consumers, so we'll use an alternate formatting
		// path here to maintain consistency with PostgreSQL.
		return json.FromString(formatTime(t.UTC(), time.RFC3339Nano)), nil

Previously we were using loc, so I think there is something off here.


pkg/sql/sem/tree/datum.go line 3759 at r5 (raw file):

}

// formatTime formats time with specified layout.

nit: could you please add a TODO(yuzefovich) to consider using this function in more spots?


pkg/util/strutil/util.go line 16 at r5 (raw file):

// AppendInt appends the decimal form of x to b and returns the result.
// If the decimal form is shorter than width, the result is padded with leading 0's.

nit: mention what happens if the decimal form is longer than width.


pkg/util/strutil/util.go line 20 at r5 (raw file):

	if x < 0 {
		width--
		x *= -1

nit: it might be faster to do x = -x.


pkg/util/timeofday/time_of_day.go line 58 at r5 (raw file):

}

// AppendFormat appends this TimeOfDay format to the specified buffer

nit: missing period.


pkg/sql/sem/tree/json_test.go line 26 at r3 (raw file):

func BenchmarkAsJSON(b *testing.B) {
	rng := randutil.NewTestRandWithSeed(-4365865412074131521)

nit: is the usage of a fixed seed here to reduce the variance? It probably deserves a quick comment.


pkg/sql/sem/tree/json_test.go line 78 at r3 (raw file):

		// AsJson(Geo) -> MarshalGeo -> go JSON bytes ->  ParseJSON -> Go native -> json.JSON
		// Benchmarking this generates too much noise.
		// TODO: fix this.

nit: TODOs should have a github username attached.

Contributor Author

@miretskiy miretskiy left a comment

Thanks.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @HonoreDB and @yuzefovich)


pkg/geo/bbox.go line 47 at r5 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: we no longer use fmt.Sprintf so the comment should be updated.

I think the comment used to make sense; it was intended to warn against using Sprintf.
But I guess it doesn't matter -- removed.


pkg/sql/sem/tree/datum.go line 1786 at r5 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: should we reuse the newly-introduced ctx.scratch here? It should be a stack allocation so it might not matter much.

I guess we could... as long as scratch > 36 bytes (which it is; I added a comment there to make sure it's not reduced below that).


pkg/sql/sem/tree/datum.go line 3738 at r5 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Previously we were using loc, so I think there is something off here.

Good catch...


pkg/sql/sem/tree/datum.go line 3759 at r5 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: could you please add a TODO(yuzefovich) to consider using this function in more spots?

Ack.


pkg/util/strutil/util.go line 16 at r5 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: mention what happens if the decimal form is longer than width.

Done; also added a test.


pkg/util/strutil/util.go line 20 at r5 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: it might be faster to do x = -x.

Done.
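
For readers following the `AppendInt` discussion, here is a minimal sketch of a zero-padding append helper. `appendIntPadded` is a hypothetical name, not the function in `pkg/util/strutil`. The sketch assumes the minus sign counts toward `width` (as the `width--` in the quoted snippet suggests) and that a value wider than `width` is written in full; the thread does not confirm what the merged function does in that case.

```go
package main

import (
	"fmt"
	"strconv"
)

// appendIntPadded is a hypothetical stand-in for strutil.AppendInt. It appends
// the decimal form of x to b, left-padding with '0' up to width characters.
// Assumptions: a minus sign counts toward width, and a value wider than width
// is written in full rather than truncated.
func appendIntPadded(b []byte, x int, width int) []byte {
	u := uint64(x)
	if x < 0 {
		b = append(b, '-')
		width--
		u = uint64(-x) // also correct for the minimum int value
	}
	var scratch [20]byte // large enough for any 64-bit decimal value
	digits := strconv.AppendUint(scratch[:0], u, 10)
	for i := len(digits); i < width; i++ {
		b = append(b, '0')
	}
	return append(b, digits...)
}

func main() {
	fmt.Println(string(appendIntPadded(nil, 7, 3)))     // 007
	fmt.Println(string(appendIntPadded(nil, -7, 3)))    // -07
	fmt.Println(string(appendIntPadded(nil, 12345, 3))) // 12345
}
```

The scratch array lives on the stack, so the only growth happens in the caller's buffer.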


pkg/util/timetz/timetz.go line 168 at r2 (raw file):

Previously, HonoreDB (Aaron Zinger) wrote…

Sorry I meant to leave this comment on timeComponent := tTime.Format("15:04:05.999999")

Done.


pkg/sql/sem/tree/json_test.go line 26 at r3 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: is the usage of a fixed seed here to reduce the variance? It probably deserves a quick comment.

Yeah; not just to reduce variance but to have reproducible results, so that if you make a change,
you can rerun against the same types and exactly the same datum stream to see the effect.
Comment added.
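
The reproducibility point can be illustrated with a small sketch using the standard `math/rand` package instead of CockroachDB's `randutil` (an assumption made to keep the example self-contained). `randomInts` is a hypothetical helper; the seed value is the one quoted from the benchmark.

```go
package main

import (
	"fmt"
	"math/rand"
)

// randomInts returns n pseudo-random values from a generator seeded with seed.
// The same seed always yields the same sequence, so two benchmark runs see
// exactly the same inputs.
func randomInts(seed int64, n int) []int64 {
	rng := rand.New(rand.NewSource(seed))
	out := make([]int64, n)
	for i := range out {
		out[i] = rng.Int63()
	}
	return out
}

func main() {
	const seed = -4365865412074131521 // seed quoted from the benchmark
	a := randomInts(seed, 5)
	b := randomInts(seed, 5)
	fmt.Println(fmt.Sprint(a) == fmt.Sprint(b)) // true
}
```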


pkg/sql/sem/tree/json_test.go line 78 at r3 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: TODOs should have a github username attached.

Sure, I'll volunteer :) Eventually.
For now, it was a bit too big of a lift.

miretskiy pushed a commit to miretskiy/cockroach that referenced this pull request Sep 16, 2022
Rewrite JSON encoder to improve its performance.

Prior to this change JSON encoder was very inefficient.
This inefficiency had multiple underlying reasons:
  * New Go map objects were constructed for each event.
  * Underlying json conversion functions had inefficiencies
    (tracked in cockroachdb#87968)
  * Conversion of Go maps to JSON incurs the cost
    of sorting the keys -- for each row. Sorting,
    particularly when rows are wide, has significant cost.
  * Each conversion to JSON allocated new array builder
    (to encode keys) and new object builder; that too has cost.
  * Underlying code structure, while attempting to reuse
    code when constructing different "envelope" formats,
    caused the code to be less efficient.

This PR addresses all of the above.  In particular, since
a schema version for the table is guaranteed to have
the same set of primary key and value columns, we can construct
JSON builders once.  The expensive sort operation can be performed
once per version; builders can be memoized and cached.

The performance impact is significant:
  * Key encoding speed up is 5-30%, depending on the number of primary
    keys.
  * Value encoding 30% - 60% faster (slowest being "wrapped" envelope
    with diff -- which effectively encodes 2x values)
  * Byte allocations per row are reduced by over 70%, with the number
    of allocations reduced similarly.

Release note (enterprise change): Changefeed JSON encoder
performance improved by 50%.
Release justification: performance improvement
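
The "sort once per schema version" idea in the commit message above can be sketched as follows. This illustrates the caching technique only, not the changefeed encoder's code: `keyCache`, `keysFor`, and the `builds` counter are hypothetical names, and the real change caches JSON builders rather than plain string slices.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// keyCache is a hypothetical cache that memoizes sorted column names per
// schema version, so the sort cost is paid once per version, not once per row.
type keyCache struct {
	mu     sync.Mutex
	sorted map[uint64][]string // schema version -> column names, sorted
	builds int                 // number of times a sort was actually performed
}

// keysFor returns the sorted column names for version, sorting only on the
// first request for that version.
func (c *keyCache) keysFor(version uint64, columns []string) []string {
	c.mu.Lock()
	defer c.mu.Unlock()
	if keys, ok := c.sorted[version]; ok {
		return keys
	}
	keys := append([]string(nil), columns...) // copy so the caller's slice is untouched
	sort.Strings(keys)
	if c.sorted == nil {
		c.sorted = make(map[uint64][]string)
	}
	c.sorted[version] = keys
	c.builds++
	return keys
}

func main() {
	c := &keyCache{}
	cols := []string{"b", "a", "c"}
	for row := 0; row < 1000; row++ {
		_ = c.keysFor(1, cols) // every row after the first hits the cache
	}
	fmt.Println(c.keysFor(1, cols), c.builds) // [a b c] 1
}
```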
Add a micro benchmark for `tree.AsJSON` method.

Release note: None
Release justification: test only change
Improve performance of `tree.AsJSON` method.

These improvements are important for any query that produces
a large number of JSON objects, as well as for changefeeds, which
rely on this function when producing a JSON-encoded feed.

Most of the changes revolved around modifying underlying types
(such as date/timestamp types, box2d, etc.) to favor functions
that append to a byte buffer, instead of relying on slower
functions such as `fmt.Sprintf`.  The conversion
performance improved around 5-10% for most of the types, and
by 50% or more for time types:

```
Benchmark            old t/op      new t/op    delta
AsJSON/box2d-10    578ns ± 3%    414ns ± 2%   -28.49%  (p=0.000 n=10+9)
AsJSON/box2d[]-10  1.64µs ± 3%   1.19µs ± 4%  -27.14%  (p=0.000 n=10+10)
AsJSON/time-10     232ns ± 2%    103ns ± 1%   -55.61%  (p=0.000 n=10+10)
AsJSON/time[]-10   687ns ± 4%    342ns ± 4%   -50.17%  (p=0.000 n=10+10)
```

Note: Some types in the local benchmark show a slight slowdown.
No changes were made to those types, and in general, the encoding
of these types might be too fast to reliably detect changes:
```
Benchmark            old t/op      new t/op       delta
AsJSON/bool[]-10    65.9ns ± 1%   67.7ns ± 2%    +2.79%  (p=0.001 n=8+9)
```

Emphasis was also placed on reducing allocations.
By relying more heavily on a pooled FmtCtx, which contains
a byte buffer, some conversions achieved amortized
elimination of allocations (time):
```
Benchmark               old B/op      new B/op    delta
AsJSON/timestamp-10    42.1B ± 3%      0.0B      -100.00%  (p=0.000 n=10+10)
AsJSON/timestamp[]-10  174B ± 4%      60B ± 1%   -65.75%  (p=0.000 n=10+10)
```

Release Note: None
Release Justification: performance improvement
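
As a rough illustration of the two techniques named in this commit message (appending into a byte buffer instead of building strings, and reusing that buffer through a pool), here is a minimal sketch. It uses the standard library's `time.Time.AppendFormat` and `sync.Pool`. `bufPool`, `formatTimePooled`, and `formatUnixPooled` are hypothetical names; the PR's pooled `FmtCtx` is not reproduced here.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// bufPool is a hypothetical stand-in for a pooled formatting context: it hands
// out reusable byte slices so steady-state formatting does not allocate a new
// scratch buffer per call.
var bufPool = sync.Pool{
	New: func() interface{} {
		b := make([]byte, 0, 64)
		return &b
	},
}

// formatTimePooled formats t with layout by appending into a pooled buffer.
func formatTimePooled(t time.Time, layout string) string {
	bp := bufPool.Get().(*[]byte)
	buf := t.AppendFormat((*bp)[:0], layout)
	s := string(buf) // copy out before the buffer goes back to the pool
	*bp = buf        // keep any growth for the next user
	bufPool.Put(bp)
	return s
}

// formatUnixPooled is a convenience wrapper for a UTC Unix timestamp.
func formatUnixPooled(sec int64, layout string) string {
	return formatTimePooled(time.Unix(sec, 0).UTC(), layout)
}

func main() {
	fmt.Println(formatUnixPooled(0, time.RFC3339)) // 1970-01-01T00:00:00Z
}
```

This sketch still allocates the returned string; a caller that writes straight into a destination buffer can avoid that copy as well.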
@miretskiy

Bors r+

craig bot commented Sep 17, 2022

Build succeeded:

@craig craig bot merged commit 3fa1a16 into cockroachdb:master Sep 17, 2022
miretskiy pushed a commit to miretskiy/cockroach that referenced this pull request Sep 18, 2022
Rewrite JSON encoder to improve its performance.

miretskiy pushed a commit to miretskiy/cockroach that referenced this pull request Sep 20, 2022
Rewrite JSON encoder to improve its performance.

miretskiy pushed a commit to miretskiy/cockroach that referenced this pull request Sep 22, 2022
Rewrite JSON encoder to improve its performance.

craig bot pushed a commit that referenced this pull request Sep 22, 2022
88064: changefeedccl: Improve JSON encoder performance  r=honoredb a=miretskiy

Rewrite JSON encoder to improve its performance.

Prior to this change JSON encoder was very inefficient.
This inefficiency had multiple underlying reasons:
  * New Go map objects were constructed for each event.
  * Underlying json conversion functions had inefficiencies
    (tracked in #87968)
  * Conversion of Go maps to JSON incurs the cost
    of sorting the keys -- for each row. Sorting,
    particularly when rows are wide, has significant cost.
  * Each conversion to JSON allocated new array builder
    (to encode keys) and new object builder; that too has cost.
  * Underlying code structure, while attempting to reuse
    code when constructing different "envelope" formats,
    caused the code to be less efficient.

This PR addresses all of the above.  In particular, since
a schema version for the table is guaranteed to have
the same set of primary key and value columns, we can construct
JSON builders once.  The expensive sort operation can be performed
once per version; builders can be memoized and cached.

The performance impact is significant:
  * Key encoding speed up is 5-30%, depending on the number of primary
    keys.
  * Value encoding 30% - 60% faster (slowest being "wrapped" envelope
    with diff -- which effectively encodes 2x values)
  * Byte allocations per row are reduced by over 70%, with the number
    of allocations reduced similarly.

Release note (enterprise change): Changefeed JSON encoder
performance improved by 50%.
Release justification: performance improvement

Co-authored-by: Yevgeniy Miretskiy <[email protected]>
miretskiy pushed a commit to miretskiy/cockroach that referenced this pull request Sep 28, 2022
Rewrite JSON encoder to improve its performance.

miretskiy pushed a commit to miretskiy/cockroach that referenced this pull request Oct 31, 2022
Rewrite JSON encoder to improve its performance.
