feat: cheaper cloning of ArrayData #1518

Draft
wants to merge 4 commits into
base: develop

@gatesn gatesn commented Dec 1, 2024

We assume cloning ArrayData is cheap, but currently it's cloning a DType + 4 Arcs.

I want to see what making InnerArrayData an Arc would do to performance. There are obviously follow-ups to unwrap the inner Arcs if it works well.
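For readers outside the codebase, here is a minimal sketch of the idea, not the actual vortex definitions. The `DType` variants, the field names, and the helper methods are all hypothetical; the only point is that wrapping the inner struct in a single `Arc` turns a clone into one reference-count increment instead of a `DType` clone plus one increment per inner `Arc`.

```rust
use std::sync::Arc;

// Hypothetical stand-in for the logical type. Cloning a nested
// variant such as Struct walks and copies the whole tree.
#[allow(dead_code)]
#[derive(Clone, Debug, PartialEq)]
enum DType {
    Int64,
    Struct(Vec<(String, DType)>),
}

// Hypothetical inner payload: a DType plus some reference-counted parts.
// The inner Arcs are the ones the follow-up work could unwrap.
#[allow(dead_code)]
#[derive(Debug)]
struct InnerArrayData {
    dtype: DType,
    len: usize,
    buffer: Arc<[u8]>,
    children: Arc<[ArrayData]>,
}

// The outer handle is a single Arc, so the derived Clone only
// increments one atomic reference count.
#[derive(Clone, Debug)]
struct ArrayData(Arc<InnerArrayData>);

#[allow(dead_code)]
impl ArrayData {
    fn new(dtype: DType, len: usize, buffer: Vec<u8>, children: Vec<ArrayData>) -> Self {
        ArrayData(Arc::new(InnerArrayData {
            dtype,
            len,
            buffer: buffer.into(),
            children: children.into(),
        }))
    }

    fn dtype(&self) -> &DType {
        &self.0.dtype
    }

    fn len(&self) -> usize {
        self.0.len
    }

    // Number of live handles pointing at the same inner data.
    fn ref_count(&self) -> usize {
        Arc::strong_count(&self.0)
    }

    // True when both handles point at the same allocation.
    fn shares_storage_with(&self, other: &ArrayData) -> bool {
        Arc::ptr_eq(&self.0, &other.0)
    }
}
```

With this layout, `a.clone()` yields a second handle to the same allocation, and the `DType` is never copied. The cost is one extra pointer hop on every field access, which is presumably what the benchmarks on this branch are meant to weigh.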

@gatesn gatesn added the benchmark Run benchmarks on this branch label Dec 1, 2024
@github-actions github-actions bot removed the benchmark Run benchmarks on this branch label Dec 1, 2024
@github-actions github-actions bot left a comment

DataFusion

| Benchmark suite | Current: 3f35120 | Previous: 91f3efe | Ratio |
| --- | --- | --- | --- |
| arrow/planning | 813044.2352168993 ns (2105.1602472889354) | 809023.9670166407 ns (1639.2113044043072) | 1.00 |
| arrow/exec | 1768732.0071097547 ns (5834.963560998556) | 1782565.44320029 ns (5288.634808596456) | 0.99 |
| vortex-pushdown-compressed/planning | 506055.47521057114 ns (1054.1482028008613) | 502879.68813224323 ns (1257.7711339545494) | 1.01 |
| vortex-pushdown-compressed/exec | 2672743.740526315 ns (8803.068460527109) | 3420806.958235294 ns (28307.64570588246) | 0.78 |
| vortex-pushdown-uncompressed/planning | 505427.9231972509 ns (1287.685032191599) | 509004.49543654145 ns (1567.5956271940377) | 0.99 |
| vortex-pushdown-uncompressed/exec | 1511151.4528455534 ns (6141.684094711323) | 1503060.3118089647 ns (6011.458172130398) | 1.01 |
| vortex-nopushdown-compressed/planning | 839612.634153122 ns (4958.80518413597) | 840787.1871124875 ns (3715.5511918517877) | 1.00 |
| vortex-nopushdown-compressed/exec | 3711793.503571429 ns (61435.38213392906) | 4133528.673333332 ns (47430.1412500015) | 0.90 |
| vortex-nopushdown-uncompressed/planning | 821276.8470013754 ns (2133.5785147416173) | 827939.5169493048 ns (1808.5379252581042) | 0.99 |
| vortex-nopushdown-uncompressed/exec | 5099033.73 ns (37782.05660000071) | 5582391.272999999 ns (82622.9213875006) | 0.91 |

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions bot left a comment

Random Access

| Benchmark suite | Current: 3f35120 | Previous: 91f3efe | Ratio |
| --- | --- | --- | --- |
| random-access/vortex-tokio-local-disk | 2197674.36 ns (45014.310625000624) | 2333145.0860869572 ns (35585.933804348344) | 0.94 |
| random-access/vortex-local-fs | 2622940.9271428566 ns (41441.587922618724) | 2820563.3805263154 ns (37739.429999999935) | 0.93 |
| random-access/parquet-tokio-local-disk | 228006423 ns (3752880.802916676) | 226576287.3 ns (5162648.760416657) | 1.01 |

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions bot left a comment

TPC-H

| Benchmark suite | Current: 3f35120 | Previous: 91f3efe | Ratio |
| --- | --- | --- | --- |
| tpch_q1/arrow | 538222892 ns | 534158153 ns | 1.01 |
| tpch_q1/parquet | 761206331 ns | 767456401 ns | 0.99 |
| tpch_q1/vortex-file-compressed | 447242006 ns | 438560029 ns | 1.02 |
| tpch_q2/arrow | 142638944 ns | 142839221 ns | 1.00 |
| tpch_q2/parquet | 175157393 ns | 173473916 ns | 1.01 |
| tpch_q2/vortex-file-compressed | 139510950 ns | 141404918 ns | 0.99 |
| tpch_q3/arrow | 168412065 ns | 173358601 ns | 0.97 |
| tpch_q3/parquet | 373579389 ns | 376769623 ns | 0.99 |
| tpch_q3/vortex-file-compressed | 216620317 ns | 223110286 ns | 0.97 |
| tpch_q4/arrow | 179023821 ns | 176777103 ns | 1.01 |
| tpch_q4/parquet | 221002183 ns | 217757805 ns | 1.01 |
| tpch_q4/vortex-file-compressed | 139674320 ns | 139048244 ns | 1.00 |
| tpch_q5/arrow | 322358339 ns | 321625443 ns | 1.00 |
| tpch_q5/parquet | 500843726 ns | 498386056 ns | 1.00 |
| tpch_q5/vortex-file-compressed | 318598903 ns | 317083894 ns | 1.00 |
| tpch_q6/arrow | 25646272 ns | 24838028 ns | 1.03 |
| tpch_q6/parquet | 152467462 ns | 151681164 ns | 1.01 |
| tpch_q6/vortex-file-compressed | 12988317 ns | 13715892 ns | 0.95 |
| tpch_q7/arrow | 624096966 ns | 621953135 ns | 1.00 |
| tpch_q7/parquet | 775094094 ns | 775177240 ns | 1.00 |
| tpch_q7/vortex-file-compressed | 615495315 ns | 632432524 ns | 0.97 |
| tpch_q8/arrow | 260948027 ns | 259519200 ns | 1.01 |
| tpch_q8/parquet | 540595160 ns | 547991652 ns | 0.99 |
| tpch_q8/vortex-file-compressed | 283598992 ns | 278870293 ns | 1.02 |
| tpch_q9/arrow | 475874086 ns | 471422864 ns | 1.01 |
| tpch_q9/parquet | 779136026 ns | 784309228 ns | 0.99 |
| tpch_q9/vortex-file-compressed | 502378915 ns | 508183056 ns | 0.99 |
| tpch_q10/arrow | 264200886 ns | 266205771 ns | 0.99 |
| tpch_q10/parquet | 513684454 ns | 510678728 ns | 1.01 |
| tpch_q10/vortex-file-compressed | 270602764 ns | 270914863 ns | 1.00 |
| tpch_q11/arrow | 140080283 ns | 140012075 ns | 1.00 |
| tpch_q11/parquet | 147420466 ns | 145984232 ns | 1.01 |
| tpch_q11/vortex-file-compressed | 122717119 ns | 121922467 ns | 1.01 |
| tpch_q12/arrow | 183648492 ns | 184335331 ns | 1.00 |
| tpch_q12/parquet | 329654076 ns | 332077632 ns | 0.99 |
| tpch_q12/vortex-file-compressed | 201977154 ns | 205003658 ns | 0.99 |
| tpch_q13/arrow | 170118072 ns | 167938775 ns | 1.01 |
| tpch_q13/parquet | 309662968 ns | 311086581 ns | 1.00 |
| tpch_q13/vortex-file-compressed | 179148305 ns | 177837853 ns | 1.01 |
| tpch_q14/arrow | 36997913 ns | 38385825 ns | 0.96 |
| tpch_q14/parquet | 235261377 ns | 234238927 ns | 1.00 |
| tpch_q14/vortex-file-compressed | 79273706 ns | 80150648 ns | 0.99 |
| tpch_q15/arrow | 69285095 ns | 67591327 ns | 1.03 |
| tpch_q15/parquet | 326250420 ns | 323822736 ns | 1.01 |
| tpch_q15/vortex-file-compressed | 145858138 ns | 146184314 ns | 1.00 |
| tpch_q16/arrow | 104297897 ns | 104776215 ns | 1.00 |
| tpch_q16/parquet | 119084598 ns | 117515904 ns | 1.01 |
| tpch_q16/vortex-file-compressed | 102621554 ns | 105367334 ns | 0.97 |
| tpch_q17/arrow | 606275737 ns | 600999255 ns | 1.01 |
| tpch_q17/parquet | 688640842 ns | 685693260 ns | 1.00 |
| tpch_q17/vortex-file-compressed | 548654064 ns | 543649016 ns | 1.01 |
| tpch_q18/arrow | 1147178274 ns | 1132552450 ns | 1.01 |
| tpch_q18/parquet | 1363750448 ns | 1356862158 ns | 1.01 |
| tpch_q18/vortex-file-compressed | 1120661942 ns | 1128738436 ns | 0.99 |
| tpch_q19/arrow | 148640329 ns | 151253150 ns | 0.98 |
| tpch_q19/parquet | 421118988 ns | 421966272 ns | 1.00 |
| tpch_q19/vortex-file-compressed | 131159915 ns | 133927808 ns | 0.98 |
| tpch_q20/arrow | 212094854 ns | 213322478 ns | 0.99 |
| tpch_q20/parquet | 346552332 ns | 349722907 ns | 0.99 |
| tpch_q20/vortex-file-compressed | 266273466 ns | 261376737 ns | 1.02 |
| tpch_q21/arrow | 994440621 ns | 986944082 ns | 1.01 |
| tpch_q21/parquet | 1095013524 ns | 1097663771 ns | 1.00 |
| tpch_q21/vortex-file-compressed | 868090241 ns | 854181797 ns | 1.02 |
| tpch_q22/arrow | 79203496 ns | 80041252 ns | 0.99 |
| tpch_q22/parquet | 111080131 ns | 110782864 ns | 1.00 |
| tpch_q22/vortex-file-compressed | 84620061 ns | 84532032 ns | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions bot left a comment

Vortex Compression

| Benchmark suite | Current: 3f35120 | Previous: 91f3efe | Ratio |
| --- | --- | --- | --- |
| compress time/taxi | 1238579008.6 ns (2744155.6999999285) | 1280914368 ns (3467727.068750024) | 0.97 |
| compress time/taxi throughput | 470808924 bytes | 470808924 bytes | 1 |
| parquet_rs-zstd compress time/taxi | 1695888031.3 ns (4523084.493749976) | 1715274737.9 ns (2925052.0499999523) | 0.99 |
| parquet_rs-zstd compress time/taxi throughput | 470808924 bytes | 470808924 bytes | 1 |
| decompress time/taxi | 364869921 ns (1491133.0212500095) | 365649193.7 ns (937713.4462499917) | 1.00 |
| decompress time/taxi throughput | 470808924 bytes | 470808924 bytes | 1 |
| parquet_rs-zstd decompress time/taxi | 305785277.6 ns (396315.90000000596) | 305898141.1 ns (1223442.048750013) | 1.00 |
| parquet_rs-zstd decompress time/taxi throughput | 470808924 bytes | 470808924 bytes | 1 |
| vortex:parquet-zstd size/taxi | 1.0302258312558379 ratio | 1.0302258312558379 ratio | 1 |
| vortex:raw size/taxi | 0.12245365170690775 ratio | 0.12245365170690775 ratio | 1 |
| vortex size/taxi | 57652272 bytes | 57652272 bytes | 1 |
| compress time/AirlineSentiment | 937836.6840107709 ns (3475.441219529428) | 1011066.8432486772 ns (1922.4451812169864) | 0.93 |
| compress time/AirlineSentiment throughput | 2020 bytes | 2020 bytes | 1 |
| parquet_rs-zstd compress time/AirlineSentiment | 55812.57787146055 ns (200.89080236769587) | 56283.8677563251 ns (480.23826833687417) | 0.99 |
| parquet_rs-zstd compress time/AirlineSentiment throughput | 2020 bytes | 2020 bytes | 1 |
| decompress time/AirlineSentiment | 115652.71287442728 ns (633.3508684135159) | 124718.51963326358 ns (691.9176015706616) | 0.93 |
| decompress time/AirlineSentiment throughput | 2020 bytes | 2020 bytes | 1 |
| parquet_rs-zstd decompress time/AirlineSentiment | 31964.451924814104 ns (139.71444038056688) | 32302.15176546693 ns (80.47472319362532) | 0.99 |
| parquet_rs-zstd decompress time/AirlineSentiment throughput | 2020 bytes | 2020 bytes | 1 |
| vortex:parquet-zstd size/AirlineSentiment | 11.544984488107549 ratio | 11.544984488107549 ratio | 1 |
| vortex:raw size/AirlineSentiment | 5.526732673267326 ratio | 5.526732673267326 ratio | 1 |
| vortex size/AirlineSentiment | 11164 bytes | 11164 bytes | 1 |
| compress time/Arade | 2229267632.2 ns (5353553.28000021) | 2284827946.8 ns (5429973.150000095) | 0.98 |
| compress time/Arade throughput | 787023760 bytes | 787023760 bytes | 1 |
| parquet_rs-zstd compress time/Arade | 2914174483.6 ns (9356533.599999905) | 2909340721.3 ns (10044920.950000048) | 1.00 |
| parquet_rs-zstd compress time/Arade throughput | 787023760 bytes | 787023760 bytes | 1 |
| decompress time/Arade | 610295997.2 ns (1695715.0325000286) | 617269703.3 ns (1531450.7962499857) | 0.99 |
| decompress time/Arade throughput | 787023760 bytes | 787023760 bytes | 1 |
| parquet_rs-zstd decompress time/Arade | 667066737.9 ns (1968393.0687500238) | 668567233.9 ns (2275752.3875000477) | 1.00 |
| parquet_rs-zstd decompress time/Arade throughput | 787023760 bytes | 787023760 bytes | 1 |
| vortex:parquet-zstd size/Arade | 0.4938228181453686 ratio | 0.4938228181453686 ratio | 1 |
| vortex:raw size/Arade | 0.1916207764807507 ratio | 0.1916207764807507 ratio | 1 |
| vortex size/Arade | 150810104 bytes | 150810104 bytes | 1 |
| compress time/Bimbo | 10223168581.3 ns (19534383.150000572) | 10630434558.9 ns (9324117.269999504) | 0.96 |
| compress time/Bimbo throughput | 7121333608 bytes | 7121333608 bytes | 1 |
| parquet_rs-zstd compress time/Bimbo | 19385965548.3 ns (38269336.66749954) | 19504313011.8 ns (37191737.285001755) | 0.99 |
| parquet_rs-zstd compress time/Bimbo throughput | 7121333608 bytes | 7121333608 bytes | 1 |
| decompress time/Bimbo | 3942133338.6 ns (23101571.20875001) | 3917863187.5 ns (10359192.372499943) | 1.01 |
| decompress time/Bimbo throughput | 7121333608 bytes | 7121333608 bytes | 1 |
| parquet_rs-zstd decompress time/Bimbo | 2693231553.5 ns (7250996.199999809) | 2659382857 ns (6953170.914999962) | 1.01 |
| parquet_rs-zstd decompress time/Bimbo throughput | 7121333608 bytes | 7121333608 bytes | 1 |
| vortex:parquet-zstd size/Bimbo | 1.8549111341909053 ratio | 1.8549111341909053 ratio | 1 |
| vortex:raw size/Bimbo | 0.1011026298769628 ratio | 0.1011026298769628 ratio | 1 |
| vortex size/Bimbo | 719985556 bytes | 719985556 bytes | 1 |
| compress time/CMSprovider | 12117820870.7 ns (10243932.902500153) | 12543989160.1 ns (12674697.526249886) | 0.97 |
| compress time/CMSprovider throughput | 5149123964 bytes | 5149123964 bytes | 1 |
| parquet_rs-zstd compress time/CMSprovider | 18618764593.8 ns (29207714.988750458) | 18807362807.9 ns (67344602.5862484) | 0.99 |
| parquet_rs-zstd compress time/CMSprovider throughput | 5149123964 bytes | 5149123964 bytes | 1 |
| decompress time/CMSprovider | 4059278333.8 ns (372618399.03624964) | 4251222012.5 ns (329313323.68124986) | 0.95 |
| decompress time/CMSprovider throughput | 5149123964 bytes | 5149123964 bytes | 1 |
| parquet_rs-zstd decompress time/CMSprovider | 5661728667.8 ns (15332237.30000019) | 5366801361 ns (11082547.700000286) | 1.05 |
| parquet_rs-zstd decompress time/CMSprovider throughput | 5149123964 bytes | 5149123964 bytes | 1 |
| vortex:parquet-zstd size/CMSprovider | 1.3078145376201928 ratio | 1.3284145829823057 ratio | 0.98 |
| vortex:raw size/CMSprovider | 0.19544725569555155 ratio | 0.19852584384196814 ratio | 0.98 |
| vortex size/CMSprovider | 1006382148 bytes | 1022234180 bytes | 0.98 |
| compress time/Euro2016 | 2536197675.6 ns (5829249.650000095) | 2663151059.4 ns (4734740.045000076) | 0.95 |
| compress time/Euro2016 throughput | 393253221 bytes | 393253221 bytes | 1 |
| parquet_rs-zstd compress time/Euro2016 | 1542078303.6 ns (3528384.8650000095) | 1538858871.5 ns (5180522.716249943) | 1.00 |
| parquet_rs-zstd compress time/Euro2016 throughput | 393253221 bytes | 393253221 bytes | 1 |
| decompress time/Euro2016 | 295217749.5 ns (1530489.4612500072) | 298432827.4 ns (702243.2887499928) | 0.99 |
| decompress time/Euro2016 throughput | 393253221 bytes | 393253221 bytes | 1 |
| parquet_rs-zstd decompress time/Euro2016 | 485440444.6 ns (1377129.6537500024) | 485274628 ns (1325725.875) | 1.00 |
| parquet_rs-zstd decompress time/Euro2016 throughput | 393253221 bytes | 393253221 bytes | 1 |
| vortex:parquet-zstd size/Euro2016 | 1.4831851057079755 ratio | 1.4824675342043792 ratio | 1.00 |
| vortex:raw size/Euro2016 | 0.4484024811077135 ratio | 0.4481855420072961 ratio | 1.00 |
| vortex size/Euro2016 | 176335720 bytes | 176250408 bytes | 1.00 |
| compress time/Food | 942773563 ns (2375777.149999976) | 985592120.6 ns (17799783.82124996) | 0.96 |
| compress time/Food throughput | 332718229 bytes | 332718229 bytes | 1 |
| parquet_rs-zstd compress time/Food | 1034580010.9 ns (1604153.5) | 1037415417.1 ns (1567589.5962499976) | 1.00 |
| parquet_rs-zstd compress time/Food throughput | 332718229 bytes | 332718229 bytes | 1 |
| decompress time/Food | 123687927.53333335 ns (462129.9302499816) | 126091609.8084524 ns (432175.78810118884) | 0.98 |
| decompress time/Food throughput | 332718229 bytes | 332718229 bytes | 1 |
| parquet_rs-zstd decompress time/Food | 221166739.6 ns (592082.375) | 223596651.35 ns (1153635.225000009) | 0.99 |
| parquet_rs-zstd decompress time/Food throughput | 332718229 bytes | 332718229 bytes | 1 |
| vortex:parquet-zstd size/Food | 1.4170195301889623 ratio | 1.4170195301889623 ratio | 1 |
| vortex:raw size/Food | 0.15430064097870633 ratio | 0.15430064097870633 ratio | 1 |
| vortex size/Food | 51338636 bytes | 51338636 bytes | 1 |
| compress time/HashTags | 2411510437.4 ns (3270663.9549999237) | 2485877997.1 ns (4147638.642500162) | 0.97 |
| compress time/HashTags throughput | 804495592 bytes | 804495592 bytes | 1 |
| parquet_rs-zstd compress time/HashTags | 2446434611.1 ns (4999256.5) | 2445392996.9 ns (3038700.75) | 1.00 |
| parquet_rs-zstd compress time/HashTags throughput | 804495592 bytes | 804495592 bytes | 1 |
| decompress time/HashTags | 457278323.4 ns (1132807.8999999762) | 458373830.4 ns (1337364.3425000012) | 1.00 |
| decompress time/HashTags throughput | 804495592 bytes | 804495592 bytes | 1 |
| parquet_rs-zstd decompress time/HashTags | 797110859.7 ns (5631432.227500021) | 795101791.1 ns (2837210.191250026) | 1.00 |
| parquet_rs-zstd decompress time/HashTags throughput | 804495592 bytes | 804495592 bytes | 1 |
| vortex:parquet-zstd size/HashTags | 1.6963580279453745 ratio | 1.6964612186937864 ratio | 1.00 |
| vortex:raw size/HashTags | 0.28247941972564594 ratio | 0.2824966031634888 ratio | 1.00 |
| vortex size/HashTags | 227253448 bytes | 227267272 bytes | 1.00 |
| compress time/TPC-H l_comment chunked without fsst | 3189215132 ns (22910685.77125001) | 3309237015.9 ns (10387985.850000143) | 0.96 |
| compress time/TPC-H l_comment chunked without fsst throughput | 249197098 bytes | 249197098 bytes | 1 |
| parquet_rs-zstd compress time/TPC-H l_comment chunked without fsst | 917921816.5 ns (2139188.25) | 912294850.1 ns (2188713.367500007) | 1.01 |
| parquet_rs-zstd compress time/TPC-H l_comment chunked without fsst throughput | 249197098 bytes | 249197098 bytes | 1 |
| decompress time/TPC-H l_comment chunked without fsst | 133438225.6 ns (388855.79499999434) | 160009434.4 ns (659063.9599999934) | 0.83 |
| decompress time/TPC-H l_comment chunked without fsst throughput | 249197098 bytes | 249197098 bytes | 1 |
| parquet_rs-zstd decompress time/TPC-H l_comment chunked without fsst | 251776648.75 ns (709546.799999997) | 252704891.9 ns (2086796.3743750006) | 1.00 |
| parquet_rs-zstd decompress time/TPC-H l_comment chunked without fsst throughput | 249197098 bytes | 249197098 bytes | 1 |
| vortex:parquet-zstd size/TPC-H l_comment chunked without fsst | 4.609568682911742 ratio | 4.609515565485771 ratio | 1.00 |
| vortex:raw size/TPC-H l_comment chunked without fsst | 1.0531617346523032 ratio | 1.0531469351220133 ratio | 1.00 |
| vortex size/TPC-H l_comment chunked without fsst | 262444848 bytes | 262441160 bytes | 1.00 |
| compress time/TPC-H l_comment chunked | 975436636.7 ns (1675100.5) | 977328678 ns (1778446.1499999762) | 1.00 |
| compress time/TPC-H l_comment chunked throughput | 249197098 bytes | 249197098 bytes | 1 |
| parquet_rs-zstd compress time/TPC-H l_comment chunked | 915202674.3 ns (2943832.199999988) | 912224351.7 ns (1970759.6712499857) | 1.00 |
| parquet_rs-zstd compress time/TPC-H l_comment chunked throughput | 249197098 bytes | 249197098 bytes | 1 |
| decompress time/TPC-H l_comment chunked | 138769783.03333333 ns (362431.08333332837) | 134291177.43333334 ns (1527440.465416655) | 1.03 |
| decompress time/TPC-H l_comment chunked throughput | 249197098 bytes | 249197098 bytes | 1 |
| parquet_rs-zstd decompress time/TPC-H l_comment chunked | 252544631.85 ns (641222.8249999881) | 252376314.2 ns (597631.0406249911) | 1.00 |
| parquet_rs-zstd decompress time/TPC-H l_comment chunked throughput | 249197098 bytes | 249197098 bytes | 1 |
| vortex:parquet-zstd size/TPC-H l_comment chunked | 1.3523801745487976 ratio | 1.352441345429517 ratio | 1.00 |
| vortex:raw size/TPC-H l_comment chunked | 0.3089822819686287 ratio | 0.30899547634378954 ratio | 1.00 |
| vortex size/TPC-H l_comment chunked | 76997488 bytes | 77000776 bytes | 1.00 |
| compress time/TPC-H l_comment canonical | 974846422.7 ns (1780391.801249981) | 983116539.35 ns (1528041.1812499762) | 0.99 |
| compress time/TPC-H l_comment canonical throughput | 249197114 bytes | 249197114 bytes | 1 |
| parquet_rs-zstd compress time/TPC-H l_comment canonical | 913584119.6 ns (2051782.2418749928) | 914853402.65 ns (1611704.3112499714) | 1.00 |
| parquet_rs-zstd compress time/TPC-H l_comment canonical throughput | 249197114 bytes | 249197114 bytes | 1 |
| decompress time/TPC-H l_comment canonical | 137140317.06386903 ns (246832.94435417652) | 135363806.5977381 ns (993592.1470877975) | 1.01 |
| decompress time/TPC-H l_comment canonical throughput | 249197114 bytes | 249197114 bytes | 1 |
| parquet_rs-zstd decompress time/TPC-H l_comment canonical | 252349638.7388095 ns (614317.4778809398) | 250649909.20603174 ns (625760.2501587272) | 1.01 |
| parquet_rs-zstd decompress time/TPC-H l_comment canonical throughput | 249197114 bytes | 249197114 bytes | 1 |
| vortex:parquet-zstd size/TPC-H l_comment canonical | 1.3524150688122454 ratio | 1.3525026579922352 ratio | 1.00 |
| vortex:raw size/TPC-H l_comment canonical | 0.3089822621300502 ratio | 0.30899545650436383 ratio | 1.00 |
| vortex size/TPC-H l_comment canonical | 76997488 bytes | 77000776 bytes | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@lwwmanning lwwmanning changed the title from "Experiment to see performance of arc'ing InnerArrayData" to "feat: cheaper cloning of ArrayData" on Dec 30, 2024
2 participants