
Message Codec #1692

Merged
merged 33 commits into from
Dec 18, 2024


gatesn (Contributor) commented Dec 16, 2024

This PR pulls I/O out of message reading and writing as much as possible, so that encoding and decoding no longer depend on the underlying reader or writer.

  • A MessageEncoder takes a Buffer, a DType, or an ArrayData and produces a Vec<Buffer>.
  • A MessageDecoder takes a BytesMut and either returns a message or reports the number of additional bytes it needs.
  • (A)SyncMessageReader wraps an (Async)Read to produce messages; (A)SyncMessageWriter does the reverse, writing messages to an (Async)Write.
  • (A)SyncIPCReader turns an (Async)Read into an Array(Iterator|Stream).
  • Array(Iterator|Stream)IPC turns an Array(Iterator|Stream) into IPC bytes.

Part of #1676
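The following is a minimal, std-only sketch of the sans-IO split described in the PR description. Everything here is a hypothetical stand-in and not Vortex's API, including the 4-byte little-endian length-prefix framing, the `PollRead` enum, the free `encode` function, and the use of `Vec<u8>` in place of `Buffer`/`BytesMut`. The sketch shows the division of work: `MessageDecoder::read_next` only inspects the bytes it is given and reports how many more it needs. `SyncMessageReader` and `SyncMessageWriter` own the `Read`/`Write` and perform all I/O.

```rust
use std::io::{self, Read, Write};

// Hypothetical framing for illustration only (not Vortex's wire format):
// a 4-byte little-endian length prefix followed by the payload bytes.
const PREFIX_LEN: usize = 4;

/// Outcome of one decode attempt: a full message, or how many more bytes are needed.
#[derive(Debug, PartialEq)]
enum PollRead {
    Message(Vec<u8>),
    NeedMore(usize),
}

/// Sans-IO decoder: it only inspects the buffer it is handed and never reads.
struct MessageDecoder;

impl MessageDecoder {
    fn read_next(&mut self, buf: &mut Vec<u8>) -> PollRead {
        if buf.len() < PREFIX_LEN {
            return PollRead::NeedMore(PREFIX_LEN - buf.len());
        }
        let mut prefix = [0u8; PREFIX_LEN];
        prefix.copy_from_slice(&buf[..PREFIX_LEN]);
        let total = PREFIX_LEN + u32::from_le_bytes(prefix) as usize;
        if buf.len() < total {
            return PollRead::NeedMore(total - buf.len());
        }
        // Detach the complete frame; any bytes after it stay in `buf`.
        let rest = buf.split_off(total);
        let frame = std::mem::replace(buf, rest);
        PollRead::Message(frame[PREFIX_LEN..].to_vec())
    }
}

/// Sans-IO encoder: returns the buffers to write (length prefix, then payload).
fn encode(payload: &[u8]) -> Vec<Vec<u8>> {
    vec![(payload.len() as u32).to_le_bytes().to_vec(), payload.to_vec()]
}

/// Owns the `Write` and performs all output I/O.
struct SyncMessageWriter<W: Write> {
    write: W,
}

impl<W: Write> SyncMessageWriter<W> {
    fn write_message(&mut self, payload: &[u8]) -> io::Result<()> {
        for buf in encode(payload) {
            self.write.write_all(&buf)?;
        }
        Ok(())
    }
}

/// Owns the `Read` and feeds the decoder exactly the bytes it asks for.
struct SyncMessageReader<R: Read> {
    read: R,
    buf: Vec<u8>,
    decoder: MessageDecoder,
}

impl<R: Read> Iterator for SyncMessageReader<R> {
    type Item = io::Result<Vec<u8>>;

    fn next(&mut self) -> Option<Self::Item> {
        loop {
            let need = match self.decoder.read_next(&mut self.buf) {
                PollRead::Message(msg) => return Some(Ok(msg)),
                PollRead::NeedMore(n) => n,
            };
            let start = self.buf.len();
            self.buf.resize(start + need, 0);
            match self.read.read(&mut self.buf[start..]) {
                // EOF: clean if nothing was buffered, otherwise a truncated message.
                Ok(0) => {
                    self.buf.clear();
                    return if start == 0 {
                        None
                    } else {
                        Some(Err(io::Error::new(io::ErrorKind::UnexpectedEof, "truncated message")))
                    };
                }
                Ok(k) => self.buf.truncate(start + k),
                Err(e) if e.kind() == io::ErrorKind::Interrupted => self.buf.truncate(start),
                Err(e) => {
                    self.buf.truncate(start);
                    return Some(Err(e));
                }
            }
        }
    }
}

fn main() -> io::Result<()> {
    let mut writer = SyncMessageWriter { write: Vec::new() };
    writer.write_message(b"dtype")?;
    writer.write_message(b"array")?;
    let bytes = writer.write;

    let reader = SyncMessageReader { read: &bytes[..], buf: Vec::new(), decoder: MessageDecoder };
    let messages: Vec<Vec<u8>> = reader.collect::<io::Result<_>>()?;
    assert_eq!(messages, vec![b"dtype".to_vec(), b"array".to_vec()]);
    Ok(())
}
```

Keeping the decoder free of I/O is what lets sync and async readers share it. An async wrapper would run the same `read_next` loop, awaiting reads from an `AsyncRead` instead of calling `Read::read`.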

@@ -420,22 +421,25 @@ mod tests {
cache: Arc<RwLock<LayoutMessageCache>>,
scan: Scan,
) -> (ChunkedLayoutReader, ChunkedLayoutReader, Bytes, usize) {
let mut writer = MessageWriter::new(Vec::new());
Contributor Author


This messy test code will be refactored when I get around to the layouts.

gatesn enabled auto-merge (squash) December 16, 2024 18:04
gatesn requested a review from a10y December 16, 2024 18:05
vortex-ipc/src/iterator.rs (outdated, resolved)
vortex-ipc/src/iterator.rs (outdated, resolved)
gatesn mentioned this pull request Dec 18, 2024
gatesn added the benchmark label (Run benchmarks on this branch) Dec 18, 2024
github-actions bot removed the benchmark label (Run benchmarks on this branch) Dec 18, 2024
Contributor

Benchmarks: TPC-H

Table of Results
| name | PR (ee07c0d) | base (e69bde8) | ratio (PR/base) | unit |
|---|---|---|---|---|
| tpch_q01/arrow | 555568482 | 5.34084e+08 | 1.04023 | ns |
| tpch_q01/parquet | 784932103 | 7.61457e+08 | 1.03083 | ns |
| tpch_q01/vortex-file-compressed | 461547661 | 4.49313e+08 | 1.02723 | ns |
| tpch_q02/arrow | 140819887 | 1.40496e+08 | 1.00231 | ns |
| tpch_q02/parquet | 174427397 | 1.76227e+08 | 0.989788 | ns |
| tpch_q02/vortex-file-compressed | 145259842 | 1.39365e+08 | 1.0423 | ns |
| tpch_q03/arrow | 167200869 | 1.68887e+08 | 0.990017 | ns |
| tpch_q03/parquet | 381243731 | 3.68085e+08 | 1.03575 | ns |
| tpch_q03/vortex-file-compressed | 215867309 | 2.00769e+08 | 1.07521 | ns |
| tpch_q04/arrow | 182377652 | 1.79005e+08 | 1.01884 | ns |
| tpch_q04/parquet | 215527676 | 2.1764e+08 | 0.990296 | ns |
| tpch_q04/vortex-file-compressed | 161168725 | 1.55702e+08 | 1.03511 | ns |
| tpch_q05/arrow | 328903373 | 3.18328e+08 | 1.03322 | ns |
| tpch_q05/parquet | 511485981 | 4.93208e+08 | 1.03706 | ns |
| tpch_q05/vortex-file-compressed | 357792838 | 3.35241e+08 | 1.06727 | ns |
| tpch_q06/arrow | 27432236 | 2.45436e+07 | 1.1177 | ns |
| tpch_q06/parquet | 151048681 | 1.49635e+08 | 1.00945 | ns |
| tpch_q06/vortex-file-compressed | 69260315 | 5.66152e+07 | 1.22335 | ns |
| tpch_q07/arrow | 638165758 | 6.23654e+08 | 1.02327 | ns |
| tpch_q07/parquet | 798529302 | 7.802e+08 | 1.02349 | ns |
| tpch_q07/vortex-file-compressed | 639152769 | 6.18559e+08 | 1.03329 | ns |
| tpch_q08/arrow | 264037441 | 2.56509e+08 | 1.02935 | ns |
| tpch_q08/parquet | 549274758 | 5.48617e+08 | 1.0012 | ns |
| tpch_q08/vortex-file-compressed | 332827086 | 3.12821e+08 | 1.06396 | ns |
| tpch_q09/arrow | 474684586 | 4.60669e+08 | 1.03042 | ns |
| tpch_q09/parquet | 781891272 | 7.61295e+08 | 1.02705 | ns |
| tpch_q09/vortex-file-compressed | 575375782 | 5.37556e+08 | 1.07035 | ns |
| tpch_q10/arrow | 265915173 | 2.64708e+08 | 1.00456 | ns |
| tpch_q10/parquet | 518792383 | 4.99531e+08 | 1.03856 | ns |
| tpch_q10/vortex-file-compressed | 284063099 | 2.66473e+08 | 1.06601 | ns |
| tpch_q11/arrow | 146908065 | 1.38021e+08 | 1.06439 | ns |
| tpch_q11/parquet | 158961620 | 1.46159e+08 | 1.08759 | ns |
| tpch_q11/vortex-file-compressed | 132947420 | 1.22767e+08 | 1.08292 | ns |
| tpch_q12/arrow | 180759065 | 1.79274e+08 | 1.00828 | ns |
| tpch_q12/parquet | 324638644 | 3.23881e+08 | 1.00234 | ns |
| tpch_q12/vortex-file-compressed | 226867581 | 2.18976e+08 | 1.03604 | ns |
| tpch_q13/arrow | 182508079 | 1.66433e+08 | 1.09659 | ns |
| tpch_q13/parquet | 313032512 | 3.04386e+08 | 1.02841 | ns |
| tpch_q13/vortex-file-compressed | 181144700 | 1.72433e+08 | 1.05052 | ns |
| tpch_q14/arrow | 37532900 | 3.65808e+07 | 1.02603 | ns |
| tpch_q14/parquet | 233967300 | 2.29766e+08 | 1.01828 | ns |
| tpch_q14/vortex-file-compressed | 68259040 | 6.19235e+07 | 1.10231 | ns |
| tpch_q15/arrow | 68405104 | 6.5592e+07 | 1.04289 | ns |
| tpch_q15/parquet | 328692322 | 3.241e+08 | 1.01417 | ns |
| tpch_q15/vortex-file-compressed | 105488468 | 9.77633e+07 | 1.07902 | ns |
| tpch_q16/arrow | 103254548 | 1.02066e+08 | 1.01165 | ns |
| tpch_q16/parquet | 120680966 | 1.17757e+08 | 1.02483 | ns |
| tpch_q16/vortex-file-compressed | 109230676 | 1.04462e+08 | 1.04564 | ns |
| tpch_q17/arrow | 613945214 | 5.97263e+08 | 1.02793 | ns |
| tpch_q17/parquet | 677322959 | 6.64176e+08 | 1.01979 | ns |
| tpch_q17/vortex-file-compressed | 580133277 | 5.66256e+08 | 1.02451 | ns |
| tpch_q18/arrow | 1162619600 | 1.12254e+09 | 1.0357 | ns |
| tpch_q18/parquet | 1379696556 | 1.33662e+09 | 1.03222 | ns |
| tpch_q18/vortex-file-compressed | 1198124263 | 1.15928e+09 | 1.03351 | ns |
| tpch_q19/arrow | 150463779 | 1.46773e+08 | 1.02515 | ns |
| tpch_q19/parquet | 417006444 | 4.16818e+08 | 1.00045 | ns |
| tpch_q19/vortex-file-compressed | 118152578 | 1.11285e+08 | 1.06171 | ns |
| tpch_q20/arrow | 228691568 | 2.11622e+08 | 1.08066 | ns |
| tpch_q20/parquet | 350587697 | 3.4662e+08 | 1.01145 | ns |
| tpch_q20/vortex-file-compressed | 239498933 | 2.26276e+08 | 1.05844 | ns |
| tpch_q21/arrow | 996606324 | 9.86694e+08 | 1.01005 | ns |
| tpch_q21/parquet | 1111927537 | 1.09934e+09 | 1.01145 | ns |
| tpch_q21/vortex-file-compressed | 944068306 | 8.9957e+08 | 1.04947 | ns |
| tpch_q22/arrow | 79029704 | 7.71206e+07 | 1.02475 | ns |
| tpch_q22/parquet | 108946551 | 1.08365e+08 | 1.00536 | ns |
| tpch_q22/vortex-file-compressed | 83131317 | 7.97764e+07 | 1.04205 | ns |
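One way to read the TPC-H ratio column in aggregate is to take the geometric mean of the per-query PR/base ratios. This summary is our choice; the benchmark report does not compute it. The sketch below hard-codes the q01 to q22 ratios for each engine from the TPC-H table and prints one summary number per engine. If this PR leaves the arrow and parquet paths unchanged, their summaries give a rough sense of run-to-run noise against which to judge the vortex-file-compressed figure.

```rust
/// Geometric mean of ratios: exp of the mean of their natural logs.
fn geo_mean(ratios: &[f64]) -> f64 {
    let sum_ln: f64 = ratios.iter().map(|r| r.ln()).sum();
    (sum_ln / ratios.len() as f64).exp()
}

fn main() {
    // "ratio (PR/base)" column for q01..q22, copied from the TPC-H table.
    let arrow = [
        1.04023, 1.00231, 0.990017, 1.01884, 1.03322, 1.1177, 1.02327, 1.02935, 1.03042, 1.00456, 1.06439,
        1.00828, 1.09659, 1.02603, 1.04289, 1.01165, 1.02793, 1.0357, 1.02515, 1.08066, 1.01005, 1.02475,
    ];
    let parquet = [
        1.03083, 0.989788, 1.03575, 0.990296, 1.03706, 1.00945, 1.02349, 1.0012, 1.02705, 1.03856, 1.08759,
        1.00234, 1.02841, 1.01828, 1.01417, 1.02483, 1.01979, 1.03222, 1.00045, 1.01145, 1.01145, 1.00536,
    ];
    let vortex = [
        1.02723, 1.0423, 1.07521, 1.03511, 1.06727, 1.22335, 1.03329, 1.06396, 1.07035, 1.06601, 1.08292,
        1.03604, 1.05052, 1.10231, 1.07902, 1.04564, 1.02451, 1.03351, 1.06171, 1.05844, 1.04947, 1.04205,
    ];
    for (name, ratios) in [("arrow", &arrow[..]), ("parquet", &parquet[..]), ("vortex-file-compressed", &vortex[..])] {
        println!("{name}: geometric-mean ratio {:.4}", geo_mean(ratios));
    }
}
```

The geometric mean suits ratios because a 2x slowdown and a 2x speedup cancel out (ln 2 + ln 0.5 = 0), which an arithmetic mean of ratios would not do.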

gatesn merged commit 46ec5a5 into develop Dec 18, 2024
25 of 26 checks passed
gatesn deleted the ngates/message-reader branch December 18, 2024 17:07
Contributor

Benchmarks: datafusion

Table of Results
| name | PR (ee07c0d) | base (e69bde8) | ratio (PR/base) | unit |
|---|---|---|---|---|
| arrow/planning | 802010 | 810706 | 0.989274 | ns |
| arrow/exec | 1.75345e+06 | 1.77687e+06 | 0.986821 | ns |
| vortex-pushdown-compressed/planning | 502844 | 502831 | 1.00003 | ns |
| vortex-pushdown-compressed/exec | 2.58008e+06 | 2.66525e+06 | 0.968042 | ns |
| vortex-pushdown-uncompressed/planning | 495208 | 500098 | 0.99022 | ns |
| vortex-pushdown-uncompressed/exec | 1.46905e+06 | 1.47402e+06 | 0.996632 | ns |
| vortex-nopushdown-compressed/planning | 826543 | 826219 | 1.00039 | ns |
| vortex-nopushdown-compressed/exec | 3.07315e+06 | 3.23719e+06 | 0.949325 | ns |
| vortex-nopushdown-uncompressed/planning | 817989 | 817236 | 1.00092 | ns |
| vortex-nopushdown-uncompressed/exec | 4.90802e+06 | 5.25119e+06 | 0.934648 | ns |

Contributor

Benchmarks: random_access

Table of Results
| name | PR (ee07c0d) | base (e69bde8) | ratio (PR/base) | unit |
|---|---|---|---|---|
| random-access/vortex-tokio-local-disk | 3.42688e+06 | 3.28355e+06 | 1.04365 | ns |
| random-access/vortex-local-fs | 3.66051e+06 | 3.44311e+06 | 1.06314 | ns |
| random-access/parquet-tokio-local-disk | 2.2467e+08 | 2.29229e+08 | 0.980109 | ns |

Contributor

Benchmarks: compress

Table of Results
| name | PR (ee07c0d) | base (e69bde8) | ratio (PR/base) | unit |
|---|---|---|---|---|
| compress time/taxi | 1.53771e+09 | 1.51292e+09 | 1.01639 | ns |
| compress time/taxi throughput | 0.306176 | 0.311193 | 0.983879 | bytes/ns |
| parquet_rs-zstd compress time/taxi | 1.74745e+09 | 1.71233e+09 | 1.02051 | ns |
| parquet_rs-zstd compress time/taxi throughput | 0.269427 | 0.274953 | 0.979905 | bytes/ns |
| decompress time/taxi | 3.1421e+08 | 2.84684e+08 | 1.10372 | ns |
| decompress time/taxi throughput | 1.4984 | 1.6538 | 0.906031 | bytes/ns |
| parquet_rs-zstd decompress time/taxi | 3.1808e+08 | 3.09861e+08 | 1.02653 | ns |
| parquet_rs-zstd decompress time/taxi throughput | 1.48016 | 1.51943 | 0.974159 | bytes/ns |
| compress time/AirlineSentiment | 644240 | 644067 | 1.00027 | ns |
| compress time/AirlineSentiment throughput | 0.00320688 | 0.00320774 | 0.999731 | bytes/ns |
| parquet_rs-zstd compress time/AirlineSentiment | 56584.2 | 56263 | 1.00571 | ns |
| parquet_rs-zstd compress time/AirlineSentiment throughput | 0.036512 | 0.0367204 | 0.994324 | bytes/ns |
| decompress time/AirlineSentiment | 108181 | 98153 | 1.10217 | ns |
| decompress time/AirlineSentiment throughput | 0.0190975 | 0.0210488 | 0.9073 | bytes/ns |
| parquet_rs-zstd decompress time/AirlineSentiment | 32135.7 | 32965.1 | 0.974841 | ns |
| parquet_rs-zstd decompress time/AirlineSentiment throughput | 0.0642898 | 0.0626723 | 1.02581 | bytes/ns |
| compress time/Arade | 2.92505e+09 | 2.87439e+09 | 1.01763 | ns |
| compress time/Arade throughput | 0.269067 | 0.27381 | 0.982678 | bytes/ns |
| parquet_rs-zstd compress time/Arade | 2.96828e+09 | 2.91794e+09 | 1.01725 | ns |
| parquet_rs-zstd compress time/Arade throughput | 0.265149 | 0.269723 | 0.983041 | bytes/ns |
| decompress time/Arade | 5.32601e+08 | 4.5534e+08 | 1.16968 | ns |
| decompress time/Arade throughput | 1.47772 | 1.72846 | 0.854937 | bytes/ns |
| parquet_rs-zstd decompress time/Arade | 7.05637e+08 | 7.12217e+08 | 0.990761 | ns |
| parquet_rs-zstd decompress time/Arade throughput | 1.11536 | 1.10505 | 1.00933 | bytes/ns |
| compress time/Bimbo | 1.07949e+10 | 1.05893e+10 | 1.01942 | ns |
| compress time/Bimbo throughput | 0.659694 | 0.672505 | 0.98095 | bytes/ns |
| parquet_rs-zstd compress time/Bimbo | 2.1082e+10 | 2.01917e+10 | 1.0441 | ns |
| parquet_rs-zstd compress time/Bimbo throughput | 0.337792 | 0.352688 | 0.957766 | bytes/ns |
| decompress time/Bimbo | 3.98593e+09 | 3.51528e+09 | 1.13389 | ns |
| decompress time/Bimbo throughput | 1.78662 | 2.02583 | 0.881923 | bytes/ns |
| parquet_rs-zstd decompress time/Bimbo | 3.68482e+09 | 4.71623e+09 | 0.781306 | ns |
| parquet_rs-zstd decompress time/Bimbo throughput | 1.93262 | 1.50996 | 1.27991 | bytes/ns |
| compress time/CMSprovider | 1.29603e+10 | 1.2976e+10 | 0.998792 | ns |
| compress time/CMSprovider throughput | 0.397305 | 0.396825 | 1.00121 | bytes/ns |
| parquet_rs-zstd compress time/CMSprovider | 1.9048e+10 | 1.81609e+10 | 1.04885 | ns |
| parquet_rs-zstd compress time/CMSprovider throughput | 0.270327 | 0.283532 | 0.953428 | bytes/ns |
| decompress time/CMSprovider | 2.8862e+09 | 2.74741e+09 | 1.05052 | ns |
| decompress time/CMSprovider throughput | 1.78407 | 1.8742 | 0.951911 | bytes/ns |
| parquet_rs-zstd decompress time/CMSprovider | 5.45315e+09 | 5.58808e+09 | 0.975855 | ns |
| parquet_rs-zstd decompress time/CMSprovider throughput | 0.944262 | 0.921462 | 1.02474 | bytes/ns |
| compress time/Euro2016 | 2.12763e+09 | 2.13338e+09 | 0.997307 | ns |
| compress time/Euro2016 throughput | 0.184833 | 0.184335 | 1.0027 | bytes/ns |
| parquet_rs-zstd compress time/Euro2016 | 1.52744e+09 | 1.52593e+09 | 1.00099 | ns |
| parquet_rs-zstd compress time/Euro2016 throughput | 0.257461 | 0.257716 | 0.999008 | bytes/ns |
| decompress time/Euro2016 | 2.9846e+08 | 2.40469e+08 | 1.24116 | ns |
| decompress time/Euro2016 throughput | 1.31762 | 1.63537 | 0.805699 | bytes/ns |
| parquet_rs-zstd decompress time/Euro2016 | 4.88842e+08 | 4.85464e+08 | 1.00696 | ns |
| parquet_rs-zstd decompress time/Euro2016 throughput | 0.804464 | 0.810063 | 0.993088 | bytes/ns |
| compress time/Food | 1.0527e+09 | 1.10229e+09 | 0.955014 | ns |
| compress time/Food throughput | 0.316064 | 0.301845 | 1.0471 | bytes/ns |
| parquet_rs-zstd compress time/Food | 1.02098e+09 | 1.02752e+09 | 0.993632 | ns |
| parquet_rs-zstd compress time/Food throughput | 0.325886 | 0.32381 | 1.00641 | bytes/ns |
| decompress time/Food | 1.12054e+08 | 1.03929e+08 | 1.07818 | ns |
| decompress time/Food throughput | 2.96931 | 3.20145 | 0.927489 | bytes/ns |
| parquet_rs-zstd decompress time/Food | 2.27703e+08 | 2.22018e+08 | 1.02561 | ns |
| parquet_rs-zstd decompress time/Food throughput | 1.46121 | 1.49862 | 0.975034 | bytes/ns |
| compress time/HashTags | 2.54332e+09 | 2.48356e+09 | 1.02406 | ns |
| compress time/HashTags throughput | 0.31632 | 0.323931 | 0.976504 | bytes/ns |
| parquet_rs-zstd compress time/HashTags | 2.47188e+09 | 2.41279e+09 | 1.02449 | ns |
| parquet_rs-zstd compress time/HashTags throughput | 0.325463 | 0.333433 | 0.976095 | bytes/ns |
| decompress time/HashTags | 4.85423e+08 | 4.21278e+08 | 1.15226 | ns |
| decompress time/HashTags throughput | 1.65732 | 1.90967 | 0.867857 | bytes/ns |
| parquet_rs-zstd decompress time/HashTags | 8.32581e+08 | 7.65004e+08 | 1.08834 | ns |
| parquet_rs-zstd decompress time/HashTags throughput | 0.966276 | 1.05163 | 0.918834 | bytes/ns |
| compress time/TPC-H l_comment chunked without fsst | 3.5047e+09 | 3.15246e+09 | 1.11173 | ns |
| compress time/TPC-H l_comment chunked without fsst throughput | 0.0711118 | 0.0790573 | 0.899497 | bytes/ns |
| parquet_rs-zstd compress time/TPC-H l_comment chunked without fsst | 9.27121e+08 | 9.14426e+08 | 1.01388 | ns |
| parquet_rs-zstd compress time/TPC-H l_comment chunked without fsst throughput | 0.268816 | 0.272548 | 0.986307 | bytes/ns |
| decompress time/TPC-H l_comment chunked without fsst | 1.50194e+08 | 6.91784e+07 | 2.17112 | ns |
| decompress time/TPC-H l_comment chunked without fsst throughput | 1.65935 | 3.60265 | 0.460592 | bytes/ns |
| parquet_rs-zstd decompress time/TPC-H l_comment chunked without fsst | 2.54849e+08 | 2.50136e+08 | 1.01884 | ns |
| parquet_rs-zstd decompress time/TPC-H l_comment chunked without fsst throughput | 0.977933 | 0.996358 | 0.981508 | bytes/ns |
| compress time/TPC-H l_comment chunked | 1.05998e+09 | 9.8815e+08 | 1.07269 | ns |
| compress time/TPC-H l_comment chunked throughput | 0.235123 | 0.252214 | 0.932236 | bytes/ns |
| parquet_rs-zstd compress time/TPC-H l_comment chunked | 9.37743e+08 | 9.05976e+08 | 1.03506 | ns |
| parquet_rs-zstd compress time/TPC-H l_comment chunked throughput | 0.265771 | 0.27509 | 0.966124 | bytes/ns |
| decompress time/TPC-H l_comment chunked | 1.04163e+08 | 8.85585e+07 | 1.1762 | ns |
| decompress time/TPC-H l_comment chunked throughput | 2.39265 | 2.81424 | 0.850192 | bytes/ns |
| parquet_rs-zstd decompress time/TPC-H l_comment chunked | 2.53493e+08 | 2.47778e+08 | 1.02307 | ns |
| parquet_rs-zstd decompress time/TPC-H l_comment chunked throughput | 0.983163 | 1.00584 | 0.977454 | bytes/ns |
| compress time/TPC-H l_comment canonical | 1.09052e+09 | 9.86282e+08 | 1.10569 | ns |
| compress time/TPC-H l_comment canonical throughput | 0.228537 | 0.252691 | 0.904415 | bytes/ns |
| parquet_rs-zstd compress time/TPC-H l_comment canonical | 9.24865e+08 | 9.04067e+08 | 1.02301 | ns |
| parquet_rs-zstd compress time/TPC-H l_comment canonical throughput | 0.269471 | 0.27567 | 0.977512 | bytes/ns |
| decompress time/TPC-H l_comment canonical | 9.9677e+07 | 8.82984e+07 | 1.12887 | ns |
| decompress time/TPC-H l_comment canonical throughput | 2.50032 | 2.82253 | 0.885845 | bytes/ns |
| parquet_rs-zstd decompress time/TPC-H l_comment canonical | 2.50838e+08 | 2.48292e+08 | 1.01026 | ns |
| parquet_rs-zstd decompress time/TPC-H l_comment canonical throughput | 0.993566 | 1.00376 | 0.989849 | bytes/ns |
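In the compress table, each "throughput" row carries the same information as the time row just before it. Write $B$ for the number of bytes processed for a given dataset, which is the same in both runs, and $t$ for the measured time. Throughput is then $B/t$ bytes/ns, so the throughput ratio is the reciprocal of the time ratio. The example uses the decompress rows for TPC-H l_comment chunked without fsst from the table.

```latex
\[
\frac{B / t_{\text{PR}}}{B / t_{\text{base}}}
  = \frac{t_{\text{base}}}{t_{\text{PR}}}
  = \left( \frac{t_{\text{PR}}}{t_{\text{base}}} \right)^{-1}
\]
\[
\frac{t_{\text{PR}}}{t_{\text{base}}} = \frac{1.50194 \times 10^{8}}{6.91784 \times 10^{7}} \approx 2.17112,
\qquad
2.17112^{-1} \approx 0.460592
\]
```

A time ratio above 1 (slower) therefore always shows up as a throughput ratio below 1 in the row that follows.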


2 participants