[RELEASE] kvikio v23.12 #322

raydouglass · 2023-11-30T19:31:15Z

❄️ Code freeze for `branch-23.12` and v23.12 release

What does this mean?

Only critical/hotfix level issues should be merged into branch-23.12 until release (merging of this PR).

What is the purpose of this PR?

- Update documentation
- Allow testing for the new release
- Enable a means to merge branch-23.12 into main for the release

Forward-merge branch-23.10 to branch-23.12

Merge branch-23.10 into branch-23.12 and fix devcontainer CI workflow.

This PR builds conda packages using CUDA 12 on ARM. Closes #281.

Authors:
- Bradley Dice (https://github.com/bdice)

Approvers:
- Ray Douglass (https://github.com/raydouglass)

URL: #282

This PR contains a set of performance-related improvements for the batch nvCOMP codec. The short summary of the changes:

* Replaced multiple calls to CUDA `memcpyAsync` with a single call to a CUDA kernel.
* Removed redundant memory allocations and copies (some of them are the result of the previous change).
* Vectorized loops, removed redundant loops.

As a result, decompression throughput on ERA5 data increased from **4** GB/s to **31** GB/s for the LZ4 algorithm. For highly compressible data from the [nvCOMP benchmark](https://github.com/NVIDIA/nvcomp/blob/main/doc/Benchmarks.md), the increase is even higher: from **6** GB/s to about **110** GB/s. Other algorithms, such as GDeflate, show performance improvements as well. Compression throughput also improved, though the main target was decompression (a compress-once, decompress-many scenario).

Limitations:

* These improvements are available only when directly using the codec's batch methods, such as `decode_batch`, while passing reasonably sized batches to saturate the GPU. That means these changes will not be available (for now) to `zarr` users, because `zarr` serializes the calls into a sequence of `decode` calls.
* To get maximum performance, users should use equal-sized chunks (this is the default behavior in most cases anyway, such as `zarr`).

Authors:
- Alexey Kamenev (https://github.com/Alexey-Kamenev)
- Mads R. B. Kristensen (https://github.com/madsbk)

Approvers:
- Mads R. B. Kristensen (https://github.com/madsbk)
- Lawrence Mitchell (https://github.com/wence-)

URL: #293
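The difference between the batch path and the serialized path described in #293 can be illustrated with a CPU-only sketch. Everything here is an assumption made for illustration: `ToyBatchCodec` and `split_equal` are hypothetical names, and `zlib` stands in for nvCOMP, so the sketch shows only the call pattern, not the GPU performance behavior.

```python
import zlib
from typing import List


class ToyBatchCodec:
    """Hypothetical CPU stand-in; NOT the kvikio/nvCOMP batch codec."""

    def encode_batch(self, bufs: List[bytes]) -> List[bytes]:
        # A GPU codec could process the whole batch at once;
        # this stand-in simply loops on the CPU.
        return [zlib.compress(b) for b in bufs]

    def decode(self, buf: bytes) -> bytes:
        # Single-buffer path: one call per chunk.
        return zlib.decompress(buf)

    def decode_batch(self, bufs: List[bytes]) -> List[bytes]:
        # Batch path: the caller hands over every chunk in one call.
        return [zlib.decompress(b) for b in bufs]


def split_equal(data: bytes, chunk_size: int) -> List[bytes]:
    """Split data into chunk_size pieces; only the last piece may be shorter."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


data = bytes(range(256)) * 64        # 16 KiB of sample data
chunks = split_equal(data, 4096)     # equal-sized 4 KiB chunks
codec = ToyBatchCodec()
encoded = codec.encode_batch(chunks)

# Batch path: a single call covering all chunks.
batch_out = codec.decode_batch(encoded)

# Serialized path: a sequence of per-chunk decode calls.
serial_out = [codec.decode(c) for c in encoded]

# Both paths yield the same bytes; only the call pattern differs.
assert batch_out == serial_out
assert b"".join(batch_out) == data
```

Both paths return identical data. According to the PR description, the speedups apply only to the first pattern, where the codec sees the whole batch in one call.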

Forward-merge branch-23.10 to branch-23.12

This PR switches back to using `branch-23.12` for CI workflows because the CUDA 12 ARM conda migration is complete.

Authors:
- Bradley Dice (https://github.com/bdice)

Approvers:
- Jake Awe (https://github.com/AyodeAwe)

URL: #304

Fixes #270

Authors:
- Mads R. B. Kristensen (https://github.com/madsbk)

Approvers:
- Lawrence Mitchell (https://github.com/wence-)

URL: #294

In this PR we introduce `CudaCodec`, which is a base class for all CUDA codecs/compressors. This makes it possible to detect if a user tries to open a Zarr file using an incompatible compressor (see #297). Additionally, `kvikio.zarr.open_cupy_array()` now handles `mode="a"`.

Authors:
- Mads R. B. Kristensen (https://github.com/madsbk)

Approvers:
- Lawrence Mitchell (https://github.com/wence-)

URL: #298
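A common base class allows a compatibility check to be a single `isinstance` test. The sketch below is a rough illustration under stated assumptions, not kvikio's implementation: the `CudaCodec` defined here is a local stand-in for the real class, and `ToyGpuCodec`, `ToyCpuCodec`, and `check_compressor` are hypothetical names. How kvikio reacts to a mismatch is not stated in the PR description; this sketch simply raises.

```python
class CudaCodec:
    """Local stand-in for a base class marking device-side codecs."""


class ToyGpuCodec(CudaCodec):
    """Hypothetical codec that would operate on GPU memory."""


class ToyCpuCodec:
    """Hypothetical host-only codec; does not derive from CudaCodec."""


def check_compressor(compressor):
    """Return the compressor if it is usable on device memory.

    None (no compressor) is accepted; any other object must derive
    from CudaCodec. Raising TypeError is this sketch's own choice.
    """
    if compressor is not None and not isinstance(compressor, CudaCodec):
        raise TypeError(
            f"{type(compressor).__name__} is not a CUDA codec"
        )
    return compressor


check_compressor(ToyGpuCodec())   # accepted
check_compressor(None)            # accepted: no compression

try:
    check_compressor(ToyCpuCodec())
except TypeError as exc:
    print(f"rejected: {exc}")
```

Marking codecs through inheritance keeps the check independent of any particular algorithm, so a new CUDA codec is recognized as soon as it derives from the base class.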

Removing an old and broken thread-pool module:

```python
kvikio.thread_pool.num_threads_reset()
kvikio.thread_pool.get_num_threads()
```

Use the `defaults` module instead:

```python
kvikio.defaults.num_threads_reset()
kvikio.defaults.get_num_threads()
```

Authors:
- Mads R. B. Kristensen (https://github.com/madsbk)

Approvers:
- Lawrence Mitchell (https://github.com/wence-)

URL: #308

... also added some more examples.

Authors:
- Mads R. B. Kristensen (https://github.com/madsbk)

Approvers:
- Lawrence Mitchell (https://github.com/wence-)

URL: #312

Update the nvCOMP version used for compression/decompression to 3.0.4.

See also:
- rapidsai/cudf#13815
- rapidsai/rapids-cmake#451

Authors:
- Vukasin Milovanovic (https://github.com/vuule)
- Bradley Dice (https://github.com/bdice)

Approvers:
- Bradley Dice (https://github.com/bdice)
- Mads R. B. Kristensen (https://github.com/madsbk)
- Ray Douglass (https://github.com/raydouglass)

URL: #314

Accidentally didn't commit this change in #314.

Update to use non-deprecated signatures for `rapids_export` functions.

Authors:
- Robert Maynard (https://github.com/robertmaynard)
- Bradley Dice (https://github.com/bdice)

Approvers:
- Bradley Dice (https://github.com/bdice)

URL: #301

copy-pr-bot · 2023-11-30T19:31:19Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

raydouglass and others added 19 commits September 22, 2023 10:58

v23.12 Updates [skip ci]

81c9a45

Merge pull request #288 from rapidsai/branch-23.10

43f273a

Forward-merge branch-23.10 to branch-23.12

Merge branch-23.10 into branch-23.12

0947248

Use short tag in devcontainer versions.

bd88354

Merge pull request #292 from bdice/branch-23.12-merge-23.10

f0878b9

Merge branch-23.10 into branch-23.12 and fix devcontainer CI workflow.

kvikio: Build CUDA 12.0 ARM conda packages. (#282)

b046816


Merge pull request #302 from rapidsai/branch-23.10

dfa2d31

Forward-merge branch-23.10 to branch-23.12

Merge pull request #303 from rapidsai/branch-23.10

aadd6bb

Forward-merge branch-23.10 to branch-23.12

Use branch-23.12 workflows. (#304)

ace782f


update workflow links (#305)

38538eb

updated the nvcomp notebook to use the new API (#294)

822a944


Support no compressor in open_cupy_array() (#312)

34f6d8e


Revert rapids-cmake branch. (#316)

e1762f2

Accidentally didn't commit this change in #314.

Enable build concurrency for nightly and merge triggers. (#319)

dffcc3b

raydouglass requested review from a team as code owners November 30, 2023 19:31

Update Changelog [skip ci]

26efdd1

raydouglass merged commit 8bbd481 into main Dec 6, 2023
2 of 3 checks passed
