[RELEASE] cudf v23.08 #13781

raydouglass · 2023-07-28T14:36:26Z

❄️ Code freeze for `branch-23.08` and v23.08 release

What does this mean?

Only critical/hotfix level issues should be merged into branch-23.08 until release (merging of this PR).

What is the purpose of this PR?

- Update documentation
- Allow testing for the new release
- Enable a means to merge branch-23.08 into main for the release

Forward-merge branch-23.06 to branch-23.08

I originally placed the exception handler into a separate C++ header file that could be included by the Cython header because I figured that reading C++ inlined in Cython would be more confusing to devs. Unfortunately, the current approach complicates the build system due to the need to ensure that the directory containing the C++ header is always in the include path, which becomes problematic depending on where the files including the exception handler are (anywhere outside of `_lib` becomes problematic). Inlining is the simplest solution to this problem. Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Ashwin Srinath (https://github.com/shwina) - Bradley Dice (https://github.com/bdice) URL: #13411

Forward-merge branch-23.06 to branch-23.08

Bump up JNI version to 23.08.0-SNAPSHOT in branch-23.08 Authors: - Peixin (https://github.com/pxLi) Approvers: - Nghia Truong (https://github.com/ttnghia) - Jason Lowe (https://github.com/jlowe) URL: #13401

Forward-merge branch-23.06 to branch-23.08

…ply (#13429) Closes #13426 Authors: - https://github.com/brandon-b-miller Approvers: - Matthew Roeschke (https://github.com/mroeschke) - Bradley Dice (https://github.com/bdice) URL: #13429

Cleans up source files for nvtext and io-text pytests. The pytests are placed into separate files: `test_io_text.py` for the io-text pytests and `test_nvtext.py` for the nvtext pytests. Also removes the `python/cudf/cudf/tests/text` folder, which contained 2 empty `.py` files. Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Lawrence Mitchell (https://github.com/wence-) URL: #13435

This PR attempts to allow using newer versions of scikit-build again. cf. #13188 Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Bradley Dice (https://github.com/bdice) - AJ Schmidt (https://github.com/ajschmidt8) - Lawrence Mitchell (https://github.com/wence-) URL: #13424

Closes #13412. Removes the weak references to a resource once that resource has been cleaned. Cleaned objects can never leak, so it is safe to drop their weak references; doing so reduces memory usage. Authors: - Chong Gao (https://github.com/res-life) Approvers: - Jason Lowe (https://github.com/jlowe) - Robert (Bobby) Evans (https://github.com/revans2) - MithunR (https://github.com/mythrocks) URL: #13378
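The idea of dropping a tracking reference once its resource is cleaned can be illustrated with a small Python analogy. This is only a sketch under stated assumptions, not the cudf code: the `Resource` and `CleanerRegistry` classes and their methods are hypothetical names invented for illustration.

```python
import weakref


class Resource:
    """Hypothetical stand-in for an object that owns memory."""

    def __init__(self, name):
        self.name = name
        self.closed = False


class CleanerRegistry:
    """Hypothetical registry that tracks live resources through weak references."""

    def __init__(self):
        self._refs = {}  # id(resource) -> weak reference to the resource

    def register(self, resource):
        self._refs[id(resource)] = weakref.ref(resource)

    def clean(self, resource):
        # Release the resource, then drop its tracking entry: a cleaned
        # resource cannot leak, so there is nothing left to watch.
        resource.closed = True
        self._refs.pop(id(resource), None)

    def tracked_count(self):
        return len(self._refs)


registry = CleanerRegistry()
resources = [Resource(f"buf{i}") for i in range(3)]
for r in resources:
    registry.register(r)

registry.clean(resources[0])
```

Without the `pop` in `clean`, the registry would keep one entry per resource ever created, which is the kind of growth the change is described as avoiding.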

Forward-merge branch-23.06 to branch-23.08

Depends on: rapidsai/rapids-cmake#393. Once the above PR is merged, this updated logic ensures that cudf places the custom versions of the cccl packages in the correct places and can find them once installed. Authors: - Robert Maynard (https://github.com/robertmaynard) - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Bradley Dice (https://github.com/bdice) - Vyas Ramasubramani (https://github.com/vyasr) URL: #13235

Remove/update repeated documentation text. Remove declaration repetitions in `tdigest.hpp`. Authors: - Karthikeyan (https://github.com/karthikeyann) Approvers: - Bradley Dice (https://github.com/bdice) - David Wendt (https://github.com/davidwendt) URL: #13470

Authors: - Robert Maynard (https://github.com/robertmaynard) - Bradley Dice (https://github.com/bdice) Approvers: - Bradley Dice (https://github.com/bdice) URL: #13484

This moves the logic that updates the columns returned from the JSON reader into Java. It also updates the code so that it can deal with requested columns that are not in the data. It is not perfect, because it will not work if the input file has no columns at all in it:

```
{}
{}
```

But it fixes issues for a file that has valid columns in it, none of which are the columns that were requested. This is a workaround for #13473. Authors: - Robert (Bobby) Evans (https://github.com/revans2) Approvers: - Jason Lowe (https://github.com/jlowe) - MithunR (https://github.com/mythrocks) URL: #13477

Currently, the chunked Parquet reader benchmark creates the chunked reader object once and reuses it for all iterations. After the first iteration the source is fully read, so each subsequent iteration returns a single empty chunk. This PR fixes the benchmark's use of the chunked reader object; the creation of the object is now included in the benchmark timing. Authors: - Vukasin Milovanovic (https://github.com/vuule) Approvers: - Nghia Truong (https://github.com/ttnghia) - Yunsong Wang (https://github.com/PointKernel) URL: #13482
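The failure mode is easy to reproduce with any stateful reader. The following is a minimal Python sketch, not the libcudf benchmark: `ChunkedReader` and `read_all` are hypothetical stand-ins that only mimic a reader which remembers its position in the source.

```python
class ChunkedReader:
    """Hypothetical reader that hands out a source in fixed-size chunks."""

    def __init__(self, rows, chunk_size):
        self._rows = rows
        self._chunk_size = chunk_size
        self._pos = 0

    def has_next(self):
        return self._pos < len(self._rows)

    def read_chunk(self):
        chunk = self._rows[self._pos:self._pos + self._chunk_size]
        self._pos += len(chunk)
        return chunk


def read_all(reader):
    """Read loop that always returns at least one (possibly empty) chunk."""
    chunks = [reader.read_chunk()]
    while reader.has_next():
        chunks.append(reader.read_chunk())
    return chunks


rows = list(range(10))

# Buggy pattern: one reader shared by every benchmark iteration.
shared = ChunkedReader(rows, 4)
buggy_rows_per_iter = [sum(len(c) for c in read_all(shared)) for _ in range(3)]

# Fixed pattern: a fresh reader inside each iteration.
fixed_rows_per_iter = [
    sum(len(c) for c in read_all(ChunkedReader(rows, 4))) for _ in range(3)
]
```

In the shared-reader case only the first iteration does real work, so the later iterations would measure almost nothing.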

Removes the `max_rows_tensor` parameter from the `nvtext::subword_tokenize` API since it is no longer required. The parameter was intended to size the temporary working memory for the internal functions. After some general rework it was no longer used, but it was never removed from the API. Also updates the Python/Cython calls, which had been hard-coding a default value anyway. The problem was found through issue #13458. Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Bradley Dice (https://github.com/bdice) - Divye Gala (https://github.com/divyegala) - Vyas Ramasubramani (https://github.com/vyasr) - Matthew Roeschke (https://github.com/mroeschke) URL: #13463

In #12922, we missed adding a `cuda{{ cuda_major }}_` to the `custreamz` build tag. This PR fixes that. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Ray Douglass (https://github.com/raydouglass) URL: #13754

A fully unbounded window function (i.e. `[unbounded_preceding, unbounded_following]`) need not go through the window function machinery for execution. For example, consider the following:

```c++
auto grps = { 0, 0, 0, 0, 1, 1, 1, 1, 2, 2 };
auto vals = { 3, 1, 4, 2, 6, 7, 8, 5, 9, 0 };
```

Running the `MIN` window function on the groups, over an `[UNBOUNDED, UNBOUNDED]` window, should produce:

```c++
auto res = { 1, 1, 1, 1, 5, 5, 5, 5, 0, 0 };
```

This result could more easily be achieved by using a grouped `MIN` aggregation and replicating each group's result for every entry in the group. This commit adds logic to detect fully unbounded windows and use `groupby::aggregate()` (when one or more grouping keys are specified) or `reduce()` (when there are no grouping keys).

Tangentially, this change also adds the following:

1. A new overload of `cudf::groupby::groupby::aggregate()` that takes a `stream` parameter.
2. A `detail` header to declare the (pre-existing) `cudf::reduction::detail::reduce()` function.

Authors: - MithunR (https://github.com/mythrocks) Approvers: - Robert (Bobby) Evans (https://github.com/revans2) - Ray Douglass (https://github.com/raydouglass) - Vyas Ramasubramani (https://github.com/vyasr) - Bradley Dice (https://github.com/bdice) URL: #13727
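The equivalence claimed for fully unbounded windows can be checked outside of libcudf. The following is a minimal pure-Python sketch, not the libcudf implementation: it computes a grouped `MIN` and replicates each group's result across that group's rows, falling back to a plain reduction when there are no grouping keys. The helper name `unbounded_window_min` is hypothetical.

```python
def unbounded_window_min(grps, vals):
    """MIN over a fully unbounded window, computed without any windowing.

    With grouping keys: aggregate per group, then broadcast to every row.
    Without grouping keys (grps is None): reduce once, then broadcast.
    """
    if not vals:
        return []
    if grps is None:
        return [min(vals)] * len(vals)

    group_min = {}
    for g, v in zip(grps, vals):
        group_min[g] = v if g not in group_min else min(group_min[g], v)

    # Replicate each group's result for every entry in the group.
    return [group_min[g] for g in grps]


grps = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2]
vals = [3, 1, 4, 2, 6, 7, 8, 5, 9, 0]
res = unbounded_window_min(grps, vals)
```

With the inputs from the commit message, `res` matches the expected window output listed there.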

Fixes a typo in the `test.yaml` workflow. See rapidsai/rmm#1310. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - AJ Schmidt (https://github.com/ajschmidt8) URL: #13763

…ge headers (#13707) The current parquet reader assumes that repetition or definition level data with a bit length of 0 will have no data encoded in the header. In the case of V2 headers, this assumption is false. This PR checks the V2 page header data to see if level data needs to be accounted for. Also fixes an error that was present in the RLE data decoder where the encoded length of the RLE data was not skipped properly. Fixes #13655 Authors: - Ed Seidl (https://github.com/etseidl) - Mike Wilson (https://github.com/hyperbolic2346) - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - Vukasin Milovanovic (https://github.com/vuule) - https://github.com/nvdbaranec - GALI PREM SAGAR (https://github.com/galipremsagar) - Yunsong Wang (https://github.com/PointKernel) URL: #13707

…3769) In #12922, we missed adding a `cuda{{ cuda_major }}_` to the `cudf-kafka` and `libcudf-example` build strings. This PR fixes that. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Ray Douglass (https://github.com/raydouglass) - https://github.com/jakirkham URL: #13769

The plan is to support AST-based filter predicate pushdown in Parquet. This PR adds predicate pushdown for row group filtering. The statistics of the columns of each row group are loaded into a device column, and the AST filter is applied to the min and max of each column to select the row groups to read. The user-given AST needs to be converted to another AST that can be applied to the min and max values of each column (the 'Statistics AST'). After the row groups are parsed, the user-given AST is applied to the output columns to filter out any remaining rows in the row groups.

A new `column_name_reference` is introduced to help users create ASTs that reference columns by name, as the user may or may not have the column index information before reading. Since the AST engine takes only column index references, a transformation is applied to the user-given AST. So, 2 new AST transformation classes are introduced:

1. `named_to_reference_converter` - Converts column name references to column index references
2. `stats_expression_converter` - Converts the above output table filtering AST to the 'Statistics AST'.

Note: `column_name_reference` is only supported for predicate pushdown filtering; it is not supported for other AST operations such as transforms and joins.

- [x] #13472
- [x] Convert column chunk min, max to cudf type column.
- [x] Add AST filter interface to parquet reader options
- [x] Convert AST to Statistics AST
- [x] Apply statistics AST on Stats values to get row_groups
- [x] Apply AST as filter on output columns.

Depends on #13472

Authors: - Karthikeyan (https://github.com/karthikeyann) Approvers: - Mike Wilson (https://github.com/hyperbolic2346) - Bradley Dice (https://github.com/bdice) - Ray Douglass (https://github.com/raydouglass) URL: #13348
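The two-stage filtering described in this PR can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the libcudf code: it handles a single hard-coded predicate, `col > threshold`, whose statistics form is `max > threshold`, and the names `RowGroup`, `may_contain_greater_than` and `read_filtered` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RowGroup:
    values: List[int]  # the column's values stored in this row group
    min_value: int     # per-row-group statistics kept in the file metadata
    max_value: int


def make_row_group(values):
    return RowGroup(values, min(values), max(values))


def may_contain_greater_than(row_group, threshold):
    # Statistics form of `col > threshold`: a row can only match
    # if the largest value in the row group exceeds the threshold.
    return row_group.max_value > threshold


def read_filtered(row_groups, threshold):
    # Stage 1: prune row groups using only their min/max statistics.
    selected = [rg for rg in row_groups if may_contain_greater_than(rg, threshold)]
    # Stage 2: apply the original predicate to the rows that were read.
    rows = [v for rg in selected for v in rg.values if v > threshold]
    return len(selected), rows


row_groups = [
    make_row_group([1, 5, 3]),
    make_row_group([10, 12, 7]),
    make_row_group([20, 25, 22]),
]
num_read, rows = read_filtered(row_groups, 8)
```

The first row group is skipped without being read, while the second is read and still needs the row-level filter to drop the value 7. That is why both stages are needed.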

Closes #11675. Adds `read_parquet_metadata` to libcudf. The metadata has the following information:

- schema - (type, name, children)
- num_rows
- num_rowgroups
- key-value string metadata in file footer

To reviewers: for requests to add more information to the metadata, refer to #11214.

Authors: - Karthikeyan (https://github.com/karthikeyann) Approvers: - Vukasin Milovanovic (https://github.com/vuule) - GALI PREM SAGAR (https://github.com/galipremsagar) - Divye Gala (https://github.com/divyegala) - Ray Douglass (https://github.com/raydouglass) URL: #13663
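To make the shape of that metadata concrete, here is a small Python model of the listed fields. It is only a sketch: the class names `SchemaNode` and `ParquetMetadata`, the helper `column_paths`, and the sample values are all hypothetical and do not reflect the actual libcudf types returned by `read_parquet_metadata`.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SchemaNode:
    type: str
    name: str
    children: List["SchemaNode"] = field(default_factory=list)


@dataclass
class ParquetMetadata:
    schema: SchemaNode
    num_rows: int
    num_rowgroups: int
    key_value_metadata: Dict[str, str] = field(default_factory=dict)


def column_paths(node, prefix=""):
    """Walk the schema tree and return the dotted path of every leaf column."""
    path = f"{prefix}.{node.name}" if prefix else node.name
    if not node.children:
        return [path]
    paths = []
    for child in node.children:
        paths.extend(column_paths(child, path))
    return paths


# Made-up example values, for illustration only.
meta = ParquetMetadata(
    schema=SchemaNode(
        "struct",
        "schema",
        [
            SchemaNode("int32", "id"),
            SchemaNode("struct", "point", [SchemaNode("float", "x"), SchemaNode("float", "y")]),
        ],
    ),
    num_rows=100,
    num_rowgroups=2,
    key_value_metadata={"writer": "example"},
)
leaves = column_paths(meta.schema)
```

Because the schema is a tree of (type, name, children) nodes, nested columns can be discovered with a simple recursive walk.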

This PR relaxes cudf's protobuf pinnings to help with compatibility issues. `cudf` uses `protobuf` in two places. The first place `protobuf` is used is at build time, to generate a Python module from a `.proto` file in `python/cudf/cmake/Modules/ProtobufHelpers.cmake`: https://github.com/rapidsai/cudf/blob/f8e5a89e983065e1202f1151dd499bea3102a537/python/cudf/cmake/Modules/ProtobufHelpers.cmake#L16-L17 The second place `protobuf` is used is in the generated file `python/cudf/cudf/utils/metadata/orc_column_statistics_pb2.py` which is [imported here](https://github.com/rapidsai/cudf/blob/f8e5a89e983065e1202f1151dd499bea3102a537/python/cudf/cudf/io/orc.py#L14-L16). The generated Python module used at runtime should be compatible with newer versions of `protobuf` than the version used to build the Python module, from my understanding of https://protobuf.dev/support/cross-version-runtime-guarantee/. Therefore, we only require that the runtime pinning of `protobuf` is of the same major version and an equal-or-greater minor version. That allows us to relax this pinning. Follow-up to #12864, see that PR for more context. Authors: - Bradley Dice (https://github.com/bdice) - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - GALI PREM SAGAR (https://github.com/galipremsagar) - Ray Douglass (https://github.com/raydouglass) URL: #13770
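The compatibility rule stated above (same major version, equal-or-greater minor version at runtime) can be written as a small check. This is a sketch of the rule only, not code from cudf or protobuf: the function names are hypothetical and the version strings are made-up examples.

```python
def parse_major_minor(version):
    """Return (major, minor) from a dotted version string such as '4.21.0'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def runtime_is_compatible(build_version, runtime_version):
    """Same major version, and a runtime minor version >= the build-time one."""
    build_major, build_minor = parse_major_minor(build_version)
    run_major, run_minor = parse_major_minor(runtime_version)
    return run_major == build_major and run_minor >= build_minor


# Illustrative version numbers only.
newer_minor_ok = runtime_is_compatible("4.21.0", "4.23.1")
older_minor_ok = runtime_is_compatible("4.21.0", "4.20.5")
other_major_ok = runtime_is_compatible("4.21.0", "5.0.0")
```

Under this rule the patch component is ignored, so only the major and minor parts of the pin need to be constrained.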

review-notebook-app · 2023-07-28T14:36:33Z

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

This PR pins `dask` & `distributed` to version `2023.7.1` for the `23.08` release. Authors: - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - Ray Douglass (https://github.com/raydouglass) - Peter Andreas Entschev (https://github.com/pentschev)

raydouglass and others added 30 commits May 19, 2023 09:51

DOC

0b566b9

Merge pull request #13416 from rapidsai/branch-23.06

c823dd3

Forward-merge branch-23.06 to branch-23.08

Merge pull request #13420 from rapidsai/branch-23.06

212b1c0

Forward-merge branch-23.06 to branch-23.08

Merge pull request #13421 from rapidsai/branch-23.06

905c61e

Forward-merge branch-23.06 to branch-23.08

Merge pull request #13425 from rapidsai/branch-23.06

097b828

Forward-merge branch-23.06 to branch-23.08

Merge pull request #13427 from rapidsai/branch-23.06

9a0f87c

Forward-merge branch-23.06 to branch-23.08

Merge pull request #13430 from rapidsai/branch-23.06

aaf9362

Forward-merge branch-23.06 to branch-23.08

Merge pull request #13432 from rapidsai/branch-23.06

e29c691

Forward-merge branch-23.06 to branch-23.08

Merge pull request #13436 from rapidsai/branch-23.06

fd13c87

Forward-merge branch-23.06 to branch-23.08

Merge pull request #13439 from rapidsai/branch-23.06

0f0ebfd

Forward-merge branch-23.06 to branch-23.08

Merge pull request #13441 from rapidsai/branch-23.06

0536a3a

Forward-merge branch-23.06 to branch-23.08

Merge pull request #13443 from rapidsai/branch-23.06

5d5d367

Forward-merge branch-23.06 to branch-23.08

Init JNI version 23.08.0-SNAPSHOT (#13401)

a03da13

Bump up JNI version to 23.08.0-SNAPSHOT in branch-23.08 Authors: - Peixin (https://github.com/pxLi) Approvers: - Nghia Truong (https://github.com/ttnghia) - Jason Lowe (https://github.com/jlowe) URL: #13401

Merge pull request #13445 from rapidsai/branch-23.06

126fa35

Forward-merge branch-23.06 to branch-23.08

Merge pull request #13446 from rapidsai/branch-23.06

7f97b27

Forward-merge branch-23.06 to branch-23.08

Merge pull request #13447 from rapidsai/branch-23.06

2def7f1

Forward-merge branch-23.06 to branch-23.08

Merge pull request #13448 from rapidsai/branch-23.06

53c685b

Forward-merge branch-23.06 to branch-23.08

Merge pull request #13451 from rapidsai/branch-23.06

37f76c8

Forward-merge branch-23.06 to branch-23.08

Reject functions without bytecode from _can_be_jitted in GroupBy Ap…

5b3e3ab

…ply (#13429) Closes #13426 Authors: - https://github.com/brandon-b-miller Approvers: - Matthew Roeschke (https://github.com/mroeschke) - Bradley Dice (https://github.com/bdice) URL: #13429

Merge pull request #13471 from rapidsai/branch-23.06

5e12c25

Forward-merge branch-23.06 to branch-23.08

Remove applying nvbench that doesn't exist in 23.08 (#13484)

8c6c087

Authors: - Robert Maynard (https://github.com/robertmaynard) - Bradley Dice (https://github.com/bdice) Approvers: - Bradley Dice (https://github.com/bdice) URL: #13484

bdice and others added 8 commits July 26, 2023 05:07

Fix typo in wheels-test.yaml. (#13763)

f8e5a89

Fixes a typo in the `test.yaml` workflow. See rapidsai/rmm#1310. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - AJ Schmidt (https://github.com/ajschmidt8) URL: #13763

raydouglass requested review from a team as code owners July 28, 2023 14:36

raydouglass requested review from wence-, galipremsagar, harrism and PointKernel July 28, 2023 14:36

github-actions bot added the libcudf, Python, CMake, conda, and Java labels Jul 28, 2023

galipremsagar and others added 2 commits August 2, 2023 15:13

Update Changelog [skip ci]

9d79487

galipremsagar approved these changes Aug 9, 2023


ttnghia approved these changes Aug 9, 2023


raydouglass merged commit d9589b7 into main Aug 9, 2023
