[RELEASE] cudf v23.08 #13781
Merged
Forward-merge branch-23.06 to branch-23.08
Forward-merge branch-23.06 to branch-23.08
Forward-merge branch-23.06 to branch-23.08
Forward-merge branch-23.06 to branch-23.08
Forward-merge branch-23.06 to branch-23.08
I originally placed the exception handler into a separate C++ header file that could be included by the Cython header because I figured that reading C++ inlined in Cython would be more confusing to devs. Unfortunately, the current approach complicates the build system due to the need to ensure that the directory containing the C++ header is always in the include path, which becomes problematic depending on where the files including the exception handler are (anywhere outside of `_lib` becomes problematic). Inlining is the simplest solution to this problem. Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Ashwin Srinath (https://github.com/shwina) - Bradley Dice (https://github.com/bdice) URL: #13411
Forward-merge branch-23.06 to branch-23.08
Forward-merge branch-23.06 to branch-23.08
Forward-merge branch-23.06 to branch-23.08
Forward-merge branch-23.06 to branch-23.08
Forward-merge branch-23.06 to branch-23.08
Forward-merge branch-23.06 to branch-23.08
Bump up JNI version to 23.08.0-SNAPSHOT in branch-23.08 Authors: - Peixin (https://github.com/pxLi) Approvers: - Nghia Truong (https://github.com/ttnghia) - Jason Lowe (https://github.com/jlowe) URL: #13401
Forward-merge branch-23.06 to branch-23.08
Forward-merge branch-23.06 to branch-23.08
Forward-merge branch-23.06 to branch-23.08
Forward-merge branch-23.06 to branch-23.08
Forward-merge branch-23.06 to branch-23.08
…ply (#13429) Closes #13426 Authors: - https://github.com/brandon-b-miller Approvers: - Matthew Roeschke (https://github.com/mroeschke) - Bradley Dice (https://github.com/bdice) URL: #13429
Cleans up source files for nvtext and io-text pytests. The pytests are placed into separate files: `test_io_text.py` for the io-text pytests and `test_nvtext.py` for the nvtext pytests. Also removed the `python/cudf/cudf/tests/text` folder which contained 2 empty `.py` files. Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Lawrence Mitchell (https://github.com/wence-) URL: #13435
This PR attempts to allow using newer versions of scikit-build again. cf. #13188 Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Bradley Dice (https://github.com/bdice) - AJ Schmidt (https://github.com/ajschmidt8) - Lawrence Mitchell (https://github.com/wence-) URL: #13424
Closes #13412. Removes the weak references to cleaned resources when a resource is cleaned. Cleaned objects are never leaked, so it is safe to remove their weak references. This reduces memory usage. Authors: - Chong Gao (https://github.com/res-life) Approvers: - Jason Lowe (https://github.com/jlowe) - Robert (Bobby) Evans (https://github.com/revans2) - MithunR (https://github.com/mythrocks) URL: #13378
Forward-merge branch-23.06 to branch-23.08
Depends on: rapidsai/rapids-cmake#393. Once the above PR is merged, this updated logic ensures that cudf places the custom versions of the cccl packages in the correct places and can find them once installed. Authors: - Robert Maynard (https://github.com/robertmaynard) - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Bradley Dice (https://github.com/bdice) - Vyas Ramasubramani (https://github.com/vyasr) URL: #13235
Remove/update repeated documentation text. Remove declaration repetitions in `tdigest.hpp`. Authors: - Karthikeyan (https://github.com/karthikeyann) Approvers: - Bradley Dice (https://github.com/bdice) - David Wendt (https://github.com/davidwendt) URL: #13470
Authors: - Robert Maynard (https://github.com/robertmaynard) - Bradley Dice (https://github.com/bdice) Approvers: - Bradley Dice (https://github.com/bdice) URL: #13484
This moves the logic to update the columns returned from the JSON reader to Java. It also updates the code to deal with requested columns that are not in the data. It is not perfect, because it will not work if the input file has no columns at all in it:
```
{}
{}
```
But it fixes issues for a file that has valid columns in it, none of which are the columns that were requested. This is an imperfect workaround for #13473. Authors: - Robert (Bobby) Evans (https://github.com/revans2) Approvers: - Jason Lowe (https://github.com/jlowe) - MithunR (https://github.com/mythrocks) URL: #13477
Currently, the chunked Parquet reader benchmark creates the chunked reader object once and reuses it for all iterations. After the first iteration the source is fully read, so each subsequent iteration returns a single, empty chunk. This PR fixes the use of the chunked reader object: a new object is created for each iteration, and its creation is included in the benchmark timing. Authors: - Vukasin Milovanovic (https://github.com/vuule) Approvers: - Nghia Truong (https://github.com/ttnghia) - Yunsong Wang (https://github.com/PointKernel) URL: #13482
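The benchmark pitfall described above can be reproduced with a toy model. The following is only a sketch: `toy_chunked_reader`, `drain`, and the two `rows_per_iteration_*` helpers are hypothetical names invented for illustration and are not the libcudf chunked reader API. The toy reader simply reports no further chunks once exhausted, which is enough to show why a reused reader measures almost nothing after the first iteration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a chunked reader: it hands out the source in
// fixed-size chunks and keeps a cursor, so it can only be consumed once.
class toy_chunked_reader {
 public:
  toy_chunked_reader(std::vector<int> source, std::size_t chunk_size)
    : source_(std::move(source)), chunk_size_(chunk_size), pos_(0)
  {
    assert(chunk_size_ > 0);
  }

  bool has_next() const { return pos_ < source_.size(); }

  std::vector<int> read_chunk()
  {
    std::size_t const end = std::min(pos_ + chunk_size_, source_.size());
    std::vector<int> chunk(source_.begin() + static_cast<std::ptrdiff_t>(pos_),
                           source_.begin() + static_cast<std::ptrdiff_t>(end));
    pos_ = end;  // the cursor never rewinds
    return chunk;
  }

 private:
  std::vector<int> source_;
  std::size_t chunk_size_;
  std::size_t pos_;
};

// Reads every remaining chunk and returns the number of rows seen.
std::size_t drain(toy_chunked_reader& reader)
{
  std::size_t rows = 0;
  while (reader.has_next()) { rows += reader.read_chunk().size(); }
  return rows;
}

// Problematic pattern: one reader shared by all benchmark iterations.
std::vector<std::size_t> rows_per_iteration_reused(std::vector<int> const& source,
                                                   std::size_t chunk_size,
                                                   int iterations)
{
  toy_chunked_reader reader(source, chunk_size);
  std::vector<std::size_t> rows;
  for (int i = 0; i < iterations; ++i) { rows.push_back(drain(reader)); }
  return rows;
}

// Fixed pattern: a fresh reader is constructed inside every iteration.
std::vector<std::size_t> rows_per_iteration_fresh(std::vector<int> const& source,
                                                  std::size_t chunk_size,
                                                  int iterations)
{
  std::vector<std::size_t> rows;
  for (int i = 0; i < iterations; ++i) {
    toy_chunked_reader reader(source, chunk_size);
    rows.push_back(drain(reader));
  }
  return rows;
}
```

With a 10-row source and three iterations, the reused reader processes all rows only in the first iteration, while the fresh-reader variant does the full work every time.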
Removes the `max_rows_tensor` parameter from the `nvtext::subword_tokenize` API since it is no longer required. The parameter was intended to size the temporary working memory for the internal functions. After some general rework it was no longer used, but it was never removed from the API. Also updates the Python/Cython calls, which had been hard-coding a default value anyway. The problem was found through issue #13458. Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Bradley Dice (https://github.com/bdice) - Divye Gala (https://github.com/divyegala) - Vyas Ramasubramani (https://github.com/vyasr) - Matthew Roeschke (https://github.com/mroeschke) URL: #13463
In #12922, we missed adding a `cuda{{ cuda_major }}_` to the `custreamz` build tag. This PR fixes that. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Ray Douglass (https://github.com/raydouglass) URL: #13754
A fully unbounded window function (i.e. `[unbounded_preceding, unbounded_following]`) need not go through the window function machinery for execution. E.g. consider the following:
```c++
auto grps = { 0, 0, 0, 0, 1, 1, 1, 1, 2, 2 };
auto vals = { 3, 1, 4, 2, 6, 7, 8, 5, 9, 0 };
```
Running the `MIN` window function on the groups, over an `[UNBOUNDED, UNBOUNDED]` window should produce:
```c++
auto res = { 1, 1, 1, 1, 5, 5, 5, 5, 0, 0 };
```
This result could more easily be achieved using a grouped `MIN` aggregation, and replicating each group's result for every entry in the group. This commit adds logic to detect fully unbounded windows, and use `groupby::aggregate()` (when one or more grouping keys are specified), or `reduce()` (when there are no grouping keys).

Tangentially, this change also adds the following:
1. A new overload of `cudf::groupby::groupby::aggregate()` that takes a `stream` parameter.
2. A `detail` header to declare the (pre-existing) `cudf::reduction::detail::reduce()` function.

Authors: - MithunR (https://github.com/mythrocks) Approvers: - Robert (Bobby) Evans (https://github.com/revans2) - Ray Douglass (https://github.com/raydouglass) - Vyas Ramasubramani (https://github.com/vyasr) - Bradley Dice (https://github.com/bdice) URL: #13727
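The "aggregate once per group, then replicate" idea can be illustrated on the host with plain standard-library containers. This is a minimal sketch, not the libcudf implementation: `unbounded_window_min` is a hypothetical helper, and it uses `std::map` where the commit uses `groupby::aggregate()` on the device.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical helper (not part of libcudf): MIN over a fully unbounded
// window, computed as a grouped aggregation followed by replication.
std::vector<int> unbounded_window_min(std::vector<int> const& groups,
                                      std::vector<int> const& values)
{
  assert(groups.size() == values.size());

  // Step 1: grouped MIN aggregation, one result per distinct key.
  std::map<int, int> group_min;
  for (std::size_t i = 0; i < groups.size(); ++i) {
    auto it = group_min.find(groups[i]);
    if (it == group_min.end()) {
      group_min[groups[i]] = values[i];
    } else {
      it->second = std::min(it->second, values[i]);
    }
  }

  // Step 2: replicate each group's result for every row in that group.
  std::vector<int> result;
  result.reserve(groups.size());
  for (int key : groups) { result.push_back(group_min.at(key)); }
  return result;
}
```

Called with the `grps` and `vals` from the commit message, this yields the `res` sequence shown there. Each input row is visited twice in total, rather than once per window position.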
Fixes a typo in the `test.yaml` workflow. See rapidsai/rmm#1310. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - AJ Schmidt (https://github.com/ajschmidt8) URL: #13763
…ge headers (#13707) The current parquet reader assumes that repetition or definition level data with a bit length of 0 will have no data encoded in the header. In the case of V2 headers, this assumption is false. This PR checks the V2 page header data to see if level data needs to be accounted for. Also fixes an error that was present in the RLE data decoder where the encoded length of the RLE data was not skipped properly. Fixes #13655 Authors: - Ed Seidl (https://github.com/etseidl) - Mike Wilson (https://github.com/hyperbolic2346) - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - Vukasin Milovanovic (https://github.com/vuule) - https://github.com/nvdbaranec - GALI PREM SAGAR (https://github.com/galipremsagar) - Yunsong Wang (https://github.com/PointKernel) URL: #13707
…3769) In #12922, we missed adding a `cuda{{ cuda_major }}_` to the `cudf-kafka` and `libcudf-example` build strings. This PR fixes that. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Ray Douglass (https://github.com/raydouglass) - https://github.com/jakirkham URL: #13769
Part of the plan to support AST-based filter predicate pushdown in Parquet. This PR adds predicate pushdown on row group filtering. The statistics of the columns of each row group are loaded into a device column, and an AST filter is applied to the min and max of each column to select the row groups to read. The user-given AST needs to be converted to another AST that can be applied to the min and max values of each column (the 'Statistics AST'). After the row groups are parsed, the user-given AST is applied to the output columns to filter any remaining rows in the row groups.

A new `column_name_reference` is introduced to help users create ASTs that reference columns by name, as the user may or may not have the column index information before reading. Since the AST engine takes only column index references, a transformation is applied to the user-given AST. So, 2 new AST transformation classes are introduced:
1. `named_to_reference_converter` - Converts column name references to column index references.
2. `stats_expression_converter` - Converts the above output table filtering AST to the 'Statistics AST'.

Note: `column_name_reference` is only supported for predicate pushdown filtering; it is not supported for other AST operations such as transforms, joins, etc.

- [x] #13472
- [x] Convert column chunk min, max to cudf type column.
- [x] Add AST filter interface to parquet reader options
- [x] Convert AST to Statistics AST
- [x] Apply statistics AST on Stats values to get row_groups
- [x] Apply AST as filter on output columns.

Depends on #13472

Authors: - Karthikeyan (https://github.com/karthikeyann) Approvers: - Mike Wilson (https://github.com/hyperbolic2346) - Bradley Dice (https://github.com/bdice) - Ray Douglass (https://github.com/raydouglass) URL: #13348
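The two-stage filtering can be sketched for one concrete predicate, `col > threshold`. This is an illustration under stated assumptions, not the PR's code: `row_group_stats`, `select_row_groups_greater`, and `filter_rows_greater` are hypothetical names, and the hand-written rule "a group can match only if its max exceeds the threshold" stands in for what a statistics expression would encode for this one comparison.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the min/max statistics of one column
// in one row group.
struct row_group_stats {
  int min;
  int max;
};

// Stage 1 (statistics filter): for the predicate `col > threshold`, a row
// group can contain a matching row only if its max exceeds the threshold.
// Returns the indices of the row groups that still need to be read.
std::vector<std::size_t> select_row_groups_greater(
  std::vector<row_group_stats> const& stats, int threshold)
{
  std::vector<std::size_t> selected;
  for (std::size_t i = 0; i < stats.size(); ++i) {
    assert(stats[i].min <= stats[i].max);
    if (stats[i].max > threshold) { selected.push_back(i); }
  }
  return selected;
}

// Stage 2 (row filter): the original predicate is applied to the rows of
// the surviving row groups, since a selected group may still hold
// non-matching rows.
std::vector<int> filter_rows_greater(std::vector<int> const& rows, int threshold)
{
  std::vector<int> kept;
  for (int value : rows) {
    if (value > threshold) { kept.push_back(value); }
  }
  return kept;
}
```

For three row groups with ranges [0, 9], [10, 19], and [20, 29] and a threshold of 15, the first group is skipped entirely. The second is read because its max is 19, and its rows below or equal to 15 are dropped in the second stage.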
Closes #11675. Adds `read_parquet_metadata` to libcudf. The metadata has the following information:
- schema (type, name, children)
- num_rows
- num_rowgroups
- key-value string metadata in file footer

To Reviewers: Request for adding more information in metadata. Refer to #11214.

Authors: - Karthikeyan (https://github.com/karthikeyann) Approvers: - Vukasin Milovanovic (https://github.com/vuule) - GALI PREM SAGAR (https://github.com/galipremsagar) - Divye Gala (https://github.com/divyegala) - Ray Douglass (https://github.com/raydouglass) URL: #13663
This PR relaxes cudf's protobuf pinnings to help with compatibility issues. `cudf` uses `protobuf` in two places. The first place `protobuf` is used is at build time, to generate a Python module from a `.proto` file in `python/cudf/cmake/Modules/ProtobufHelpers.cmake`: https://github.com/rapidsai/cudf/blob/f8e5a89e983065e1202f1151dd499bea3102a537/python/cudf/cmake/Modules/ProtobufHelpers.cmake#L16-L17 The second place `protobuf` is used is in the generated file `python/cudf/cudf/utils/metadata/orc_column_statistics_pb2.py` which is [imported here](https://github.com/rapidsai/cudf/blob/f8e5a89e983065e1202f1151dd499bea3102a537/python/cudf/cudf/io/orc.py#L14-L16). The generated Python module used at runtime should be compatible with newer versions of `protobuf` than the version used to build the Python module, from my understanding of https://protobuf.dev/support/cross-version-runtime-guarantee/. Therefore, we only require that the runtime pinning of `protobuf` is of the same major version and an equal-or-greater minor version. That allows us to relax this pinning. Follow-up to #12864, see that PR for more context. Authors: - Bradley Dice (https://github.com/bdice) - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - GALI PREM SAGAR (https://github.com/galipremsagar) - Ray Douglass (https://github.com/raydouglass) URL: #13770
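The pinning rule stated above (same major version, equal-or-greater minor version) amounts to a small predicate. The sketch below is illustrative only: `version_pair` and `runtime_is_compatible` are hypothetical names, and the check mirrors the PR author's stated reading of the protobuf guarantee rather than any protobuf or packaging API.

```cpp
#include <cassert>

// Hypothetical (major, minor) version pair used only for this illustration.
struct version_pair {
  int major_version;
  int minor_version;
};

// A runtime version is accepted when it has the same major version as the
// build-time version and an equal-or-greater minor version.
bool runtime_is_compatible(version_pair build, version_pair runtime)
{
  assert(build.major_version >= 0 && build.minor_version >= 0);
  assert(runtime.major_version >= 0 && runtime.minor_version >= 0);
  return runtime.major_version == build.major_version &&
         runtime.minor_version >= build.minor_version;
}
```

Under this rule, a module generated with a hypothetical version 4.21 would accept runtimes 4.21 and 4.23, and reject 4.20 and 5.0.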
This PR pins `dask` & `distributed` to version `2023.7.1` for the `23.08` release. Authors: - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - Ray Douglass (https://github.com/raydouglass) - Peter Andreas Entschev (https://github.com/pentschev)
galipremsagar approved these changes on Aug 9, 2023
ttnghia approved these changes on Aug 9, 2023
❄️ Code freeze for `branch-23.08` and v23.08 release

What does this mean?
Only critical/hotfix level issues should be merged into `branch-23.08` until release (merging of this PR).

What is the purpose of this PR?
`branch-23.08` into `main` for the release