diff --git a/CHANGELOG.md b/CHANGELOG.md index 3cb6caa25ee..bce764f59e3 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,3 +1,228 @@ +# cuDF 24.02.00 (12 Feb 2024) + +## 🚨 Breaking Changes + +- Remove **kwargs from astype ([#14765](https://github.com/rapidsai/cudf/pull/14765)) [@mroeschke](https://github.com/mroeschke) +- Remove mimesis as a testing dependency ([#14723](https://github.com/rapidsai/cudf/pull/14723)) [@mroeschke](https://github.com/mroeschke) +- Update to Dask's `shuffle_method` kwarg ([#14708](https://github.com/rapidsai/cudf/pull/14708)) [@pentschev](https://github.com/pentschev) +- Drop Pascal GPU support. ([#14630](https://github.com/rapidsai/cudf/pull/14630)) [@bdice](https://github.com/bdice) +- Update to CCCL 2.2.0. ([#14576](https://github.com/rapidsai/cudf/pull/14576)) [@bdice](https://github.com/bdice) +- Expunge as_frame conversions in Column algorithms ([#14491](https://github.com/rapidsai/cudf/pull/14491)) [@wence-](https://github.com/wence-) +- Deprecate cudf::make_strings_column accepting typed offsets ([#14461](https://github.com/rapidsai/cudf/pull/14461)) [@davidwendt](https://github.com/davidwendt) +- Remove deprecated nvtext::load_merge_pairs_file ([#14460](https://github.com/rapidsai/cudf/pull/14460)) [@davidwendt](https://github.com/davidwendt) +- Include writer code and writerVersion in ORC files ([#14458](https://github.com/rapidsai/cudf/pull/14458)) [@vuule](https://github.com/vuule) +- Remove null mask for zero nulls in json readers ([#14451](https://github.com/rapidsai/cudf/pull/14451)) [@karthikeyann](https://github.com/karthikeyann) +- REF: Remove **kwargs from to_pandas, raise if nullable is not implemented ([#14438](https://github.com/rapidsai/cudf/pull/14438)) [@mroeschke](https://github.com/mroeschke) +- Consolidate 1D pandas object handling in as_column ([#14394](https://github.com/rapidsai/cudf/pull/14394)) [@mroeschke](https://github.com/mroeschke) +- Move chars column to parent data buffer in strings column 
([#14202](https://github.com/rapidsai/cudf/pull/14202)) [@karthikeyann](https://github.com/karthikeyann) +- Switch to scikit-build-core ([#13531](https://github.com/rapidsai/cudf/pull/13531)) [@vyasr](https://github.com/vyasr) + +## 🐛 Bug Fixes + +- Exclude tests from builds ([#14981](https://github.com/rapidsai/cudf/pull/14981)) [@vyasr](https://github.com/vyasr) +- Fix the bounce buffer size in ORC writer ([#14947](https://github.com/rapidsai/cudf/pull/14947)) [@vuule](https://github.com/vuule) +- Revert sum/product aggregation to always produce `int64_t` type ([#14907](https://github.com/rapidsai/cudf/pull/14907)) [@SurajAralihalli](https://github.com/SurajAralihalli) +- Fixed an issue with output chunking computation stemming from input chunking. ([#14889](https://github.com/rapidsai/cudf/pull/14889)) [@nvdbaranec](https://github.com/nvdbaranec) +- Fix total_byte_size in Parquet row group metadata ([#14802](https://github.com/rapidsai/cudf/pull/14802)) [@etseidl](https://github.com/etseidl) +- Fix index difference to follow the pandas format ([#14789](https://github.com/rapidsai/cudf/pull/14789)) [@amiralimi](https://github.com/amiralimi) +- Fix shared-workflows repo name ([#14784](https://github.com/rapidsai/cudf/pull/14784)) [@raydouglass](https://github.com/raydouglass) +- Remove unparseable attributes from all nodes ([#14780](https://github.com/rapidsai/cudf/pull/14780)) [@vyasr](https://github.com/vyasr) +- Refactor and add validation to IntervalIndex.__init__ ([#14778](https://github.com/rapidsai/cudf/pull/14778)) [@mroeschke](https://github.com/mroeschke) +- Work around incompatibilities between V2 page header handling and zStandard compression in Parquet writer ([#14772](https://github.com/rapidsai/cudf/pull/14772)) [@etseidl](https://github.com/etseidl) +- Fix calls to deprecated strings factory API ([#14771](https://github.com/rapidsai/cudf/pull/14771)) [@davidwendt](https://github.com/davidwendt) +- Fix ptx file discovery in editable installs 
([#14767](https://github.com/rapidsai/cudf/pull/14767)) [@vyasr](https://github.com/vyasr) +- Revise ``shuffle`` deprecation to align with dask/dask ([#14762](https://github.com/rapidsai/cudf/pull/14762)) [@rjzamora](https://github.com/rjzamora) +- Enable intermediate proxies to be picklable ([#14752](https://github.com/rapidsai/cudf/pull/14752)) [@shwina](https://github.com/shwina) +- Add CUDF_TEST_PROGRAM_MAIN macro to tests lacking it ([#14751](https://github.com/rapidsai/cudf/pull/14751)) [@etseidl](https://github.com/etseidl) +- Fix CMake args ([#14746](https://github.com/rapidsai/cudf/pull/14746)) [@vyasr](https://github.com/vyasr) +- Fix logic bug introduced in #14730 ([#14742](https://github.com/rapidsai/cudf/pull/14742)) [@wence-](https://github.com/wence-) +- [Java] Choose The Correct RoundingMode For Checking Decimal OutOfBounds ([#14731](https://github.com/rapidsai/cudf/pull/14731)) [@razajafri](https://github.com/razajafri) +- Fix ``Groupby.get_group`` ([#14728](https://github.com/rapidsai/cudf/pull/14728)) [@rjzamora](https://github.com/rjzamora) +- Ensure that all CUDA kernels in cudf have hidden visibility. 
([#14726](https://github.com/rapidsai/cudf/pull/14726)) [@robertmaynard](https://github.com/robertmaynard) +- Split cuda versions for notebook testing ([#14722](https://github.com/rapidsai/cudf/pull/14722)) [@raydouglass](https://github.com/raydouglass) +- Fix to_numeric not preserving Series index and name ([#14718](https://github.com/rapidsai/cudf/pull/14718)) [@mroeschke](https://github.com/mroeschke) +- Update dask-cudf wheel name ([#14713](https://github.com/rapidsai/cudf/pull/14713)) [@raydouglass](https://github.com/raydouglass) +- Fix strings::contains matching end of string target ([#14711](https://github.com/rapidsai/cudf/pull/14711)) [@davidwendt](https://github.com/davidwendt) +- Update to Dask's `shuffle_method` kwarg ([#14708](https://github.com/rapidsai/cudf/pull/14708)) [@pentschev](https://github.com/pentschev) +- Write file-level statistics when writing ORC files with zero rows ([#14707](https://github.com/rapidsai/cudf/pull/14707)) [@vuule](https://github.com/vuule) +- Potential fix for performance regression in #14415 ([#14706](https://github.com/rapidsai/cudf/pull/14706)) [@etseidl](https://github.com/etseidl) +- Ensure DataFrame column types are preserved during serialization ([#14705](https://github.com/rapidsai/cudf/pull/14705)) [@mroeschke](https://github.com/mroeschke) +- Skip numba test that fails on ARM ([#14702](https://github.com/rapidsai/cudf/pull/14702)) [@brandon-b-miller](https://github.com/brandon-b-miller) +- Allow Z in datetime string parsing in non pandas compat mode ([#14701](https://github.com/rapidsai/cudf/pull/14701)) [@mroeschke](https://github.com/mroeschke) +- Fix nan_as_null not being respected when passing arrow object ([#14688](https://github.com/rapidsai/cudf/pull/14688)) [@mroeschke](https://github.com/mroeschke) +- Fix constructing Series/Index from arrow array and dtype ([#14686](https://github.com/rapidsai/cudf/pull/14686)) [@mroeschke](https://github.com/mroeschke) +- Fix Aggregation Type Promotion: Ensure 
Unsigned Input Types Result in Unsigned Output for Sum and Multiply ([#14679](https://github.com/rapidsai/cudf/pull/14679)) [@SurajAralihalli](https://github.com/SurajAralihalli) +- Add BaseOffset as a final proxy type to pass instancechecks for offsets against `BaseOffset` ([#14678](https://github.com/rapidsai/cudf/pull/14678)) [@shwina](https://github.com/shwina) +- Add row conversion code from spark-rapids-jni ([#14664](https://github.com/rapidsai/cudf/pull/14664)) [@ttnghia](https://github.com/ttnghia) +- Unconditionally export the CCCL path ([#14656](https://github.com/rapidsai/cudf/pull/14656)) [@vyasr](https://github.com/vyasr) +- Ensure libcudf searches for our patched version of CCCL first ([#14655](https://github.com/rapidsai/cudf/pull/14655)) [@robertmaynard](https://github.com/robertmaynard) +- Constrain CUDA in notebook testing to prevent CUDA 12.1 usage until we have pynvjitlink ([#14648](https://github.com/rapidsai/cudf/pull/14648)) [@vyasr](https://github.com/vyasr) +- Fix invalid memory access in Parquet reader ([#14637](https://github.com/rapidsai/cudf/pull/14637)) [@etseidl](https://github.com/etseidl) +- Use column_empty over as_column([]) ([#14632](https://github.com/rapidsai/cudf/pull/14632)) [@mroeschke](https://github.com/mroeschke) +- Add (implicit) handling for torch tensors in is_scalar ([#14623](https://github.com/rapidsai/cudf/pull/14623)) [@wence-](https://github.com/wence-) +- Fix astype/fillna not maintaining column subclass and types ([#14615](https://github.com/rapidsai/cudf/pull/14615)) [@mroeschke](https://github.com/mroeschke) +- Remove non-empty nulls in cudf::get_json_object ([#14609](https://github.com/rapidsai/cudf/pull/14609)) [@davidwendt](https://github.com/davidwendt) +- Remove `cuda::proclaim_return_type` from nested lambda ([#14607](https://github.com/rapidsai/cudf/pull/14607)) [@ttnghia](https://github.com/ttnghia) +- Fix DataFrame.reindex when column reindexing to MultiIndex/RangeIndex 
([#14605](https://github.com/rapidsai/cudf/pull/14605)) [@mroeschke](https://github.com/mroeschke) +- Address potential race conditions in Parquet reader ([#14602](https://github.com/rapidsai/cudf/pull/14602)) [@etseidl](https://github.com/etseidl) +- Fix DataFrame.reindex removing column name ([#14601](https://github.com/rapidsai/cudf/pull/14601)) [@mroeschke](https://github.com/mroeschke) +- Remove unsanitized input test data from copy gtests ([#14600](https://github.com/rapidsai/cudf/pull/14600)) [@davidwendt](https://github.com/davidwendt) +- Fix race detected in Parquet writer ([#14598](https://github.com/rapidsai/cudf/pull/14598)) [@etseidl](https://github.com/etseidl) +- Correct invalid or missing return types ([#14587](https://github.com/rapidsai/cudf/pull/14587)) [@robertmaynard](https://github.com/robertmaynard) +- Fix unsanitized nulls from strings segmented-reduce ([#14586](https://github.com/rapidsai/cudf/pull/14586)) [@davidwendt](https://github.com/davidwendt) +- Upgrade to nvCOMP 3.0.5 ([#14581](https://github.com/rapidsai/cudf/pull/14581)) [@davidwendt](https://github.com/davidwendt) +- Fix unsanitized nulls produced by `cudf::clamp` APIs ([#14580](https://github.com/rapidsai/cudf/pull/14580)) [@davidwendt](https://github.com/davidwendt) +- Fix unsanitized nulls produced by libcudf dictionary decode ([#14578](https://github.com/rapidsai/cudf/pull/14578)) [@davidwendt](https://github.com/davidwendt) +- Fixes a symbol group lookup table issue ([#14561](https://github.com/rapidsai/cudf/pull/14561)) [@elstehle](https://github.com/elstehle) +- Drop llvm16 from cuda118-conda devcontainer image ([#14526](https://github.com/rapidsai/cudf/pull/14526)) [@charlesbluca](https://github.com/charlesbluca) +- REF: Make DataFrame.from_pandas process by column ([#14483](https://github.com/rapidsai/cudf/pull/14483)) [@mroeschke](https://github.com/mroeschke) +- Improve memory footprint of isin by using contains ([#14478](https://github.com/rapidsai/cudf/pull/14478)) 
[@wence-](https://github.com/wence-) +- Move creation of env.yaml outside the current directory ([#14476](https://github.com/rapidsai/cudf/pull/14476)) [@davidwendt](https://github.com/davidwendt) +- Enable `pd.Timestamp` objects to be picklable when `cudf.pandas` is active ([#14474](https://github.com/rapidsai/cudf/pull/14474)) [@shwina](https://github.com/shwina) +- Correct dtype of count aggregations on empty dataframes ([#14473](https://github.com/rapidsai/cudf/pull/14473)) [@wence-](https://github.com/wence-) +- Avoid DataFrame conversion in `MultiIndex.from_pandas` ([#14470](https://github.com/rapidsai/cudf/pull/14470)) [@mroeschke](https://github.com/mroeschke) +- JSON writer: avoid default stream use in `string_scalar` constructors ([#14444](https://github.com/rapidsai/cudf/pull/14444)) [@vuule](https://github.com/vuule) +- Fix default stream use in the CSV reader ([#14443](https://github.com/rapidsai/cudf/pull/14443)) [@vuule](https://github.com/vuule) +- Preserve DataFrame(columns=).columns dtype during empty-like construction ([#14381](https://github.com/rapidsai/cudf/pull/14381)) [@mroeschke](https://github.com/mroeschke) +- Defer PTX file load to runtime ([#13690](https://github.com/rapidsai/cudf/pull/13690)) [@brandon-b-miller](https://github.com/brandon-b-miller) + +## 📖 Documentation + +- Disable parallel build ([#14796](https://github.com/rapidsai/cudf/pull/14796)) [@vyasr](https://github.com/vyasr) +- Add pylibcudf to the docs ([#14791](https://github.com/rapidsai/cudf/pull/14791)) [@vyasr](https://github.com/vyasr) +- Describe unpickling expectations when cudf.pandas is enabled ([#14693](https://github.com/rapidsai/cudf/pull/14693)) [@shwina](https://github.com/shwina) +- Update CONTRIBUTING for pyproject-only builds ([#14653](https://github.com/rapidsai/cudf/pull/14653)) [@vyasr](https://github.com/vyasr) +- More doxygen fixes ([#14639](https://github.com/rapidsai/cudf/pull/14639)) [@vyasr](https://github.com/vyasr) +- Enable doxygen XML 
generation and fix issues ([#14477](https://github.com/rapidsai/cudf/pull/14477)) [@vyasr](https://github.com/vyasr) +- Some doxygen improvements ([#14469](https://github.com/rapidsai/cudf/pull/14469)) [@vyasr](https://github.com/vyasr) +- Remove warning in dask-cudf docs ([#14454](https://github.com/rapidsai/cudf/pull/14454)) [@wence-](https://github.com/wence-) +- Update README links with redirects. ([#14378](https://github.com/rapidsai/cudf/pull/14378)) [@bdice](https://github.com/bdice) +- Add pip install instructions to README ([#13677](https://github.com/rapidsai/cudf/pull/13677)) [@shwina](https://github.com/shwina) + +## 🚀 New Features + +- Add ci check for external kernels ([#14768](https://github.com/rapidsai/cudf/pull/14768)) [@robertmaynard](https://github.com/robertmaynard) +- JSON single quote normalization API ([#14729](https://github.com/rapidsai/cudf/pull/14729)) [@shrshi](https://github.com/shrshi) +- Write cuDF version in Parquet "created_by" metadata field ([#14721](https://github.com/rapidsai/cudf/pull/14721)) [@etseidl](https://github.com/etseidl) +- Implement remaining copying APIs in pylibcudf along with required helper functions ([#14640](https://github.com/rapidsai/cudf/pull/14640)) [@vyasr](https://github.com/vyasr) +- Don't constrain `numba<0.58` ([#14616](https://github.com/rapidsai/cudf/pull/14616)) [@brandon-b-miller](https://github.com/brandon-b-miller) +- Add DELTA_LENGTH_BYTE_ARRAY encoder and decoder for Parquet ([#14590](https://github.com/rapidsai/cudf/pull/14590)) [@etseidl](https://github.com/etseidl) +- JSON - Parse mixed types as string in JSON reader ([#14572](https://github.com/rapidsai/cudf/pull/14572)) [@karthikeyann](https://github.com/karthikeyann) +- JSON quote normalization ([#14545](https://github.com/rapidsai/cudf/pull/14545)) [@shrshi](https://github.com/shrshi) +- Make DefaultHostMemoryAllocator settable ([#14523](https://github.com/rapidsai/cudf/pull/14523)) [@gerashegalov](https://github.com/gerashegalov) +- 
Implement more copying APIs in pylibcudf ([#14508](https://github.com/rapidsai/cudf/pull/14508)) [@vyasr](https://github.com/vyasr) +- Include writer code and writerVersion in ORC files ([#14458](https://github.com/rapidsai/cudf/pull/14458)) [@vuule](https://github.com/vuule) +- Parquet sub-rowgroup reading. ([#14360](https://github.com/rapidsai/cudf/pull/14360)) [@nvdbaranec](https://github.com/nvdbaranec) +- Move chars column to parent data buffer in strings column ([#14202](https://github.com/rapidsai/cudf/pull/14202)) [@karthikeyann](https://github.com/karthikeyann) +- PARQUET-2261 Size Statistics ([#14000](https://github.com/rapidsai/cudf/pull/14000)) [@etseidl](https://github.com/etseidl) +- Improve GroupBy JIT error handling ([#13854](https://github.com/rapidsai/cudf/pull/13854)) [@brandon-b-miller](https://github.com/brandon-b-miller) +- Generate unified Python/C++ docs ([#13846](https://github.com/rapidsai/cudf/pull/13846)) [@vyasr](https://github.com/vyasr) +- Expand JIT groupby test suite ([#13813](https://github.com/rapidsai/cudf/pull/13813)) [@brandon-b-miller](https://github.com/brandon-b-miller) + +## 🛠️ Improvements + +- Pin `pytest<8` ([#14920](https://github.com/rapidsai/cudf/pull/14920)) [@galipremsagar](https://github.com/galipremsagar) +- Move cudf::char_utf8 definition from detail to public header ([#14779](https://github.com/rapidsai/cudf/pull/14779)) [@davidwendt](https://github.com/davidwendt) +- Clean up `TimedeltaIndex.__init__` constructor ([#14775](https://github.com/rapidsai/cudf/pull/14775)) [@mroeschke](https://github.com/mroeschke) +- Clean up `DatetimeIndex.__init__` constructor ([#14774](https://github.com/rapidsai/cudf/pull/14774)) [@mroeschke](https://github.com/mroeschke) +- Some `frame.py` typing, move seldom used methods in `frame.py` ([#14766](https://github.com/rapidsai/cudf/pull/14766)) [@mroeschke](https://github.com/mroeschke) +- Remove **kwargs from astype ([#14765](https://github.com/rapidsai/cudf/pull/14765)) 
[@mroeschke](https://github.com/mroeschke) +- fix benchmarks compatibility with newer pytest-cases ([#14764](https://github.com/rapidsai/cudf/pull/14764)) [@jameslamb](https://github.com/jameslamb) +- Add `pynvjitlink` as a dependency ([#14763](https://github.com/rapidsai/cudf/pull/14763)) [@brandon-b-miller](https://github.com/brandon-b-miller) +- Resolve degenerate performance in `create_structs_data` ([#14761](https://github.com/rapidsai/cudf/pull/14761)) [@SurajAralihalli](https://github.com/SurajAralihalli) +- Simplify ColumnAccessor methods; avoid unnecessary validations ([#14758](https://github.com/rapidsai/cudf/pull/14758)) [@mroeschke](https://github.com/mroeschke) +- Pin pytest-cases<3.8.2 ([#14756](https://github.com/rapidsai/cudf/pull/14756)) [@mroeschke](https://github.com/mroeschke) +- Use _from_data instead of _from_columns for initializing Frame ([#14755](https://github.com/rapidsai/cudf/pull/14755)) [@mroeschke](https://github.com/mroeschke) +- Consolidate cudf object handling in as_column ([#14754](https://github.com/rapidsai/cudf/pull/14754)) [@mroeschke](https://github.com/mroeschke) +- Reduce execution time of Parquet C++ tests ([#14750](https://github.com/rapidsai/cudf/pull/14750)) [@vuule](https://github.com/vuule) +- Implement to_datetime(..., utc=True) ([#14749](https://github.com/rapidsai/cudf/pull/14749)) [@mroeschke](https://github.com/mroeschke) +- Remove usages of rapids-env-update ([#14748](https://github.com/rapidsai/cudf/pull/14748)) [@KyleFromNVIDIA](https://github.com/KyleFromNVIDIA) +- Provide explicit pool size and avoid RMM detail APIs ([#14741](https://github.com/rapidsai/cudf/pull/14741)) [@harrism](https://github.com/harrism) +- Implement `cudf.MultiIndex.from_arrays` ([#14740](https://github.com/rapidsai/cudf/pull/14740)) [@mroeschke](https://github.com/mroeschke) +- Remove unused/single use methods ([#14739](https://github.com/rapidsai/cudf/pull/14739)) [@mroeschke](https://github.com/mroeschke) +- refactor CUDA versions in 
dependencies.yaml ([#14733](https://github.com/rapidsai/cudf/pull/14733)) [@jameslamb](https://github.com/jameslamb) +- Remove unneeded methods in Column ([#14730](https://github.com/rapidsai/cudf/pull/14730)) [@mroeschke](https://github.com/mroeschke) +- Clean up base column methods ([#14725](https://github.com/rapidsai/cudf/pull/14725)) [@mroeschke](https://github.com/mroeschke) +- Ensure column.fillna signatures are consistent ([#14724](https://github.com/rapidsai/cudf/pull/14724)) [@mroeschke](https://github.com/mroeschke) +- Remove mimesis as a testing dependency ([#14723](https://github.com/rapidsai/cudf/pull/14723)) [@mroeschke](https://github.com/mroeschke) +- Replace as_numerical with as_numerical_column/codes ([#14719](https://github.com/rapidsai/cudf/pull/14719)) [@mroeschke](https://github.com/mroeschke) +- Use offsetalator in gather_chars ([#14700](https://github.com/rapidsai/cudf/pull/14700)) [@davidwendt](https://github.com/davidwendt) +- Use make_strings_children for fill() specialization logic ([#14697](https://github.com/rapidsai/cudf/pull/14697)) [@davidwendt](https://github.com/davidwendt) +- Change `io::detail::orc` namespace into `io::orc::detail` ([#14696](https://github.com/rapidsai/cudf/pull/14696)) [@ttnghia](https://github.com/ttnghia) +- Fix call to deprecated factory function ([#14695](https://github.com/rapidsai/cudf/pull/14695)) [@davidwendt](https://github.com/davidwendt) +- Use as_column instead of arange for range like inputs ([#14689](https://github.com/rapidsai/cudf/pull/14689)) [@mroeschke](https://github.com/mroeschke) +- Reorganize ORC reader into multiple files and perform some small fixes to cuIO code ([#14665](https://github.com/rapidsai/cudf/pull/14665)) [@ttnghia](https://github.com/ttnghia) +- Split parquet test into multiple files ([#14663](https://github.com/rapidsai/cudf/pull/14663)) [@etseidl](https://github.com/etseidl) +- Custom error messages for IO with nonexistent files 
([#14662](https://github.com/rapidsai/cudf/pull/14662)) [@vuule](https://github.com/vuule) +- Explicitly pass .dtype into is_foo_dtype functions ([#14657](https://github.com/rapidsai/cudf/pull/14657)) [@mroeschke](https://github.com/mroeschke) +- Basic validation in reader benchmarks ([#14647](https://github.com/rapidsai/cudf/pull/14647)) [@vuule](https://github.com/vuule) +- Update dependencies.yaml to support CUDA 12.*. ([#14644](https://github.com/rapidsai/cudf/pull/14644)) [@bdice](https://github.com/bdice) +- Consolidate memoryview handling in as_column ([#14643](https://github.com/rapidsai/cudf/pull/14643)) [@mroeschke](https://github.com/mroeschke) +- Convert `FieldType` to scoped enum ([#14642](https://github.com/rapidsai/cudf/pull/14642)) [@vuule](https://github.com/vuule) +- Use instance over is_foo_dtype ([#14641](https://github.com/rapidsai/cudf/pull/14641)) [@mroeschke](https://github.com/mroeschke) +- Use isinstance over is_foo_dtype internally ([#14638](https://github.com/rapidsai/cudf/pull/14638)) [@mroeschke](https://github.com/mroeschke) +- Remove unnecessary **kwargs in function signatures ([#14635](https://github.com/rapidsai/cudf/pull/14635)) [@mroeschke](https://github.com/mroeschke) +- Drop nvbench patch for nvml. ([#14631](https://github.com/rapidsai/cudf/pull/14631)) [@bdice](https://github.com/bdice) +- Drop Pascal GPU support. 
([#14630](https://github.com/rapidsai/cudf/pull/14630)) [@bdice](https://github.com/bdice) +- Add cpp/doxygen/xml to .gitignore ([#14613](https://github.com/rapidsai/cudf/pull/14613)) [@davidwendt](https://github.com/davidwendt) +- Create strings-specific make_offsets_child_column for multiple offset types ([#14612](https://github.com/rapidsai/cudf/pull/14612)) [@davidwendt](https://github.com/davidwendt) +- Use the offsetalator in cudf::concatenate for strings ([#14611](https://github.com/rapidsai/cudf/pull/14611)) [@davidwendt](https://github.com/davidwendt) +- Make Parquet ColumnIndex null_counts optional ([#14596](https://github.com/rapidsai/cudf/pull/14596)) [@etseidl](https://github.com/etseidl) +- Support `freq` in DatetimeIndex ([#14593](https://github.com/rapidsai/cudf/pull/14593)) [@shwina](https://github.com/shwina) +- Remove legacy benchmarks for cuDF-python ([#14591](https://github.com/rapidsai/cudf/pull/14591)) [@osidekyle](https://github.com/osidekyle) +- Remove WORKSPACE env var from cudf_test temp_directory class ([#14588](https://github.com/rapidsai/cudf/pull/14588)) [@davidwendt](https://github.com/davidwendt) +- Use exceptions instead of return values to handle errors in `CompactProtocolReader` ([#14582](https://github.com/rapidsai/cudf/pull/14582)) [@vuule](https://github.com/vuule) +- Use cuda::proclaim_return_type on device lambdas. ([#14577](https://github.com/rapidsai/cudf/pull/14577)) [@bdice](https://github.com/bdice) +- Update to CCCL 2.2.0. 
([#14576](https://github.com/rapidsai/cudf/pull/14576)) [@bdice](https://github.com/bdice) +- Update dependencies.yaml to new pip index ([#14575](https://github.com/rapidsai/cudf/pull/14575)) [@vyasr](https://github.com/vyasr) +- Simplify Python CMake ([#14565](https://github.com/rapidsai/cudf/pull/14565)) [@vyasr](https://github.com/vyasr) +- Java expose parquet pass_read_limit ([#14564](https://github.com/rapidsai/cudf/pull/14564)) [@revans2](https://github.com/revans2) +- Add column sanitization checks in `CUDF_TEST_EXPECT_COLUMN_*` macros ([#14559](https://github.com/rapidsai/cudf/pull/14559)) [@SurajAralihalli](https://github.com/SurajAralihalli) +- Use cudf_test temp_directory class for nvtext::subword_tokenize gbenchmark ([#14558](https://github.com/rapidsai/cudf/pull/14558)) [@davidwendt](https://github.com/davidwendt) +- Fix return type of prefix increment overloads ([#14544](https://github.com/rapidsai/cudf/pull/14544)) [@vuule](https://github.com/vuule) +- Make bpe_merge_pairs_impl member private ([#14543](https://github.com/rapidsai/cudf/pull/14543)) [@davidwendt](https://github.com/davidwendt) +- Small clean up in `io::statistics` ([#14542](https://github.com/rapidsai/cudf/pull/14542)) [@vuule](https://github.com/vuule) +- Change json gtest environment variable to compile-time definition ([#14541](https://github.com/rapidsai/cudf/pull/14541)) [@davidwendt](https://github.com/davidwendt) +- Remove extra total chars size calculation from cudf::concatenate ([#14540](https://github.com/rapidsai/cudf/pull/14540)) [@davidwendt](https://github.com/davidwendt) +- Refactor IndexedFrame.hash_values to use cudf::hashing functions, add xxhash64 to cudf Python. 
([#14538](https://github.com/rapidsai/cudf/pull/14538)) [@bdice](https://github.com/bdice) +- Move non-templated inline function definitions from table_view.hpp to table_view.cpp ([#14535](https://github.com/rapidsai/cudf/pull/14535)) [@davidwendt](https://github.com/davidwendt) +- Add JNI for strings::code_points ([#14533](https://github.com/rapidsai/cudf/pull/14533)) [@thirtiseven](https://github.com/thirtiseven) +- Add a test for issue 12773 ([#14529](https://github.com/rapidsai/cudf/pull/14529)) [@vyasr](https://github.com/vyasr) +- Split libarrow build dependencies. ([#14506](https://github.com/rapidsai/cudf/pull/14506)) [@bdice](https://github.com/bdice) +- Implement `IndexedFrame.duplicated` with `distinct_indices` + `scatter` ([#14493](https://github.com/rapidsai/cudf/pull/14493)) [@wence-](https://github.com/wence-) +- Expunge as_frame conversions in Column algorithms ([#14491](https://github.com/rapidsai/cudf/pull/14491)) [@wence-](https://github.com/wence-) +- Remove unsanitized null from input strings column in rank_tests.cpp ([#14475](https://github.com/rapidsai/cudf/pull/14475)) [@davidwendt](https://github.com/davidwendt) +- Refactor Parquet kernel_error ([#14464](https://github.com/rapidsai/cudf/pull/14464)) [@etseidl](https://github.com/etseidl) +- Deprecate cudf::make_strings_column accepting typed offsets ([#14461](https://github.com/rapidsai/cudf/pull/14461)) [@davidwendt](https://github.com/davidwendt) +- Remove deprecated nvtext::load_merge_pairs_file ([#14460](https://github.com/rapidsai/cudf/pull/14460)) [@davidwendt](https://github.com/davidwendt) +- Introduce Comprehensive Pathological Unit Tests for Issue #14409 ([#14459](https://github.com/rapidsai/cudf/pull/14459)) [@aocsa](https://github.com/aocsa) +- Expose stream parameter in public nvtext APIs ([#14456](https://github.com/rapidsai/cudf/pull/14456)) [@davidwendt](https://github.com/davidwendt) +- Include encode type in the error message when unsupported Parquet encoding is detected 
([#14453](https://github.com/rapidsai/cudf/pull/14453)) [@ZelboK](https://github.com/ZelboK) +- Remove null mask for zero nulls in json readers ([#14451](https://github.com/rapidsai/cudf/pull/14451)) [@karthikeyann](https://github.com/karthikeyann) +- Refactor cudf.Series.__init__ ([#14450](https://github.com/rapidsai/cudf/pull/14450)) [@mroeschke](https://github.com/mroeschke) +- Remove the use of `volatile` in Parquet ([#14448](https://github.com/rapidsai/cudf/pull/14448)) [@vuule](https://github.com/vuule) +- REF: Remove **kwargs from to_pandas, raise if nullable is not implemented ([#14438](https://github.com/rapidsai/cudf/pull/14438)) [@mroeschke](https://github.com/mroeschke) +- Testing stream pool implementation ([#14437](https://github.com/rapidsai/cudf/pull/14437)) [@shrshi](https://github.com/shrshi) +- Match pandas join ordering obligations in pandas-compatible mode ([#14428](https://github.com/rapidsai/cudf/pull/14428)) [@wence-](https://github.com/wence-) +- Forward-merge branch-23.12 to branch-24.02 ([#14426](https://github.com/rapidsai/cudf/pull/14426)) [@bdice](https://github.com/bdice) +- Use isinstance(..., cudf.IntervalDtype) instead of is_interval_dtype ([#14424](https://github.com/rapidsai/cudf/pull/14424)) [@mroeschke](https://github.com/mroeschke) +- Use isinstance(..., cudf.CategoricalDtype) instead of is_categorical_dtype ([#14423](https://github.com/rapidsai/cudf/pull/14423)) [@mroeschke](https://github.com/mroeschke) +- Forward-merge branch-23.12 to branch-24.02 ([#14422](https://github.com/rapidsai/cudf/pull/14422)) [@bdice](https://github.com/bdice) +- REF: Remove instances of pd.core ([#14421](https://github.com/rapidsai/cudf/pull/14421)) [@mroeschke](https://github.com/mroeschke) +- Expose streams in public filling APIs for label_bins ([#14401](https://github.com/rapidsai/cudf/pull/14401)) [@ZelboK](https://github.com/ZelboK) +- Consolidate 1D pandas object handling in as_column ([#14394](https://github.com/rapidsai/cudf/pull/14394)) 
[@mroeschke](https://github.com/mroeschke) +- Limit DELTA_BINARY_PACKED encoder to the same number of bits as the physical type being encoded ([#14392](https://github.com/rapidsai/cudf/pull/14392)) [@etseidl](https://github.com/etseidl) +- Add SHA-1 and SHA-2 hash functions. ([#14391](https://github.com/rapidsai/cudf/pull/14391)) [@bdice](https://github.com/bdice) +- Expose streams in Parquet reader and writer APIs ([#14359](https://github.com/rapidsai/cudf/pull/14359)) [@shrshi](https://github.com/shrshi) +- Update to fmt 10.1.1 and spdlog 1.12.0. ([#14355](https://github.com/rapidsai/cudf/pull/14355)) [@bdice](https://github.com/bdice) +- Replace default stream for scalars and column factories usages (because of defaulted arguments) ([#14354](https://github.com/rapidsai/cudf/pull/14354)) [@karthikeyann](https://github.com/karthikeyann) +- Expose streams in ORC reader and writer APIs ([#14350](https://github.com/rapidsai/cudf/pull/14350)) [@shrshi](https://github.com/shrshi) +- Convert compression and io to string axis type in IO benchmarks ([#14347](https://github.com/rapidsai/cudf/pull/14347)) [@SurajAralihalli](https://github.com/SurajAralihalli) +- Add cuDF devcontainers ([#14015](https://github.com/rapidsai/cudf/pull/14015)) [@trxcllnt](https://github.com/trxcllnt) +- Refactoring of Buffers (last step towards unifying COW and Spilling) ([#13801](https://github.com/rapidsai/cudf/pull/13801)) [@madsbk](https://github.com/madsbk) +- Switch to scikit-build-core ([#13531](https://github.com/rapidsai/cudf/pull/13531)) [@vyasr](https://github.com/vyasr) +- Simplify null count checking in column equality comparator ([#13312](https://github.com/rapidsai/cudf/pull/13312)) [@vyasr](https://github.com/vyasr) + # cuDF 23.12.00 (6 Dec 2023) ## 🚨 Breaking Changes