diff --git a/CHANGELOG.md b/CHANGELOG.md index 425165beb70..f1b63afb6f2 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,3 +1,223 @@ +# cuDF 23.04.00 (6 Apr 2023) + +## 🚨 Breaking Changes + +- Pin `dask` and `distributed` for release ([#13070](https://github.com/rapidsai/cudf/pull/13070)) [@galipremsagar](https://github.com/galipremsagar) +- Declare a different name for nan_equality.UNEQUAL to prevent Cython warnings. ([#12947](https://github.com/rapidsai/cudf/pull/12947)) [@bdice](https://github.com/bdice) +- Update minimum `pandas` and `numpy` pinnings ([#12887](https://github.com/rapidsai/cudf/pull/12887)) [@galipremsagar](https://github.com/galipremsagar) +- Deprecate `names` & `dtype` in `Index.copy` ([#12825](https://github.com/rapidsai/cudf/pull/12825)) [@galipremsagar](https://github.com/galipremsagar) +- Deprecate `Index.is_*` methods ([#12820](https://github.com/rapidsai/cudf/pull/12820)) [@galipremsagar](https://github.com/galipremsagar) +- Deprecate `datetime_is_numeric` from `describe` ([#12818](https://github.com/rapidsai/cudf/pull/12818)) [@galipremsagar](https://github.com/galipremsagar) +- Deprecate `na_sentinel` in `factorize` ([#12817](https://github.com/rapidsai/cudf/pull/12817)) [@galipremsagar](https://github.com/galipremsagar) +- Make string methods return a Series with a useful Index ([#12814](https://github.com/rapidsai/cudf/pull/12814)) [@shwina](https://github.com/shwina) +- Produce useful guidance on overflow error in `to_csv` ([#12705](https://github.com/rapidsai/cudf/pull/12705)) [@wence-](https://github.com/wence-) +- Move `strings_udf` code into cuDF ([#12669](https://github.com/rapidsai/cudf/pull/12669)) [@brandon-b-miller](https://github.com/brandon-b-miller) +- Remove cudf::strings::repeat_strings_output_sizes and optional parameter from cudf::strings::repeat_strings ([#12609](https://github.com/rapidsai/cudf/pull/12609)) [@davidwendt](https://github.com/davidwendt) +- Replace message parsing with throwing more specific 
exceptions ([#12426](https://github.com/rapidsai/cudf/pull/12426)) [@vyasr](https://github.com/vyasr) + +## 🐛 Bug Fixes + +- Fix memcheck script to execute only _TEST files found in bin/gtests/libcudf ([#13006](https://github.com/rapidsai/cudf/pull/13006)) [@davidwendt](https://github.com/davidwendt) +- Fix `DataFrame` constructor to broadcast scalar inputs properly ([#12997](https://github.com/rapidsai/cudf/pull/12997)) [@galipremsagar](https://github.com/galipremsagar) +- Drop `force_nullable_schema` from chunked parquet writer ([#12996](https://github.com/rapidsai/cudf/pull/12996)) [@galipremsagar](https://github.com/galipremsagar) +- Fix gtest column utility comparator diff reporting ([#12995](https://github.com/rapidsai/cudf/pull/12995)) [@davidwendt](https://github.com/davidwendt) +- Handle index names while performing `groupby` ([#12992](https://github.com/rapidsai/cudf/pull/12992)) [@galipremsagar](https://github.com/galipremsagar) +- Fix `__setitem__` on string columns when the scalar value ends in a null byte ([#12991](https://github.com/rapidsai/cudf/pull/12991)) [@wence-](https://github.com/wence-) +- Fix `sort_values` when column is all empty strings ([#12988](https://github.com/rapidsai/cudf/pull/12988)) [@eriknw](https://github.com/eriknw) +- Remove unused variable and fix memory issue in ORC writer ([#12984](https://github.com/rapidsai/cudf/pull/12984)) [@ttnghia](https://github.com/ttnghia) +- Pre-emptive fix for upstream `dask.dataframe.read_parquet` changes ([#12983](https://github.com/rapidsai/cudf/pull/12983)) [@rjzamora](https://github.com/rjzamora) +- Remove MANIFEST.in use auto-generated one for sdists and package_data for wheels ([#12960](https://github.com/rapidsai/cudf/pull/12960)) [@vyasr](https://github.com/vyasr) +- Update to use rapids-export(COMPONENTS) feature. 
([#12959](https://github.com/rapidsai/cudf/pull/12959)) [@robertmaynard](https://github.com/robertmaynard) +- cudftestutil supports static gtest dependencies ([#12957](https://github.com/rapidsai/cudf/pull/12957)) [@robertmaynard](https://github.com/robertmaynard) +- Include gtest in build environment. ([#12956](https://github.com/rapidsai/cudf/pull/12956)) [@vyasr](https://github.com/vyasr) +- Correctly handle scalar indices in `Index.__getitem__` ([#12955](https://github.com/rapidsai/cudf/pull/12955)) [@wence-](https://github.com/wence-) +- Avoid building cython twice ([#12945](https://github.com/rapidsai/cudf/pull/12945)) [@galipremsagar](https://github.com/galipremsagar) +- Fix set index error for Series rolling window operations ([#12942](https://github.com/rapidsai/cudf/pull/12942)) [@galipremsagar](https://github.com/galipremsagar) +- Fix calculation of null counts for Parquet statistics ([#12938](https://github.com/rapidsai/cudf/pull/12938)) [@etseidl](https://github.com/etseidl) +- Preserve integer dtype of hive-partitioned column containing nulls ([#12930](https://github.com/rapidsai/cudf/pull/12930)) [@rjzamora](https://github.com/rjzamora) +- Use get_current_device_resource for intermediate allocations in COLLECT_LIST window code ([#12927](https://github.com/rapidsai/cudf/pull/12927)) [@karthikeyann](https://github.com/karthikeyann) +- Mark dlpack tensor deleter as noexcept to match PyCapsule_Destructor signature. ([#12921](https://github.com/rapidsai/cudf/pull/12921)) [@bdice](https://github.com/bdice) +- Fix conda recipe post-link.sh typo ([#12916](https://github.com/rapidsai/cudf/pull/12916)) [@pentschev](https://github.com/pentschev) +- min_rows and num_rows are swapped in ComputePageSizes declaration in Parquet reader ([#12886](https://github.com/rapidsai/cudf/pull/12886)) [@etseidl](https://github.com/etseidl) +- Expect cupy to now support bool arrays for dlpack. 
([#12883](https://github.com/rapidsai/cudf/pull/12883)) [@vyasr](https://github.com/vyasr) +- Use python -m pytest for nightly wheel tests ([#12871](https://github.com/rapidsai/cudf/pull/12871)) [@bdice](https://github.com/bdice) +- Parquet writer column_size() should return a size_t ([#12870](https://github.com/rapidsai/cudf/pull/12870)) [@etseidl](https://github.com/etseidl) +- Fix cudf::hash_partition kernel launch error with decimal128 types ([#12863](https://github.com/rapidsai/cudf/pull/12863)) [@davidwendt](https://github.com/davidwendt) +- Fix an issue with parquet chunked reader undercounting string lengths. ([#12859](https://github.com/rapidsai/cudf/pull/12859)) [@nvdbaranec](https://github.com/nvdbaranec) +- Remove tokenizers pre-install pinning. ([#12854](https://github.com/rapidsai/cudf/pull/12854)) [@vyasr](https://github.com/vyasr) +- Fix parquet `RangeIndex` bug ([#12838](https://github.com/rapidsai/cudf/pull/12838)) [@rjzamora](https://github.com/rjzamora) +- Remove KAFKA_HOST_TEST from compute-sanitizer check ([#12831](https://github.com/rapidsai/cudf/pull/12831)) [@davidwendt](https://github.com/davidwendt) +- Make string methods return a Series with a useful Index ([#12814](https://github.com/rapidsai/cudf/pull/12814)) [@shwina](https://github.com/shwina) +- Tell cudf_kafka to use header-only fmt ([#12796](https://github.com/rapidsai/cudf/pull/12796)) [@vyasr](https://github.com/vyasr) +- Add `GroupBy.dtypes` ([#12783](https://github.com/rapidsai/cudf/pull/12783)) [@galipremsagar](https://github.com/galipremsagar) +- Fix a leak in a test and clarify some test names ([#12781](https://github.com/rapidsai/cudf/pull/12781)) [@revans2](https://github.com/revans2) +- Fix bug in all-null list due to join_list_elements special handling ([#12767](https://github.com/rapidsai/cudf/pull/12767)) [@karthikeyann](https://github.com/karthikeyann) +- Add try/except for expected null-schema error in read_parquet 
([#12756](https://github.com/rapidsai/cudf/pull/12756)) [@rjzamora](https://github.com/rjzamora) +- Throw an exception if an unsupported page encoding is detected in Parquet reader ([#12754](https://github.com/rapidsai/cudf/pull/12754)) [@etseidl](https://github.com/etseidl) +- Fix a bug with `num_keys` in `_scatter_by_slice` ([#12749](https://github.com/rapidsai/cudf/pull/12749)) [@thomcom](https://github.com/thomcom) +- Bump pinned rapids wheel deps to 23.4 ([#12735](https://github.com/rapidsai/cudf/pull/12735)) [@sevagh](https://github.com/sevagh) +- Rework logic in cudf::strings::split_record to improve performance ([#12729](https://github.com/rapidsai/cudf/pull/12729)) [@davidwendt](https://github.com/davidwendt) +- Add `always_nullable` flag to Dremel encoding ([#12727](https://github.com/rapidsai/cudf/pull/12727)) [@divyegala](https://github.com/divyegala) +- Fix memcheck read error in compound segmented reduce ([#12722](https://github.com/rapidsai/cudf/pull/12722)) [@davidwendt](https://github.com/davidwendt) +- Fix faulty conditional logic in JIT `GroupBy.apply` ([#12706](https://github.com/rapidsai/cudf/pull/12706)) [@brandon-b-miller](https://github.com/brandon-b-miller) +- Produce useful guidance on overflow error in `to_csv` ([#12705](https://github.com/rapidsai/cudf/pull/12705)) [@wence-](https://github.com/wence-) +- Handle parquet list data corner case ([#12698](https://github.com/rapidsai/cudf/pull/12698)) [@nvdbaranec](https://github.com/nvdbaranec) +- Fix missing trailing comma in json writer ([#12688](https://github.com/rapidsai/cudf/pull/12688)) [@karthikeyann](https://github.com/karthikeyann) +- Remove child from newCudaAsyncMemoryResource ([#12681](https://github.com/rapidsai/cudf/pull/12681)) [@abellina](https://github.com/abellina) +- Handle bool types in `round` API ([#12670](https://github.com/rapidsai/cudf/pull/12670)) [@galipremsagar](https://github.com/galipremsagar) +- Ensure all of device bitmask is initialized in from_arrow 
([#12668](https://github.com/rapidsai/cudf/pull/12668)) [@wence-](https://github.com/wence-) +- Fix `from_arrow` to load a sliced arrow table ([#12665](https://github.com/rapidsai/cudf/pull/12665)) [@galipremsagar](https://github.com/galipremsagar) +- Fix dask-cudf read_parquet bug for multi-file aggregation ([#12663](https://github.com/rapidsai/cudf/pull/12663)) [@rjzamora](https://github.com/rjzamora) +- Fix AllocateLikeTest gtests reading uninitialized null-mask ([#12643](https://github.com/rapidsai/cudf/pull/12643)) [@davidwendt](https://github.com/davidwendt) +- Fix `find_common_dtype` and `values` to handle complex dtypes ([#12537](https://github.com/rapidsai/cudf/pull/12537)) [@galipremsagar](https://github.com/galipremsagar) +- Fix fetching of MultiIndex values when a label is passed ([#12521](https://github.com/rapidsai/cudf/pull/12521)) [@galipremsagar](https://github.com/galipremsagar) +- Fix `Series` comparison vs scalars ([#12519](https://github.com/rapidsai/cudf/pull/12519)) [@brandon-b-miller](https://github.com/brandon-b-miller) +- Allow casting from `UDFString` back to `StringView` to call methods in `strings_udf` ([#12363](https://github.com/rapidsai/cudf/pull/12363)) [@brandon-b-miller](https://github.com/brandon-b-miller) + +## 📖 Documentation + +- Fix `GroupBy.apply` doc examples rendering ([#12994](https://github.com/rapidsai/cudf/pull/12994)) [@brandon-b-miller](https://github.com/brandon-b-miller) +- add sphinx building and s3 uploading for dask-cudf docs ([#12982](https://github.com/rapidsai/cudf/pull/12982)) [@quasiben](https://github.com/quasiben) +- Add developer documentation forbidding default parameters in detail APIs ([#12978](https://github.com/rapidsai/cudf/pull/12978)) [@vyasr](https://github.com/vyasr) +- Add README symlink for dask-cudf. 
([#12946](https://github.com/rapidsai/cudf/pull/12946)) [@bdice](https://github.com/bdice) +- Remove return type from `@return` doxygen tags ([#12908](https://github.com/rapidsai/cudf/pull/12908)) [@davidwendt](https://github.com/davidwendt) +- Fix docs build to be `pydata-sphinx-theme=0.13.0` compatible ([#12874](https://github.com/rapidsai/cudf/pull/12874)) [@galipremsagar](https://github.com/galipremsagar) +- Add skeleton API and prose documentation for dask-cudf ([#12725](https://github.com/rapidsai/cudf/pull/12725)) [@wence-](https://github.com/wence-) +- Enable doctests for GroupBy methods ([#12658](https://github.com/rapidsai/cudf/pull/12658)) [@brandon-b-miller](https://github.com/brandon-b-miller) +- Add comment about CUB patch for SegmentedSortInt.Bool gtest ([#12611](https://github.com/rapidsai/cudf/pull/12611)) [@davidwendt](https://github.com/davidwendt) + +## 🚀 New Features + +- Add JNI method for strings::replace multi variety ([#12979](https://github.com/rapidsai/cudf/pull/12979)) [@NVnavkumar](https://github.com/NVnavkumar) +- Add nunique aggregation support for cudf::segmented_reduce ([#12972](https://github.com/rapidsai/cudf/pull/12972)) [@davidwendt](https://github.com/davidwendt) +- Refactor orc chunked writer ([#12949](https://github.com/rapidsai/cudf/pull/12949)) [@ttnghia](https://github.com/ttnghia) +- Make Parquet writer `nullable` option applicable to single table writes ([#12933](https://github.com/rapidsai/cudf/pull/12933)) [@vuule](https://github.com/vuule) +- Refactor `io::orc::ProtobufWriter` ([#12877](https://github.com/rapidsai/cudf/pull/12877)) [@ttnghia](https://github.com/ttnghia) +- Make timezone table independent from ORC ([#12805](https://github.com/rapidsai/cudf/pull/12805)) [@vuule](https://github.com/vuule) +- Cache JIT `GroupBy.apply` functions ([#12802](https://github.com/rapidsai/cudf/pull/12802)) 
[@brandon-b-miller](https://github.com/brandon-b-miller) +- Implement initial support for avro logical types (#6482) ([#12788](https://github.com/rapidsai/cudf/pull/12788)) [@tpn](https://github.com/tpn) +- Update `tests/column_utilities` to use `experimental::equality` row comparator ([#12777](https://github.com/rapidsai/cudf/pull/12777)) [@divyegala](https://github.com/divyegala) +- Update `distinct/unique_count` to `experimental::row` hasher/comparator ([#12776](https://github.com/rapidsai/cudf/pull/12776)) [@divyegala](https://github.com/divyegala) +- Update `hash_partition` to use `experimental::row::row_hasher` ([#12761](https://github.com/rapidsai/cudf/pull/12761)) [@divyegala](https://github.com/divyegala) +- Update `is_sorted` to use `experimental::row::lexicographic` ([#12752](https://github.com/rapidsai/cudf/pull/12752)) [@divyegala](https://github.com/divyegala) +- Update default data source in cuio reader benchmarks ([#12740](https://github.com/rapidsai/cudf/pull/12740)) [@PointKernel](https://github.com/PointKernel) +- Reenable stream identification library in CI ([#12714](https://github.com/rapidsai/cudf/pull/12714)) [@vyasr](https://github.com/vyasr) +- Add `regex_program` strings splitting java APIs and tests ([#12713](https://github.com/rapidsai/cudf/pull/12713)) [@cindyyuanjiang](https://github.com/cindyyuanjiang) +- Add `regex_program` strings replacing java APIs and tests ([#12701](https://github.com/rapidsai/cudf/pull/12701)) [@cindyyuanjiang](https://github.com/cindyyuanjiang) +- Add `regex_program` strings extract java APIs and tests ([#12699](https://github.com/rapidsai/cudf/pull/12699)) [@cindyyuanjiang](https://github.com/cindyyuanjiang) +- Variable fragment sizes for Parquet writer ([#12685](https://github.com/rapidsai/cudf/pull/12685)) [@etseidl](https://github.com/etseidl) +- Add segmented reduction support for fixed-point types ([#12680](https://github.com/rapidsai/cudf/pull/12680)) [@davidwendt](https://github.com/davidwendt) 
+- Move `strings_udf` code into cuDF ([#12669](https://github.com/rapidsai/cudf/pull/12669)) [@brandon-b-miller](https://github.com/brandon-b-miller) +- Add `regex_program` searching APIs and related java classes ([#12666](https://github.com/rapidsai/cudf/pull/12666)) [@cindyyuanjiang](https://github.com/cindyyuanjiang) +- Add logging to libcudf ([#12637](https://github.com/rapidsai/cudf/pull/12637)) [@vuule](https://github.com/vuule) +- Add compound aggregations to cudf::segmented_reduce ([#12573](https://github.com/rapidsai/cudf/pull/12573)) [@davidwendt](https://github.com/davidwendt) +- Convert `rank` to use experimental row comparators ([#12481](https://github.com/rapidsai/cudf/pull/12481)) [@divyegala](https://github.com/divyegala) +- Use rapids-cmake parallel testing feature ([#12451](https://github.com/rapidsai/cudf/pull/12451)) [@robertmaynard](https://github.com/robertmaynard) +- Enable detection of undesired stream usage ([#12089](https://github.com/rapidsai/cudf/pull/12089)) [@vyasr](https://github.com/vyasr) + +## 🛠️ Improvements + +- Pin `dask` and `distributed` for release ([#13070](https://github.com/rapidsai/cudf/pull/13070)) [@galipremsagar](https://github.com/galipremsagar) +- Pin cupy in wheel tests to supported versions ([#13041](https://github.com/rapidsai/cudf/pull/13041)) [@vyasr](https://github.com/vyasr) +- Pin numba version ([#13001](https://github.com/rapidsai/cudf/pull/13001)) [@vyasr](https://github.com/vyasr) +- Rework gtests SequenceTest to remove using namespace cudf ([#12985](https://github.com/rapidsai/cudf/pull/12985)) [@davidwendt](https://github.com/davidwendt) +- Stop setting package version attribute in wheels ([#12977](https://github.com/rapidsai/cudf/pull/12977)) [@vyasr](https://github.com/vyasr) +- Move detail reduction functions to cudf::reduction::detail namespace ([#12971](https://github.com/rapidsai/cudf/pull/12971)) [@davidwendt](https://github.com/davidwendt) +- Remove default detail mrs: part7 
([#12970](https://github.com/rapidsai/cudf/pull/12970)) [@vyasr](https://github.com/vyasr) +- Remove default detail mrs: part6 ([#12969](https://github.com/rapidsai/cudf/pull/12969)) [@vyasr](https://github.com/vyasr) +- Remove default detail mrs: part5 ([#12968](https://github.com/rapidsai/cudf/pull/12968)) [@vyasr](https://github.com/vyasr) +- Remove default detail mrs: part4 ([#12967](https://github.com/rapidsai/cudf/pull/12967)) [@vyasr](https://github.com/vyasr) +- Remove default detail mrs: part3 ([#12966](https://github.com/rapidsai/cudf/pull/12966)) [@vyasr](https://github.com/vyasr) +- Remove default detail mrs: part2 ([#12965](https://github.com/rapidsai/cudf/pull/12965)) [@vyasr](https://github.com/vyasr) +- Remove default detail mrs: part1 ([#12964](https://github.com/rapidsai/cudf/pull/12964)) [@vyasr](https://github.com/vyasr) +- Add `force_nullable_schema` parameter to Parquet writer. ([#12952](https://github.com/rapidsai/cudf/pull/12952)) [@galipremsagar](https://github.com/galipremsagar) +- Declare a different name for nan_equality.UNEQUAL to prevent Cython warnings. 
([#12947](https://github.com/rapidsai/cudf/pull/12947)) [@bdice](https://github.com/bdice) +- Remove remaining default stream parameters ([#12943](https://github.com/rapidsai/cudf/pull/12943)) [@vyasr](https://github.com/vyasr) +- Fix cudf::segmented_reduce gtest for ANY aggregation ([#12940](https://github.com/rapidsai/cudf/pull/12940)) [@davidwendt](https://github.com/davidwendt) +- Implement `groupby.head` and `groupby.tail` ([#12939](https://github.com/rapidsai/cudf/pull/12939)) [@wence-](https://github.com/wence-) +- Fix libcudf gtests to pass null-count=0 for empty validity masks ([#12923](https://github.com/rapidsai/cudf/pull/12923)) [@davidwendt](https://github.com/davidwendt) +- Migrate parquet encoding to use experimental row operators ([#12918](https://github.com/rapidsai/cudf/pull/12918)) [@PointKernel](https://github.com/PointKernel) +- Fix benchmarks coded in namespace cudf and using namespace cudf ([#12915](https://github.com/rapidsai/cudf/pull/12915)) [@karthikeyann](https://github.com/karthikeyann) +- Fix io/text gtests coded in namespace cudf::test ([#12914](https://github.com/rapidsai/cudf/pull/12914)) [@karthikeyann](https://github.com/karthikeyann) +- Pass `SCCACHE_S3_USE_SSL` to conda builds ([#12910](https://github.com/rapidsai/cudf/pull/12910)) [@ajschmidt8](https://github.com/ajschmidt8) +- Fix FST, JSON gtests & benchmarks coded in namespace cudf::test ([#12907](https://github.com/rapidsai/cudf/pull/12907)) [@karthikeyann](https://github.com/karthikeyann) +- Generate pyproject dependencies using dfg ([#12906](https://github.com/rapidsai/cudf/pull/12906)) [@vyasr](https://github.com/vyasr) +- Update libcudf counting functions to specify cudf::size_type ([#12904](https://github.com/rapidsai/cudf/pull/12904)) [@davidwendt](https://github.com/davidwendt) +- Fix `moto` env vars & pass `AWS_SESSION_TOKEN` to conda builds ([#12902](https://github.com/rapidsai/cudf/pull/12902)) [@ajschmidt8](https://github.com/ajschmidt8) +- Rewrite CSV writer 
benchmark with nvbench ([#12901](https://github.com/rapidsai/cudf/pull/12901)) [@PointKernel](https://github.com/PointKernel) +- Rework some code logic to reduce iterator and comparator inlining to improve compile time ([#12900](https://github.com/rapidsai/cudf/pull/12900)) [@davidwendt](https://github.com/davidwendt) +- Deprecate `line_terminator` in favor of `lineterminator` in `to_csv` ([#12896](https://github.com/rapidsai/cudf/pull/12896)) [@wence-](https://github.com/wence-) +- Add `stream` and `mr` parameters for `structs::detail::flatten_nested_columns` ([#12892](https://github.com/rapidsai/cudf/pull/12892)) [@ttnghia](https://github.com/ttnghia) +- Deprecate libcudf regex APIs accepting pattern strings directly ([#12891](https://github.com/rapidsai/cudf/pull/12891)) [@davidwendt](https://github.com/davidwendt) +- Remove default parameters from detail headers in include ([#12888](https://github.com/rapidsai/cudf/pull/12888)) [@vyasr](https://github.com/vyasr) +- Update minimum `pandas` and `numpy` pinnings ([#12887](https://github.com/rapidsai/cudf/pull/12887)) [@galipremsagar](https://github.com/galipremsagar) +- Implement `groupby.sample` ([#12882](https://github.com/rapidsai/cudf/pull/12882)) [@wence-](https://github.com/wence-) +- Update JNI build ENV default to gcc 11 ([#12881](https://github.com/rapidsai/cudf/pull/12881)) [@pxLi](https://github.com/pxLi) +- Change return type of `cudf::structs::detail::flatten_nested_columns` to smart pointer ([#12878](https://github.com/rapidsai/cudf/pull/12878)) [@ttnghia](https://github.com/ttnghia) +- Fix passing seed parameter to MurmurHash3_32 in cudf::hash() function ([#12875](https://github.com/rapidsai/cudf/pull/12875)) [@davidwendt](https://github.com/davidwendt) +- Remove manual artifact upload step in CI ([#12869](https://github.com/rapidsai/cudf/pull/12869)) [@ajschmidt8](https://github.com/ajschmidt8) +- Update to GCC 11 ([#12868](https://github.com/rapidsai/cudf/pull/12868)) 
[@bdice](https://github.com/bdice) +- Fix null hive-partition behavior in dask-cudf parquet ([#12866](https://github.com/rapidsai/cudf/pull/12866)) [@rjzamora](https://github.com/rjzamora) +- Update to protobuf>=4.21.6,<4.22. ([#12864](https://github.com/rapidsai/cudf/pull/12864)) [@bdice](https://github.com/bdice) +- Update RMM allocators ([#12861](https://github.com/rapidsai/cudf/pull/12861)) [@pentschev](https://github.com/pentschev) +- Improve performance for replace-multi for long strings ([#12858](https://github.com/rapidsai/cudf/pull/12858)) [@davidwendt](https://github.com/davidwendt) +- Drop Python 3.7 handling for pickle protocol 4 ([#12857](https://github.com/rapidsai/cudf/pull/12857)) [@jakirkham](https://github.com/jakirkham) +- Migrate as much as possible to pyproject.toml ([#12850](https://github.com/rapidsai/cudf/pull/12850)) [@vyasr](https://github.com/vyasr) +- Enable nbqa pre-commit hooks for isort and black. ([#12848](https://github.com/rapidsai/cudf/pull/12848)) [@bdice](https://github.com/bdice) +- Setting a threshold for KvikIO IO ([#12841](https://github.com/rapidsai/cudf/pull/12841)) [@madsbk](https://github.com/madsbk) +- Update datasets download URL ([#12840](https://github.com/rapidsai/cudf/pull/12840)) [@jjacobelli](https://github.com/jjacobelli) +- Make docs builds less verbose ([#12836](https://github.com/rapidsai/cudf/pull/12836)) [@AyodeAwe](https://github.com/AyodeAwe) +- Consolidate linter configs into pyproject.toml ([#12834](https://github.com/rapidsai/cudf/pull/12834)) [@vyasr](https://github.com/vyasr) +- Deprecate `names` & `dtype` in `Index.copy` ([#12825](https://github.com/rapidsai/cudf/pull/12825)) [@galipremsagar](https://github.com/galipremsagar) +- Deprecate `inplace` parameters in categorical methods ([#12824](https://github.com/rapidsai/cudf/pull/12824)) [@galipremsagar](https://github.com/galipremsagar) +- Add optional text file support to ninja-log utility ([#12823](https://github.com/rapidsai/cudf/pull/12823)) 
[@davidwendt](https://github.com/davidwendt) +- Deprecate `Index.is_*` methods ([#12820](https://github.com/rapidsai/cudf/pull/12820)) [@galipremsagar](https://github.com/galipremsagar) +- Add dfg as a pre-commit hook ([#12819](https://github.com/rapidsai/cudf/pull/12819)) [@vyasr](https://github.com/vyasr) +- Deprecate `datetime_is_numeric` from `describe` ([#12818](https://github.com/rapidsai/cudf/pull/12818)) [@galipremsagar](https://github.com/galipremsagar) +- Deprecate `na_sentinel` in `factorize` ([#12817](https://github.com/rapidsai/cudf/pull/12817)) [@galipremsagar](https://github.com/galipremsagar) +- Shuffling read into a sub function in parquet read ([#12809](https://github.com/rapidsai/cudf/pull/12809)) [@hyperbolic2346](https://github.com/hyperbolic2346) +- Fixing parquet coalescing of reads ([#12808](https://github.com/rapidsai/cudf/pull/12808)) [@hyperbolic2346](https://github.com/hyperbolic2346) +- CI: Remove specification of manual stage for check_style.sh script. ([#12803](https://github.com/rapidsai/cudf/pull/12803)) [@csadorf](https://github.com/csadorf) +- Add compute-sanitizer github workflow action to nightly tests ([#12800](https://github.com/rapidsai/cudf/pull/12800)) [@davidwendt](https://github.com/davidwendt) +- Enable groupby std and variance aggregation types in libcudf Debug build ([#12799](https://github.com/rapidsai/cudf/pull/12799)) [@davidwendt](https://github.com/davidwendt) +- Expose seed argument to hash_values ([#12795](https://github.com/rapidsai/cudf/pull/12795)) [@ayushdg](https://github.com/ayushdg) +- Fix groupby gtests coded in namespace cudf::test ([#12784](https://github.com/rapidsai/cudf/pull/12784)) [@davidwendt](https://github.com/davidwendt) +- Improve performance for cudf::strings::count_characters for long strings ([#12779](https://github.com/rapidsai/cudf/pull/12779)) [@davidwendt](https://github.com/davidwendt) +- Deallocate encoded data in ORC writer immediately after compression 
([#12770](https://github.com/rapidsai/cudf/pull/12770)) [@vuule](https://github.com/vuule) +- Stop force pulling fmt in nvbench. ([#12768](https://github.com/rapidsai/cudf/pull/12768)) [@vyasr](https://github.com/vyasr) +- Remove now redundant cuda initialization ([#12758](https://github.com/rapidsai/cudf/pull/12758)) [@vyasr](https://github.com/vyasr) +- Adds JSON reader, writer io benchmark ([#12753](https://github.com/rapidsai/cudf/pull/12753)) [@karthikeyann](https://github.com/karthikeyann) +- Use test paths relative to package directory. ([#12751](https://github.com/rapidsai/cudf/pull/12751)) [@bdice](https://github.com/bdice) +- Add build metrics report as artifact to cpp-build workflow ([#12750](https://github.com/rapidsai/cudf/pull/12750)) [@davidwendt](https://github.com/davidwendt) +- Add JNI methods for detecting and purging non-empty nulls from LIST and STRUCT ([#12742](https://github.com/rapidsai/cudf/pull/12742)) [@razajafri](https://github.com/razajafri) +- Stop using versioneer to manage versions ([#12741](https://github.com/rapidsai/cudf/pull/12741)) [@vyasr](https://github.com/vyasr) +- Reduce error handling verbosity in CI tests scripts ([#12738](https://github.com/rapidsai/cudf/pull/12738)) [@AjayThorve](https://github.com/AjayThorve) +- Reduce the number of test cases in multibyte_split benchmark ([#12737](https://github.com/rapidsai/cudf/pull/12737)) [@PointKernel](https://github.com/PointKernel) +- Update shared workflow branches ([#12733](https://github.com/rapidsai/cudf/pull/12733)) [@ajschmidt8](https://github.com/ajschmidt8) +- JNI switches to nested JSON reader ([#12732](https://github.com/rapidsai/cudf/pull/12732)) [@res-life](https://github.com/res-life) +- Changing `cudf::io::source_info` to use `cudf::host_span<std::byte>` in a non-breaking form ([#12730](https://github.com/rapidsai/cudf/pull/12730)) [@hyperbolic2346](https://github.com/hyperbolic2346) +- Add nvbench environment class for initializing RMM in benchmarks 
([#12728](https://github.com/rapidsai/cudf/pull/12728)) [@davidwendt](https://github.com/davidwendt) +- Split C++ and Python build dependencies into separate lists. ([#12724](https://github.com/rapidsai/cudf/pull/12724)) [@bdice](https://github.com/bdice) +- Add build dependencies to Java tests. ([#12723](https://github.com/rapidsai/cudf/pull/12723)) [@bdice](https://github.com/bdice) +- Allow setting the seed argument for hash partition ([#12715](https://github.com/rapidsai/cudf/pull/12715)) [@firestarman](https://github.com/firestarman) +- Remove gpuCI scripts. ([#12712](https://github.com/rapidsai/cudf/pull/12712)) [@bdice](https://github.com/bdice) +- Unpin `dask` and `distributed` for development ([#12710](https://github.com/rapidsai/cudf/pull/12710)) [@galipremsagar](https://github.com/galipremsagar) +- `partition_by_hash()`: use `_split()` ([#12704](https://github.com/rapidsai/cudf/pull/12704)) [@madsbk](https://github.com/madsbk) +- Remove DataFrame.quantiles from docs. ([#12684](https://github.com/rapidsai/cudf/pull/12684)) [@bdice](https://github.com/bdice) +- Fast path for `experimental::row::equality` ([#12676](https://github.com/rapidsai/cudf/pull/12676)) [@divyegala](https://github.com/divyegala) +- Move date to build string in `conda` recipe ([#12661](https://github.com/rapidsai/cudf/pull/12661)) [@ajschmidt8](https://github.com/ajschmidt8) +- Refactor reduction logic for fixed-point types ([#12652](https://github.com/rapidsai/cudf/pull/12652)) [@davidwendt](https://github.com/davidwendt) +- Pay off some JNI RMM API tech debt ([#12632](https://github.com/rapidsai/cudf/pull/12632)) [@revans2](https://github.com/revans2) +- Merge `copy-on-write` feature branch into `branch-23.04` ([#12619](https://github.com/rapidsai/cudf/pull/12619)) [@galipremsagar](https://github.com/galipremsagar) +- Remove cudf::strings::repeat_strings_output_sizes and optional parameter from cudf::strings::repeat_strings ([#12609](https://github.com/rapidsai/cudf/pull/12609)) 
[@davidwendt](https://github.com/davidwendt) +- Pin cuda-nvrtc. ([#12606](https://github.com/rapidsai/cudf/pull/12606)) [@bdice](https://github.com/bdice) +- Remove cudf::test::print calls from libcudf gtests ([#12604](https://github.com/rapidsai/cudf/pull/12604)) [@davidwendt](https://github.com/davidwendt) +- Init JNI version 23.04.0-SNAPSHOT ([#12599](https://github.com/rapidsai/cudf/pull/12599)) [@pxLi](https://github.com/pxLi) +- Add performance benchmarks to user facing docs ([#12595](https://github.com/rapidsai/cudf/pull/12595)) [@galipremsagar](https://github.com/galipremsagar) +- Add docs build job ([#12592](https://github.com/rapidsai/cudf/pull/12592)) [@AyodeAwe](https://github.com/AyodeAwe) +- Replace message parsing with throwing more specific exceptions ([#12426](https://github.com/rapidsai/cudf/pull/12426)) [@vyasr](https://github.com/vyasr) +- Support conversion to/from cudf in dask.dataframe.core.to_backend ([#12380](https://github.com/rapidsai/cudf/pull/12380)) [@rjzamora](https://github.com/rjzamora) + # cuDF 23.02.00 (9 Feb 2023) ## 🚨 Breaking Changes