[NIGHTLY] v25.02.00
Pre-release
rapids-bot released this 21 Nov 23:18 · 217 commits to branch-25.02 since this release
Breaking Changes
- Return empty result for segmented_reduce if input and offsets are both empty (#17437) @davidwendt
- Rework minhash APIs for deprecation cycle (#17421) @davidwendt
- Change indices for dictionary column to signed integer type (#17390) @davidwendt
Bug Fixes
- Fix: DataFrameGroupBy.get_group was raising with length>1 tuples (#17653) @MarcoGorelli
- Fix possible int overflow in compute_mixed_join_output_size (#17633) @davidwendt
- Fix a minor potential i32 overflow in thrust::transform_exclusive_scan in PQ reader preprocessing (#17617) @mhaseeb123
- Fix failing xgboost test in the cudf.pandas third-party integration tests (#17616) @Matt711
- Fix dask_cudf.read_csv (#17612) @rjzamora
- Fix memcheck error in ReplaceTest.NormalizeNansAndZerosMutable gtest (#17610) @davidwendt
- Correctly accept a pandas.CategoricalDtype(pandas.IntervalDtype(...), ...) type (#17604) @mroeschke
- Add ability to modify and propagate names of columns object (#17597) @galipremsagar
- Ignore NaN correctly in .quantile (#17593) @mroeschke
- Fix groupby argmin/max gather of sorted-order indices (#17591) @davidwendt
- Fix ctest fail running libcudf tests in a Debug build (#17576) @davidwendt
- Specify a version for rapids_logger dependency (#17573) @jlowe
- [JNI] remove rmm argument to set rw access for fabric handles (#17553) @abellina
- Document undefined behavior in div_rounding_up_safe (#17542) @davidwendt
- Fix nvcc-imposed UB in constexpr functions (#17534) @vuule
- Add anonymous namespace to libcudf test source (#17529) @davidwendt
- Propagate failures in pandas integration tests and Skip failing tests (#17521) @Matt711
- Fix libcudf compile error when logging is disabled (#17512) @davidwendt
- Fix Dask-cuDF clip APIs (#17509) @rjzamora
- Fix pylibcudf to_arrow with multiple nested data types (#17504) @mroeschke
- Fix groupby(as_index=False).size not resetting index (#17499) @mroeschke
- Revert "Temporarily skip tests due to dask/distributed#8953" (#17492) @Matt711
- Workaround for a misaligned access in read_csv on some CUDA versions (#17477) @vuule
- Fix some possible thread-id overflow calculations (#17473) @davidwendt
- Temporarily skip tests due to dask/distributed#8953 (#17472) @wence-
- Detect mismatches in begin and end tokens returned by JSON tokenizer FST (#17471) @shrshi
- Support dask>=2024.11.2 in Dask cuDF (#17439) @rjzamora
- Fix write_json failure for zero columns in table/struct (#17414) @karthikeyann
- Fix Debug-mode failing Arrow test (#17405) @zeroshade
- Fix all null list column with missing child column in JSON reader (#17348) @karthikeyann
Documentation
- Document interpreter install command for cudf.pandas (#17358) @bdice
- add comment to Series.tolist method (#17350) @tequilayu
New Features
- Add JSON reader options structs to pylibcudf (#17614) @Matt711
- Add JSON Writer options classes to pylibcudf (#17606) @Matt711
- Add ORC reader options structs to pylibcudf (#17601) @Matt711
- Add Avro Reader options classes to pylibcudf (#17599) @Matt711
- Implement HOST_UDF aggregation for groupby (#17592) @ttnghia
- Plumb pylibcudf.io.parquet options classes through cudf python (#17506) @Matt711
- Add partition-wise Select support to cuDF-Polars (#17495) @rjzamora
- Add multi-partition Scan support to cuDF-Polars (#17494) @rjzamora
- Migrate cudf::io::merge_row_group_metadata to pylibcudf (#17491) @Matt711
- Add Parquet Reader options classes to pylibcudf (#17464) @Matt711
- Add multi-partition DataFrameScan support to cuDF-Polars (#17441) @rjzamora
- Return empty result for segmented_reduce if input and offsets are both empty (#17437) @davidwendt
- Abstract polars function expression nodes to ensure they are serializable (#17418) @pentschev
- Add CSV Reader options classes to pylibcudf (#17412) @Matt711
- Add support for pylibcudf.DataType serialization (#17352) @pentschev
- Enable rounding for Decimal32 and Decimal64 in cuDF (#17332) @a-hirota
- Remove upper bounds on cuda-python to allow 12.6.2 and 11.8.5 (#17326) @bdice
- Expose stream-ordering to groupby APIs (#17324) @shrshi
- Migrate ORC Writer to pylibcudf (#17310) @Matt711
Improvements
- Support compression= in DataFrame.to_json (#17634) @mroeschke
- Use PyNVML 12 (#17627) @jakirkham
- Remove cudf._lib.utils in favor of python APIs (#17625) @mroeschke
- Fix return types for MurmurHash3_x86_32 template specializations (#17622) @davidwendt
- Clean up namespaces and improve compression-related headers (#17621) @vuule
- Use more pylibcudf.types instead of cudf._lib.types (#17619) @mroeschke
- Remove patch that is only needed for clang-tidy to run on test files (#17618) @vyasr
- update telemetry actions to fluent-bit friendly style (#17615) @msarahan
- Bump the oldest pyarrow version to 14.0.2 in test matrix (#17611) @galipremsagar
- Use [[nodiscard]] attribute before __device__ (#17608) @vuule
- Use host_vector in flatten_single_pass_aggs (#17605) @vuule
- Stop memory_resource.hpp from including itself (#17603) @vyasr
- Replace the outdated cuco window concept with buckets (#17602) @PointKernel
- Check if nightlies have succeeded recently enough (#17596) @vyasr
- A couple of fixes in rapids-logger usage (#17588) @vyasr
- Simplify expression transformer in Parquet predicate pushdown with ast::tree (#17587) @mhaseeb123
- Remove unused functionality in cudf._lib.utils.pyx (#17586) @mroeschke
- Use cuda-python cuda.bindings import names. (#17585) @bdice
- Use no-sync copy for fixed-width types in cudf::concatenate (#17584) @davidwendt
- Remove cudf._lib.groupby in favor of inlining pylibcudf (#17582) @mroeschke
- Remove unused code of json schema in JSON reader (#17581) @karthikeyann
- Expose Scalar's constructor and Scalar#getScalarHandle() to public (#17580) @ttnghia
- Allow large strings in nvtext benchmarks (#17579) @davidwendt
- Remove cudf._lib.reduce in favor of inlining pylibcudf (#17574) @mroeschke
- Use batched memcpy when writing ORC statistics (#17572) @vuule
- Allow large strings in nvbench strings benchmarks (#17571) @davidwendt
- Update version references in workflow (#17568) @AyodeAwe
- Enable all json reader options in pylibcudf read_json (#17563) @karthikeyann
- Remove cudf._lib.parquet in favor of inlining pylibcudf (#17562) @mroeschke
- Fix CMake format in cudf/_lib/CMakeLists.txt (#17559) @mroeschke
- Replace direct cudaMemcpyAsync calls with utility functions (within /include) (#17557) @vuule
- Remove cudf._lib.interop in favor of inlining pylibcudf (#17555) @mroeschke
- gate telemetry dispatch calls on TELEMETRY_ENABLED env var (#17551) @msarahan
- Replace direct cudaMemcpyAsync calls with utility functions (within /src) (#17550) @vuule
- Remove unused BufferArrayFromVector (#17549) @Matt711
- Move cudf._lib.copying to cudf.core._internals (#17548) @mroeschke
- Update cuda-python lower bounds to 12.6.2 / 11.8.5 (#17547) @bdice
- Fix typos, rename types, and add null_probability benchmark axis for distinct (#17546) @PointKernel
- Mark more constexpr functions as device-available (#17545) @vyasr
- Use cooperative-groups instead of cub warp-reduce for strings contains (#17540) @davidwendt
- Remove cudf._lib.nvtext in favor of inlining pylibcudf (#17535) @mroeschke
- Remove unused masked keyword in column_empty (#17530) @mroeschke
- Remove Thrust patch in favor of CMake definition for Thrust 32-bit offset types. (#17527) @bdice
- [JNI] Enables fabric handles for CUDA async memory pools (#17526) @abellina
- Force Thrust to use 32-bit offset type. (#17523) @bdice
- Replace cudf::detail::copy_if logic with thrust::copy_if and gather (#17520) @davidwendt
- Replaces uses of cudf._lib.Column.from_unique_ptr with pylibcudf.Column.from_libcudf (#17517) @Matt711
- Move cudf._lib.aggregation to cudf.core._internals (#17516) @mroeschke
- Migrate copy_column and Column.from_scalar to pylibcudf (#17513) @Matt711
- Remove cudf._lib.transform in favor of inlining pylibcudf (#17505) @mroeschke
- Remove cudf._lib.string.convert/split in favor of inlining pylibcudf (#17496) @mroeschke
- Move cudf._lib.sort to cudf.core._internals (#17488) @mroeschke
- Remove cudf._lib.csv in favor of inlining pylibcudf (#17485) @mroeschke
- Update PyTorch to >=2.4.0 to get fix for CUDA array interface bug, and drop CUDA 11 PyTorch tests. (#17475) @bdice
- Remove cudf._lib.binops in favor of inlining pylibcudf (#17468) @mroeschke
- Remove cudf._lib.orc in favor of inlining pylibcudf (#17466) @mroeschke
- skip most CI on devcontainer-only changes (#17465) @jameslamb
- Set build type for all examples (#17463) @vyasr
- Update the hook versions in pre-commit (#17462) @wence-
- Remove cudf._lib.string_casting in favor of inlining pylibcudf (#17460) @mroeschke
- Remove cudf._lib.filling in favor of inlining pylibcudf (#17459) @mroeschke
- Update MurmurHash3_x64_128 to use the cuco equivalent implementation (#17457) @PointKernel
- Move cudf._lib.stream_compaction to cudf.core._internals (#17456) @mroeschke
- Clean up xxhash_64 implementations (#17455) @PointKernel
- Update Hadoop dependency in Java pom (#17454) @jlowe
- Adapt to rmm logger changes (#17451) @vyasr
- Require approval to run CI on draft PRs (#17450) @bdice
- Expose stream-ordering in nvtext API (#17446) @shrshi
- Use exec_policy_nosync in write_json (#17445) @karthikeyann
- Remove cudf._lib.json in favor of inlining pylibcudf (#17443) @mroeschke
- Remove cudf._lib.null_mask in favor of inlining pylibcudf (#17440) @mroeschke
- Expose stream-ordering in replace API (#17436) @shrshi
- Apply clang-tidy autofixes from new rules (#17431) @vyasr
- Remove cudf._lib.round in favor of inlining pylibcudf (#17430) @mroeschke
- Update MurmurHash3_x86_32 to use the cuco equivalent implementation (#17429) @PointKernel
- Remove cudf._lib.replace in favor of inlining pylibcudf (#17428) @mroeschke
- Remove nvtx/ranges.hpp include from cuda.cuh (#17427) @davidwendt
- Remove the unused detail int_fastdiv.h header (#17426) @PointKernel
- Remove cudf._lib.lists in favor of inlining pylibcudf (#17425) @mroeschke
- Remove cudf._lib.quantile (#17424) @mroeschke
- Remove cudf._lib.rolling in favor of inlining pylibcudf (#17423) @mroeschke
- Rework minhash APIs for deprecation cycle (#17421) @davidwendt
- Use thread_index_type in binary-ops jit kernel.cu (#17420) @davidwendt
- Change binops for-each kernel to thrust::for_each_n (#17419) @davidwendt
- Move cudf._lib.search to cudf.core._internals (#17411) @mroeschke
- Use grid_1d utilities in copy_range.cuh (#17409) @davidwendt
- Remove cudf._lib.text in favor of inlining pylibcudf (#17408) @mroeschke
- Run clang-tidy checks in PR CI (#17407) @bdice
- Update strings/text source to use grid_1d for thread/block/stride calculations (#17404) @davidwendt
- Expose stream-ordering to strings attribute APIs (#17398) @shrshi
- Expose stream-ordering to interop APIs (#17397) @shrshi
- Remove unused type aliases (#17396) @PointKernel
- Remove some cudf._lib.strings files in favor of inlining pylibcudf (#17394) @mroeschke
- Update xxhash_64 to utilize the cuco equivalent implementation (#17393) @PointKernel
- Change indices for dictionary column to signed integer type (#17390) @davidwendt
- Return categorical values in to_numpy/to_cupy (#17388) @mroeschke
- Forward-merge branch-24.12 to branch-25.02 (#17379) @bdice
- Remove unused IO utilities from cudf python (#17374) @Matt711
- Remove cudf._lib.datetime in favor of inlining pylibcudf (#17372) @mroeschke
- Remove cudf._lib.join in favor of inlining pylibcudf (#17371) @mroeschke
- Remove cudf._lib.merge in favor of inlining pylibcudf (#17370) @mroeschke
- Remove cudf._lib.partitioning in favor of inlining pylibcudf (#17369) @mroeschke
- Remove cudf._lib.reshape in favor of inlining pylibcudf (#17368) @mroeschke
- Remove cudf._lib.timezone in favor of inlining pylibcudf (#17366) @mroeschke
- Remove cudf._lib.transpose in favor of inlining pylibcudf (#17365) @mroeschke
- Move make_strings_column benchmark to nvbench (#17340) @davidwendt
- Improve strings contains/find performance for smaller strings (#17330) @davidwendt
- Use rapids-logger to generate the cudf logger (#17307) @vyasr
- Add write_parquet to pylibcudf (#17263) @mroeschke
- Single-partition Dask executor for cuDF-Polars (#17262) @rjzamora
- Add breaking change workflow trigger (#17248) @AyodeAwe
- Update to CCCL 2.7.0-rc2. (#17233) @bdice
- Make column_empty mask buffer creation consistent with libcudf (#16715) @mroeschke