v22.02.00

@GPUtester released this 02 Feb 16:43 · 5331 commits to main since this release

🚨 Breaking Changes

  • ORC writer API changes for granular statistics (#10058) @mythrocks
  • decimal128 Support for to/from_arrow (#9986) @codereport
  • Remove deprecated method one_hot_encoding (#9977) @isVoid
  • Remove str.subword_tokenize (#9968) @VibhuJawa
  • Remove the deprecated "method" parameter from merge and join. (#9944) @bdice
  • Remove deprecated method DataFrame.hash_columns. (#9943) @bdice
  • Remove deprecated method Series.hash_encode. (#9942) @bdice
  • Refactoring ceil/round/floor code for datetime64 types (#9926) @mayankanand007
  • Introduce nan_as_null parameter for cudf.Index (#9893) @galipremsagar
  • Add regex_flags parameter to strings replace_re functions (#9878) @davidwendt
  • Break tie for top categorical columns in Series.describe (#9867) @isVoid
  • Add partitioning support in parquet writer (#9810) @devavret
  • Move drop_duplicates, drop_na, _gather, take to IndexFrame and create their _base_index counterparts (#9807) @isVoid
  • Raise temporary error for decimal128 types in parquet reader (#9804) @galipremsagar
  • Change default dtype of all nulls column from float to object (#9803) @galipremsagar
  • Remove unused masked udf cython/c++ code (#9792) @brandon-b-miller
  • Pick smallest decimal type with required precision in ORC reader (#9775) @vuule
  • Add decimal128 support to Parquet reader and writer (#9765) @vuule
  • Refactor TableTest assertion methods to a separate utility class (#9762) @jlowe
  • Use cuFile direct device reads/writes by default in cuIO (#9722) @vuule
  • Match pandas scalar result types in reductions (#9717) @brandon-b-miller
  • Add parameters to control row group size in Parquet writer (#9677) @vuule
  • Refactor bit counting APIs, introduce valid/null count functions, and split host/device side code for segmented counts. (#9588) @bdice
  • Add support for decimal128 in cudf python (#9533) @galipremsagar
  • Implement lists::index_of() to find positions in list rows (#9510) @mythrocks
  • Rewriting row/column conversions for Spark <-> cudf data conversions (#8444) @hyperbolic2346

🐛 Bug Fixes

📖 Documentation

🚀 New Features

🛠️ Improvements

  • Prepare upload scripts for Python 3.7 removal (#10092) @Ethyling
  • Simplify custreamz and cudf_kafka recipes files (#10065) @Ethyling
  • ORC writer API changes for granular statistics (#10058) @mythrocks
  • Remove Python constraints in custreamz and cudf_kafka recipes (#10052) @Ethyling
  • Unpin dask and distributed in CI (#10028) @galipremsagar
  • Add _from_column_like_self factory (#10022) @isVoid
  • Replace custom CUDA bindings previously provided by RMM with official CUDA Python bindings (#10008) @shwina
  • Use cuda::std::is_arithmetic in cudf::is_numeric trait. (#9996) @bdice
  • Clean up CUDA stream use in cuIO (#9991) @vuule
  • Use address-ordered first fit for the pinned memory pool (#9989) @rongou
  • Add strings tests to transpose_test.cpp (#9985) @davidwendt
  • Use gpuci_mamba_retry on Java CI. (#9983) @bdice
  • Remove deprecated method one_hot_encoding (#9977) @isVoid
  • Minor cleanup of unused Python functions (#9974) @vyasr
  • Use new efficient partitioned parquet writing in cuDF (#9971) @devavret
  • Remove str.subword_tokenize (#9968) @VibhuJawa
  • Forward-merge branch-21.12 to branch-22.02 (#9947) @bdice
  • Remove the deprecated "method" parameter from merge and join. (#9944) @bdice
  • Remove deprecated method DataFrame.hash_columns. (#9943) @bdice
  • Remove deprecated method Series.hash_encode. (#9942) @bdice
  • Use Ninja in Java CI build (#9933) @rongou
  • Add build-time publish step to cpu build script (#9927) @davidwendt
  • Refactoring ceil/round/floor code for datetime64 types (#9926) @mayankanand007
  • Remove various unused functions (#9922) @vyasr
  • Raise in query if dtype is not supported (#9921) @brandon-b-miller
  • Add missing imports tests (#9920) @Ethyling
  • Spark Decimal128 hashing (#9919) @rwlee
  • Replace thrust/std::get with structured bindings (#9915) @codereport
  • Upgrade thrust version to 1.15 (#9912) @robertmaynard
  • Remove conda envs for CUDA 11.0 and 11.2. (#9910) @bdice
  • Return count of set bits from inplace_bitmask_and. (#9904) @bdice
  • Use dynamic nullate for join hasher and equality comparator (#9902) @davidwendt
  • Update ucx-py version on release using rvc (#9897) @Ethyling
  • Remove IncludeCategories from .clang-format (#9876) @codereport
  • Support statically linking CUDA runtime for Java bindings (#9873) @jlowe
  • Add clang-tidy to libcudf (#9860) @codereport
  • Remove deprecated methods from Java Table class (#9853) @jlowe
  • Add test for map column metadata handling in ORC writer (#9852) @vuule
  • Use pandas to_offset to parse frequency string in date_range (#9843) @isVoid
  • Add templated benchmark with fixture (#9838) @karthikeyann
  • Use list of column inputs for apply_boolean_mask (#9832) @isVoid
  • Added a few more tests for Decimal to String cast (#9818) @razajafri
  • Run doctests. (#9815) @bdice
  • Avoid overflow for fixed_point round (#9809) @sperlingxx
  • Move drop_duplicates, drop_na, _gather, take to IndexFrame and create their _base_index counterparts (#9807) @isVoid
  • Use vector factories for host-device copies. (#9806) @bdice
  • Refactor host device macros (#9797) @vyasr
  • Remove unused masked udf cython/c++ code (#9792) @brandon-b-miller
  • Allow custom sort functions for dask-cudf sort_values (#9789) @charlesbluca
  • Improve build time of libcudf iterator tests (#9788) @davidwendt
  • Copy Java native dependencies directly into classpath (#9787) @jlowe
  • Add decimal types to cuIO benchmarks (#9776) @vuule
  • Pick smallest decimal type with required precision in ORC reader (#9775) @vuule
  • Avoid overflow for fixed_point cudf::cast and performance optimization (#9772) @codereport
  • Use CTAD with Thrust function objects (#9768) @codereport
  • Refactor TableTest assertion methods to a separate utility class (#9762) @jlowe
  • Use Java classloader to find test resources (#9760) @jlowe
  • Allow cast decimal128 to string and add tests (#9756) @razajafri
  • Load balance optimization for contiguous_split (#9755) @nvdbaranec
  • Consolidate and improve reset_index (#9750) @isVoid
  • Update to UCX-Py 0.24 (#9748) @pentschev
  • Skip cufile tests in JNI build script (#9744) @pxLi
  • Enable string to decimal128 cast (#9742) @razajafri
  • Use stop instead of stop_. (#9735) @bdice
  • Forward-merge branch-21.12 to branch-22.02 (#9730) @bdice
  • Improve cmake format script (#9723) @vyasr
  • Use cuFile direct device reads/writes by default in cuIO (#9722) @vuule
  • Add directory-partitioned data support to cudf.read_parquet (#9720) @rjzamora
  • Use stream allocator adaptor for hash join table (#9704) @PointKernel
  • Update check for inf/nan strings in libcudf float conversion to ignore case (#9694) @davidwendt
  • Update cudf JNI to 22.02.0-SNAPSHOT (#9681) @pxLi
  • Replace cudf's concurrent_ordered_map with cuco::static_map in semi/anti joins (#9666) @vyasr
  • Some improvements to parse_decimal function and bindings for is_fixed_point (#9658) @razajafri
  • Add utility to format ninja-log build times (#9631) @davidwendt
  • Allow runtime has_nulls parameter for row operators (#9623) @davidwendt
  • Use fsspec.parquet for improved read_parquet performance from remote storage (#9589) @rjzamora
  • Refactor bit counting APIs, introduce valid/null count functions, and split host/device side code for segmented counts. (#9588) @bdice
  • Use List of Columns as Input for drop_nulls, gather and drop_duplicates (#9558) @isVoid
  • Simplify merge internals and reduce overhead (#9516) @vyasr
  • Add struct generation support in datagenerator & fuzz tests (#9180) @galipremsagar
  • Simplify write_csv by removing unnecessary writer/impl classes (#9089) @cwharris