Fixes: #7367, #7446
This PR upgrades pandas to `1.2.2` in `cudf`. Changes include:
- [x] Bumping up `pandas` version.
- [x] Fixing `isin` behavior, which now takes types into account: pandas-dev/pandas#38781
- [x] `CategoricalColumn.__setitem__` no longer allows setting values that are not in the existing categories.
- [x] Introduced the `cudf.core._compat.PANDAS_GE_120` variable for backward compatibility.
- [x] Updated usages of `pd.core.tools.datetimes._guess_datetime_format` to `pd.core.tools.datetimes.guess_datetime_format`
- [x] Introduced `std` & `median` in `DateTimeColumn`.
- [x] Fixed incorrect handling of passing `StringMethods` as an input to methods in string APIs.
- [x] Fixed a typo in calling `is_valid` of `Scalar`.
- [x] Removed unnecessary special handling in `TimeDeltaColumn.sum` logic for empty inputs.
- [x] Passed `dtype='float64'` wherever an empty series is created, since pandas will soon default to `object` dtype when no dtype is passed and we don't have a dtype that exactly matches the pandas `object` dtype.
- [x] Fixed deprecation warnings of `Index.__or__` and `Index.__xor__` by replacing with `union` & `symmetric_difference` APIs.
- [x] Introduced mapping of our `float32` & `float64` dtypes to the pandas nullable dtypes `Float32Dtype` & `Float64Dtype` when `nullable=True` in `to_pandas`.
- [x] With the introduction of nullable float dtypes, there is an issue in creating a `MultiIndex` from a dataframe: pandas-dev/pandas#39984, so introduced a workaround in our `MultiIndex.__repr__` code.
- [x] Removed usages of `check_less_precise` in our code-base, as it is deprecated and replaced by `rtol` & `atol`. Retained its usages in our testing APIs for backward compatibility.
- [x] Removed a good number of `xfail` cases that now pass because of issues resolved in both `pandas` & `cudf`.
- [x] Did some miscellaneous code-cleanup in pytests.
- [x] Fixed pytests that would fail when run in parallel because shared pytest params were being modified in place.
- [x] Followed a standard import pattern across pytest files. Some files did `from pandas import Series` and some did `from cudf.core import Series`, so removed both patterns in favor of plain `import cudf` & `import pandas as pd` to avoid confusion while debugging test failures across multiple files. (Made this change in all pytest files that I had to touch as part of the pandas upgrade; we can make similar changes in the future for the files we touch.)
- [x] Fix issue with assigning `np.nan` values to a `CategoricalColumn` and fix related `__repr__` code: #7446
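The in-place mutation problem behind the parallel-run test failures can be illustrated without pytest. The following is only a minimal sketch of one way such failures can arise: the names `SHARED_PARAMS`, `check_sorted_inplace` and `check_sorted_copy` are hypothetical and do not come from the cudf test suite.

```python
import copy

# Hypothetical module-level parameters, standing in for a list that is
# handed to several parametrized tests and therefore shared between them.
SHARED_PARAMS = [{"data": [3, 1, 2]}, {"data": [9, 7, 8]}]


def check_sorted_inplace(param):
    """Problematic pattern: sorts the shared list object itself."""
    param["data"].sort()
    return param["data"]


def check_sorted_copy(param):
    """Safer pattern: work on a private deep copy of the parameter."""
    local = copy.deepcopy(param)
    local["data"].sort()
    return local["data"]


# Remember the original state so we can detect mutation afterwards.
snapshot = copy.deepcopy(SHARED_PARAMS)

# The copying version leaves the shared parameters untouched.
for param in SHARED_PARAMS:
    check_sorted_copy(param)
unchanged_after_copy = SHARED_PARAMS == snapshot

# The in-place version changes what every later user of the params sees.
for param in SHARED_PARAMS:
    check_sorted_inplace(param)
changed_after_inplace = SHARED_PARAMS != snapshot
```

When a test mutates shared parameters like this, its neighbours' results depend on which tests ran earlier in the same process, and that ordering can differ once tests are distributed across workers.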
Authors:
- GALI PREM SAGAR (@galipremsagar)
Approvers:
- Keith Kraus (@kkraus14)
- AJ Schmidt (@ajschmidt8)
URL: #7375
Is your feature request related to a problem? Please describe.
`cudf` is currently pinned to `pandas<1.2.0a0`; we would want to use some new features introduced in `1.2`. For example: the `use_nullable_dtypes` param in `read_parquet`: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_parquet.html
Describe the solution you'd like
Test current cudf code-base and make versioned fixes to breakages.
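A "versioned fix" usually means branching on the installed pandas version. The sketch below shows the general shape of such a gate using only the standard library. The helpers `release_tuple`, `is_at_least` and `datetime_format_helper_name` are hypothetical, the version string is passed in as an argument (in practice it would presumably come from `pd.__version__`), and the assumption that the `guess_datetime_format` rename applies from `1.2.0` onward is inferred from this upgrade rather than confirmed here.

```python
import re


def release_tuple(version_string):
    """Parse the leading 'major.minor[.patch]' part of a version string.

    Pre-release suffixes such as 'a0' or 'rc1' are ignored, so this is
    cruder than a full PEP 440 parser.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version_string)
    if match is None:
        raise ValueError(f"unrecognized version string: {version_string!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def is_at_least(installed, minimum):
    """Return True if `installed` is the same as or newer than `minimum`."""
    return release_tuple(installed) >= release_tuple(minimum)


def datetime_format_helper_name(installed):
    """Pick the helper name to use for a given pandas version (assumed cutoff)."""
    if is_at_least(installed, "1.2.0"):
        return "guess_datetime_format"
    return "_guess_datetime_format"


# A module-level flag in the spirit of PANDAS_GE_120, computed once.
PANDAS_GE_120 = is_at_least("1.2.2", "1.2.0")
```

Computing the flag once at import time keeps the version check out of hot paths and gives call sites a single readable name to branch on.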