Enable testing cudf.pandas unit tests for all minor versions of pandas #16595

Merged
merged 43 commits into from
Aug 23, 2024
Changes from 26 commits
Commits
9da733f
initial commit
galipremsagar Aug 19, 2024
bb8bc13
test
galipremsagar Aug 19, 2024
1d0dae3
test
galipremsagar Aug 19, 2024
3909c4d
Merge branch 'branch-24.10' into pandas_minor_ver_ci
galipremsagar Aug 19, 2024
f88780f
test
galipremsagar Aug 19, 2024
3461362
Merge branch 'pandas_minor_ver_ci' of https://github.com/galipremsaga…
galipremsagar Aug 19, 2024
c3e7ee1
add py file
galipremsagar Aug 19, 2024
2b361e0
disable
galipremsagar Aug 19, 2024
e5ef72a
pass pandas version
galipremsagar Aug 19, 2024
9773185
add pandas version
galipremsagar Aug 19, 2024
6f2b962
update
galipremsagar Aug 19, 2024
84342cb
test
galipremsagar Aug 19, 2024
704f75d
Merge branch 'branch-24.10' into pandas_minor_ver_ci
galipremsagar Aug 19, 2024
4e9706f
style
galipremsagar Aug 19, 2024
2aff0b4
Merge branch 'pandas_minor_ver_ci' of https://github.com/galipremsaga…
galipremsagar Aug 19, 2024
77b546a
Update .github/workflows/pr.yaml
galipremsagar Aug 19, 2024
ba26ab2
Merge branch 'branch-24.10' into pandas_minor_ver_ci
galipremsagar Aug 19, 2024
74528f9
Fix tests
galipremsagar Aug 20, 2024
9c8c0c3
Merge branch 'branch-24.10' into pandas_minor_ver_ci
galipremsagar Aug 20, 2024
95fde99
use filter
galipremsagar Aug 20, 2024
d515f62
Merge branch 'pandas_minor_ver_ci' of https://github.com/galipremsaga…
galipremsagar Aug 20, 2024
b6cbc5f
Address reviews
galipremsagar Aug 21, 2024
c5585f9
Merge branch 'branch-24.10' into pandas_minor_ver_ci
galipremsagar Aug 21, 2024
191b7a2
fix yaml
galipremsagar Aug 21, 2024
f2d953f
Merge branch 'pandas_minor_ver_ci' of https://github.com/galipremsaga…
galipremsagar Aug 21, 2024
16c5a8b
install packaging'
galipremsagar Aug 21, 2024
efce401
Apply suggestions from code review
galipremsagar Aug 21, 2024
b69c105
Update pr.yaml
galipremsagar Aug 21, 2024
877392b
Update python/cudf/cudf_pandas_tests/test_profiler.py
galipremsagar Aug 21, 2024
723e484
Update test_cudf_pandas.py
galipremsagar Aug 21, 2024
c3a985b
Merge branch 'branch-24.10' into pandas_minor_ver_ci
galipremsagar Aug 21, 2024
98a97f6
improve
galipremsagar Aug 21, 2024
782c938
update
galipremsagar Aug 22, 2024
f400708
update
galipremsagar Aug 22, 2024
b172637
Merge branch 'branch-24.10' into pandas_minor_ver_ci
galipremsagar Aug 22, 2024
6165bf9
update path
galipremsagar Aug 22, 2024
a0161f7
Merge branch 'branch-24.10' into pandas_minor_ver_ci
galipremsagar Aug 22, 2024
810fcca
fix path
galipremsagar Aug 22, 2024
062494b
Merge branch 'pandas_minor_ver_ci' of https://github.com/galipremsaga…
galipremsagar Aug 22, 2024
5fcf7c5
Apply suggestions from code review
galipremsagar Aug 22, 2024
f1b6840
Update ci/cudf_pandas_scripts/fetch_pandas_versions.py
galipremsagar Aug 23, 2024
8c58b3f
simplify
galipremsagar Aug 23, 2024
47de3f2
Merge branch 'branch-24.10' into pandas_minor_ver_ci
galipremsagar Aug 23, 2024
36 changes: 36 additions & 0 deletions .github/workflows/pr.yaml
@@ -32,6 +32,8 @@ jobs:
       - wheel-tests-dask-cudf
       - devcontainer
       - unit-tests-cudf-pandas
+      - fetch-pandas-versions
+      - cross-pandas-version-tests
       - pandas-tests
       - pandas-tests-diff
     secrets: inherit
@@ -41,6 +43,40 @@ jobs:
     uses: rapidsai/shared-workflows/.github/workflows/[email protected]
     with:
       enable_check_generated_files: false
+  fetch-pandas-versions:
+    runs-on: ubuntu-latest
+    outputs:
+      pandas-versions: ${{ steps.get-versions.outputs.versions }}
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@v2
+
+      - name: Set up Python
+        uses: actions/setup-python@v2
+        with:
+          python-version: '3.11'
+
+      - name: Install requests
+        run: pip install requests
+      - name: Install packaging
+        run: pip install packaging
+      - name: Fetch pandas versions
+        id: get-versions
+        run: |
+          versions=$(python ci/cudf_pandas_scripts/fetch_pandas_versions.py ">=2.0,<2.2.3dev0")
+          echo "::set-output name=versions::${versions}"
+  cross-pandas-version-tests:
+    needs: [fetch-pandas-versions, wheel-build-cudf]
+    secrets: inherit
+    strategy:
+      matrix:
+        pandas-version: ${{ fromJson(needs.fetch-pandas-versions.outputs.pandas-versions) }}
+    uses: rapidsai/shared-workflows/.github/workflows/[email protected]
+    with:
+      # This selects "ARCH=amd64 + the latest supported Python + CUDA".
+      matrix_filter: map(select(.ARCH == "amd64")) | group_by(.CUDA_VER|split(".")|map(tonumber)|.[0]) | map(max_by([(.PY_VER|split(".")|map(tonumber)), (.CUDA_VER|split(".")|map(tonumber))]))
+      build_type: pull-request
+      script: ci/cudf_pandas_scripts/run_tests.sh ${{ matrix.pandas-version }}
   conda-cpp-build:
     needs: checks
     secrets: inherit
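The `matrix_filter` expression in this workflow is written in jq. As a rough illustration of what it computes, the Python sketch below mirrors its three stages: keep the `amd64` entries, group them by CUDA major version, and keep the entry with the highest (Python, CUDA) version pair from each group. The sample matrix and the `filter_matrix` and `version_tuple` helpers are hypothetical; the real matrix comes from the shared workflow and is not part of this diff.

```python
from itertools import groupby


def version_tuple(text):
    """Turn "12.5.1" into (12, 5, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def filter_matrix(entries):
    """Mimic the jq filter: amd64 only, one entry per CUDA major version."""
    amd64 = [entry for entry in entries if entry["ARCH"] == "amd64"]

    def cuda_major(entry):
        return version_tuple(entry["CUDA_VER"])[0]

    selected = []
    # groupby only merges adjacent items, so sort by the grouping key first.
    for _, group in groupby(sorted(amd64, key=cuda_major), key=cuda_major):
        selected.append(
            max(
                group,
                key=lambda e: (version_tuple(e["PY_VER"]), version_tuple(e["CUDA_VER"])),
            )
        )
    return selected


# Hypothetical matrix entries, made up for this illustration.
sample_matrix = [
    {"ARCH": "amd64", "PY_VER": "3.10", "CUDA_VER": "11.8.0"},
    {"ARCH": "amd64", "PY_VER": "3.11", "CUDA_VER": "11.8.0"},
    {"ARCH": "amd64", "PY_VER": "3.11", "CUDA_VER": "12.5.1"},
    {"ARCH": "arm64", "PY_VER": "3.11", "CUDA_VER": "12.5.1"},
]
for entry in filter_matrix(sample_matrix):
    print(entry)
```

With this sample input, one entry survives per CUDA major version (11 and 12), both on the newest Python in the sample. Presumably each pandas minor version produced by `fetch-pandas-versions` is then tested against each surviving entry, though that combination is handled by the shared workflow rather than by this filter.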
26 changes: 26 additions & 0 deletions ci/cudf_pandas_scripts/fetch_pandas_versions.py
@@ -0,0 +1,26 @@
+# Copyright (c) 2024, NVIDIA CORPORATION.
+
+import requests
+from packaging.version import Version
+from packaging.specifiers import SpecifierSet
+import argparse
+
+def get_pandas_versions(pandas_range):
+    url = "https://pypi.org/pypi/pandas/json"
+    response = requests.get(url)
+    data = response.json()
+    versions = data['releases'].keys()
+
+    specifier = SpecifierSet(pandas_range)
+
+    minor_versions = list(set([version[:3] for version in versions if Version(version) in specifier]))
+
+    return minor_versions
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser(description="Filter pandas versions by prefix.")
+    parser.add_argument("pandas_range", type=str, help="The version prefix to filter by.")
+    args = parser.parse_args()
+
+    versions = get_pandas_versions(args.pandas_range)
+    print(versions)
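A hedged aside on `fetch_pandas_versions.py`: slicing `version[:3]` yields a minor version only while both components are single digits (a hypothetical `2.10.0` would collapse to `2.1`), the `set` gives no stable ordering, and `print` on a Python list emits single-quoted strings, which is not strict JSON and so may depend on how leniently `fromJson` parses it. The sketch below is one possible alternative, not the code in this PR. `to_minor_versions` is a hypothetical helper, the input list is made up, and it assumes plain numeric release strings that have already been filtered by the specifier.

```python
import json


def to_minor_versions(release_strings):
    """Collapse releases such as "2.1.4" into sorted, de-duplicated "major.minor" labels."""
    minors = {
        # Split on "." rather than slicing so two-digit components survive.
        tuple(int(part) for part in release.split(".")[:2])
        for release in release_strings
    }
    # Sorting integer tuples orders 2.10 after 2.2, unlike string sorting.
    return [f"{major}.{minor}" for major, minor in sorted(minors)]


# Hypothetical release list; "2.10.0" is included only to show the slicing pitfall.
releases = ["2.0.0", "2.0.3", "2.1.4", "2.2.2", "2.10.0"]
minor_versions = to_minor_versions(releases)

# json.dumps produces double-quoted strings, i.e. strict JSON.
print(json.dumps(minor_versions))
```

Emitting JSON explicitly keeps the contract between the script and the workflow's `fromJson` call independent of Python's list formatting.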
22 changes: 18 additions & 4 deletions ci/cudf_pandas_scripts/run_tests.sh
@@ -11,11 +11,12 @@ mkdir -p "${RAPIDS_TESTS_DIR}" "${RAPIDS_COVERAGE_DIR}"
 
 # Function to display script usage
 function display_usage {
-    echo "Usage: $0 [--no-cudf]"
+    echo "Usage: $0 [--no-cudf] [pandas-version]"
 }
 
 # Default value for the --no-cudf option
 no_cudf=false
+PANDAS_VERSION=""
 
 # Parse command-line arguments
 while [[ $# -gt 0 ]]; do
@@ -25,9 +26,14 @@ while [[ $# -gt 0 ]]; do
             shift
             ;;
         *)
-            echo "Error: Unknown option $1"
-            display_usage
-            exit 1
+            if [[ -z "$PANDAS_VERSION" ]]; then
+                PANDAS_VERSION=$1
+                shift
+            else
+                echo "Error: Unknown option $1"
+                display_usage
+                exit 1
+            fi
             ;;
     esac
 done
@@ -47,6 +53,14 @@ else
         "$(echo ./dist/pylibcudf_${RAPIDS_PY_CUDA_SUFFIX}*.whl)"
 fi
 
+# Conditionally install the specified version of pandas
+if [ -n "$PANDAS_VERSION" ]; then
+    echo "Installing pandas version: $PANDAS_VERSION"
+    python -m pip install pandas==$PANDAS_VERSION
+else
+    echo "No pandas version specified, using existing pandas installation"
+fi
+
 python -m pytest -p cudf.pandas \
     --cov-config=./python/cudf/.coveragerc \
     --cov=cudf \
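One detail worth keeping in mind with `pandas==$PANDAS_VERSION` when the value is a minor version such as `2.1`: under PEP 440, a pin like `==2.1` is zero-padded and matches only `2.1.0`, whereas `==2.1.*` matches any `2.1` patch release. The sketch below is a deliberately simplified model of those two rules for plain numeric versions. `matches_exact` and `matches_prefix` are hypothetical helpers, not pip or `packaging` APIs, and whether the script should use the wildcard form is left open here.

```python
def release_tuple(text):
    """Turn "2.1.4" into (2, 1, 4)."""
    return tuple(int(part) for part in text.split("."))


def pad(release, length):
    """Right-pad a release tuple with zeros up to the given length."""
    return release + (0,) * (length - len(release))


def matches_exact(candidate, pinned):
    """Simplified model of `==pinned`: equal after zero-padding both sides."""
    a, b = release_tuple(candidate), release_tuple(pinned)
    width = max(len(a), len(b))
    return pad(a, width) == pad(b, width)


def matches_prefix(candidate, prefix):
    """Simplified model of `==prefix.*`: leading components must equal the prefix."""
    a, b = release_tuple(candidate), release_tuple(prefix)
    return pad(a, len(b))[: len(b)] == b


for candidate in ["2.1.0", "2.1.4", "2.10.0"]:
    print(
        candidate,
        "==2.1:", matches_exact(candidate, "2.1"),
        "==2.1.*:", matches_prefix(candidate, "2.1"),
    )
```

In this model only `2.1.0` satisfies `==2.1`, while both `2.1.0` and `2.1.4` satisfy `==2.1.*`, so the choice determines whether CI exercises the first or the latest patch release of each minor version.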
18 changes: 18 additions & 0 deletions python/cudf/cudf_pandas_tests/test_cudf_pandas.py
@@ -42,6 +42,8 @@
     get_calendar,
 )
 
+from cudf.core._compat import PANDAS_CURRENT_SUPPORTED_VERSION, PANDAS_VERSION
+
 # Accelerated pandas has the real pandas and cudf modules as attributes
 pd = xpd._fsproxy_slow
 cudf = xpd._fsproxy_fast
@@ -607,6 +609,10 @@ def test_array_function_series_fallback(series):
     tm.assert_equal(expect, got)
 
 
+@pytest.mark.skipif(
+    PANDAS_VERSION < PANDAS_CURRENT_SUPPORTED_VERSION,
+    reason="Fails in older versions of pandas",
+)
 def test_timedeltaproperties(series):
     psr, sr = series
     psr, sr = psr.astype("timedelta64[ns]"), sr.astype("timedelta64[ns]")
@@ -666,6 +672,10 @@ def test_maintain_container_subclasses(multiindex):
     assert isinstance(got, xpd.core.indexes.frozen.FrozenList)
 
 
+@pytest.mark.skipif(
+    PANDAS_VERSION < PANDAS_CURRENT_SUPPORTED_VERSION,
+    reason="Fails in older versions of pandas due to unsupported boxcar window type",
+)
 def test_rolling_win_type():
     pdf = pd.DataFrame(range(5))
     df = xpd.DataFrame(range(5))
@@ -1281,6 +1291,10 @@ def max_times_two(self):
     assert s.max_times_two() == 6
 
 
+@pytest.mark.skipif(
+    PANDAS_VERSION < PANDAS_CURRENT_SUPPORTED_VERSION,
+    reason="DatetimeArray.__floordiv__ missing in pandas-2.0.0",
+)
 def test_floordiv_array_vs_df():
     xarray = xpd.Series([1, 2, 3], dtype="datetime64[ns]").array
     parray = pd.Series([1, 2, 3], dtype="datetime64[ns]").array
@@ -1552,6 +1566,10 @@ def test_numpy_cupy_flatiter(series):
     assert type(arr.flat._fsproxy_slow) == np.flatiter
 
 
+@pytest.mark.skipif(
+    PANDAS_VERSION < PANDAS_CURRENT_SUPPORTED_VERSION,
+    reason="pyarrow_numpy storage type was not supported in pandas-2.0.0",
+)
 def test_arrow_string_arrays():
     cu_s = xpd.Series(["a", "b", "c"])
     pd_s = pd.Series(["a", "b", "c"])
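The guards added to `test_cudf_pandas.py` all share one condition, so each guarded test runs only when the installed pandas is at least the current supported version. The sketch below shows that pattern with the standard library's `unittest.skipIf` in place of `pytest.mark.skipif`, so it runs without cudf, pandas, or pytest. `INSTALLED_VERSION` and `CURRENT_SUPPORTED_VERSION` are hypothetical tuples standing in for the `cudf.core._compat` constants, and their values are made up.

```python
import unittest

# Hypothetical stand-ins for PANDAS_VERSION and PANDAS_CURRENT_SUPPORTED_VERSION.
INSTALLED_VERSION = (2, 0, 3)
CURRENT_SUPPORTED_VERSION = (2, 2, 2)


class VersionGuardDemo(unittest.TestCase):
    @unittest.skipIf(
        INSTALLED_VERSION < CURRENT_SUPPORTED_VERSION,
        "Fails in older versions of pandas",
    )
    def test_latest_only(self):
        # Stands in for a test that relies on behavior of the newest pandas.
        self.assertTrue(True)

    def test_any_version(self):
        # Unguarded tests still run on every pandas version in the matrix.
        self.assertTrue(True)


def run_demo():
    """Run the demo test case and return the collected result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(VersionGuardDemo)
    result = unittest.TestResult()
    suite.run(result)
    return result


result = run_demo()
for test, reason in result.skipped:
    print(f"skipped {test.id()}: {reason}")
```

Because the condition compares against the single latest version, a guard whose reason mentions only pandas 2.0.0 also skips the test on every other older minor version; a narrower bound would be a design choice for the maintainers.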
8 changes: 8 additions & 0 deletions python/cudf/cudf_pandas_tests/test_profiler.py
@@ -5,6 +5,8 @@
 import os
 import subprocess
 
+import pytest
+
 from cudf.pandas import LOADED, Profiler
 
 if not LOADED:
@@ -13,7 +15,13 @@
 import numpy as np
 import pandas as pd
 
+from cudf.core._compat import PANDAS_CURRENT_SUPPORTED_VERSION, PANDAS_VERSION
+
 
+@pytest.mark.skipif(
+    PANDAS_VERSION < PANDAS_CURRENT_SUPPORTED_VERSION,
+    reason="function names change across versions of pandas, so making sure it only runs on latest version of pandas",
+)
 def test_profiler():
     np.random.seed(42)
     with Profiler() as profiler: