Add conda recipe for cudf-polars #17037

Merged
merged 9 commits into from
Oct 17, 2024
Changes from 5 commits
5 changes: 5 additions & 0 deletions ci/build_python.sh
@@ -52,5 +52,10 @@ RAPIDS_PACKAGE_VERSION=$(head -1 ./VERSION) rapids-conda-retry mambabuild \
--channel "${RAPIDS_CONDA_BLD_OUTPUT_DIR}" \
conda/recipes/custreamz

RAPIDS_PACKAGE_VERSION=$(head -1 ./VERSION) rapids-conda-retry mambabuild \
--no-test \
--channel "${CPP_CHANNEL}" \
--channel "${RAPIDS_CONDA_BLD_OUTPUT_DIR}" \
conda/recipes/cudf-polars

rapids-upload-conda-to-s3 python
18 changes: 16 additions & 2 deletions ci/test_python_other.sh
@@ -14,7 +14,8 @@ rapids-mamba-retry install \
--channel "${PYTHON_CHANNEL}" \
"dask-cudf=${RAPIDS_VERSION}" \
"cudf_kafka=${RAPIDS_VERSION}" \
-  "custreamz=${RAPIDS_VERSION}"
+  "custreamz=${RAPIDS_VERSION}" \
+  "cudf-polars=${RAPIDS_VERSION}"

rapids-logger "Check GPU usage"
nvidia-smi
@@ -37,7 +38,7 @@ rapids-logger "pytest dask_cudf (legacy)"
DASK_DATAFRAME__QUERY_PLANNING=False ./ci/run_dask_cudf_pytests.sh \
--junitxml="${RAPIDS_TESTS_DIR}/junit-dask-cudf-legacy.xml" \
--numprocesses=8 \
-  --dist=loadscope \
+  --dist=worksteal \
.

rapids-logger "pytest cudf_kafka"
Expand All @@ -54,5 +55,18 @@ rapids-logger "pytest custreamz"
--cov-report=xml:"${RAPIDS_COVERAGE_DIR}/custreamz-coverage.xml" \
--cov-report=term

# Note that cudf-polars uses rmm.mr.CudaAsyncMemoryResource() which allocates
# half the available memory. This doesn't play well with multiple workers, so
# we keep --numprocesses=1 for now.
rapids-logger "pytest cudf-polars"
./ci/run_cudf_polars_pytests.sh \
--junitxml="${RAPIDS_TESTS_DIR}/junit-cudf-polars.xml" \
--numprocesses=1 \
--dist=worksteal \
--cov-config=./.coveragerc \
--cov=cudf_polars \
--cov-report=xml:"${RAPIDS_COVERAGE_DIR}/cudf-polars-coverage.xml" \
--cov-report=term

rapids-logger "Test script exiting with value: $EXITCODE"
exit ${EXITCODE}
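
The comment in this script about keeping `--numprocesses=1` can be illustrated with a toy model. This is only a sketch: it assumes each worker reserves half of whatever device memory is still free when it starts and then holds on to it, and it does not call RMM at all. The function name `sequential_half_pools` and the 16 GiB figure are made up for illustration.

```python
GIB = 1024**3


def sequential_half_pools(free_bytes: int, n_workers: int) -> list[int]:
    """Toy model: each worker, in turn, reserves half of the remaining free memory."""
    pools = []
    for _ in range(n_workers):
        pool = free_bytes // 2  # assumption: "half the available memory"
        pools.append(pool)
        free_bytes -= pool  # assumption: the reservation is never returned
    return pools


# Hypothetical 16 GiB device shared by four pytest-xdist workers.
pools = sequential_half_pools(16 * GIB, 4)
pools_gib = [p // GIB for p in pools]
print(pools_gib)
```

Under these assumptions each later worker ends up with a smaller pool than the one before it, which is one way to see why a single worker is the conservative choice for now.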
4 changes: 4 additions & 0 deletions conda/recipes/cudf-polars/build.sh
@@ -0,0 +1,4 @@
# Copyright (c) 2024, NVIDIA CORPORATION.

# This assumes the script is executed from the root of the repo directory
./build.sh cudf_polars
61 changes: 61 additions & 0 deletions conda/recipes/cudf-polars/meta.yaml
@@ -0,0 +1,61 @@
# Copyright (c) 2018-2024, NVIDIA CORPORATION.

{% set version = environ['RAPIDS_PACKAGE_VERSION'].lstrip('v') %}
{% set minor_version = version.split('.')[0] + '.' + version.split('.')[1] %}
{% set py_version = environ['CONDA_PY'] %}
{% set cuda_version = '.'.join(environ['RAPIDS_CUDA_VERSION'].split('.')[:2]) %}
{% set cuda_major = cuda_version.split('.')[0] %}
{% set date_string = environ['RAPIDS_DATE_STRING'] %}

package:
  name: cudf-polars
  version: {{ version }}

source:
  path: ../../..

build:
  number: {{ GIT_DESCRIBE_NUMBER }}
  string: cuda{{ cuda_major }}_py{{ py_version }}_{{ date_string }}_{{ GIT_DESCRIBE_HASH }}_{{ GIT_DESCRIBE_NUMBER }}
  script_env:
    - AWS_ACCESS_KEY_ID
    - AWS_SECRET_ACCESS_KEY
    - AWS_SESSION_TOKEN
    - CMAKE_C_COMPILER_LAUNCHER
    - CMAKE_CUDA_COMPILER_LAUNCHER
    - CMAKE_CXX_COMPILER_LAUNCHER
    - CMAKE_GENERATOR
    - PARALLEL_LEVEL
    - SCCACHE_BUCKET
    - SCCACHE_IDLE_TIMEOUT
    - SCCACHE_REGION
    - SCCACHE_S3_KEY_PREFIX=cudf-polars-aarch64 # [aarch64]
    - SCCACHE_S3_KEY_PREFIX=cudf-polars-linux64 # [linux64]
    - SCCACHE_S3_USE_SSL
    - SCCACHE_S3_NO_CREDENTIALS

requirements:
  host:
    - python
    - rapids-build-backend >=0.3.0,<0.4.0.dev0
    - setuptools
    - cuda-version ={{ cuda_version }}
  run:
    - python
    - pylibcudf ={{ version }}
    - polars >=1.8,<1.9
    - {{ pin_compatible('cuda-version', max_pin='x', min_pin='x') }}

test:
  requires:
    - cuda-version ={{ cuda_version }}
  imports:
    - cudf_polars


about:
  home: https://rapids.ai/
  license: Apache-2.0
  license_family: APACHE
  license_file: LICENSE
  summary: cudf-polars library
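
The Jinja `{% set %}` lines at the top of this recipe use ordinary Python string methods, so their effect can be checked outside conda-build. The sketch below mirrors them in plain Python. The helper names `derive_recipe_vars` and `build_string`, and every input value (`v24.12.00`, `12.5.1`, `312`, `241017`, `gabc1234`, `0`), are hypothetical examples, not values taken from this PR.

```python
def derive_recipe_vars(package_version: str, cuda_version_full: str) -> dict[str, str]:
    """Mirror the recipe's Jinja expressions for version and CUDA version."""
    version = package_version.lstrip("v")  # drops any leading "v" characters
    parts = version.split(".")
    minor_version = parts[0] + "." + parts[1]
    cuda_version = ".".join(cuda_version_full.split(".")[:2])  # keep major.minor
    cuda_major = cuda_version.split(".")[0]
    return {
        "version": version,
        "minor_version": minor_version,
        "cuda_version": cuda_version,
        "cuda_major": cuda_major,
    }


def build_string(cuda_major: str, py_version: str, date_string: str,
                 git_hash: str, git_number: str) -> str:
    """Mirror the layout of the recipe's build.string field."""
    return f"cuda{cuda_major}_py{py_version}_{date_string}_{git_hash}_{git_number}"


# Hypothetical inputs standing in for RAPIDS_PACKAGE_VERSION and RAPIDS_CUDA_VERSION.
recipe_vars = derive_recipe_vars("v24.12.00", "12.5.1")
example_build = build_string(recipe_vars["cuda_major"], "312", "241017", "gabc1234", "0")
print(recipe_vars)
print(example_build)
```

Note that `lstrip('v')` removes every leading `v`, not just one, which is harmless for version strings of this shape.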
3 changes: 3 additions & 0 deletions python/cudf_polars/.coveragerc
@@ -0,0 +1,3 @@
# Configuration file for Python coverage tests
[run]
source = cudf_polars
Contributor

We already do some configuration of coverage in pyproject.toml. Can this configuration also go there (I suppose in a [tool.coverage.run] section)?

Contributor Author

@bdice bdice Oct 16, 2024

We may not even need a separate section for this. I pushed 1a2be0d to try using pyproject.toml without defining the [tool.coverage.run] source.

2 changes: 1 addition & 1 deletion python/cudf_polars/tests/expressions/test_agg.py
@@ -96,7 +96,7 @@ def test_bool_agg(agg, request):
    assert_gpu_result_equal(q, check_exact=False)


-@pytest.mark.parametrize("cum_agg", expr.UnaryFunction._supported_cum_aggs)
+@pytest.mark.parametrize("cum_agg", sorted(expr.UnaryFunction._supported_cum_aggs))
Contributor

Thanks, good spot!

Contributor Author

For reference, this was necessary to make multi-worker testing operate correctly. The frozenset had a different order on each worker, so pytest failed because the test collection didn't agree across all workers.
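
The ordering issue described above can be reproduced without pytest. The sketch below runs the same snippet in several fresh interpreters, each with a different `PYTHONHASHSEED`, and compares the raw iteration order of a frozenset of strings with its `sorted()` order. The frozenset contents are a hypothetical stand-in for `expr.UnaryFunction._supported_cum_aggs`, whose real members are not shown in this diff, and the `collect` helper is made up for illustration.

```python
import os
import subprocess
import sys

# Hypothetical stand-in for expr.UnaryFunction._supported_cum_aggs.
SNIPPET = """
aggs = frozenset({"cum_min", "cum_max", "cum_sum", "cum_prod"})
print(",".join(aggs))
print(",".join(sorted(aggs)))
"""


def collect(seed: int) -> tuple[str, str]:
    """Run SNIPPET in a fresh interpreter with a fixed string-hash seed."""
    env = {**os.environ, "PYTHONHASHSEED": str(seed)}
    lines = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    ).stdout.splitlines()
    return lines[0], lines[1]  # (raw iteration order, sorted order)


results = [collect(seed) for seed in range(1, 6)]
raw_orders = {raw for raw, _ in results}
sorted_orders = {ordered for _, ordered in results}
print(len(raw_orders), "distinct raw orders;", len(sorted_orders), "distinct sorted orders")
```

The raw order may vary from one interpreter to the next, while the sorted order is the same everywhere, so every worker would collect the parametrized tests in the same sequence.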

def test_cum_agg_reverse_unsupported(cum_agg):
    df = pl.LazyFrame({"a": [1, 2, 3]})
    expr = getattr(pl.col("a"), cum_agg)(reverse=True)