Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add validation on wheel file characteristics in CI #110

Closed
19 tasks done
jameslamb opened this issue Oct 22, 2024 · 3 comments
Closed
19 tasks done

Add validation on wheel file characteristics in CI #110

jameslamb opened this issue Oct 22, 2024 · 3 comments
Assignees
Labels
improvement Improves an existing functionality

Comments

@jameslamb
Copy link
Member

jameslamb commented Oct 22, 2024

Description

As we start to publish more RAPIDS wheels directly to https://pypi.org/, it'll be important to validate the following characteristics:

These and any other non-functional characteristics of wheels that can be validated easily, quickly, and programmatically should be validated in CI.

Benefits of this work

  • improves release confidence ... reduces the risk of "oops these wheels are too big for PyPI" surprises near the end of a release
  • reduces the effort required to identify which changes led to increases in wheel sizes

Acceptance Criteria

  • CI for RAPIDS libraries fails on pull requests when wheel builds produce wheels that:
    • do not pass twine check --strict
    • are too large (compressed)

Approach

Such checks should run in wheel-building CI jobs, on PRs... they don't require GPUs, and should run so quickly that it isn't worth the spin-up time and cost of a dedicated wheel-validation CI job.

This could be accomplished with something like the following in wheel-build scripts, after the wheel file is finalized (e.g. has gone through auditwheel repair).

# check README formatting
twine check --strict dist/*

# assert size, and print a summary of contents
pydistcheck \
  --inspect \
  --select 'distro-too-large-compressed' \
  --max-allowed-size-compressed '100M' \
  ./delete-me/*
example of this using 24.12 libcudf-cu12 nightlies (click me)

On an x86_64 linux system:

mkdir -p ./delete-me
pip download \
    --extra-index-url https://pypi.anaconda.org/rapidsai-wheels-nightly/simple/ \
    --no-deps \
    -d ./delete-me \
    'libcudf-cu12==24.12.*,>=0.0.0a0'

Checking the formatting.

twine check --strict ./delete-me/*
Checking ./delete-me/libcudf_cu12-24.12.0a223-py3-none-manylinux_2_28_x86_64.whl: PASSED

And checking the size (using 5M just to show how this would fail in CI if the wheels got too big):

pydistcheck \
  --inspect \
  --select 'distro-too-large-compressed' \
  --max-allowed-size-compressed '5M' \
  ./delete-me/*
==================== running pydistcheck ====================

checking './delete-me/libcudf_cu12-24.12.0a223-py3-none-manylinux_2_28_x86_64.whl'
----- package inspection summary -----
file size
  * compressed size: 0.4G
  * uncompressed size: 0.7G
  * compression space saving: 35.2%
contents
  * directories: 163
  * files: 1974 (2 compiled)
size by extension
  * .so - 0.6G (97.0%)
  * .h - 6.7M (1.0%)
  * no-extension - 5.3M (0.8%)
  * .cuh - 3.8M (0.6%)
  * .hpp - 2.2M (0.3%)
  * .a - 1.2M (0.2%)
  * .inl - 0.8M (0.1%)
  * .cmake - 0.1M (0.0%)
  * .md - 8.3K (0.0%)
  * .py - 3.6K (0.0%)
  * .pc - 0.2K (0.0%)
  * .txt - 34.0B (0.0%)
largest files
  * (0.6G) libcudf/lib64/libcudf.so
  * (3.8M) libcudf/bin/flatc
  * (1.1M) libcudf/lib64/libflatbuffers.a
  * (0.5M) libcudf/include/libcudf/rapids/libcudacxx/cuda/std/__atomic/functions/cuda_ptx_generated.h
  * (0.2M) libcudf_cu12-24.12.0a223.dist-info/RECORD
------------ check results -----------
1. [distro-too-large-compressed] Compressed size 0.4G is larger than the allowed size (5.0M).
errors found while checking: 1

==================== done running pydistcheck ===============
echo $?
# 1

This should be rolled out to all RAPIDS projects producing wheels, starting with those targeting publishing to https://pypi.org/.

Notes

Disclaimer

pydistcheck is something I (@jameslamb) created and am currently the sole contributor on, so I'm biased about it being the best tool for this purpose.

The narrow goal here, "assert files are not too big", could be accomplished other ways... like unzipping the wheel and using some mix of stat / du / find -size, or with a Python script.

I think the summaries pydistcheck generates (see "Approach" section) and all the others things we could ask it to check in the future (docs link) make it preferable to those approaches, but I shouldn't be the one to decide that.

Task Lists

Updates

Preview Give feedback
@jameslamb jameslamb added the improvement Improves an existing functionality label Oct 22, 2024
@bdice
Copy link
Contributor

bdice commented Oct 22, 2024

I took a brief look at pydistcheck and would support adopting it. It seems like a great tool for this job. It's also good to lean into your expertise in this area. 👍

@vyasr
Copy link
Contributor

vyasr commented Oct 22, 2024

Yup I'm happy with trying out pydistcheck. Maybe we should do a group review of it in a future build-infra team meeting to make sure we're all aligned. Adding a size check to CI is definitely critical so that we know when someone tries to add a feature that suddenly bloats binary size.

@jameslamb jameslamb self-assigned this Nov 1, 2024
rapids-bot bot pushed a commit to rapidsai/cudf that referenced this issue Nov 12, 2024
Contributes to rapidsai/build-planning#110

Proposes adding 2 types of validation on wheels in CI, to ensure we continue to produce wheels that are suitable for PyPI.

* checks on wheel size (compressed),
  - *to be sure they're under PyPI limits*
  - *and to prompt discussion on PRs that significantly increase wheel sizes*
* checks on README formatting
  - *to ensure they'll render properly as the PyPI project homepages*
  - *e.g. like how https://github.com/scikit-learn/scikit-learn/blob/main/README.rst becomes https://pypi.org/project/scikit-learn/*

## Notes for Reviewers

### How I tested this

Initially set the size threshold for `libcudf` to a value that I knew it'd violate (75MB compressed, when the wheels are 400+ MB compressed).

Saw CI fail as expected, and print a summary with the expected contents.

```text
checking 'final_dist/libcudf_cu11-24.12.0a333-py3-none-manylinux_2_28_aarch64.whl'
----- package inspection summary -----
file size
  * compressed size: 0.4G
  * uncompressed size: 0.6G
  * compression space saving: 34.6%
contents
  * directories: 164
  * files: 1974 (2 compiled)
size by extension
  * .so - 0.6G (97.0%)
  * .h - 6.7M (1.0%)
  * no-extension - 4.8M (0.7%)
  * .cuh - 3.8M (0.6%)
  * .hpp - 2.2M (0.3%)
  * .a - 1.1M (0.2%)
  * .inl - 0.8M (0.1%)
  * .cmake - 0.1M (0.0%)
  * .md - 8.3K (0.0%)
  * .py - 4.0K (0.0%)
  * .pc - 0.2K (0.0%)
  * .txt - 34.0B (0.0%)
largest files
  * (0.6G) libcudf/lib64/libcudf.so
  * (3.3M) libcudf/bin/flatc
  * (1.0M) libcudf/lib64/libflatbuffers.a
  * (0.5M) libcudf/include/libcudf/rapids/libcudacxx/cuda/std/__atomic/functions/cuda_ptx_generated.h
  * (0.2M) libcudf_cu11-24.12.0a333.dist-info/RECORD
------------ check results -----------
1. [distro-too-large-compressed] Compressed size 0.4G is larger than the allowed size (75.0M).
errors found while checking: 1
```

([build link](https://github.com/rapidsai/cudf/actions/runs/11748370606/job/32732391718?pr=17284#step:13:3062))

Updated that threshold in `python/libcudf/pyproject.toml`, and saw the build succeed (but the summary still printed).

#

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: #17284
rapids-bot bot pushed a commit to rapidsai/ucx-py that referenced this issue Nov 13, 2024
Contributes to rapidsai/build-planning#110

Proposes adding 2 types of validation on wheels in CI, to ensure we continue to produce wheels that are suitable for PyPI.

* checks on wheel size (compressed),
  - *to be sure they're under PyPI limits*
  - *and to prompt discussion on PRs that significantly increase wheel sizes*
* checks on README formatting
  - *to ensure they'll render properly as the PyPI project homepages*
  - *e.g. like how https://github.com/scikit-learn/scikit-learn/blob/main/README.rst becomes https://pypi.org/project/scikit-learn/*

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: #1092
rapids-bot bot pushed a commit to rapidsai/kvikio that referenced this issue Nov 13, 2024
Contributes to rapidsai/build-planning#110

Proposes adding 2 types of validation on wheels in CI, to ensure we continue to produce wheels that are suitable for PyPI.

* checks on wheel size (compressed),
  - *to be sure they're under PyPI limits*
  - *and to prompt discussion on PRs that significantly increase wheel sizes*
* checks on README formatting
  - *to ensure they'll render properly as the PyPI project homepages*
  - *e.g. like how https://github.com/scikit-learn/scikit-learn/blob/main/README.rst becomes https://pypi.org/project/scikit-learn/*

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: #548
rapids-bot bot pushed a commit to rapidsai/dask-cuda that referenced this issue Nov 13, 2024
Contributes to rapidsai/build-planning#110

Proposes adding 2 types of validation on wheels in CI, to ensure we continue to produce wheels that are suitable for PyPI.

* checks on wheel size (compressed),
  - *to be sure they're under PyPI limits*
  - *and to prompt discussion on PRs that significantly increase wheel sizes*
* checks on README formatting
  - *to ensure they'll render properly as the PyPI project homepages*
  - *e.g. like how https://github.com/scikit-learn/scikit-learn/blob/main/README.rst becomes https://pypi.org/project/scikit-learn/*

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: #1404
rapids-bot bot pushed a commit to rapidsai/rmm that referenced this issue Nov 13, 2024
Contributes to rapidsai/build-planning#110

Proposes adding 2 types of validation on wheels in CI, to ensure we continue to produce wheels that are suitable for PyPI.

* checks on wheel size (compressed),
  - *to be sure they're under PyPI limits*
  - *and to prompt discussion on PRs that significantly increase wheel sizes*
* checks on README formatting
  - *to ensure they'll render properly as the PyPI project homepages*
  - *e.g. like how https://github.com/scikit-learn/scikit-learn/blob/main/README.rst becomes https://pypi.org/project/scikit-learn/*

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: #1726
jacobtomlinson pushed a commit to rapidsai/jupyterlab-nvdashboard that referenced this issue Nov 13, 2024
## Description

Contributes to rapidsai/build-planning#110

Proposes adding 2 types of validation on wheels in CI, to ensure we
continue to produce wheels that are suitable for PyPI.

* checks on wheel size (compressed),
  - *to be sure they're under PyPI limits*
- *and to prompt discussion on PRs that significantly increase wheel
sizes*
* checks on README formatting
  - *to ensure they'll render properly as the PyPI project homepages*
- *e.g. like how
https://github.com/scikit-learn/scikit-learn/blob/main/README.rst
becomes https://pypi.org/project/scikit-learn/*
rapids-bot bot pushed a commit to rapidsai/cuml that referenced this issue Nov 14, 2024
Contributes to rapidsai/build-planning#110

Proposes adding 2 types of validation on wheels in CI, to ensure we continue to produce wheels that are suitable for PyPI.

* checks on wheel size (compressed),
  - *to be sure they're under PyPI limits*
  - *and to prompt discussion on PRs that significantly increase wheel sizes*
* checks on README formatting
  - *to ensure they'll render properly as the PyPI project homepages*
  - *e.g. like how https://github.com/scikit-learn/scikit-learn/blob/main/README.rst becomes https://pypi.org/project/scikit-learn/*

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: #6136
rapids-bot bot pushed a commit to rapidsai/cucim that referenced this issue Nov 14, 2024
Contributes to rapidsai/build-planning#110

Proposes adding 2 types of validation on wheels in CI, to ensure we continue to produce wheels that are suitable for PyPI.

* checks on wheel size (compressed),
  - *to be sure they're under PyPI limits*
  - *and to prompt discussion on PRs that significantly increase wheel sizes*
* checks on README formatting
  - *to ensure they'll render properly as the PyPI project homepages*
  - *e.g. like how https://github.com/scikit-learn/scikit-learn/blob/main/README.rst becomes https://pypi.org/project/scikit-learn/*

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: #797
rapids-bot bot pushed a commit to rapidsai/cuspatial that referenced this issue Nov 14, 2024
Contributes to rapidsai/build-planning#110

Proposes adding 2 types of validation on wheels in CI, to ensure we continue to produce wheels that are suitable for PyPI.

* checks on wheel size (compressed),
  - *to be sure they're under PyPI limits*
  - *and to prompt discussion on PRs that significantly increase wheel sizes*
* checks on README formatting
  - *to ensure they'll render properly as the PyPI project homepages*
  - *e.g. like how https://github.com/scikit-learn/scikit-learn/blob/main/README.rst becomes https://pypi.org/project/scikit-learn/*

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Mark Harris (https://github.com/harrism)
  - Bradley Dice (https://github.com/bdice)

URL: #1482
rapids-bot bot pushed a commit to rapidsai/cuvs that referenced this issue Nov 14, 2024
Contributes to rapidsai/build-planning#110

Proposes adding 2 types of validation on wheels in CI, to ensure we continue to produce wheels that are suitable for PyPI.

* checks on wheel size (compressed),
  - *to be sure they're under PyPI limits*
  - *and to prompt discussion on PRs that significantly increase wheel sizes*
* checks on README formatting
  - *to ensure they'll render properly as the PyPI project homepages*
  - *e.g. like how https://github.com/scikit-learn/scikit-learn/blob/main/README.rst becomes https://pypi.org/project/scikit-learn/*

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: #464
rapids-bot bot pushed a commit to rapidsai/ucxx that referenced this issue Nov 14, 2024
Contributes to rapidsai/build-planning#110

Proposes adding 2 types of validation on wheels in CI, to ensure we continue to produce wheels that are suitable for PyPI.

* checks on wheel size (compressed),
  - *to be sure they're under PyPI limits*
  - *and to prompt discussion on PRs that significantly increase wheel sizes*
* checks on README formatting
  - *to ensure they'll render properly as the PyPI project homepages*
  - *e.g. like how https://github.com/scikit-learn/scikit-learn/blob/main/README.rst becomes https://pypi.org/project/scikit-learn/*

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: #319
rapids-bot bot pushed a commit to rapidsai/raft that referenced this issue Nov 14, 2024
…on Cython dependency (#2490)

Contributes to rapidsai/build-planning#110

Proposes adding 2 types of validation on wheels in CI, to ensure we continue to produce wheels that are suitable for PyPI.

* checks on wheel size (compressed),
  - *to be sure they're under PyPI limits*
  - *and to prompt discussion on PRs that significantly increase wheel sizes*
* checks on README formatting
  - *to ensure they'll render properly as the PyPI project homepages*
  - *e.g. like how https://github.com/scikit-learn/scikit-learn/blob/main/README.rst becomes https://pypi.org/project/scikit-learn/*

Also puts a ceiling on Cython to its latest stable release (`<=3.0.11`), to fix #2490 (comment). Work to relax that is tracked in (#2491).

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: #2490
rapids-bot bot pushed a commit to rapidsai/cuxfilter that referenced this issue Nov 14, 2024
Contributes to rapidsai/build-planning#110

Proposes adding 2 types of validation on wheels in CI, to ensure we continue to produce wheels that are suitable for PyPI.

* checks on wheel size (compressed),
  - *to be sure they're under PyPI limits*
  - *and to prompt discussion on PRs that significantly increase wheel sizes*
* checks on README formatting
  - *to ensure they'll render properly as the PyPI project homepages*
  - *e.g. like how https://github.com/scikit-learn/scikit-learn/blob/main/README.rst becomes https://pypi.org/project/scikit-learn/*

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - Ajay Thorve (https://github.com/AjayThorve)

URL: #643
rapids-bot bot pushed a commit to rapidsai/cugraph-gnn that referenced this issue Nov 14, 2024
Contributes to rapidsai/build-planning#110

Proposes adding 2 types of validation on wheels in CI, to ensure we continue to produce wheels that are suitable for PyPI.

* checks on wheel size (compressed),
  - *to be sure they're under PyPI limits*
  - *and to prompt discussion on PRs that significantly increase wheel sizes*
* checks on README formatting
  - *to ensure they'll render properly as the PyPI project homepages*
  - *e.g. like how https://github.com/scikit-learn/scikit-learn/blob/main/README.rst becomes https://pypi.org/project/scikit-learn/*

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - Alex Barghi (https://github.com/alexbarghi-nv)

URL: #67
rapids-bot bot pushed a commit to rapidsai/cugraph that referenced this issue Nov 15, 2024
Contributes to rapidsai/build-planning#110

Proposes adding 2 types of validation on wheels in CI, to ensure we continue to produce wheels that are suitable for PyPI.

* checks on wheel size (compressed),
  - *to be sure they're under PyPI limits*
  - *and to prompt discussion on PRs that significantly increase wheel sizes*
* checks on README formatting
  - *to ensure they'll render properly as the PyPI project homepages*
  - *e.g. like how https://github.com/scikit-learn/scikit-learn/blob/main/README.rst becomes https://pypi.org/project/scikit-learn/*

## Notes for Reviewers

This intentionally omits the libraries that are about to be moved to other repos.

* `cugraph-dgl` / `cugraph-pyg` (rapidsai/cugraph-gnn#67)
* `nx-cugraph` (rapidsai/nx-cugraph#19)

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: #4754
kingcrimsontianyu pushed a commit to kingcrimsontianyu/kvikio that referenced this issue Nov 15, 2024
Contributes to rapidsai/build-planning#110

Proposes adding 2 types of validation on wheels in CI, to ensure we continue to produce wheels that are suitable for PyPI.

* checks on wheel size (compressed),
  - *to be sure they're under PyPI limits*
  - *and to prompt discussion on PRs that significantly increase wheel sizes*
* checks on README formatting
  - *to ensure they'll render properly as the PyPI project homepages*
  - *e.g. like how https://github.com/scikit-learn/scikit-learn/blob/main/README.rst becomes https://pypi.org/project/scikit-learn/*

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: rapidsai#548
rapids-bot bot pushed a commit to rapidsai/nx-cugraph that referenced this issue Nov 15, 2024
Contributes to rapidsai/build-planning#110

Proposes adding 2 types of validation on wheels in CI, to ensure we continue to produce wheels that are suitable for PyPI.

* checks on wheel size (compressed),
  - *to be sure they're under PyPI limits*
  - *and to prompt discussion on PRs that significantly increase wheel sizes*
* checks on README formatting
  - *to ensure they'll render properly as the PyPI project homepages*
  - *e.g. like how https://github.com/scikit-learn/scikit-learn/blob/main/README.rst becomes https://pypi.org/project/scikit-learn/*

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Erik Welch (https://github.com/eriknw)
  - Bradley Dice (https://github.com/bdice)

URL: #19
@jameslamb
Copy link
Member Author

This is complete. Thanks everyone!

BradReesWork pushed a commit to BradReesWork/cugraph that referenced this issue Nov 18, 2024
Contributes to rapidsai/build-planning#110

Proposes adding 2 types of validation on wheels in CI, to ensure we continue to produce wheels that are suitable for PyPI.

* checks on wheel size (compressed),
  - *to be sure they're under PyPI limits*
  - *and to prompt discussion on PRs that significantly increase wheel sizes*
* checks on README formatting
  - *to ensure they'll render properly as the PyPI project homepages*
  - *e.g. like how https://github.com/scikit-learn/scikit-learn/blob/main/README.rst becomes https://pypi.org/project/scikit-learn/*

This intentionally omits the libraries that are about to be moved to other repos.

* `cugraph-dgl` / `cugraph-pyg` (rapidsai/cugraph-gnn#67)
* `nx-cugraph` (rapidsai/nx-cugraph#19)

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: rapidsai#4754
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement Improves an existing functionality
Projects
None yet
Development

No branches or pull requests

3 participants