Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use stricter pattern for reading data from other files in conda recipes #72

Closed
5 tasks done
jameslamb opened this issue Jun 12, 2024 · 3 comments
Closed
5 tasks done
Assignees

Comments

@jameslamb
Copy link
Member

jameslamb commented Jun 12, 2024

Description

conda-build supports a function load_file_data() which can be used to load data from a file that is then referenced in other Jinja2 template statements throughout the recipes.

See "Loading data from other files" (conda-build docs).

Across RAPIDS, this is used in a few places to read data out of pyproject.toml files and re-use it in conda recipes. For example, in dask-cuda:

{% set data = load_file_data("pyproject.toml") %}

build:
  ...
  entry_points:
    {% for e in data.get("project", {}).get("scripts", {}).items() %}
    - {{ e|join(" = ") }}
    {% endfor %}

(dask-cuda/conda/recipes/dask-cuda/meta.yaml)

That function returns a Python dictionary. Because of the use of .get() in those statements, it's possible for data that should be present to be silently ignored.

Those should be switched to a stricter access pattern that'd result in a loud build-time error if any expected data is missing, like this:

build:
  ...
  entry_points:
    {% for e in data["project")["scripts"].items() %}
    - {{ e|join(" = ") }}
    {% endfor %}

Benefits of this work

  • reduces the risk of conda packages having silently-broken medata
    • *(e.g., accidentally dropping a required dependency from the package metadata)
  • makes the meta.yaml files slightly easier to read

Acceptance Criteria

  • all RAPIDS conda recipes using conda's load_*() functions access those functions' data with a strict pattern that'll fail if any expected data is missing

Approach

The supported functions are:

  • load_setup_py_data
  • load_file_data
  • load_str_data

Use GitHub search to double-check that the task list attached to this issue captures all uses of them across RAPIDS.

Whenever those are used, switch to a stricter pattern using [] subsetting instead of .get().

Notes

Inspired by my experience in rapidsai/ucx-py#1044, moving dependency data around in pyproject.toml and needing to adjust a corresponding conda recipe.

Tasks

Preview Give feedback
@jameslamb jameslamb self-assigned this Jun 12, 2024
KyleFromNVIDIA pushed a commit to rapidsai/rapids-build-backend that referenced this issue Jun 12, 2024
Contributes to rapidsai/build-planning#72

Proposes using `[]` subsetting instead of `.get()` in templating
statements in the conda recipe that read data out of `pyproject.toml`.
That'll ensure that we get a big loud build error if changes to
`pyproject.toml` remove some sections that the conda recipe expects to
exist.

Also proposes removing the logic for automatically including `run`
dependencies on all optional dependencies except those in the `[test]`
group... this project doesn't have any other optional dependencies.
@jameslamb
Copy link
Member Author

"always use .get()" might be too strict / not possible.

See these discussions:

And this failing PR: rapidsai/dask-cuda#1349

@jameslamb
Copy link
Member Author

There was an interesting case I ran into in dask-cuda, where it seems like this is not supported:

{% for e in data.get("project", {}).get("scripts", {}).items() %}
  - {{ e|join(" = ") }}
{% endfor %}

but this is:

{% for k in data["project"]["scripts"] %}
  - {{ k ~ " = " ~ data["project"]["scripts"][k] }}
{% endfor %}

ref: rapidsai/dask-cuda#1349 (comment)

@jakirkham if you look at that and think "that's unexpected and looks like a bug in conda-build", let me know and I'd be happy to put together a bug report. I'm just not sure if it's a bug or just a gap in my understanding of Jinja2.

rapids-bot bot pushed a commit to rapidsai/ucx-py that referenced this issue Jun 12, 2024
Contributes to rapidsai/build-planning#31
Contributes to rapidsai/dependency-file-generator#89
Contributes to rapidsai/build-planning#72

Proposes:

* introducing `rapids-build-backend` as this project's build backend, to reduce the complexity of various CI/build scripts
* making get-data-from-`pyproject.toml` code in conda recipe a bit stricter

## Notes for Reviewers

This reverts some of the workarounds introduced for docs builds in #1041. See this thread for context: #1041 (comment)

~To do that, I set 2 environment variables manually in in the readthedocs console ([link](https://readthedocs.org/dashboard/ucx-py/edit/)):~

* ~`RAPIDS_DISABLE_CUDA=true` = used to prevent needing `nvcc` in the docs-building environment~
* ~`RAPIDS_MATRIX_ENTRY='cuda=12.2'` = used to ensure that `pip install .` tries to install a real package (e.g. `libucx-cu12`)~

~I think this is the first place we're using the `rapids-build-backend` environment variable support.~

see #1044 (comment)

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - Peter Andreas Entschev (https://github.com/pentschev)

URL: #1044
rapids-bot bot pushed a commit to rapidsai/dask-cuda that referenced this issue Jun 13, 2024
Contributes to rapidsai/build-planning#72

Proposes using `[]` subsetting instead of `.get()` in templating statements in the conda recipe that read data out of `pyproject.toml`. That'll ensure that we get a big loud build error if changes to `pyproject.toml` remove some sections that the conda recipe expects to exist.

## Notes for Reviewers

### How I tested this

Rendered the recipe.

```shell
git fetch upstream --tags

RAPIDS_DATE_STRING="2408" \
RAPIDS_PACKAGE_VERSION="24.8.0" \
conda render \
  -c conda-forge \
  -c rapidsai-nightly \
  conda/recipes/dask-cuda
```

<details><summary>It looks correct to me (click for details)</summary>

```text
--------------
Hash contents:
--------------
{}
----------
meta.yaml:
----------
package:
  name: dask-cuda
  version: 24.8.0
source:
  path: /Users/jlamb/repos/dask-cuda
build:
  entry_points:
    - dask-cuda-worker = dask_cuda.cli:worker
    - dask-cuda-config = dask_cuda.cli:config
  number: '10'
  script:
    - /Users/jlamb/miniforge3/conda-bld/dask-cuda_1718216576022/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/bin/python
      -m pip install . -vv
  string: py310_2408_g3a04719_10
requirements:
  host:
    - bzip2 1.0.8 h93a5062_5
    - ca-certificates 2024.6.2 hf0a4a13_0
    - libffi 3.4.2 h3422bc3_5
    - libzlib 1.3.1 hfb2fe0b_1
    - ncurses 6.5 hb89a1cb_0
    - python_abi 3.10 4_cp310
    - tzdata 2024a h0c530f3_0
    - xz 5.2.6 h57fd34a_0
    - yaml 0.2.5 h3422bc3_2
    - libsqlite 3.46.0 hfb93653_0
    - openssl 3.3.1 hfb2fe0b_0
    - readline 8.2 h92ec313_1
    - tk 8.6.13 h5083fa2_1
    - python 3.10.14 h2469fbe_0_cpython
    - attrs 23.2.0 pyh71513ae_0
    - packaging 24.1 pyhd8ed1ab_0
    - pkgutil-resolve-name 1.3.10 pyhd8ed1ab_1
    - pyyaml 6.0.1 py310h2aa6e3c_1
    - rpds-py 0.18.1 py310h947b723_0
    - setuptools 70.0.0 pyhd8ed1ab_0
    - tomlkit 0.12.5 pyha770c72_0
    - wheel 0.43.0 pyhd8ed1ab_1
    - zipp 3.19.2 pyhd8ed1ab_0
    - importlib_resources 6.4.0 pyhd8ed1ab_0
    - pip 24.0 pyhd8ed1ab_0
    - referencing 0.35.1 pyhd8ed1ab_0
    - jsonschema-specifications 2023.12.1 pyhd8ed1ab_0
    - jsonschema 4.22.0 pyhd8ed1ab_0
    - rapids-dependency-file-generator 1.13.11 py_0
    - rapids-build-backend 0.3.1 py_0
  run:
    - pynvml>=11.0.0,<11.5
    - numpy>=1.23,<2.0a0
    - python_abi 3.10.* *_cp310
    - click >=8.1
    - rapids-dask-dependency==24.8.*,>=0.0.0a0
    - numba>=0.57
    - python >=3.10,<3.11.0a0
    - pandas>=1.3
    - zict>=2.0.0
test:
  commands:
    - dask cuda --help
    - dask-cuda-worker --help
    - dask cuda worker --help
    - dask-cuda-config --help
    - dask cuda config --help
  imports:
    - dask_cuda
about:
  dev_url: https://github.com/rapidsai/dask-cuda
  doc_url: https://docs.rapids.ai/api/dask-cuda/stable/
  home: https://github.com/rapidsai/dask-cuda
  license: Apache 2.0
  license_file:
    - ../../../LICENSE
  summary: Utilities for Dask and CUDA interactions
extra:
  copy_test_source_files: true
  final: true
```

</details>

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: #1349
bdice pushed a commit to rapidsai/pynvjitlink that referenced this issue Jun 13, 2024
Contributes to rapidsai/build-planning#72.

Proposes replacing a use of `.get()` in the conda recipe with `[]`, to
ensure that we get a loud build error if the `"project"` table is not
present when `conda-build` reads `pyproject.toml`.

## Notes for Reviewers

I know that at some time in the past, the `.get()` was introduced as a
possible solution for errors observed at build time:
#33 (comment)

Pushed this to see if those issues are still present, and they don't
seem to be!
@jameslamb
Copy link
Member Author

This is complete 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant