[REVIEW] Pin dask & distributed for release #916

Merged
3 commits merged into rapidsai:branch-22.06 on May 26, 2022

Conversation

@galipremsagar (Contributor)

Pinnings to be in line with rapidsai/cudf#10965.

@galipremsagar requested a review from a team as a code owner on May 25, 2022
@github-actions bot added the gpuCI label on May 25, 2022
@galipremsagar self-assigned this on May 25, 2022
@galipremsagar added the improvement and non-breaking labels on May 25, 2022
@pentschev (Member)

The failures seem to arise from inconsistent versions being picked from conda packages:

15:49:34 dask_cuda/tests/test_dask_cuda_worker.py::test_cuda_visible_devices_and_memory_limit_and_nthreads 
15:49:34 --------------------------- Subprocess stdout/stderr---------------------------
15:49:34 Traceback (most recent call last):
15:49:34   File "/opt/conda/envs/rapids/lib/python3.8/site-packages/pkg_resources/__init__.py", line 579, in _build_master
15:49:34     ws.require(__requires__)
15:49:34   File "/opt/conda/envs/rapids/lib/python3.8/site-packages/pkg_resources/__init__.py", line 897, in require
15:49:34     needed = self.resolve(parse_requirements(requirements))
15:49:34   File "/opt/conda/envs/rapids/lib/python3.8/site-packages/pkg_resources/__init__.py", line 788, in resolve
15:49:34     raise VersionConflict(dist, req).with_context(dependent_req)
15:49:34 pkg_resources.ContextualVersionConflict: (dask 2022.5.0+46.g4c0cdfdbd (/opt/conda/envs/rapids/lib/python3.8/site-packages), Requirement.parse('dask==2022.05.1'), {'distributed', 'dask-cuda'})

@Ethyling @ajschmidt8 @charlesbluca @jakirkham it would be good to get your expertise here on a safe way to prevent gpuCI from picking an inconsistent set of dask/dask-core/distributed/etc. packages.

@pentschev (Member)

To be more specific, I think those issues are related to the changes from #897.

@charlesbluca (Member) commented on May 25, 2022

I think the issue here is that when INSTALL_DASK_MAIN=0, we need to explicitly install stable Dask packages in the GPU build script; this should be resolved by replacing this block:

if [[ "${INSTALL_DASK_MAIN}" == 1 ]]; then
gpuci_logger "Installing dask and distributed from dask nightly channel"
gpuci_mamba_retry install -c dask/label/dev \
"dask/label/dev::dask" \
"dask/label/dev::distributed"
fi

With something like this (used in cuDF's CI):

https://github.com/rapidsai/cudf/blob/df5dc08710c9cda8aa5f9b2f0b26a863301b5e01/ci/gpu/build.sh#L92-L99

    if [[ "${INSTALL_DASK_MAIN}" == 1 ]]; then
        gpuci_logger "gpuci_mamba_retry update dask"
        gpuci_mamba_retry update dask
        conda list
    else
        gpuci_logger "gpuci_mamba_retry install conda-forge::dask>=2022.03.0 conda-forge::distributed>=2022.03.0 conda-forge::dask-core>=2022.03.0 --force-reinstall"
        gpuci_mamba_retry install conda-forge::dask>=2022.03.0 conda-forge::distributed>=2022.03.0 conda-forge::dask-core>=2022.03.0 --force-reinstall
    fi

@pentschev (Member)

Could we instead just update the Dask versions in https://github.com/rapidsai/integration/blob/branch-22.06/conda/recipes/versions.yaml and then eliminate the else condition from your suggestion, @charlesbluca? Or is there a reason why we must force-install a specific version?

@galipremsagar (Contributor, Author)

Could we instead just update the Dask versions in https://github.com/rapidsai/integration/blob/branch-22.06/conda/recipes/versions.yaml and then eliminate the else condition from your suggestion, @charlesbluca? Or is there a reason why we must force-install a specific version?

We could try waiting for new images & packages to be published after rapidsai/integration#476 is merged. If that works, we wouldn't need the else condition.

Review threads on ci/gpu/build.sh resolved (one outdated).
Co-authored-by: Peter Andreas Entschev <[email protected]>
@pentschev (Member)

We could try waiting for new images & packages to be published after rapidsai/integration#476 is merged. If that works, we wouldn't need the else condition.

Yes, that would be good. But perhaps we can get this merged right away and then change it afterwards. I also added a few suggestions.

@charlesbluca (Member) commented on May 25, 2022

Hmm, it was my understanding that even after updating the version pinning in the integration axis file and removing the Dask channel from our conda config, we weren't guaranteed to have stable packages, and would want a step to explicitly ensure they are installed?

At least, that's what I thought the conclusion of rapidsai/cudf#10182 was.

@pentschev (Member)

removing the Dask channel from our conda config, we weren't guaranteed to have stable packages, and would want a step to explicitly ensure they are installed?

How so? If that's the case, then it sounds to me like it's not guaranteed that pinning works for the release process, which would be a major issue IMO.

My hope was always that the integration repo versions MUST be respected; if not, that feels like an enormous flaw to me.

@Ethyling @ajschmidt8 could you comment on this?

@codecov-commenter commented on May 25, 2022

Codecov Report

❗ No coverage uploaded for pull request base (branch-22.06@63529e8).
The diff coverage is n/a.

❗ Current head c477e97 differs from pull request most recent head d973e8c. Consider uploading reports for the commit d973e8c to get more accurate results

@@              Coverage Diff               @@
##             branch-22.06    #916   +/-   ##
==============================================
  Coverage                ?   0.00%           
==============================================
  Files                   ?      22           
  Lines                   ?    3075           
  Branches                ?       0           
==============================================
  Hits                    ?       0           
  Misses                  ?    3075           
  Partials                ?       0           

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@charlesbluca (Member) commented on May 25, 2022

I think this is more a consequence of the fact that the Dask channel is placed at a higher priority than conda-forge during gpuCI image builds?

https://github.com/rapidsai/gpuci-build-environment/blob/9c199e81bdd6bcf157be79976be10a06afd8d305/context/.condarc

In that case, installing dask=2022.5.1 would still end up preferring the nightly packages over the stable release (at least that seems to be the behavior when I try this out locally).

We could potentially remove the else blocks throughout the build scripts by centralizing the solution in the gpuCI image builds, either by:

  • explicitly or conditionally removing the dask/label/dev channel around code freeze
  • potentially changing the channel priority policy; I think that in general 2022.5.1 > 2022.5.1aXXXXXX, so if conda were allowed to prefer lower-priority channels it would probably end up choosing the stable package over the nightlies

EDIT:

Alternatively, since it seems like we would always want to run mamba update -c dask/label/dev dask during a run where INSTALL_DASK_MAIN=1, maybe it would make sense to remove dask/label/dev from the gpuCI image channels altogether, so that images always start off with stable Dask packages, with the responsibility of installing nightlies being placed on the build scripts?
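
For illustration, that image-level change might look roughly like the sketch below (a hedged sketch only; the actual gpuci-build-environment setup and .condarc handling may differ):

    # Hedged sketch only; the real gpuci-build-environment image setup may differ.
    # Drop the nightly Dask channel so images start from stable conda-forge packages;
    # build scripts can then opt in to nightlies when INSTALL_DASK_MAIN=1.
    conda config --system --remove channels dask/label/dev

    # Optionally let the solver fall back to lower-priority channels instead of
    # strictly preferring the highest-priority channel that has any matching package.
    conda config --system --set channel_priority flexible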

@galipremsagar (Contributor, Author) commented on May 25, 2022

explicitly or conditionally removing the dask/label/dev channel around code freeze

We do this for release builds I suppose.

The force-reinstall in the else block (coming from cuDF) is there because we have an image/package that already has packages from the dask/label/dev channel, and those get higher priority for the same version (2022.5.1); hence we force-reinstall in the else branch to get the conda-forge packages installed. This is what I was able to summarise from all the experimentation I did in rapidsai/cudf#10182.
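
For reference, one quick way to confirm which channel the installed packages actually came from is to check the last column of conda list output (the grep pattern below is just an illustrative sketch):

    # Channel of origin is shown in the last column of `conda list` output.
    conda list | grep -E "^(dask|dask-core|distributed) "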

@pentschev (Member)

The force-reinstall in the else block (coming from cuDF) is there because we have an image/package that already has packages from the dask/label/dev channel, and those get higher priority for the same version (2022.5.1); hence we force-reinstall in the else branch to get the conda-forge packages installed. This is what I was able to summarise from all the experimentation I did in rapidsai/cudf#10182.

Maybe I'm misunderstanding something, but this still seems problematic for the release process. When we release, won't a non-stable Dask version be picked then? Additionally, if anyone has the dask/label/dev channel enabled when installing RAPIDS, they could pick up a 2022.05.1XXXX version that will be breaking for them.

@jakirkham (Member)

Maybe we should follow up on this discussion after the release?

@jakirkham (Member)

rerun tests

@galipremsagar (Contributor, Author) commented on May 25, 2022

When we release, won't a non-stable Dask version be picked then?

Nope; at least in cuDF and the Docker image builds we drop those channels for release builds (roughly as in the sketch after these links):

  1. https://github.com/rapidsai/cudf/blob/4e66281f48c55735edb4b610e0f859ee2de32a75/ci/cpu/build.sh#L51
  2. https://github.com/rapidsai/docker/blob/fea6a2fbe3c858ead7c0174dab231cec1be30cf1/generated-dockerfiles/rapidsai-core_ubuntu20.04-base.amd64.Dockerfile#L22
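
A hedged sketch of what that guard looks like (the exact branch variable and channel list in the linked scripts may differ):

    # Hedged sketch of the release-build guard referenced above; exact variable
    # names and channels in the linked cuDF/docker scripts may differ.
    if [ "${SOURCE_BRANCH}" = "main" ]; then
        # Stable/release builds: drop the nightly Dask channel so only stable packages resolve.
        conda config --system --remove channels dask/label/dev
    fi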

Additionally, if anyone has the dask/label/dev channel enabled when installing RAPIDS, they could pick up a 2022.05.1XXXX version that will be breaking for them.

Yeah, that is possible. I don't think we have a mechanism to pin versions based on channels.

@pentschev (Member)

Nope; at least in cuDF and the Docker image builds we drop those channels for release builds:

Got it. It seems a bit different in Dask-CUDA, where Dask and Distributed main are installed from pip:

pip install git+https://github.com/dask/dask.git@main
pip install git+https://github.com/dask/distributed.git@main

I don't mean to hold this PR up, but I want to make sure that we're not making any mistakes in the release process, and that the nightly packages after this PR is merged -- and ultimately the release packages -- indeed pin to the correct Dask version. A mistake there could be critical down the line.

If there are reasons to believe I'm overcomplicating things, please let me know, but I would like to confirm where the actual pinning comes from and how it is applied.

Comment on lines -1 to +2:

-dask>=2022.03.0
-distributed>=2022.03.0
+dask==2022.05.1
+distributed==2022.05.1
A reviewer (Member) commented:

These will ensure dask & distributed are pinned correctly in the Conda packages. In PR #893, we ensure these are pulled into the Conda packages.

@pentschev (Member)

Thanks @galipremsagar for working on this, and everyone for the clarifications. It seems like the release process is correct; I will keep an eye on this just to be safe, but I think we're good here.

@pentschev (Member)

@gpucibot merge

@rapids-bot (bot) merged commit 019ea90 into rapidsai:branch-22.06 on May 26, 2022