
Minimum Dask/Distributed versions #848

Closed
pentschev opened this issue Feb 1, 2022 · 10 comments

Comments

@pentschev
Member

pentschev commented Feb 1, 2022

Given we don't really test against old Dask and Distributed versions, I would propose we limit the minimum supported version to the 2 or 3 most recent Dask releases, as ensuring older versions work is unrealistic. For example, we currently pin >=2021.11.1, which at the time of writing is the 5th most recent release; the 5 most recent releases, from newest to oldest, are:

  • 2022.01.1
  • 2022.01.0
  • 2021.12.0
  • 2021.11.2
  • 2021.11.1

With my proposal, we would pin >=2022.01.0 (second most recent) or >=2021.12.0 (third most recent). To avoid overburdening ourselves with changing pinnings after each new Dask release, we could do this only once per RAPIDS release, around code freeze time.

What are people's thoughts on the proposal?

@pentschev
Member Author

@quasiben
Member

quasiben commented Feb 2, 2022

It would be even better if we could automate it, though that may be more work than it's worth. However, I want to bring up an issue with the proposal. Often, when trying to debug regressions, devs want to easily switch out versions of dask/distributed, possibly going further back than 2 versions. Pinning in the conda recipe would probably be OK, but pinning in setup.py or in development environments would present a bit of a barrier to quickly swapping out dask/distributed versions.

@pentschev
Member Author

Pinning in the conda recipe would probably be OK, but pinning in setup.py or in development environments would present a bit of a barrier to quickly swapping out dask/distributed versions.

The problem with that is that users can also mistakenly install an older Dask/Distributed version. I saw this happen quite a few times last year when users installed conda packages incrementally, rather than installing everything at conda environment creation time. Given dask-cuda is pure Python, a solution for devs who want to go back further is to clone dask-cuda, change requirements.txt, and pip install . from the source tree. Since we have a simple solution for devs, I would much rather leave less room for user mistakes than simplify the few cases where devs want to go further back on Dask dependencies.
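
To sketch the dev workflow I mean (the exact pin lines in requirements.txt may differ, and the versions below are just examples):

# Rough sketch: test the current dask-cuda source against an older Dask
git clone https://github.com/rapidsai/dask-cuda.git
cd dask-cuda
# Relax the minimum pins locally (adjust the patterns to the actual file contents)
sed -i 's/^dask>=.*/dask>=2021.09.1/' requirements.txt
sed -i 's/^distributed>=.*/distributed>=2021.09.1/' requirements.txt
pip install .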

@charlesbluca
Member

My preference would be to keep the same pinning in setup.py and the conda recipe, as we are still publishing packages on PyPI and it could get confusing to have different Dask/Distributed pinnings between the two distributions. Beyond this, there are a decent number of ways for devs to get the latest dask-cuda co-existing with older Dask versions, so I don't think this should come up as an issue.
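
For instance (a hypothetical example, not an exhaustive list, and the versions are just examples), one such route is:

# Install the released dask-cuda, then force older dask/distributed on top of it;
# pip warns about the now-unsatisfied pin but still performs the downgrade
pip install dask-cuda
pip install "dask==2021.10.0" "distributed==2021.10.0"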

It would be even better if we could automate it, though that may be more work than it's worth

If we're okay with having automation that operates independently of RAPIDS' release cycle, it shouldn't be difficult to put together a workflow similar to the one updating Dask's gpuCI images to achieve this - happy to open a PR doing that once we decide on a pattern moving forward 🙂

@pentschev
Member Author

If we're okay with having automation that operates independently of RAPIDS' release cycle, it shouldn't be difficult to put together a workflow similar to the one updating Dask's gpuCI images to achieve this - happy to open a PR doing that once we decide on a pattern moving forward 🙂

I'm +1 on that if you find an easy way to parse Dask's latest N releases and pick one as the minimum.
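
Something along these lines might already be enough (a rough sketch against PyPI's JSON API; N=3 is just an example, and pre-releases aren't filtered out here):

# Pick the Nth most recent Dask release on PyPI as the minimum pin
N=3
MIN_DASK=$(curl -s https://pypi.org/pypi/dask/json \
    | jq -r '.releases | keys[]' \
    | sort -V \
    | tail -n "$N" \
    | head -n 1)
echo "Would pin dask>=${MIN_DASK}"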

As a side note for you @charlesbluca: ops is maintaining a versioning system that can query the correct UCX-Py version based on the RAPIDS version and automatically update CI scripts, which I think could be useful in dask-build-environment if it's not already doing that; see below for reference.

NEXT_UCXPY_VERSION="$(curl -s https://version.gpuci.io/rapids/${NEXT_SHORT_TAG}).*"
echo "Preparing release $CURRENT_TAG => $NEXT_FULL_TAG"
# Inplace sed replace; workaround for Linux and Mac
function sed_runner() {
    sed -i.bak ''"$1"'' $2 && rm -f ${2}.bak
}
# Update UCX-Py version
sed_runner "s/export UCXPY_VERSION=.*/export UCXPY_VERSION="${NEXT_UCXPY_VERSION}"/g" ci/gpu/build.sh

@github-actions

github-actions bot commented Mar 6, 2022

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

@github-actions

github-actions bot commented Jun 4, 2022

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

@pentschev
Member Author

From experience with the past couple of releases, it seems infeasible to be less restrictive with Dask releases. They simply occur too often, making it impractical for us to ensure that no bugs get introduced that impact RAPIDS use cases. Therefore, I am closing this; we will continue deciding on (probably) a single release pin at RAPIDS code freeze time.

If anybody has further ideas on how to improve this in the future, please feel free to reopen or write a new issue.

@charlesbluca
Member

I don't think this option is anywhere on the horizon (so I'm not reopening the issue for it), but would minimum version testing suit our needs here? I've been exploring this in dask-contrib/dask-sql#555 and it's been helpful in verifying that our core dependencies are constrained properly.

There are some cases that are missed in that PR (e.g. min version testing only runs on Python 3.8, and we still aren't testing core dependency versions between our minimum and the latest), but I feel like something like that could allow us to support a wider range of Dask versions with some assurance that they will work as expected.
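
As a rough sketch of what that could look like here (a hypothetical CI step; the exact pins and test invocation would need adapting to dask-cuda):

# Hypothetical "minimum versions" CI step: install the oldest supported pins
# explicitly, then run the test suite against them rather than against
# whatever pip would resolve by default
python -m pip install "dask==2021.11.1" "distributed==2021.11.1"
python -m pip install -e .
python -m pytest dask_cuda -v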

@pentschev
Member Author

I don't think this would help. A minimum version would suffice if there were any guarantees of stability moving forward, as happens with software that follows SemVer, where a patch release can, with reasonable probability, be expected to still work alongside other releases in the same Major.Minor series. In Dask there is simply no forwards-/backwards-compatibility guarantee that would let us assume, with high probability, that things would still work.

To exemplify, take the latest Dask releases we could potentially consider for RAPIDS 22.06: 2022.05.0, 2022.05.1, and 2022.05.2.

In that case, if we assumed a minimum of 2022.05.0, we would be fine except for 2022.05.1. However, if we decided to move to 2022.05.2, we would still have no guarantee that 2022.06.0 won't introduce issues too late for RAPIDS 22.06 to resolve.

We have seen this sort of issue in the past, so unless we are able to test or verify in some way that all versions our pinning supports work well, I don't see this as feasible.
