Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEA] Decide what versions of cupy to support #12884

Closed
vyasr opened this issue Mar 6, 2023 · 13 comments
Closed

[FEA] Decide what versions of cupy to support #12884

vyasr opened this issue Mar 6, 2023 · 13 comments
Labels
feature request New feature or request

Comments

@vyasr
Copy link
Contributor

vyasr commented Mar 6, 2023

This issue is essentially a cupy version of #12398 that I am raising as a result of needing #12883 to update cudf for supporting recent changes to cupy. We currently claim to support cupy 9.5-11.*, but our test suite will have some failures when run using cupy < 11.6. We should determine what an appropriate support matrix here is.

@vyasr vyasr added feature request New feature or request Needs Triage Need team to review and classify labels Mar 6, 2023
@vyasr
Copy link
Contributor Author

vyasr commented Mar 6, 2023

@bdice @wence- thoughts on what versions you would like to see supported here?

CC @shwina

@bdice
Copy link
Contributor

bdice commented Mar 6, 2023

If our test suite requires cupy >=11.6, I think that's what we should pin to. If we want to extend support to 11.x then we should make our tests pass with some set of earlier 11.x versions. We should not allow dependency versions that do not pass our test suite.

@bdice
Copy link
Contributor

bdice commented Mar 6, 2023

I would recommend only supporting very recent versions of cupy, which would mean pinning to >=11.6. We rely heavily on the library and its cutting edge features, for both platform support (e.g. CUDA 12, sm90) and API support (e.g. dlpack).

@vyasr
Copy link
Contributor Author

vyasr commented Mar 6, 2023

Definitely agree that we shouldn't allow versions that don't pass our test suite. Requiring a cupy version that was released within the month feels rather onerous though. We really don't use much in the way of "cutting edge features" from cupy. I'm fairly certain the list of examples you gave is exhaustive, with the small exception of some small changes to RNGs that we use in sampling. RAPIDS is going to be supporting both CUDA 11 and CUDA 12 for the foreseeable future (once we roll out CUDA 12 support) so it would be odd to use CUDA 12 or recent hardware as a reason to drop support for older cupy versions. IMO if tests pass on 11.x with an older cupy version on older hardware/CUDA we should continue to allow its usage on CUDA 11 systems. I'm certainly fine dropping any version that requires us to make modifications to our tests to get them to pass now, though.

@wence-
Copy link
Contributor

wence- commented Mar 7, 2023

So CUDA 12 support arrived (with a jit compilation slowness that was fixed in 11.6) in 11.5; Support for CUDA 11.8 (and H100s) arrived in 11.3. If we intend to continue testing on CUDA 11 and CUDA 12 for the foreseeable, which if we're supporting both we should be doing, then I'd be inclined to pick something around the 11.3 mark for oldest supported version.

FWIW, locally with 11.5 I see these failures:

FAILED python/cudf/cudf/tests/test_orc.py::test_orc_write_statistics[1-STRIPE] - AssertionError: assert False
FAILED python/cudf/cudf/tests/test_orc.py::test_orc_write_statistics[1-ROWGROUP] - AssertionError: assert False
FAILED python/cudf/cudf/tests/test_orc.py::test_orc_write_statistics[100-STRIPE] - AssertionError: assert False
FAILED python/cudf/cudf/tests/test_orc.py::test_orc_write_statistics[100-ROWGROUP] - AssertionError: assert False
FAILED python/cudf/cudf/tests/test_orc.py::test_orc_write_statistics[6000000-STRIPE] - AssertionError: assert False
FAILED python/cudf/cudf/tests/test_orc.py::test_orc_write_statistics[6000000-ROWGROUP] - AssertionError: assert False
FAILED python/cudf/cudf/tests/test_orc.py::test_orc_chunked_write_statistics[2-STRIPE] - AssertionError: assert False
FAILED python/cudf/cudf/tests/test_orc.py::test_orc_chunked_write_statistics[2-ROWGROUP] - AssertionError: assert False
FAILED python/cudf/cudf/tests/test_orc.py::test_orc_chunked_write_statistics[100-STRIPE] - AssertionError: assert False
FAILED python/cudf/cudf/tests/test_orc.py::test_orc_chunked_write_statistics[100-ROWGROUP] - AssertionError: assert False
FAILED python/cudf/cudf/tests/test_orc.py::test_orc_chunked_write_statistics[6000000-STRIPE] - AssertionError: assert False
FAILED python/cudf/cudf/tests/test_orc.py::test_orc_chunked_write_statistics[6000000-ROWGROUP] - AssertionError: assert False
FAILED python/cudf/cudf/tests/test_orc.py::test_nanoseconds_overflow - AssertionError: numpy array are different
FAILED python/cudf/cudf/tests/test_orc.py::test_writer_timestamp_stream_size - AssertionError: numpy array are different
FAILED python/cudf/cudf/tests/test_orc.py::test_orc_writer_negative_timestamp - AssertionError: numpy array are different

But I also see these failures with 11.6 (so I don't think this is a cupy thing?)

There is a further difficulty here which is that our test suite always runs with whatever the most recently available cupy is (subject to upper-bound constraints). So someone could introduce functionality that breaks in older versions, but we would never notice until a bug comes in. I think this speaks to the broader question of how to do testing and version requirements (similar to the pynvml discussions we're having elsewhere).

@wence-
Copy link
Contributor

wence- commented Mar 7, 2023

FWIW, locally with 11.5 I see these failures:

These same failures appear with 11.3 (and are the only ones). So if these are red herrings then I think a lower bound of 11.3 would be fine.

@shwina
Copy link
Contributor

shwina commented Mar 7, 2023

I believe these are red herrings, and have to do with the locale.

@bdice
Copy link
Contributor

bdice commented Mar 7, 2023

See #7314. Those test failures are consistently seen in rapids-compose Docker environments and can be safely ignored for this purpose (but the underlying bug remains somewhat of a mystery).

@vyasr
Copy link
Contributor Author

vyasr commented Mar 7, 2023

The ORC failures are indeed red herrings due to locale and are most likely occurring because @wence- is developing using rapids-compose, see #7314. The failures to be concerned about from #12883 are the ones around ufuncs or dlpack.

@wence- I'm not sure how you're not seeing any other failures with 11.3 though. Wasn't #12883 correcting for a change only present in 11.6? Wouldn't we still need to condition that test?

@wence-
Copy link
Contributor

wence- commented Mar 7, 2023

@wence- I'm not sure how you're not seeing any other failures with 11.3 though. Wasn't #12883 correcting for a change only present in 11.6? Wouldn't we still need to condition that test?

I hadn't pulled in #12883. Rebuilding the environment with gcc-11, I'll update in an hour or two...

@vyasr
Copy link
Contributor Author

vyasr commented Mar 7, 2023

Without #12883 I would have expected you to see the dlpack failures with cupy 11.6

@wence-
Copy link
Contributor

wence- commented Mar 7, 2023

Without #12883 I would have expected you to see the dlpack failures with cupy 11.6

I have yet to run with 11.6 (my environment had 11.5).

@vyasr
Copy link
Contributor Author

vyasr commented May 17, 2023

As of #13284 we require cupy>=12.0.0.

@vyasr vyasr closed this as completed May 17, 2023
@bdice bdice removed the Needs Triage Need team to review and classify label Mar 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants