[FEA] Decide what versions of cupy to support #12884

vyasr · 2023-03-06T18:06:18Z

This issue is essentially a cupy version of #12398 that I am raising as a result of needing #12883 to update cudf for supporting recent changes to cupy. We currently claim to support cupy 9.5-11.*, but our test suite will have some failures when run using cupy < 11.6. We should determine what an appropriate support matrix here is.

vyasr · 2023-03-06T18:07:15Z

@bdice @wence- thoughts on what versions you would like to see supported here?

CC @shwina

bdice · 2023-03-06T19:35:24Z

If our test suite requires cupy >=11.6, I think that's what we should pin to. If we want to extend support to 11.x then we should make our tests pass with some set of earlier 11.x versions. We should not allow dependency versions that do not pass our test suite.

bdice · 2023-03-06T19:41:02Z

I would recommend only supporting very recent versions of cupy, which would mean pinning to >=11.6. We rely heavily on the library and its cutting edge features, for both platform support (e.g. CUDA 12, sm90) and API support (e.g. dlpack).

vyasr · 2023-03-06T21:01:01Z

Definitely agree that we shouldn't allow versions that don't pass our test suite. Requiring a cupy version that was released within the month feels rather onerous though. We really don't use much in the way of "cutting edge features" from cupy. I'm fairly certain the list of examples you gave is exhaustive, with the small exception of some small changes to RNGs that we use in sampling. RAPIDS is going to be supporting both CUDA 11 and CUDA 12 for the foreseeable future (once we roll out CUDA 12 support) so it would be odd to use CUDA 12 or recent hardware as a reason to drop support for older cupy versions. IMO if tests pass on 11.x with an older cupy version on older hardware/CUDA we should continue to allow its usage on CUDA 11 systems. I'm certainly fine dropping any version that requires us to make modifications to our tests to get them to pass now, though.

wence- · 2023-03-07T15:04:45Z

So CUDA 12 support arrived (with a jit compilation slowness that was fixed in 11.6) in 11.5; Support for CUDA 11.8 (and H100s) arrived in 11.3. If we intend to continue testing on CUDA 11 and CUDA 12 for the foreseeable, which if we're supporting both we should be doing, then I'd be inclined to pick something around the 11.3 mark for oldest supported version.

FWIW, locally with 11.5 I see these failures:

FAILED python/cudf/cudf/tests/test_orc.py::test_orc_write_statistics[1-STRIPE] - AssertionError: assert False
FAILED python/cudf/cudf/tests/test_orc.py::test_orc_write_statistics[1-ROWGROUP] - AssertionError: assert False
FAILED python/cudf/cudf/tests/test_orc.py::test_orc_write_statistics[100-STRIPE] - AssertionError: assert False
FAILED python/cudf/cudf/tests/test_orc.py::test_orc_write_statistics[100-ROWGROUP] - AssertionError: assert False
FAILED python/cudf/cudf/tests/test_orc.py::test_orc_write_statistics[6000000-STRIPE] - AssertionError: assert False
FAILED python/cudf/cudf/tests/test_orc.py::test_orc_write_statistics[6000000-ROWGROUP] - AssertionError: assert False
FAILED python/cudf/cudf/tests/test_orc.py::test_orc_chunked_write_statistics[2-STRIPE] - AssertionError: assert False
FAILED python/cudf/cudf/tests/test_orc.py::test_orc_chunked_write_statistics[2-ROWGROUP] - AssertionError: assert False
FAILED python/cudf/cudf/tests/test_orc.py::test_orc_chunked_write_statistics[100-STRIPE] - AssertionError: assert False
FAILED python/cudf/cudf/tests/test_orc.py::test_orc_chunked_write_statistics[100-ROWGROUP] - AssertionError: assert False
FAILED python/cudf/cudf/tests/test_orc.py::test_orc_chunked_write_statistics[6000000-STRIPE] - AssertionError: assert False
FAILED python/cudf/cudf/tests/test_orc.py::test_orc_chunked_write_statistics[6000000-ROWGROUP] - AssertionError: assert False
FAILED python/cudf/cudf/tests/test_orc.py::test_nanoseconds_overflow - AssertionError: numpy array are different
FAILED python/cudf/cudf/tests/test_orc.py::test_writer_timestamp_stream_size - AssertionError: numpy array are different
FAILED python/cudf/cudf/tests/test_orc.py::test_orc_writer_negative_timestamp - AssertionError: numpy array are different

But I also see these failures with 11.6 (so I don't think this is a cupy thing?)

There is a further difficulty here which is that our test suite always runs with whatever the most recently available cupy is (subject to upper-bound constraints). So someone could introduce functionality that breaks in older versions, but we would never notice until a bug comes in. I think this speaks to the broader question of how to do testing and version requirements (similar to the pynvml discussions we're having elsewhere).

wence- · 2023-03-07T15:06:39Z

FWIW, locally with 11.5 I see these failures:

These same failures appear with 11.3 (and are the only ones). So if these are red herrings then I think a lower bound of 11.3 would be fine.

shwina · 2023-03-07T15:17:21Z

I believe these are red herrings, and have to do with the locale.

bdice · 2023-03-07T15:22:41Z

See #7314. Those test failures are consistently seen in rapids-compose Docker environments and can be safely ignored for this purpose (but the underlying bug remains somewhat of a mystery).

vyasr · 2023-03-07T15:23:39Z

The ORC failures are indeed red herrings due to locale and are most likely occurring because @wence- is developing using rapids-compose, see #7314. The failures to be concerned about from #12883 are the ones around ufuncs or dlpack.

@wence- I'm not sure how you're not seeing any other failures with 11.3 though. Wasn't #12883 correcting for a change only present in 11.6? Wouldn't we still need to condition that test?

wence- · 2023-03-07T15:35:39Z

@wence- I'm not sure how you're not seeing any other failures with 11.3 though. Wasn't #12883 correcting for a change only present in 11.6? Wouldn't we still need to condition that test?

I hadn't pulled in #12883. Rebuilding the environment with gcc-11, I'll update in an hour or two...

vyasr · 2023-03-07T16:44:38Z

Without #12883 I would have expected you to see the dlpack failures with cupy 11.6

wence- · 2023-03-07T16:45:56Z

Without #12883 I would have expected you to see the dlpack failures with cupy 11.6

I have yet to run with 11.6 (my environment had 11.5).

vyasr · 2023-05-17T20:31:38Z

As of #13284 we require cupy>=12.0.0.

vyasr added feature request New feature or request Needs Triage Need team to review and classify labels Mar 6, 2023

vyasr mentioned this issue Mar 6, 2023

Expect cupy to now support bool arrays for dlpack. #12883

Merged

3 tasks

vyasr closed this as completed May 17, 2023

bdice removed the Needs Triage Need team to review and classify label Mar 4, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[FEA] Decide what versions of cupy to support #12884

[FEA] Decide what versions of cupy to support #12884

vyasr commented Mar 6, 2023

vyasr commented Mar 6, 2023

bdice commented Mar 6, 2023

bdice commented Mar 6, 2023 •

edited

Loading

vyasr commented Mar 6, 2023

wence- commented Mar 7, 2023

wence- commented Mar 7, 2023

shwina commented Mar 7, 2023

bdice commented Mar 7, 2023

vyasr commented Mar 7, 2023

wence- commented Mar 7, 2023

vyasr commented Mar 7, 2023

wence- commented Mar 7, 2023

vyasr commented May 17, 2023

[FEA] Decide what versions of cupy to support #12884

[FEA] Decide what versions of cupy to support #12884

Comments

vyasr commented Mar 6, 2023

vyasr commented Mar 6, 2023

bdice commented Mar 6, 2023

bdice commented Mar 6, 2023 • edited Loading

vyasr commented Mar 6, 2023

wence- commented Mar 7, 2023

wence- commented Mar 7, 2023

shwina commented Mar 7, 2023

bdice commented Mar 7, 2023

vyasr commented Mar 7, 2023

wence- commented Mar 7, 2023

vyasr commented Mar 7, 2023

wence- commented Mar 7, 2023

vyasr commented May 17, 2023

bdice commented Mar 6, 2023 •

edited

Loading