Skip to content
This repository has been archived by the owner on Mar 21, 2024. It is now read-only.

[REVIEW] Reverting change that removed unsigned in dispatch_spmv_orig.cuh #196

Closed
wants to merge 1 commit into from

Conversation

cjnolet
Copy link

@cjnolet cjnolet commented Aug 21, 2020

Closes #195
Closes #161

@cjnolet cjnolet changed the base branch from 1.8.0 to main August 21, 2020 23:59
@cjnolet
Copy link
Author

cjnolet commented Aug 22, 2020

cc @seunghwak @allisonvacanti

@alliepiper
Copy link
Collaborator

Can you shed some light on what CUB calls are being made in cupy in this case? I'd like to add a regression test to prevent this in the future.

@ByteHamster ByteHamster mentioned this pull request Aug 30, 2020
@alliepiper alliepiper added this to the 1.11.1 milestone Oct 20, 2020
@alliepiper alliepiper added the unverified Cannot be reproduced or confirmed. label Oct 20, 2020
@alliepiper
Copy link
Collaborator

Ping @cjnolet -- not sure if you saw my question above.

@alliepiper
Copy link
Collaborator

Closing for lack of response -- feel free to reopen with more information as detailed above if you still need this.

@alliepiper alliepiper closed this Nov 24, 2020
@leofang
Copy link
Member

leofang commented Dec 19, 2020

Hi @allisonvacanti Not sure if it could work this way: Could you reopen this PR and reassign to me so that I can take over and investigate when I have time? I can't promise a date but can put this in my queue 😁 Thanks!

@alliepiper
Copy link
Collaborator

@leofang Works for me -- thanks!

@alliepiper alliepiper reopened this Jan 4, 2021
@alliepiper alliepiper modified the milestones: 1.12.0, Backlog Feb 8, 2021
@leofang
Copy link
Member

leofang commented Feb 19, 2021

I am looking into this right now, so will bomb this thread later this afternoon. Looks like as of CUDA 11.2.0 the bug still exists, but is gone on the current master. Trying to figure out what fixed it...

@leofang
Copy link
Member

leofang commented Feb 19, 2021

For some reason git bisect complains about "Some good revs are not ancestors of the bad rev" when bisecting against the current master. Is it because of the earlier repo change?

@leofang
Copy link
Member

leofang commented Feb 19, 2021

nvm, I was being silly 😅

@leofang
Copy link
Member

leofang commented Feb 19, 2021

Hi @allisonvacanti I am done with my checks. As @cjnolet pointed out, the first commit causing the spmv error is b2e64cf, and it seems to be fixed by your commit 63e2ad4 from #249 🎉 So looks like the fix will be included in CUDA 11.2.2 or CUDA 11.3, right? Please feel free to close this PR and the linked bug reports.

FYI this is how I checked locally: with each git bisect I rebuild CuPy's cub module and run the test:

rm cupy/cuda/cub.cp*
pip install -v -e .
python test_cub_csrmv_bug.py

the test script is based on @cjnolet's reproducer from cupy/cupy#3822:

# test_cub_csrmv_bug.py
import cupy

a = cupy.sparse.random(100, 100, format='csr', density=0.9)
cupy.core.set_routine_accelerators(['cub'])
print(a.T.dot(cupy.ones(a.shape[0])))
print(a.dot(cupy.ones(a.shape[1])))

The output arrays should not contain values of order 1E-314.

@alliepiper
Copy link
Collaborator

Excellent news -- I love it when bugs accidentally fix themselves :D

Thanks for checking into this, @leofang.

@alliepiper
Copy link
Collaborator

To answer the question about releases, the cuda toolkit usually ships with a slightly old version of thrust/cub. This likely won't be included until 11.4, or possibly a 11.3.x update at the earliest. CTK 11.3 will include thrust/cub 1.11.0.

@leofang
Copy link
Member

leofang commented Feb 20, 2021

Great, thanks for the info @allisonvacanti. Will discuss with the team for working around the bug.

@alliepiper alliepiper removed this from the Backlog milestone Apr 25, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
unverified Cannot be reproduced or confirmed.
Projects
None yet
3 participants