STRICT build of dslash_mdw_fused_* fails with sm_86 #1403

Closed
jcosborn opened this issue Sep 8, 2023 · 2 comments · Fixed by #1405

@jcosborn
Contributor

jcosborn commented Sep 8, 2023

A STRICT build using sm_86 with MULTIGRID on fails with:
Building CUDA object lib/CMakeFiles/quda.dir/dslash_mdw_fused_ls20.cu.o
ptxas error : Value of threads per SM for entry ZN4quda10raw_kernelINS_18mobius_tensor_core17FusedMobiusDslashENS1_14FusedDslashArgIsLi3EL21QudaReconstructType_s8ELi20ELNS_19MdwfFusedDslashTypeE4ELi32ELi3ELb0EEELb0EEEvT0 is out of range. .minnctapersm will be ignored

@maddyscientist
Member

@hummingtree can you take a look?

@hummingtree
Member

@jcosborn This is because SM 86, 87 and 89 only allow a maximum of 1536 threads (as opposed to 2048) per SM. I will open a PR to fix this. Meanwhile, you can disable this part of the code by passing -D QUDA_MDW_FUSED_LS_LIST="" as part of the cmake parameters, which I expect would also decrease your compile time by quite a bit.
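
For readers hitting the same message, the arithmetic behind it can be sketched roughly as follows. This is a minimal illustration, not QUDA code: the helper names `max_threads_per_sm` and `launch_bounds_fit` are made up here, and the block size of 512 with a minimum of 4 blocks per SM is a hypothetical pair, not the values the fused kernel actually uses. The assumption is that ptxas ignores `.minnctapersm` (a warning that a STRICT build turns into an error) when threads per block times the requested minimum blocks per SM exceeds the per-SM thread limit.

```cpp
// Per-SM resident thread limits for a few architectures, taken from the
// CUDA compute-capability tables. Architectures not listed here return -1
// ("unknown") rather than a guessed value.
constexpr int max_threads_per_sm(int sm) {
  return (sm == 80) ? 2048
       : (sm == 86 || sm == 87 || sm == 89) ? 1536
       : -1;
}

// Hypothetical helper: true if a (threads per block, min blocks per SM)
// pair, as one would pass to __launch_bounds__, fits within the limit.
constexpr bool launch_bounds_fit(int sm, int threads_per_block, int min_blocks_per_sm) {
  return max_threads_per_sm(sm) > 0 &&
         threads_per_block * min_blocks_per_sm <= max_threads_per_sm(sm);
}

// Hypothetical bounds: 512 threads per block, at least 4 blocks per SM.
// 512 * 4 = 2048 fits on sm_80 but exceeds the 1536 limit on sm_86.
static_assert(launch_bounds_fit(80, 512, 4), "fits on sm_80");
static_assert(!launch_bounds_fit(86, 512, 4), "does not fit on sm_86");
// Lowering the requested minimum to 3 blocks (1536 threads) fits again.
static_assert(launch_bounds_fit(86, 512, 3), "fits on sm_86");
```

Under these assumptions, a fix would need to pick the minimum-blocks value per architecture rather than hard-coding one that only fits a 2048-thread SM; whether #1405 does exactly that is not stated in this thread.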
