
Use less DataLayouts internals in DSS #2049

Merged: 1 commit merged into main on Oct 20, 2024
Conversation

charleskawczynski (Member) commented Oct 17, 2024

This is a first step towards #2048.

@charleskawczynski charleskawczynski force-pushed the ck/dss_less_internals branch 2 times, most recently from c57b739 to 8d923f6 Compare October 18, 2024 20:57
@charleskawczynski charleskawczynski changed the title Ck/dss less internals Use less DataLayouts internals in DSS Oct 18, 2024
@charleskawczynski charleskawczynski force-pushed the ck/dss_less_internals branch 3 times, most recently from 540e9a7 to cd82e7b Compare October 18, 2024 21:02
@charleskawczynski charleskawczynski marked this pull request as ready for review October 18, 2024 21:03
sriharshakandala (Member) commented Oct 18, 2024

Generally speaking, max_threads = 256 works well. If resource constraints (shared-memory size, register usage, etc.) permit, CUDA automatically schedules multiple thread blocks on the same streaming multiprocessor, so no additional intervention is required.
Using threads_via_occupancy does not necessarily guarantee optimality! It's up to the user to design the best thread-block configuration for their application, and I believe we should tune it by hand, especially for performance-critical kernels!
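To illustrate the two launch strategies being compared, here is a minimal sketch using CUDA.jl (not code from this PR; `kernel_copy!` and the array sizes are made up for the example). It contrasts a fixed block size of 256 threads with letting CUDA.jl's occupancy API, `launch_configuration`, pick the block size:

```julia
using CUDA

# A trivial placeholder kernel: one thread per element.
function kernel_copy!(dest, src)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(dest)
        @inbounds dest[i] = src[i]
    end
    return nothing
end

dest = CUDA.zeros(Float32, 10_000)
src = CUDA.ones(Float32, 10_000)
nitems = length(dest)

# Option 1: fixed block size, as suggested above (max_threads = 256).
threads = 256
blocks = cld(nitems, threads)
@cuda threads = threads blocks = blocks kernel_copy!(dest, src)

# Option 2: query the occupancy API for a block size that maximizes
# occupancy for this specific compiled kernel.
kernel = @cuda launch = false kernel_copy!(dest, src)
config = CUDA.launch_configuration(kernel.fun)
threads = min(nitems, config.threads)
blocks = cld(nitems, threads)
kernel(dest, src; threads, blocks)
```

The occupancy API accounts for the kernel's actual register and shared-memory usage on the current device, whereas a fixed 256 is a reasonable default that may still leave performance on the table for unusually heavy or light kernels.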

@charleskawczynski charleskawczynski force-pushed the ck/dss_less_internals branch 3 times, most recently from fd90d44 to 42f2474 Compare October 19, 2024 01:34
charleskawczynski (Member, Author) commented

> Generally speaking, max_threads = 256 works well. If resource constraints (shared-memory size, register usage, etc.) permit, CUDA automatically schedules multiple thread blocks on the same streaming multiprocessor, so no additional intervention is required. Using threads_via_occupancy does not necessarily guarantee optimality! It's up to the user to design the best thread-block configuration for their application, and I believe we should tune it by hand, especially for performance-critical kernels!

I'm fine with reverting that for now because it's not really part of the refactor. I am curious how this differs from the occupancy API, though.

@charleskawczynski charleskawczynski force-pushed the ck/dss_less_internals branch 3 times, most recently from 90db469 to b446499 Compare October 19, 2024 17:17
@charleskawczynski charleskawczynski merged commit 0635ff3 into main Oct 20, 2024
17 checks passed
@charleskawczynski charleskawczynski deleted the ck/dss_less_internals branch October 20, 2024 01:43