Devel #1782

Merged
dillontsmith merged 15 commits into master from devel
Oct 12, 2020

Conversation

dillontsmith
Contributor

No description provided.

dependabot bot and others added 14 commits October 6, 2020 08:04
Required by llvmlite-0.34.0, except for aarch64 which needs llvm-9.
We're not hitting the bug that restricts aarch64 to llvm-9 so bump the
version for all archs.

Signed-off-by: Jan Vesely <[email protected]>
Parameters are configured such that LCA is equivalent to DDM followed by a Logistic function.
We're not using group synchronization and this reduces pressure on per-block resources.
Most GPUs can handle 2-3 times more warps than blocks per SM [0].
A block size of 128 threads (four 32-thread warps) creates 4 times fewer blocks than warps,
maximizing utilization of GPU resources.

[0] https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#features-and-technical-specifications__technical-specifications-per-compute-capability

Signed-off-by: Jan Vesely <[email protected]>
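
The arithmetic behind the block-size choice above can be checked with a minimal sketch. It assumes the usual NVIDIA warp size of 32 threads; the function name `launch_shape` and the thread count of 4096 are hypothetical and do not come from this PR.

```python
import math

WARP_SIZE = 32  # threads per warp on NVIDIA GPUs (assumption, not taken from the PR)


def launch_shape(total_threads, block_size):
    """Return (blocks, warps) needed to cover total_threads with the given block size."""
    blocks = math.ceil(total_threads / block_size)
    # Each block is split into warps of WARP_SIZE threads; a partial warp still occupies a slot.
    warps_per_block = math.ceil(block_size / WARP_SIZE)
    return blocks, blocks * warps_per_block


# Hypothetical workload of 4096 threads with the block size named in the commit message.
blocks, warps = launch_shape(total_threads=4096, block_size=128)
print(blocks, warps)  # 32 blocks, 128 warps: 4 times fewer blocks than warps
```

With a block size of 128 the warp-to-block ratio is 4, so the per-SM block limit is reached no sooner than the per-SM warp limit on hardware matching the 2-3x ratio cited in [0].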

fixup blocksize
We're not using any shared memory, but generate plenty of private data.
Signed-off-by: Jan Vesely <[email protected]>
Use explicitly sized types instead of platform-specific ones.
Signed-off-by: Jan Vesely <[email protected]>
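
The PR does not show which types were changed, so the following is only an illustration of the general idea using Python's standard `ctypes` module, not the project's code. It contrasts `c_long`, whose width depends on the platform ABI, with fixed-width types whose size is the same everywhere.

```python
import ctypes

# Platform-specific: c_long is 8 bytes on 64-bit Linux and macOS
# but 4 bytes on 64-bit Windows.
platform_sized = ctypes.sizeof(ctypes.c_long)

# Explicitly sized: these widths are identical on every platform.
fixed_sizes = {
    "c_int32": ctypes.sizeof(ctypes.c_int32),
    "c_int64": ctypes.sizeof(ctypes.c_int64),
    "c_uint64": ctypes.sizeof(ctypes.c_uint64),
}

print("c_long:", platform_sized, "bytes on this platform")
print(fixed_sizes)
```

Data layouts shared between host code and compiled kernels stay consistent only when both sides agree on widths, which is what fixed-size types guarantee.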
Make IR generation for execution counts bitwidth agnostic.
Add minor CUDA tune-ups.
dillontsmith requested a review from SamKG on October 12, 2020 13:55

coveralls commented Oct 12, 2020

Coverage Status

Coverage increased (+0.004%) to 82.713% when pulling 78a0712 on devel into e415857 on master.

dillontsmith merged commit 472a2d0 into master on Oct 12, 2020
4 participants