`mul_op` (int) in the `instructions.cu` example is actually doing an addition!
I'm new to CUDA, but I presume the `add.s32` should be `mul.lo.s32`.
The output in the README appears to reflect this error too.
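For reference, here is a minimal sketch of the presumed fix. It assumes `mul_op` wraps a single inline-PTX instruction, as the issue describes. The kernel `mul_kernel` and the host helper `gpu_mul` are hypothetical and exist only to exercise the operation; they are not taken from `instructions.cu`.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Integer multiply via inline PTX. mul.lo.s32 keeps the low 32 bits
// of the full 64-bit product, matching ordinary 32-bit int `*`.
// The version described in this issue emits add.s32 here instead.
__device__ __forceinline__ int mul_op(int a, int b) {
    int c;
    asm volatile("mul.lo.s32 %0, %1, %2;" : "=r"(c) : "r"(a), "r"(b));
    return c;
}

// Hypothetical test kernel: a single thread computes one product.
__global__ void mul_kernel(int a, int b, int* out) {
    *out = mul_op(a, b);
}

// Hypothetical host helper: runs mul_op on the GPU and returns the result.
// Error handling is minimal (assert only) since this is a sketch.
int gpu_mul(int a, int b) {
    int* d_out = nullptr;
    int h_out = 0;
    cudaError_t err = cudaMalloc(&d_out, sizeof(int));
    assert(err == cudaSuccess);
    mul_kernel<<<1, 1>>>(a, b, d_out);
    // cudaMemcpy waits for the kernel to finish before copying back.
    err = cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    assert(err == cudaSuccess);
    cudaFree(d_out);
    return h_out;
}
```

With `add.s32` in place of `mul.lo.s32`, `gpu_mul(6, 7)` would return 13 instead of 42, so a quick check like this distinguishes the two instructions.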
I tested the impact of this change on a Tesla T4. Before:
int add 1.89 3 87.044762 3200 (3276800)
...
int mul 1.89 3 87.348724 3200 (3276800)
float mul 3.14 5 62.641941 3200 (3276800)
After:
int mul 3.14 5 62.652721 3200 (3276800)
float mul 3.14 5 62.641941 3200 (3276800)
(So int mul and float mul now take roughly the same amount of time.)