Small error in instructions.cu example (mul_op) #1

col-mcc · 2024-10-23T22:48:02Z

mul_op (int) in the instructions.cu example is actually doing an addition!

I'm new to CUDA, but I presume the 'add.s32' should be 'mul.lo.s32'.
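For illustration, here is a minimal sketch of what the corrected inline PTX might look like. The actual source of instructions.cu is not quoted in this issue, so the helper name `mul_op_int` and the `check` kernel are hypothetical and not taken from the repository. The only substantive point is the opcode: `mul.lo.s32` keeps the low 32 bits of the signed product, whereas `add.s32` would add the operands.

```cuda
#include <cstdio>

// Hypothetical stand-in for the int mul_op; the real code in
// instructions.cu may be structured differently.
__device__ int mul_op_int(int a, int b) {
    int result;
    // mul.lo.s32 writes the low 32 bits of the signed 32x32-bit product.
    // With add.s32 here instead, the function would return a + b.
    asm volatile("mul.lo.s32 %0, %1, %2;" : "=r"(result) : "r"(a), "r"(b));
    return result;
}

// Single-thread kernel that stores one result so the host can inspect it.
__global__ void check(int a, int b, int *out) {
    *out = mul_op_int(a, b);
}

int main() {
    int *d_out = nullptr;
    int h_out = 0;

    cudaMalloc(&d_out, sizeof(int));
    check<<<1, 1>>>(6, 7, d_out);
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);

    // A multiply gives 42 for these inputs; an add would give 13.
    printf("6 * 7 = %d\n", h_out);
    return h_out == 42 ? 0 : 1;
}
```

Picking inputs whose sum and product differ (6 and 7 here) makes it easy to tell the two opcodes apart from the result alone, without relying on timing.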

The output in the README appears to reflect this error too.

I tested the impact of this change on a Tesla T4, and the output went from -

int add 1.89 3 87.044762 3200 (3276800)
...
int mul 1.89 3 87.348724 3200 (3276800)
float mul 3.14 5 62.641941 3200 (3276800)

to -

int mul 3.14 5 62.652721 3200 (3276800)
float mul 3.14 5 62.641941 3200 (3276800)

(So with the fix, int and float mul take roughly equal amounts of time.)

col-mcc changed the title ~~Small error in examples.cu (mul_op)~~ Small error in instructions.cu example (mul_op) Oct 23, 2024
