simple-cuda-gemm

compile with

nvcc gemm_study.cu -o gemm.o

run with

./gemm.o m n p

Here m, n and p are the dimensions of the matrices to be multiplied. Each must be a power of 2, a restriction kept for simplicity; the matrix values are randomly generated.
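The power-of-two requirement on the arguments can be checked up front. The following is a minimal host-side sketch, not code from this repository: `is_power_of_two` and `parse_dim` are hypothetical helper names, and throwing an exception on bad input is an assumption about how one might handle it.

```cuda
#include <cstdlib>
#include <limits>
#include <stdexcept>
#include <string>

// True when x is a positive power of two (1, 2, 4, 8, ...).
// A power of two has exactly one bit set, so x & (x - 1) clears it to zero.
bool is_power_of_two(long x) {
    return x > 0 && (x & (x - 1)) == 0;
}

// Hypothetical helper: parse one dimension argument (m, n or p) and
// reject anything that is not a power of two that fits in an int.
int parse_dim(const char* arg) {
    char* end = nullptr;
    long v = std::strtol(arg, &end, 10);
    bool parsed = (end != arg) && (*end == '\0');
    if (!parsed || !is_power_of_two(v) || v > std::numeric_limits<int>::max()) {
        throw std::invalid_argument(std::string("dimension must be a power of 2: ") + arg);
    }
    return static_cast<int>(v);
}
```

In a `main(int argc, char** argv)`, this could be used as `int m = parse_dim(argv[1]);` and likewise for n and p.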

Floats may overflow for very large dimensions; the code does not check for this.

The program prints the results of both the serial and the CUDA matrix multiplication to standard output (std::cout).

This implementation uses CUDA's shared memory. Each thread block fills its shared memory before performing dot products. There is a little ASCII doodle showing how this works at the top of the file.
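The shared-memory approach described above can be sketched roughly as follows. This is an illustrative sketch, not the code in gemm_study.cu. It assumes row-major storage, that A is m x n and B is n x p, and a tile width of 16; `TILE`, `tiled_gemm_kernel`, `tiled_gemm` and `serial_gemm` are hypothetical names. CUDA error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>
#include <cstddef>
#include <vector>

constexpr int TILE = 16;  // assumed tile width; the repository's value may differ

// C (m x p) = A (m x n) * B (n x p), all row-major.
__global__ void tiled_gemm_kernel(const float* A, const float* B, float* C,
                                  int m, int n, int p) {
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;  // row of C this thread computes
    int col = blockIdx.x * TILE + threadIdx.x;  // column of C this thread computes
    float acc = 0.0f;

    // Walk along the shared dimension n one tile at a time.
    for (int t = 0; t < (n + TILE - 1) / TILE; ++t) {
        int a_col = t * TILE + threadIdx.x;
        int b_row = t * TILE + threadIdx.y;
        // Each thread loads one element of A and one of B into shared memory
        // (zero where the tile hangs over the matrix edge).
        As[threadIdx.y][threadIdx.x] = (row < m && a_col < n) ? A[row * n + a_col] : 0.0f;
        Bs[threadIdx.y][threadIdx.x] = (b_row < n && col < p) ? B[b_row * p + col] : 0.0f;
        __syncthreads();  // the tile must be fully loaded before anyone reads it

        // Partial dot product using only shared memory.
        for (int k = 0; k < TILE; ++k) {
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        }
        __syncthreads();  // finish reading before the next iteration overwrites the tile
    }
    if (row < m && col < p) C[row * p + col] = acc;
}

// Host wrapper: copies inputs to the device, launches the kernel, copies C back.
std::vector<float> tiled_gemm(const std::vector<float>& A, const std::vector<float>& B,
                              int m, int n, int p) {
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    size_t bytesA = sizeof(float) * m * n;
    size_t bytesB = sizeof(float) * n * p;
    size_t bytesC = sizeof(float) * m * p;
    cudaMalloc(&dA, bytesA);
    cudaMalloc(&dB, bytesB);
    cudaMalloc(&dC, bytesC);
    cudaMemcpy(dA, A.data(), bytesA, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B.data(), bytesB, cudaMemcpyHostToDevice);

    dim3 block(TILE, TILE);
    dim3 grid((p + TILE - 1) / TILE, (m + TILE - 1) / TILE);
    tiled_gemm_kernel<<<grid, block>>>(dA, dB, dC, m, n, p);

    std::vector<float> C(static_cast<size_t>(m) * p);
    cudaMemcpy(C.data(), dC, bytesC, cudaMemcpyDeviceToHost);  // waits for the kernel
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return C;
}

// Serial reference for comparison.
std::vector<float> serial_gemm(const std::vector<float>& A, const std::vector<float>& B,
                               int m, int n, int p) {
    std::vector<float> C(static_cast<size_t>(m) * p, 0.0f);
    for (int i = 0; i < m; ++i)
        for (int j = 0; j < p; ++j)
            for (int k = 0; k < n; ++k)
                C[i * p + j] += A[i * n + k] * B[k * p + j];
    return C;
}
```

Staging tiles in shared memory means each element of A and B is read from global memory once per tile rather than once per dot product, which is the main reason for this layout.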
