This repository has been archived by the owner on Feb 27, 2024. It is now read-only.
clBLAS to MIOpenGEMM
James Newling edited this page Sep 7, 2017
The clBLAS API for sgemm is,
clblasStatus
clblasSgemm(
    clblasOrder order,
    clblasTranspose transA,
    clblasTranspose transB,
    size_t M,
    size_t N,
    size_t K,
    cl_float alpha,
    const cl_mem A,
    size_t offA,
    size_t lda,
    const cl_mem B,
    size_t offB,
    size_t ldb,
    cl_float beta,
    cl_mem C,
    size_t offC,
    size_t ldc,
    cl_uint numCommandQueues,
    cl_command_queue *commandQueues,
    cl_uint numEventsInWaitList,
    const cl_event *eventWaitList,
    cl_event *events);
A loop of GEMMs using clBLAS might look like this:
for (int i = 0; i < 10; ++i){
  clblasStatus status =
  clblasSgemm(
    // order, transA and transB are clBLAS enums
    order, transA, transB, M, N, K,
    // offsets, strides and memory buffers
    alpha, A, offA, lda, B, offB, ldb, beta, C, offC, ldc,
    // clBLAS allows multiple cl_command_queues
    n_queues, queues, n_waitlist, waitlist, events
  );
}
The equivalent code using MIOpenGEMM might look like this:
// First, a "warm-up" call for this GEMM geometry, for
// generating kernel source string, compiling, etc.
auto stat = MIOpenGEMM::xgemm<float>(
  // isColMajor, tA and tB are now bool
  isColMajor, tA, tB, M, N, K,
  // unchanged from clBLAS
  alpha, A, offA, lda, B, offB, ldb, beta, C, offC, ldc,
  // assuming no workspace for now
  nullptr, 0, 0,
  // MIOpenGEMM allows only 1 cl_command_queue
  &queues[0], n_waitlist, waitlist, &events[0],
  // this is the first run with this GEMM geometry, so ID is negative
  -1);

// obtain the cache ID for this geometry from the returned GemmStatus object
int ID_for_this_geometry = stat.ID;

// Now run with the cached and compiled kernel
// (the warm-up call above counts as iteration 0)
for (int i = 1; i < 10; ++i){
  stat = MIOpenGEMM::xgemm<float>(
    isColMajor, tA, tB, M, N, K, alpha, A, offA, lda, B, offB, ldb, beta,
    C, offC, ldc, nullptr, 0, 0, &queues[0], n_waitlist, waitlist, &events[0],
    ID_for_this_geometry);
}
The differences between the clBLAS and MIOpenGEMM APIs are,
- enums order, transA, transB are converted to bools isColMajor, tA, tB.
- three workspace parameters are included.
- numCommandQueues is removed, and the arrays queues and events must now have length 1.
- ID is added.
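The first difference can be handled with two small conversion helpers. The following is a minimal sketch, not part of either library: the helper names `to_is_col_major` and `to_is_transposed` are hypothetical, and the two enums are local stand-ins that mirror the clBLAS definitions so the snippet compiles without clBLAS installed. Treating `clblasConjTrans` as a plain transpose is an assumption that only holds for real-valued GEMM such as sgemm.

```cpp
#include <cassert>

// Stand-ins for the clBLAS enums so this sketch compiles on its own.
// In real code, include <clBLAS.h> and delete these two definitions.
enum clblasOrder { clblasRowMajor, clblasColumnMajor };
enum clblasTranspose { clblasNoTrans, clblasTrans, clblasConjTrans };

// Hypothetical helper: value to pass as isColMajor.
inline bool to_is_col_major(clblasOrder order) {
    return order == clblasColumnMajor;
}

// Hypothetical helper: value to pass as tA or tB.
// Assumption: for real-valued data a conjugate transpose is the same
// as a transpose, so clblasConjTrans maps to true as well.
inline bool to_is_transposed(clblasTranspose trans) {
    return trans != clblasNoTrans;
}
```

With these, `isColMajor`, `tA` and `tB` in the MIOpenGEMM call could be obtained as `to_is_col_major(order)`, `to_is_transposed(transA)` and `to_is_transposed(transB)`.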
In the above example, the loop is for an unchanging GEMM Geometry. If the Geometry were to change in the loop, the ID passed would need to be -1.
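When the Geometry does change between calls, one way to avoid repeating the warm-up for a Geometry that has been seen before is to keep the returned IDs in a small lookup table. The sketch below is an illustration only: the `Geometry` struct and the `GeometryIdCache` class are hypothetical and not part of MIOpenGEMM, and the choice of fields that identify a Geometry is an assumption based on the arguments in the example above.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <tuple>

// Hypothetical description of a GEMM geometry. Which fields MIOpenGEMM
// actually uses to identify a geometry is an assumption here.
struct Geometry {
    bool isColMajor, tA, tB;
    std::size_t M, N, K, lda, ldb, ldc;
};

// Strict ordering so Geometry can be used as a std::map key.
inline bool operator<(const Geometry& a, const Geometry& b) {
    return std::tie(a.isColMajor, a.tA, a.tB, a.M, a.N, a.K, a.lda, a.ldb, a.ldc)
         < std::tie(b.isColMajor, b.tA, b.tB, b.M, b.N, b.K, b.lda, b.ldb, b.ldc);
}

// Hypothetical cache mapping each geometry to the ID returned by its
// warm-up call.
class GeometryIdCache {
public:
    // Returns the stored ID, or -1 if this geometry has not been run yet.
    // The result can be passed directly as the last argument of xgemm.
    int lookup(const Geometry& g) const {
        auto it = ids_.find(g);
        return it == ids_.end() ? -1 : it->second;
    }

    // Records the ID taken from the GemmStatus of a warm-up call.
    // Assumption: valid IDs are non-negative, since negative means "new".
    void store(const Geometry& g, int id) {
        assert(id >= 0);
        ids_[g] = id;
    }

private:
    std::map<Geometry, int> ids_;
};
```

Inside the loop, the caller would pass `cache.lookup(g)` as the ID and, whenever that value was negative, call `cache.store(g, stat.ID)` after the GEMM returns.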
There are two situations in which a Geometry needs to be run with a negative ID more than once. They are,
- If it is run on several (different?) devices.
- If it is run on multiple (host) threads which might run GEMM on this Geometry simultaneously. (TODO: clarify this and give an example).
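One conservative way to respect both situations is to remember IDs separately per device and per host thread, so that each thread issues its own negative-ID call for each device it uses. The sketch below is an assumption about how a caller might organise this, not documented MIOpenGEMM behaviour: the functions `id_to_pass` and `remember_id` are hypothetical, the device is represented by an opaque integer handle (for example a `cl_device_id` converted to `std::uintptr_t`), and `geometry_tag` is any caller-chosen value that distinguishes one Geometry from another.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>

// Hypothetical key: opaque device handle plus a caller-chosen geometry tag.
using DeviceGeometry = std::pair<std::uintptr_t, std::size_t>;

// One table per host thread. thread_local storage means every thread
// starts with an empty table and no locking is needed.
inline std::map<DeviceGeometry, int>& local_ids() {
    thread_local std::map<DeviceGeometry, int> ids;
    return ids;
}

// ID to pass to xgemm: -1 unless the calling thread has already
// warmed up this geometry on this device.
inline int id_to_pass(std::uintptr_t device, std::size_t geometry_tag) {
    const DeviceGeometry key{device, geometry_tag};
    const auto& ids = local_ids();
    auto it = ids.find(key);
    return it == ids.end() ? -1 : it->second;
}

// Records the ID returned by a warm-up call made from the calling thread.
// Assumption: valid IDs are non-negative.
inline void remember_id(std::uintptr_t device, std::size_t geometry_tag, int id) {
    assert(id >= 0);
    const DeviceGeometry key{device, geometry_tag};
    local_ids()[key] = id;
}
```

This trades some repeated warm-up work for simplicity. Whether an ID obtained on one thread can safely be shared with another is left open by the text above, so the sketch does not share it.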