This repository has been archived by the owner on Feb 27, 2024. It is now read-only.
clBLAS to MIOpenGEMM
James Newling edited this page Sep 12, 2017
The clBLAS API for sgemm is:
clblasStatus
clblasSgemm(
    clblasOrder order,
    clblasTranspose transA,
    clblasTranspose transB,
    size_t M,
    size_t N,
    size_t K,
    cl_float alpha,
    const cl_mem A,
    size_t offA,
    size_t lda,
    const cl_mem B,
    size_t offB,
    size_t ldb,
    cl_float beta,
    cl_mem C,
    size_t offC,
    size_t ldc,
    cl_uint numCommandQueues,
    cl_command_queue *commandQueues,
    cl_uint numEventsInWaitList,
    const cl_event *eventWaitList,
    cl_event *events);
A loop of GEMMs using clBLAS might look like this:
for (int i = 0; i < 10; ++i) {
    clblasStatus status =
        clblasSgemm(
            // order, transA and transB are clBLAS enums
            order, transA, transB, M, N, K,
            // offsets, strides and memory buffers
            alpha, A, offA, lda, B, offB, ldb, beta, C, offC, ldc,
            // clBLAS allows multiple cl_command_queues
            n_queues, queues, n_waitlist, waitlist, events
        );
}
The equivalent code using MIOpenGEMM might look like this:
for (int i = 0; i < 10; ++i) {
    auto stat = MIOpenGEMM::gemm0<float>(
        isColMajor, tA, tB, M, N, K,
        alpha, A, offA, lda, B, offB, ldb, beta, C, offC, ldc,
        &queues[0], n_waitlist, waitlist, &events[0]);
}
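To make the shared argument convention concrete, here is a minimal host-side sketch of what both calls compute, assuming the standard BLAS definition C = alpha * op(A) * op(B) + beta * C. The names `elem_index` and `reference_gemm` are hypothetical and are not part of clBLAS or MIOpenGEMM, and the matrices are plain host vectors rather than `cl_mem` buffers. It can serve as a slow correctness check when porting a call.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: position of element (row, col) of op(X) inside the
// stored buffer, given the offset and leading dimension of X.
inline std::size_t elem_index(bool isColMajor, bool transposed,
                              std::size_t row, std::size_t col,
                              std::size_t offset, std::size_t ld)
{
    // Element (row, col) of the transpose is element (col, row) of the stored matrix.
    if (transposed) std::swap(row, col);
    return offset + (isColMajor ? row + col * ld : row * ld + col);
}

// Hypothetical reference GEMM with the same leading arguments as the calls above.
// op(A) is M x K, op(B) is K x N, C is M x N.
void reference_gemm(bool isColMajor, bool tA, bool tB,
                    std::size_t M, std::size_t N, std::size_t K,
                    float alpha,
                    const std::vector<float>& A, std::size_t offA, std::size_t lda,
                    const std::vector<float>& B, std::size_t offB, std::size_t ldb,
                    float beta,
                    std::vector<float>& C, std::size_t offC, std::size_t ldc)
{
    for (std::size_t m = 0; m < M; ++m) {
        for (std::size_t n = 0; n < N; ++n) {
            float acc = 0.0f;
            for (std::size_t k = 0; k < K; ++k) {
                acc += A[elem_index(isColMajor, tA, m, k, offA, lda)] *
                       B[elem_index(isColMajor, tB, k, n, offB, ldb)];
            }
            // C itself is never transposed.
            const std::size_t c = elem_index(isColMajor, false, m, n, offC, ldc);
            C[c] = alpha * acc + beta * C[c];
        }
    }
}
```

Results read back from the device buffer C can be compared element by element against the output of this function, with a small tolerance for floating-point rounding.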
If the matrices A, B and C are very small (< 100x100), there is another, slightly faster API function, xgemm, which has less host-side overhead. Using xgemm:
// First, a "warm-up" call for this GEMM geometry, for
// generating kernel source string, compiling and getting ID.
auto stat = MIOpenGEMM::xgemm<float>(
    // isColMajor, tA and tB are now bool
    isColMajor, tA, tB, M, N, K,
    // unchanged from clBLAS
    alpha, A, offA, lda, B, offB, ldb, beta, C, offC, ldc,
    // assuming no workspace for now
    nullptr, 0, 0,
    // MIOpenGEMM only allows 1 cl_command_queue
    &queues[0], n_waitlist, waitlist, &events[0],
    // this is the first run with this GEMM geometry, so ID is negative
    -1);
// obtain the cache ID for this geometry from the returned GemmStatus object
int ID_for_this_geometry = stat.ID;
// Now run with the cached and compiled kernel
for (int i = 1; i < 10; ++i) {
    stat = MIOpenGEMM::xgemm<float>(
        isColMajor, tA, tB, M, N, K, alpha, A, offA, lda, B, offB, ldb, beta,
        C, offC, ldc, nullptr, 0, 0, &queues[0], n_waitlist, waitlist, &events[0],
        ID_for_this_geometry);
}
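When an application runs several different geometries, the IDs returned by the warm-up calls have to be remembered somewhere. The following is a minimal sketch of one way to do that on the caller's side. `GeometryKey` and `GemmIdCache` are hypothetical names, not part of MIOpenGEMM, and the sketch assumes that a geometry is identified by the layout flags, the dimensions and the leading dimensions; if other parameters (for example a workspace size) also matter for your use, they would need to be added to the key.

```cpp
#include <cstddef>
#include <map>
#include <tuple>

// Hypothetical key describing one GEMM geometry (an assumption, see above).
struct GeometryKey {
    bool isColMajor, tA, tB;
    std::size_t M, N, K, lda, ldb, ldc;

    // Lexicographic ordering so the key can be used in std::map.
    bool operator<(const GeometryKey& o) const {
        return std::tie(isColMajor, tA, tB, M, N, K, lda, ldb, ldc) <
               std::tie(o.isColMajor, o.tA, o.tB, o.M, o.N, o.K, o.lda, o.ldb, o.ldc);
    }
};

// Hypothetical caller-side cache of the IDs returned by warm-up calls.
class GemmIdCache {
public:
    // Returns the stored ID, or -1 if this geometry has not been seen yet.
    // -1 is the value passed to xgemm for a first run.
    int lookup(const GeometryKey& key) const {
        auto it = ids_.find(key);
        return it == ids_.end() ? -1 : it->second;
    }

    // Records the ID obtained from the status object of a warm-up call.
    void store(const GeometryKey& key, int id) { ids_[key] = id; }

private:
    std::map<GeometryKey, int> ids_;
};
```

In use, the value of `cache.lookup(key)` would be passed as the last argument of xgemm, and after a call made with -1 the returned `stat.ID` would be saved with `cache.store(key, stat.ID)`.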
The differences between the clBLAS and MIOpenGEMM APIs are:
- enums order, transA, transB are converted to bools isColMajor, tA, tB
- (xgemm) three workspace parameters are added (which can safely be set to nullptr, 0, 0)
- numCommandQueues is removed; arrays queues and events must have length 1
- (xgemm) ID is added (which can safely be set to -1)
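The enum-to-bool conversion in the first item can be sketched as follows. The enums here are local stand-ins that mirror the clBLAS values clblasRowMajor/clblasColumnMajor and clblasNoTrans/clblasTrans/clblasConjTrans so that the snippet compiles without clBLAS.h; `GemmFlags` and `to_bool_flags` are hypothetical names. The sketch assumes real-valued sgemm, where a conjugate transpose is the same as a plain transpose.

```cpp
// Stand-ins for the clBLAS enums (in real code, use the ones from clBLAS.h).
enum OrderStandIn { RowMajor, ColumnMajor };
enum TransposeStandIn { NoTrans, Trans, ConjTrans };

// Hypothetical bundle of the three bools taken by the MIOpenGEMM calls.
struct GemmFlags {
    bool isColMajor;
    bool tA;
    bool tB;
};

// Hypothetical conversion helper for real-valued GEMM.
GemmFlags to_bool_flags(OrderStandIn order,
                        TransposeStandIn transA,
                        TransposeStandIn transB)
{
    GemmFlags flags;
    flags.isColMajor = (order == ColumnMajor);
    // For float data, ConjTrans and Trans describe the same operation.
    flags.tA = (transA != NoTrans);
    flags.tB = (transB != NoTrans);
    return flags;
}
```

Doing the conversion once, outside the GEMM loop, keeps the per-call argument list identical to the examples above.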
Workspace can be used in certain cases to accelerate GEMM, especially when A and B are of significantly different sizes and their leading dimensions are large powers of 2.