This repository has been archived by the owner on Feb 27, 2024. It is now read-only.

clBLAS to MIOpenGEMM

James Newling edited this page Sep 7, 2017 · 13 revisions

The clBLAS API for sgemm is:

clblasStatus
clblasSgemm(
    clblasOrder order,
    clblasTranspose transA,
    clblasTranspose transB,
    size_t M,
    size_t N,
    size_t K,
    cl_float alpha,
    const cl_mem A,
    size_t offA,
    size_t lda,
    const cl_mem B,
    size_t offB,
    size_t ldb,
    cl_float beta,
    cl_mem C,
    size_t offC,
    size_t ldc,
    cl_uint numCommandQueues,
    cl_command_queue *commandQueues,
    cl_uint numEventsInWaitList,
    const cl_event *eventWaitList,
    cl_event *events);

A loop of GEMMs using clBLAS might look like this:

for (int i = 0; i < 10; ++i){ 
  clblasStatus status = 
  clblasSgemm(
  // order, transA and transB are clBLAS enums
  order, transA, transB, M, N, K,
  
  // scalars, memory buffers, offsets and leading dimensions
  alpha, A, offA, lda, B, offB, ldb, beta, C, offC, ldc, 
  
  // clBLAS allows multiple cl_command_queues
  n_queues, queues, n_waitlist, waitlist, events
  );
}

The equivalent code using MIOpenGEMM might look like this:

// first "warm-up" call for this GEMM geometry 
// (generating kernel source string, compiling, etc).    
MIOpenGEMM::GemmStatus status = MIOpenGEMM::xgemm<float>(
// isColMajor, tA and tB are now bool
isColMajor, tA, tB, M, N, K,

// unchanged from clBLAS
alpha, A, offA, lda, B, offB, ldb, beta, C, offC, ldc,

// assuming no workspace for now
nullptr,0,0, 

// MIOpenGEMM allows only 1 cl_command_queue
&queues[0], n_waitlist, waitlist, &events[0],  

// this is the first run with this GEMM geometry, so ID is negative  
-1);

int ID_for_this_geometry = status.ID;

// Now run with the cached and compiled kernel
for (int i = 1; i < 10; ++i){
  auto stat = MIOpenGEMM::xgemm<float>(
  isColMajor, tA, tB, M, N, K, alpha, A, offA, lda, B, offB, ldb, beta, 
  C, offC, ldc, nullptr,0,0, &queues[0], n_waitlist, waitlist, &events[0],  
  ID_for_this_geometry);
}

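The differences are:

  1. the enums order, transA, transB are converted to the bools isColMajor, tA, tB.
  2. three workspace parameters are included (nullptr,0,0 when no workspace is used).
  3. numCommandQueues is removed: a pointer to a single cl_command_queue and a pointer to a single cl_event are passed, so only queues[0] and events[0] are used. The wait list parameters are passed through unchanged.
  4. ID is added: it is negative on the first call for a given GEMM geometry, and on later calls it is the value returned in status.ID.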
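
The conversion in item 1 could be sketched as below. This is only an illustration: the helper names to_is_col_major and to_transposed are hypothetical (they belong to neither library), and the two enums are stand-ins that mirror the clBLAS names so the snippet compiles on its own. In real code the enums would come from the clBLAS header instead.

```cpp
#include <cassert>

// Stand-ins mirroring the clBLAS enum names, so this sketch compiles alone.
// Remove these two definitions when including the real clBLAS header.
enum clblasOrder { clblasRowMajor, clblasColumnMajor };
enum clblasTranspose { clblasNoTrans, clblasTrans, clblasConjTrans };

// Hypothetical helper: clblasOrder -> isColMajor.
constexpr bool to_is_col_major(clblasOrder order) {
  return order == clblasColumnMajor;
}

// Hypothetical helper: clblasTranspose -> tA / tB.
// For real-valued sgemm, a conjugate transpose is the same as a transpose,
// so both map to true.
constexpr bool to_transposed(clblasTranspose t) {
  return t != clblasNoTrans;
}

// Small self-check of the mapping.
inline void conversion_self_check() {
  assert(to_is_col_major(clblasColumnMajor));
  assert(!to_is_col_major(clblasRowMajor));
  assert(!to_transposed(clblasNoTrans));
  assert(to_transposed(clblasTrans));
  assert(to_transposed(clblasConjTrans));
}
```

With these, the first arguments of the MIOpenGEMM call become to_is_col_major(order), to_transposed(transA), to_transposed(transB).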

In the above example the loop is for an unchanging geometry (TODO).
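
When the geometry does change between calls, one possible approach is to keep the returned IDs in a small lookup table on the caller's side. The sketch below is an assumption, not something MIOpenGEMM provides: GeometryKey and GemmIdCache are hypothetical names, and the choice of fields that identify a geometry (layout, transposes, sizes and leading dimensions) is a guess at what distinguishes one cached kernel from another.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <tuple>

// Hypothetical key describing one GEMM geometry (assumed set of fields).
struct GeometryKey {
  bool isColMajor, tA, tB;
  std::size_t M, N, K, lda, ldb, ldc;

  // Lexicographic ordering so the key can be used in std::map.
  bool operator<(const GeometryKey& o) const {
    return std::tie(isColMajor, tA, tB, M, N, K, lda, ldb, ldc) <
           std::tie(o.isColMajor, o.tA, o.tB, o.M, o.N, o.K, o.lda, o.ldb, o.ldc);
  }
};

// Hypothetical caller-side cache from geometry to the ID returned by xgemm.
class GemmIdCache {
 public:
  // Returns the stored ID, or -1 if this geometry has not been seen yet,
  // matching the "negative ID on first run" convention used above.
  int lookup(const GeometryKey& key) const {
    auto it = ids_.find(key);
    return it == ids_.end() ? -1 : it->second;
  }

  // Stores the ID obtained from the first (warm-up) call.
  void remember(const GeometryKey& key, int id) { ids_[key] = id; }

 private:
  std::map<GeometryKey, int> ids_;
};
```

A call site would then pass cache.lookup(key) as the final ID argument and, if that value was negative, call cache.remember(key, status.ID) with the returned status.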
