This repository has been archived by the owner on Feb 27, 2024. It is now read-only.

clBLAS to MIOpenGEMM

James Newling edited this page Sep 7, 2017 · 13 revisions

This page describes how to convert a clBLAS API call to an MIOpenGEMM API call. The clBLAS API for sgemm is:

clblasStatus
clblasSgemm(
    clblasOrder order,
    clblasTranspose transA,
    clblasTranspose transB,
    size_t M,
    size_t N,
    size_t K,
    cl_float alpha,
    const cl_mem A,
    size_t offA,
    size_t lda,
    const cl_mem B,
    size_t offB,
    size_t ldb,
    cl_float beta,
    cl_mem C,
    size_t offC,
    size_t ldc,
    cl_uint numCommandQueues,
    cl_command_queue *commandQueues,
    cl_uint numEventsInWaitList,
    const cl_event *eventWaitList,
    cl_event *events);

A loop of GEMMs using clBLAS might look like this:

for (int i = 0; i < 10; ++i){ 
  clblasStatus status = 
  clblasSgemm(
  // order, transA and transB are clBLAS enums
  order, transA, transB, M, N, K,
  
  // offsets, strides and memory buffers
  alpha, A, offA, lda, B, offB, ldb, beta, C, offC, ldc, 
  
  // clBLAS allows multiple cl_command_queues
  n_queues, queues, n_waitlist, waitlist, events
  );
}

The equivalent code using MIOpenGEMM might look like this:

// first "warm-up" call for this GEMM geometry 
// (generating kernel source string, compiling, etc).    
auto stat = MIOpenGEMM::xgemm<float>(
// isColMajor, tA and tB are now bool
isColMajor, tA, tB, M, N, K,

// unchanged from clBLAS
alpha, A, offA, lda, B, offB, ldb, beta, C, offC, ldc,

// assuming no workspace for now
nullptr,0,0, 

// MIOpenGEMM allows only 1 cl_command_queue
&queues[0], n_waitlist, waitlist, &events[0],  

// this is the first run with this GEMM geometry, so ID is negative  
-1);

// obtain the cache ID for this geometry from the returned GemmStatus object
int ID_for_this_geometry = stat.ID;

// Now run with the cached and compiled kernel
for (int i = 1; i < 10; ++i){
  stat = MIOpenGEMM::xgemm<float>(
  isColMajor, tA, tB, M, N, K, alpha, A, offA, lda, B, offB, ldb, beta, 
  C, offC, ldc, nullptr,0,0, &queues[0], n_waitlist, waitlist, &events[0],  
  ID_for_this_geometry);
}

The differences between the clBLAS and MIOpenGEMM APIs are:

  1. The enums order, transA and transB are replaced by the bools isColMajor, tA and tB.
  2. Three workspace parameters are added.
  3. numCommandQueues is removed: exactly one cl_command_queue and one cl_event are passed (by pointer), so the arrays queues and events effectively have length 1. The wait list parameters are unchanged.
  4. ID is added.
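
As an illustration of the first difference, here is a minimal sketch of the enum-to-bool conversion. The enums `Order` and `Transpose` are local stand-ins for `clblasOrder` and `clblasTranspose` (whose enumerators are `clblasRowMajor`, `clblasColumnMajor`, `clblasNoTrans`, `clblasTrans` and `clblasConjTrans`), defined here only so that the snippet compiles without the clBLAS headers. The helper names `toIsColMajor` and `toIsTransposed` are hypothetical and not part of either library.

```cpp
// Stand-in enums: the enumerator names mirror clBLAS, but these are local
// definitions so the sketch compiles without the clBLAS headers.
enum Order { RowMajor, ColumnMajor };
enum Transpose { NoTrans, Trans, ConjTrans };

// Hypothetical helper: true when the matrices are stored column-major.
inline bool toIsColMajor(Order order) { return order == ColumnMajor; }

// Hypothetical helper: true when the operand is transposed. For real-valued
// sgemm, a conjugate transpose is the same as a plain transpose.
inline bool toIsTransposed(Transpose trans) { return trans != NoTrans; }
```

With helpers like these, `isColMajor`, `tA` and `tB` can be computed once from `order`, `transA` and `transB` before the first xgemm call.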

In the above example, the loop is for an unchanging GEMM Geometry. If the Geometry were to change in the loop, the ID passed would need to be -1.
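
When several Geometries are interleaved, one way to avoid repeating the warm-up is for the caller to remember the ID returned for each Geometry. The following is a sketch of such caller-side bookkeeping, not part of MIOpenGEMM. `GeometryKey` and `GeometryIdCache` are hypothetical names, and the choice of fields that make up the key is an assumption about what distinguishes one Geometry from another.

```cpp
#include <cstddef>
#include <map>
#include <tuple>

// Hypothetical caller-side description of a GEMM Geometry. Which fields
// belong here is an assumption; adjust it to match what the library caches on.
struct GeometryKey {
  bool isColMajor, tA, tB;
  std::size_t M, N, K, lda, ldb, ldc;

  // Lexicographic ordering so the key can be used in std::map.
  bool operator<(const GeometryKey& rhs) const {
    return std::tie(isColMajor, tA, tB, M, N, K, lda, ldb, ldc) <
           std::tie(rhs.isColMajor, rhs.tA, rhs.tB, rhs.M, rhs.N, rhs.K,
                    rhs.lda, rhs.ldb, rhs.ldc);
  }
};

// Hypothetical bookkeeping class: maps each Geometry to the ID returned by
// its first (warm-up) run.
class GeometryIdCache {
 public:
  // Returns the stored ID, or -1 if this Geometry has not been run yet.
  int lookup(const GeometryKey& key) const {
    auto it = ids_.find(key);
    return it == ids_.end() ? -1 : it->second;
  }

  // Records the ID obtained from the warm-up call for this Geometry.
  void store(const GeometryKey& key, int id) { ids_[key] = id; }

 private:
  std::map<GeometryKey, int> ids_;
};
```

Inside the loop, the value of `lookup(key)` would be passed as the final argument to xgemm. Whenever that value was -1, the ID in the returned GemmStatus would then be passed to `store(key, ...)`.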

There are two situations in which a Geometry needs to be run with a negative ID more than once:

  1. If it is run on several (different?) devices.
  2. If it is run on multiple (host) threads which might run GEMM on this Geometry simultaneously. (TODO: clarify this and give an example).