This repository has been archived by the owner on Feb 27, 2024. It is now read-only.

clBLAS to MIOpenGEMM

James Newling edited this page Sep 10, 2017 · 13 revisions

The clBLAS API for sgemm is:

clblasStatus
clblasSgemm(
    clblasOrder order,
    clblasTranspose transA,
    clblasTranspose transB,
    size_t M,
    size_t N,
    size_t K,
    cl_float alpha,
    const cl_mem A,
    size_t offA,
    size_t lda,
    const cl_mem B,
    size_t offB,
    size_t ldb,
    cl_float beta,
    cl_mem C,
    size_t offC,
    size_t ldc,
    cl_uint numCommandQueues,
    cl_command_queue *commandQueues,
    cl_uint numEventsInWaitList,
    const cl_event *eventWaitList,
    cl_event *events);
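For readers less familiar with BLAS conventions, the roles of the offsets (offA, offB, offC) and leading dimensions (lda, ldb, ldc) can be illustrated with a host-side reference. This is only a sketch under stated assumptions: it works on std::vector<float> rather than cl_mem, it uses plain bools in place of the clBLAS enums, and the names elem_index and reference_sgemm are hypothetical helpers that belong to neither library.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Flat-buffer index of logical element (row, col) of op(X), where op(X) is X
// or its transpose. off is the buffer offset, ld the leading dimension.
inline std::size_t elem_index(bool colMajor, bool trans, std::size_t row,
                              std::size_t col, std::size_t off, std::size_t ld) {
  if (trans) std::swap(row, col);  // element (row, col) of X^T is (col, row) of X
  return off + (colMajor ? row + col * ld : row * ld + col);
}

// Hypothetical reference: C = alpha * op(A) * op(B) + beta * C, where op(A) is
// M x K, op(B) is K x N and C is M x N.
inline void reference_sgemm(bool colMajor, bool transA, bool transB,
                            std::size_t M, std::size_t N, std::size_t K,
                            float alpha,
                            const std::vector<float>& A, std::size_t offA, std::size_t lda,
                            const std::vector<float>& B, std::size_t offB, std::size_t ldb,
                            float beta,
                            std::vector<float>& C, std::size_t offC, std::size_t ldc) {
  for (std::size_t i = 0; i < M; ++i) {
    for (std::size_t j = 0; j < N; ++j) {
      float acc = 0.0f;
      for (std::size_t k = 0; k < K; ++k) {
        acc += A[elem_index(colMajor, transA, i, k, offA, lda)] *
               B[elem_index(colMajor, transB, k, j, offB, ldb)];
      }
      float& c = C[elem_index(colMajor, false, i, j, offC, ldc)];
      // With beta == 0 the previous contents of C are ignored entirely.
      c = (beta == 0.0f) ? alpha * acc : alpha * acc + beta * c;
    }
  }
}
```

A reference of this kind can also serve as a correctness check on small problems when porting a call from one library to the other.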

A loop of GEMMs using clBLAS might look like this:

for (int i = 0; i < 10; ++i) {
  clblasStatus status = clblasSgemm(
      // order, transA and transB are clBLAS enums
      order, transA, transB, M, N, K,

      // scalars, memory buffers, offsets and leading dimensions
      alpha, A, offA, lda, B, offB, ldb, beta, C, offC, ldc,

      // clBLAS allows multiple cl_command_queues
      n_queues, queues, n_waitlist, waitlist, events);
}

The equivalent code using MIOpenGEMM might look like this (if the GEMM geometry is "small" and you want to minimise the host-side overhead):

// First, a "warm-up" call for this GEMM geometry, for 
// generating kernel source string, compiling, etc.    
auto stat = MIOpenGEMM::xgemm<float>(
// isColMajor, tA and tB are now bool
isColMajor, tA, tB, M, N, K,

// unchanged from clBLAS
alpha, A, offA, lda, B, offB, ldb, beta, C, offC, ldc,

// assuming no workspace for now
nullptr,0,0, 

// MIOpenGEMM only allows 1 cl_command_queue
&queues[0], n_waitlist, waitlist, &events[0],  

// this is the first run with this GEMM geometry, so ID is negative  
-1);

// obtain the cache ID for this geometry from the returned GemmStatus object
int ID_for_this_geometry = stat.ID;

// Now run with the cached and compiled kernel
for (int i = 1; i < 10; ++i){
  stat = MIOpenGEMM::xgemm<float>(
  isColMajor, tA, tB, M, N, K, alpha, A, offA, lda, B, offB, ldb, beta, 
  C, offC, ldc, nullptr,0,0, &queues[0], n_waitlist, waitlist, &events[0],  
  ID_for_this_geometry);
}
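When an application runs several different geometries, the warm-up pattern above can be generalised by remembering one ID per geometry. The following is a minimal sketch, not part of MIOpenGEMM: GeometryKey and GemmIdCache are hypothetical names, and the choice of fields in the key is an assumption (whether the ID also depends on other settings, such as the workspace size or the device, is not covered on this page).

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <tuple>

// Hypothetical key for one GEMM geometry:
// (isColMajor, tA, tB, M, N, K, lda, ldb, ldc). The field set is an assumption.
using GeometryKey = std::tuple<bool, bool, bool,
                               std::size_t, std::size_t, std::size_t,
                               std::size_t, std::size_t, std::size_t>;

// Hypothetical host-side table mapping a geometry to the ID returned by the
// first xgemm call made with that geometry.
class GemmIdCache {
 public:
  // Returns the stored ID, or -1 (the "no cached kernel yet" value used on
  // this page) if the geometry has not been seen.
  int lookup(const GeometryKey& key) const {
    auto it = ids_.find(key);
    return it == ids_.end() ? -1 : it->second;
  }

  // Records the ID taken from the returned GemmStatus; negative IDs are ignored.
  void store(const GeometryKey& key, int id) {
    if (id >= 0) ids_[key] = id;
  }

 private:
  std::map<GeometryKey, int> ids_;
};
```

In use, each call would pass cache.lookup(key) as the final argument of MIOpenGEMM::xgemm<float> and then call cache.store(key, stat.ID), so that only the first call for each geometry pays the warm-up cost.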

Or, more simply, like this (if the GEMM geometry is not small, for example M = N = K > 200, or if you are not concerned about a difference of around 15% on small problems):

for (int i = 0; i < 10; ++i) {
  // passing -1 as the ID on every call means no cached ID is reused
  auto stat = MIOpenGEMM::xgemm<float>(
      isColMajor, tA, tB, M, N, K, alpha, A, offA, lda, B, offB, ldb, beta,
      C, offC, ldc, nullptr, 0, 0, &queues[0], n_waitlist, waitlist, &events[0],
      -1);
}

The differences between the clBLAS and MIOpenGEMM APIs are:

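  1. The enums order, transA and transB are replaced by the bools isColMajor, tA and tB.
  2. Three workspace parameters are added (these can safely be set to nullptr, 0, 0).
  3. numCommandQueues is removed: MIOpenGEMM takes a single cl_command_queue and a single cl_event, so the arrays queues and events effectively have length 1. The wait-list parameters are unchanged.
  4. An ID parameter is added (this can safely be set to -1).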
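The first difference can be handled with two small conversion helpers. This is a sketch only: to_is_col_major and to_transposed are hypothetical names found in neither library, and the two enum definitions are stand-ins that mirror the clBLAS enumerators so the snippet compiles on its own. In real code, include clBLAS.h and drop the stand-ins.

```cpp
#include <cassert>

// Stand-ins mirroring the clBLAS enums so this sketch is self-contained.
// In real code, include <clBLAS.h> and remove these two definitions.
enum clblasOrder { clblasRowMajor, clblasColumnMajor };
enum clblasTranspose { clblasNoTrans, clblasTrans, clblasConjTrans };

// Hypothetical helper: clBLAS storage order -> isColMajor.
inline bool to_is_col_major(clblasOrder order) {
  return order == clblasColumnMajor;
}

// Hypothetical helper: clBLAS transpose flag -> tA / tB.
// For real-valued sgemm, a conjugate transpose is the same as a plain
// transpose, so both map to true.
inline bool to_transposed(clblasTranspose t) {
  return t != clblasNoTrans;
}
```

With these, a ported call site could compute isColMajor, tA and tB once from the existing order, transA and transB variables and leave the remaining arguments as they were.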