Calls to malloc/free from inside HIP kernels #175

Open
pvelesko opened this issue Oct 2, 2022 · 9 comments
Labels
enhancement New feature or request

Comments

@pvelesko
Collaborator

pvelesko commented Oct 2, 2022

ROCm/HIP#2975

How do we enable this? @pjaaskel @Kerilk

@pvelesko pvelesko added the enhancement New feature or request label Oct 2, 2022
@pjaaskel
Collaborator

pjaaskel commented Oct 3, 2022

The basic plan so far has been to add a shadow buffer to the kernel that serves as the "heap" when the kernel calls malloc/free, and to implement dynamic memory management by returning chunks from that buffer.
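As a rough illustration of that plan (hypothetical names, host-side C++ rather than device code), handing out chunks from a pre-allocated buffer could look something like this, a minimal sketch assuming a simple atomic bump pointer:

```cpp
#include <atomic>
#include <cstddef>
#include <vector>

// Hypothetical sketch of a "shadow heap": a fixed-size buffer from which a
// device-side malloc would return chunks. A bump pointer is the simplest
// policy; a real implementation would need free lists so free() can reclaim
// individual chunks.
class ShadowHeap {
public:
  explicit ShadowHeap(size_t bytes) : storage_(bytes), offset_(0) {}

  // Return a 16-byte-aligned chunk, or nullptr when the heap is exhausted.
  void *malloc(size_t size) {
    size_t aligned = (size + 15) & ~size_t(15);
    size_t old = offset_.fetch_add(aligned, std::memory_order_relaxed);
    if (old + aligned > storage_.size()) {
      offset_.fetch_sub(aligned, std::memory_order_relaxed); // roll back
      return nullptr;
    }
    return storage_.data() + old;
  }

  // A pure bump allocator cannot reclaim individual chunks.
  void free(void *) {}

private:
  std::vector<std::byte> storage_;
  std::atomic<size_t> offset_;
};
```

On a device the buffer would be a regular global-memory allocation passed to (or visible from) the kernel, and the bump pointer would live in global memory and be updated with device atomics.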

@pvelesko
Collaborator Author

pvelesko commented Oct 3, 2022

Can't we do this in SPIR-V?

@pjaaskel
Collaborator

pjaaskel commented Oct 3, 2022

OpenCL (and thus SPIR-V in this case) doesn't support device-side dynamic memory allocation. We could define a new OpenCL extension that does, but it's better to provide a portable solution that works with the current Intel drivers.

@pvelesko
Collaborator Author

pvelesko commented Oct 3, 2022

SPIR-V Specification
3.32.8 Memory Instructions
OpVariable
Allocate an object in memory, resulting in a pointer to it, which can be used with OpLoad and OpStore.

Why can't we use this?

@pjaaskel
Collaborator

pjaaskel commented Oct 3, 2022

It's for static memory allocation, where the size is known at compile time.
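As a host-side analogy (plain C++, not actual SPIR-V), the distinction is the one between a local array whose size is a compile-time constant, which is what OpVariable covers, and an allocation whose size only becomes known at run time, which needs a real allocator:

```cpp
constexpr int kStatic = 8;

// Analogous to OpVariable: the size is baked into the code at compile time.
int fill_static() {
  int buf[kStatic];
  for (int i = 0; i < kStatic; ++i) buf[i] = i;
  return buf[kStatic - 1];
}

// Analogous to device-side malloc: the size is only known at run time,
// so some allocator has to hand the memory out dynamically.
int fill_dynamic(int n) {
  int *buf = new int[n];
  for (int i = 0; i < n; ++i) buf[i] = i;
  int last = buf[n - 1];
  delete[] buf;
  return last;
}
```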

@pvelesko
Collaborator Author

pvelesko commented Oct 3, 2022

Ah, I see.

@Kerilk
Contributor

Kerilk commented Oct 3, 2022

There's not much of a way around this without an extension. The size of the buffer to allocate will be a problem, though: a hint (or upper bound) on the amount of memory involved would be very useful here, but unfortunately, in the general case, determining it is intractable.

Drivers that support device-side enqueue must already have the necessary functionality, so it may be an easy extension for them to implement if we define it right.

@Sarbojit2019
Collaborator

I played around with the device malloc implementation in CUDA 11, and here are my observations:

  1. The malloc heap is allocated once per device.
  2. The default size appears to be 8 MB, which the user can increase or decrease with cudaDeviceSetLimit(cudaLimitMallocHeapSize, size).
  3. Once a kernel has been launched, the heap size can't be changed.

Given the above, I think that, as @pjaaskel mentioned in his response, pre-allocating a fixed-size buffer is a valid approach. My only point is that this is a device limit, so the buffer/heap should be tied to a device, not to a kernel. Below is the test I used to check the CUDA behavior:

#include <iostream>
#include <cuda_runtime.h>

__global__ void malloc__(int size) {
    int* ptr = (int*)malloc(size);
    if (ptr) {
        printf("1. Passed\n");
        free(ptr);  // release the heap chunk so later kernels can reuse it
    } else {
        printf("1. Failed\n");
    }
}

__global__ void malloc__2(int size) {
    int* ptr = (int*)malloc(size);
    if (ptr) {
        printf("2. Passed\n");
        free(ptr);
    } else {
        printf("2. Failed\n");
    }
}

int main() {
    size_t limit_val = 0;
    cudaError_t status = cudaDeviceGetLimit(&limit_val, cudaLimitMallocHeapSize);
    std::cout << "Status : " << cudaGetErrorName(status) << std::endl;
    std::cout << "limit_val = " << limit_val << std::endl;

    // 7 MB fits inside the default 8 MB heap.
    malloc__<<<1, 1>>>(1024 * 1024 * 7);
    cudaDeviceSynchronize();

    // Attempt to double the limit after a kernel launch; the printed status
    // shows whether the driver accepts the change.
    status = cudaDeviceSetLimit(cudaLimitMallocHeapSize, limit_val * 2);
    status = cudaDeviceGetLimit(&limit_val, cudaLimitMallocHeapSize);
    std::cout << "Status : " << cudaGetErrorName(status) << std::endl;
    std::cout << "limit_val = " << limit_val << std::endl;

    // 8 MB only fits if the limit was actually raised.
    malloc__2<<<1, 1>>>(1024 * 1024 * 8);
    cudaDeviceSynchronize();
    return 0;
}
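The per-device observations above could be modeled on the host roughly as follows, a hedged sketch with a hypothetical class (not a real HIP or CUDA API): one limit per device, with a default of 8 MB, frozen once that device has launched a kernel.

```cpp
#include <cstddef>
#include <map>
#include <set>

// Hypothetical model of "one heap per device": the runtime keeps a heap-size
// limit per device, settable only before the first kernel launch on it.
class DeviceHeapLimits {
public:
  static constexpr size_t kDefault = size_t(8) << 20; // 8 MB default, as observed

  size_t get(int device) const {
    auto it = limits_.find(device);
    return it == limits_.end() ? kDefault : it->second;
  }

  // Returns false once the device has launched a kernel, mirroring the
  // observation that the heap size cannot change after launch.
  bool set(int device, size_t bytes) {
    if (launched_.count(device)) return false;
    limits_[device] = bytes;
    return true;
  }

  void note_launch(int device) { launched_.insert(device); }

private:
  std::map<int, size_t> limits_;
  std::set<int> launched_;
};
```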

@pjaaskel
Collaborator

https://reviews.llvm.org/rGa6213088812f: this seems like interesting work to build upon for device-side malloc/free and possibly other services. @linehill
