Lowering StridedMemoryView attributes to typed efficient C/C++/Cython accessible values #180

Open
jakirkham opened this issue Oct 17, 2024 · 1 comment
Labels: cuda.core (Everything related to the cuda.core module), enhancement (Any code-related improvements), P1 (Medium priority - Should do)

Comments

@jakirkham
Collaborator

Currently the attributes of StridedMemoryView are as follows:

# TODO: switch to use Cython's cdef typing?
ptr: int = None
shape: tuple = None
strides: tuple = None # in counts, not bytes
dtype: numpy.dtype = None
device_id: int = None # -1 for CPU
device_accessible: bool = None
readonly: bool = None
obj: Any = None
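
To illustrate the "in counts, not bytes" note on strides, here is a minimal sketch using only the Python standard library. The buffer protocol reports strides in bytes, so they need to be divided by the item size to match the convention above. The helper name `strides_in_counts` is hypothetical and not part of cuda.core.

```python
import struct


def strides_in_counts(view: memoryview) -> tuple:
    """Convert byte strides (buffer protocol) to element counts."""
    # Assumes every stride is a whole multiple of the item size.
    return tuple(s // view.itemsize for s in view.strides)


itemsize = struct.calcsize("d")  # size of a C double on this platform

# A 3x4 array of doubles backed by a zero-filled bytearray.
view = memoryview(bytearray(3 * 4 * itemsize)).cast("d", [3, 4])

byte_strides = view.strides              # strides as the buffer protocol reports them
count_strides = strides_in_counts(view)  # strides in the StridedMemoryView convention
```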

There is a TODO in the code noting that these are worth converting to Cython types. I would also support this recommendation.

When accessing things like shape or strides in Cython, one ideally wants a C array type of some form (pointer, typed memoryview, dynamic array, etc.) that can easily be iterated over in C for-loops. As these attributes are currently Python objects, accessing them requires calling the CPython API to get the length, fetch each value, coerce the values to C-friendly types, etc.

This is especially important for things like ptr, which gets accessed regularly. So having a fast access C-type really helps


If we look at the Python Buffer Protocol (PEP 3118), it has the following definition for Py_buffer (its equivalent type):

typedef struct {
    void *buf;
    PyObject *obj;        /* owned reference */
    Py_ssize_t len;
    Py_ssize_t itemsize;  /* This is Py_ssize_t so it can be
                             pointed to by strides in simple case.*/
    int readonly;
    int ndim;
    char *format;
    Py_ssize_t *shape;
    Py_ssize_t *strides;
    Py_ssize_t *suboffsets;
    void *internal;
} Py_buffer;

The protocol also requires users to call PyObject_GetBuffer to produce a buffer object and PyBuffer_Release to release it. This handles any memory allocation/deallocation for shape, strides, etc. It also handles refcounting for obj. This functionality is wonderful to use.
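
As a rough illustration of these acquire/release semantics from the Python side, the sketch below uses the built-in memoryview, which obtains a buffer from the exporter when constructed and gives it back on release(). It only demonstrates the buffer protocol's behavior with a bytearray; it says nothing about how StridedMemoryView does or should behave.

```python
buf = bytearray(b"Hello World!")

# Constructing a memoryview acquires a buffer from the exporter.
view = memoryview(buf)
info = {
    "ndim": view.ndim,
    "shape": view.shape,
    "strides": view.strides,
    "itemsize": view.itemsize,
    "format": view.format,
    "readonly": view.readonly,
}

# While the buffer is exported, the bytearray refuses to be resized.
try:
    buf.extend(b"!")
    resize_blocked = False
except BufferError:
    resize_blocked = True

# release() hands the buffer back to the exporter; resizing works again.
view.release()
buf.extend(b"!")
```

The exporter stays in control of its memory for as long as a consumer holds the buffer, which is the lifetime guarantee the paragraph above refers to.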

The lack of these semantics has made working with DLPack a chore


Thinking about how to map the C-like struct above to Cython/Python, a few things stick out.

Some of these types Cython can translate between Python/Cython/C on its own, like Py_ssize_t (effectively ssize_t in C, see PEP 353), which becomes a Python int when needed.

Others can be coerced, like char* to bytes (though usually one will want to decode/encode to/from str).

Still others could be translated well by Cython as long as they are typed appropriately. For example, void* doesn't translate well to Python. However, uintptr_t does behave like a pointer in C (sometimes with a cast) and like a Python int. Cython will handle the translation for us. Similarly, bint for readonly works better when capturing Python's bool semantics while still working in C.

With a typed memoryview, it is possible to wrangle Py_ssize_t* into something better behaved like Py_ssize_t[::1], which can then move more easily between Python & Cython.
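
The type mappings described above can be approximated in plain Python, which may help make them concrete. This is only a sketch with the standard library (ctypes and memoryview), not Cython: the integer `ptr` stands in for a uintptr_t, and a memoryview cast to the struct code "n" (ssize_t) stands in for a `Py_ssize_t[::1]` typed memoryview. All variable names are illustrative.

```python
import ctypes
import struct

data = ctypes.create_string_buffer(b"Hello World!")

# A pointer carried as a plain Python int, in the spirit of uintptr_t.
ptr = ctypes.addressof(data)
roundtrip = ctypes.string_at(ptr, 12)  # read 12 bytes back through the integer

# Contiguous ssize_t storage; "n" is the native struct code for ssize_t.
ndim = 2
storage = bytearray(ndim * struct.calcsize("n"))
shape_mv = memoryview(storage).cast("n")
shape_mv[0] = 3
shape_mv[1] = 4

# Cheap to expose as a Python tuple when a Python caller asks for it.
shape = tuple(shape_mv)
```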

Also note that format above is a string specifying the format type according to the Python Buffer Protocol. NumPy is also able to consume and produce such format strings
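
For readers less familiar with these format strings, here is a small standard-library sketch. It shows that the format reported through the buffer protocol is the same kind of code the struct module understands, so item size and decoding can both be derived from it. The array contents are made up for the example.

```python
import array
import struct

values = array.array("d", [1.0, 2.0, 3.0])
view = memoryview(values)

fmt = view.format                  # format string exported by the array
item_bytes = struct.calcsize(fmt)  # item size implied by the format string

# Decode the first item straight from the raw bytes using the format string.
first = struct.unpack_from(fmt, view.cast("B"), 0)[0]
```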

This led RAPIDS to this approach:

cdef class Array:
    cdef readonly uintptr_t ptr
    cdef readonly bint readonly
    cdef readonly object obj

    cdef readonly Py_ssize_t itemsize

    cdef readonly Py_ssize_t ndim
    cdef Py_ssize_t[::1] shape_mv
    cdef Py_ssize_t[::1] strides_mv

    cdef readonly bint cuda

If StridedMemoryView is used with C/C++ directly, it may make sense to actually type a public C struct in Cython. Then this could be leveraged in C/C++ code that can hand such objects to or receive them from CUDA-Python. For example

# filename: strided_memory_view.pxd

from libc.stdint cimport uintptr_t

cdef public struct CStridedMemoryView:
    uintptr_t ptr
    ssize_t* shape
    ssize_t* strides
    char* format
    int device_id
    bint device_accessible
    bint readonly
    void* obj

cdef class StridedMemoryView:
    cdef CStridedMemoryView data

// filename: my_program.c

#include <stdbool.h>
#include <stdint.h>

#include <sys/types.h>

#include "strided_memory_view.h"


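static const char* s = "Hello World!";
// 12 characters plus the terminating NUL.
// Not const, so that &s_len is compatible with the ssize_t* shape field.
static ssize_t s_len = 13;


int main() {
    CStridedMemoryView st = {0};  // zero the fields that are not set below
    st.ptr = (uintptr_t)s;
    st.shape = &s_len;
    st.readonly = true;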

    // ...

    return 0;
}

Though there can be other valid ways to go
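
As one such alternative view of the same idea, the struct could also be mirrored from Python with ctypes. The sketch below is hypothetical: it assumes the field order shown above, represents uintptr_t as `c_size_t`, and assumes bint is emitted as a C int in the generated header. None of this is an existing cuda.core API.

```python
import ctypes


class CStridedMemoryView(ctypes.Structure):
    # Field order mirrors the struct proposed above. Assumptions: uintptr_t
    # is represented as c_size_t, and bint is emitted as a C int.
    _fields_ = [
        ("ptr", ctypes.c_size_t),
        ("shape", ctypes.POINTER(ctypes.c_ssize_t)),
        ("strides", ctypes.POINTER(ctypes.c_ssize_t)),
        ("format", ctypes.c_char_p),
        ("device_id", ctypes.c_int),
        ("device_accessible", ctypes.c_int),
        ("readonly", ctypes.c_int),
        ("obj", ctypes.c_void_p),
    ]


s = ctypes.create_string_buffer(b"Hello World!")  # 12 characters + NUL
s_len = ctypes.c_ssize_t(len(s))

st = CStridedMemoryView()  # ctypes zero-initializes all fields
st.ptr = ctypes.addressof(s)
st.shape = ctypes.pointer(s_len)
st.device_id = -1  # -1 for CPU, following the attribute list above
st.readonly = 1

text = ctypes.string_at(st.ptr)  # reads up to the terminating NUL
```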

@leofang leofang added enhancement Any code-related improvements triage Needs the team's attention P1 Medium priority - Should do cuda.core Everything related to the cuda.core module labels Oct 17, 2024
@leofang leofang self-assigned this Oct 17, 2024
@leofang
Member

leofang commented Dec 16, 2024

At the Oct 23 meeting we discussed this and agreed that it is an important feature to support. The information provided by StridedMemoryView should be host-/device-accessible. Temporarily slating this for the beta 3 release.
