[FEA] Document cuDF allocation alignment/padding requirements #9389
cc: @gmarkall @leofang @jakirkham for thoughts on:
This is something we can think about - I'll write a summary of thoughts / possibilities and open an issue on the Numba issue tracker (where CAI design discussions tend to take place) and link to here.
I don't think the CAI should necessarily be influenced by the Arrow spec.
I agree that the CAI shouldn't necessarily be influenced by the Arrow spec, but the following does worry me:

This will result in poorly performing code in a lot of cases where we zero copy at the moment. For example, slices:

```python
In [6]: arr = cp.array([1, 2, 3, 4], dtype="int8")

In [7]: arr.__cuda_array_interface__
Out[7]:
{'shape': (4,),
 'typestr': '|i1',
 'descr': [('', '|i1')],
 'stream': 1,
 'version': 3,
 'strides': None,
 'data': (140649290531840, False)}

In [8]: arr[1:].__cuda_array_interface__
Out[8]:
{'shape': (3,),
 'typestr': '|i1',
 'descr': [('', '|i1')],
 'stream': 1,
 'version': 3,
 'strides': None,
 'data': (140649290531841, False)}
```
Yes, if we require a 64B padded allocation that would be the case. Alternatively we'd have to do a bunch of non-trivial work and add complexity into
It seems to me like if CAI wants to play ball in the Arrow world, it needs to be rebased on top of the Arrow physical memory spec. Otherwise, it defeats the purpose of having a common spec for memory layout to enable zero-copy interaction among libraries; e.g., cuDF can't assume anything about padding/alignment that would otherwise be guaranteed by the Arrow spec, so we would need to deep copy data into the Arrow memory layout. This may not have caused any problems yet, because cuDF doesn't really exploit the alignment/padding guarantees of Arrow (yet). Likewise, users of CAI are likely implicitly following the rule of ensuring data is naturally aligned (otherwise CUDA would error on a misaligned address), but Arrow is stricter still, saying all allocations should be at least 8B aligned.
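To illustrate the natural-alignment rule mentioned above, a producer or consumer can check a pointer's alignment directly from the interface dict. A minimal sketch, assuming CuPy is available (the helper name is ours, not part of any spec):

```python
import cupy as cp

def is_naturally_aligned(arr) -> bool:
    """Check that an array's device pointer is aligned to its element size."""
    cai = arr.__cuda_array_interface__
    ptr, _read_only = cai["data"]
    return ptr % arr.dtype.itemsize == 0

arr = cp.arange(8, dtype="int64")
print(is_naturally_aligned(arr))      # True: fresh allocations are well aligned
print(is_naturally_aligned(arr[1:]))  # True: an int64 slice steps by 8 bytes
```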
CAI is an array protocol and not Arrow specific in any way, so this is out of scope for it. What could be in scope is supporting a way to describe an underlying allocation separate from the actual array data, and that is likely what would be needed to support the Arrow padding case that we want to take advantage of here. Also, the same problem exists for dlpack as it does for CAI.
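For concreteness, here is a purely hypothetical sketch of what describing the underlying allocation separately from the array data might look like. The `allocation` key is invented for illustration only and is not part of any CAI version:

```python
# Hypothetical extension of __cuda_array_interface__ (not a real spec field).
# The standard keys describe the array data; the invented "allocation" key
# would describe the enclosing allocation, so consumers could reason about
# the padding/alignment guarantees of the whole buffer.
cai_with_allocation = {
    "shape": (3,),
    "typestr": "|i1",
    "version": 3,
    "strides": None,
    "data": (140649290531841, False),  # pointer to the sliced data
    "allocation": {                    # hypothetical, for illustration only
        "data": 140649290531840,       # base pointer of the allocation
        "size": 64,                    # padded allocation size in bytes
        "alignment": 256,              # guaranteed base alignment
    },
}
```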
I get that CAI is just a way to pass around pointers and is solving a somewhat different (but still very similar) problem than Arrow. What is a library maintainer supposed to do if they build on the Arrow spec and say "I'll work on your data zero-copy if you follow the Arrow spec!"? Suddenly, CAI/dlpack break that promise, and now the whole premise of Arrow eliminating "copy & convert" is eroded and we're back to a situation where you have to deep copy things just to ensure they're in the format you expect.
I just realized that the DLPack spec does indicate that pointers should be 256-byte aligned (https://github.com/dmlc/dlpack/blob/main/include/dlpack/dlpack.h#L139-L141), but I'm not sure if we're following that properly. It doesn't look like CuPy follows this as expected for sliced arrays either:
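A quick way to see this is to check the data pointer of a sliced CuPy array against the 256-byte requirement; a minimal sketch:

```python
import cupy as cp

arr = cp.array([1, 2, 3, 4], dtype="int8")
base_ptr = arr.__cuda_array_interface__["data"][0]
slice_ptr = arr[1:].__cuda_array_interface__["data"][0]

# The base allocation comes from CuPy's pool and is in practice at least
# 256B aligned, but the sliced view's pointer is offset by one byte and
# therefore cannot be.
print(base_ptr % 256)   # 0
print(slice_ptr % 256)  # 1
```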
cc @leofang ⬆️
There's no way to make an arbitrarily sliced array be 256B aligned. The Arrow spec states that allocations should be 8B (64B) aligned, but not that every pointer (e.g., for a sliced/offset array) must have such alignment. In fact, in libcudf, there's no way to differentiate a pointer to the beginning of an allocation vs. a pointer inside an allocation. I suppose Arrow can do this because all Arrow data structures are owning, with a `Buffer` tracking the underlying allocation.

So that means in libcudf we can never require a pointer to have stricter alignment than the natural alignment of the underlying type. That said, we could say that the pointer points inside an allocation that is at least 64B aligned, such that it is safe to dereference addresses within that 64B-aligned region.
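To make that last point concrete, such a guarantee would let a consumer round any interior pointer down to a 64B boundary and safely touch that region. A small sketch of the pointer arithmetic, done in plain Python on integer addresses for illustration:

```python
ALIGN = 64  # the alignment the allocation is assumed to guarantee

def enclosing_aligned_address(ptr: int, align: int = ALIGN) -> int:
    """Round an interior pointer down to the nearest `align` boundary.

    If the allocation containing `ptr` is at least `align`-aligned and
    `align`-padded, every byte in [result, result + align) is safe to read.
    """
    return ptr & ~(align - 1)

# The sliced pointer from the CuPy example above maps back to its base:
print(enclosing_aligned_address(140649290531841))  # 140649290531840
```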
Sorry I'm late to the party! On DLPack's 256B alignment requirement, I opened an issue in the Array API repo to discuss: data-apis/array-api#293. I (and others in that thread) agree with Jake's assessment that there's no way to make arbitrarily sliced arrays 256B aligned.
Based on recent discussions on this issue, Arrow's padding requirements are actually far less strict than we thought. 64-byte padding is recommended solely for the usual performance reasons (e.g. the ability to use vector instructions), and 8-byte padding is only required at the level of IPC packets (which cuDF does not currently support), not individual columns. For a comprehensive summary of our discussions, see rapidsai/rmm#865 (comment).

With that in mind, I think we can close this issue since there are no alignment/padding requirements to be documented here. @jrhemstad please reopen if you think there is still some action to be taken on this issue.
Is your feature request related to a problem? Please describe.
cuDF informally states that we follow the Arrow physical memory layout spec. However, we're a little loose on some of the particulars, especially around alignment/padding.
Arrow requires all allocations be 8B aligned and padded to 8B sizes. It recommends expanding this to 64B aligned and 64B padded.
cuDF isn't (formally) following this. Implicitly, we know all allocations made through RMM will be 256B aligned. As it happens, RMM will currently pad out to 8B (though it shouldn't do that). However, due to zero-copy slicing, we can't assume a `column.data<T>()` pointer will be 256B aligned, and can only expect the pointer alignment to be at least `alignof(T)`.

Specifically for null masks, we expect/allocate the null mask of a column to always be padded to 64B.
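For reference, padding a size to Arrow's required or recommended boundary is a one-line rounding. A minimal sketch (the helper name is ours, not an Arrow or cuDF API):

```python
def pad_to(size_bytes: int, alignment: int) -> int:
    """Round size_bytes up to the next multiple of alignment."""
    return (size_bytes + alignment - 1) // alignment * alignment

print(pad_to(5, 8))   # 8  -> Arrow's required 8B padding
print(pad_to(5, 64))  # 64 -> Arrow's recommended 64B padding
```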
Describe the solution you'd like
At the very least, we should write down our current expectations. These requirements can best be captured as requirements of the `column_view` class, as `column_view` is the arbiter of how we interact with all memory consumed by libcudf:

- `data` must be aligned to at least the alignment of the underlying type.
- It must be safe to dereference any address in the range `[data, data + num_elements * size_of(data_type))`, but nothing beyond it is required. This means it is legal for an `INT8` allocation to be any size, whereas Arrow would require it to be padded to a multiple of 8 bytes.
- `null_mask` must be aligned to `alignof(bitmask_type)`.
- `null_mask` must point to an allocation padded to a multiple of 64 bytes such that it is safe to dereference any address in the range `[null_mask, null_mask + bitmask_allocation_size_bytes(num_elements))` (see the sketch after this list).
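As a sketch of the null-mask padding rule, here is a Python equivalent of what `bitmask_allocation_size_bytes` computes (the real function is C++ in libcudf; we assume `bitmask_type` is a 32-bit word, and the rounding matches the 64B padding described above):

```python
BITMASK_WORD_BYTES = 4  # assuming bitmask_type is a 32-bit word
PADDING_BYTES = 64      # null masks are padded to a multiple of 64 bytes

def bitmask_allocation_size_bytes(num_elements: int) -> int:
    """Bytes to allocate for a null mask covering num_elements rows."""
    needed = (num_elements + 7) // 8  # one validity bit per element
    words = (needed + BITMASK_WORD_BYTES - 1) // BITMASK_WORD_BYTES
    raw = words * BITMASK_WORD_BYTES
    return (raw + PADDING_BYTES - 1) // PADDING_BYTES * PADDING_BYTES

print(bitmask_allocation_size_bytes(100))  # 64
print(bitmask_allocation_size_bytes(600))  # 128
```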
Additionally, we should consider expanding the requirement for allocation padding to make it such that all allocations are padded out to 64B. This could be achieved easily enough with a `device_memory_resource` adapter that rounds allocations up to at least 64B; a sketch of the rounding is below.
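A minimal sketch of such an adapter, written as plain Python around a generic allocator for illustration (the real thing would be a C++ `device_memory_resource` wrapper; the class and method names here are ours):

```python
ALIGNMENT = 64

class PaddedResource:
    """Wraps any object exposing allocate(nbytes) / deallocate(ptr, nbytes)
    and rounds every request up to a multiple of ALIGNMENT bytes."""

    def __init__(self, upstream):
        self.upstream = upstream

    def _padded(self, nbytes: int) -> int:
        return (nbytes + ALIGNMENT - 1) // ALIGNMENT * ALIGNMENT

    def allocate(self, nbytes: int):
        return self.upstream.allocate(self._padded(nbytes))

    def deallocate(self, ptr, nbytes: int):
        self.upstream.deallocate(ptr, self._padded(nbytes))
```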
Additional Considerations
As best I can tell, these requirements are technically incompatible with the CUDA Array Interface because the CAI doesn't say anything about padding/alignment. This seems like something we should push to be addressed in the CAI.