-
Notifications
You must be signed in to change notification settings - Fork 200
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BUG] There is no overflow checking on alignment #1156
Comments
We just entered burndown for 22.12. how urgent is this? |
Easy way to fix this is to just get rid of the |
Hopefully it is very rare to have this happen, but there is the potential to walk off the end of memory and cause data corruption. This is true with or without this fix. As such I would say it would really be nice to fix it sooner than later, but I am not going to go to the mat to get this fixed. |
Let's aim for 23.02. Need to discuss with cuDF team (CC @GregoryKimball @vuule @shwina @bdice @vyasr) I would like to remove the default padding from RMM as discussed in #865 and have cuDF use it's own adapter for padding for Arrow requirements (or getting away with UB requirements -- though I hope that doesn't still exist in cuIO!) Please discuss. |
@harrism Do you imagine that such a "padding adapter" would be implemented in rmm, or in cudf? I think it is of general enough interest that making it live in rmm would be most appropriate, but I am open to others' thoughts on that. |
First step is to decide if all RAPIDS libraries will need this padding. If it's a requirement for Arrow compatibility, then I suspect so -- cuSpatial definitely is using GeoArrow format, for example. Also for discussion -- what happens when a user calls a libcudf API and passes a non-arrow-compliant MR? (i.e. doesn't use this padding adapter). |
There is a long discussion around cudf padding requirements in rapidsai/cudf#9389. That issue originally just requested documentation and remains open. We should resolve that issue as part of any push here since we need to concretely lay out cudf's requirements are in order to ensure that we make rmm changes in a way that is safe for cudf. In general we are definitely going to require this padding for any library that relies on Arrow types. The case where a user provides a non-arrow-compliant MR seems pretty gnarly... my first thought is that we end up always needing to wrap input mrs in a padding adapter to be safe, but I really don't like that and will continue brainstorming. Also CC @kkraus14 |
Could something like raft`s round_up_safe help in this case? It can detect overflow when rounding up. |
We can add overflow detection, but the point of discussion is whether we should be aligning here in the first place. |
See #865 (comment) for the final decision. tl;dr we're going to get rid of alignment altogether, so there won't be any overflow issues. This will happen early in 23.08. |
Describe the bug
I know it feels a bit far fetched that anyone would ask to allocate 18446744073709551612 bytes (about 16 exa-bytes). But if someone does the allocation will succeed and return
nullptr
as the result. The reason for this is because there is no overflow checking inalign_up
, andalign_up
is called for all device allocations, so when it is aligned up and overflows the size becomes 0 bytes, which succeeds.
The problem really starts to show up when using CUDF with RMM. CUDF uses an
int32_t
for a lot of size calculations. Because it is signed it can overflow to a negative number, and when it does it gets translated into a really large positive number. The example came from CUDF when it overflowed and tried to allocate -4 bytes when doing a string concatenation. This resulted in the call succeeding and an error happening after the fact when we tried to use the result some other place and an assertion went off about having a negative sized length in one of the columns.Steps/Code to reproduce bug
Try to allocate a buffer of 18446744073709551612 bytes. It will succeed.
Expected behavior
Some kind of an exception preferably a
std::bad_alloc
is thrown.The text was updated successfully, but these errors were encountered: