Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] There is no overflow checking on alignment #1156

Closed
revans2 opened this issue Nov 10, 2022 · 10 comments · Fixed by #1278
Closed

[BUG] There is no overflow checking on alignment #1156

revans2 opened this issue Nov 10, 2022 · 10 comments · Fixed by #1278
Assignees
Labels
? - Needs Triage Need team to review and classify bug Something isn't working

Comments

@revans2
Copy link
Contributor

revans2 commented Nov 10, 2022

Describe the bug
I know it feels a bit far fetched that anyone would ask to allocate 18446744073709551612 bytes (about 16 exa-bytes). But if someone does the allocation will succeed and return nullptr as the result. The reason for this is because there is no overflow checking in align_up, and align_up is called for all device allocations
, so when it is aligned up and overflows the size becomes 0 bytes, which succeeds.

The problem really starts to show up when using CUDF with RMM. CUDF uses an int32_t for a lot of size calculations. Because it is signed it can overflow to a negative number, and when it does it gets translated into a really large positive number. The example came from CUDF when it overflowed and tried to allocate -4 bytes when doing a string concatenation. This resulted in the call succeeding and an error happening after the fact when we tried to use the result some other place and an assertion went off about having a negative sized length in one of the columns.

Steps/Code to reproduce bug
Try to allocate a buffer of 18446744073709551612 bytes. It will succeed.

Expected behavior
Some kind of an exception preferably a std::bad_alloc is thrown.

@revans2 revans2 added ? - Needs Triage Need team to review and classify bug Something isn't working labels Nov 10, 2022
@harrism
Copy link
Member

harrism commented Nov 11, 2022

We just entered burndown for 22.12. how urgent is this?

@jrhemstad
Copy link
Contributor

Easy way to fix this is to just get rid of the align_up in the first place #865 👀

@revans2
Copy link
Contributor Author

revans2 commented Nov 11, 2022

Hopefully it is very rare to have this happen, but there is the potential to walk off the end of memory and cause data corruption. This is true with or without this fix. As such I would say it would really be nice to fix it sooner than later, but I am not going to go to the mat to get this fixed.

@harrism
Copy link
Member

harrism commented Nov 14, 2022

Let's aim for 23.02. Need to discuss with cuDF team (CC @GregoryKimball @vuule @shwina @bdice @vyasr)

I would like to remove the default padding from RMM as discussed in #865 and have cuDF use it's own adapter for padding for Arrow requirements (or getting away with UB requirements -- though I hope that doesn't still exist in cuIO!)

Please discuss.

@bdice
Copy link
Contributor

bdice commented Mar 14, 2023

@harrism Do you imagine that such a "padding adapter" would be implemented in rmm, or in cudf? I think it is of general enough interest that making it live in rmm would be most appropriate, but I am open to others' thoughts on that.

@harrism
Copy link
Member

harrism commented Mar 14, 2023

First step is to decide if all RAPIDS libraries will need this padding. If it's a requirement for Arrow compatibility, then I suspect so -- cuSpatial definitely is using GeoArrow format, for example.

Also for discussion -- what happens when a user calls a libcudf API and passes a non-arrow-compliant MR? (i.e. doesn't use this padding adapter).

@vyasr
Copy link
Contributor

vyasr commented Mar 14, 2023

There is a long discussion around cudf padding requirements in rapidsai/cudf#9389. That issue originally just requested documentation and remains open. We should resolve that issue as part of any push here since we need to concretely lay out cudf's requirements are in order to ensure that we make rmm changes in a way that is safe for cudf. In general we are definitely going to require this padding for any library that relies on Arrow types.

The case where a user provides a non-arrow-compliant MR seems pretty gnarly... my first thought is that we end up always needing to wrap input mrs in a padding adapter to be safe, but I really don't like that and will continue brainstorming.

Also CC @kkraus14

@ahendriksen
Copy link
Contributor

Could something like raft`s round_up_safe help in this case? It can detect overflow when rounding up.

@harrism
Copy link
Member

harrism commented Mar 20, 2023

We can add overflow detection, but the point of discussion is whether we should be aligning here in the first place.

@vyasr vyasr self-assigned this Apr 20, 2023
@vyasr
Copy link
Contributor

vyasr commented Apr 20, 2023

See #865 (comment) for the final decision. tl;dr we're going to get rid of alignment altogether, so there won't be any overflow issues. This will happen early in 23.08.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
? - Needs Triage Need team to review and classify bug Something isn't working
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

6 participants