Assorted UBSAN cleanups #55112

Merged
merged 5 commits into from
Sep 13, 2023

WillAyd
Member

@WillAyd WillAyd commented Sep 12, 2023

Found the first few by running the io test suite. Further progress is blocked by #55111, which requires closer attention than I wanted to give it here.

@@ -624,9 +664,12 @@ cdef int64_t convert_reso(
    else:
        # e.g. ns -> us, risk of overflow, but no risk of lossy rounding
        mult = get_conversion_factor(from_reso, to_reso)
        with cython.overflowcheck(True):
Member

is there a cython bug?

Member Author

I don't think it's a bug per se. I think Cython lets the overflow happen but then adds checks after the fact to see if it overflowed. This, by contrast, prevents the overflow from happening in the first place. It generally gets you to the same place in the end.

Member Author

@WillAyd WillAyd Sep 13, 2023

Looks like Cython generates something like this:

static CYTHON_INLINE int __Pyx_mul_const_int_checking_overflow(int a, int b, int *overflow) {
    if (b > 1) {
        *overflow |= a > __PYX_MAX(int) / b;
        *overflow |= a < __PYX_MIN(int) / b;
    } else if (b == -1) {
        *overflow |= a == __PYX_MIN(int);
    } else if (b < -1) {
        *overflow |= a > __PYX_MIN(int) / b;
        *overflow |= a < __PYX_MAX(int) / b;
    }
    return a * b;
}

We aren't handling a negative denominator, but otherwise yeah: the difference is that Cython still performs the multiplication and just sets an overflow variable if something overflows, whereas we skip the multiplication entirely in that case.

Member

for my edification, this pattern is considered Better Practice than the one cython uses?

Member Author

I think the only difference here is that this will make the sanitizer happy whereas the cython approach will not
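
To make the difference concrete, here is a minimal Python sketch of the two strategies. It is illustrative only: Python integers never overflow, so `cython_style_flag` models just the flag conditions of the helper quoted above (using int64 bounds rather than `int`, which is an assumption), while `guarded_mul` follows the check-then-multiply pattern from this PR. The names `c_div`, `cython_style_flag`, and `guarded_mul` are hypothetical.

```python
INT64_MAX = 2**63 - 1
INT64_MIN = -(2**63)


def c_div(a: int, b: int) -> int:
    """Integer division that truncates toward zero, as C does."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


def cython_style_flag(a: int, b: int) -> bool:
    """Model the overflow conditions of the generated helper.

    In C the helper still evaluates `a * b` after setting the flag.
    Only the flag logic is reproduced here.
    """
    overflow = False
    if b > 1:
        overflow |= a > c_div(INT64_MAX, b)
        overflow |= a < c_div(INT64_MIN, b)
    elif b == -1:
        overflow |= a == INT64_MIN
    elif b < -1:
        overflow |= a > c_div(INT64_MIN, b)
        overflow |= a < c_div(INT64_MAX, b)
    return overflow


def guarded_mul(value: int, factor: int) -> int:
    """Check first and raise, so the product is never computed on overflow.

    Assumes factor > 0, as in the snippets from this PR.
    """
    overflow_limit = INT64_MAX // factor
    if value > overflow_limit or value < -overflow_limit:
        raise OverflowError("result would overflow")
    return factor * value
```

For a factor of 7, both approaches flag the same boundary inputs. The practical difference in C is that the guarded version never executes the overflowing signed multiplication, which is undefined behavior and what the sanitizer reports.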

if value > overflow_limit or value < -overflow_limit:
    raise OverflowError("result would overflow")

# Note: caller is responsible for re-raising as OutOfBoundsTimedelta
Member

i think this comment should go up a line with the OverflowError?
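
For readers unfamiliar with the convention that note refers to, the following is a rough Python sketch of a low-level helper raising a plain `OverflowError` and its caller translating it. `OutOfBoundsTimedelta` here is a local stand-in class, and `scale_checked` and `weeks_to_days` are hypothetical names, not functions from pandas.

```python
INT64_MAX = 2**63 - 1


class OutOfBoundsTimedelta(ValueError):
    """Local stand-in for the pandas exception of the same name."""


def scale_checked(value: int, factor: int) -> int:
    # Low-level helper: only knows about integer limits.
    overflow_limit = INT64_MAX // factor
    if value > overflow_limit or value < -overflow_limit:
        # Note: caller is responsible for re-raising as OutOfBoundsTimedelta
        raise OverflowError("result would overflow")
    return factor * value


def weeks_to_days(weeks: int) -> int:
    # Caller: translates the low-level error into the domain-specific one.
    try:
        return scale_checked(weeks, 7)
    except OverflowError as err:
        raise OutOfBoundsTimedelta(
            f"cannot convert {weeks} weeks to days without overflow"
        ) from err
```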

overflow_limit = INT64_MAX // 7
if value > overflow_limit or value < -overflow_limit:
    raise OverflowError("result would overflow")
return 7 * value
Member

can we de-dup some of this with e.g.

if ...
    value = get_conversion_factor(...
    factor = 7
elif ...
    value = get_conversion_factor(...
    factor = 24
...

overflow_limit = INT64_MAX // factor
if value > ...
    raise OverflowError(...)
return factor * value
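
A runnable approximation of this suggestion, written in plain Python rather than Cython, might look like the sketch below. It swaps the `if`/`elif` chain for a lookup table, and the unit names, the `_NEXT_FINER` table, and `conversion_factor` are all assumptions made for illustration; the point is only that each step picks a `factor` and then falls through to a single shared overflow guard.

```python
INT64_MAX = 2**63 - 1

# Hypothetical unit chain: each unit maps to the next finer unit and the
# number of those finer units it contains.
_NEXT_FINER = {"W": ("D", 7), "D": ("h", 24), "h": ("m", 60), "m": ("s", 60)}


def conversion_factor(from_unit: str, to_unit: str) -> int:
    """Return how many `to_unit` make up one `from_unit`."""
    if from_unit == to_unit:
        return 1
    if from_unit not in _NEXT_FINER:
        raise ValueError(f"cannot convert {from_unit!r} to {to_unit!r}")

    # Step one unit finer, then recurse for the rest of the chain.
    finer, factor = _NEXT_FINER[from_unit]
    value = conversion_factor(finer, to_unit)

    # Single shared guard instead of one copy per branch.
    overflow_limit = INT64_MAX // factor
    if value > overflow_limit or value < -overflow_limit:
        raise OverflowError("result would overflow")
    return factor * value
```

With this shape, supporting another unit means adding one table entry rather than another copy of the guard.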

Member Author

Yea no problem - great idea

@jbrockmendel
Member

tests?

@WillAyd
Member Author

WillAyd commented Sep 13, 2023

UBSAN is a runtime check, so all of these were triggered by values already in the existing test suite. No new tests need to be added.

@mroeschke mroeschke added this to the 2.2 milestone Sep 13, 2023
@mroeschke mroeschke added the Internals Related to non-user accessible pandas implementation label Sep 13, 2023
@mroeschke mroeschke merged commit f00efd0 into pandas-dev:main Sep 13, 2023
36 checks passed
@mroeschke
Member

Thanks @WillAyd

@WillAyd WillAyd deleted the asan-fixups branch September 13, 2023 23:46
hedeershowk pushed a commit to hedeershowk/pandas that referenced this pull request Sep 20, 2023
* first round of fixes

* fix up includes

* updates

* dedup logic

* move comment