Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-39689: _struct: Avoid undefined behavior when loading native _Bool #18925

Closed
wants to merge 3 commits into from

Conversation

encukou
Copy link
Member

@encukou encukou commented Mar 11, 2020

To avoid undefined behavior in the C code, we assume that in
every mode, the size is 1, \x00 is false, and \x01 is true.

With that assumption we can cast from char to _Bool, and avoid UB.

https://bugs.python.org/issue39689

To avoid undefined behavior in the C code, we assume that in
every mode, the size is 1, '\x00' is false, and `\x01` is true.
@skrah skrah self-assigned this Mar 11, 2020
case '?': UNPACK_SINGLE(ld, ptr, _Bool); goto convert_bool;
// memcpy-ing values other than 0 or 1 to a _Bool variable triggers
// undefined behavior, so cast from char instead. See bpo-39689.
case '?': ld = (_Bool)*ptr; goto convert_bool;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory this can trigger an unaligned access if sizeof(_Bool) > 1. Unaligned accesses that result in bus errors are the main reason for the memcpy().

Also I wonder if UB checkers can't get smart enough to figure out that e.g. (_Bool)*ptr == 42 and then complain.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory this can trigger an unaligned access if sizeof(_Bool) > 1. Unaligned accesses that result in bus errors are the main reason for the memcpy().

Ah! You are right, thank you. This PR won't work as is. I won't have time to work on this soon, so I'll close it in the mean time.

Also I wonder if UB checkers can't get smart enough to figure out that e.g. (_Bool)*ptr == 42 and then complain.

I don't have access to the C standard, but from what I've read, casting to _Bool will convert non-zero values to 1. So this should be fine.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this should be fine.

Yes, I misread the construct as *(_Bool *)ptr), I now see that you are deliberately dereferencing the char pointer.

Hmm, the rank of _Bool is the smallest, but sizeof(char) can still be larger than sizeof(_Bool). So I wouldn't want this construct either.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sizeof(char) is guaranteed to be 1. It can't be larger than sizeof(_Bool).
Alignment may be a different matter (or may not; I don't know).

@encukou encukou closed this Mar 12, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants