[Bug] Move Hub fails to raise TypeError #950
Type checks are correct, but raising the error seems to fail. Mysteriously, if I add a debug print, then it does raise where it should:
diff --git a/pybricks/util_mp/pb_obj_helper.c b/pybricks/util_mp/pb_obj_helper.c
index cf397d1d4..ce20cb965 100644
--- a/pybricks/util_mp/pb_obj_helper.c
+++ b/pybricks/util_mp/pb_obj_helper.c
@@ -123,6 +123,7 @@ mp_obj_t pb_obj_get_base_class_obj(mp_obj_t obj, const mp_obj_type_t *type) {
 void pb_assert_type(mp_obj_t obj, const mp_obj_type_t *type) {
     if (!mp_obj_is_type(obj, type)) {
         #if MICROPY_ERROR_REPORTING == MICROPY_ERROR_REPORTING_TERSE
+        mp_printf(&mp_plat_print, "Raises if debug print present?\n");
         mp_raise_TypeError(NULL);
         #else
         mp_raise_msg_varg(&mp_type_TypeError, MP_ERROR_TEXT("can't convert %s to %s"),
Does this get optimized away somehow? |
Indeed, that seems to be the case. For future reference, the way to troubleshoot this is to run:
and search for the symbol in question. If it is not found, add |
Awesome, thanks for the hint to debug this in the future! |
Adding an FYI for @jimmo in case this ever happens to upstream MicroPython. tl;dr: If things like |
That's very interesting! @projectgus FYI I note that So it seems very surprising (to me at least) that the compiler thinks it's allowed to optimise this out. What GCC version are you using? |
If you're interested in seeing the exact steps to reproduce the build let us know. Most of the steps are here, but you probably already have most dependencies. So just clone our repo prior to fixing this issue, e.g. this one, and run For context, this was only happening for our smallest build target, Move Hub with STM32F070RB6. |
Thanks for the heads-up @jimmo, you know I love some good compiler weirdness. 😁 I couldn't reproduce with gcc 12, but if I download the mentioned ARM toolchain 2021.07 compiler release then I get it straight away on the pre-fixed commit. I had a quick look at gdb disassembly (building with
So hopefully whatever the root cause bug is/was, it's been fixed in gcc. A quick look in their bug tracker didn't find anything, but I'm also not exactly sure what to look for there. |
FYI, this workaround comes straight from the gcc docs on the |
I see, thanks @dlech. I think I had a partial understanding of this, but I hadn't seen this note and I now realise I had an out of date assumption about inline asm and side effects.[*] BTW, do I have it right that none of the functions in this call stack are explicitly marked The documentation about this workaround says it is when a "function does not have side effects", so a precondition should be that gcc has decided Does that match your understanding? [*] My new best guess of the mechanism is: Some older version of gcc would incorrectly mark functions whose side effects only happened via inline asm as "side effect free". Newer versions work around this by marking any function containing inline asm as having side effects, leading to this becoming the documented workaround for explicitly marking a function as having side effects. |
So, got curious enough to dig in linker intermediate dumps and learn a bit more gcc internals. I think this is a gcc intermediate analysis bug. Passing In the gcc 12 output, without the fix patch but with correct behaviour, one of the dumps includes:
I don't understand gcc RTL, but the thing to note here is the line If we go back to gcc 10, still without the fix included, so we expect a miscompile:
Now the function is analysed only as "locally pure", not also as "locally looping". So gcc has incorrectly determined it will always return. Add the fix patch "asm" line back in and rebuild:
Now the function is "locally looping" again, but I think only because of the inline asm workaround... I'm now 99% sure this is a gcc bug because in all three versions (including the version with the bug) the callee function
... so there shouldn't be any way that a function which calls that function should not be marked as "looping" itself. [*] I initially passed |
That is correct. Although when I was investigating the problem I temporarily added to try to find the function in case it was being inlined. And luckily, this led me to the workaround. When I added
Yes, that seems to be the case. The function is used in quite a few places, so it makes sense that it wouldn't be inlined. Thanks for the additional analysis. Always handy to know the right magic gcc options to help debug stuff like this. |
Was discussing this with @jimmo and he dug up this gcc bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103052 This looks like it's probably either the root cause bug, or a bug whose fix happens to fix this case as well. Fix looks to be in gcc 10.4 onwards, but if using the ARM GNU Toolchains then the first one with the fix is likely their 11.3.Rel1 release. |
For unknown reasons, GCC 10 LTO is optimizing out pb_assert_type() (presumably because it thinks the function has no side effects as noinline has no effect). The GCC recommended workaround for such issues is to include an empty inline assembly statement to prevent the function from being optimized out. Fixes: pybricks/support#950
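For readers who have not seen this workaround before, here is a minimal sketch of the pattern the commit message describes. It is not the actual Pybricks patch: `demo_handler`, `demo_raise`, `demo_assert_positive` and `demo_check_raises` are hypothetical names, `longjmp` stands in for MicroPython's exception mechanism, and the exact placement of the asm statement inside the function is an assumption. The only part taken from the GCC documentation is the idea of putting an empty `asm ("")` statement in a `noinline` function so that calls to it are treated as having a side effect.

```c
#include <setjmp.h>
#include <stdbool.h>

/* Hypothetical stand-in for the interpreter's exception handler state. */
static jmp_buf demo_handler;

/* Hypothetical "raise": never returns, jumps back to the handler. */
__attribute__((noreturn)) static void demo_raise(void) {
    longjmp(demo_handler, 1);
}

/* Hypothetical checker in the style of an assert helper. It returns nothing,
 * so an optimizer that wrongly believes it is pure and always returns could
 * drop calls to it. */
__attribute__((noinline)) void demo_assert_positive(int value) {
    /* Empty inline asm statement (GCC/Clang extension): acts as a special
     * side effect so calls to this function are not optimized away. */
    __asm__ ("");
    if (value <= 0) {
        demo_raise();
    }
}

/* Returns true if the check raised, false if it returned normally. */
bool demo_check_raises(int value) {
    if (setjmp(demo_handler) != 0) {
        return true; /* arrived here via longjmp from demo_raise() */
    }
    demo_assert_positive(value);
    return false;
}
```

Calling `demo_check_raises` with a non-positive value should report that the check raised. On a compiler without the bug discussed in this thread, the function behaves the same with or without the asm line, so the sketch only illustrates where the workaround goes, not the miscompile itself.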
Describe the bug
The Move Hub does not correctly parse some arguments.
This leads to undefined behavior, which may or may not lead to the crash in https://github.com/orgs/pybricks/discussions/949
To reproduce
Expected behavior
Raise TypeError