Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mbedtls_mpi_self_test fails on Cortex M33 microcontroller #5266

Closed
lohralfen opened this issue Dec 6, 2021 · 15 comments
Closed

mbedtls_mpi_self_test fails on Cortex M33 microcontroller #5266

lohralfen opened this issue Dec 6, 2021 · 15 comments
Labels
bug component-crypto Crypto primitives and low-level interfaces

Comments

@lohralfen
Copy link

lohralfen commented Dec 6, 2021

Using STM32L562ZE chip, the mbedtls self test fails with the exact same issue as described in #4116 #4943

OS
None (bare metal)

mbedTLS build:
v3.0.0 tag, also tested with v2.27.0 tag with same result

Expected behavior:
All self tests pass

Actual behavior:
Depending on optimization (at least -Os, -O3) the 'exp_mod' self test fails

CALLOC(0): passed (distinct non-null)
CALLOC(1): passed
CALLOC(1 again): passed (same address)

SHA-224 test #1: passed
SHA-224 test #2: passed
SHA-224 test #3: passed
SHA-256 test #1: passed
SHA-256 test #2: passed
SHA-256 test #3: passed

SHA-384 test #1: passed
SHA-384 test #2: passed
SHA-384 test #3: passed
SHA-512 test #1: passed
SHA-512 test #2: passed
SHA-512 test #3: passed

AES-ECB-128 (dec): passed
AES-ECB-128 (enc): passed
AES-ECB-192 (dec): passed
AES-ECB-192 (enc): passed
AES-ECB-256 (dec): passed
AES-ECB-256 (enc): passed

AES-CTR-128 (dec): passed
AES-CTR-128 (enc): passed
AES-CTR-128 (dec): passed
AES-CTR-128 (enc): passed
AES-CTR-128 (dec): passed
AES-CTR-128 (enc): passed

Base64 encoding test: passed
Base64 decoding test: passed

MPI test #1 (mul_mpi): passed
MPI test #2 (div_mpi): passed
MPI test #3 (exp_mod): failed
Unexpected error, return code = 00000001

ECP SW test #1 (constant op_count, base point G): passed
ECP SW test #2 (constant op_count, other point): failed (0)
Unexpected error, return code = FFFFB380

Executed 7 test suites

[ 2 tests FAIL ]

Steps to reproduce:
Run the self tests of mbedTLS

I'm guessing it has to do with this piece of code?
https://github.com/ARMmbed/mbedtls/blob/development/library/bn_mul.h#L715

This file has been adjusted 1 at a time for each architecture, but it seems not all architectures have been fixed yet..

@lohralfen
Copy link
Author

lohralfen commented Dec 6, 2021

Commenting out #define MBEDTLS_HAVE_ASM makes the exp_mod test pass, but the "ECP SW test #2" still fails.

Edit due to new insights: it only passes sometimes with this change. Not always.

@daverodgman daverodgman added the bug label Dec 6, 2021
@daverodgman daverodgman added the component-crypto Crypto primitives and low-level interfaces label Dec 6, 2021
@davidhorstmann-arm
Copy link
Contributor

Just to confirm, are you using a recent version of clang?
If so, what is your target triple? I wasn't able to reproduce the issue on arm or thumb when I tried.

@gilles-peskine-arm
Copy link
Contributor

Commenting out #define MBEDTLS_HAVE_ASM makes the exp_mod test pass, but the "ECP SW test #2" still fails.

That is surprising: I'm not aware of a similar bug that doesn't involve one of the assembly snippets. Can you run the unit tests on your platform?

@lohralfen
Copy link
Author

lohralfen commented Dec 7, 2021

Just to confirm, are you using a recent version of clang? If so, what is your target triple? I wasn't able to reproduce the issue on arm or thumb when I tried.

We're using arm-none-eabi-gcc (GNU Arm Embedded Toolchain 9-2020-q2-update) 9.3.1 20200408 (release)

I'm not sure what a target triple is, but we're using a custom target based on STM32L562ZE chip.

Compile options for bignum.c (which includes bn_mul.h)
/opt/gcc-arm-none-eabi-9-2020-q2-update/bin/arm-none-eabi-gcc -DHAL_SPI_MODULE_ENABLED -DMBEDTLS_CONFIG_FILE="<mbedtls_config.h>" -DSTM32L562xx -I<snipped> -O3 -DNDEBUG -mcpu=cortex-m33 -mthumb -fdata-sections -fno-builtin-printf -fno-builtin-fprintf -ffunction-sections -Wall -g -O3 -std=c99 -o <snip>/bignum.c.obj -c <snip>/bignum.c

mbedtls config:
mbedtls_config.txt

@lohralfen
Copy link
Author

lohralfen commented Dec 7, 2021

Commenting out #define MBEDTLS_HAVE_ASM makes the exp_mod test pass, but the "ECP SW test #2" still fails.

That is surprising: I'm not aware of a similar bug that doesn't involve one of the assembly snippets. Can you run the unit tests on your platform?

Can you elaborate? The unit tests are part of the self test, right? I performed those several times, you can see the results at the top of this issue. The error I'm getting with the other test seems to be -0x4c80 (invalid key)

I'm a bit confused, because the failing unit tests seem to not be consistent. I'm getting failed test on exp_mod sometimes even with MBEDTLS_HAVE_ASM undefined, depending on if I add instructions elsewhere in the code....

Right now it seems that undefining MBEDTLS_HAVE_ASM and defining MBEDTLS_NO_UDBL_DIVISION together solve both of the failing tests, but I can't say I understand at all.

What also works, is setting entire bignum.c module to compile with -O0 with a pragma at top of file.

@gilles-peskine-arm
Copy link
Contributor

The unit tests are part of the self test, right?

No: program/test/selftest is only a very basic set of tests. Mbed TLS has more extensive unit tests in tests/suites. They do require a few more platform functions than the library does, so it might take more work to run them on a baremetal system.

the failing unit tests seem to not be consistent.

That part could still be the assembly bug: they cause data corruption, and so far we've only observed corruption that led to massive operation failures and crashes, but it could be something more subtle that only corrupts useful data in some specific heap patterns.

I'm getting failed test on exp_mod sometimes even with MBEDTLS_HAVE_ASM undefined

But that can't be the assembly bug. #4116 is a bug in the assembly code, which didn't correctly declare to the compiler what memory the assembly accessed. With MBEDTLS_HAVE_ASM, the buggy code is not compiled. So this has to be a different problem.

Random shot in the dark: does increasing the stack size or the heap size help?

@lohralfen
Copy link
Author

@gilles-peskine-arm I will try and get unit tests running.

As for the stack/heap stuff, I already verified that this isn't the problem, by doing the following things:

  1. Increase stack & heap size
  2. Make mbedTLS use a static buffer instead of heap
  3. Check during runtime if we are getting close to overflowing (not the case)

There is barely any custom code running at the moment, mostly just the mbedtls self test.

@davidhorstmann-arm
Copy link
Contributor

We're using arm-none-eabi-gcc (GNU Arm Embedded Toolchain 9-2020-q2-update) 9.3.1 20200408 (release)

Hmm that's interesting - thanks. This may be a different problem then, as #4116 and friends seem to be caused by the improved optimisations of newer versions of clang. The compiler you're using is not the latest gcc, so we'd likely have noticed by now if it had the same issue.

I'm not sure what a target triple is, but we're using a custom target based on STM32L562ZE chip.

The 'target triple' is a clang term for a 3-part platform description (e.g. "arm-linux-gnu") so not relevant in your case.

@lohralfen
Copy link
Author

lohralfen commented Dec 7, 2021

The unit tests are part of the self test, right?

No: program/test/selftest is only a very basic set of tests. Mbed TLS has more extensive unit tests in tests/suites. They do require a few more platform functions than the library does, so it might take more work to run them on a baremetal system.

Sadly I don't really see how I'd be able to run the unit tests on my target. The target is very limited. I see a bunch of .data and .function files, and I would like to run specific ecp and mpi tests, but I'm unsure how.

@lohralfen
Copy link
Author

It's also a bit inexplicable at this point. I have reduced the code to a point where the only things that are being done are:

  1. HAL_Init()
  2. initSysClocks (CubeMX generated code to set clocks to HSE/LSE or HSI, tested both)
  3. initConsoleUart (using STM LL drivers)
  4. mbedtls_self_test

There are several things that somehow cause all unit tests to consistently pass, but I don't have an explanation:

  1. Pragma -O0 at top of bignum.c
  2. #undef MBEDTLS_HAVE_ASM and #define MBEDTLS_NO_UDBL_DIVISION
  3. Not call initSysClocks (which leaves clocks at 4 MHz instead of 110 MHz)

@gilles-peskine-arm
Copy link
Contributor

Not call initSysClocks (which leaves clocks at 4 MHz instead of 110 MHz)

That sounds like a hardware problem, or a bug in some system initialization or driver code that sets the hardware into an unstable state. Can you reproduce the problem on another chip? Can you reproduce something similar with another program that's both RAM- and CPU-intensive?

@lohralfen
Copy link
Author

lohralfen commented Dec 7, 2021

Not call initSysClocks (which leaves clocks at 4 MHz instead of 110 MHz)

That sounds like a hardware problem, or a bug in some system initialization or driver code that sets the hardware into an unstable state. Can you reproduce the problem on another chip? Can you reproduce something similar with another program that's both RAM- and CPU-intensive?

Although I can't pinpoint the exact underlying cause, it seems like this is the case. When generating the clock config code using LL drivers instead of HAL, I see a bunch more while loops that wait until certain configuration is actually applied. Using this new clock configuration code, the issue is not reproducible anymore on any setup, with any optimization. I still can't really understand how this would manifest in EC/MPI operations failing in mbedTLS, though.

@gilles-peskine-arm
Copy link
Contributor

Hi @lohralfen,

I wonder if you've made any progress on your side? Have you been able to confirm whether the problem is with the platform setup or with Mbed TLS? Can you reproduce the failure with mbedtls-3.1.0?

@lohralfen
Copy link
Author

Hello @gilles-peskine-arm, I concluded there was not much more to test since we've ran a lot of tests with plenty of reboots and the issue does not occur anymore with updated clock configuration code. It would take a lot of time to go through the driver code of ST and pinpoint the exact issue. I will try to make some time on Monday to run the original code with mbedtls-3.1.0.

@gilles-peskine-arm
Copy link
Contributor

Thanks for reporting back! Given that one problem (the inline assembly code) is known to be fixed, and that the other problem (buggy execution even with assembly disabled) is confirmed to be fixed by changing clock settings, there does not seem to be any problem with Mbed TLS, so I'm closing this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug component-crypto Crypto primitives and low-level interfaces
Projects
None yet
Development

No branches or pull requests

4 participants