-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reduce the number of allocations in ECP operations #3512
Reduce the number of allocations in ECP operations #3512
Conversation
98b214b
to
4796c69
Compare
a8a1cd8
to
527e6fb
Compare
Note: there are a number of CI failures which seem real. |
c1528dc
to
8873a50
Compare
The massive build failures were due to a bad rebase: originally “Fix missing mbedtls_psa_crypto_free in test case” added missing cleanup to a test case, and the automatic rebase added that cleanup to the wrong test case. Fortunately the correct test case had been fixed independently in the meantime (in feb0396). After that I fixed some missing compilation guards and CI is now looking good. |
If you pass a curve name to the benchmark program, the ECDH and ECDSA benchmarks will only run for that particular curve. By default, all curves are benchmarked. To simplify the implementation, if you pass multiple curves, only the last one will be benchmarked. Signed-off-by: Gilles Peskine <[email protected]>
Reduce the code size, stack consumption and heap consumption in fix_negative by encoding the special-case subtraction manually. * Code size: ecp_curves.o goes down from 7837B down to 7769 in a sample Cortex-M0 build with all curves enabled. The savings come from not having to set up C in INIT (which is used many times) and from not having to catch errors in fix_negative. * Stack consumption: get rid of C on the stack. * Heap: mbedtls_mpi_sub_abs with destination == second operand would make a heap allocation. The new code doesn't do any heap allocation. * Performance: no measurable difference. Signed-off-by: Gilles Peskine <[email protected]>
mbedtls_mpi_sub_abs systematically allocated a new mpi when the result was aliased with the right operand (i.e. X = A - X). This aliasing very commonly happens during ECP operations. Rewrite the function to allocate only if the result might not fit otherwise. This costs a few bytes of code size in bignum.o, and might make mbedtls_mpi_sub_abs very very slightly slower when no reallocation is done. However, there is a substantial performance gain in ECP operations with Montgomery curves (10-20% on my PC). test_suite_ecp drops from 1422794 to 1271506 calls to calloc(). This commit also fixes a bug whereby mbedtls_mpi_sub_abs would leak memory when X == B (so TB was in use) and the result was negative. Signed-off-by: Gilles Peskine <[email protected]>
Signed-off-by: Gilles Peskine <[email protected]>
Rewrite mbedtls_mpi_mul_int to call mpi_mul_hlp directly rather than create a temporary mpi object. This has the benefit of not performing an allocation when the multiplication is in place (mpi operand aliased with the result) and the result mpi is large enough. This saves about 40% of the calloc() calls in test_suite_ecp. There is no measurable performance difference on my Linux PC. The cost is a few bytes in bignum.o. When there is no aliasing, or when there is aliasing but the mpi object needs to be enlarged, the performance difference is negligible. Signed-off-by: Gilles Peskine <[email protected]>
If c == 0, no need to add it to *d. Signed-off-by: Gilles Peskine <[email protected]>
I meant to review this because it's of interest to me, but I don't have bandwidth at the moment, so I'm removing myself and labeling "needs: reviewer" again in case somebody else wants to pick this up. |
Signed-off-by: Gilles Peskine <[email protected]>
Signed-off-by: Gilles Peskine <[email protected]>
Right. Originally this pull request contained a few minor bug fixes that I found while I was working on the allocations, but I made a separate PR for the bug fixes so they could get in faster. |
Write a simple unit test for mbedtls_ecp_muladd(). Add just one pair of test cases. Mbed-TLS#2 fails since PR Mbed-TLS#3512. Thanks to Philippe Antoine (catenacyber) for the test case, found by ecfuzzer. Signed-off-by: Gilles Peskine <[email protected]>
Write a simple unit test for mbedtls_ecp_muladd(). Add just one pair of test cases. Mbed-TLS#2 fails since PR Mbed-TLS#3512. Thanks to Philippe Antoine (catenacyber) for the test case, found by ecfuzzer. Signed-off-by: Gilles Peskine <[email protected]>
Write a simple unit test for mbedtls_ecp_muladd(). Add just one pair of test cases. Mbed-TLS#2 fails since PR Mbed-TLS#3512. Thanks to Philippe Antoine (catenacyber) for the test case, found by ecfuzzer. Signed-off-by: Gilles Peskine <[email protected]>
Write a simple unit test for mbedtls_ecp_muladd(). Add just one pair of test cases. #2 fails since PR #3512. Thanks to Philippe Antoine (catenacyber) for the test case, found by ecfuzzer. Signed-off-by: Gilles Peskine <[email protected]>
Reduce the number of allocations in ECP operations
Write a simple unit test for mbedtls_ecp_muladd(). Add just one pair of test cases. #2 fails since PR #3512. Thanks to Philippe Antoine (catenacyber) for the test case, found by ecfuzzer. Signed-off-by: Gilles Peskine <[email protected]>
Results from #3511 show that the ECP code is where the library is at it worst when it comes to calls to
calloc()
. The goal of this PR is to make a few easy improvements.I think there are more low-hanging fruit, but there's already a significant reduction in the number of allocations (~halved the allocations in
test_suite_ecp
). Some of the improvements result in a performance improvements, others have no measurable impact on performance. The code size is almost unchanged.Also:
mbedtls_ecp_curve_list()
now lists Montgomery curves. This was done in Everest integration ARMmbed/mbed-crypto#140 but accidentally reverted in the same PR.An earlier version of this PR included a framework for instrumenting and measuring memory allocations. This framework is still experimental and I've removed it from the history. It's available in https://github.com/gilles-peskine-arm/mbedtls/tree/ecp-alloc-202007-3 and in #3511 if you want, but if all you want is to count calloc calls (as opposed to figuring out what triggers them) ordinary profiling is enough.
An earlier version of this PR included some bug fixes which have now been merged separately through #3942.