Improve FFI's botan_cipher_update() performance for stream ciphers #3951

reneme · 2024-03-27T13:19:32Z

Pull Request Dependencies

Description

This reworks the implementation of FFI botan_cipher_update(), making it comply with the general "rules of engagement" of the FFI. Note that this subtly changes the behavior when finalizing a cipher without providing a big enough output buffer up-front.

When the caller failed to provide a big enough output buffer, it now returns BOTAN_FFI_ERROR_INSUFFICIENT_BUFFER_SPACE instead of BOTAN_FFI_ERROR_INVALID_INPUT and sets output_written to the required number of bytes. In that case, re-invoking botan_cipher_update() with the "final" flag and a sufficiently sized output buffer then obtains the remaining bytes and resets the cipher object.

Before, the re-invocation had to happen without setting the final flag again to obtain the same behavior.
@randombit Frankly, I doubt that the existing implementation supported such a usage explicitly. It did feel like an inadvertent hack to me. That's why I changed it to an (IMHO) more logical interface. Nevertheless, there might be an argument to keep the existing behavior, albeit inconsistent with the generic "rules of engagement".

The rework should also significantly improve the performance of stream cipher modes (e.g. CTR-mode) via the FFI by opportunistically leveraging the ideal_update_granularity() of the cipher mode. Before, we generally processed data in chunks of update_granularity(), which is "1 byte" for stream ciphers. That resulted in very poor performance, as described in #3925.

TODO: We might look into the internal function ffi_choose_update_size() and whether it still makes sense in this new setup. Frankly, I'm lacking the historic context to conclusively judge that.

Closes #3925.

reneme · 2024-04-04T15:06:09Z

While trying to port the AEAD test down to the existing implementation I found a couple more bugs in the inner loop of botan_cipher_update() when used with "SIV" (and probably also "CCM"):

passing some input bytes but no output buffer results in "0 bytes" being consumed (despite those modes not producing any streamed output anyway -- they return require_entire_message() -> true). This is somewhat surprising but not catastrophic.
providing a "dummy" output buffer, however, results in the plaintext being copied into the output buffer. The current implementation fails to detect that cipher.update() actually calls .resize(0) on mbuf (for SIV) and blindly copies the plaintext bytes out of mbuf.data(). This might even be UB, but to the best of my knowledge .resize(0) doesn't actually de-allocate the vector's underlying memory. Nevertheless, the plaintext ending up in the "encryption output" buffer is bad enough. 😨

coveralls · 2024-04-05T00:32:15Z

coverage: 91.847% (+0.006%) from 91.841%
when pulling cda3843 on Rohde-Schwarz:fix/performance_of_ffi_stream_ciphers
into 9e6c6da on randombit:master.

reneme · 2024-04-05T08:00:06Z

Rebased to master after the underlying pull requests were merged.

reneme · 2024-04-08T06:48:05Z

Force-pushed to add GPG signatures to the commit. I'm back at home with access to a USB-C to USB-A adapter for my YubiKey. 🤦‍♂️

randombit

Thanks. This is a lot cleaner and easier to understand. Really demonstrates the benefits of scoped_cleanup and the span oriented slicers 👍

src/lib/ffi/ffi_cipher.cpp

Opportunistically uses the ideal update granularity to process bytes and implements the finalization more robustly according to the documented "rules of engagement" of the FFI. Particularly, it improves the handling of a too small output buffer during cipher finalization.

reneme · 2024-05-23T14:38:58Z

Rebased to master and addressed the review comment.

reneme · 2024-05-23T15:13:33Z

For the record: Some quick-and-dirty measurement with the tool @aahajj used initially show promising speedups especially for stream ciphers but also for the other cipher modes. See: #3925 (comment)

ni4 · 2024-05-29T09:42:42Z

@reneme This broke things for RNP AEAD implementation, as we relied on the previous behaviour where botan_cipher_update() for non-final chunk processes the whole chunk, writing out the equal size as well. Anywa, we'll fix that. The only question is whether it is correct to assume that calling (non-final) botan_cipher_update() for AEAD cipher will not return more data then it is provided by the input?

This is in response to a downstream user question: https://github.com/randombit/botan/pull/3951\#issuecomment-2136990858

reneme · 2024-05-31T14:10:56Z

This broke things for RNP AEAD implementation, as we relied on the previous behaviour where botan_cipher_update() for non-final chunk processes the whole chunk, writing out the equal size as well.

@ni4 Sorry about that!

For clarification (and for the record): Also previously, botan_cipher_update() might have left some input bytes untouched. Namely, when the number of input bytes wasn't a multiple of some internal update granularity. For stream-cipher-ish ciphers (like AES-GCM for instance) this internal granularity used to be "1 byte" inadvertently (resulting in poor performance). I.e. it used to always consume every byte.

@randombit Perhaps we should mention that gotcha in the changelog?

whether it is correct to assume that calling (non-final) botan_cipher_update() for AEAD cipher will not return more data then it is provided by the input?

Yes, it won't ever produce more bytes than provided as input. Unless during finalization to emit the AEAD's authentication tag. I clarified that in the documentation of the function as well: #4088.

reneme · 2024-05-31T14:24:14Z

src/lib/ffi/ffi_cipher.cpp

+            while(in.remaining() >= granularity && out.remaining_capacity() >= expected_output_per_iteration) {
+               copy_mem(mbuf, in.take(granularity));
+               const auto written_bytes = cipher.process(mbuf);
+               BOTAN_DEBUG_ASSERT(written_bytes == expected_output_per_iteration);
+               if(written_bytes > 0) {
+                  BOTAN_ASSERT_NOMSG(written_bytes <= granularity);
+                  copy_mem(out.next(written_bytes), std::span(mbuf).first(written_bytes));
+               }
+            }


@randombit There's more room for optimization here. We could detect that the user passed the same buffer for input and output and omit the copy to the internal mbuf in that case.

ni4 · 2024-05-31T15:03:12Z

For clarification (and for the record): Also previously, botan_cipher_update() might have left some input bytes untouched. Namely, when the number of input bytes wasn't a multiple of some internal update granularity. For stream-cipher-ish ciphers (like AES-GCM for instance) this internal granularity used to be "1 byte" inadvertently (resulting in poor performance). I.e. it used to always consume every byte.

@reneme We used the update granularity, and I believe that at some point in the past it was like 16 bytes or so. Probably at some point that changed.

Yes, it won't ever produce more bytes than provided as input. Unless during finalization to emit the AEAD's authentication tag. I clarified that in the documentation of the function as well: #4088.

Yeah, tag should not be a problem. We just use the same preallocated buffer to encrypt/decrypt (and it is the same for input and output, like you mentioned in the further comment) so it would be unwilling to do allocations during the call.

Thanks!

reneme added the performance Go faster label Mar 27, 2024

reneme added this to the Botan 3.4.0 milestone Mar 27, 2024

reneme requested a review from randombit March 27, 2024 13:19

reneme self-assigned this Mar 27, 2024

reneme force-pushed the fix/performance_of_ffi_stream_ciphers branch 2 times, most recently from 575f79d to 305552e Compare April 4, 2024 13:17

This comment was marked as resolved.

Sign in to view

reneme force-pushed the fix/performance_of_ffi_stream_ciphers branch 2 times, most recently from 51da010 to 0b432d9 Compare April 4, 2024 14:04

reneme mentioned this pull request Apr 4, 2024

FIX: FFI botan_cipher_update() for SIV and CCM #3971

Merged

reneme removed this from the Botan 3.4.0 milestone Apr 4, 2024

reneme force-pushed the fix/performance_of_ffi_stream_ciphers branch from 0b432d9 to 5b83288 Compare April 4, 2024 19:43

reneme force-pushed the fix/performance_of_ffi_stream_ciphers branch from 5b83288 to d8413fc Compare April 5, 2024 07:59

reneme force-pushed the fix/performance_of_ffi_stream_ciphers branch from d8413fc to 8b2f42d Compare April 8, 2024 06:47

reneme mentioned this pull request Apr 9, 2024

AES-CTR (and probably more stream ciphers) are slow when used via the FFI's Stream_Cipher mode. #3925

Closed

randombit approved these changes May 23, 2024

View reviewed changes

src/lib/ffi/ffi_cipher.cpp Outdated Show resolved Hide resolved

reneme added 2 commits May 23, 2024 16:38

Doc: how to use botan_cipher_update()

cda3843

reneme force-pushed the fix/performance_of_ffi_stream_ciphers branch from 8b2f42d to cda3843 Compare May 23, 2024 14:38

randombit added this to the Botan 3.5.0 milestone May 23, 2024

reneme merged commit 1644599 into randombit:master May 23, 2024
43 checks passed

reneme deleted the fix/performance_of_ffi_stream_ciphers branch May 31, 2024 13:29

reneme referenced this pull request in Rohde-Schwarz/botan May 31, 2024

Doc: botan_cipher_update() byte generation behavior

882bf7f

This is in response to a downstream user question: https://github.com/randombit/botan/pull/3951\#issuecomment-2136990858

reneme mentioned this pull request May 31, 2024

Doc: botan_cipher_update() byte generation behavior #4088

Merged

reneme commented May 31, 2024

View reviewed changes

This was referenced Jun 3, 2024

FFI: botan_cipher_get_update_granularity() returns 17 #4090

Closed

FIX: botan_cipher_get_update_granularity() #4094

Closed

ni4 mentioned this pull request Jun 11, 2024

Be compatible with update Botan 3.5 AEAD decryption behaviour. rnpgp/rnp#2245

Closed

reneme mentioned this pull request Jun 13, 2024

FIX: Better scaling for botan_cipher_get_update_granularity() #4122

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve FFI's botan_cipher_update() performance for stream ciphers #3951

Improve FFI's botan_cipher_update() performance for stream ciphers #3951

reneme commented Mar 27, 2024 •

edited

Loading

This comment was marked as resolved.

This comment was marked as resolved.

reneme commented Apr 4, 2024 •

edited

Loading

coveralls commented Apr 5, 2024 •

edited

Loading

reneme commented Apr 5, 2024

reneme commented Apr 8, 2024

randombit left a comment

reneme commented May 23, 2024

reneme commented May 23, 2024

ni4 commented May 29, 2024

reneme commented May 31, 2024 •

edited

Loading

reneme May 31, 2024

ni4 commented May 31, 2024

Improve FFI's botan_cipher_update() performance for stream ciphers #3951

Improve FFI's botan_cipher_update() performance for stream ciphers #3951

Conversation

reneme commented Mar 27, 2024 • edited Loading

Pull Request Dependencies

Description

This comment was marked as resolved.

This comment was marked as resolved.

reneme commented Apr 4, 2024 • edited Loading

coveralls commented Apr 5, 2024 • edited Loading

reneme commented Apr 5, 2024

reneme commented Apr 8, 2024

randombit left a comment

Choose a reason for hiding this comment

reneme commented May 23, 2024

reneme commented May 23, 2024

ni4 commented May 29, 2024

reneme commented May 31, 2024 • edited Loading

reneme May 31, 2024

Choose a reason for hiding this comment

ni4 commented May 31, 2024

reneme commented Mar 27, 2024 •

edited

Loading

reneme commented Apr 4, 2024 •

edited

Loading

coveralls commented Apr 5, 2024 •

edited

Loading

reneme commented May 31, 2024 •

edited

Loading