Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve FFI's botan_cipher_update() performance for stream ciphers #3951

Merged

Conversation

reneme
Copy link
Collaborator

@reneme reneme commented Mar 27, 2024

Pull Request Dependencies

Description

This reworks the implementation of FFI botan_cipher_update(), making it comply with the general "rules of engagement" of the FFI. Note that this subtly changes the behavior when finalizing a cipher without providing a big enough output buffer up-front.

When the caller failed to provide a big enough output buffer, it now returns BOTAN_FFI_ERROR_INSUFFICIENT_BUFFER_SPACE instead of BOTAN_FFI_ERROR_INVALID_INPUT and sets output_written to the required number of bytes. In that case, re-invoking botan_cipher_update() with the "final" flag and a sufficiently sized output buffer then obtains the remaining bytes and resets the cipher object.

Before, the re-invocation had to happen without setting the final flag again to obtain the same behavior.
@randombit Frankly, I doubt that the existing implementation supported such a usage explicitly. It did feel like an inadvertent hack to me. That's why I changed it to an (IMHO) more logical interface. Nevertheless, there might be an argument to keep the existing behavior, albeit inconsistent with the generic "rules of engagement".

The rework should also significantly improve the performance of stream cipher modes (e.g. CTR-mode) via the FFI by opportunistically leveraging the ideal_update_granularity() of the cipher mode. Before, we generally processed data in chunks of update_granularity(), which is "1 byte" for stream ciphers. That resulted in very poor performance, as described in #3925.

TODO: We might look into the internal function ffi_choose_update_size() and whether it still makes sense in this new setup. Frankly, I'm lacking the historic context to conclusively judge that.

Closes #3925.

@reneme reneme added the performance Go faster label Mar 27, 2024
@reneme reneme added this to the Botan 3.4.0 milestone Mar 27, 2024
@reneme reneme requested a review from randombit March 27, 2024 13:19
@reneme reneme self-assigned this Mar 27, 2024
@reneme reneme force-pushed the fix/performance_of_ffi_stream_ciphers branch 2 times, most recently from 575f79d to 305552e Compare April 4, 2024 13:17
@randombit

This comment was marked as resolved.

@reneme

This comment was marked as resolved.

@reneme reneme force-pushed the fix/performance_of_ffi_stream_ciphers branch 2 times, most recently from 51da010 to 0b432d9 Compare April 4, 2024 14:04
@reneme
Copy link
Collaborator Author

reneme commented Apr 4, 2024

While trying to port the AEAD test down to the existing implementation I found a couple more bugs in the inner loop of botan_cipher_update() when used with "SIV" (and probably also "CCM"):

  • passing some input bytes but no output buffer results in "0 bytes" being consumed (despite those modes not producing any streamed output anyway -- they return require_entire_message() -> true). This is somewhat surprising but not catastrophic.
  • providing a "dummy" output buffer, however, results in the plaintext being copied into the output buffer. The current implementation fails to detect that cipher.update() actually calls .resize(0) on mbuf (for SIV) and blindly copies the plaintext bytes out of mbuf.data(). This might even be UB, but to the best of my knowledge .resize(0) doesn't actually de-allocate the vector's underlying memory. Nevertheless, the plaintext ending up in the "encryption output" buffer is bad enough. 😨

@reneme reneme removed this from the Botan 3.4.0 milestone Apr 4, 2024
@reneme reneme force-pushed the fix/performance_of_ffi_stream_ciphers branch from 0b432d9 to 5b83288 Compare April 4, 2024 19:43
@coveralls
Copy link

coveralls commented Apr 5, 2024

Coverage Status

coverage: 91.847% (+0.006%) from 91.841%
when pulling cda3843 on Rohde-Schwarz:fix/performance_of_ffi_stream_ciphers
into 9e6c6da on randombit:master.

@reneme reneme force-pushed the fix/performance_of_ffi_stream_ciphers branch from 5b83288 to d8413fc Compare April 5, 2024 07:59
@reneme
Copy link
Collaborator Author

reneme commented Apr 5, 2024

Rebased to master after the underlying pull requests were merged.

@reneme reneme force-pushed the fix/performance_of_ffi_stream_ciphers branch from d8413fc to 8b2f42d Compare April 8, 2024 06:47
@reneme
Copy link
Collaborator Author

reneme commented Apr 8, 2024

Force-pushed to add GPG signatures to the commit. I'm back at home with access to a USB-C to USB-A adapter for my YubiKey. 🤦‍♂️

Copy link
Owner

@randombit randombit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. This is a lot cleaner and easier to understand. Really demonstrates the benefits of scoped_cleanup and the span oriented slicers 👍

src/lib/ffi/ffi_cipher.cpp Outdated Show resolved Hide resolved
reneme added 2 commits May 23, 2024 16:38
Opportunistically uses the ideal update granularity to process bytes
and implements the finalization more robustly according to the
documented "rules of engagement" of the FFI. Particularly, it improves
the handling of a too small output buffer during cipher finalization.
@reneme reneme force-pushed the fix/performance_of_ffi_stream_ciphers branch from 8b2f42d to cda3843 Compare May 23, 2024 14:38
@reneme
Copy link
Collaborator Author

reneme commented May 23, 2024

Rebased to master and addressed the review comment.

@reneme
Copy link
Collaborator Author

reneme commented May 23, 2024

For the record: Some quick-and-dirty measurement with the tool @aahajj used initially show promising speedups especially for stream ciphers but also for the other cipher modes. See: #3925 (comment)

@randombit randombit added this to the Botan 3.5.0 milestone May 23, 2024
@reneme reneme merged commit 1644599 into randombit:master May 23, 2024
43 checks passed
@ni4
Copy link

ni4 commented May 29, 2024

@reneme This broke things for RNP AEAD implementation, as we relied on the previous behaviour where botan_cipher_update() for non-final chunk processes the whole chunk, writing out the equal size as well. Anywa, we'll fix that. The only question is whether it is correct to assume that calling (non-final) botan_cipher_update() for AEAD cipher will not return more data then it is provided by the input?

@reneme reneme deleted the fix/performance_of_ffi_stream_ciphers branch May 31, 2024 13:29
@reneme
Copy link
Collaborator Author

reneme commented May 31, 2024

This broke things for RNP AEAD implementation, as we relied on the previous behaviour where botan_cipher_update() for non-final chunk processes the whole chunk, writing out the equal size as well.

@ni4 Sorry about that!

For clarification (and for the record): Also previously, botan_cipher_update() might have left some input bytes untouched. Namely, when the number of input bytes wasn't a multiple of some internal update granularity. For stream-cipher-ish ciphers (like AES-GCM for instance) this internal granularity used to be "1 byte" inadvertently (resulting in poor performance). I.e. it used to always consume every byte.

@randombit Perhaps we should mention that gotcha in the changelog?

whether it is correct to assume that calling (non-final) botan_cipher_update() for AEAD cipher will not return more data then it is provided by the input?

Yes, it won't ever produce more bytes than provided as input. Unless during finalization to emit the AEAD's authentication tag. I clarified that in the documentation of the function as well: #4088.

Comment on lines +228 to +236
while(in.remaining() >= granularity && out.remaining_capacity() >= expected_output_per_iteration) {
copy_mem(mbuf, in.take(granularity));
const auto written_bytes = cipher.process(mbuf);
BOTAN_DEBUG_ASSERT(written_bytes == expected_output_per_iteration);
if(written_bytes > 0) {
BOTAN_ASSERT_NOMSG(written_bytes <= granularity);
copy_mem(out.next(written_bytes), std::span(mbuf).first(written_bytes));
}
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@randombit There's more room for optimization here. We could detect that the user passed the same buffer for input and output and omit the copy to the internal mbuf in that case.

@ni4
Copy link

ni4 commented May 31, 2024

For clarification (and for the record): Also previously, botan_cipher_update() might have left some input bytes untouched. Namely, when the number of input bytes wasn't a multiple of some internal update granularity. For stream-cipher-ish ciphers (like AES-GCM for instance) this internal granularity used to be "1 byte" inadvertently (resulting in poor performance). I.e. it used to always consume every byte.

@reneme We used the update granularity, and I believe that at some point in the past it was like 16 bytes or so. Probably at some point that changed.

Yes, it won't ever produce more bytes than provided as input. Unless during finalization to emit the AEAD's authentication tag. I clarified that in the documentation of the function as well: #4088.

Yeah, tag should not be a problem. We just use the same preallocated buffer to encrypt/decrypt (and it is the same for input and output, like you mentioned in the further comment) so it would be unwilling to do allocations during the call.

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Go faster
Projects
None yet
Development

Successfully merging this pull request may close these issues.

AES-CTR (and probably more stream ciphers) are slow when used via the FFI's Stream_Cipher mode.
4 participants