-
Notifications
You must be signed in to change notification settings - Fork 144
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Passphrase-based API of Secure Cell #577
Conversation
Add API definitions to <themis/secure_cell.h>. This is new public API for passphrase-secured data where we apply a *password* key-derivation function -- existing master key API uses ZRTP KDF which cannot be securely used with passwords.
Here are some basic tests that are going to hold for the new API. We verify a bunch of things: - ability to encrypt and decrypt data - reaction to invalid argument values - incompatibility of passphrase and master-key API - compatibility with older formats We currently do *not* verify that Secure Cell is able to detect data corruptions, invalid passphrases, etc. These tests will be added later. Note the stability test suite. Some of language wrappers have something like this but Themis Core generally did not verify that old data formats are still supported. These tests verify that we are able to decrypt data encrypted be previous versions of Themis. Passphrase API is going to evolve and use different KDF configurations, effectively changing the encryption algorithm. We need to be sure that we are still able to make sense of the data after we change the implementation. The tests also verify that we understand various AES key lengths that may be used by different Themis builds. Existing Secure Cell code for master key API handles them incorrectly. We'll fix it later. And while we're here, move ARRAY_SIZE utility into common header file so that it can be used across all tests.
Describe data structures used for Secure Cell authentication tokens. Here we use much more explicit method than the existing master key definitions. If we were to use the old approach, these structs would look like struct themis_scell_auth_token_passphrase { uint32_t alg; uint32_t iv_length; uint32_t auth_tag_length; uint32_t message_length; uint32_t kdf_context_length; }; // followed by IV data, auth tag data, KDF context struct themis_scell_pbkdf2_context { uint32_t iteration_count; uint16_t salt_length; }; // followed by salt data And then we'd go and cast byte buffers into this structs: (const struct themis_scell_seal_passphrase_header*)buffer buffer->iv_length; While these structure definitions are definitely more compact, 'parsing' code ends up scattered around the file and riddled with hard-to-reason pointer magic. Furthermore, such pointer casts do not handle endianness and pointer alignment correctly, leading to undefined behavior on some ARM systems (and warnings from undefined behavior sanitizer -- that we currently squelch because of such not-exactly-correct code). New code is much more verbose -- in this file -- but the 'end-user' code in Secure Cell implementation looks much cleaner and way easier to read with all this complexity hidden elsewhere. The compiler does good job at inlining all of the parsing code and it is as efficient as possible. (Though yeah, it is *not* zero-copy as it was before.) We put definitions in a private header file and make all parsing and serialization functions "static inline" so that they do not cause duplicate symbol errors and are always available to the compiler. All of these are internal helpers and they are not intended to be reused for other encryption modes or for any other purposes. They have rather idiosyncratic API that might appear weird at first. Let it sink in.
Similar to master key API, introduce internal helpers that actually implement encryption and leave only seal mode handling in secure_cell.c Just like with master key API, the helpers can be shared between Seal and Token Protect modes (though we do not implement Token Protect now). Seal mode has to do a bit of trickery to prepend the authentication token and to detect its length relative to the message for decryption. So it's mostly the same ideas, but there are some differences: - First of all, we use way fewer magical macros in the implementation. - THEMIS_FAIL is returned when encrypted message buffer size does not match the size expected from the header. Master key API returns THEMIS_INVALID_PARAMETER in this case which is not entirely correct. New API treats this situation as data corruption. THEMIS_INVALID_PARAMETER should be used to indicate programming errors, such as passing a NULL pointer for a required argument. - We also take care to not overwrite out-parameters for message length unless we are returning a 'success': either successful decryption indicated by THEMIS_SUCCESS, or a successful measurement of required output buffer size indicated by THEMIS_BUFFER_TOO_SMALL. Also note that the user may pass a buffer larger than necessary. For this we need to update the out pointer even in THEMIS_SUCCCESS case so that the user knows actual output size. Also, expose "plain" encryption functions used by master key API so that we can avoid dealing with Soter API intricacies and reuse the same code.
Implement Secure Cell (Seal) encryption with passphrases as described in Themis RFC 2. We use the same pattern as with master key code: - themis_auth_sym_encrypt_message_with_passphrase() does all parameter validation, returns expected output buffer size if requested, and hands off processing to... - themis_auth_sym_encrypt_message_with_passphrase_() which does all preparatory tasks for encryption: fills in the message header, generates derived key and ancillary data, calls encryption function, and then finally writes the header into output buffer - themis_auth_sym_plain_encrypt() does actual encryption using Soter. Note that we have to clear the KDF field in algorithm to prevent Soter from using its own PBKDF2 implementation. There are other points to note in the code: - Before writing output buffer we verify its size again, now that we really know all the component sizes. - We always wipe intermdidate data that we store on the stack. Derived key is never stored anywhere else. Other byte buffers are copied into output buffer.
Decryption is more complex that encryption. Here we need to tread veeeery carefully, treating input data as actively hostile. Function split is the same as with encryption. Notable points: - Derived key length depends on the algorithm stored in the message. Current implementation of master key API makes a mistake of using the default key length always. - Algorithm field has some reserved bits that are set to zero by our implementation. We check that they are indeed set to zero. - Since we support only PBKDF2 code, we can cheat a little and keep everything in one function. I guess it will have to rewritten once we migrate from PBKDF2. But it's good enough for now. - There are around 4 or 5 length checks in total for the output message because I really don't want to get a buffer overflow here. - Don't forget to wipe the derived key from memory. Always.
Do not use wording "password" or "passphrase" with master key API of Secure Cell. Instead, use either generate symmetric key, or something that looks like a key and does not look like a passphrase.
Okay, sure, clang-format, that makes it much readable.
return (alg & SOTER_SYM_KEY_LENGTH_MASK) / 8; | ||
} | ||
|
||
static inline bool soter_alg_reserved_bits_valid(uint32_t alg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's very cool to use such helpers and encapsulate hard to read logic into function with a human name
@@ -94,6 +94,123 @@ themis_status_t themis_secure_cell_decrypt_seal(const uint8_t* master_key, | |||
uint8_t* plain_message, | |||
size_t* plain_message_length); | |||
|
|||
/** | |||
* Encrypts and puts the provided message into a sealed cell. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cool functions documentation
encrypted_message + auth_token_length, | ||
&ciphertext_length); | ||
if (res == THEMIS_SUCCESS || res == THEMIS_BUFFER_TOO_SMALL) { | ||
*encrypted_message_length = auth_token_length + ciphertext_length; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reuse total_length
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is intentional because the themis_auth_sym_encrypt_message_with_passphrase() call might have modified those values. Of course, it does not modify them and accurately reports required lengths so we could reuse the value. But I'd leave that for future-proofing.
For all functions that return buffers, it's not strictly incorrect behavior to return slightly bigger required buffer size during measurement stage (BUFFER_TOO_SMALL) and then return accurate – and possibly smaller – actual size of the buffer used on SUCCESS. Currently we are able to accurately determine required length without actually processing the data, but it might not be the case in the future.
Add a "stream" prefix so that they are better associated with file streams and similar wStream utilities from WinPR. Unify read and write API to use in/out pointers in last position, with first argument always being the stream to modify, with its new value returned from the function. The behavior is the same, but the API is neater and easier to read.
Allow non-NULL pointer to associated context buffer when its length is set to zero. This plays nicely with languages that do not return NULL for their empty byte containers. (And it was not tested either...)
Encryption paths of master key and passphrase API has an unnecessary and weird check of auth tag length output. Strictly speaking, we should measure the size of tag but we just know that it's 12 bytes for AES-GCM. Tweak the pointer type in internal helper function so that it is able to directly write the auth tag length into the header struct. This way we don't have to check the length is unchanged. However, since Soter API outputs size_t, we do need to check that it's safe to truncate the value returned by Soter.
Move KDF context parsing into seprate helper function with takes in Secure Cell header and a passphrase, and outputs a derived key. This will make it easier to extend the decryption code path to handle other KDFs when we add them.
Similar to decryption, make KDF computation dependent on whatever we write into hdr->alg so that it stays consistent. This should make adding new KDFs later much easier: you only need to write a new function which computes derived key and add it to the KDF switch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
Woops, this is a good typo. clang-analyzer catches it, by the way.
This PR blocks a number of other PRs so I'm merging it right now with only one approval. Since most of the issues have been addressed and this PR has received some reviewer attention, I guess it's okay. @ignatk, @storojs72, if you have any concerns about these changes, please write them here, I'll address them separately. |
Implement passphrase-based API of Secure Cell in Themis Core:
The API is described in Themis RFC 2 (not available publicly at the moment). This pull request implements the API in Themis Core on which we’ll base implementations in all high-level language wrappers.
Main issue that is solved by passphrase API is inadeqacy of ZRTP KDF used by master key API when applied to human-readable passphrases with low entropy. Previously we suggested and allowed such use in high-level languages, however it is not secure enough. Master key API should be used only with randomly-generated binary keys. Our symmetric key generation API produces 32-byte-long keys.
New API can be safely used with human-readable strings containing way less entropy. This is made possible by using a proper password key derivation function — PBKDF2 — to generate intermediate key used for encryption. Currently we are using 200,000 iterations — the slowdown is not noticeable for single-time use but is frustrating enough for repeated computation during brute-force attacks.
Passphrase support is currently implemented only for Seal mode of Secure Cell. It can be extended to Token Protect mode in the future. Context Imprint mode will never support passphrases directly because it does not provide storage space for KDF context. However, we may provide separate KDF interface later which will make it possible to use passphrases with Context Imprint mode too.
The implementation is split into five main parts:
Define authentication token format
I’ve opted for more explicit approach to parsing and producing data formats. I believe that such approach is less error-prone and contains less magic. Though it is much more verbose that fast, direct, and incorrect casts to struct pointers that we are currently doing in master key API code.
Give it a close look as this will be our main sponsor of hidden segmentation faults.
Prepend authentication token in Seal mode
Token Protect mode keeps the token as a separate buffer while in Seal mode it is prepended to the encrypted message. We have to carefully gauge the output buffer size during encryption, and carefully parse input data during decryption.
Secure Cell encryption
It is pretty straightforward. There are some Soter quirks that we need to deal with, and there are a couple of restrictions on the input data that we need to honor.
Also note that we encrypt data using default KDF parameters which are fixed for Themis release. However, we may review and change them in future releases. Unless we introduce a new KDF algorithm, simple parameter tweaks are backwars and forward-compatible. With PBKDF2 we can adjust salt length (unlikely) and iteration count.
Secure Cell decryption
This code path needs close attention as here we have to deal with actively hostile user input. Separate data format parsing makes it easier to follow, but still be alert.
Tests, documentation comments, changelog updates, etc.
The tests are basic but they cover the most important points. New additions include explicit tests for all AES key lengths that we support. (Yes, I did fail to take that into account during development.)
As for what they lack, we currently do not verify that Secure Cell is able to detect data corruption, for example. I hope to not forget to add them later.
Have fun reviewing these 1.5k lines of code next week!
Checklist
Benchmark results are attached(coming soon)Example projects and code samples are updated(later)Address feedback:
stream_
prefix to parsing helpersauth_tag_length
comparisons