Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

s2n_random.c: memory leaks of per-thread rand state #3525

Closed
grrtrr opened this issue Sep 30, 2022 · 6 comments · Fixed by #3771
Closed

s2n_random.c: memory leaks of per-thread rand state #3525

grrtrr opened this issue Sep 30, 2022 · 6 comments · Fixed by #3771

Comments

@grrtrr
Copy link
Contributor

grrtrr commented Sep 30, 2022

Problem description

s2n assigns allocated memory to a per-thread data structure, which is not cleaned up by the thread that created it.

We discovered this issue (reproducible on each run) when using ASAN/LeakSanitizer.

Backtrace

=================================================================
==25334==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 152 byte(s) in 1 object(s) allocated from:
    #0 0x4e65ad in malloc /local/mnt/workspace/bcain_clang_bcain-ubuntu_7561/final/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:145
    #1 0x7fd51e02f4aa in OPENSSL_malloc external/boringssl/src/crypto/mem.c:157
    #2 0x7fd51df65fd1 in EVP_CIPHER_CTX_new external/boringssl/src/crypto/fipsmodule/cipher/cipher.c:76
    #3 0x7fd520e6def5 in s2n_drbg_instantiate external/aws-s2n/crypto/s2n_drbg.c:165
    #4 0x7fd52106816d in s2n_init_drbgs external/aws-s2n/utils/s2n_random.c:141
    #5 0x7fd521065f55 in s2n_ensure_initialized_drbgs external/aws-s2n/utils/s2n_random.c:151
    #6 0x7fd521065c59 in s2n_rand_init external/aws-s2n/utils/s2n_random.c:374
    #7 0x7fd5210579ad in s2n_init external/aws-s2n/utils/s2n_init.c:54
    #8 0x7fd52122dc00 in aws_tls_init_static_state external/aws-c-io/source/s2n/s2n_tls_channel_handler.c:185
    #9 0x7fd5211f6032 in aws_io_library_init external/aws-c-io/source/io.c:345
    #10 0x7fd5215ef3f8 in aws_mqtt_library_init external/aws-c-mqtt/source/mqtt.c:186
    #11 0x7fd5216feeb7 in Aws::Crt::s_initApi(aws_allocator*) external/aws-crt/source/Api.cpp:47
    #12 0x7fd5216fed8d in ApiHandle external/aws-crt/source/Api.cpp:60
    #13 0x7fd521a0a813 in Aws::Crt::ApiHandle* Aws::New<Aws::Crt::ApiHandle, aws_allocator*>(char const*, aws_allocator*&&) external/com_github_aws-sdk-cpp/aws-cpp-sdk-core/include/aws/core/utils/memory/AWSMemory.h:67
    #14 0x7fd521a0a0d5 in Aws::InitializeCrt() external/com_github_aws-sdk-cpp/aws-cpp-sdk-core/source/Globals.cpp:49
    #15 0x7fd521a0246e in Aws::InitAPI(Aws::SDKOptions const&) external/com_github_aws-sdk-cpp/aws-cpp-sdk-core/source/Aws.cpp:31
...

Direct leak of 152 byte(s) in 1 object(s) allocated from:
    #0 0x4e65ad in malloc /local/mnt/workspace/bcain_clang_bcain-ubuntu_7561/final/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:145
    #1 0x7fd51e02f4aa in OPENSSL_malloc external/boringssl/src/crypto/mem.c:157
    #2 0x7fd51df65fd1 in EVP_CIPHER_CTX_new external/boringssl/src/crypto/fipsmodule/cipher/cipher.c:76
    #3 0x7fd520e6def5 in s2n_drbg_instantiate external/aws-s2n/crypto/s2n_drbg.c:165
    #4 0x7fd521068028 in s2n_init_drbgs external/aws-s2n/utils/s2n_random.c:140
    #5 0x7fd521065f55 in s2n_ensure_initialized_drbgs external/aws-s2n/utils/s2n_random.c:151
    #6 0x7fd521065c59 in s2n_rand_init external/aws-s2n/utils/s2n_random.c:374
    #7 0x7fd5210579ad in s2n_init external/aws-s2n/utils/s2n_init.c:54
    #8 0x7fd52122dc00 in aws_tls_init_static_state external/aws-c-io/source/s2n/s2n_tls_channel_handler.c:185
    #9 0x7fd5211f6032 in aws_io_library_init external/aws-c-io/source/io.c:345
    #10 0x7fd5215ef3f8 in aws_mqtt_library_init external/aws-c-mqtt/source/mqtt.c:186
    #11 0x7fd5216feeb7 in Aws::Crt::s_initApi(aws_allocator*) external/aws-crt/source/Api.cpp:47
    #12 0x7fd5216fed8d in ApiHandle external/aws-crt/source/Api.cpp:60
    #13 0x7fd521a0a813 in Aws::Crt::ApiHandle* Aws::New<Aws::Crt::ApiHandle, aws_allocator*>(char const*, aws_allocator*&&) external/com_github_aws-sdk-cpp/aws-cpp-sdk-core/include/aws/core/utils/memory/AWSMemory.h:67
    #14 0x7fd521a0a0d5 in Aws::InitializeCrt() external/com_github_aws-sdk-cpp/aws-cpp-sdk-core/source/Globals.cpp:49
    #15 0x7fd521a0246e in Aws::InitAPI(Aws::SDKOptions const&) external/com_github_aws-sdk-cpp/aws-cpp-sdk-core/source/Aws.cpp:31
...

Indirect leak of 272 byte(s) in 1 object(s) allocated from:
    #0 0x4e65ad in malloc /local/mnt/workspace/bcain_clang_bcain-ubuntu_7561/final/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:145
    #1 0x7fd51e02f4aa in OPENSSL_malloc external/boringssl/src/crypto/mem.c:157
    #2 0x7fd51df66a88 in EVP_CipherInit_ex external/boringssl/src/crypto/fipsmodule/cipher/cipher.c:159
    #3 0x7fd51df6792a in EVP_EncryptInit_ex external/boringssl/src/crypto/fipsmodule/cipher/cipher.c:233
    #4 0x7fd520e6e2b2 in s2n_drbg_instantiate external/aws-s2n/crypto/s2n_drbg.c:175
    #5 0x7fd52106816d in s2n_init_drbgs external/aws-s2n/utils/s2n_random.c:141
    #6 0x7fd521065f55 in s2n_ensure_initialized_drbgs external/aws-s2n/utils/s2n_random.c:151
    #7 0x7fd521065c59 in s2n_rand_init external/aws-s2n/utils/s2n_random.c:374
    #8 0x7fd5210579ad in s2n_init external/aws-s2n/utils/s2n_init.c:54
    #9 0x7fd52122dc00 in aws_tls_init_static_state external/aws-c-io/source/s2n/s2n_tls_channel_handler.c:185
    #10 0x7fd5211f6032 in aws_io_library_init external/aws-c-io/source/io.c:345
    #11 0x7fd5215ef3f8 in aws_mqtt_library_init external/aws-c-mqtt/source/mqtt.c:186
    #12 0x7fd5216feeb7 in Aws::Crt::s_initApi(aws_allocator*) external/aws-crt/source/Api.cpp:47
    #13 0x7fd5216fed8d in ApiHandle external/aws-crt/source/Api.cpp:60
    #14 0x7fd521a0a813 in Aws::Crt::ApiHandle* Aws::New<Aws::Crt::ApiHandle, aws_allocator*>(char const*, aws_allocator*&&) external/com_github_aws-sdk-cpp/aws-cpp-sdk-core/include/aws/core/utils/memory/AWSMemory.h:67
    #15 0x7fd521a0a0d5 in Aws::InitializeCrt() external/com_github_aws-sdk-cpp/aws-cpp-sdk-core/source/Globals.cpp:49
    #16 0x7fd521a0246e in Aws::InitAPI(Aws::SDKOptions const&) external/com_github_aws-sdk-cpp/aws-cpp-sdk-core/source/Aws.cpp:31
...

Indirect leak of 272 byte(s) in 1 object(s) allocated from:
    #0 0x4e65ad in malloc /local/mnt/workspace/bcain_clang_bcain-ubuntu_7561/final/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:145
    #1 0x7fd51e02f4aa in OPENSSL_malloc external/boringssl/src/crypto/mem.c:157
    #2 0x7fd51df66a88 in EVP_CipherInit_ex external/boringssl/src/crypto/fipsmodule/cipher/cipher.c:159
    #3 0x7fd51df6792a in EVP_EncryptInit_ex external/boringssl/src/crypto/fipsmodule/cipher/cipher.c:233
    #4 0x7fd520e6e0d2 in s2n_drbg_instantiate external/aws-s2n/crypto/s2n_drbg.c:172
    #5 0x7fd521068028 in s2n_init_drbgs external/aws-s2n/utils/s2n_random.c:140
    #6 0x7fd521065f55 in s2n_ensure_initialized_drbgs external/aws-s2n/utils/s2n_random.c:151
    #7 0x7fd521065c59 in s2n_rand_init external/aws-s2n/utils/s2n_random.c:374
    #8 0x7fd5210579ad in s2n_init external/aws-s2n/utils/s2n_init.c:54
    #9 0x7fd52122dc00 in aws_tls_init_static_state external/aws-c-io/source/s2n/s2n_tls_channel_handler.c:185
    #10 0x7fd5211f6032 in aws_io_library_init external/aws-c-io/source/io.c:345
    #11 0x7fd5215ef3f8 in aws_mqtt_library_init external/aws-c-mqtt/source/mqtt.c:186
    #12 0x7fd5216feeb7 in Aws::Crt::s_initApi(aws_allocator*) external/aws-crt/source/Api.cpp:47
    #13 0x7fd5216fed8d in ApiHandle external/aws-crt/source/Api.cpp:60
    #14 0x7fd521a0a813 in Aws::Crt::ApiHandle* Aws::New<Aws::Crt::ApiHandle, aws_allocator*>(char const*, aws_allocator*&&) external/com_github_aws-sdk-cpp/aws-cpp-sdk-core/include/aws/core/utils/memory/AWSMemory.h:67
    #15 0x7fd521a0a0d5 in Aws::InitializeCrt() external/com_github_aws-sdk-cpp/aws-cpp-sdk-core/source/Globals.cpp:49
    #16 0x7fd521a0246e in Aws::InitAPI(Aws::SDKOptions const&) external/com_github_aws-sdk-cpp/aws-cpp-sdk-core/source/Aws.cpp:31

Analysis

These 4 allocations happen on the following per-thread structure:

// utils/s2n_random.c
static __thread struct s2n_rand_state s2n_per_thread_rand_state = {
    .cached_fork_generation_number = 0,
    .public_drbg = {0},
    .private_drbg = {0},
    .drbgs_initialized = false
};

The first 2 allocations of 152B each (for public/private_drbg) allocate the ctx:

// crypto/s2n_drbg.c
S2N_RESULT s2n_drbg_instantiate(struct s2n_drbg *drbg, struct s2n_blob *personalization_string, const s2n_drbg_mode mode)
{
//...
    drbg->ctx = EVP_CIPHER_CTX_new(); // <=== HERE
//...
}

The remaining 2 allocations result from initializations:

// crypto/s2n_drbg.c
S2N_RESULT s2n_drbg_instantiate(struct s2n_drbg *drbg, struct s2n_blob *personalization_string, const s2n_drbg_mode mode)
{
  // ...
    switch(mode) {
        case S2N_AES_128_CTR_NO_DF_PR:
            RESULT_GUARD_OSSL(EVP_EncryptInit_ex(drbg->ctx, EVP_aes_128_ecb(), NULL, NULL, NULL), S2N_ERR_DRBG); // <== L 172
            break;
        case S2N_AES_256_CTR_NO_DF_PR:
            RESULT_GUARD_OSSL(EVP_EncryptInit_ex(drbg->ctx, EVP_aes_256_ecb(), NULL, NULL, NULL), S2N_ERR_DRBG); // <== L175
            break;
        default:
            RESULT_BAIL(S2N_ERR_DRBG);
    }

A de-allocation function exists:

// utils/s2n_random.c
S2N_RESULT s2n_rand_cleanup_thread(void)
{
    /* Currently, it is only safe for this function to mutate the drbg states
     * in the per thread rand state. See s2n_ensure_uniqueness().
     */
    RESULT_GUARD(s2n_drbg_wipe(&s2n_per_thread_rand_state.private_drbg));
    RESULT_GUARD(s2n_drbg_wipe(&s2n_per_thread_rand_state.public_drbg));

    s2n_per_thread_rand_state.drbgs_initialized = false;

    return S2N_RESULT_OK;
}

The problem is that the allocation is per-thread, while the de-allocation is only in the main thread:

// utils/s2n_init.c
static bool s2n_cleanup_atexit_impl(void)
{
    /* all of these should run, regardless of result, but the
     * values to need to be consumed to prevent warnings */
    
    /* the configs need to be wiped before resetting the memory callbacks */
    s2n_wipe_static_configs();

    bool a = s2n_result_is_ok(s2n_rand_cleanup_thread()); // <=== HERE
    bool b = s2n_result_is_ok(s2n_rand_cleanup());
    bool c = s2n_mem_cleanup() == 0;

    return a && b && c;
}

Similar for aws-c-io/source/io.c:aws_io_library_clean_up() => aws_tls_clean_up_static_state():

// aws-c-io/source/s2n/s2n_tls_channel_handler.c
void aws_tls_clean_up_static_state(void) {
    s2n_cleanup();
}

Where s2n_cleanup is

// utils/s2n_init.c
int s2n_cleanup(void)
{
    /* s2n_cleanup is supposed to be called from each thread before exiting,
     * so ensure that whatever clean ups we have here are thread safe */
    POSIX_GUARD_RESULT(s2n_rand_cleanup_thread());  // <=== HERE

    /* If this is the main thread and atexit cleanup is disabled,
     * perform final cleanup now */
    if (pthread_equal(pthread_self(), main_thread) && !atexit_cleanup) {
        POSIX_ENSURE(s2n_cleanup_atexit_impl(), S2N_ERR_ATEXIT);
    }
    return 0;
}

Conclusion

The allocations to s2n_per_thread_rand_state are assigned to the thread-local storage where s2n_init_drbgs / s2n_ensure_initialized_drbgs are run. The corresponding clean-up happens from a different (main) thread, hence do not free the original allocations.

To illustrate, here is a call trace where the value of pthread_self() is printed (from the above run):

s2n_init() 139765995271936
s2n_rand_init() 139765995271936
s2n_init_drbgs() 139765995271936
s2n_drbg_instantiate() 3464492800
s2n_drbg_instantiate() 3464492800
...
aws_io_library_clean_up() 139766064971776
aws_tls_clean_up_static_state() => s2n_cleanup() 139766064971776
s2n_cleanup() 139766064971776
s2n_rand_cleanup_thread() 139766064971776
grrtrr added a commit to grrtrr/s2n-tls that referenced this issue Oct 3, 2022
s2n_init_drbgs allocates, via s2n_drbg_instantiate, dynamic memory to fields of a static,
thread-local struct. For this reason, the de-allocation/clean-up of these fields can only
happen through the same thread that created them. Else the allocations will leak.

Hence track these allocations as thread-specific data, using an allocation key and a
destructor function that gets called at thread exit.

This resolves aws#3525 (tested via LSAN sanitizer).
@lrstewart
Copy link
Contributor

Thanks for reaching out!

I think I'm missing something here. Are you / aws-c-io not calling s2n_cleanup from each thread? Or are you saying that allocations that should be cleaned up in s2n_rand_cleanup_thread are actually only being cleaned up in s2n_rand_cleanup?

@grrtrr
Copy link
Contributor Author

grrtrr commented Oct 3, 2022

I understand the intention, that s2n_cleanup() should be called at thread exit.

The current reality is that aws_io_library_clean_up() calls aws_tls_clean_up_static_state(), which calls s2n_cleanup().
This call chain is happening apparently from the main thread.

I have included a trace with pthread_self() ids at the end of the above, the call to s2n_cleanup() happens from a thread different from the one that initially called s2n_init_drbgs().

@jeking3
Copy link
Contributor

jeking3 commented Oct 3, 2022

I agree with the notion that if S2N allocates something in thread local storage, that it should register a thread exit handler to clean that up. It should not be the responsibility of the consumer in this case.

@jeking3
Copy link
Contributor

jeking3 commented Oct 3, 2022

I recently made changes in this area (#3512) and it was suggested (in #3446) that reference counting the calls to s2n_init and s2n_cleanup would allow the library to clean up better. I can see in that case, it would be possible for a different thread to call s2n_cleanup than the one that calls s2n_init and that would become acceptable, whereas with now s2n requires the main thread to be used in cleanup to actually cleanup. Would that be a better solution?

@jeking3
Copy link
Contributor

jeking3 commented Oct 4, 2022

Per current documentation, each consumer using S2N needs to call s2n_cleanup on each thread that uses S2N. The first thread that calls s2n_init is considered to be the "main thread" (whether it is the originating process thread is irrelevant). It performs global initialization, and also performs thread local storage initialization (stuff for rand gen). Each subsequent thread that needs it will also allocate thread local storage.

The consumer is responsible for ensuring s2n_cleanup is called on every thread. When s2n_cleanup is called on any thread but the first thread that called s2n_init (i.e., the "main thread"), thread local storage is cleaned up. When s2n_cleanup is called on the main thread, both thread local storage and global state are cleaned up. There's no requirement that these happen in any sort of order, they simply all need to happen.

Your issue does not sound like it is an S2N library issue, as S2N makes no attempt to handle cleanup on its own. Whatever is consuming the S2N library is not honoring the current API contract, if it calls a single s2n_init on one thread and then calls a single s2n_cleanup on another thread per your example above. Perhaps this issue needs to be taken up with aws_c_io, aws-cpp-sdk core, or perhaps your code is not using the same thread to initialize and clean up the SDK?

For my statement one post up about reference counting, that is unnecessary given the current behavior.

For my statement two posts up about the responsibility, I think it would be better if S2N used a thread exit handler to clean up the drbg that is creates in thread local storage from an API contract perspective, removing the burden from the consumer. However that's not what's there today.

@grrtrr
Copy link
Contributor Author

grrtrr commented Oct 4, 2022

@jeking3 - I traced this all the way back to the AWS C++ SDK where init/clean-up is done via Aws::InitAPI and Aws::ShutdownAPI. They are called from different threads (again a trace printing the value of pthread_self()):

Aws::InitAPI 140004202379008
# ... application code running
Aws::ShutdownAPI 140004272230400

Both Aws::InitAPI and Aws::ShutdownAPI have long call chains that eventually end up in s2n_init and s2n_cleanup, respectively.

As a consequence, the same constraints ("s2n_init/s2n_cleanup must be called from exactly the same thread") also hold true for InitAPI/ShutdownAPI. This is nowhere documented and - as this issue demonstrates - not easy to debug.

We ran into this in an application that initializes the SDK only if cloud calls are needed. As a consequence, we can not do dynamic initialization, due to the s2n constraint.

Question: is it at all possible to register a thread-exit cleanup handler to prevent external users (C++ SDK in this case) from running into this issue?

grrtrr added a commit to grrtrr/aws-sdk-cpp that referenced this issue Jan 28, 2023
…be called from the same thread

Running InitAPI() and ShutdownAPI() from different threads can cause subtle problems, see e.g.
- aws/s2n-tls#3525
- awslabs/aws-c-common#891

Add a note for users to use the same thread.
SergeyRyabinin pushed a commit to aws/aws-sdk-cpp that referenced this issue Jan 30, 2023
…be called from the same thread

Running InitAPI() and ShutdownAPI() from different threads can cause subtle problems, see e.g.
- aws/s2n-tls#3525
- awslabs/aws-c-common#891

Add a note for users to use the same thread.
jmklix pushed a commit to aws/aws-sdk-cpp that referenced this issue Aug 11, 2023
…be called from the same thread

Running InitAPI() and ShutdownAPI() from different threads can cause subtle problems, see e.g.
- aws/s2n-tls#3525
- awslabs/aws-c-common#891

Add a note for users to use the same thread.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants