Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Race condition with c_obj_create. #272

Open
ghost opened this issue Oct 4, 2023 · 6 comments
Open

Race condition with c_obj_create. #272

ghost opened this issue Oct 4, 2023 · 6 comments
Labels
enhancement New feature or request

Comments

@ghost
Copy link

ghost commented Oct 4, 2023

Hi,

I'm currently testing oqs-provider (from commit fab30c7) with OpenSSL 3.1 (from commit 9c20f5d) in a multi-threaded application.

In this multi-threaded application, I instantiate a OSSL_LIB_CTX per thread, and for each library context I load the default provider and the oqsprovider sequentially:

extern OSSL_provider_init_fn oqs_provider_init;

static int initialize_lib_ctx(OSSL_LIB_CTX *lib_ctx) {
  if (OSSL_PROVIDER_load(lib_ctx, "default") == NULL) {
    goto err;
  }
  if (OSS_PROVIDER_add_builtin(lib_ctx, "oqsprovider", oqs_provider_init) != 1) {
    goto err;
  }
  if (OSS_PROVIDER_load(lib_ctx, "oqsprovider") == NULL) {
    goto err;
  }
  …
  return OK;

err:
  …
}

(Note that I'm compiling oqs-provider as a static library, but it doesn't seem to be involved in this issue)

However, it sometimes fails with the following error:

error registering NID for dilithium2

I've tracked down the problem, and here is what I found: the oqs-provider init function calls the function pointer c_obj_create which happens to be crypto/provider_core.c:core_obj_create:

if (!c_obj_create(handle, oqs_oid_alg_list[i], oqs_oid_alg_list[i + 1],
oqs_oid_alg_list[i + 1])) {
ERR_raise(ERR_LIB_USER, OQSPROV_R_OBJ_CREATE_ERR);
fprintf(stderr, "error registering NID for %s\n",
oqs_oid_alg_list[i + 1]);
goto end_init;
}

By looking at the c_obj_create from OpenSSL, it makes a call to OBJ_create:

https://github.com/openssl/openssl/blob/9c20f5db0feaddc4c9ea4c4b2b07e6d87d6701f1/crypto/provider_core.c#L2127-L2133

OBJ_create verifies certain things, and eventually inserts the new NID/OID to a global list:

https://github.com/openssl/openssl/blob/831602922f19a8f39d0c0fae425b81e9ab402c69/crypto/objects/obj_dat.c#L774-L794

It seems that two threads may compete between the check in c_obj_create and the actual call to OBJ_create:

static int core_obj_create(const OSSL_CORE_HANDLE *prov, const char *oid,
                           const char *sn, const char *ln)
{
    /* Check if it already exists and create it if not */
    return OBJ_txt2nid(oid) != NID_undef             // Race condition here?
           || OBJ_create(oid, sn, ln) != NID_undef;
}

leading to the obvious error displayed above, because OBJ_create is going to complain that the OID already exists.

In this specific case, ERR_LIB_OBJ … OBJ_R_OID_EXISTS is thrown. I'm wondering if we could check for this error and ignore it if returned by c_obj_create. I've attached a patch fix_race_condition_obj_oid_exists.patch that may temporary solve the issue in oqsprov.c, and the following is a small PoC to reproduce the error:

PoC
#include <pthread.h>

#include <openssl/crypto.h>
#include <openssl/provider.h>

static const size_t N_THREADS = 32;

extern OSSL_provider_init_fn oqs_provider_init;

static void load_oqs_provider(OSSL_LIB_CTX* lib_ctx) {
  if (OSSL_PROVIDER_load(lib_ctx, "default") != NULL) {
    if (OSSL_PROVIDER_add_builtin(lib_ctx, "oqsprovider", oqs_provider_init) == 1) {
      if (OSSL_PROVIDER_load(lib_ctx, "oqsprovider") != NULL) {
      } else {
        putchar('-');
      }
    } else {
      putchar('+');
    }
  } else {
    putchar('@');
  }
}

static void *thread_create_ossl_lib_ctx(void *) {
  OSSL_LIB_CTX *lib_ctx = OSSL_LIB_CTX_new();
  load_oqs_provider(lib_ctx);
  OSSL_LIB_CTX_free(lib_ctx);
  return NULL;
}

int main() {
  pthread_t threads[N_THREADS];

start:
  for (size_t i = 0; i < N_THREADS; ++i) {
    pthread_create(threads + i, NULL, thread_create_ossl_lib_ctx, NULL);
  }

  for (size_t i = 0; i < N_THREADS; ++i) {
    pthread_join(threads[i], NULL);
  }

  goto start;
}

Here is the environment I used to build the PoC:

  • oqs-provider from commit fab30c7
  • OpenSSL 3.1 from commit 9c20f5d
  • oqs-provider is built using -DCMAKE_BUILD_TYPE=Release and -DOQS_PROVIDER_BUILD_STATIC=ON
  • OpenSSL is built using --debug (it also works with --release).

Note that I also encountered this bug with OpenSSL 3.2.

The question is: is it a bug from oqs-provider (is something wrongly done?) or from OpenSSL?
From my point of view, OpenSSL maintains a global (i.e. accessible from within all threads) list of OBJ, and this list should be self-contained in a OSSL_LIB_CTX object.

What do you think about this?

@baentsch
Copy link
Member

baentsch commented Oct 4, 2023

Well reading the documentation

Neither OBJ_create() nor OBJ_add_sigid() do any locking and are thus not thread safe. Moreover, none of the other functions should be called while concurrent calls to these two functions are possible.

seems to indicate this to be a well-known bug. And if it's documented it doesn't look like an easy one to fix for OpenSSL.

Thus, what about creating a mutex around the registration loop?

@ghost
Copy link
Author

ghost commented Oct 4, 2023

Well reading the documentation

Neither OBJ_create() nor OBJ_add_sigid() do any locking and are thus not thread safe. Moreover, none of the other functions should be called while concurrent calls to these two functions are possible.

seems to indicate this to be a well-known bug. And if it's documented it doesn't look like an easy one to fix for OpenSSL.

I feel stupid now… I haven't taken the time to read the doc of the what-I-called buggy function OBJ_create

Sorry for the noise here. Definitely not a bug in oqs-provider.

Thank you for your reply @baentsch

@ghost ghost closed this as completed Oct 4, 2023
@baentsch
Copy link
Member

baentsch commented Oct 4, 2023

Definitely not a bug in oqs-provider.

No -- but one that hits it and IMO is worth while considering (read: providing a work-around). My suggestion is to keep it open and resolve it. If the Mutex idea isn't too good, what about asking the OpenSSL team for other suggestions?

@baentsch baentsch reopened this Oct 4, 2023
@ghost
Copy link
Author

ghost commented Oct 5, 2023

if the Mutex idea isn't too good, what about asking the OpenSSL team for other suggestions?

I dont' have a opinion on this Mutex idea, but I do think it's worth asking the OpenSSL team!

@ghost
Copy link
Author

ghost commented Oct 5, 2023

Hmm, I was in the middle of writing an email to the OpenSSL mailing list when I saw something quite interesting: you mentioned the OpenSSl 3.0 man page of OBJ_create, which indeed mentions the fact that OBJ_create isn't thread safe:

BUGS

Neither OBJ_create() nor OBJ_add_sigid() do any locking and are thus not thread safe. Moreover, none of the other functions should be called while concurrent calls to these two functions are possible.

However, that BUG section in OpenSSL 3.1 man page of OBJ_create was replaced by the following statement in the NOTE section:

These functions were not thread safe in OpenSSL 3.0 and before.

Since I'm using OpenSSL 3.1 (and/or 3.2 alpha), I shouldn't encounter this bug, right?

@baentsch
Copy link
Member

baentsch commented Oct 5, 2023

Since I'm using OpenSSL 3.1 (and/or 3.2 alpha), I shouldn't encounter this bug, right?

Good catch -- sorry I didn't check different doc versions :-( Conceptually, I'd agree, you shouldn't encounter the issue based on this doc improvement.

But I fail to see how this code https://github.com/openssl/openssl/blob/9c20f5db0feaddc4c9ea4c4b2b07e6d87d6701f1/crypto/provider_core.c#L2127-L2133 can be thread-safe:

return OBJ_txt2nid(oid) != NID_undef
           || OBJ_create(oid, sn, ln) != NID_undef;

Thread 1 calls OBJ_txt2nid, gets suspended, thread 2 executes both statements (successfully) and thread 1 will be left with a (by now) incorrect result and fail (when calling OBJ_create based on the wrong retval of its prior invocation of OBJ_txt2nid).

In sum, I'd think this is a question to be asked to the OpenSSL community to help make the core better.

But whatever results from those discussions, this will not help with existing code (incl. OpenSSL3.0 installations), so we ought to also add your proposal (discarding the error message as per your patch -- possibly augmented by another call to OBJ_txt2nid to ascertain that the object in fact has been created) to oqsprovider. PR welcome :-)

ghost pushed a commit that referenced this issue Oct 24, 2023
See #272.

This commit adds a lock to run `c_obj_create` and `OBJ_create` thread-safely in
`OQS_PROVIDER_ENTRYPOINT_NAME`,
ghost pushed a commit that referenced this issue Oct 24, 2023
See #272.

This commit adds a lock to run `c_obj_create` and `OBJ_create` thread-safely in
`OQS_PROVIDER_ENTRYPOINT_NAME`,
ghost pushed a commit that referenced this issue Nov 2, 2023
See #272.

This commit adds a lock to run `c_obj_create` and `OBJ_create` thread-safely in
`OQS_PROVIDER_ENTRYPOINT_NAME`,
@baentsch baentsch added the enhancement New feature or request label May 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant