
Token authentication very slow > 5s #684

Closed
dwt opened this issue Nov 12, 2024 · 14 comments
dwt commented Nov 12, 2024

Describe the bug

When authenticating against an OIDC server written with Authlib 1.3.2, generating the token for the client is really slow. Depending on the runtime context this can take more than 5 s.


To Reproduce

import struct
import base64


from cryptography.hazmat.backends import default_backend
from cryptography.hazmat.primitives.asymmetric.rsa import (
    RSAPrivateNumbers, RSAPublicNumbers,
)

def urlsafe_b64decode(s):
    s += b'=' * (-len(s) % 4)
    return base64.urlsafe_b64decode(s)

def to_bytes(x, charset='utf-8', errors='strict'):
    if x is None:
        return None
    if isinstance(x, bytes):
        return x
    if isinstance(x, str):
        return x.encode(charset, errors)
    if isinstance(x, (int, float)):
        return str(x).encode(charset, errors)
    return bytes(x)


def base64_to_int(s):
    data = urlsafe_b64decode(to_bytes(s, charset='ascii'))
    buf = struct.unpack('%sB' % len(data), data)
    return int(''.join(["%02x" % byte for byte in buf]), 16)


obj = None  # JWK parameters taken from production; I can't reproduce them here, you will have to provide your own.


public_numbers = RSAPublicNumbers(
    base64_to_int(obj['e']), base64_to_int(obj['n']))

numbers = RSAPrivateNumbers(
    d=base64_to_int(obj['d']),
    p=base64_to_int(obj['p']),
    q=base64_to_int(obj['q']),
    dmp1=base64_to_int(obj['dp']),
    dmq1=base64_to_int(obj['dq']),
    iqmp=base64_to_int(obj['qi']),
    public_numbers=public_numbers)

print('before loading')
print(numbers.private_key(default_backend()))
print('after loading')
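As a side note, the `base64_to_int` helper in the reproduction (which mirrors the code path in Authlib) can be replaced by the stdlib `int.from_bytes`, which performs the same big-endian conversion without the struct/hex round-trip. A minimal sketch, not part of the original report:

```python
import base64


def urlsafe_b64decode(s: bytes) -> bytes:
    # Re-add the padding that JWK base64url encoding strips.
    s += b'=' * (-len(s) % 4)
    return base64.urlsafe_b64decode(s)


def base64_to_int(s: bytes) -> int:
    # Equivalent to the struct.unpack / hex-join version above.
    return int.from_bytes(urlsafe_b64decode(s), 'big')


# 'AQAB' is the standard base64url encoding of the RSA exponent 65537.
print(base64_to_int(b'AQAB'))
```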

When running this on my ARM machine (Apple Silicon) it is fast, as in < 0.2 s. On a Xeon virtual machine running in Podman, however, it takes > 5 s when executed via the code path in rsa_key.py.

The strange thing is that this reproduction in isolation is fast there too (< 0.2 s), but when run in the full project, the call to numbers.private_key() reliably takes more than 5 seconds.

Expected behavior

numbers.private_key() should always be fast.

Environment:

  • OS: Ubuntu 24.04.1 (host), Debian 12 latest (in docker)
  • Python Version: 3.12.7.
  • Authlib Version: 1.3.2

Additional context

I do not understand what kind of difference in the environment causes this, but the behavior feels really weird, especially since I can reliably reproduce it when running inside the application, but not when running the above reproduction independently (inside the same container on the same system).

Inspired by this bug in cryptography, I tried downgrading to cryptography < 37, and that restores the missing speed in the application.

This is my current workaround, but of course pinning to such an old release (which bundles OpenSSL 1.1.1) is a bad idea long term.

@dwt dwt added the bug label Nov 12, 2024

dwt commented Nov 19, 2024

Friendly ping @lepture

Is there any context I can provide? Do you have any hints for me what could cause this behavior? I would be happy to do further debugging to track this down.


dwt commented Dec 10, 2024

Another friendly ping @lepture

This behavior is really bugging me. I would love to get some guidance on how to debug this further.


bjmc commented Dec 10, 2024

If the behavior varies depending on the silicon and is affected by the version of cryptography then I'd guess the issue is lower down the stack than authlib. Could it be something like the prod environment doesn't have enough entropy available so it takes a really long time to accumulate sufficient randomness?

You might try generating a flamegraph to see where it's hanging. My gut feeling is this is not a bug in authlib, but I could be wrong.
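Short of a full flamegraph, a cProfile report sorted by cumulative time would already show which frame the 5 s sits in. A stdlib-only sketch (the function being profiled is whatever wraps the slow `private_key()` call in the application):

```python
import cProfile
import io
import pstats


def profile_call(fn, *args, **kwargs):
    # Runs fn under cProfile and returns (result, report); the top
    # entries of the report show where the time is actually spent.
    prof = cProfile.Profile()
    result = prof.runcall(fn, *args, **kwargs)
    out = io.StringIO()
    pstats.Stats(prof, stream=out).sort_stats("cumulative").print_stats(10)
    return result, out.getvalue()
```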


lepture commented Dec 13, 2024

This does not seem like something Authlib can solve.


dwt commented Dec 13, 2024

I like the suggestion about entropy, as that is something I have wondered about too. Do you by chance have any suggestions for the best way to check this? I thought that Docker provides the kernel's randomness inside containers too (since the kernel is shared). Do you happen to know if this is not the case?
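On Linux the kernel's entropy estimate can be read directly from procfs, and since containers share the host kernel, the value inside the container should match the host. A small sketch to check both sides:

```python
import os


def entropy_available():
    # Linux exposes the kernel's entropy-pool estimate here; on 5.18+
    # kernels it is effectively always 256. Returns None on systems
    # without this proc file (e.g. macOS).
    path = "/proc/sys/kernel/random/entropy_avail"
    try:
        with open(path) as f:
            return int(f.read().strip())
    except OSError:
        return None


print(entropy_available())
```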


dwt commented Dec 13, 2024

I will look into the flame graph; that could help. However, the runtime environment is kind of hard to reach, so I'm not yet sure whether that is really possible.


dwt commented Dec 13, 2024

@lepture Regarding what could be done by Authlib: perhaps reload and verify the key from disk less frequently (currently this happens on every request). That would probably help, as the key does not need to be re-verified for each request.


lepture commented Dec 14, 2024

@dwt is it related to Docker? If you run outside Docker, is it still that slow?


dwt commented Dec 14, 2024

This is running in Podman, outside of Podman it seemed fast.


lepture commented Jan 18, 2025

Do you have to generate the RSA key every time? Can you cache the key like:

_cached_keys = {}

def get_rsa_key(name: str):
    # serve the cached key if we already loaded it
    if name in _cached_keys:
        return _cached_keys[name]
    key = _generate_rsa_key(name)  # expensive: triggers key validation
    _cached_keys[name] = key
    return key


lepture commented Jan 18, 2025

According to cryptography

The cause of this is almost certainly our upgrade to OpenSSL 3.0, which has a new algorithm for checking the validity of RSA private keys, which is known to be much slower for valid RSA keys.

Unfortunately we're in between a rock and a hard place and don't have a way to avoid this cost without silently accepting invalid keys which can have all sorts of negative consequences.
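For completeness: cryptography 41.0 added an explicit opt-out for exactly this cost. It is only appropriate for key material from a trusted source (e.g. your own key file on disk), because invalid keys are then accepted silently, hence the `unsafe_` prefix. A sketch, guarded so it degrades when cryptography is unavailable:

```python
try:
    from cryptography.hazmat.primitives.serialization import (
        load_pem_private_key,
    )

    def load_trusted_key(pem_bytes: bytes):
        # Skips the slow OpenSSL 3 consistency check added for RSA keys;
        # only safe because we trust the source of this PEM data.
        return load_pem_private_key(
            pem_bytes,
            password=None,
            unsafe_skip_rsa_key_validation=True,
        )
except ImportError:  # cryptography not installed
    load_trusted_key = None
```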

@lepture lepture added wontfix and removed bug labels Jan 18, 2025

lepture commented Jan 18, 2025

Closed, since we can't fix it in Authlib.

@lepture lepture closed this as completed Jan 18, 2025

dwt commented Jan 19, 2025

Thanks for trying!


dwt commented Jan 19, 2025

Regarding the caching: I would like to, but these keys are read from the filesystem on each request, and each load is slow because of the validation check in cryptography.

Caching them so they are only re-read when they change on disk would probably solve this problem perfectly well. That might be something Authlib could actually do to speed this up.
