
Performance/CPU usage regression in v0.40.0 #1033

Closed
6 tasks done
David-Wobrock opened this issue Nov 15, 2022 · 10 comments · Fixed by #1042

@David-Wobrock (Contributor)

Preflight checklist

Describe the bug

After upgrading from v0.39.4 to v0.40.0, we experienced an increase in CPU load on our Oathkeeper instances.
In our case, our Kubernetes pods were throttled and our response times at least doubled.

Here is the observed increase in CPU usage:
[screenshot: CPU usage graph]
From roughly 20-30 millicores to around 200 millicores.

Discussed in the Ory Slack: https://ory-community.slack.com/archives/C01340V8KSM/p1668072872425319

Reproducing the bug

We run v0.40.0, and our Oathkeeper config has three rules and one mutator:

  • oauth2_introspection, being the most used
  • cookie_session
  • bearer_token

The mutator injects an HTTP header.

Relevant log output

No response

Relevant configuration

No response

Version

v0.40.0

On which operating system are you observing this issue?

Linux

In which environment are you deploying?

Kubernetes

Additional Context

No response

@David-Wobrock added the bug label on Nov 15, 2022
@aeneasr (Member) commented Nov 15, 2022

Thank you for the report! I think this is for sure a regression introduced by the new config system. Maybe we're now unmarshalling JSON on every request, or something similar. Could you make a memory/CPU profile of Oathkeeper so that we can identify the culprit?
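For reference, a CPU profile of a Go service like Oathkeeper is usually collected through the standard pprof endpoints. The sketch below is a minimal, generic setup assuming a plain net/http/pprof server on an internal port; it is not Oathkeeper's actual profiling wiring, whose availability and address depend on how the binary is built and configured.

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers on http.DefaultServeMux
)

func main() {
	// Expose the profiling endpoints on an internal-only port.
	// A 30-second CPU profile can then be pulled with:
	//   go tool pprof "http://localhost:6060/debug/pprof/profile?seconds=30"
	// and a heap profile with:
	//   go tool pprof http://localhost:6060/debug/pprof/heap
	log.Fatal(http.ListenAndServe("localhost:6060", nil))
}
```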

@David-Wobrock (Contributor, Author) commented Nov 15, 2022

Hey @aeneasr
Indeed, it seems so. When serving a request (in my profiling, the /anything/header route from the local config):

In v0.39.4, we unmarshal JSON only once, in pipeline.mutate.Validate:
[screenshot: v0.39.4 CPU profile]

Whereas in v0.40.0, it seems that we unmarshal JSON four times, once under each of:

  • pipeline.authn.Authenticate
  • pipeline.authn.Validate
  • pipeline.mutate.Mutate
  • pipeline.mutate.Validate

[screenshot: v0.40.0 CPU profile]

Any idea what happened there? :)
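To make the suspected pattern concrete, the sketch below shows the kind of per-request round-trip such a profile typically points at: a generic configuration map is re-encoded to JSON and decoded into a typed handler config on every call, instead of being decoded once and reused. The names (handlerConfig, decodePerRequest) are hypothetical and only illustrate the pattern, not Oathkeeper's actual types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// handlerConfig stands in for the typed config of one pipeline handler.
type handlerConfig struct {
	Header string `json:"header"`
}

// decodePerRequest mimics the costly pattern: marshal the generic map to JSON
// and unmarshal it into the typed struct again on every single request.
func decodePerRequest(raw map[string]interface{}) (handlerConfig, error) {
	var cfg handlerConfig
	b, err := json.Marshal(raw) // repeated for every authenticator/mutator call
	if err != nil {
		return cfg, err
	}
	err = json.Unmarshal(b, &cfg)
	return cfg, err
}

func main() {
	raw := map[string]interface{}{"header": "X-User"}
	// Four decodes per request, mirroring the Authenticate, authn Validate,
	// Mutate and mutate Validate frames in the profile above.
	for i := 0; i < 4; i++ {
		cfg, _ := decodePerRequest(raw)
		fmt.Println(cfg.Header)
	}
}
```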

@aeneasr (Member) commented Nov 16, 2022

This is probably for @hperl to pick up. However, he's busy for the next 4-8 weeks. We will add this to our backlog, but can't promise an immediate remedy.

@daviddelucca (Contributor)

Hey there! @hperl @alnr

Any news on this? CPU usage has been insane since the bump to v0.40.0.

@aeneasr (Member) commented Jan 27, 2023

Unfortunately, there's no easy and straightforward fix for this. The profiling above shows where the problem is, and there is also a PR, but several edge cases would break with such a cache in place, in particular hot reloading.

I don't currently know what the appropriate solution is. Most likely it's improving the decoding in https://github.com/ory/oathkeeper/pull/1042/files#diff-cc3b4579387dc2abd0841eff38ec96aa687447a9bd2f1327b124e5678c443cd8R340 by using koanf built-ins instead of marshal/unmarshal.

But since we're OK with the CPU usage (even though it is significantly higher, it's still quite low), fixing this is currently not a priority on our side. We do, however, encourage PRs and ideas on how to resolve it!
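For illustration only, one way to avoid the per-request round-trip without giving up hot reloading is to cache the decoded struct keyed by the raw configuration bytes, so that a reload (which changes the bytes) automatically yields a fresh decode. The sketch below is an assumption-laden outline using only the standard library; it is not the approach merged in #1042, and it ignores eviction of stale entries and copying of the cached value.

```go
package main

import (
	"crypto/sha256"
	"encoding/json"
	"fmt"
	"sync"
)

// configCache caches decoded handler configs keyed by a hash of the raw
// config bytes. A hot reload changes the bytes, so it naturally misses the
// cache and decodes again. (A real implementation would also evict old
// entries and hand out copies instead of a shared pointer.)
type configCache struct {
	mu    sync.RWMutex
	items map[[32]byte]interface{}
}

func newConfigCache() *configCache {
	return &configCache{items: make(map[[32]byte]interface{})}
}

// decode returns the cached value for raw, decoding it with json.Unmarshal
// into a value produced by newVal only on the first miss.
func (c *configCache) decode(raw []byte, newVal func() interface{}) (interface{}, error) {
	key := sha256.Sum256(raw)

	c.mu.RLock()
	v, ok := c.items[key]
	c.mu.RUnlock()
	if ok {
		return v, nil
	}

	v = newVal()
	if err := json.Unmarshal(raw, v); err != nil {
		return nil, err
	}

	c.mu.Lock()
	c.items[key] = v
	c.mu.Unlock()
	return v, nil
}

// mutatorConfig is a hypothetical typed config for the header mutator.
type mutatorConfig struct {
	Header string `json:"header"`
}

func main() {
	cache := newConfigCache()
	raw := []byte(`{"header":"X-User"}`)
	for i := 0; i < 4; i++ {
		v, _ := cache.decode(raw, func() interface{} { return &mutatorConfig{} })
		fmt.Println(v.(*mutatorConfig).Header) // decoded once, reused on subsequent calls
	}
}
```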

@aeneasr (Member) commented Jan 27, 2023

OK, I double-checked and think I found a solution; I posted it here: #1042 (comment)

@aeneasr (Member) commented Jan 30, 2023

Merged, please check whether this resolves the issue!

@David-Wobrock (Contributor, Author)

Seems to work for us 👍
Looking forward to a release with this patch 🚀

@daviddelucca (Contributor)

Same here!

How long will it take to release a version with this patch?

@aeneasr (Member) commented Feb 15, 2023

We just merged another refactor to fix a pesky bug. We'll be testing that in our environment and, once we have confidence, do a release. The release policy for bigger changes is at least once every quarter :)
