
Allow disable ratelimit quota handler #22314

Closed
clwluvw opened this issue Aug 12, 2023 · 16 comments

Comments

@clwluvw
Contributor

clwluvw commented Aug 12, 2023

Is your feature request related to a problem? Please describe.
In performance benchmarks, rateLimitQuotaWrapping has been observed to consume significant CPU time and add ~0.5ms of latency to each request.

Describe the solution you'd like
As with other handlers, make it possible to disable rateLimitQuotaWrapping entirely.

Describe alternatives you've considered
Bypass the wrapper's work when no rate limit is in place.

Additional context
[Screenshot 2023-08-13 at 01:47:38]
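
For illustration, a minimal sketch of what an opt-out could look like in the HTTP handler chain, assuming a hypothetical disableRateLimitQuotas flag (no such option exists in Vault today; the names below are illustrative, not Vault's actual code):

// Sketch only: conditionally skipping the quota middleware via a hypothetical flag.
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// rateLimitQuotaWrapping stands in for Vault's real wrapper; the real one
// evaluates rate limit quotas before delegating to the next handler.
func rateLimitQuotaWrapping(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// ... quota evaluation would happen here ...
		next.ServeHTTP(w, r)
	})
}

// wrapHandlers builds the chain and skips the quota wrapper entirely
// when the hypothetical disableRateLimitQuotas flag is set.
func wrapHandlers(inner http.Handler, disableRateLimitQuotas bool) http.Handler {
	h := inner
	if !disableRateLimitQuotas {
		h = rateLimitQuotaWrapping(h)
	}
	return h
}

func main() {
	inner := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { fmt.Fprint(w, "ok") })
	srv := httptest.NewServer(wrapHandlers(inner, true)) // wrapper disabled
	defer srv.Close()
	resp, err := http.Get(srv.URL)
	if err != nil {
		panic(err)
	}
	fmt.Println(resp.Status)
}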

@biazmoreira
Contributor

@clwluvw which backends did you make requests to in your performance benchmark? We are currently fixing a bug related to quotas increasing request latency, but it only happens for certain backends and certain types of quotas. I'm interested in hearing more about which test cases you've run.

biazmoreira self-assigned this Aug 31, 2023
@clwluvw
Contributor Author

clwluvw commented Aug 31, 2023

I'm not sure that what I see in the wrapper code relates to a specific backend, but my tests were run against the raft and etcd backends.
@biazmoreira Can you point me to the PR you think might fix it?

@biazmoreira
Contributor

@clwluvw sorry, I meant which auth method you were using to issue requests to Vault.

And the PRs: #22597, #22620, and #22583

We think those will fix the latency issues you were noticing.

@mpalmi
Contributor

mpalmi commented Aug 31, 2023

Note that with the fixes above, you'll still incur 1xRTT latency for logins when using auth methods that implement the ResolveRoleOperation.

There's one more fix in flight which should allow you to get parity with pre-1.12 latency for such cases (unless you have a role-based quota applied to the namespace/mount) by setting a config option:

#22651

@clwluvw
Contributor Author

clwluvw commented Aug 31, 2023

@biazmoreira I used token-based auth (basically the root token) for my requests. Do you still think those PRs would resolve the CPU time I saw in the wrapper?
I didn't have any rate limit config in place, so I'm surprised that ~0.5ms of latency is added to each request by a wrapper that ends up doing nothing...
From what I see in the code, quite a lot of work happens before the wrapper determines that no rate limit is configured and can hand the request off.
Introducing the config option I proposed in my PR could probably help in setups that do not expect to use any rate limits and want to save those microseconds of latency.
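
For illustration, a rough sketch of the kind of early exit that could short-circuit the wrapper when no rate limit quotas are configured at all; the HasRateLimitQuotas accessor is hypothetical and does not exist in Vault:

// Sketch only: fast path when no rate limit quotas are configured.
package sketch

import "net/http"

// quotaChecker stands in for the part of vault.Core the wrapper would
// consult; HasRateLimitQuotas is a hypothetical accessor.
type quotaChecker interface {
	HasRateLimitQuotas() bool
}

func rateLimitQuotaWrapping(handler http.Handler, core quotaChecker) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !core.HasRateLimitQuotas() {
			// Nothing to enforce: skip namespace/mount/role resolution
			// and the quota lookup entirely.
			handler.ServeHTTP(w, r)
			return
		}
		// ... the existing quota evaluation path would run here ...
		handler.ServeHTTP(w, r)
	})
}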

@biazmoreira
Contributor

@clwluvw Instead of bypassing the wrapper altogether, we pinpointed what in the wrapper was causing the increase in latency and fixed that. You can see that https://github.com/hashicorp/vault/pull/22597/files modifies the wrapper you mentioned in the PR description. We think this will fix the issue, though, as Mike mentioned, some latency may still occur in cases where role-based quotas exist and the auth method implements ResolveRoleOperation.

@biazmoreira
Contributor

Also, can you give us more details about the requests you used for the benchmark? Which auth methods, APIs, secret engines, etc.?

@clwluvw
Contributor Author

clwluvw commented Aug 31, 2023

I used the root token for authentication and the transit engine (https://developer.hashicorp.com/vault/api-docs/secret/transit#generate-data-key) to generate data keys. The key-generation latency itself was in the microseconds, but with the rate limit wrapper I always got more than 1ms... After disabling it (and rebuilding from source) I got sub-ms latency from the API. The CPU profiler showed the same thing...
With this PR (#22597), can you help me understand:

  1. Are we expecting a DB query on every request to check whether the path has a role-based quota (quota, err := txn.First(qType, indexNamespaceMount, req.NamespacePath, req.MountPath, false, true))?
  2. Are we expecting the wrapper to return here (https://github.com/hashicorp/vault/blob/main/vault/quotas/quotas.go#L789) if no rate limit is in place?

@biazmoreira
Contributor

"Are we expecting a DB query on every request to check whether the path has a role-based quota?"

No, we make a query in-memory for login requests specifically to see if the path has a role-based quota.
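
For context, the quota store lives in hashicorp/go-memdb, so a call like the txn.First quoted above is an in-memory index read rather than a storage round trip. A self-contained sketch of such a lookup (the schema and values are illustrative, not Vault's actual quota schema):

package main

import (
	"fmt"

	memdb "github.com/hashicorp/go-memdb"
)

type quota struct {
	Name, Namespace, Mount string
}

func main() {
	// Table with an "id" index plus a compound namespace+mount index,
	// roughly mirroring the indexNamespaceMount lookup quoted above.
	schema := &memdb.DBSchema{Tables: map[string]*memdb.TableSchema{
		"rate-limit-quota": {
			Name: "rate-limit-quota",
			Indexes: map[string]*memdb.IndexSchema{
				"id": {Name: "id", Unique: true,
					Indexer: &memdb.StringFieldIndex{Field: "Name"}},
				"namespace-mount": {Name: "namespace-mount",
					Indexer: &memdb.CompoundIndex{Indexes: []memdb.Indexer{
						&memdb.StringFieldIndex{Field: "Namespace"},
						&memdb.StringFieldIndex{Field: "Mount"},
					}}},
			},
		},
	}}
	db, err := memdb.NewMemDB(schema)
	if err != nil {
		panic(err)
	}
	txn := db.Txn(true)
	if err := txn.Insert("rate-limit-quota", &quota{Name: "q1", Namespace: "ns1/", Mount: "transit/"}); err != nil {
		panic(err)
	}
	txn.Commit()

	// Read-only transaction: an in-memory index lookup, no disk or network I/O.
	read := db.Txn(false)
	raw, err := read.First("rate-limit-quota", "namespace-mount", "ns1/", "transit/")
	if err != nil {
		panic(err)
	}
	fmt.Printf("found quota: %+v\n", raw)
}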

@biazmoreira
Contributor

Thanks for all the context, @clwluvw. We have just merged the last change that we think could fix the latency issue you're seeing. Would you be able to test once more with the latest commit from main?

@clwluvw
Contributor Author

clwluvw commented Sep 1, 2023

@biazmoreira I built from e1895d8 and, in addition, put a time measurement over the rateLimitQuotaWrapping func (up to the point where it calls handler.ServeHTTP(w, r)):

diff --git a/http/util.go b/http/util.go
index 19bdbf836c..f2be83350f 100644
--- a/http/util.go
+++ b/http/util.go
@@ -12,6 +12,7 @@ import (
 	"net"
 	"net/http"
 	"strings"
+	"time"
 
 	"github.com/hashicorp/vault/sdk/logical"
 
@@ -40,6 +41,7 @@ var (
 
 func rateLimitQuotaWrapping(handler http.Handler, core *vault.Core) http.Handler {
 	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		startTime := time.Now()
 		ns, err := namespace.FromContext(r.Context())
 		if err != nil {
 			respondError(w, http.StatusInternalServerError, err)
@@ -127,8 +129,8 @@ func rateLimitQuotaWrapping(handler http.Handler, core *vault.Core) http.Handler
 			return
 		}
 
+		core.Logger().Info("ratelimit time", "spend", time.Since(startTime).Seconds())
 		handler.ServeHTTP(w, r)
-		return
 	})
 }

and here are the sample times (in seconds):

2023-09-01T13:40:16.649Z [INFO]  core: ratelimit time: spend=0.00050588
2023-09-01T13:40:17.917Z [INFO]  core: ratelimit time: spend=0.000561915
2023-09-01T13:40:18.924Z [INFO]  core: ratelimit time: spend=0.000503233
2023-09-01T13:40:19.437Z [INFO]  core: ratelimit time: spend=0.000239324
2023-09-01T13:40:19.788Z [INFO]  core: ratelimit time: spend=0.00051691
2023-09-01T13:40:20.496Z [INFO]  core: ratelimit time: spend=0.000536714
2023-09-01T13:40:21.295Z [INFO]  core: ratelimit time: spend=0.000484667
2023-09-01T13:40:22.093Z [INFO]  core: ratelimit time: spend=0.000508987
2023-09-01T13:40:22.820Z [INFO]  core: ratelimit time: spend=0.000500172
2023-09-01T13:40:23.675Z [INFO]  core: ratelimit time: spend=0.000493217

Still, ~0.5ms is consumed by the wrapper.
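
As an aside, the wrapper's per-request cost can also be isolated from network and storage effects with a plain Go benchmark around a no-op inner handler; a minimal sketch (not part of the Vault test suite; the wrapper below is a stand-in you would replace with the real one):

// Sketch only: this would live in a *_test.go file; run with: go test -bench .
package wrapper

import (
	"net/http"
	"net/http/httptest"
	"testing"
)

// noopWrapper is a placeholder for rateLimitQuotaWrapping; swapping in the
// real wrapper lets the benchmark report its per-request overhead.
func noopWrapper(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		next.ServeHTTP(w, r)
	})
}

func BenchmarkWrapperOverhead(b *testing.B) {
	inner := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {})
	wrapped := noopWrapper(inner)
	req := httptest.NewRequest(http.MethodGet, "/v1/transit/datakey/plaintext/test", nil)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		rec := httptest.NewRecorder()
		wrapped.ServeHTTP(rec, req)
	}
}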

@clwluvw
Contributor Author

clwluvw commented Sep 7, 2023

@biazmoreira Do you have any thoughts on this? Would the PR I linked to this issue be good for such use cases?

@biazmoreira
Contributor

biazmoreira commented Sep 14, 2023

Hey @clwluvw, we would like to understand your issue more in depth. Could you give us more details about the use case in which this problem appears? Was this meant only as a performance benchmark, or are you seeing this latency in production on every request you issue? Any further insight into your use case would help.

Unfortunately, we won't be moving forward with the suggestion implemented in your PR. We would like to understand your use case a bit more before we consider other solutions to this specific problem.

One suggestion, though, is to configure rate_limit_exempt_paths when configuring rate limit quotas. Be aware that this configuration option overrides the default values and only takes absolute paths.
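
For reference, a sketch of setting that option through the sys/quotas/config endpoint with the Vault Go API client (the path list below is only an example; as noted above, it replaces the defaults):

package main

import (
	"fmt"
	"log"

	vault "github.com/hashicorp/vault/api"
)

func main() {
	// Assumes VAULT_ADDR and VAULT_TOKEN are set in the environment.
	client, err := vault.NewClient(vault.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}
	// Exempt the hot paths from rate limit quota evaluation; this list
	// overrides the built-in defaults, so include anything you still need.
	_, err = client.Logical().Write("sys/quotas/config", map[string]interface{}{
		"rate_limit_exempt_paths": []string{
			"sys/health",
			"transit/datakey/plaintext/my-key",
		},
	})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("quota config updated")
}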

Looking forward to hearing more from you about the use case. Let me know if there's anything else I can help with!

@clwluvw
Contributor Author

clwluvw commented Sep 14, 2023

@biazmoreira The issue appears on a default Vault installation (no rate limit configuration, authenticating with the root token, and regardless of which API is called; in my case it was the one mentioned above): the rate limit handler always consumes ~0.5ms. That becomes a problem for installations where microseconds matter; sometimes you can't simply ignore 0.5ms of overhead...

To reproduce it, apply the diff I pasted above, bring up a Vault instance (no HA, in-memory backend, no rate limits), issue a request to one of the APIs (preferably the one I mentioned), and look at the output. In my case I see 0.5ms consumed by the wrapper even though I haven't configured any rate limits anywhere.
So my question is why the latency of every request I issue should always be +0.5ms (i.e. I'm looking to reduce the effect of the rate limit wrapper, because I haven't configured any rate limits and don't plan to).

Can't the wrapper cost fewer microseconds, or be made configurable to disable (as there is already a disable option for one of the other wrappers)?

@akshya96
Contributor

We acknowledge your concerns and share the belief that we should minimize unneeded latency, especially in request hot paths. However, there are instances where substantial latency increases cannot be entirely avoided. In practice, our preference is to refrain from adding configuration options to disable features solely due to latency concerns, as it may set an undesirable precedent. Imagine if we were to follow this approach for every feature that introduces latency. One potential solution to address your latency issues without resorting to feature toggles is to explore the option of using rate_limit_exempt_paths. Have you considered this as an alternative to your proposed solution?
While the benchmarks provided in the issue demonstrate a hypothetical latency increase, it’s important to note that actual latency can vary significantly based on performance characteristics and the load within the environment. At present, due to our team’s roadmap and resource constraints, we are unable to allocate resources to address this issue. However, if a compelling real-world example arises where this problem significantly impacts users, we would be willing to adjust our priorities accordingly. Until then, we may need to defer further action on this matter, but rest assured that we are open to revisiting it should the need arise.

@heatherezell
Contributor

As per @akshya96's reply and our engineering methodologies, I'm going to go ahead and close this issue at this time. If needed, please feel free to re-open it.
