How do I set up RPS limit #670

jawlitkp · 2019-03-07T14:44:13Z

No description provided.

rramkumar1 · 2019-03-09T00:29:52Z

@jawlitkp FWIW, we currently, we set the RPS per backend to be 1 for both the InstanceGroup and NEG cases [1].

We can definitely surface this setting to users via the BackendConfig CRD [2] but note that the setting only makes sense when using Ingress w/ NEG's [3]. If you are using InstanceGroup's the setting does not behave how you would expect.

[1] https://github.com/kubernetes/ingress-gce/blob/master/pkg/backends/ig_linker.go#L62
[2] https://github.com/kubernetes/ingress-gce/blob/master/pkg/apis/backendconfig/v1beta1/types.go
[3] https://cloud.google.com/kubernetes-engine/docs/how-to/container-native-load-balancing

jskeet · 2019-03-16T20:46:11Z

I've manually updated the RPS in my GCE Load Balancer to 100 for each service. (Currently NEG is out of scope as my cluster doesn't have VPC enabled, and I'd rather not go down that path right now if I don't need to.)

Will that RPS configuration be "sticky", or will it be lost next time I update the ingress, e.g. to add another domain?

bowei · 2019-03-17T06:28:01Z

What is your use case is for setting an RPS limit? Traffic is double balanced coming into node then pod. Traffic will be balanced across nodes first. You will probably have worse balancing behavior with the limit than setting it to 1 (i.e. completely uniform spread across all nodes).

jskeet · 2019-03-17T06:34:18Z

Ah, I see. It was mostly just to remove the warning when looking at monitoring, to be honest.

lentzi90 · 2019-05-21T07:03:58Z

The documentation for the RPS limit is quite confusing when combined with how GKE is using it. If I understood the comments above correctly, it is not really a capacity, but rather a weight to make sure the load is spread evenly over all instances. This makes sense, but I think the documentation should also mention this. Maybe it should even be renamed and the warning removed from the cloud console. (I can open a new issue for this.)

Perhaps more severe is the following. What happens if we use multiple regions? With 1 RPS capacity, would the load be spread uniformly over all regions or would requests go to the closest region even if "usage is at capacity" (preferred since it is not really capacity)?

This seem to indicate that requests would actually go to other regions if there is no available capacity:

If the closest instances to the user have available capacity, then the request is forwarded to that closest set of instances.

Incoming requests to the given region are distributed evenly across all available backend services and instances in that region. However, at very small loads, the distribution may appear to be uneven.

If there are no healthy instances with available capacity in a given region, the load balancer instead sends the request to the next closest region with available capacity.

fejta-bot · 2019-08-19T07:37:51Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

lgelfan · 2019-08-25T00:51:37Z

We recently started testing GCE ingress / HTTP(s) LB as all our others are nginx-ingress and noticed this issue. Certainly would be good to clarify the MAX=1 (per instance) default and if there are use cases for increasing it to improve performance. I'm not sure how using sessionAffinity might affect the optimal setting. We are using NEG so I assume from the previous comments that's how it was set but it's not clear if increasing it would improve performance. We are using websockets for this app and see a fair amount of 502 errors so I was wondering if that might have anything to do with it. We increased the number of instances to more than double what it was for nginx-ingress setup and that did help. It increased the MAX=n value do to there being more instances per zone, but again, I don't know if it's the increased application capacity or that this (effectively) changed the RPS setting. While there were a few long-running connections (websocket and long-polling), overall the traffic was pretty low and the CPU and memory were minimal (Node.js app).
The backend-services output includes:

- balancingMode: RATE
  capacityScaler: 1.0
  group: https://www.googleapis.com/compute/v1/projects/xxx
  maxRatePerEndpoint: 1.0

I would be interested in any updates or clarification.

Jannis · 2019-10-02T12:46:30Z

We're using Kubernetes on Google Cloud, with the GCE ingress for load balancing. As far as I can see there is currently no way to configure the RPS per instance/group in BackendConfig. It would be fantastic if that was possible, so that you don't have to the Google Cloud Platform console to configure it.

Am I maybe missing something and configuring this is already possible through Kubernetes?

fejta-bot · 2019-12-31T13:04:54Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

fejta-bot · 2020-01-30T13:48:14Z

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

fejta-bot · 2020-02-29T14:30:16Z

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

k8s-ci-robot · 2020-02-29T14:30:24Z

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

mheffner · 2020-06-24T17:38:41Z

Was there a fix here? I see this got closed due to inactivity, but I didn't see a followup.

marcin-ptaszynski · 2020-07-01T13:59:27Z

Hi @rramkumar1 I've been using GKE with NEGs and so far I was able to manually set BackendService maxRatePerEndpoint and the setting persisted, but now it seems the setting is reverted back to 1 RPS/Endpoint by the controller after a few minutes. Is there any way to workaround this?

rramkumar1 · 2020-07-06T12:49:12Z

@marcin-ptaszynski This is not something that is surfaced today in any of our APIs.

cc @freehan to see if there are any workarounds

freehan · 2020-07-06T18:44:12Z

@marcin-ptaszynski Can you check if the backend service contains any beta or alpha feature? We recently fixed a bug where NEG linker will refresh the backends when there is alpha/beta feature enabled on backend-service. #1162 I think the fix will be included in the latest roll out in GKE rapid channel.

marcin-ptaszynski · 2020-07-08T12:24:11Z

@freehan , thank you. We're using grpc and our service annotations look like:

  annotations:
    beta.cloud.google.com/backend-config: '{"ports": {"http2":"grpc-backend-config"}}'
    cloud.google.com/app-protocols: '{"http2":"HTTP2"}'
    cloud.google.com/neg: '{"ingress": true}'

So it seems we're hitting all of beta features.

In any case, exposing MaxRPS config via BackendConfig CRD would be great to have as it (with managedcertificates operator) would allow automating full LB lifecycle from GKE.

williambrode · 2020-08-07T16:36:06Z

It's worth noting that if you can't change the MaxRPS then Session Affinity is virtually worthless. Session Affinity only works when the endpoints/instances aren't at max load. I might open a separate issue that references this for clarity on the user-facing issue.

fejta-bot · 2020-11-05T17:03:20Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

bowei · 2021-02-01T17:57:08Z

/reopen

k8s-ci-robot · 2021-02-01T17:57:19Z

@bowei: Reopened this issue.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

bowei · 2021-02-01T17:57:22Z

/remove-lifecycle rotten

grzegorz655 · 2021-03-03T07:30:03Z

This is still an issue any way for adding balancing-mode and max RPS to backendconfig with NEG?

bowei · 2021-03-03T07:33:28Z

Yes, we are looking at pulling it from our backlog, the most likely avenue is as an annotation on Service.

ivansenic · 2021-05-24T17:02:10Z

Can anyone clarify what Max RPS means in this context? Is it really the max request per second allowed to be executed against the back-end. This would be incredibly stupid?

However, we had a traffic spike during this weekend and indeed there were bunch of 502 most likely coming from the ingress. GKE backends were healthy all the time and were perfectly fine using almost no resources at all. No restarts as well. Can this be the reason? And what's the logic for setting this by default to one?

One of our services has at the moment 165 RPS and it labeled with The usage is at capacity with a nice ⚠️ sign. Why the hack are we paying for this if it's giving us such a ridiculous defaults? Increasing manually did remove the warning, but is not the way we want it. Fix this!

williambrode · 2021-05-24T17:10:40Z

@ivansenic no it doesn't mean it will refuse requests if you go beyond the limit. Its used for load balancing so if you have 2 endpoints in your NEG and MAX 10 RPS and one endpoint starts getting more than 10 RPS it will start routing requests to the other one. With 1 RPS on each endpoint it just means distribute the requests as evenly as possible.

If you use a global multi-cluster load balancer it will also sum the total RPS of a region and route to other regions if it goes beyond the max. So in that case you really don't want 1 RPS max because it would start routing your US-west traffic to your Australia cluster for example (I know from experience).

bowei · 2021-05-24T17:10:47Z

Max rps means try to fill backends up to max rps before spilling over; i.e. if you have 5 backends with max RPS / endpoint = 10, they will each get 10 rps before getting more. No requests will be dropped -- the only effect is on how the traffic is distributed.

ivansenic · 2021-05-24T18:12:45Z

Ok thanks sorry for confusion.

ben-xo · 2021-05-28T11:17:20Z

We have also hit this issue. Is there a way to attach a maxRatePerEndpoint to an ingress in GKE yet? (the issue is as above that each backend can do about 12 RPS, but when the pods are unevenly distributed by zone, google will round robin them as it doesn't know where there's free capacity, despite knowing that the backends are unevenly distributed).

williambrode · 2021-05-28T16:05:21Z

@ben-xo I'm confused what you mean? The loadbalancer should be able to directly balance to each pod in GKE - so I'm not sure why unevenly distributing by zone would be a problem?

If you are using the autoneg-controller
Then you set the maxRatePerEndpoint via the annotation on the service:

anthos.cft.dev/autoneg: '{"name":"autoneg_test", "max_rate_per_endpoint":1000}'

abrahammartin · 2021-07-12T15:16:11Z

Is there any ETA for this? We currently have to change this manually via the UI instead of via K8s manifest, which is far from ideal.

cparedes · 2021-09-09T17:19:09Z

Running into this too. Would love to be able to set this programmatically with either BackendConfig or a service annotation

k8s-triage-robot · 2021-12-08T17:54:55Z

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle stale
Mark this issue or PR as rotten with /lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot · 2022-01-07T18:16:01Z

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

ben-xo · 2022-01-24T17:28:09Z

/remove-lifecycle stale

ben-xo · 2022-01-24T17:29:09Z

/remove-lifecycle rotten

k8s-triage-robot · 2022-04-24T22:00:56Z

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle stale
Mark this issue or PR as rotten with /lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot · 2022-05-24T22:37:20Z

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

swetharepakula · 2022-06-14T22:46:58Z

We do not plan to add this feature to Ingress. This feature is available with the GKE Gateway API Implementation. Please use the Gateway API for this feature.

rramkumar1 changed the title ~~How do I set up RPS limit ,currently by default it is being set up to 100000000000000~~ How do I set up RPS limit Mar 11, 2019

rramkumar1 added the kind/feature Categorizes issue or PR as related to a new feature. label Mar 11, 2019

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 19, 2019

k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 25, 2019

spencerhance mentioned this issue Oct 8, 2019

Add v1.5.2, v1.6.0 and v1.7.0 to CHANGELOG.md #887

Merged

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 31, 2019

k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 30, 2020

k8s-ci-robot closed this as completed Feb 29, 2020

rramkumar1 reopened this Jul 6, 2020

rramkumar1 removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jul 6, 2020

k8s-ci-robot reopened this Feb 1, 2021

k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Feb 1, 2021

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 8, 2021

k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 7, 2022

k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jan 24, 2022

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 24, 2022

k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 24, 2022

swetharepakula closed this as not planned Won't fix, can't repro, duplicate, stale Jun 14, 2022

How do I set up RPS limit #670

How do I set up RPS limit #670

Comments

jawlitkp commented Mar 7, 2019

rramkumar1 commented Mar 9, 2019

jskeet commented Mar 16, 2019

bowei commented Mar 17, 2019

jskeet commented Mar 17, 2019

lentzi90 commented May 21, 2019

fejta-bot commented Aug 19, 2019

lgelfan commented Aug 25, 2019 • edited Loading

Jannis commented Oct 2, 2019

fejta-bot commented Dec 31, 2019

fejta-bot commented Jan 30, 2020

fejta-bot commented Feb 29, 2020

k8s-ci-robot commented Feb 29, 2020

mheffner commented Jun 24, 2020

marcin-ptaszynski commented Jul 1, 2020 • edited Loading

rramkumar1 commented Jul 6, 2020

freehan commented Jul 6, 2020

marcin-ptaszynski commented Jul 8, 2020

williambrode commented Aug 7, 2020

fejta-bot commented Nov 5, 2020

bowei commented Feb 1, 2021

k8s-ci-robot commented Feb 1, 2021

bowei commented Feb 1, 2021

grzegorz655 commented Mar 3, 2021

bowei commented Mar 3, 2021

ivansenic commented May 24, 2021

williambrode commented May 24, 2021

bowei commented May 24, 2021

ivansenic commented May 24, 2021

ben-xo commented May 28, 2021

williambrode commented May 28, 2021

abrahammartin commented Jul 12, 2021

cparedes commented Sep 9, 2021

k8s-triage-robot commented Dec 8, 2021

k8s-triage-robot commented Jan 7, 2022

ben-xo commented Jan 24, 2022

ben-xo commented Jan 24, 2022

k8s-triage-robot commented Apr 24, 2022

k8s-triage-robot commented May 24, 2022

swetharepakula commented Jun 14, 2022

lgelfan commented Aug 25, 2019 •

edited

Loading

marcin-ptaszynski commented Jul 1, 2020 •

edited

Loading