Make gRPC response code configurable when rate limiting occurs #4735

nikolay-pshenichny · 2018-10-15T22:55:36Z

Current envoy implementation returns UNAVAILABLE if a gRPC call was rate-limited (rate limit service responded with OVER_LIMIT). This behavior matches the HTTP to gRPC Status Code Mapping , but may not always be what a "client" wants to see ( Google Cloud documentation suggests mapping 429 to RESOURCE_EXHAUSTED).

Use cases:

a client service needs to know more details in order to pick the right retry policy, or not retry the call at all.
a service owner needs to know more details about why a gRPC call failed.

Feature request details:

This feature request is to make the gRPC response code configurable for the scenarios when a gRPC call was rate-limited by Envoy on the server-side.

Suggested config option names: rate_limited_response_code or just response_code
Suggested choices: UNAVAILABLE (default) and RESOURCE_EXHAUSTED
Suggested location for the new config option: config.ratelimit.v2.RateLimitServiceConfig

Relevant Links:

The text was updated successfully, but these errors were encountered:

venilnoronha · 2018-10-18T18:40:25Z

I'd like to work on this.

ramaraochavali · 2018-10-19T03:33:52Z

@venilnoronha How are you planning to fix this? Are you planning to introduce the config in rate limiting service configuration for this specific status and return the status accordingly or are you planning to provide a higher level config that selects one of grpc code state mapping that applies to all services/status codes?

venilnoronha · 2018-10-22T23:05:39Z

I'm thinking in the direction of a generic status mapping approach, as there are a few statuses that map to UNAVAILABLE. See https://github.com/grpc/grpc/blob/master/doc/http-grpc-status-mapping.md.

lizan · 2018-10-23T01:52:34Z

Looks like gRPC C core does map 429 to RESOURCE_EXHAUSTED:
https://github.com/grpc/grpc/blob/master/src/core/lib/transport/status_conversion.cc#L82

But Java and Go does map 429 to UNAVAILABLE:
https://github.com/grpc/grpc-java/blame/master/core/src/main/java/io/grpc/internal/GrpcUtil.java#L305
https://github.com/grpc/grpc-go/blob/cfb96008516cbda549042f43fdd23befea331172/internal/transport/http_util.go#L91

This is confusing so I would recommend to double check with gRPC maintainers.

Aside from that, it might be useful anyway by adding a response status (HTTP/gRPC) in RateLimitResponse, so the rate limit service can override the status code Envoy send to downstream.

ramaraochavali · 2018-10-23T08:49:33Z

@lizan Yes, We should check with gRPC maintainers and sort that confusion. On the Envoy side implementation instead of doing specific status code mappings at specific services as needed, can we have couple of well known mappings and allow the user to choose one via CLI/bootstrap? This may be more generic than implementing some thing specific to rate limit. WDYT?

lizan · 2018-10-23T10:10:57Z

@ramaraochavali Given the only deviation is mapping of HTTP 429 Too Many Requests, I doubt there is a need for generic code mapping override. IMO making rate limit service respond custom status code is more usable in long term, just like we added header there before.

ramaraochavali · 2018-10-23T10:28:07Z

@lizan See the status mapping here in Envoy

envoy/source/common/grpc/status.cc

Line 6 in 1570696

Status::GrpcStatus Utility::httpToGrpcStatus(uint64_t http_response_status) {

. It maps currently maps 429, 502,503 and 504 to UNAVAILABLE as per https://github.com/grpc/grpc/blob/master/doc/http-grpc-status-mapping.md. However this https://cloud.google.com/apis/design/errors#generating_errors maps 504 to DEAD_LINE_EXCEEDED for example. Should we handle that difference as well?
Related issue #3619

What --- Change the returned gRPC code from `Unavailable` to `ResourceExhausted` when the user reaches concurrency limits. Why --- In https://gitlab.com/gitlab-com/gl-infra/production/-/issues/8056 and https://gitlab.com/gitlab-com/gl-infra/production/-/issues/8071 we saw a single user paging the on-call because of the high error rate. When we looked at the error rate it was because they were reaching [concurrency limits](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/8056#note_1174303820). Rate limiting a user is not an error for us but normal behavior just like a 429 HTTP status code. In https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/16844#note_1175622086 we looked into the best gRPC code to return, and `ResourceExhausted` was the best one where we could differeiencate between a real server error and a user error. This goes against the [grpc mapping](https://github.com/grpc/grpc/blob/master/doc/http-grpc-status-mapping.md) but also follows what the [envoy proxy does](envoyproxy/envoy#4735). Reference: https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/16844 Reference: https://gitlab.com/gitlab-org/gitaly/-/issues/4637 Signed-off-by: Steve Azzopardi <[email protected]>

mattklein123 added the enhancement label Oct 17, 2018

lizan assigned venilnoronha Oct 18, 2018

ramaraochavali mentioned this issue Oct 25, 2018

retry of rate limited requests #4855

Closed

venilnoronha mentioned this issue Oct 27, 2018

rate-limit: make 429 response mapping configurable #4879

Merged

mattklein123 closed this as completed in #4879 Nov 7, 2018

legendtang mentioned this issue Jun 21, 2019

Inconsistent error code conversion with gRPC cpp grpc/grpc-java#5916

Closed

BrendanGraham14 mentioned this issue May 26, 2022

Support rate_limited_as_resource_exhausted for local rate limit events #21461

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make gRPC response code configurable when rate limiting occurs #4735

Make gRPC response code configurable when rate limiting occurs #4735

nikolay-pshenichny commented Oct 15, 2018

venilnoronha commented Oct 18, 2018

ramaraochavali commented Oct 19, 2018

venilnoronha commented Oct 22, 2018

lizan commented Oct 23, 2018

ramaraochavali commented Oct 23, 2018

lizan commented Oct 23, 2018

ramaraochavali commented Oct 23, 2018

Make gRPC response code configurable when rate limiting occurs #4735

Make gRPC response code configurable when rate limiting occurs #4735

Comments

nikolay-pshenichny commented Oct 15, 2018

venilnoronha commented Oct 18, 2018

ramaraochavali commented Oct 19, 2018

venilnoronha commented Oct 22, 2018

lizan commented Oct 23, 2018

ramaraochavali commented Oct 23, 2018

lizan commented Oct 23, 2018

ramaraochavali commented Oct 23, 2018