-
Notifications
You must be signed in to change notification settings - Fork 562
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Ratelimit API #1767
Add Ratelimit API #1767
Conversation
😊 Welcome @hzxuzhonghu! This is either your first contribution to the Istio api repo, or it's been You can learn more about the Istio working groups, code of conduct, and contributing guidelines Thanks for contributing! Courtesy of your friendly welcome wagon. |
@hzxuzhonghu: The following test failed, say
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
@@ -631,3 +635,27 @@ message Certificate { | |||
// multiple DNS names. | |||
repeated string dns_names = 2; | |||
} | |||
|
|||
// RateLimitService describes the configuration for an external rate limit service provider. | |||
message RateLimitService { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As usual: do we expect the rate limit service to be the same in all namespaces and pods ? No possible use case where different namespaces would use different rate services ?
Why not add it to ProxyConfig instead, and use the existing pattern of RemoteService ?
If you want to keep it in MeshConfig - maybe the ExtensionProvider (which was just added) would be needed, but I don't think we have an approved design on using ExtensionProvider.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally, i think this should be global. It is easy to operate. I understand ProxyConfig is used to generate configs for proxy in proxy side. But rls is used in istiod. As ExtensionProvider
, it is only for external authz.
|
||
// The timeout for the rate limit service RPC. | ||
// If not set, this defaults to 20ms. | ||
google.protobuf.Duration timeout = 4; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For all services we use in Istio, we need to document how auth is going to be performed. Are all rate services using standard Istio mTLS or do we need to support other mechanisms ?
Also, is this service subject to normal Istio APIs - DestinationRule for example ? If yes ( and I assume it should ), do we want to duplicate timeout here ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The timeout here is from the application view(envoy->rls). DR can be used, but it will require an additional config
// The filter’s behaviour in case the rate limiting service does not respond back. | ||
// When it is set to true, Envoy will allow traffic in case of communication failure | ||
// between rate limiting service and the proxy. | ||
bool fail_open = 3; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Default bool value is false - i.e. fail close. I don't think that's correct for a rate service - I would rather have the setting be 'fail_closed', so the user needs to explicitly add it if he wants this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, this is the behavior of the legacy mixer.
// RateLimitService describes the configuration for an external rate limit service provider. | ||
message RateLimitService { | ||
// REQUIRED. Specifies the service that implements rate limit service. | ||
// The format is "[<Namespace>/]<Hostname>". The <Hostname> is the full qualified host name in the Istio service |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know we use this pattern in some configs, like Gateway, for a specific purpose ( delegation ).
Do we need this here ? If there is a strong use cases - I will request any implementation before we unhide this API will have test cases for each option. Maybe start with the simple solution first, of using a ServiceEntry/DestinationRule/etc in istio-system and not provide too much complexity of allowing arbitrary namespace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should not enforce it has to be in istio-system
.
|
||
// The following descriptor entry is appended to the descriptor: | ||
// `("generic_key", "<descriptor_value>")` | ||
message GenericKey{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this PR is introducing a lot of things that feel duplicated and quite complex. Don't we have other APIs that deal with header matching, etc ( in Policy, routing ) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is to allow specifying any key-val desctriptor entry and optional.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is better make input be the properties from request like machs instead of introducing a key-val pairs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both are useful. Generic value could be the route name for example.
string prefix = 5; | ||
|
||
// RE2 style regex-based match (https://github.com/google/re2/wiki/Syntax). | ||
string regex = 6; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Usual question about regex - what are the test plans ( I don't think the design doc includes enough detail ), and do we really need regex ? Perf impact, complexity, etc.
Even if Envoy supports it - we don't have to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regex provides much more flexibility, like uri regex match? Shouldn't we make it extensible?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think all new APIs must be added as hidden, and will only be unhidden when the implementation is ready ( including tests for anything that is getting added, in particular for beta APIs.
|
||
// A RateLimitDescriptor is a list of hierarchical entries that are used by the service to | ||
// determine the final rate limit key and overall allowed limit. Here are some examples of how | ||
// they might be used for the domain "envoy". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the examples are missing, also to be able to have test coverage we'll need a more explicit list of what is supported.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could global rate limit api provide the similar attribute such as max_tokens interval which defines the rate limit ? The current global rate limit api defines required input to Envoy's ratelimit server rather than the ratelimit rule. And it's hard for end ursers.
@@ -1273,6 +1397,53 @@ message RouteDestination { | |||
// version. If there is only one destination in a rule, all traffic will be | |||
// routed to it irrespective of the weight. | |||
int32 weight = 2; | |||
|
|||
// L4 local rate limiting policy. | |||
LocalRateLimit local_rate_limit = 3; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the goal to switch to BTS, and the likely situation where L4 rate limit will need to take into account attributes of the request - including TLS and metadata we add (telemetry for example): I would very much prefer to fold this into a single RateLimit, that applies for both HTTP and TCP.
We will need to document which headers/features are supported depending on connection type - but there are too many variations depending on TLS, transport, protocol to create a separate proto for each case - it seems cleaner to just treat all 'features' the same for http and TCP.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is confusing making TCP and HTTP together, they have many differences. For tcp, i donot think metadata can be rate limited with upstream envoy
int32 status_code = 1; | ||
|
||
// The token bucket configuration to use for rate limiting requests. | ||
TokenBucket token_bucket = 2 [(google.api.field_behavior) = REQUIRED]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the HTTP rate limit doesn't use any of the request attributes ? It seems identical with the L4, except the status code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, request attributes is not supported with envoy's local rate limit.
message HTTPLocalRateLimit { | ||
// StatusCode allows for a custom HTTP response status code to the downstream client | ||
// when the request has been rate limited. Defaults to 429 (TooManyRequests). | ||
int32 status_code = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use case ? 429 is a pretty standard response, why would we need another option ? It adds tests and complexity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, I can remove
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also since this is touching beta APIs that we'll have to support for a LONG time, we need again to figure out how to clearly document that the added fields are experimental/brand new and may change ( given how many and complex it looks it's likely ).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more thing: sorry for not reading the design carefully, I realize there is a major issue here, the VirtualService and model here is applied outbound - i.e. the rate service will have no effect if a client doesn't use Istio.
It is possible to apply the rate limit in Gateway for example, but that would be a different API ( i.e. if only gateway is supposed to do rate limit calls ).
We do need some way to enforce rate limits on inbound as well.
On Mon, Dec 7, 2020 at 4:47 AM Zhonghu Xu ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In networking/v1alpha3/virtual_service.proto
<#1767 (comment)>:
> + // If not set it defaults to ‘generic_key’ as the descriptor key.
+ string descriptor_key = 1;
+
+ // The value to use in the descriptor entry.
+ string descriptor_value = 2 [(google.api.field_behavior) = REQUIRED];
+ }
+}
+
+// HTTPLocalRateLimit used to rate limit the HTTP requests locally
+message HTTPLocalRateLimit {
+ // StatusCode allows for a custom HTTP response status code to the downstream client
+ // when the request has been rate limited. Defaults to 429 (TooManyRequests).
+ int32 status_code = 1;
+
+ // The token bucket configuration to use for rate limiting requests.
+ TokenBucket token_bucket = 2 [(google.api.field_behavior) = REQUIRED];
Yeah, request attributes is not supported with envoy's local rate limit.
From an API perspective, the question is if this is a fundamental
design/intentional choice, that needs
to be reflected in the API design - or just a limitation of the current
implementation.
I strongly suspect it's the later.
In addition, advanced rate limit ( as API ) may use a combination of local
token bucket and remote calls.
I think it is critical to agree that a Istio RateLimit API should be an
abstraction that allows multiple
implementations - including 'proxy-less' or non-envoy proxies, as well as
different implementations
inside envoy ( WASM, other styles ).
So I would also use a single proto to represent both rate limits, and maybe
add a section to the design
doc with the previous rate limit API we had ( when mixer provided an
implementation ), and maybe
an analysis of what was wrong with the old API and explain how we want to
improve.
I also strongly believe in "less is more" - so maybe we can start with a
smaller API, and later if
we find the need to have 4 we can add the others ( i.e. dedicated TCP pair,
and local/remote differentiation)
|
On Mon, Dec 7, 2020 at 4:20 AM Zhonghu Xu ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In mesh/v1alpha1/config.proto
<#1767 (comment)>:
> @@ -631,3 +635,27 @@ message Certificate {
// multiple DNS names.
repeated string dns_names = 2;
}
+
+// RateLimitService describes the configuration for an external rate limit service provider.
+message RateLimitService {
Generally, i think this should be global. It is easy to operate. I
understand ProxyConfig is used to generate configs for proxy in proxy side.
But rls is used in istiod. As ExtensionProvider , it is only for external
authz.
Why ? In a large org, MeshConfig is operated by the 'istiod admin team'. We
are trying to reduce the pressure
on the istiod team ( which may be an external vendor for 'managed istiod'
). Without Istio a user has the
ability to deploy its own redis ( or other rate limit service ) and
configure its workloads to use them without
requesting a central team to make changes ( that involve running operator
and in practice restart Istio - at least
with current istioctl or helm, since MeshConfig is overridden )
ExtensionProvider: I agree, it was designed for external authz and
unfortunately uses a generic and confusing name
that suggest that all external providers will be covered. To be fair, there
is a need for a common style and
design for all integrations - I don't think ExtensionProvider in MeshConfig
is the right one, in part for the reasons above
but there are far more issues. However telemetry team is attempting to use
it, and TOC may override my objections,
so I have to ask the question: if TOC decides ExtensionProvider is
required, this PR will also be affected.
|
On Mon, Dec 7, 2020 at 4:20 AM Zhonghu Xu ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In mesh/v1alpha1/config.proto
<#1767 (comment)>:
> @@ -631,3 +635,27 @@ message Certificate {
// multiple DNS names.
repeated string dns_names = 2;
}
+
+// RateLimitService describes the configuration for an external rate limit service provider.
+message RateLimitService {
Generally, i think this should be global. It is easy to operate. I
understand ProxyConfig is used to generate configs for proxy in proxy side.
But rls is used in istiod. As ExtensionProvider , it is only for external
authz.
MeshConfig.default is also gobal, and as easy to operate as fields in
MeshConfig ( yes, users have the ability to override - but they
don't have to).
ProxyConfig is intended for configs affecting the proxy - doesn't matter
who generates the configs. For example we
discussed the option to generate the bootstrap in istiod, and tracing which
is now in bootstrap is moving to proper
dynamic config.
I think we are confusing implementation details ( where some code happens
to run or how API is implemented ) with
the intent. For example if we wanted Istiod itself to make rate limit
calls - I would put it in MeshConfig, because it
would clearly not be a config of the proxies. But everything that affects
the Proxy - I would keep in ProxyConfig.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have 2 open design docs, and now a PR, for the same feature. Can we keep discussions in one place?
Agreed... it might be easier to just have a meeting and sort it out.. @hzxuzhonghu : what do you think? Docs in review: |
The meeting time is not friendly for my timezone(1am for me), can we make it like PST 17:00 |
@hzxuzhonghu : We are discussing, good time to meet.. will get back to you in a day or two.. |
We can discuss in the networking meeting, currently at 9 PST.
PST 17:00 sounds good too - maybe we should discuss moving the networking
meeting at this time if it is more convenient for
people. I don't think we have a lot of participants from Europe, but East
Coast people may be unhappy.
I am starting to lean towards simply defining separate protos, in dedicated
packages (rate.istio.io/v1alpha1) - with an
eye towards adopting external specs or converging with other APIs in this
space instead of istio-only - and moving
them out of istio.io space. Similar with what we plan for Gateway ( moving
towards K8S CR)
…On Mon, Dec 7, 2020 at 6:48 PM Zhonghu Xu ***@***.***> wrote:
The meeting time is not friendly for my timezone(1am for me), can we make
it like PST 17:00
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#1767 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAAUR2VRCQJWZFZWCFHPA53STWHYZANCNFSM4UKGNZ4Q>
.
|
I am all for this, what's more, we should provide a ratelimit server which can also consume this api, so users donot need to config twice(both in istio and the rate limit server side). Previously when we have mixer, this is natural. |
@costinm @gargnupur This is a separate API that is much direct and without coupling with VS. It can be applied on inbound/outbound based on users' choice. |
@hzxuzhonghu: PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
How is this pr going? Need this feature, too. |
@kyessenov @mandarjog @gargnupur @costinm @howardjohn I am not sure how to push this feature forward, I am starting working on it since last year, and it has been half year passed, and we haven't come to any agreement. Most users including us are using envoyfilter or designed a separate CRD, and translate it to envoyflter, which is very bad UX. cc @istio/technical-oversight-committee is this still on the roadmap? |
@hzxuzhonghu It's still we something we want. Besides the concerns around the low level details of actions/descriptors, we need to figure out how to decouple rate limit policy from networking APIs. |
@kyessenov You are absolutely quite right, with a separate API, maybe we could implement rate limit as a plugin as auth does. |
Is this obsolete by #2028 ? |
I think so, though it has some downsides |
Based on the design