Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Ratelimit API #1767

Closed
wants to merge 2 commits into from
Closed

Add Ratelimit API #1767

wants to merge 2 commits into from

Conversation

hzxuzhonghu
Copy link
Member

Based on the design

@istio-policy-bot
Copy link

😊 Welcome @hzxuzhonghu! This is either your first contribution to the Istio api repo, or it's been
awhile since you've been here.

You can learn more about the Istio working groups, code of conduct, and contributing guidelines
by referring to Contributing to Istio.

Thanks for contributing!

Courtesy of your friendly welcome wagon.

@google-cla google-cla bot added the cla: yes Set by the Google CLA bot to indicate the author of a PR has signed the Google CLA. label Dec 2, 2020
@istio-testing istio-testing added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Dec 2, 2020
@istio-testing
Copy link
Collaborator

@hzxuzhonghu: The following test failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
release-notes_api ab20dd7 link /test release-notes_api

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@hzxuzhonghu
Copy link
Member Author

@@ -631,3 +635,27 @@ message Certificate {
// multiple DNS names.
repeated string dns_names = 2;
}

// RateLimitService describes the configuration for an external rate limit service provider.
message RateLimitService {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As usual: do we expect the rate limit service to be the same in all namespaces and pods ? No possible use case where different namespaces would use different rate services ?

Why not add it to ProxyConfig instead, and use the existing pattern of RemoteService ?

If you want to keep it in MeshConfig - maybe the ExtensionProvider (which was just added) would be needed, but I don't think we have an approved design on using ExtensionProvider.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally, i think this should be global. It is easy to operate. I understand ProxyConfig is used to generate configs for proxy in proxy side. But rls is used in istiod. As ExtensionProvider , it is only for external authz.


// The timeout for the rate limit service RPC.
// If not set, this defaults to 20ms.
google.protobuf.Duration timeout = 4;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For all services we use in Istio, we need to document how auth is going to be performed. Are all rate services using standard Istio mTLS or do we need to support other mechanisms ?

Also, is this service subject to normal Istio APIs - DestinationRule for example ? If yes ( and I assume it should ), do we want to duplicate timeout here ?

Copy link
Member Author

@hzxuzhonghu hzxuzhonghu Dec 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The timeout here is from the application view(envoy->rls). DR can be used, but it will require an additional config

// The filter’s behaviour in case the rate limiting service does not respond back.
// When it is set to true, Envoy will allow traffic in case of communication failure
// between rate limiting service and the proxy.
bool fail_open = 3;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Default bool value is false - i.e. fail close. I don't think that's correct for a rate service - I would rather have the setting be 'fail_closed', so the user needs to explicitly add it if he wants this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, this is the behavior of the legacy mixer.

// RateLimitService describes the configuration for an external rate limit service provider.
message RateLimitService {
// REQUIRED. Specifies the service that implements rate limit service.
// The format is "[<Namespace>/]<Hostname>". The <Hostname> is the full qualified host name in the Istio service
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know we use this pattern in some configs, like Gateway, for a specific purpose ( delegation ).

Do we need this here ? If there is a strong use cases - I will request any implementation before we unhide this API will have test cases for each option. Maybe start with the simple solution first, of using a ServiceEntry/DestinationRule/etc in istio-system and not provide too much complexity of allowing arbitrary namespace.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not enforce it has to be in istio-system.


// The following descriptor entry is appended to the descriptor:
// `("generic_key", "<descriptor_value>")`
message GenericKey{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this PR is introducing a lot of things that feel duplicated and quite complex. Don't we have other APIs that deal with header matching, etc ( in Policy, routing ) ?

Copy link
Member Author

@hzxuzhonghu hzxuzhonghu Dec 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to allow specifying any key-val desctriptor entry and optional.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is better make input be the properties from request like machs instead of introducing a key-val pairs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both are useful. Generic value could be the route name for example.

string prefix = 5;

// RE2 style regex-based match (https://github.com/google/re2/wiki/Syntax).
string regex = 6;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usual question about regex - what are the test plans ( I don't think the design doc includes enough detail ), and do we really need regex ? Perf impact, complexity, etc.

Even if Envoy supports it - we don't have to.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regex provides much more flexibility, like uri regex match? Shouldn't we make it extensible?

Copy link
Contributor

@costinm costinm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think all new APIs must be added as hidden, and will only be unhidden when the implementation is ready ( including tests for anything that is getting added, in particular for beta APIs.


// A RateLimitDescriptor is a list of hierarchical entries that are used by the service to
// determine the final rate limit key and overall allowed limit. Here are some examples of how
// they might be used for the domain "envoy".
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the examples are missing, also to be able to have test coverage we'll need a more explicit list of what is supported.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could global rate limit api provide the similar attribute such as max_tokens interval which defines the rate limit ? The current global rate limit api defines required input to Envoy's ratelimit server rather than the ratelimit rule. And it's hard for end ursers.

@@ -1273,6 +1397,53 @@ message RouteDestination {
// version. If there is only one destination in a rule, all traffic will be
// routed to it irrespective of the weight.
int32 weight = 2;

// L4 local rate limiting policy.
LocalRateLimit local_rate_limit = 3;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the goal to switch to BTS, and the likely situation where L4 rate limit will need to take into account attributes of the request - including TLS and metadata we add (telemetry for example): I would very much prefer to fold this into a single RateLimit, that applies for both HTTP and TCP.

We will need to document which headers/features are supported depending on connection type - but there are too many variations depending on TLS, transport, protocol to create a separate proto for each case - it seems cleaner to just treat all 'features' the same for http and TCP.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is confusing making TCP and HTTP together, they have many differences. For tcp, i donot think metadata can be rate limited with upstream envoy

int32 status_code = 1;

// The token bucket configuration to use for rate limiting requests.
TokenBucket token_bucket = 2 [(google.api.field_behavior) = REQUIRED];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the HTTP rate limit doesn't use any of the request attributes ? It seems identical with the L4, except the status code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, request attributes is not supported with envoy's local rate limit.

message HTTPLocalRateLimit {
// StatusCode allows for a custom HTTP response status code to the downstream client
// when the request has been rate limited. Defaults to 429 (TooManyRequests).
int32 status_code = 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use case ? 429 is a pretty standard response, why would we need another option ? It adds tests and complexity.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, I can remove

Copy link
Contributor

@costinm costinm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also since this is touching beta APIs that we'll have to support for a LONG time, we need again to figure out how to clearly document that the added fields are experimental/brand new and may change ( given how many and complex it looks it's likely ).

Copy link
Contributor

@costinm costinm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more thing: sorry for not reading the design carefully, I realize there is a major issue here, the VirtualService and model here is applied outbound - i.e. the rate service will have no effect if a client doesn't use Istio.

It is possible to apply the rate limit in Gateway for example, but that would be a different API ( i.e. if only gateway is supposed to do rate limit calls ).

We do need some way to enforce rate limits on inbound as well.

@costinm
Copy link
Contributor

costinm commented Dec 7, 2020 via email

@costinm
Copy link
Contributor

costinm commented Dec 7, 2020 via email

@costinm
Copy link
Contributor

costinm commented Dec 7, 2020 via email

Copy link
Member

@howardjohn howardjohn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have 2 open design docs, and now a PR, for the same feature. Can we keep discussions in one place?

@gargnupur
Copy link
Contributor

gargnupur commented Dec 7, 2020

We have 2 open design docs, and now a PR, for the same feature. Can we keep discussions in one place?

Agreed... it might be easier to just have a meeting and sort it out.. @hzxuzhonghu : what do you think?
Looks like networking group sync up is coming this week, we can use time after that?

Docs in review:

  1. https://docs.google.com/document/d/1628LHcwuCvTRFhk8rsQKQUmgCxnY6loPFSxliADQDIc/edit?ts=5fc57f4f#heading=h.ob0qozngww1j
  2. https://docs.google.com/document/d/1N4eeCuBme6ljXDlnBjObHRuOLhLcawkV7pP7TQDiO64/edit#heading=h.xw1gqgyqs5b

@hzxuzhonghu
Copy link
Member Author

The meeting time is not friendly for my timezone(1am for me), can we make it like PST 17:00

@gargnupur
Copy link
Contributor

@hzxuzhonghu : We are discussing, good time to meet.. will get back to you in a day or two..

@costinm
Copy link
Contributor

costinm commented Dec 9, 2020 via email

@hzxuzhonghu
Copy link
Member Author

I am starting to lean towards simply defining separate protos, in dedicated
packages (rate.istio.io/v1alpha1)

I am all for this, what's more, we should provide a ratelimit server which can also consume this api, so users donot need to config twice(both in istio and the rate limit server side). Previously when we have mixer, this is natural.

@hzxuzhonghu
Copy link
Member Author

@costinm @gargnupur This is a separate API that is much direct and without coupling with VS. It can be applied on inbound/outbound based on users' choice.
https://docs.google.com/document/d/1ySvR6s-6Ngs0Uaj_e3-8MaM4bkm44I_HkHLuP6fABVA/edit#

@istio-testing istio-testing added the needs-rebase Indicates a PR needs to be rebased before being merged label Dec 15, 2020
@istio-testing
Copy link
Collaborator

@hzxuzhonghu: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@SpecialYang
Copy link
Member

How is this pr going? Need this feature, too.

@hzxuzhonghu
Copy link
Member Author

@kyessenov @mandarjog @gargnupur @costinm @howardjohn I am not sure how to push this feature forward, I am starting working on it since last year, and it has been half year passed, and we haven't come to any agreement. Most users including us are using envoyfilter or designed a separate CRD, and translate it to envoyflter, which is very bad UX.

cc @istio/technical-oversight-committee is this still on the roadmap?

@kyessenov
Copy link
Contributor

@hzxuzhonghu It's still we something we want. Besides the concerns around the low level details of actions/descriptors, we need to figure out how to decouple rate limit policy from networking APIs.

@hzxuzhonghu
Copy link
Member Author

@kyessenov You are absolutely quite right, with a separate API, maybe we could implement rate limit as a plugin as auth does.

@jwendell
Copy link
Member

Is this obsolete by #2028 ?

@hzxuzhonghu
Copy link
Member Author

I think so, though it has some downsides

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes Set by the Google CLA bot to indicate the author of a PR has signed the Google CLA. needs-rebase Indicates a PR needs to be rebased before being merged size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants