Single merged policy #4
Conversation
Overall I like the proposal and I am willing to implement it.
However, I see a potential caveat with this approach regarding RLPs targeting a gateway. As a cluster administrator, I have detected unusual traffic for suspect-subdomain.domain.com and I want to rate limit that traffic for "Host: suspect-subdomain.domain.com", assuming there is an HTTPRoute for this tenant for *.domain.com.
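For illustration, a minimal sketch of the kind of policy that administrator would want to write, assuming a gateway-targeted `FakeRLPolicy` shape similar to the RFC's examples (the API version, condition syntax, and limit values below are made up):

```yaml
# Hypothetical gateway-scoped policy singling out one suspect host, even
# though the tenant's HTTPRoute matches the whole *.domain.com wildcard.
apiVersion: kuadrant.io/v1alpha1
kind: FakeRLPolicy
metadata:
  name: throttle-suspect-subdomain
spec:
  targetRef:
    kind: Gateway
    name: my-gw
  limits:
    - conditions:
        - "req.host == 'suspect-subdomain.domain.com'"
      maxValue: 10
      seconds: 1
```

The question the comment raises is whether the merged, whole-traffic model still leaves room for this kind of host-specific targeting.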
rfcs/0000-policy_target_scope.md (outdated)
> `FakeRLPolicy` attached to `my-gw`, the resulting policy would look like this:
>
> ```yaml
> enabled: true
> ```
`enabled` should be `false`?
No, it is set as `true` as an override on the GW-attached policy. That's what I describe in the paragraph above...
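A minimal sketch of what is being described, with entirely hypothetical field names (only `enabled: true` is visible in the quoted excerpt):

```yaml
# Hypothetical gateway-attached policy: the override pins `enabled: true`.
spec:
  override:
    enabled: true
---
# Hypothetical route-attached policy trying to opt out.
spec:
  enabled: false
# The merged policy keeps `enabled: true`, because the gateway's `override`
# section wins over whatever the route-level policy declares.
```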
rfcs/0000-policy_target_scope.md (outdated)
> (semantically or atomically?)?
>
> - Can we inject the proper config in each of them? Only if all is applicable? What if they are not?
>   What when that changes over time?
> - Can these then be backed by a single e.g. `Limitador`? (in that case I think yes, as we could
I would say yes, multiple gateways can be backed by a single Limitador. Currently, the namespace field has a prefix with the name/NS of the gateway, so limits are well scoped to each gateway. This is an implementation detail that can be discussed, of course.
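As a rough illustration, and treating the exact prefix format as an assumption, a Limitador limits entry scoped to one gateway could look like this:

```yaml
# Hypothetical Limitador limit: the `namespace` value carries the gateway's
# namespace/name, so limits configured for gateway-1 never collide with
# limits for gateway-2 even when both are served by the same Limitador.
- namespace: gwns/gateway-1
  max_value: 42
  seconds: 1
  conditions: []
  variables:
    - remote_address
```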
> specification according to the `override` and `default` section of the `Policy`'s `spec`. Let's use a
> simple example:
So I like the simplicity being aimed at here, but I worry we will need to allow some form of conditional here.
For example, if we were to always send common descriptors (if we agree they are common), e.g. headers, would we see a way for someone to define (as an example):

```yaml
limits:
  - conditions:
      - "req.path=='/checkout'"
      - "req.method=='POST'"
    maxValue: 5
    seconds: 10
    variables: []
```
But I feel you are aiming at something very different. I had thought that by applying a policy to all traffic, it meant all traffic would be sent to the implementing service, which could then make a conditional decision.
> So there are a couple of things to consider here:
>
> - status reporting on "not yet" existing `HTTPRouteRule`, e.g. a Policy declares a set of `matches` that don't (yet?)
>   exist. The policy must _not_ be enforced... yet. But could eventually. This needs proper reporting.
An alternative could be to report the policy as invalid; that could simplify things, at least at the beginning... A typical error would be "RateLimitPolicy/AuthPolicy rules don't match the HTTPRouteRule definition".
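For instance, a hedged sketch of how that could surface in the policy's status; the condition type, reason, and message below are made up for illustration:

```yaml
# Hypothetical status on a RateLimitPolicy whose matches don't line up with
# any HTTPRouteRule of the targeted HTTPRoute.
status:
  conditions:
    - type: Available
      status: "False"
      reason: RuleMismatch
      message: "RateLimitPolicy rules don't match the HTTPRouteRule definition"
```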
> [drawbacks]: #drawbacks
>
> - The use cases this approach doesn't address…
> - Not very DRY... the need to keep `HTTPRouteRule`s in sync between the Policy and the `HTTPRoute` is suboptimal.
This is probably the most discouraging point, but the good thing is we could mitigate it a bit with status reporting and UX tooling.
I'd like to propose an exercise to put this to the test. First, of course, we'd need to agree on the exact set of use cases to cover. IMO, the following one seems "real" enough. At the same time, it's simple (with only 2 Gateways, 2 HTTPRoutes, and not too many matching rules) and yet possibly intricate enough.

**Exercise**

Given the following initial network resources:

```yaml
apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
  name: gateway-1
  namespace: gwns
spec:
  gatewayClassName: internal
  listeners:
    - hostname: "*.acme.internal"
---
apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
  name: gateway-2
  namespace: gwns
spec:
  gatewayClassName: internet
  listeners:
    - hostname: "*.acme.com"
---
apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
  name: toys
  namespace: toys
spec:
  parentRefs:
    - kind: Gateway
      namespace: gwns
      name: gateway-1
    - kind: Gateway
      namespace: gwns
      name: gateway-2
  hostnames:
    - toys.acme.com
    - toys.acme.internal
  rules:
    - backendRefs:
        - kind: Service
          name: toys
    - matches:
        - headers:
            - name: X-Env
              value: canary
      backendRefs:
        - kind: Service
          name: toys-canary
---
apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
  name: telemetry
  namespace: telemetry
spec:
  parentRefs:
    - kind: Gateway
      namespace: gwns
      name: gateway-1
  hostnames:
    - "*.telemetry.acme.internal"
  rules:
    - matches:
        - method: GET
        - method: POST
      backendRefs:
        - kind: Service
          name: telemetry
```

What is the set of policies (RLP and KAP), and possibly required changes to the network resources, needed to implement the following use cases?
> requires to know how applicable it becomes to the union and/or the intersection of the said
> subsets. While a heuristic could be applied, there are use-cases where the other option will be
> desirable. The question then becomes which policy is the authority as to decide which it should be. In
> order to avoid having to answer these questions, this proposal looks at not letting a policy
> this proposal looks at not letting a policy discriminate which part of traffic it applies to: it's the whole traffic, always.

Is this statement still valid?
> Say, we now have a `Backend` (e.g. `foo`) wired to the `my-gw` using some `HTTPRoute` (e.g.
> `foo-route`), none of which have yet any `FakeRLPolicy` attached. Yet, since the `Gateway` used as a
> policy defined, Kuadrant will create a `RateLimit` `CR` and eventually configure `Limitador` to apply
`RateLimit` `CR`: a bit outdated, isn't it?
> starting with `/foo`), the rate limiting will apply to the same traffic: `example.com/foo*`, so that
> `foo` will not see more than 42 rps on port `8080`.
>
> Given the `override` on the `my-gw` `FakeRLPolicy` there is no way for our route to disable rate limiting
> Given the `override` on the `my-gw` `FakeRLPolicy` there is no way for our route to disable rate limiting on the traffic

This is, IMHO, the main pain point of the overall defaults/overrides approach applied to rate limiting (auth is another story here). It is not that implementing defaults/overrides for rate limiting can be complex because it is the outcome of unions and/or intersections of rules. It is not only that. It is that the gateway provider can say (with overrides) that this traffic is going to be 42 no matter what I say as an API owner. And my service may not be able to afford 42, only 22 (to say something).
For the rate limiting context, the "most restrictive wins" approach seems to me much more natural and appropriate. No matter which role I play, I can say I want X rps. And the "most restrictive wins" approach guarantees that, no matter what, my service will not get more than X rps. Maybe, because of some other-level rate limiting policy, my service will not get more than, let's say, X/2 rps... but my limit still holds: no more than X rps. My contract with the traffic system, specified in the policy, is never broken. If the traffic is not going to be higher than X/2, it is also true that it is not going to be more than X rps.
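To make the comparison concrete, here is a hedged sketch, reusing the RFC's `FakeRLPolicy` placeholder, of how "most restrictive wins" would play out; the API version and field names are illustrative, not a committed API:

```yaml
# Hypothetical gateway-level policy (infra provider): at most 42 rps.
apiVersion: kuadrant.io/v1alpha1
kind: FakeRLPolicy
metadata:
  name: gw-policy
spec:
  targetRef:
    kind: Gateway
    name: my-gw
  limits:
    - maxValue: 42
      seconds: 1
---
# Hypothetical route-level policy (API owner): at most 22 rps.
apiVersion: kuadrant.io/v1alpha1
kind: FakeRLPolicy
metadata:
  name: route-policy
spec:
  targetRef:
    kind: HTTPRoute
    name: foo-route
  limits:
    - maxValue: 22
      seconds: 1
# Under "most restrictive wins", the effective limit for foo-route's traffic
# is min(42, 22) = 22 rps, regardless of which policy declares an override.
```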
But it would "just" be up to the GW to express this properly, i.e. using a `default` rather than an `override`. Also, there might be a need to say `10` (i.e. less than 42), say when being DDoS'ed, so that'd be an override... The fact that the API lets users express what we think are "non-sensible" things doesn't invalidate it.
But I think we could also have Kuadrant "merge" the policies based on "most restrictive wins" and "only" support `defaults` initially.
> Also, there might be a need to say 10 (i.e. less than 42), say when being DDoS'ed, so that'd be an override...

That is a valid use case and it is covered by "most restrictive policy wins".

> But I think we could also have Kuadrant "merge" the policies based on the most restrictive wins

That is what I call "conditional defaults/overrides", which is worth elaborating/investigating more, but I find it a complex API to expose to customers.

> "only" support defaults initially.

Is this a valid use case? I mean, a gateway provider saying "I want (default) 10 rps, but I do not care if you as the API provider set 1000 rps". My point being, if the gateway provider applies 10 rps, that 10 is not a magical number; it is something specific to the existing resources and constraints.
We probably need more real-world use cases to be sure. A clear use case is a gateway provider/infra provider setting a global limit for the gateway to protect it, let's say 1000 RPS, and wanting that to be an "override" limit. This is a use case we have in other services, where we have used Limitador to do this at the gateway level as the infra provider in order to enforce a limit based on the tier.
So then the question and use case comes for a default RL. This is less clear, as @eguzki points out. That said, I can imagine that as a "gateway admin", I don't want any one service to be able to just accidentally consume this entire limit without clearly expressing that intention. I don't want any endpoint deployed to my gateway where some random client can accidentally get into a loop and swamp the gateway. So I now set a sane but restrictive default based on remote IP or some common property to a low number, let's say 50 RPS. Now I know that if an endpoint is created without any RLP, it can at worst consume 50 RPS.
So now it is up to the service provider/developer to say explicitly "I need more than 50 RPS, I need ~200 RPS" for my service. So the piece I see as missing here is setting an "upper limit" override. I.e. by default it's 100, but you can override it up to 250; beyond that we need to have a serious conversation.
@eguzki Does the above make sense in your world? I don't claim to have all the correct use cases here, but the above seems sane to me. Interested in your thoughts on the default and being able to set an "upper limit" override.
I think it is also important that we don't think of RLP as the only control involved. As a gateway admin, depending on the environment, I am going to want to set up monitoring so that I can identify what limits are set and what traffic is flowing to certain endpoints. This will allow me to identify a potential problem before it happens. I might also have guidelines in place for the teams using my gateway, and perhaps I even have an SRE member reviewing any new limits set as part of a GitOps flow, etc. I think we need to think of RLP as part of a whole rather than the whole.
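A hedged sketch of the setup described above; the `default` block and the idea of a gateway-side cap are hypothetical, not an agreed API:

```yaml
# Hypothetical gateway default (gateway admin): any route without its own
# policy is limited to 50 rps.
apiVersion: kuadrant.io/v1alpha1
kind: FakeRLPolicy
metadata:
  name: gw-default
spec:
  targetRef:
    kind: Gateway
    name: my-gw
  default:
    limits:
      - maxValue: 50
        seconds: 1
---
# Hypothetical route policy (service developer) explicitly asking for more.
apiVersion: kuadrant.io/v1alpha1
kind: FakeRLPolicy
metadata:
  name: my-service-rlp
spec:
  targetRef:
    kind: HTTPRoute
    name: my-service-route
  limits:
    - maxValue: 200
      seconds: 1
# The piece described as missing above is a gateway-side cap ("you may raise
# the default, but never above 250 rps"), which neither resource expresses.
```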
> ```yaml
> port: 8080
> ```
>
> It is important to consider that only _one_ rule will ever be matched for any given request. It makes for a simpler model to have only one "Policy" ever be matched as well, aligning both behaviors. To apply a policy on the first rule only, the user needs to duplicate that match on the policy targeting `bar-route`:
> the user needs to duplicate that match on the policy targeting `bar-route`

This is essentially what we have now, isn't it?
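Roughly, and with hypothetical field names following the RFC's `FakeRLPolicy` examples, the duplication looks like this:

```yaml
# HTTPRoute `bar-route` (excerpt): only the first rule should be rate limited.
rules:
  - matches:
      - path:
          type: PathPrefix
          value: /bar
    backendRefs:
      - kind: Service
        name: bar
---
# Hypothetical policy targeting `bar-route`: the same match is duplicated so
# the policy applies to that first rule only.
spec:
  targetRef:
    kind: HTTPRoute
    name: bar-route
  matches:
    - path:
        type: PathPrefix
        value: /bar
  limits:
    - maxValue: 42
      seconds: 1
```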
Superseded by #58