Single merged policy #4

alexsnaps · 2022-11-10T19:54:36Z

No description provided.

eguzki

Overall I like the proposal and I am willing to implement that.

However, I see a potential caveat in this approach regarding RLPs targeting a gateway. As a cluster administrator, I have detected unusual traffic for suspect-subdomain.domain.com and I want to rate limit that traffic for "Host: suspect-subdomain.domain.com", assuming there is an HTTPRoute for this tenant for *.domain.com.

rfcs/0000-policy_target_scope.md

eguzki · 2022-11-14T10:57:01Z

rfcs/0000-policy_target_scope.md

+`FakeRLPolicy` attached to `my-gw`, the resulting policy would look like this:
+
+```yaml
+enabled: true


enabled should be false?

No, it is set as true as an override on the GW attached policy. That's what I describe in the paragraph above...
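
The exchange above can be illustrated with a minimal sketch of how a field such as `enabled` could resolve when the gateway-attached policy carries both a `default` and an `override` section. The `merge_policy` helper and the dictionary layout are hypothetical and are not taken from the RFC or from Kuadrant; the sketch only assumes that a gateway override beats whatever the route sets, and that a gateway default applies only where the route is silent.

```python
def merge_policy(gateway_policy, route_policy):
    """Resolve an effective policy from a gateway policy and a route policy.

    Precedence assumed for illustration only:
    gateway override > route value > gateway default.
    """
    # Start from the gateway defaults (lowest precedence).
    effective = dict(gateway_policy.get("default", {}))
    # Values set on the route replace the gateway defaults.
    effective.update(route_policy)
    # Gateway overrides replace everything else.
    effective.update(gateway_policy.get("override", {}))
    return effective


# Hypothetical policies: the gateway forces `enabled`, the route tries to disable it.
gateway = {"default": {"rps": 100}, "override": {"enabled": True}}
route = {"enabled": False, "rps": 42}

effective = merge_policy(gateway, route)
print(effective)
```

Under these assumptions the route keeps its own `rps`, but cannot turn `enabled` off, which is the behaviour described in the reply.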

eguzki · 2022-11-14T11:01:11Z

rfcs/0000-policy_target_scope.md

+  (semantically or atomically?)?
+  - Can we inject the proper config in each of them? Only if all is applicable? What if they are not?
+    What when that changes over time?
+  - Can these then be backed by a single e.g. `Limitador`? (in that case I think yes, as we could


I would say yes, multiple gateways can be backed by a single Limitador. Currently, the namespace field has a prefix with the name/NS of the gateway, so limits are well scoped to each gateway. This is an implementation detail that can be discussed, of course.
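
As a rough sketch of the scoping idea in this comment, the snippet below keeps limits for two gateways in one shared store by prefixing each limit's namespace with the gateway's namespace and name. The `limit_namespace` helper and the `<ns>/<name>#<domain>` key layout are assumptions made up for this example; the actual prefix format is not stated in this thread.

```python
from collections import defaultdict


def limit_namespace(gateway_ns, gateway_name, domain):
    """Build a per-gateway namespace key for a limit.

    The "<ns>/<name>#<domain>" layout is a made-up format for this sketch.
    """
    return f"{gateway_ns}/{gateway_name}#{domain}"


# A single dict stands in for one shared rate-limiting backend.
limits_by_namespace = defaultdict(list)

# The same hostname is limited differently depending on the gateway it came through.
key_1 = limit_namespace("gwns", "gateway-1", "toys.acme.com")
key_2 = limit_namespace("gwns", "gateway-2", "toys.acme.com")
limits_by_namespace[key_1].append({"max_value": 150, "seconds": 1})
limits_by_namespace[key_2].append({"max_value": 50, "seconds": 1})

for namespace, limits in limits_by_namespace.items():
    print(namespace, limits)
```

Because the gateway identity is part of the key, the two sets of limits never collide even though they live in the same store.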

rfcs/0000-policy_target_scope.md

maleck13 · 2022-11-29T11:16:25Z

rfcs/0000-policy_target_scope.md

+specification according to the `override` and `default` section of the `Policy`'s `spec`. Let's use a
+simple example:
+
+```yaml


I like the simplicity being aimed at here, but I worry we will need to allow some form of conditional. For example, if we were to always send common descriptors (if we agree they are common), e.g. headers, would we see a way for someone to define (as an example):

limits:
  - conditions:
      - "req.path=='/checkout'"
      - "req.method=='POST'"
    maxValue: 5
    seconds: 10
    variables: []

But I feel you are aiming at something very different. I had thought perhaps by applying a policy to all traffic, it had meant all traffic would be sent to the implementing service and then the implementing service could make a conditional decision

rfcs/0000-policy_target_scope.md

didierofrivia · 2023-02-21T13:59:29Z

rfcs/0000-policy_target_scope.md

+So there are a couple of things to consider here:
+
+- status reporting on "not yet" existing `HTTPRouteRule`, e.g. a Policy declares a set of `matches` that don't (yet?)
+  exist. The policy must _not_ be enforced... yet. But could eventually. This needs proper reporting. 


An alternative could be to report the policy as invalid, which could simplify things, at least at the beginning. A typical error would be: RateLimitPolicy/AuthPolicy rules don't match the HTTPRouteRule definition.

didierofrivia · 2023-02-21T14:02:00Z

rfcs/0000-policy_target_scope.md

+[drawbacks]: #drawbacks
+
+- The use cases this approach doesn't address… 
+- Not very DRY... the need to keep `HTTPRouteRule`s in sync between the Policy and the `HTTPRoute` is suboptimal.


This is probably the most discouraging point, but the good thing is that we could mitigate it a bit with status reporting and UX tooling.

guicassolato · 2023-02-21T16:35:40Z

I'd like to propose an exercise to put this to the test.

First, of course, we'd need to agree on the exact set of use cases to cover. IMO, the following one seems "real" enough. At the same time, it's simple (with only 2 Gateways, 2 HTTPRoutes, not too many matching rules) and yet possibly intricate enough.

Exercise

Given the following initial network resources:

apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
  name: gateway-1
  namespace: gwns
spec:
  gatewayClassName: internal
  listeners:
    - hostname: "*.acme.internal"
---
apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
  name: gateway-2
  namespace: gwns
spec:
  gatewayClassName: internet
  listeners:
    - hostname: "*.acme.com"
---
apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
  name: toys
  namespace: toys
spec:
  parentRefs:
    - kind: Gateway
      namespace: gwns
      name: gateway-1
    - kind: Gateway
      namespace: gwns
      name: gateway-2
  hostnames:
    - toys.acme.com
    - toys.acme.internal
  rules:
    - backendRefs:
        - kind: Service
          name: toys
    - matches:
        - headers:
            - name: X-Env
              value: canary
      backendRefs:
        - kind: Service
          name: toys-canary
---
apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
  name: telemetry
  namespace: telemetry
spec:
  parentRefs:
    - kind: Gateway
      namespace: gwns
      name: gateway-1
  hostnames:
    - "*.telemetry.acme.internal"
  rules:
    - matches:
        - method: GET
        - method: POST
      backendRefs:
        - kind: Service
          name: telemetry

What is the set of policies (RLP and KAP), and possibly the required changes to the network resources, needed to implement the following use cases?

RLP override 500 rps on all routes matching *.acme.com.
RLP default 100 rps on POST requests to all routes matching *.acme.com.
KAP override IP deny-list on all requests to *.acme.com.
KAP default X.509 certificate authentication on all requests to *.acme.internal.
RLP default 150 rps on all requests to toys.acme.(com|internal).
RLP default 50 rps on all requests to toys.acme.(com|internal) containing X-Env: canary request header.
RLP override unlimited rps on all requests to toys.acme.internal/admin/*.
KAP override API key authentication on all requests to toys.acme.com.
KAP override DELETE requests forbidden at toys.acme.com.
KAP override all requests forbidden at toys.acme.com/admin/*.
KAP override JSON pattern-matching authorisation to check path param {org_name} matches value stored in API key or X.509 cert on all requests to toys.acme.(com|internal)/[^admin/]orgs/{org_name}/*.
KAP override K8s SAR authorisation on all requests to toys.acme.internal/admin/*.
KAP default OIDC/JWT authentication on all endpoints matching *.telemetry.acme.internal.
KAP override additional API key authentication to all requests to foo.telemetry.acme.internal.

eguzki · 2023-02-24T10:29:48Z

rfcs/0000-policy_target_scope.md

+requires to know how to applicable it becomes to the union and/or the intersection of the said
+subsets. While a heuristic could be applied, there are use-cases where the other option will be
+desirable. The question then becomes which policy is the authority as to decide which it should be. In
+order to avoid having to answer these questions, this proposal looks at not letting a policy


this proposal looks at not letting a policy discriminate which part of traffic it applies to: it's the whole traffic, always.

Is this statement still valid?

eguzki · 2023-02-24T10:34:58Z

rfcs/0000-policy_target_scope.md

+
+Say, we now have a `Backend` (e.g. `foo`) wired to the `my-gw` using some `HTTPRoute` (e.g.
+`foo-route`), none of which have yet any `FakeRLPolicy` attached. Yet, since the `Gateway` used as a
+policy defined, Kuadrant will create a `RateLimit` `CR` and eventually configure `Limitador` to apply


RateLimit CR

a bit outdated, isn't it?

eguzki · 2023-02-24T10:47:46Z

rfcs/0000-policy_target_scope.md

+starting with `/foo`), the rate limiting will apply to the same traffic: `example.com/foo*`, so that
+`foo` will not see more than 42 rps on port `8080`.
+
+Given the `override` on the `my-gw` `FakeRLPolicy` there no way for our route to disable rate limiting


Given the override on the my-gw FakeRLPolicy there no way for our route to disable rate limiting on the traffic

This is, IMHO, the main pain point of the overall defaults/overrides approach applied to rate limiting (auth is another story here). It is not only that implementing defaults/overrides for rate limiting can be complex because it is the outcome of unions and/or intersections of rules. It is that the gateway provider can say (with overrides) that this traffic is going to be 42 no matter what I say as an API owner. And my service may not be able to afford 42, only 22 (to pick a number).

For the rate limiting context, the "most restrictive wins" approach seems to me much more natural and appropriate. No matter which role I play, I can say I want X rps, and the "most restrictive wins" approach guarantees that, no matter what, my service will not get more than X rps. Maybe, because of a rate limiting policy at some other level, my service will not get more than, let's say, X/2 rps, but my limit still holds: no more than X rps. My contract with the traffic system, as specified in the policy, is never broken. If the traffic is not going to be higher than X/2 rps, it is also true that it is not going to be higher than X rps.
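
A minimal sketch of the "most restrictive wins" rule described above, assuming every limit that applies to the same traffic is expressed in the same unit (requests per second). The `most_restrictive` helper is hypothetical and only illustrates the argument; it is not an API of Kuadrant or Limitador.

```python
def most_restrictive(limits_rps):
    """Return the lowest rps among the limits that apply to the same traffic.

    Returns None when no level expressed a limit at all.
    """
    if not limits_rps:
        return None
    return min(limits_rps)


# The gateway provider says 42 rps, the API owner can only afford 22 rps.
api_owner_case = most_restrictive([42, 22])

# Under attack, the gateway provider drops its limit to 10 rps.
ddos_case = most_restrictive([10, 22])

print(api_owner_case, ddos_case)
```

In both cases the API owner's limit of 22 rps is never exceeded, whichever level ends up being the effective one.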

But it would "just" be up to the GW to express this properly, i.e. using a default rather than an override. Also, there might be a need to say 10 (i.e. less than 42), say when being DDoS'ed, so that'd be an override... The fact that the API lets users express what we think are "non-sensible" things doesn't invalidate it.

But I think we could also have Kuadrant "merge" the policies based on the most restrictive wins and "only" support defaults initially.

Also, there might be a need to say 10 (i.e. less than 42), say when being DDoS'ed, so that'd be an override...

That is a valid use case and it is covered by "most restrictive policy wins"

But I think we could also have Kuadrant "merge" the policies based on the most restrictive wins

That is what I call "conditional defaults/overrides", which is worth elaborating on and investigating further, but I find it a complex API to expose to customers.

"only" support defaults initially.

Is this a valid use case? I mean, a gateway provider saying "I want (default) 10 rps, but I do not care if you, as the API provider, set 1000 rps". My point being: if the gateway provider applies 10 rps, that 10 is not a magical number; it is something specific to the existing resources and constraints.

We probably need more real-world use cases to be sure. A clear use case is a gateway provider/infra provider setting a global limit to protect the gateway, let's say 1000 RPS, and wanting that to be an "override" limit. This is a use case we have in other services, where we have used Limitador at the gateway level, as the infra provider, to enforce a limit based on the tier.

So then the question and use case come up for a default RL. This is less clear, as @eguzki points out. That said, I can imagine that, as a "gateway admin", I don't want any one service to be able to accidentally consume this entire limit without clearly expressing that intention. I don't want any endpoint deployed to my gateway where some random client can accidentally get into a loop and swamp the gateway. So I now set a sane but restrictive default, based on remote IP or some common property, to a low number, let's say 50 RPS. Now I know that if an endpoint is created without any RLP, it can at worst consume 50 RPS.

So now it is up to the service provider/developer to say explicitly "I need more than 50 RPS, I need 200 ~RPS" for my service. So the piece I see as missing here is setting an "upper limit" override, i.e. by default it's 100, but you can override it up to 250; beyond that, we need to have a serious conversation.

@eguzki Does the above make sense in your world? I don't claim to have all the correct use cases here, but the above seems sane to me. I'm interested in your thoughts on the default and on being able to set an "upper limit" override.

I think it is also important that we don't think of RLP as the only control involved. As a gateway admin, depending on the environment, I am going to want to set up monitoring so that I can identify what limits are set and what traffic is flowing to certain endpoints. This will allow me to identify a potential problem before it happens. I might also have guidelines in place for the teams using my gateway, and perhaps I even have an SRE member reviewing any new limits set as part of a GitOps flow, etc. I think we need to think of RLP as part of a whole rather than the whole.
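
The "default plus upper limit" idea from this comment could be sketched as follows. The `resolve_limit` helper and its parameter names are hypothetical, and clamping to the ceiling is only one possible behaviour assumed here; rejecting the policy and reporting it in status would be another reading of "we need to have a serious conversation".

```python
def resolve_limit(requested_rps, default_rps, ceiling_rps):
    """Pick the effective rps for a service behind the gateway.

    - No request from the service owner: the gateway default applies.
    - Otherwise the request is honoured, but never above the ceiling.
    """
    if requested_rps is None:
        return default_rps
    return min(requested_rps, ceiling_rps)


# Numbers loosely borrowed from the comment: default 50, ceiling 250.
no_policy = resolve_limit(None, default_rps=50, ceiling_rps=250)
explicit = resolve_limit(200, default_rps=50, ceiling_rps=250)
too_high = resolve_limit(1000, default_rps=50, ceiling_rps=250)

print(no_policy, explicit, too_high)
```

An endpoint without any RLP is held to the default, an explicit request for 200 RPS is honoured, and a request for 1000 RPS is held at the ceiling.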

eguzki · 2023-02-24T11:50:14Z

rfcs/0000-policy_target_scope.md

+      port: 8080
+```
+
+It is important to consider that only _one_ rule will ever be match for any given requests. It makes for a simpler model to have only one "Policy" to also only ever being match, aligning the both behaviors. To apply a policy on the first rule only, the user needs to duplicate that match on the policy targeting `bar-route`:


the user needs to duplicate that match on the policy targeting bar-route

This is essentially what we have now, isn't it?

alexsnaps · 2024-02-27T15:04:31Z

superseded by #58

alexsnaps added 2 commits November 10, 2022 14:51

Initial draft

3142c47

reformat

084723d

alexsnaps marked this pull request as draft November 10, 2022 20:11

eguzki reviewed Nov 14, 2022

alexsnaps added the target/current label Nov 17, 2022

maleck13 reviewed Nov 29, 2022

maleck13 reviewed Nov 29, 2022

guicassolato reviewed Dec 12, 2022

guicassolato added the RFC Request For Comments label Dec 13, 2022

guicassolato mentioned this pull request Jan 12, 2023

AuthPolicy ApiKey allNamespace doesn't work Kuadrant/kuadrant-operator#127

Closed

alexsnaps self-assigned this Feb 16, 2023

alexsnaps added 3 commits February 20, 2023 12:48

Enabling targeting a HTTPRouteRule by HTTPRouteMatch dup

28e6451

Note on merging

c361f41

Fix to title

dc37c52

didierofrivia reviewed Feb 21, 2023

maleck13 mentioned this pull request Feb 22, 2023

RateLimitPolicy v2 #8

Merged

eguzki reviewed Feb 24, 2023

alexsnaps mentioned this pull request Feb 28, 2023

*No* merging of policies #10

Closed

alexsnaps added the status/blocked label Feb 28, 2023

guicassolato added target/next and removed target/current labels Jul 13, 2023

alexsnaps closed this Feb 27, 2024

alexsnaps commented Nov 10, 2022

eguzki left a comment

eguzki Nov 14, 2022

alexsnaps Feb 20, 2023

eguzki Nov 14, 2022

maleck13 Nov 29, 2022 (edited)

maleck13 Nov 29, 2022

didierofrivia Feb 21, 2023

didierofrivia Feb 21, 2023

guicassolato commented Feb 21, 2023

eguzki Feb 24, 2023

eguzki Feb 24, 2023

eguzki Feb 24, 2023

alexsnaps Feb 27, 2023

eguzki Feb 27, 2023

maleck13 Feb 28, 2023

eguzki Feb 24, 2023

alexsnaps commented Feb 27, 2024
