RLP Enforced condition #414
Some thoughts on this: in order to check that the RateLimitPolicy is enforced in Limitador, there are multiple options:
or
Options 1 and 2 require shelling into the Limitador pod in order to compare the files / access the Limitador API, unless (if we go with option 2) we want to expose a service endpoint for Limitador in the Limitador operator at https://github.com/Kuadrant/limitador-operator/blob/main/pkg/limitador/k8s_objects.go#L51. Either way, I feel this logic should live in the Limitador operator itself, to check whether the limits are correctly loaded into Limitador. Interested in @eguzki @didierofrivia @alexsnaps thoughts on this and on what's the preferred approach 🤔
I'd only say that we shouldn't go to the storage option of Limitador to check this, since that is going too far for the operator. The first option, as simple as it sounds, could be a good first approach IMHO. Then we could decide to evolve it into a more thorough probe... However, I agree with you that the logic should reside in the Limitador operator.
My take on this: the RLP controller should rely on the Limitador CR status. The Limitador operator should populate the Limitador CR status with the available state. I am thinking that, the same way the replication controller populates the deployment status objects based on readiness/liveness Kubernetes probes (some of which are HTTP-based endpoints), the Limitador operator should be doing something similar. Ideally, the operator would set up a probe mechanism to periodically test the
@eguzki I think the simplest approach for the probe mechanism is for it to be part of the reconciliation loop in the Limitador controller in the Limitador operator 🤔 We can requeue the reconciliation until the limits config map is synced / updated in the pod, which the available status in the Limitador CR would imply. With this, any updates to the Limitador CR would trigger a reconciliation loop and update the status accordingly. In Kuadrant Operator, RateLimitPolicy would not be enforced until these conditions are true in the Limitador CR. If we go with this approach and would like to probe periodically, we can configure a default requeue interval after a successful reconcile in the Limitador controller.
Respectfully, I disagree, for two reasons mainly:
Happy to discuss this further and look for an approach we all feel comfortable with.
Fair, though I feel this is quite a constraint, as some Kubernetes controllers manage external resources that live not in the Kubernetes cluster but on cloud providers, such as the DNS Operator. Though this point assumes we are going with the approach of HTTP requests to /limits/{namespace} in Limitador, rather than also comparing the config map in the Kubernetes cluster against the config map in the Limitador pod.
Yes, I would rather not create a goroutine for this either. Agreed, periodic requeue is not required if we go with this reconciliation route. The check for enforcement can be triggered purely by watched events from the Limitador CR, requeuing only until the config maps are synced. Ideally, any user of the Limitador operator interacts with the CR (or watched resources) to update rate limits. Periodic requeue, I feel, is only necessary if we think a user may work around this and force the rate limit config map out of sync by interacting directly with the Limitador API, thus bypassing watched events 🤔
Another idea, which I feel is overkill, is a sort of middle service incorporated into the Limitador deployment as a separate container. Its sole responsibility would be checking that the Limitador config map in Kubernetes and the limits loaded in the Limitador application are in sync. This check would be called periodically as a readiness or liveness check (though this has potential downsides: if the pod is not ready, it stops receiving traffic until limits are synced, unless that is something we want). It's possible to add this to Limitador itself as a health check endpoint instead, though that endpoint would then be coupled to being deployed in a Kubernetes cluster, which I think we don't want. Anyhow, this seems like over-engineering for what we need, but just a thought.
I agree it would be overkill. Ideally, this would be a deployment readiness probe that marks the deployment as not ready, and the Limitador operator flags the CR as not ready in the status. The issue is that I do not see how we can implement such a complex readiness probe in the deployment. We would need to pass the limits as env vars to the pod, and then some custom script living in the Limitador container would read the limits from the env var and verify the responses of multiple HTTP requests to

That leaves us with the following options:
Summoning @roivaz here. You have some use cases where you monitor that the content of a configmap/secret being mounted in the container is up to date, and if it takes too long, you force it with some hack based on annotations. Any input or ideas?
When a ConfigMap mounted as a volume in a container changes, the container doesn't see that change immediately. The reason is that, for performance reasons, kubelet only resyncs the pod under 2 circumstances: periodically, on its sync interval, or when the pod object itself is updated (for example, when an annotation changes).
The solution to this problem can't be to change the resync interval, as that affects the whole cluster. I don't even know if this is something you can configure in OCP. This still doesn't solve the problem of "knowing" whether the update was applied; it just ensures that under normal circumstances the change happens immediately.
Groundhog day effect. We had this very same discussion one year ago #140 (comment)
Cool, though as you guys mentioned this doesn't quite solve the ask of this issue, which is knowing if RLP limits are enforced (i.e. knowing that config map changes are synced to limitador) 🤔 |
But arguably, if immediate sync of the configmap is implemented, the need for actually "knowing" whether the configmap has been reloaded is greatly reduced, because almost 100% of the time it will be synced. In fact, when it does not sync immediately, it would be due to some cluster malfunction, which other operators and monitoring in the cluster are responsible for detecting and resolving. An option is for Limitador to have an endpoint that returns the currently loaded config, or just a hash of the loaded config (not sure if this has already been proposed). Envoy, for example, has a similar admin endpoint where you can retrieve the currently loaded config.
That's a fair assumption; in this case the RLP can simply depend on the current "Ready" condition status. Though a pod restart would defeat the purpose of the file watcher implementation that @eguzki mentioned, so I'm not sure we want to do this 🤷
Hmm, that's an option too, though it would involve making HTTP requests to the application 🤔 For reference, I've done a quick implementation / PoC that just checks the CM limits file against the file loaded in the Limitador pod, Kuadrant/limitador-operator#125, as part of the status reconcile loop. I feel this is the simplest approach, based on watched events and without having to make requests to the Limitador application, if anyone wants to check it out.
The pod is not restarted when you add/update an annotation to force the resync of the configmap. The annotation is updated directly in the Pod resource/s, not in the owning Deployment, and that does not trigger a rollout. |
Cool, nice! 👍 @eguzki any thoughts on this approach? If config map changes are immediately synced to pods, we can rely on the Limitador CR
That would be a good enhancement, as the changes would take effect quite soon, which is the complaint right now. Maybe this is good enough. Let me add some thoughts.
Certainly I do not want to over-engineer here; I'd rather be pragmatic. But good to discuss anyway.
Hmm, I guess if the change event is lost, then the config map in the pod will be out of sync until k8s eventually syncs it. There'll be a time delay if this happens, but RLP & Limitador CR won't detect this and will keep its current
I don't think so. The main 'issue' is probably having to ensure all replicas are created and that all are updated to the latest annotation value.
Good enough for me, at least for now.
Completed as part of #523
Part of Kuadrant/architecture#38
Related issues: #140
Implement the `Enforced` condition type for RateLimitPolicy