High availability of the visibility extension API server #1510
Thanks for raising this issue. Yes, the visibility server needs to access the queue manager, even though the kueue-controller-manager doesn't start the queue manager on replicas that aren't the leader. I think we can choose between the following options:
@alculquicondor @mimowo WDYT?
Another solution would be for all managers to be able to serve the information. This is possible if every manager can react to the events to update the cache and queues, even though it wouldn't be scheduling. We do the same in kube-scheduler. However, I'm not sure if, in controller-runtime, event handlers are registered even if reconcilers aren't running. I would not expect so, as otherwise the work queue would grow indefinitely. Continuing the comparison, in kube-scheduler we don't have reconcilers at all.
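Schematically, the idea above could look like the standalone sketch below: every replica applies watch events to its local cache, but only the leader runs the scheduling loop. All names here (`replica`, `handleEvent`, `schedule`) are hypothetical stand-ins, not actual Kueue or controller-runtime code.

```go
package main

import "fmt"

// event is a simplified stand-in for a watch event delivered by an informer.
type event struct {
	key    string
	object string
}

// replica models one manager replica: every replica applies watch events to
// its local cache, but only the leader runs the scheduling loop.
type replica struct {
	name     string
	isLeader bool
	cache    map[string]string // local view of cluster state
}

// handleEvent mirrors an informer event handler: it runs on every replica,
// leader or not, so the cache stays current everywhere.
func (r *replica) handleEvent(e event) {
	r.cache[e.key] = e.object
}

// schedule runs only on the leader; followers skip it but can still serve
// reads (e.g. the visibility API) from their cache.
func (r *replica) schedule() string {
	if !r.isLeader {
		return r.name + ": not leader, skipping scheduling"
	}
	return fmt.Sprintf("%s: scheduling with %d cached objects", r.name, len(r.cache))
}

func main() {
	leader := &replica{name: "replica-0", isLeader: true, cache: map[string]string{}}
	follower := &replica{name: "replica-1", cache: map[string]string{}}

	// The same watch events reach all replicas.
	for _, e := range []event{{"default/q1", "ClusterQueue"}, {"default/w1", "Workload"}} {
		leader.handleEvent(e)
		follower.handleEvent(e)
	}

	fmt.Println(leader.schedule())
	fmt.Println(follower.schedule())
	// Both caches are equally fresh, so either replica can answer
	// visibility queries.
	fmt.Println(len(follower.cache))
}
```

The key property is that cache freshness does not depend on leadership, which is why a follower can serve the visibility API.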
When all managers can serve the information, is there a synchronization gap between managers in busy clusters?
Yes, some managers might be ahead of others. But it shouldn't really matter, as even the output of the leader can be out of date pretty quickly.
An alternate approach would be to have all webhooks running in a separate Deployment, so they can serve in HA. And then we would have the scheduler+visibility server in a different Deployment, where only one Pod can be ready at a time.
That sounds like the best option functionally, and in an overall eventually consistent system, I agree it's acceptable to have some synchronization gap between replicas, as they converge toward the steady state. I just had a quick look, and it seems like it should be possible to have all the replicas serving the visibility API, with the event watchers running and maintaining the cache / queue manager, but skipping reconcilers altogether in non-leading replicas. I can work on a PoC to validate the feasibility and share my findings.
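For reference, controller-runtime lets a runnable opt out of leader election by implementing the `LeaderElectionRunnable` interface from `sigs.k8s.io/controller-runtime/pkg/manager`. The sketch below reproduces that interface locally so it runs standalone; the `visibilityServer` and `scheduler` runnables are hypothetical illustrations, not the actual Kueue types.

```go
package main

import "fmt"

// LeaderElectionRunnable mirrors the interface of the same name in
// sigs.k8s.io/controller-runtime/pkg/manager: runnables returning false
// from NeedLeaderElection are started on every replica, not just the leader.
type LeaderElectionRunnable interface {
	NeedLeaderElection() bool
}

// visibilityServer is a hypothetical runnable serving the visibility API.
// It opts out of leader election so every replica can serve requests.
type visibilityServer struct{}

func (visibilityServer) NeedLeaderElection() bool { return false }

// scheduler stands in for the scheduling loop, which must remain
// leader-gated so that only one replica schedules at a time.
type scheduler struct{}

func (scheduler) NeedLeaderElection() bool { return true }

// startsOnFollowers reports whether a runnable would be started on a
// replica that has not (yet) acquired leadership.
func startsOnFollowers(r LeaderElectionRunnable) bool {
	return !r.NeedLeaderElection()
}

func main() {
	fmt.Println("visibility server starts on followers:", startsOnFollowers(visibilityServer{}))
	fmt.Println("scheduler starts on followers:", startsOnFollowers(scheduler{}))
}
```

With a split like this, watchers and the visibility endpoint run everywhere, while reconcilers and scheduling stay behind the leader lock.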
/assign |
I've opened #1554, which gives an idea of the most minimal / least disruptive approach I could come up with. There may be other ways, like separating the reconciler part from the watcher part of the controllers that contribute to the queue manager, but that felt more disruptive / risky. I've successfully tested #1554 manually, by checking that all replicas correctly serve the visibility API, and also checking that fail-over to a non-leading replica works as expected when the leading replica terminates. That should ideally be covered by an e2e test. I was thinking we could have a new e2e test dedicated to covering the HA aspects, including webhooks as well, to guarantee non-regression for #1445. WDYT?
We can probably just configure the existing tests to run in HA mode. But we can have a separate suite package to exercise restarts. |
Right, running the existing tests in HA mode sounds like a good idea. On its own it won't guarantee that non-leading replicas work correctly, though, given the API server can still route traffic to the leading one. Having a separate test suite that exercises HA and all replicas specifically can mitigate that as a second step.
I've updated #1554 so the e2e tests deploy two replicas of the Kueue controller manager. The e2e tests pass, and we can see the two Pods for the Kueue deployment in the run below, where their respective logs look correct / as expected to me. How would you like to track the work for that dedicated HA e2e suite package? Maybe I can create a separate issue, or do you have another suggestion / preference?
What would you like to be added:
Enable high-availability setup for the visibility API service.
Why is this needed:
It's possible to deploy multiple replicas of the Kueue manager.
High-availability of admission webhooks has been fixed with #1509.
However, more work is required to enable high-availability for the visibility API service, given the stateful nature of the ClusterQueueReconciler.
More context is available in #1509.
Completion requirements:
This enhancement requires the following artifacts:
The artifacts should be linked in subsequent comments.