Master-Worker Distributed processing on Workflow Controller #4318
Comments
Have you run any benchmarks showing how far a given workflow controller can be pushed for a defined amount of CPU and memory allocated to the controller? I guess the most pertinent metrics for such benchmarks are the number of workflows and the distribution of the number of nodes per workflow.
If I understand it right, this is exactly our current problem.
From my experience, this depends on the number of workflows that show up in "argo list" and less on their state. Or does it?
In Argo Events, the master controller effectively creates a single slave controller per namespace. This is something we could consider. Anyone could write this master controller today, and it would be possible to use it with any version of Argo Workflows. This would be the operator pattern; of course, the CRD you'd be operating on would be
I think built-in/automatic sharding (basically what #9990 proposes) would be a simpler approach to this and require less change.
Reading the diagram specifics, this and #9990 are actually fairly similar proposals with an My improvements in #9990 (comment) could take it a few steps further as well. If the coordination-free implementation is possible, where no leader is necessary, that would significantly simplify the architecture and effectively make it infinitely scalable. As such, I'm going to close this out in favor of #9990.
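To illustrate what "coordination-free" could mean here, below is a minimal sketch in which every controller replica decides on its own whether it owns a workflow by hashing the workflow's key. This is an assumption for illustration only, not the design in #9990: the names `shard_for` and `owns`, the `namespace/name` key format, and the fixed `shard_count` are all hypothetical.

```python
import hashlib


def shard_for(workflow_key: str, shard_count: int) -> int:
    """Map a workflow key to a shard index in [0, shard_count).

    A stable hash (SHA-256) is used so that every replica computes the
    same answer without talking to a leader. Hypothetical helper.
    """
    digest = hashlib.sha256(workflow_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def owns(workflow_key: str, my_shard: int, shard_count: int) -> bool:
    """Return True if the replica with index `my_shard` should process the workflow."""
    return shard_for(workflow_key, shard_count) == my_shard


if __name__ == "__main__":
    # Hypothetical workflow keys in "namespace/name" form.
    for key in ["team-a/build-1", "team-a/build-2", "team-b/train-7"]:
        print(key, "-> shard", shard_for(key, shard_count=3))
```

Because ownership is a pure function of the key and the shard count, exactly one replica claims each workflow. The trade-off in this sketch is that changing `shard_count` reshuffles ownership, which a real design would have to handle.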
Summary
Currently, a single controller deployment suffers a performance impact when the controller is processing 50+ concurrent workflows or many large workflows. There are a few workarounds to mitigate this issue, like
Options 1 and 2 carry a high operational cost, because multiple deployments have to be maintained (install, upgrade, monitoring).
Option 3: if a workflow is really big, it will always hit a processing timeout.
What change needs making?
Support a Master-Worker distributed architecture in the workflow-controller.
Master
1. It will be responsible for distributing workflows to workers.
2. It will maintain the list of active workers for distribution.
3. It will reassign a workflow to an active worker if its previous worker is no longer active.
4. It can act as a worker too.
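The four master responsibilities above can be sketched roughly as follows. This is an in-memory illustration under stated assumptions, not an implementation for the workflow-controller: the `Master` class, its method names, and the least-loaded assignment policy are all hypothetical, and a real master would need persistence and health checks.

```python
from dataclasses import dataclass, field


@dataclass
class Master:
    """Hypothetical master that distributes workflows to active workers."""

    workers: set = field(default_factory=set)        # names of active workers
    assignments: dict = field(default_factory=dict)  # workflow name -> worker name

    def register(self, worker: str) -> None:
        """Add a worker to the active list (responsibility 2)."""
        self.workers.add(worker)

    def _least_loaded(self) -> str:
        """Pick the active worker with the fewest workflows (assumed policy)."""
        if not self.workers:
            raise RuntimeError("no active workers")
        load = {w: 0 for w in self.workers}
        for w in self.assignments.values():
            if w in load:  # ignore assignments to inactive workers
                load[w] += 1
        # Iterate in name order so ties are broken deterministically.
        return min(sorted(load), key=lambda w: load[w])

    def assign(self, workflow: str) -> str:
        """Distribute a workflow to a worker (responsibility 1)."""
        worker = self._least_loaded()
        self.assignments[workflow] = worker
        return worker

    def mark_inactive(self, worker: str) -> list:
        """Drop a worker and reassign its workflows (responsibility 3)."""
        self.workers.discard(worker)
        orphaned = [wf for wf, w in self.assignments.items() if w == worker]
        for wf in orphaned:
            self.assign(wf)
        return orphaned


if __name__ == "__main__":
    m = Master()
    m.register("master")    # the master can act as a worker too (responsibility 4)
    m.register("worker-1")
    m.assign("wf-a")
    m.assign("wf-b")
    m.mark_inactive("worker-1")
    print(m.assignments)
```

Registering the master under its own name is how this sketch models point 4; whether the real master should take a full share of the load is a separate design choice.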
Worker
Detail Flow diagram:
Use Cases
When would you use this?
Message from the maintainers:
Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.