-
Notifications
You must be signed in to change notification settings - Fork 9.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[bitnami/argo-cd] Make it possible to run ArgoCD in HA mode #27585
[bitnami/argo-cd] Make it possible to run ArgoCD in HA mode #27585
Conversation
Converted this back to a draft. I ran into some major issues with the sharding of the application controller today. Apparently it only really works, when the controller is installed as a StatefulSet :( |
43ff3c9
to
cff50e4
Compare
694596a
to
63cad23
Compare
…teful set Signed-off-by: Max Nitze <[email protected]>
Signed-off-by: Max Nitze <[email protected]>
2c03212
to
7d967b0
Compare
Signed-off-by: Bitnami Containers <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @maxnitze Thanks a lot for your contribution, this will be very useful for all users!
I left some comments, Could you give them a glance?
affinity: {{- include "common.tplvalues.render" ( dict "value" .Values.controller.affinity "context" $) | nindent 8 }} | ||
{{- else }} | ||
affinity: | ||
podAffinity: {{- include "common.affinities.pods" (dict "type" .Values.controller.podAffinityPreset "component" "controller" "customLabels" $podLabels "context" $) | nindent 10 }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
$podLabels
variable is not declared
…Labels variable Signed-off-by: Max Nitze <[email protected]>
Hey @fmulero , thanks for the review! Looking at the Pod partial I saw, that the whole indentation was wrong. In the file itself, but also the calls to |
Signed-off-by: Bitnami Containers <[email protected]>
…dation messages Signed-off-by: Max Nitze <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot @maxnitze for your contribution.
LGTM
Description of the change
When the
replicaCount
of the application controller or the server deployments in the ArgoCD chart are set to a number higher than one, an additional env variable is set on the respective containerARGOCD_CONTROLLER_REPLICAS
andARGOCD_API_SERVER_REPLICAS
.In addition a change was added, that allows the application controller to be installed as a StatefulSet. I cannot find direct confirmation in the docs for it, but it seems the sharding of clusters, that is enabled when running with multiple application controller instances, is only working, when the pods are running as part of a stateful set. Otherwise they can't figure out their shard number and some clusters are not handled by any replica anymore (see here for example: argoproj/argo-cd#17016).
The "official" Helm chart also installs the application controller as a StatefulSet (see https://github.com/argoproj/argo-helm/tree/main/charts/argo-cd/templates/argocd-application-controller). There are cases, where it is installed as a Deployment, but that's for a (currently in alpha) feature called Dynamic Cluster Distribution.
Also a validation was added, that multiple replicas of the application controller can only be set if the kind is changed to a StatefulSet (see "Benefits" for an explanation why).
Benefits
As per ArgoCD documentation on high availability, both the application controller and the server should have an environment variable set to the number of replicas in case they are run with multiple.
The application controller uses this value to "shard clusters across multiple controller replicas". The server uses it to "divide the limit of concurrent login requests (
ARGOCD_MAX_CONCURRENT_LOGIN_REQUESTS_COUNT
) between each replica." The repo-server has no such environment variable.Especially on the application controller, if the variable is not set, then the different replica will concurrently try to do the same operations on the clusters. Which results in constantly out-of-sync applications and other issues with the apps.
Possible drawbacks
None for the application itself.
Is this change breaking?
There are some interesting cases though.
Installations with Multiple Replicas of the Application Controller
In case you already have an installation with multiple replicas of the application controller, the chart will fail to validate, if you have not set the
.Values.controller.kind
to"StatefulSet"
. But in that case your application would not be running properly anyways. I had a similar setup for a week, or so, and the controller pods failed every couple of minutes due to memory issues created by the fact, that they concurrently worked on my clusters.Env variable is set already in
extraEnvVars
In case a user set the respective env variable already in the
extraEnvVars
block, this leads to a duplicate entry:This is not a problem, since according to the documentation the last occurrence of an environment variable takes precedence.
Env variable is set already in
extraEnvVarsCM
orextraEnvVarsSecret
This one is a bit more tricky. According to the docs, the environment variables from
env
take precedence over those fromenvFrom
. So in theory this could overwrite a value set in theextraEnvVarsCM
orextraEnvVarsSecret
.BUT: This is only an issue, if those values differ. And since this value should be the same value as the number of replicas, this is a misconfiguration, from my perspective. I don't actually know what the implications of a wrong value are though. I could not find detailed info on the sharding of the application controller, but reading the linked docs I assume this could lead to clusters not being served by any of the replicas!
Applicable issues
None
Additional information
There are some issues and PRs in the ArgoCD repo talking about this:
Checklist
Chart.yaml
according to semver. This is not necessary when the changes only affect README.md files.README.md
using readme-generator-for-helm