Cannot update self-managed ArgoCD to 2.12 due to race condition between argocd-redis and argocd-application-controller #19798

Open
akloss-cibo opened this issue Sep 5, 2024 · 9 comments
Labels: bug, component:application-controller, component:redis, version:2.12

@akloss-cibo

Checklist:

  • I've searched in the docs and FAQ for my answer: https://bit.ly/argocd-faq.
  • I've included steps to reproduce the bug.
  • I've pasted the output of argocd version.

Describe the bug

When updating a self-managed ArgoCD installation from 2.10.9 to 2.12.3 using the Kustomization at https://github.com/argoproj/argo-cd/manifests/cluster-install?ref=v2.12.3, the upgrade deadlocks. In several attempts, the argocd-application-controller StatefulSet was updated before the argocd-redis Deployment. The new pod for the updated argocd-application-controller StatefulSet won't start because the argocd-redis Secret hasn't yet been populated by the init container in the argocd-redis Deployment, and the argocd-redis Deployment is never updated because no argocd-application-controller pod is left running to sync it.

To Reproduce

Steps to reproduce:

  1. Create an ArgoCD Application to install ArgoCD from the manifest at github.com/argoproj/argo-cd/manifests/cluster-install?ref=v2.10.9
  2. Update the Application to target github.com/argoproj/argo-cd/manifests/cluster-install?ref=v2.12.3
  3. Observe that the new argocd-application-controller-0 pod won't start because the argocd-redis Secret doesn't exist, and that the argocd-redis Deployment stays out of sync because there is no argocd-application-controller pod running to sync it.
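For reference, steps 1 and 2 might look roughly like the Application below. This is a minimal sketch, assuming the Application points straight at the upstream `manifests/cluster-install` directory rather than at a local kustomization; the Application name, project and destination are assumptions, and no sync policy is set because the report doesn't say whether syncs were manual or automated.

```yaml
# Hypothetical Application for steps 1 and 2; metadata.name, project and
# destination are assumptions, not taken from the original report.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: argocd
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/argoproj/argo-cd.git
    path: manifests/cluster-install
    # Step 1 uses v2.10.9; step 2 changes this to v2.12.3.
    targetRevision: v2.10.9
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
```

Step 2 then amounts to changing `targetRevision` to `v2.12.3` and syncing.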

Expected behavior

ArgoCD should apply the updates from the Kustomization in an order that ensures the argocd-redis Secret exists before the argocd-application-controller is updated to depend on it.

I've worked around this by adding sync waves in a kustomization overlay, but adding the annotations upstream would make this work for everyone without having to patch the redis resources:

diff --git a/manifests/base/redis/argocd-redis-deployment.yaml b/manifests/base/redis/argocd-redis-deployment.yaml
index c591db0d0..9861b5656 100644
--- a/manifests/base/redis/argocd-redis-deployment.yaml
+++ b/manifests/base/redis/argocd-redis-deployment.yaml
@@ -1,6 +1,8 @@
 apiVersion: apps/v1
 kind: Deployment
 metadata:
+  annotations:
+    argocd.argoproj.io/sync-wave: "-1"
   labels:
     app.kubernetes.io/name: argocd-redis
     app.kubernetes.io/part-of: argocd
diff --git a/manifests/base/redis/argocd-redis-network-policy.yaml b/manifests/base/redis/argocd-redis-network-policy.yaml
index 145487474..bdb4ae9b8 100644
--- a/manifests/base/redis/argocd-redis-network-policy.yaml
+++ b/manifests/base/redis/argocd-redis-network-policy.yaml
@@ -1,6 +1,8 @@
 kind: NetworkPolicy
 apiVersion: networking.k8s.io/v1
 metadata:
+  annotations:
+    argocd.argoproj.io/sync-wave: "-1"
   name: argocd-redis-network-policy
 spec:
   podSelector:
diff --git a/manifests/base/redis/argocd-redis-role.yaml b/manifests/base/redis/argocd-redis-role.yaml
index a7a33f48a..c19c4356a 100644
--- a/manifests/base/redis/argocd-redis-role.yaml
+++ b/manifests/base/redis/argocd-redis-role.yaml
@@ -1,6 +1,8 @@
 apiVersion: rbac.authorization.k8s.io/v1
 kind: Role
 metadata:
+  annotations:
+    argocd.argoproj.io/sync-wave: "-1"
   labels:
     app.kubernetes.io/component: redis
     app.kubernetes.io/name: argocd-redis
diff --git a/manifests/base/redis/argocd-redis-rolebinding.yaml b/manifests/base/redis/argocd-redis-rolebinding.yaml
index f396914df..68a84cfe6 100644
--- a/manifests/base/redis/argocd-redis-rolebinding.yaml
+++ b/manifests/base/redis/argocd-redis-rolebinding.yaml
@@ -1,6 +1,8 @@
 apiVersion: rbac.authorization.k8s.io/v1
 kind: RoleBinding
 metadata:
+  annotations:
+    argocd.argoproj.io/sync-wave: "-1"
   labels:
     app.kubernetes.io/component: redis
     app.kubernetes.io/name: argocd-redis
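The same ordering can be applied from a local overlay without editing the upstream base. A minimal sketch, assuming a `kustomization.yaml` that consumes the upstream Kustomization as a remote resource; the Role and RoleBinding names (`argocd-redis`) are assumptions, since the diff above doesn't show them:

```yaml
# kustomization.yaml: hypothetical local overlay; the file layout is an assumption.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - https://github.com/argoproj/argo-cd/manifests/cluster-install?ref=v2.12.3
patches:
  # Each inline strategic-merge patch is matched to its target by
  # apiVersion, kind and metadata.name. Wave "-1" runs before the default
  # wave 0, where the argocd-application-controller StatefulSet stays.
  - patch: |-
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: argocd-redis
        annotations:
          argocd.argoproj.io/sync-wave: "-1"
  - patch: |-
      apiVersion: networking.k8s.io/v1
      kind: NetworkPolicy
      metadata:
        name: argocd-redis-network-policy
        annotations:
          argocd.argoproj.io/sync-wave: "-1"
  - patch: |-
      apiVersion: rbac.authorization.k8s.io/v1
      kind: Role
      metadata:
        name: argocd-redis  # assumed name
        annotations:
          argocd.argoproj.io/sync-wave: "-1"
  - patch: |-
      apiVersion: rbac.authorization.k8s.io/v1
      kind: RoleBinding
      metadata:
        name: argocd-redis  # assumed name
        annotations:
          argocd.argoproj.io/sync-wave: "-1"
```

ArgoCD applies resources in lower sync waves first and waits for them to become healthy before moving on, so the argocd-redis Deployment (whose init container populates the Secret) should be rolled out before the wave-0 argocd-application-controller StatefulSet is updated.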

Version

Upgrading from:

% k exec argocd-application-controller-0 -- /usr/local/bin/argocd version
time="2024-09-05T12:47:14Z" level=fatal msg="Argo CD server address unspecified"
argocd: v2.10.9+c071af8
  BuildDate: 2024-04-30T15:53:28Z
  GitCommit: c071af808170bfc39cbdf6b9be4d0212dd66db0c
  GitTreeState: clean
  GoVersion: go1.21.3
  Compiler: gc
  Platform: linux/amd64
command terminated with exit code 1
%

Upgrading to:

% k exec argocd-application-controller-0 -- /usr/local/bin/argocd version
time="2024-09-05T12:47:41Z" level=fatal msg="Argo CD server address unspecified"
argocd: v2.12.3+6b9cd82
  BuildDate: 2024-08-27T11:57:48Z
  GitCommit: 6b9cd828c6e9807398869ad5ac44efd2c28422d6
  GitTreeState: clean
  GoVersion: go1.22.4
  Compiler: gc
  Platform: linux/amd64
command terminated with exit code 1
%

akloss-cibo added the bug label on Sep 5, 2024
andrii-korotkov-verkada added the version:2.12 label on Nov 11, 2024
@andrii-korotkov-verkada
Contributor

I don't see the sync waves in master anymore, so maybe try getting new manifests and/or upgrading to v2.13.0.

@akloss-cibo
Author

I think there's a misunderstanding. There are no sync waves upstream; I have added sync-wave annotations to our local kustomization to make this work for us.

@andrii-korotkov-verkada
Contributor

@akloss-cibo, okay, wasn't sure if that was the case. Thanks for confirming. Can you define sync waves to get the proper order? If not, what's blocking?

andrii-korotkov-verkada added the more-information-needed label on Nov 11, 2024
@akloss-cibo
Author

Yes, the sync-wave changes in my original post, which make redis install before the argocd-application-controller StatefulSet, work for us.

andrii-korotkov-verkada removed the more-information-needed label on Nov 12, 2024
@akloss-cibo
Author

I'm confused by this being closed. Yes, we can mitigate the problem by applying our own sync waves, but it seems to me that the provided kustomization should work for this use case out of the box.

@andrii-korotkov-verkada
Contributor

Re-opening. Are you suggesting we just add sync waves to the upstream manifests for people who manage ArgoCD with ArgoCD?

@akloss-cibo
Author

I do, yes. I think the folks who engineered the password-enabled redis should probably be consulted to see if they have some other strategy they'd like to see used.

@mtang-pton
mtang-pton commented Nov 25, 2024

Just wanted to add that I've run into this issue as well while upgrading self-managed Argo CD from 2.10 to 2.12. In our case, the upgrade went through in staging but failed in prod, leaving our prod Argo CD in a bad state while we debugged it. I ended up having to disable redis auth to get argocd-application-controller working, and then re-enabled redis auth.

I'd like to see the sync waves added upstream and the fix backported to 2.12/2.13.

andrii-korotkov-verkada added a commit to andrii-korotkov-verkada/argo-cd that referenced this issue Nov 25, 2024
Fixes argoproj#19798

Some Redis stuff needs to be updated before others, same for config maps. That's relevant when managing ArgoCD with ArgoCD.

Needs to be cherry-picked to v2.11-v2.13

Signed-off-by: Andrii Korotkov
@crenshaw-dev
Member

For folks who hit this issue: how did you upgrade Argo CD? It doesn't make sense to me that OP's argocd-redis was out-of-sync while argocd-application-controller was in sync (on the newer version). Did you do a partial sync that didn't include argocd-redis?


5 participants