Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Persist resource health in Redis instead of the Application CR by default #10312

Open
crenshaw-dev opened this issue Aug 12, 2022 · 8 comments · May be fixed by #21532
Open

Persist resource health in Redis instead of the Application CR by default #10312

crenshaw-dev opened this issue Aug 12, 2022 · 8 comments · May be fixed by #21532
Assignees
Labels
enhancement New feature or request type:scalability Issues related to scalability and performance related issues
Milestone

Comments

@crenshaw-dev
Copy link
Member

Summary

As of 2.5, we'll have the option to persist resource health statuses in Redis instead of in the Application CRD. In 3.0, we should default to persisting in Redis.

Motivation

  # Specifies if resource health should be persisted in app CRD (default true)
  # Changing this to `false` significantly reduce number of Application CRD updates and improves controller performance.
  # However, disabling resource health by default might affect applications that communicate with Applications CRD directly
  # so we have to defer switching this to `false` by default till v3.0 release.
  controller.resource.health.persist: "true"

Proposal

Flip the switch. Maybe even remove support for CRD persistence entirely.

@crenshaw-dev crenshaw-dev added the enhancement New feature or request label Aug 12, 2022
@crenshaw-dev crenshaw-dev added this to the v3.0 milestone Aug 12, 2022
@crenshaw-dev crenshaw-dev changed the title Persist resource health in Redis instead of the Application CRD by default Persist resource health in Redis instead of the Application CR by default Apr 24, 2024
@csantanapr
Copy link
Member

Minor clarification is Application CR not Application CRD
I agree with the statement

Flip the switch. Maybe even remove support for ArgoCD CR persistence entirely.

Saving the status in the CR/etcd has a performance impact, end user like Adobe had to switch the default to redis to improve scalability in their recent talk at GitOpsCon NA 2024 https://youtu.be/MDXrc_cLVns?t=404
image

@csantanapr csantanapr added the type:scalability Issues related to scalability and performance related issues label Apr 24, 2024
@rumstead
Copy link
Member

rumstead commented Jan 7, 2025

Will this break the applicationset controller's progressive sync functionality?

@crenshaw-dev
Copy link
Member Author

@rumstead good question. I think the answer is "yes." One option would be to add an option that only persists top-level health changes to the status instead of changes for individual resources. Another option would be to modify the AppSet controller to pull the current health status from Redis.

@rumstead rumstead self-assigned this Jan 8, 2025
@rumstead
Copy link
Member

rumstead commented Jan 9, 2025

took me forever to find when the resource health cache was created - #1120

@rumstead
Copy link
Member

Interested in everyone's thoughts on this overall approach before I go test.

#21455

@MrFreezeex
Copy link
Contributor

MrFreezeex commented Jan 10, 2025

I am not entirely sure this would work out with notification/notification controllers too. AFAIU the notification controller rely on the Application CR to be updated to be able to send notification in "real" time about sync event 🤔. Sorry if that's already taken care of, just wondering how this could actually works 😅.

@rumstead
Copy link
Member

I am not entirely sure this would work out with notification/notification controllers too. AFAIU the notification controller rely on the Application CR to be updated to be able to send notification in "real" time about sync event 🤔. Sorry if that's already taken care of, just wondering how this could actually works 😅.

That is a good shout but I don't believe the notification controller looks at the health status of the individual resources, just the overall application health / sync status.

E.g., the diff in the status would look something like this:

persis = true (the default)

status:
  health:
    status: Healthy
  resources:
  - group: apps
    health:
      status: Healthy
    kind: Deployment
    name: demo-app-kust
    namespace: cluster-generator
    status: OutOfSync
    version: v1
  sync:
    status: OutOfSync

persist = false

# snippet CR status
status:
  health:
    status: Healthy
  resourceHealthSource: appTree
  resources:
  - group: apps
    kind: Deployment
    name: demo-app-kust
    namespace: cluster-generator
    status: OutOfSync
    version: v1
  sync:
    status: OutOfSync

Where persist = false is missing status.resources[].health

@crenshaw-dev crenshaw-dev moved this to Todo in Argo CD 3.0 Jan 14, 2025
@rumstead rumstead moved this from Todo to In Progress in Argo CD 3.0 Jan 16, 2025
@rumstead
Copy link
Member

@rumstead good question. I think the answer is "yes." One option would be to add an option that only persists top-level health changes to the status instead of changes for individual resources. Another option would be to modify the AppSet controller to pull the current health status from Redis.

Ok after looking into this deeper, progressive syncs won't be impacted. Progressive syncs only care about the application status. The application status depends on the underlying resources but the application controller handles properly setting it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request type:scalability Issues related to scalability and performance related issues
Projects
Status: In Progress
4 participants