vrg: use reader instead of client for check hooks #1750

Merged

Conversation

@raghavendra-talur (Member) commented Jan 14, 2025

When using the cached client, we fail to find the resource if it was recently created.

Co-Authored-by: Annaraya Narasagond <[email protected]>
Signed-off-by: Raghavendra Talur <[email protected]>
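For context, a minimal sketch of the pattern this change moves toward, assuming controller-runtime's manager is available; the names (`HookChecker`, `checkDeploymentReady`) are illustrative, not Ramen's actual code:

```go
package example

import (
	"context"

	appsv1 "k8s.io/api/apps/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// HookChecker is a hypothetical holder for the two read paths.
type HookChecker struct {
	Client client.Client // cached client: reads may miss a just-created object
	Reader client.Reader // API reader: reads go straight to the kube-apiserver
}

func NewHookChecker(mgr ctrl.Manager) *HookChecker {
	return &HookChecker{
		Client: mgr.GetClient(),
		Reader: mgr.GetAPIReader(),
	}
}

// checkDeploymentReady uses the reader so a deployment created moments ago is
// found immediately, without waiting for an informer cache to sync.
func (h *HookChecker) checkDeploymentReady(ctx context.Context, ns, name string) (bool, error) {
	dep := &appsv1.Deployment{}
	if err := h.Reader.Get(ctx, client.ObjectKey{Namespace: ns, Name: name}, dep); err != nil {
		return false, err
	}

	want := int32(1)
	if dep.Spec.Replicas != nil {
		want = *dep.Spec.Replicas
	}

	return dep.Status.ReadyReplicas == want, nil
}
```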
@BenamarMk (Member) commented:
@raghavendra-talur It looks good to me, but I have a question about the cache. Are we potentially undermining the purpose of the cache by skipping its usage? If it's used selectively, that would be fine, but consistently hitting etcd instead of the cache feels like an expensive operation. Also, what’s the downside of not finding the just-created resource in the cache? The next reconciliation cycle should pick it up and things should be fine. Why is it so important that we find it right after we create it? One reason would be to avoid trying to recreate it -- ok, that would be a valid reason. But is that the case?

@raghavendra-talur (Member, Author) commented:
> @raghavendra-talur It looks good to me, but I have a question about the cache. Are we potentially undermining the purpose of the cache by skipping its usage? If it's used selectively, that would be fine, but consistently hitting etcd instead of the cache feels like an expensive operation. Also, what’s the downside of not finding the just-created resource in the cache? The next reconciliation cycle should pick it up and things should be fine. Why is it so important that we find it right after we create it? One reason would be to avoid trying to recreate it -- ok, that would be a valid reason. But is that the case?

There were two different issues we saw:

  1. A delay in getting the data, requiring multiple reconciles.
  2. The cached client never filled the cache for a certain type of object. IIRC something like pods were in the cache but deployments were not. In the case of pods, the next reconcile succeeded, but in the case of deployments, it never did.

Using the reader solves both problems.

There are pros and cons to using the reader instead of the cached client. One downside is that we hit the api-server for every backup or restore job, like you said. The pro is that we don't have to maintain a cache for pods, deployments, and stateful sets.

Comparing the two, I really think the pro of not having to keep the cache of pods, deployments, and stateful sets in the ramen process memory wins. These resources are not something that ramen relies on in most cases (managed apps, and discovered apps with no hooks). It is only in the case of discovered apps with hooks in the recipe that we need this information.
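As a rough sketch of that trade-off (a hypothetical helper, not Ramen's actual hook code): listing a hook's target pods through the API reader costs one request to the kube-apiserver per check, but it never starts an informer, so no pod, deployment, or stateful set cache is held in the ramen process.

```go
package example

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// hookTargetPods is a hypothetical helper that lists the pods a check hook
// targets. Going through client.Reader means every call hits the API server,
// but nothing gets cached in memory for types we otherwise never watch.
func hookTargetPods(ctx context.Context, reader client.Reader, ns string, selector map[string]string) (*corev1.PodList, error) {
	pods := &corev1.PodList{}
	err := reader.List(ctx, pods,
		client.InNamespace(ns),
		client.MatchingLabels(selector))

	return pods, err
}
```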

I need to write a unit test that proves this but haven't been able to complete that in this PR. Will do it in a follow-up.

@asn1809 (Member) left a comment:


Lgtm

@BenamarMk (Member) commented:
> There were two different issues we saw:
>
> 1. A delay in getting the data, requiring multiple reconciles.
> 2. The cached client never filled the cache for a certain type of object. IIRC something like pods were in the cache but deployments were not. In the case of pods, the next reconcile succeeded, but in the case of deployments, it never did.

Thanks @raghavendra-talur. Based on what you’ve said, it’s strange that the client-go cache behaves that way, but I’ll leave it at that.

@BenamarMk merged commit acc70c7 into RamenDR:main on Jan 17, 2025
23 checks passed