
Do not compare generated resources against API server but against spec hash #500

pmalek opened this issue Aug 16, 2024 · 6 comments

@pmalek
Member

pmalek commented Aug 16, 2024

Problem statement

Comparing generated resources against what the API server returns, in order to decide whether a patch is required, has proven difficult to get right (e.g. #239).

Instead of this approach, we can use a different technique:

  • Attach a spec hash annotation to generated resources
  • When a patch/update is triggered, compare the newly computed hash against that annotation's value
  • If they differ, patch/update (see the sketch below)
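
A minimal sketch of the annotation step, assuming a hypothetical annotation key and helper names (not taken from the operator's code):

```go
package spechash

import (
	"crypto/sha256"
	"encoding/json"
	"fmt"

	appsv1 "k8s.io/api/apps/v1"
)

// specHashAnnotation is a hypothetical annotation key used only for illustration.
const specHashAnnotation = "example.com/spec-hash"

// hashSpec serializes the generated spec and hashes the result. encoding/json
// is used for brevity; a real implementation would want a canonical, stable
// serialization so that semantically equal specs always produce the same hash.
func hashSpec(spec appsv1.DeploymentSpec) (string, error) {
	b, err := json.Marshal(spec)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(b)
	return fmt.Sprintf("%x", sum), nil
}

// annotateWithSpecHash stores the hash of the generated spec on the object
// before it is submitted to the API server.
func annotateWithSpecHash(d *appsv1.Deployment) error {
	h, err := hashSpec(d.Spec)
	if err != nil {
		return err
	}
	if d.Annotations == nil {
		d.Annotations = map[string]string{}
	}
	d.Annotations[specHashAnnotation] = h
	return nil
}
```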

Acceptance criteria

@akunszt
Contributor

akunszt commented Aug 26, 2024

What about fetching the information from the managedFields? I did not check any code, I'm just thinking out loud at the moment. Using a hash of the values can be problematic, as it is common practice to use mutating webhooks to modify resources at runtime (e.g. add an image prefix or modify the image so it is fetched from a local cache, change resource settings, remove CPU limits, use the same value for memory requests and limits, add sidecar containers for logging, observability, or other nefarious reasons, etc.).

One reason for using mutating webhooks is to standardize workloads: many third-party software components lack the required level of configurability (their focus lies elsewhere), and you want to provide a simpler interface to your users than the raw Kubernetes API.
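
For reference, the managedFields information is available on every object's metadata; below is a minimal sketch of reading it (illustrative only, using only standard apimachinery types):

```go
package managedfields

import (
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// printFieldOwners lists which field manager (the operator, kubectl, a
// mutating webhook's controller, etc.) last set which fields, as recorded in
// metadata.managedFields. It only prints ownership metadata; deciding whether
// "our" fields were changed would require parsing the FieldsV1 payload, e.g.
// with the structured-merge-diff library.
func printFieldOwners(entries []metav1.ManagedFieldsEntry) {
	for _, e := range entries {
		fmt.Printf("manager=%s operation=%s", e.Manager, e.Operation)
		if e.FieldsV1 != nil {
			fmt.Printf(" fields=%s", string(e.FieldsV1.Raw))
		}
		fmt.Println()
	}
}
```

It would be called with something like `printFieldOwners(deployment.GetManagedFields())` on an object fetched from the cluster.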

@pmalek
Member Author

pmalek commented Sep 9, 2024

> What about fetching the information from the managedFields? I did not check any code, I'm just thinking out loud at the moment. Using a hash of the values can be problematic, as it is common practice to use mutating webhooks to modify resources at runtime (e.g. add an image prefix or modify the image so it is fetched from a local cache, change resource settings, remove CPU limits, use the same value for memory requests and limits, add sidecar containers for logging, observability, or other nefarious reasons, etc.).
>
> One reason for using mutating webhooks is to standardize workloads: many third-party software components lack the required level of configurability (their focus lies elsewhere), and you want to provide a simpler interface to your users than the raw Kubernetes API.

@akunszt Thanks for your comment. The idea behind this issue is to:

  • generate the podTemplateSpec for a particular resource (which is already done today, as of 1.3.0)
  • calculate a hash of that generated spec, prior to submitting it to the kube API server
  • add that hash to the resource, e.g. via an annotation

Then, upon an update, we repeat the steps above and compare the newly computed hash with the hash stored on the object in the cluster. If they are equal, there's no need to change anything. This method does not produce false positives caused by mutating webhooks changing resources on admission.
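
A rough sketch of that comparison step, reusing the hypothetical `hashSpec` helper and `specHashAnnotation` key from the earlier sketch (same package):

```go
package spechash

import (
	appsv1 "k8s.io/api/apps/v1"
)

// needsUpdate regenerates the desired spec, hashes it, and compares the
// result with the hash annotation stored on the in-cluster object. Only a
// mismatch triggers a patch/update, so fields mutated on admission (by
// mutating webhooks, policy engines, etc.) do not cause false positives.
func needsUpdate(inCluster *appsv1.Deployment, desiredSpec appsv1.DeploymentSpec) (bool, error) {
	desiredHash, err := hashSpec(desiredSpec)
	if err != nil {
		return false, err
	}
	storedHash := inCluster.Annotations[specHashAnnotation]
	return desiredHash != storedHash, nil
}
```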

@scottaubrey

I spent a good chunk of my day wondering why the statuses of the data plane and control plane never go to ready despite the pods being ready. When I finally stumbled upon #239 and this issue, it finally made sense.

I support the idea of the hash so that resources can be mutated at runtime (in my instance, because of a Kyverno policy).

@pmalek
Member Author

pmalek commented Sep 24, 2024

Hi @scottaubrey 👋

Thanks for commenting on the issue. We want to get this implemented to solve these pain points for users, but we're currently working through the prioritized backlog. We hope to have some time to work on this in the not-so-distant future.

@pmalek
Member Author

pmalek commented Nov 19, 2024

I feel the need to comment on this one just to make sure everyone involved is aware of the consequences of implementing this change:

When "fixed", this will change the current behavior of KGO where it continuously enforces its configuration (and therefore corrects in cluster changes of resource that it manages) into enforcing it only on spec updates.

This will make KGO's reconciliation loop not fight with any solutions that are installed in users' clusters that could potentially come in conflict with KGO's actions on its managed resources but this will in turn make its actions not perform the self healing/correction.
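
To make the trade-off concrete, a rough contrast of the two update predicates (illustrative only; the deep comparison below is exactly the kind of check that has proven hard to get right, see #239):

```go
package spechash

import (
	"reflect"

	appsv1 "k8s.io/api/apps/v1"
)

// Continuous enforcement (current behavior): any drift of the live object
// from the freshly generated spec gets patched back, including drift
// introduced by other controllers, policy engines, or manual edits.
func needsUpdateEnforcing(inCluster *appsv1.Deployment, desired appsv1.DeploymentSpec) bool {
	return !reflect.DeepEqual(inCluster.Spec, desired)
}

// With the hash-based approach (needsUpdate from the earlier sketch), the
// same drift is left alone: the operator-generated spec, and therefore its
// hash annotation, did not change, so no patch is issued and no self-healing
// happens.
```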

@pmalek
Member Author

pmalek commented Jan 24, 2025

I've prepared a short design doc for this: https://docs.google.com/document/d/1MSq_kPcJoCFPh4XqQtjWE5nsEaSMTGHs9qblPJ6DHDY/edit?tab=t.0. Let me know your thoughts.
