📖 Remove simplistic advice about multiple controllers reconciling same CR #4537

Merged
merged 1 commit into kubernetes-sigs:master from fixup
Feb 3, 2025

Conversation

alvaroaleman
Member

This advice is oversimplifying things and making an "It depends" situation look as if there were a clearly good and a clearly bad way that is the same in all situations. Pretty much none of the issues stated will get better if each controller gets its own CR:

  • Race conditions: Conflict errors can always happen and all controllers need to be able to deal with them. If a full reconciliation is too expensive, they can use something like `retry.RetryOnConflict` (see the sketch after this list)
  • Concurrency issues with different interpretations of state: This example just sounds like buggy software. Copying the state into a new CR doesn't eliminate the problem
  • Maintenance and support difficulties: This is definitely not going to get any better by adding more CRDs into the mix; if anything, it will get more complicated
  • Status tracking complications: This is why conditions exist, and the Kubernetes API conventions explicitly state that controllers need to ignore unknown conditions: "Objects may report multiple conditions, and new types of conditions may be added in the future or by 3rd party controllers." (ref: https://github.com/kubernetes/community/blob/322066e7dba7c5043071392fec427a57f8660734/contributors/devel/sig-architecture/api-conventions.md#typical-status-properties). A sketch of per-controller conditions follows the Pod illustration below
  • Performance issues: If multiple controllers do the same thing, that is a bug regardless of all other considerations and can easily lead to correctness and performance issues. The workqueue locks items while they are reconciled to avoid exactly that, but obviously it doesn't work cross-controller
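
To make the conflict-handling point concrete, here is a minimal sketch of retrying only the conflicting write with `retry.RetryOnConflict` from `k8s.io/client-go/util/retry` instead of re-running a full reconciliation. The `Widget` CRD, its API package, and the `updateStatus` helper are hypothetical and only used for illustration:

```go
import (
	"context"

	"k8s.io/client-go/util/retry"
	"sigs.k8s.io/controller-runtime/pkg/client"

	myv1 "example.com/project/api/v1" // hypothetical API package for the Widget CRD
)

// updateStatus retries only the status write on a 409 Conflict instead of
// re-running the whole, potentially expensive, reconciliation.
func updateStatus(ctx context.Context, c client.Client, desired *myv1.Widget) error {
	return retry.RetryOnConflict(retry.DefaultRetry, func() error {
		// Re-fetch the latest revision before every attempt so the update is
		// based on the current resourceVersion.
		latest := &myv1.Widget{}
		if err := c.Get(ctx, client.ObjectKeyFromObject(desired), latest); err != nil {
			return err
		}
		latest.Status.Phase = desired.Status.Phase
		return c.Status().Update(ctx, latest)
	})
}
```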

To illustrate the situation, think about the Pod object: in the lifecycle of a pod we usually have at least the cluster-autoscaler, the scheduler, and the kubelet acting on it. Making the cluster-autoscaler act on a PodScaleRequest and the scheduler on a PodScheduleRequest would be a complication, not a simplification.
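
And on the status-tracking point above, a minimal sketch of several controllers sharing one object's status by each owning its own condition type. `ScalingReady`, `markScalingReady`, and the `Widget` type are hypothetical; `meta.SetStatusCondition` comes from `k8s.io/apimachinery/pkg/api/meta`:

```go
import (
	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

	myv1 "example.com/project/api/v1" // hypothetical API package for the Widget CRD
)

// markScalingReady shows how one controller updates only the condition type it
// owns; other controllers acting on the same object ignore condition types
// they don't know, as the API conventions require.
func markScalingReady(widget *myv1.Widget) {
	meta.SetStatusCondition(&widget.Status.Conditions, metav1.Condition{
		Type:               "ScalingReady", // condition type owned by this controller
		Status:             metav1.ConditionTrue,
		ObservedGeneration: widget.Generation,
		Reason:             "ScaledUp",
		Message:            "replica count matches the desired count",
	})
}
```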

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Feb 3, 2025
@k8s-ci-robot k8s-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Feb 3, 2025
@sbueringer
Member

👍 from my side

Member

@camilamacedo86 camilamacedo86 left a comment


@alvaroaleman

Good catch! Thank you 🥇
I totally agree. I think the original intention might have gotten lost somewhere along the way, or I may have just overlooked it.

Anyway, good catch.

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 3, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alvaroaleman, camilamacedo86

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 3, 2025
@alvaroaleman
Member Author

@camilamacedo86 thanks for the prompt review. Could you maybe restart the failed CI job? I don't think it's related to the PR, and I don't have perms to do that myself.

@camilamacedo86
Member

@alvaroaleman

I am looking into that.
But it should not block this PR.
Thank you again 👍

@camilamacedo86 camilamacedo86 merged commit 54dbd1f into kubernetes-sigs:master Feb 3, 2025
8 of 10 checks passed
@camilamacedo86
Member

Hi @alvaroaleman

@camilamacedo86 thanks for the prompt review. Could you maybe restart the failed CI job? I don't think it's related to the PR, and I don't have perms to do that myself.

It is solved, too.
You will no longer face this issue on any other PR you push.
Please feel free to keep contributing. Your help is very welcome.

@alvaroaleman alvaroaleman deleted the fixup branch February 3, 2025 19:59