
Support new features in all currently supported k8s versions #4529

Closed
stevehipwell opened this issue Dec 16, 2021 · 24 comments
Labels
area/cluster-autoscaler
kind/feature (Categorizes issue or PR as related to a new feature.)
lifecycle/rotten (Denotes an issue or PR that has aged beyond stale and will be auto-closed.)

Comments

@stevehipwell
Contributor

Which component are you using?:

Cluster Autoscaler

Is your feature request designed to solve a problem? If so describe the problem this feature should solve.:

Due to the development/release strategy for CA, new features are only available in the newest version of CA. This is problematic even in an ideal world where clusters follow the core K8s version lifecycle, and it is a nightmare in the real world, where cloud providers delay their version releases and support old K8s versions.

I'd like to be able to use new features on at least all currently supported Kubernetes versions, assuming the feature doesn't depend on anything missing from the older version.

Describe the solution you'd like.:

I think this could be achieved by updating the documentation so that, for a given K8s version, the recommended CA version is the one built with the newest client-go that still supports that K8s version. Tests would need to be added to support this.
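As a rough illustration of what that recommendation could look like in practice (not an official tool or mapping), the sketch below queries the cluster's version with client-go and prints a matching CA image tag; the registry path and tag format are assumptions for illustration only.

```go
// Minimal sketch, assuming the recommended CA release line simply tracks the
// cluster's minor version. The registry path and tag format are illustrative
// assumptions, not an official mapping.
package main

import (
	"fmt"
	"log"
	"strings"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load the default kubeconfig; in-cluster config would work just as well.
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		log.Fatalf("loading kubeconfig: %v", err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		log.Fatalf("building clientset: %v", err)
	}

	// Ask the API server for its version (e.g. Major "1", Minor "21" or "21+").
	info, err := client.Discovery().ServerVersion()
	if err != nil {
		log.Fatalf("fetching server version: %v", err)
	}
	minor := strings.TrimSuffix(info.Minor, "+") // managed providers often append "+"

	// Hypothetical recommendation: use the CA release line matching the cluster minor.
	fmt.Printf("recommended image: registry.k8s.io/autoscaling/cluster-autoscaler:v%s.%s.x\n",
		info.Major, minor)
}
```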

Alternatively, all supportable features could be backported to the CA versions aligned with in-support K8s versions.

Describe any alternative solutions you've considered.:

Maintain a fork and backport the changes (I think AWS do this).

Additional context.:

n/a

@stevehipwell added the kind/feature label on Dec 16, 2021
@stevehipwell
Contributor Author

FYI @x13n I've opened this in response to #4206 (comment).

@x13n
Member

x13n commented Dec 20, 2021

The main issue with decoupling CA from the k8s version is the dependency on scheduler code. CA reuses a large portion of the scheduler codebase, and letting them get out of sync can lead to bugs: CA and the scheduler disagreeing on whether a certain pod can be scheduled or not. To avoid that, we'd have to start doing releases for each supported k8s version, which would increase the overall maintenance cost.
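To make the failure mode concrete, here is a deliberately simplified sketch (not CA's actual code; the types and the label rule are invented) of how predicate logic built from two different releases can disagree about the same pod, so CA scales up for a pod the scheduler will never place, or refuses to scale up for one it would:

```go
// Toy illustration of scheduler/CA version skew: the same pod is judged
// differently by predicate logic from two different releases. Real code uses
// the scheduler framework vendored from k8s.io/kubernetes; these types and
// the label rule are invented for the example.
package main

import "fmt"

type Pod struct {
	CPUMillis        int64
	RequiredLabelKey string // stands in for a feature added in a newer release
}

type Node struct {
	AllocatableCPUMillis int64
	Labels               map[string]string
}

// fitsOld mimics an older release: it only checks resources.
func fitsOld(p Pod, n Node) bool {
	return p.CPUMillis <= n.AllocatableCPUMillis
}

// fitsNew mimics a newer release that also enforces the (invented) label rule.
func fitsNew(p Pod, n Node) bool {
	if p.RequiredLabelKey != "" && n.Labels[p.RequiredLabelKey] == "" {
		return false
	}
	return p.CPUMillis <= n.AllocatableCPUMillis
}

func main() {
	pod := Pod{CPUMillis: 500, RequiredLabelKey: "example.com/some-new-constraint"}
	node := Node{AllocatableCPUMillis: 4000, Labels: map[string]string{}}

	// A CA built against the old logic would scale up for this pod, while a
	// newer scheduler would still refuse to place it: the new node is wasted.
	fmt.Println("old predicate says fits:", fitsOld(pod, node)) // true
	fmt.Println("new predicate says fits:", fitsNew(pod, node)) // false
}
```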

@stevehipwell
Contributor Author

@x13n would it be easier to backport changes then?

@x13n
Member

x13n commented Dec 21, 2021

I think a process for supporting all features in all versions would be equivalent to backporting all changes automatically. A one-off backport would definitely be simpler than that.

@stevehipwell
Contributor Author

@x13n I think there should be some process for managing backports. I'm not sure if there is already documentation around backporting changes, but it'd be useful to have the process fully documented. Something like the steps below could be put in place with minimal effort and no "commitment" to do any additional work unless the demand is there.

  • List changes that could be backported in all PRs (latest K8s version)
  • Open an issue once PR is merged to discuss potential backports
  • Provider changes should be backported where possible (see below)
  • Implement backport on a case-by-case basis if there is enough benefit

With the latest v1.23.0 release there are a number of provider changes which aren't tied to the K8s version and are relevant to all supported K8s versions (and a number of unsupported ones too). These changes won't actually be used by the majority of users for a significant amount of time, because managed K8s versions lag behind the official ones. This poses a major issue for the CA project: code is committed and released but isn't used by most of the ecosystem until a relatively long time, and usually multiple additional releases, have passed. This breaks the feedback cycle for a release and means that neither the functionality/logic nor the code is always fully validated before incremental changes are layered on top.

If we take AWS as an example, anyone using EKS won't have had a chance to use v1.22.0 before v1.23.0 was released, and they are still using v1.21.0. If the AWS provider changes were backported to v1.22 and v1.21, EKS users could use these new features with existing v1.21 clusters, and when they can create new v1.22 clusters they would have the best CA experience.

@x13n
Member

x13n commented Jan 5, 2022

This is a valid concern; I agree there is a lot of value in porting features unrelated to the k8s version into older CA versions. One caveat, though, is that the CA version is currently tied to the k8s version. If we start backporting all the changes to older versions, we will be breaking the promise of semantic versioning: new features (and potentially new bugs) will start appearing in patch releases. People upgrading from 1.X.Y to 1.X.Y+1 currently expect only bugfixes, not new functionality or breaking changes.

From a pure versioning perspective, it might make sense to make the k8s version only a part of the CA version. So there would be, e.g., CA 1.0-1.23.0, CA 1.0-1.22.0 and so on. To make this work, we would need to maintain N images instead of just 1.

If we were to stick with the existing setup and just do the work you suggested, I'd argue that "List changes that could be backported in all PRs (latest K8s version)" should really only cover bugfixes.
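Purely to illustrate the combined tag proposed above (it was never adopted), such a tag could be split into its CA and Kubernetes parts along these lines:

```go
// Hypothetical parsing of a "<ca>-<k8s>" tag such as "1.0-1.23.0"; the format
// illustrates the proposal above, not anything the project ships.
package main

import (
	"fmt"
	"strings"
)

func splitCombinedTag(tag string) (caVersion, k8sVersion string, err error) {
	parts := strings.SplitN(tag, "-", 2)
	if len(parts) != 2 {
		return "", "", fmt.Errorf("tag %q does not match <ca>-<k8s>", tag)
	}
	return parts[0], parts[1], nil
}

func main() {
	for _, tag := range []string{"1.0-1.23.0", "1.0-1.22.0"} {
		ca, k8s, err := splitCombinedTag(tag)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("CA %s built against Kubernetes %s\n", ca, k8s)
	}
}
```

In SemVer terms the -1.23.0 suffix would read as a pre-release identifier, which is part of why, as noted below, this schema doesn't really hold up.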

@stevehipwell
Contributor Author

If the versioning needs changing, I think the version should still be a valid SemVer production version (nothing after the patch), so we'd need to go to something like v2.x.y = K8s v1.21, v3.x.y = K8s v1.22, etc.?
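A minimal sketch of the 1:1 mapping being proposed here; the specific table entries and the constant offset are just examples of the pattern, not a committed scheme:

```go
// Hypothetical 1:1 mapping between CA major versions and Kubernetes minor
// versions under the proposed scheme (v2.x.y -> 1.21, v3.x.y -> 1.22, ...).
package main

import "fmt"

var caMajorToK8sMinor = map[int]string{
	2: "1.21",
	3: "1.22",
	4: "1.23",
}

// Because the proposal keeps the offset constant, the mapping could also be
// computed: CA major N corresponds to Kubernetes 1.(N+19).
func k8sMinorForCAMajor(major int) string {
	return fmt.Sprintf("1.%d", major+19)
}

func main() {
	for major := 2; major <= 4; major++ {
		fmt.Printf("CA v%d.x.y -> Kubernetes v%s (table) / v%s (computed)\n",
			major, caMajorToK8sMinor[major], k8sMinorForCAMajor(major))
	}
}
```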

@x13n
Member

x13n commented Jan 5, 2022

Good point, the schema I proposed above doesn't really follow SemVer. So yes, either we need a 1:1 mapping between the major CA version and the minor k8s version as you proposed, or we treat different k8s versions as different "platforms" (akin to amd64, arm, etc. for regular container images). In the second scenario a single CA version can support multiple k8s platform versions.

Either way, I think it is clear that if we are to start offering new features in older k8s versions, we need to change the versioning schema. Then we'll need to figure out the process for building images, policy on how many k8s versions to support and so on.

@mwielgus @MaciekPytel @gjtempleton @bpineau - thoughts?

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Apr 5, 2022
@stevehipwell
Contributor Author

/remove-lifecycle stale

@k8s-ci-robot removed the lifecycle/stale label on Apr 5, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Jul 4, 2022
@stevehipwell
Contributor Author

/remove-lifecycle stale

@k8s-ci-robot removed the lifecycle/stale label on Jul 5, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Oct 3, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Nov 2, 2022
@stevehipwell
Contributor Author

/remove-lifecycle rotten

@k8s-ci-robot removed the lifecycle/rotten label on Nov 2, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Jan 31, 2023
@stevehipwell
Contributor Author

@x13n @mwielgus @MaciekPytel @gjtempleton @bpineau has anyone looked into this further?

I think, at a minimum, if the CA versions were disconnected from the K8s release cycle but still aligned as major versions to K8s minor versions (e.g. K8s v1.26 = CA v1), then backports and features could be added to the CA versions actually in use instead of only the pending release. This would have no impact on the current logic other than providing SemVer space to add functionality.

This pattern could be extended so that, if no relevant changes are made to the K8s scheduler between releases, a single CA major release could be compatible with multiple K8s minor versions.
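As a rough sketch of that extension (illustrative only; the range values and the check are invented), a CA release could declare the range of Kubernetes minors it was validated against and refuse to run, or at least warn, outside it:

```go
// Illustrative compatibility guard, not real CA behaviour: a single CA major
// declares a supported range of Kubernetes minor versions.
package main

import (
	"fmt"
	"log"
)

const (
	minSupportedK8sMinor = 25 // example: no relevant scheduler changes since 1.25
	maxSupportedK8sMinor = 27
)

func checkClusterCompatibility(clusterMinor int) error {
	if clusterMinor < minSupportedK8sMinor || clusterMinor > maxSupportedK8sMinor {
		return fmt.Errorf("cluster minor 1.%d is outside the supported range 1.%d-1.%d",
			clusterMinor, minSupportedK8sMinor, maxSupportedK8sMinor)
	}
	return nil
}

func main() {
	for _, minor := range []int{24, 26, 28} {
		if err := checkClusterCompatibility(minor); err != nil {
			log.Printf("refusing to start: %v", err)
			continue
		}
		fmt.Printf("cluster 1.%d is supported by this CA release\n", minor)
	}
}
```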

/remove-lifecycle rotten

@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Mar 2, 2023
@stevehipwell
Contributor Author

/remove-lifecycle rotten

@k8s-ci-robot removed the lifecycle/rotten label on Mar 3, 2023
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Jun 1, 2023
@stevehipwell
Contributor Author

/remove-lifecycle rotten

@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Jul 5, 2023
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot closed this as not planned on Jan 19, 2024