Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Storage: Allow disabling of kOps's management of StorageClasses #10733

Merged

Conversation

seh
Copy link
Contributor

@seh seh commented Feb 4, 2021

Introduce a new "manageStorageClasses" field in the cloud provider portion of the Cluster specification to allow operators to opt out of kOps creating and reconciling StorageClass objects. The default value for the new "spec.cloudConfig.manageStorageClasses" field is true, to maintain compatibility with the existing behavior. When set to false, kOps will no longer create any StorageClass objects. Any such objects that kOps created in the past are left as is, and kOps and will no longer reconcile them against future changes.

The existing "spec.cloudConfig.openstack.blockStorage.createStorageClass" field remains in place. However, if both that and the new "spec.cloudConfig.manageStorageClasses" field are populated, they must agree: It is invalid both to disable management of StorageClass objects globally but to enable them for OpenStack and, conversely, to enable management globally but disable it for OpenStack.

Please see this discussion in the "kops-users" channel of the "Kubernetes" Slack workspace for the negotiation that led to this change. @zetaab, take note.


This change is Reviewable

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Feb 4, 2021
@k8s-ci-robot k8s-ci-robot added area/api area/nodeup size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Feb 4, 2021
@seh
Copy link
Contributor Author

seh commented Feb 4, 2021

Note that running hack/update-expected.sh changed 109 files. I didn't want to do that, but several integration tests failed until I allowed it to do its thing.

@seh seh mentioned this pull request Feb 4, 2021
@@ -15,7 +15,7 @@ kind: StorageClass
metadata:
name: gp2
annotations:
storageclass.beta.kubernetes.io/is-default-class: "false"
storageclass.kubernetes.io/is-default-class: "false"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may be mistaken but I think the reason I left the old annotations was because k8s was reading both still? As in, not setting storageclass.beta.kubernetes.io/is-default-class: "false" on the gp2 class and trying to just set storageclass.kubernetes.io/is-default-class: "false" caused the same issues with which one ended up being the default. Would have to test to confirm though

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apply should remove all unset annotations.

}

if c.ManageStorageClasses == nil {
c.ManageStorageClasses = fi.Bool(true)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure you really need this file since you need to use WithDefaultBool in the templates anyway. And you can set true as the default value there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. Since CloudConfig isn't set, you need this. I ran into nil pointer issues in my PR as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this change will break backwards compatbility with openstack. We cannot now update existing clusters because it says

spec.cloudConfig.manageStorageClasses: Forbidden: Management of storage classes and OpenStack block storage classes are both specified but disagree

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

This doesn't really break BC per se. It only breaks when you have explicitly disabled storageclass on OpenStack.

If we don't set this, it will break for pretty much everyone else. The error message seems clear on what actions users need to take.
Alternative is to ignore this field for OpenStack and document as such, but that may be more confusing for users? Could potentially lead to misconfigurations.

But open to suggestions how to fix this. I suggest at minimum a clear "Actions needed" in the release notes.

Copy link
Member

@zetaab zetaab Mar 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did migration script for our clusters which runs kops set cluster spec.cloudConfig.manageStorageClasses=false after that it works fine. In my opinion it is pretty easy step? okay kops set is still behind featureflag? Anyways it works when executing that. So if people has disabled storageclass in openstack, they need to set manageStorageClasses=false, otherwise they cannot run kops update cluster or other commands

@@ -42,6 +42,7 @@ contents: |
- --authorization-mode=AlwaysAllow
- --bind-address=0.0.0.0
- --client-ca-file=/srv/kubernetes/ca.crt
- --cloud-config=/etc/kubernetes/cloud.config
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These do look very strange, but we make some funny assumptions about spec.cloudConfig. I guess at some point the intention was that this flag only was for configuring the cloud.config file, but then got (ab)used for other things later.

I think we always write cloud.config though, so it should be safe to always reference this file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That change came by way of running hack/update-expected.sh, so it wasn't a conscious decision of mine to add the flag.

@olemarkus
Copy link
Member

/lgtm

Leaving it unapproved in case someone else also wants to review.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 5, 2021
@seh
Copy link
Contributor Author

seh commented Feb 5, 2021

Another potential design we could consider here to allow more control over StorageClass objects in the future: introduce a "storageClasses" field with an "enabled" field within. It would look like this:

spec:
  cloudConfig:
    storageClasses:
      enabled: true

If we had that, we could later add finer-grained control over whether and which class gets nominated as the cluster default. Do you think that would be worth it?

Also, as noted in Slack, I found that when I populated my new field, all the ASG launch templates got updated, inducing machine replacement at the next kops rolling-update cluster invocation. That seems like too much churn, given that we shouldn't need to replace any machines to add a few StorageClass objects. Is that indicating that maybe "spec.cloudConfig" isn't the right place for this new field?

@olemarkus
Copy link
Member

olemarkus commented Feb 5, 2021

Another potential design we could consider here to allow more control over StorageClass objects in the future: introduce a "storageClasses" field with an "enabled" field within. It would look like this:

spec:
  cloudConfig:
    storageClasses:
      enabled: true

If we had that, we could later add finer-grained control over whether and which class gets nominated as the cluster default. Do you think that would be worth it?

Also, as noted in Slack, I found that when I populated my new field, all the ASG launch templates got updated, inducing machine replacement at the next kops rolling-update cluster invocation. That seems like too much churn, given that we shouldn't need to replace any machines to add a few StorageClass objects. Is that indicating that maybe "spec.cloudConfig" isn't the right place for this new field?

The roll happens because the kube-apiserver flags change. And that happens because cloudConfig is not nil anymore. I think this is a necessary one-time evil.

If you want to fix this, we need a better check around when to add the cloud-config flag.

@seh
Copy link
Contributor Author

seh commented Feb 5, 2021

The roll happens because the kube-apiserver flags change.

I saw all the launch templates change—even for worker machines that don't host the Kubernetes API server—because the "spec.cloudConfig" field's value makes it into the conf/cluster_spec.yaml file included in the user data.

@olemarkus
Copy link
Member

People don't change this too often though. So I am fine with leaving it. We have other options that triggers unecessary rolls, and we have changes that should trigger rolls that don't. Not ideal, but not easy to solve now.

@seh
Copy link
Contributor Author

seh commented Feb 5, 2021

People don't change this too often though.

Yes, that's a good point. This isn't something we're likely to toggle more than once.

@seh seh force-pushed the allow-disabling-of-storage-class-mgmt branch from ca04fd6 to d44612c Compare February 11, 2021 15:50
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 11, 2021
@olemarkus
Copy link
Member

/lgtm

@olemarkus
Copy link
Member

/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 11, 2021
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: olemarkus, seh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 11, 2021
@k8s-ci-robot k8s-ci-robot merged commit ec49519 into kubernetes:master Feb 11, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.20 milestone Feb 11, 2021
k8s-ci-robot added a commit that referenced this pull request Feb 14, 2021
…igin-release-1.20

Automated cherry pick of #10733: Define "ManageStorageClasses" cloud config field
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/addons area/api area/nodeup cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants