Add dynamic provisioning support for CSI migration scenarios #253

Merged: 2 commits into kubernetes-csi:master from the csimigprov1 branch, Mar 25, 2019

Collaborator

@ddebroy ddebroy commented Mar 15, 2019

NOTE 1: The CSI migration provisioning logic here is not directly feature gated. However, the feature flags on the in-tree side sufficiently protect it: the in-tree PV controller only yields PVCs (for in-tree provisioners) to the external provisioner when the CSI migration feature flags for a particular plugin are enabled in-tree, which in turn makes the plugin advertise IsMigratedToCSI: kubernetes/kubernetes@23478f1#diff-3f32ad90966ceeb297f57c2a8d726348R1404

NOTE 2: Unit tests are skipped for now. We first need to address kubernetes/kubernetes#74594 so that unit tests can be injected into the translation framework as needed.

Tested e2e with the GCE PD CSI driver name and confirmed that the CSI driver was invoked for provisioning while the resulting PV pointed to GCEPersistentDisk:

I0315 03:43:50.483655       1 controller.go:476] CreateVolumeRequest {Name:pvc-8bd370e9-46d4-11e9-b867-42010a8a001f CapacityRange:required_bytes:6442450944  VolumeCapabilities:[mount:<fs_type:"ext4" > access_mode:<mode:SINGLE_NODE_WRITER > ] Parameters:map[replication-type:none type:pd-standard] Secrets:map[] VolumeContentSource:<nil> AccessibilityRequirements:<nil> XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I0315 03:43:50.483838       1 connection.go:180] GRPC call: /csi.v1.Controller/CreateVolume
I0315 03:43:50.483866       1 connection.go:181] GRPC request: {"capacity_range":{"required_bytes":6442450944},"name":"pvc-8bd370e9-46d4-11e9-b867-42010a8a001f","parameters":{"replication-type":"none","type":"pd-standard"},"volume_capabilities":[{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}}]}
I0315 03:43:54.881857       1 connection.go:183] GRPC response: {"volume":{"accessible_topology":[{"segments":{"topology.gke.io/zone":"us-west1-a"}}],"capacity_bytes":6442450944,"volume_id":"projects/docker4x/zones/us-west1-a/disks/pvc-8bd370e9-46d4-11e9-b867-42010a8a001f"}}
I0315 03:43:54.883303       1 connection.go:184] GRPC error: <nil>
I0315 03:43:54.883320       1 controller.go:520] create volume rep: {CapacityBytes:6442450944 VolumeId:projects/docker4x/zones/us-west1-a/disks/pvc-8bd370e9-46d4-11e9-b867-42010a8a001f VolumeContext:map[] ContentSource:<nil> AccessibleTopology:[segments:<key:"topology.gke.io/zone" value:"us-west1-a" > ] XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I0315 03:43:54.883435       1 controller.go:588] successfully created PV {GCEPersistentDisk:&GCEPersistentDiskVolumeSource{PDName:pvc-8bd370e9-46d4-11e9-b867-42010a8a001f,FSType:ext4,Partition:0,ReadOnly:false,} AWSElasticBlockStore:nil HostPath:nil Glusterfs:nil NFS:nil RBD:nil ISCSI:nil Cinder:nil CephFS:nil FC:nil Flocker:nil FlexVolume:nil AzureFile:nil VsphereVolume:nil Quobyte:nil AzureDisk:nil PhotonPersistentDisk:nil PortworxVolume:nil ScaleIO:nil Local:nil StorageOS:nil CSI:nil}
I0315 03:43:54.883573       1 controller.go:1275] provision "default/podpvc" class "slow": volume "pvc-8bd370e9-46d4-11e9-b867-42010a8a001f" provisioned
I0315 03:43:54.883594       1 controller.go:1292] provision "default/podpvc" class "slow": succeeded
> kubectl describe pv pvc-8bd370e9-46d4-11e9-b867-42010a8a001f
Name:              pvc-8bd370e9-46d4-11e9-b867-42010a8a001f
Labels:            failure-domain.beta.kubernetes.io/region=us-west1
                   failure-domain.beta.kubernetes.io/zone=us-west1-a
Annotations:       pv.kubernetes.io/provisioned-by: pd.csi.storage.gke.io
Finalizers:        [external-provisioner.volume.kubernetes.io/finalizer kubernetes.io/pv-protection]
StorageClass:      slow
Status:            Bound
Claim:             default/podpvc
Reclaim Policy:    Delete
Access Modes:      RWO
VolumeMode:        Filesystem
Capacity:          6Gi
Node Affinity:     
  Required Terms:  
    Term 0:        failure-domain.beta.kubernetes.io/zone in [us-west1-a]
                   failure-domain.beta.kubernetes.io/region in [us-west1]
Message:           
Source:
    Type:       GCEPersistentDisk (a Persistent Disk resource in Google Compute Engine)
    PDName:     pvc-8bd370e9-46d4-11e9-b867-42010a8a001f
    FSType:     ext4
    Partition:  0
    ReadOnly:   false
Events:         <none>

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Mar 15, 2019
@k8s-ci-robot k8s-ci-robot requested review from msau42 and sbezverk March 15, 2019 04:47
@k8s-ci-robot k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Mar 15, 2019
@ddebroy ddebroy requested review from jsafrane and removed request for sbezverk and msau42 March 15, 2019 04:49
Collaborator Author

ddebroy commented Mar 15, 2019

/cc @leakingtapan @msau42 @davidz627

@ddebroy ddebroy force-pushed the csimigprov1 branch 4 times, most recently from ef09afb to ae0c72e Compare March 15, 2019 23:32
Contributor

@davidz627 davidz627 left a comment

Thanks for the PR Deep! Just a couple small comments.

handlesMigrationFromInTreePlugin := false
handlesMigrationFromInTreePluginName := ""
if csitranslationlib.IsMigratedCSIDriverByName(provisionerName) {
	handlesMigrationFromInTreePluginName, err = csitranslationlib.GetInTreeNameFromCSIName(provisionerName)
Contributor

Could this instead be a parameter passed in to the provisioner? If an inTreePluginName is given then it handles that migration and if an empty string or nothing is passed in we know it doesn't handle it.

This just gives us more options. Maybe someone will want to run two of the same driver, one for migration one for native CSI. Or maybe someone wants to do a fork of a driver and use that for migration (for some crazy weird reason). Either way I think leaving it open for more options seems like a better idea unless there is a specific issue we have with doing that.

Also could we envision a scenario where we would want to pass in multiple in-tree plugin names? Maybe if in the future a single external provisioner could service multiple drivers? Or a single driver is able to handle the dynamic provisioning for multiple different in-tree plugins. This is probably not as useful to think about right now but it would be nice to leave room for it.

Maybe just a string parameter is fine for now
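
A minimal sketch of the string-parameter idea above, assuming a single command-line flag where an empty value means the provisioner handles no migration. The flag name is the one that appears later in this PR; `parseInTreePluginName` is a hypothetical helper for illustration, not code from the PR.

```go
package main

import (
	"flag"
	"fmt"
)

// parseInTreePluginName is a hypothetical helper: it parses the given
// arguments and returns the in-tree plugin name this provisioner should
// migrate from. An empty string means "no migration handled".
func parseInTreePluginName(args []string) (string, error) {
	fs := flag.NewFlagSet("csi-provisioner", flag.ContinueOnError)
	name := fs.String("supersedes-in-tree-plugin-name", "",
		"An in-tree plugin that is superseded by this CSI plugin")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	return *name, nil
}

func main() {
	name, err := parseInTreePluginName(
		[]string{"--supersedes-in-tree-plugin-name=kubernetes.io/gce-pd"})
	if err != nil {
		fmt.Println("flag error:", err)
		return
	}
	fmt.Printf("handles migration: %t (in-tree plugin %q)\n", name != "", name)
}
```

Supporting several in-tree plugin names later would probably mean accepting a comma-separated list in the same flag, which a plain string leaves room for.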

Collaborator Author

The suggestion to have options and keep this extensible sounds good. My main concern with the options is that some of the logic in https://github.com/kubernetes/csi-translation-lib/blob/master/translate.go is pretty tightly bound to the names of specific CSI drivers for certain translations, like https://github.com/kubernetes/csi-translation-lib/blob/master/translate.go#L72. Perhaps we can add a note for authors in the migration notes that name changes for a CSI driver need to be reflected in the csi-translation-lib.

volumeNameUUIDLength int
config *rest.Config
driverName string
handlesMigrationFromInTreePlugin bool
Contributor

Do we need this bool? The existence of handlesMigrationFromInTreePluginName already implies that it is migrated, right?
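
As a rough illustration of this point, the bool can be derived from the name field. The `provisioner` type below is a trimmed, hypothetical stand-in for `csiProvisioner`, and the driver and plugin names are example values only.

```go
package main

import "fmt"

// provisioner is a trimmed, hypothetical stand-in for csiProvisioner.
type provisioner struct {
	driverName string
	// inTreePluginName is empty when this provisioner does not handle
	// migration from any in-tree plugin.
	inTreePluginName string
}

// handlesMigration derives the former bool field from the name field,
// so the two can never disagree.
func (p *provisioner) handlesMigration() bool {
	return p.inTreePluginName != ""
}

func main() {
	migrating := &provisioner{
		driverName:       "pd.csi.storage.gke.io",
		inTreePluginName: "kubernetes.io/gce-pd",
	}
	native := &provisioner{driverName: "pd.csi.storage.gke.io"}
	fmt.Println(migrating.handlesMigration(), native.handlesMigration())
}
```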

performInTreeTranslation := false
if p.handlesMigrationFromInTreePlugin {
	storageClassName := options.PVC.Spec.StorageClassName
	storageClass, err := p.client.StorageV1().StorageClasses().Get(*storageClassName, metav1.GetOptions{})
Contributor

add a TODO(ISSUENUMBER): use informers

if err != nil {
	return nil, fmt.Errorf("failed to get storage class named %s: %v", *storageClassName, err)
}
if storageClass.Provisioner == p.handlesMigrationFromInTreePluginName {
Contributor

Add an else block that logs that we are not translating the storage class parameters, just to make this easier to understand (I got confused at first).

	return nil, fmt.Errorf("failed to get storage class named %s: %v", *storageClassName, err)
}
if storageClass.Provisioner == p.handlesMigrationFromInTreePluginName {
	klog.V(2).Infof("Perform CSI migration for intree plugin %s", storageClass.Provisioner)
Contributor

Be more specific here, e.g. "CSI Migration: translating storage class parameters for in-tree plugin %s"?
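
The two logging suggestions above (an explicit else branch and a more specific message) could look roughly like this. It is a sketch only: `shouldTranslate` is a hypothetical helper, the standard `log` package stands in for klog, and the wording of the "not translating" message is an assumption.

```go
package main

import "log"

// shouldTranslate reports whether the storage class parameters need to be
// translated, and logs the decision in both branches.
func shouldTranslate(storageClassProvisioner, inTreePluginName string) bool {
	if inTreePluginName != "" && storageClassProvisioner == inTreePluginName {
		log.Printf("CSI Migration: translating storage class parameters for in-tree plugin %s",
			storageClassProvisioner)
		return true
	}
	log.Printf("CSI Migration: not translating storage class parameters, provisioner %q does not match in-tree plugin %q",
		storageClassProvisioner, inTreePluginName)
	return false
}

func main() {
	shouldTranslate("kubernetes.io/gce-pd", "kubernetes.io/gce-pd")
	shouldTranslate("pd.csi.storage.gke.io", "kubernetes.io/gce-pd")
}
```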

@@ -551,6 +576,13 @@ func (p *csiProvisioner) Provision(options controller.VolumeOptions) (*v1.Persis
pv.Spec.PersistentVolumeSource.CSI.FSType = fsType
}

if performInTreeTranslation {
Contributor

performInTreeTranslation doesn't seem like the right name.

It relates to "this specific volume is using translation logic for migration"
volumeMigrated?
volumeShimmed?
volumeTranslated?

I would go with translated probably but will defer to you

volumeNamePrefix: volumeNamePrefix,
volumeNameUUIDLength: volumeNameUUIDLength,
handlesMigrationFromInTreePlugin: handlesMigrationFromInTreePlugin,
handlesMigrationFromInTreePluginName: handlesMigrationFromInTreePluginName,
Contributor

maybe just supportsInTreePlugin (name is implied)?

@@ -59,12 +59,32 @@ var (
workerThreads = flag.Uint("worker-threads", 100, "Number of provisioner worker threads, in other words nr. of simultaneous CSI calls.")
operationTimeout = flag.Duration("timeout", 10*time.Second, "Timeout for waiting for creation or deletion of a volume")
provisioner = flag.String("provisioner", "", "This option is deprecated")
inTreePluginName = flag.String("supersedes-in-tree-plugin-name", "", "An in-tree plugin that is superseded by this CSI plugin")
Contributor

I would appreciate "migration" somewhere in the parameter name and description. --migrated-plugin-name?

klog.Warningf("Provisioner name: %s not registered in CSI translation library for performing migrations for %s", csiPluginName, inTreePluginName)
return
}
inTreePluginNameFromLib, err := csitranslationlib.GetInTreeNameFromCSIName(csiPluginName)
Contributor

@jsafrane jsafrane Mar 19, 2019

If we can query the in-tree plugin name from the CSI driver name, why do we need the --supersedes-in-tree-plugin-name option? Shouldn't --feature-gate be enough? It allows us to enable it by default during beta.

Contributor

Oops, this is the complete opposite of my earlier comment. I didn't think about enabling it by default during beta. Sorry @ddebroy, could you change it back... my bad

Collaborator Author

Will revert to the querying behavior. That ensures a single source of truth (the translation library) and enables migration by default. As far as feature gates are concerned, this depends on the in-tree feature gates being enabled for individual plugins (besides migration in general) for the PV controller to pass PVCs with in-tree provisioners to external provisioners.
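
A self-contained sketch of the querying behavior described above. The two lookup functions mirror the names `IsMigratedCSIDriverByName` and `GetInTreeNameFromCSIName` seen in the diff, but they are local stand-ins backed by an illustrative map, not the csi-translation-lib implementation; `supportedInTreePlugin` is a hypothetical wrapper.

```go
package main

import "fmt"

// migratedDrivers is an illustrative stand-in for the mapping kept in the
// translation library (CSI driver name -> in-tree plugin name).
var migratedDrivers = map[string]string{
	"pd.csi.storage.gke.io": "kubernetes.io/gce-pd",
}

// isMigratedCSIDriverByName is a local stand-in for the library call.
func isMigratedCSIDriverByName(csiName string) bool {
	_, ok := migratedDrivers[csiName]
	return ok
}

// getInTreeNameFromCSIName is a local stand-in for the library call.
func getInTreeNameFromCSIName(csiName string) (string, error) {
	if name, ok := migratedDrivers[csiName]; ok {
		return name, nil
	}
	return "", fmt.Errorf("no in-tree plugin registered for CSI driver %s", csiName)
}

// supportedInTreePlugin returns the in-tree plugin this driver migrates
// from, or "" when the driver is not registered for migration.
func supportedInTreePlugin(csiName string) (string, error) {
	if !isMigratedCSIDriverByName(csiName) {
		return "", nil
	}
	return getInTreeNameFromCSIName(csiName)
}

func main() {
	for _, driver := range []string{"pd.csi.storage.gke.io", "example.csi.vendor.io"} {
		name, err := supportedInTreePlugin(driver)
		fmt.Printf("driver=%s inTree=%q err=%v\n", driver, name, err)
	}
}
```

With this shape no extra flag is needed: the driver name alone decides whether migration applies.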

@ddebroy
Collaborator Author

ddebroy commented Mar 19, 2019

The comments above have been addressed and this is ready for another look @jsafrane @davidz627

@ddebroy
Collaborator Author

ddebroy commented Mar 20, 2019

Rebased to latest changes.

@@ -325,6 +329,31 @@ func getVolumeCapability(
}

func (p *csiProvisioner) Provision(options controller.VolumeOptions) (*v1.PersistentVolume, error) {
if options.PVC.Spec.Selector != nil {
Contributor

There is already selector check below.

(this is probably result of rebase)

Collaborator Author

Thanks for catching this. Will remove.

if p.supportsMigrationFromInTreePluginName != "" {
	storageClassName := options.PVC.Spec.StorageClassName
	// TODO(https://github.com/kubernetes-csi/external-provisioner/issues/256): use informers
	storageClass, err := p.client.StorageV1().StorageClasses().Get(*storageClassName, metav1.GetOptions{})
Contributor

If the only reason to Get the class is to get the provisioner name, there is one in PVC.Annotations["volume.beta.kubernetes.io/storage-provisioner"]. Can this be used instead? If not, please add a comment explaining why it can't.

Collaborator Author

@ddebroy ddebroy Mar 21, 2019

Good point. PVC.Annotations["volume.beta.kubernetes.io/storage-provisioner"] cannot be used for migration scenarios because it is set to the name of the CSI provisioner at https://github.com/kubernetes/kubernetes/blob/596a48dd64bcaa01c1d2515dc79a558a4466d463/pkg/controller/volume/persistentvolume/pv_controller.go#L1383 and https://github.com/kubernetes/kubernetes/blob/596a48dd64bcaa01c1d2515dc79a558a4466d463/pkg/controller/volume/persistentvolume/pv_controller.go#L1396 so that the external CSI provisioner can detect the PVC and act on it. Will clarify with a comment.
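
To make the distinction concrete, here is a small sketch with simplified, hypothetical types (`claim`, `storageClass`) in place of the real API objects. For a migrated PVC the annotation already carries the CSI driver name, so only the StorageClass still names the in-tree provisioner.

```go
package main

import "fmt"

const provisionerAnnotation = "volume.beta.kubernetes.io/storage-provisioner"

// claim and storageClass are simplified stand-ins for the PVC and
// StorageClass API objects.
type claim struct {
	annotations      map[string]string
	storageClassName string
}

type storageClass struct {
	provisioner string
}

// isMigratedClaim decides on the StorageClass provisioner, because the
// annotation on a migrated PVC is already rewritten to the CSI driver name.
func isMigratedClaim(c claim, classes map[string]storageClass, inTreeName string) bool {
	sc, ok := classes[c.storageClassName]
	return ok && sc.provisioner == inTreeName
}

func main() {
	classes := map[string]storageClass{"slow": {provisioner: "kubernetes.io/gce-pd"}}
	pvc := claim{
		annotations:      map[string]string{provisionerAnnotation: "pd.csi.storage.gke.io"},
		storageClassName: "slow",
	}
	fmt.Println("annotation says:", pvc.annotations[provisionerAnnotation])
	fmt.Println("migrated:", isMigratedClaim(pvc, classes, "kubernetes.io/gce-pd"))
}
```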

if migratedVolume {
	pv, err = csitranslationlib.TranslateCSIPVToInTree(pv)
	if err != nil {
		return nil, err
Contributor

This leaves the provisioned volume orphaned in the storage backend. We can either delete it (possibly resulting in other errors, oh well...) or return an error that suggests there is an orphan that must be cleaned manually. Both are quite bad, I'd prefer deleting the volume though.

Contributor

Btw, some log (or even an event?) that this is a critical bug and should be reported / fixed would be helpful.
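
One way the cleanup preferred above might be structured is sketched below: delete the freshly created volume when translation fails, log loudly, and fall back to a "manual cleanup required" error only if the delete also fails. All types and names here (`backend`, `fakeBackend`, `provisionTranslated`, `failingTranslate`) are hypothetical; the real provisioner talks to the CSI driver over gRPC.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

type volume struct{ id string }

// backend is a hypothetical abstraction over the storage driver.
type backend interface {
	CreateVolume(name string) (*volume, error)
	DeleteVolume(id string) error
}

// provisionTranslated creates a volume and translates it. If translation
// fails, it deletes the volume so that nothing is orphaned in the backend.
func provisionTranslated(b backend, name string, translate func(*volume) (*volume, error)) (*volume, error) {
	vol, err := b.CreateVolume(name)
	if err != nil {
		return nil, err
	}
	translated, err := translate(vol)
	if err != nil {
		log.Printf("CSI Migration: failed to translate volume %s; this is a bug, please report it: %v", vol.id, err)
		if delErr := b.DeleteVolume(vol.id); delErr != nil {
			return nil, fmt.Errorf("translation failed (%v) and volume %s could not be deleted, manual cleanup required: %v", err, vol.id, delErr)
		}
		return nil, fmt.Errorf("translation failed, volume %s was deleted: %v", vol.id, err)
	}
	return translated, nil
}

// fakeBackend keeps volumes in memory so the sketch can run on its own.
type fakeBackend struct{ volumes map[string]bool }

func (f *fakeBackend) CreateVolume(name string) (*volume, error) {
	f.volumes[name] = true
	return &volume{id: name}, nil
}

func (f *fakeBackend) DeleteVolume(id string) error {
	delete(f.volumes, id)
	return nil
}

// failingTranslate simulates a translation bug.
func failingTranslate(*volume) (*volume, error) {
	return nil, errors.New("no translation available")
}

func main() {
	b := &fakeBackend{volumes: map[string]bool{}}
	_, err := provisionTranslated(b, "pvc-example", failingTranslate)
	fmt.Println("error:", err)
	fmt.Println("volumes left in backend:", len(b.volumes))
}
```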

@davidz627
Contributor

LGTM, will let @jsafrane do the final tagging

@jsafrane
Contributor

lgtm, please squash the commits (at least no "Address code review comments")

@ddebroy
Collaborator Author

ddebroy commented Mar 22, 2019

Commits squashed.

@davidz627
Contributor

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 22, 2019
@ddebroy
Collaborator Author

ddebroy commented Mar 22, 2019

/assign @lpabon @msau42 @jsafrane for approval

@k8s-ci-robot
Contributor

@ddebroy: GitHub didn't allow me to assign the following users: for, approval.

Note that only kubernetes-csi members and repo collaborators can be assigned and that issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

/assign @lpabon @msau42 @jsafrane for approval


@jsafrane
Contributor

/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: davidz627, ddebroy, jsafrane

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 25, 2019
@k8s-ci-robot k8s-ci-robot merged commit 41036d7 into kubernetes-csi:master Mar 25, 2019