Consider a "MountDevice" equivalent step #119
Comments
Hmm. Given that devs already understand that we need to do the ref counting and that this would be an optional piece that likely still results in plug-ins implementing the same logic for the publish operation, I’d rather leave it as is — opaque and up to the plug-ins to simply provide the volume at the requested target path and access mode. |
Hi @saad-ali, Another piece of feedback. If you're going to separate |
@saad-ali How would this relate to the |
Hi @saad-ali, Lastly, please be clear re: your intentions for |
Hi @saad-ali, I lied. One more thing. I think CSI should be as opaque as possible for the health of both the CO and the plug-in, not to mention the backend storage platform. Just like a CO can fail or crash during a I think it's dangerous and engenders too much potential for unsynchronized state for a CO to participate in the ref counting workflow. The process should be opaque to the CO. Its job is to request storage, not care how the plug-in provides it. The CO says "provide me storage at That's my two cents. Thanks! |
Is the reference counting central to the issue? With the Docker implementation we have been challenged in this area since the unmount of any container would trigger an unmount operation to the plugin. It has been on the plugin to maintain the references to figure out when appropriate to actually perform a detach of the device. But this has not been a great solution since the plugin is not aware of the actual state of containers running. For example, the engine resetting and killing containers without proper unmount calls making their way to the plugin. Having a single authoritative source of truth will help here. I am not sure this handles your primary concern for this thread @saad-ali, but here's some suggestions for the specification to make it more deterministic for ref counting.
Hi @saad-ali, Because there's always going to be the edge case of dangling, orphaned devices in some circumstance, this is what I'd recommend. Create an RPC called This handles the case of drift, which I think can always happen between a CO and a node if the CO crashes or there's some error in a publish call. |
@saad-ali can you elaborate a bit more on the problem? I thought we made it pretty clear that the CO will do ref counting and the plugin simply reacts to the instructions from the CO. For your case, IIUC, why not simply call |
I agree with you Jie. It’s not broken, so why “fix” it. |
Yes, that would be necessary,
The "MountDevice" would be optional.
Yes. "MountDevice"/"UnmountDevice" would be optional, like ControllerPublishVolume/ControllerUnpublishVolume. It is up to a particular volume plugin to decide whether it supports it. If it does, the CO must call MountDevice before NodePublish... If not, then the CO shall not call MountDevice at all and the plugin can handle all of its logic in NodePublish/Unpublish.
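A minimal sketch of that call ordering, under loud assumptions: `FakeNodePlugin`, `supports_mount_device`, and `co_publish` are hypothetical names invented for illustration and are not part of the CSI spec or any CO. The sketch only shows the CO consulting an advertised capability before deciding whether to make the optional call.

```python
from dataclasses import dataclass, field


@dataclass
class FakeNodePlugin:
    """Hypothetical stand-in for a node plugin; it only records calls."""

    supports_mount_device: bool
    calls: list = field(default_factory=list)

    def mount_device(self, volume_id, global_path):
        self.calls.append(("MountDevice", volume_id, global_path))

    def node_publish(self, volume_id, target_path):
        self.calls.append(("NodePublishVolume", volume_id, target_path))


def co_publish(plugin, volume_id, global_path, target_path):
    """CO-side flow: make the optional call only if the plugin advertises it."""
    if plugin.supports_mount_device:
        # Assumed ordering from the comment above: MountDevice comes first.
        plugin.mount_device(volume_id, global_path)
    plugin.node_publish(volume_id, target_path)


if __name__ == "__main__":
    plugin = FakeNodePlugin(supports_mount_device=True)
    co_publish(plugin, "vol-1", "/global/vol-1", "/pods/a/vol-1")
    print([name for name, *_ in plugin.calls])
```

A plugin that does not advertise the capability would only ever see the publish call and would keep any intermediate mount logic to itself.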
At least on the k8s side we already do it, and adding it to the API would simplify plugin development significantly, as this logic wouldn't need to be reimplemented for each plugin that requires it.
That's exactly the problem this is trying to address.
Agreed.
Overloading the existence or lack of a parameter to mean something is error prone. So I'd prefer having a new explicit call (MountDevice/UnmountDevice) to handle this.
Agreed.
I completely agree, state will shift and CO will need a mechanism to recover its node state. So I really like this idea @akutz! We should pursue this as a separate request as it isn't a direct blocker for this.
We discussed offline. To summarize, the NodePublishVolume call was originally intended to do what I described the MountDevice call as doing, but we changed it from once per volume per node to once per volume per workload in order to handle credentials properly. That change, however, makes the implementation more complicated for plugins that require an intermediary mount path. We can discuss this in the community meeting. |
Hi Saad, I’m slightly concerned about the number of changes this close to a possible, tagged release of the spec. I also strongly believe that the spec should remain as opaque as possible with respect to the workflow and data model, keeping things flexible. The more the CO is responsible for owning certain aspects of the operation, the more opportunities there are for failures along the way. All that said, this function should be called NodePublishDevice IMO, as the device isn’t being mounted but attached. The publish prefix is consistent with existing function names and is abstract enough to be a suitable description of the operation. As for NodeListDevices, I am going to file a PR for it and ListDevices so that they both are able to accept a series of optional tags on which to filter results. This is because of my next concern WRT this proposal... Reconciliation. I do not think this spec ever fully handles the fact that the CO can and will drift out of sync with the state of the storage platform. The more moving pieces (distinct RPCs) there are, the more likely this will happen. |
This patch handles issue container-storage-interface#119 by adding two new RPCs, "NodePublishDevice" and "NodeUnpublishDevice". These RPCs MUST be called by the CO if the Node Plugin advertises the "PUBLISH_UNPUBLISH_DEVICE" capability. Plugins that advertise this capability SHOULD defer volume reference counting to the CO.
This seems an awful lot like the original intent of If the issue is that we don't want the plugin to have to maintain reference counts when there are multiple containers using the same mounted volume, could we not just stipulate that Another point I would add is that even if we add this RPC, it will be incompatible with plugins that consume credentials, since the creds are typically injected at mount time, and we require multiple mounts when there are multiple creds. (Unless you're doing something esoteric like bind mounting with bindfs, but I feel certain that nobody is doing that currently.) |
This still necessitates reference counting on the plug-in side in order to know when it is safe to unmount something. That said @julian-hj, I agree, I prefer the way it is today with the plug-in responsible for doing reference counting and keeping it opaque from the CO. |
There have been a lot of people asking to change the current Docker volume API so that the plugin would not have to do ref-counting. If we can somehow solve the issue of ref-counting going out of sync, then I think it'd be a worthy thing to add; however, I don't think this particular proposal really solves that problem. |
From what I can see in the current spec, NodeUnpublishVolume explicitly states that the operation must be idempotent. So I am confused about where the assumption that the plugin does reference counting would come from. The way I understand it, the CO will call NodePublishVolume once for each credential set, with distinct mount points. The CO will call NodeUnpublishVolume once for each mount point on the back end. At no point does the plugin require a reference count, because NodeUnpublish just means "unmount the specified mount point". |
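To make that reading concrete, here is a rough sketch of a plugin whose unpublish is idempotent and keeps no reference count. `InMemoryNode` and its in-memory mount table are assumptions made purely for illustration; a real plugin would perform actual mounts and the method signatures here do not mirror the spec's messages.

```python
class InMemoryNode:
    """Hypothetical node plugin that tracks mounts in a dict instead of mounting."""

    def __init__(self):
        self.mounts = {}  # target_path -> volume_id

    def node_publish_volume(self, volume_id, target_path):
        existing = self.mounts.get(target_path)
        if existing is not None and existing != volume_id:
            raise ValueError(f"{target_path} already holds {existing}")
        # Publishing the same volume to the same path again is a no-op.
        self.mounts[target_path] = volume_id

    def node_unpublish_volume(self, volume_id, target_path):
        # Idempotent: removing a path that is not mounted succeeds silently.
        self.mounts.pop(target_path, None)


if __name__ == "__main__":
    node = InMemoryNode()
    node.node_publish_volume("vol-1", "/pods/a/vol-1")
    node.node_publish_volume("vol-1", "/pods/b/vol-1")
    node.node_unpublish_volume("vol-1", "/pods/a/vol-1")
    node.node_unpublish_volume("vol-1", "/pods/a/vol-1")  # repeat is harmless
    print(sorted(node.mounts))
```

Each unpublish touches only the target path it names, so no count is needed as long as there is no shared intermediate mount to tear down.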
I believe this issue is related to #161.
Seeing as we’ve crossed the 0.1 milestone I’d like to reopen discussion on this for 0.2 as well as summarize some of the thoughts of this thread. What we all currently agree on is that adding a new call for this functionality would move reference counting of volumes from the plugin to the CO. I propose we move reference counting from the plugin to the CO for 2 main reasons:
To elaborate on the second point: a core design decision in CSI is for the CO to handle complicated behaviors such as failure modes and recovery, while the plugin is meant to opaquely and idempotently react to commands from the CO. This allows much of the complexity to be centralized in a well-maintained and shared part of the system. The plugin's main job is to make volumes available for consumption by workloads, not to be a partial orchestration system on its own. By adding MountDevice we remove some of the burden of failure recovery from the plugin and push it onto the CO. I would also like to propose naming the new call NodeStageVolume/NodeUnstageVolume. The naming is consistent with current calls, as the “Node” part refers to the scope of the call and the “Volume” part follows the existing convention. “Stage” is the action word here; it refers to the fact that this call is not finally publishing the volume for usage, but is performing an intermediary staging step to make it available for publishing. I am open to other suggestions on the naming. I’ve looked at PR #122 by @akutz and would be happy to pick up this work where he left off. |
@davidz627 Are you suggesting we add both MountDevice/UnmountDevice and NodeStageVolume/NodeUnstageVolume? btw, I would prefer naming such as "Prepare" and "Release" for this than "Stage" and "Unstage". |
@cpuguy83 Sorry I was unclear. I meant just NodeStageVolume/NodeUnstageVolume. "MountDevice/UnmountDevice" are just the synonymous calls in k8s. |
Please note that CSI create/delete publish/unpublish calls are already idempotent in the current spec, and reference counting is already the responsibility of the CO. Not saying we shouldn't do this, just that we ought to do it for the right reasons. |
Hi @davidz627, Well, you're free to pick things up, but I was expecting to continue the work once the release was tagged. I'd be more than happy to collaborate with you! Also, as it is, the spec basically poo-poo'd the notion of reference counting by the SPs. I too was initially under the impression that a volume should be able to be mounted multiple times on a single node host. However, as @codenrhoden and I learned after doing all the work to accomplish this task, the changes to the spec's documentation around |
Hi @jieyu, Would you please link that super-long discussion where Travis and I realized that the docs had changed and there's no longer the ability to mount a single volume to multiple target paths on a single host unless the volume is |
Related to #150 |
Thanks @jieyu :) |
@akutz, I would be happy to collaborate! I believe adding this 'MountDevice' call will be important to shift reference counting logic for |
Closed with #169 |
Kubernetes has a "MountDevice" step that attachable volumes can optionally implement. This allows a single device to be mounted to a "global" location, and then subsequent pod mounts bind mount that directory to the final pod mount path.
If no equivalent call exists in CSI, volume plugins would be forced to set up their own global mounts and do their own reference counting when unmounting to determine whether the global mount should also be unmounted (which is very error prone).
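The bookkeeping described above can be sketched roughly as follows. This is an illustration only: `NodeVolumeTracker`, the `/global/<volume_id>` path layout, and the log entry names are all hypothetical, and the class records the operations it would perform instead of actually mounting anything. Whichever side owns this logic (CO or plugin) has to track which target paths still reference the global mount.

```python
class NodeVolumeTracker:
    """Hypothetical tracker for one global mount per volume plus bind mounts."""

    def __init__(self):
        self.global_mounts = {}  # volume_id -> global mount path
        self.bind_mounts = {}  # volume_id -> set of pod target paths
        self.log = []  # operations that would be performed, in order

    def publish(self, volume_id, target_path):
        if volume_id not in self.global_mounts:
            # First user on this node: mount the device once, globally.
            global_path = f"/global/{volume_id}"  # assumed layout
            self.global_mounts[volume_id] = global_path
            self.bind_mounts[volume_id] = set()
            self.log.append(("mount_device", volume_id, global_path))
        if target_path not in self.bind_mounts[volume_id]:
            self.bind_mounts[volume_id].add(target_path)
            self.log.append(("bind_mount", volume_id, target_path))

    def unpublish(self, volume_id, target_path):
        targets = self.bind_mounts.get(volume_id)
        if not targets or target_path not in targets:
            return  # nothing to do; repeated calls are harmless
        targets.remove(target_path)
        self.log.append(("unbind", volume_id, target_path))
        if not targets:
            # Last reference gone: the global mount can be removed too.
            global_path = self.global_mounts.pop(volume_id)
            del self.bind_mounts[volume_id]
            self.log.append(("unmount_device", volume_id, global_path))


if __name__ == "__main__":
    tracker = NodeVolumeTracker()
    tracker.publish("vol-1", "/pods/a/vol-1")
    tracker.publish("vol-1", "/pods/b/vol-1")
    tracker.unpublish("vol-1", "/pods/a/vol-1")
    tracker.unpublish("vol-1", "/pods/b/vol-1")
    for entry in tracker.log:
        print(entry)
```

The error-prone part is that this state lives only in memory here; if the process holding it restarts or misses an unpublish, the count drifts from reality, which is the concern raised in the issue.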