
Clarify staging and target paths for blocks #335

Conversation

@bswartz (Contributor) commented Nov 14, 2018

The vague wording around the staging and target paths left open too
many possible implementations for raw block volumes, and implementors
ended up doing non-obvious things. This change clarifies the contract
between the CO and the SP when staging and publishing raw block volumes
on the node.

@lpabon left a comment:

Awesome 👍

@bswartz force-pushed the clarify-target-staging-paths branch 2 times, most recently from d638e75 to af47f95, on November 14, 2018 19:05
@saad-ali (Member) left a comment:

LGTM
Approve

@jdef (Member) commented Nov 14, 2018

related to #287

@jdef (Member) commented Nov 14, 2018

I'm not crazy about the specificity of this PR, though I understand that it does help to clarify things for SP authors. I'm hopeful that we can move to something more generic (in a non-breaking way) once we support something like what's been proposed in #96, where the CO has a bit more control over mount operations.

@bswartz (Contributor, Author) commented Nov 14, 2018

@jdef I'm trying to understand the problems that the solution proposed in that issue is trying to solve. The discussion seemed to argue for what we have now, which is node plugins that do the mounts themselves. The only specific use case I could find (sshfs) strongly favors the current design. What would be gained by modifying the publish interface to offer another way?

@saad-ali (Member) commented:

@bswartz this is related to another discussion where someone proposed potentially allowing the CO to control the final mount. See #96 (comment)

That would be an additive change we can consider in the future (post v1). I believe @jdef is arguing that would be a way to introduce the flexibility that is potentially lost with this PR.

@jdef (Member) commented Nov 14, 2018

@bswartz AFAICT the proposal in #96 would push CSI in the direction of the CO executing the mount syscall vs. the plugin. Also suggested by the OP of the proposal was that the same mechanism could be extended to support more exotic "mounting" of volumes, such as what might be needed for VM disk images - something that #166 would like to see supported as well.

Current thinking is that we could probably make backwards-compatible changes to CSI, post-v1, such that the staging_target_path and target_path mechanism could be the default/fallback, and that we could add a new "capability" somewhere that lets the CO discover plugin support for more exotic/flexible volume "mounting".

The specificity of this PR, as it's currently written, cuts both ways: it probably simplifies things for most plugins that will be written against CSI v1.0.0; it also dictates bits of implementation and I think that's where the CO owners have expressed concern. Ideally, we'd avoid over-specifying implementation in the CSI spec as long as the interface between plugin and CO is clear (which is also the point of contention here, right?). From the community discussion today, I'm not 100% convinced that this PR is the best, long-term approach.

@bswartz (Contributor, Author) commented Nov 14, 2018

@saad-ali I'm just trying to figure out why anyone would want to do that. What's the advantage of allowing the CO to control the final mount vs the status quo?

Also, I don't see how the version of the spec prior to my change makes it any easier to do that. The existing spec provides NO mechanism for the SP to return anything to the CO on publish, so it's implied that something like what we currently do is what MUST be done.

In the future we can add an optional mechanism to return something that the CO could use to perform the mount, but SPs could never assume the CO would want to use that mechanism and therefore would have to support the current model of mounting inside the SP as the default. This language change doesn't make an addition like that any more difficult.

spec.md (outdated):
// serving the request has `read` and `write` permissions to the path.
// For volumes with an access type of block, the path MUST be a file.
// For volumes with an access type of mount, the path MUST be a
// directory. The SP SHALL bind mount the volume to the target path.
@jdef (Member):

do we really have to say "bind mount" here? Isn't it enough to say "mount"?

@bswartz (Contributor, Author) Nov 14, 2018:

@jdef The problem I think many people have is that the word "mount" traditionally implies what happens when you add a new filesystem into the VFS at some directory. For block volumes, we've opted to actually bind mount a single device file. There are lots of alternatives we could have chosen, including symlinks or returning a string. Presumably there is some advantage to the bind-mount approach (I wasn't around when this decision was made), or perhaps the spec was left so vague that Kubernetes just chose to use bind mounts rather than the other alternatives and that became the de facto way to do it. This kind of open-endedness is poison for specifications because it makes it very easy for two different implementations that both conform to the spec to not interoperate with each other.

I could change the wording to make it clear that the bind mount is only a requirement for block access types, and that an ordinary mount is fine for mount (filesystem) access types.
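To make the alternatives concrete, here is a minimal sketch (function and path names are illustrative, not from the spec) of the two ways an SP might expose a staged block device at the publish target path. The wording under review mandates the bind-mount approach for block access type volumes.

```python
def publish_block_commands(device_path: str, target_path: str,
                           use_bind_mount: bool = True) -> list:
    """Return the commands a node plugin might run to publish a block volume."""
    if use_bind_mount:
        # Bind mount the single device file onto the CO-created target file.
        return [["mount", "--bind", device_path, target_path]]
    # One alternative mentioned in the thread: a symlink at target_path
    # (not what Kubernetes ended up doing).
    return [["ln", "-s", device_path, target_path]]
```

Either sequence puts a usable block device node at target_path, which is exactly why the spec has to pick one: an SP and CO that silently assume different mechanisms will not interoperate.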

Member:

OK, I understand where you're coming from. Again, this is where the proposal gets a bit too specific with its requirements of a plugin. I'd like to see just a tiny bit of flexibility here.

Can you replace "SHALL bind mount" with "SHOULD bind mount"?

Member:

FWIW there are multiple ways to interpret "bind mount", so I'm not sure that there's much actually gained by keeping the word "bind".

Fun read: https://unix.stackexchange.com/questions/198590/what-is-a-bind-mount

Member:

I'm still having a hard time getting behind this change request for NodePublishVolume. If a plugin doesn't support node-stage and simply wants to mknod a block device at target_path then this proposal blocks that behavior. Forcing a plugin to mknod elsewhere and then to "bind mount" it to a file that the CO is creating here seems like a mistake.

I think we should keep more neutral wording here: the CO SHALL NOT create a file system object at target_path. Plugin implementation recommendations should be made elsewhere in the spec and should avoid requirement wording like SHALL and MUST.

@bswartz (Contributor, Author):

The word SHOULD implies to me that you can choose not to do something and there are still some scenarios where that can lead to a good result. SPs must do something at the target path to ensure a good result. We owe it to SP implementors to either enumerate all the valid things an SP can do or describe the end result in such a way that at least a few of the valid approaches are obvious.

@bswartz (Contributor, Author):

I'm with you on the fact that there are valid alternatives that can work. Doing a mknod at the target path makes perfect sense to me, but then we have to change the wording to say the CO MUST NOT create anything there, which is the opposite of what Kubernetes has done and will require changing Kubernetes again (not impossible, just annoying).

@bswartz (Contributor, Author):

I could go either way on this, @saad-ali do you feel strongly? The one outcome I don't want is to leave it unspecified which side should create the file/device because then a de-facto standard of "whatever works with kubernetes" will emerge.

Member:

I have a slight preference to leave the spec loose enough for alternate implementations.

We owe it to SP implementors to either enumerate all the valid things an SP can do or describe the end result in such a way that at least a few of the valid approaches are obvious.

I'd prefer the latter.

@jdef (Member) commented Nov 14, 2018

That would be an additive change we can consider in the future (post v1). I believe @jdef is arguing that would be a way to introduce the flexibility that is potentially lost with this PR.

To be clear, I don't think we lose flexibility to do something for #96 with the introduction of this PR. I was trying to say that we still have an escape hatch if we find out (as I think we will) that this PR is too restrictive for all the use cases that we want to support.

@saad-ali (Member) commented Nov 14, 2018

Let's stay focused on this PR. We can discuss the merits of #96 on that issue.

@bswartz (Contributor, Author) commented Nov 14, 2018

@jdef Okay I think I understand. As long as we agree that this PR doesn't make the problem you want to solve any harder. The staging path stuff shouldn't affect that future enhancement because it's really just a clarification of what SPs are already depending on, and doesn't impose a new burden on COs. The publish target path stuff could all be overridden by a hypothetical future enhancement to the node publish call.

@julian-hj (Contributor) left a comment:

I agree with @jdef that bind mount is too specific, and is not always correct. For example, in the case where the plugin is mounting a share from a remote NAS, and does not choose to implement NodeStageVolume, there is no requirement to use a bind mount. So I'd like to see bind mount replaced with mount or some additional qualification to make it apparent that we only require bind mounts for block access.

@bswartz (Contributor, Author) commented Nov 14, 2018

@julian-hj For filesystem (mount access type) volumes I absolutely agree, and will change the wording. For block access type volumes, I'm okay loosening the language as long as it's still clear what the CO should or should not create, so it's clear what's left for the SP to do.

@bswartz (Contributor, Author) commented Nov 14, 2018

@lpabon I'm also curious about your opinion on the CO not creating the "file" that gets bind mounted and instead requiring the SP to create a file before bind mounting the path, if it wants to bind mount rather than doing something like mknod. In this case we could simply require that the CO ensures the parent directory exists. I can't imagine any problems with that approach, but @mkimuram will need to modify his kubernetes patch to accommodate that.

@mkimura commented Nov 14, 2018 via email

@saad-ali (Member):

@mkimuram not mkimura

@bswartz (Contributor, Author) commented Nov 14, 2018

Apologies for misspelling the name

@mkimuram commented:
@bswartz @lpabon

I'm also curious about your opinion on the CO not creating the "file" that gets bind mounted and instead requiring the SP to create a file before bind mounting the path, if it wants to bind mount rather than doing something like mknod.

If you are talking about NodePublishVolume/NodeUnpublishVolume, won't this change affect the assumption that there is no difference in NodeUnpublishVolume logic between block and mount access types? I understand that we originally assumed that NodeUnpublishVolume requires just unmounting the target path in both cases. However, if NodeUnpublishVolume is required to do something other than unmount, it might need to know which access type was requested, which is not passed to NodeUnpublishVolume, to decide what to do with the path. (The decision to choose something other than mount/unmount is up to the SP, so it could be at their own risk, though.)

@bswartz (Contributor, Author) commented Nov 14, 2018

@mkimuram SPs only need to do something different if they create different things at the publish stage. If they just create a mounted directory or a bind mounted file, then blindly calling umount on the target path would be relatively safe.
If on the other hand the stuff that gets created can vary significantly, then the SP would need to detect what got created by calling stat() on the target path at unpublish time or use an external mechanism to remember what kind of volume it is.
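The stat()-based detection described above could look like the following sketch (a hypothetical helper, not from the spec or any plugin; a real SP would also consult the mount table before deciding to unmount):

```python
import os
import stat

def classify_target_path(target_path: str) -> str:
    """Guess what an SP previously created at target_path, for unpublish."""
    st = os.lstat(target_path)          # lstat: don't follow symlinks
    if stat.S_ISLNK(st.st_mode):
        return "symlink"                # an SP that chose symlinks: just unlink
    if stat.S_ISBLK(st.st_mode):
        return "block-device"           # mknod'd in place, or a bind-mounted
                                        # device file: check the mount table
    if stat.S_ISDIR(st.st_mode):
        return "directory"              # mount access type: unmount it
    return "file"                       # bind-mount target file: unmount it
```

This illustrates the cost of open-ended publish semantics: the more things an SP is allowed to create at target_path, the more cases its unpublish path has to distinguish.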

@jieyu (Member) left a comment:

LGTM! Thanks @bswartz!

@saad-ali (Member) left a comment:

/lgtm
/approve

The vague wording around the staging and target paths left open too
many possible implementations for raw block volumes, and implementors
ended up doing non-obvious things. This change clarifies the contract
between the CO and the SP when staging and publishing raw block volumes
on the node.

Specifically, we now require that the CO creates the staging_target_path
and that it is always a directory. And we require that the SP creates
the target_path: it should be a device file for block volumes and a
mount point for filesystem volumes.
@bswartz force-pushed the clarify-target-staging-paths branch from 9fd37a6 to 6620cb6 on November 15, 2018 00:56
@ddebroy (Contributor) left a comment:

LGTM

@saad-ali saad-ali merged commit ed0bb0e into container-storage-interface:master Nov 15, 2018
// block device at target_path.
// For volumes with an access type of mount, the SP SHALL place the
// mounted directory at target_path.
// Creation of target_path is the responsibility of the SP.
Contributor:

Nothing needs to be changed here, just leaving a note.

Since the spec never discusses the relationship between volume states (NODE_READY, VOL_READY and PUBLISHED) and node reboots, it could be argued that a PUBLISHED volume could be in either of these states after a reboot:

  1. PUBLISHED since no NodeUnpublishVolume has been called. In this case, the CO might call NodeUnpublishVolume -> NodeUnstageVolume -> NodeStageVolume -> NodePublishVolume to recover the volume. As a result, the plugin must be able to handle the first two calls when the mount point is missing in the mount table.
  2. NODE_READY since the mount table has been reset after reboot. In this case, the CO might directly call NodeStageVolume and NodePublishVolume to recover the volume. As a result, the plugin must be able to deal with the fact that target_path might have been created by itself in a previous run.

cc @jdef
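Case 1 above implies an idempotent unstage path. A hypothetical sketch (the function name and recovery policy are assumptions, not spec requirements): after a reboot the mount table is empty, so the staging path may still exist without anything mounted on it, and the plugin must not fail when asked to unstage it.

```python
import os
import subprocess

def node_unstage(staging_target_path: str) -> None:
    """Idempotent unstage: tolerate a missing mount after a node reboot."""
    if os.path.ismount(staging_target_path):
        subprocess.run(["umount", staging_target_path], check=True)
    # Otherwise nothing is mounted there (e.g. post-reboot recovery):
    # treat the call as already satisfied instead of returning an error.
```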

Member:

Thanks for leaving this comment. At some point I think we should consider adding a "Notes for plugin writers" section to capture things along these lines. If people aren't comfortable adding to the spec.md file then we could add a separate "notes-for-plugin-writers.md" file or something that gives additional context, examples, and other things to consider when writing plugins.

10 participants