Nomad should not create the target_path passed into NodePublishVolume #8358
Comments
Thanks for opening this @timoreimann !
Totally agreed. Honestly, this makes life a little bit easier on us anyway, as this has been a source of difficulty when handling errors. It's easy enough to remove, but I'll want to verify that the other common SPs that we test with are also doing the right thing per the spec, as the DO provider does.
@tgross I am testing the Openstack Cinder CSI plugin right now and just ran into this exact same issue.
I'd say not creating the mount points for block volumes is the correct course of action here. If needed, I can provide replication steps for recreating the issue using the cloud-provider-openstack Cinder CSI driver.
Yup, agreed. It's definitely on our stack of CSI bug fixes before we can mark CSI as GA.
Hey guys, is there any pull request/patch to test, or any workaround that I can use?
CSI appears to be nonfunctional on multiple platforms. I have tried Cinder, Longhorn, Rook, and MooseFS, all getting errors similar to those @rigrassm reported. Experimentation with CSI is not possible in the current state. Is there a workaround coming soon?
It's on our stack of CSI bug fixes before we can mark CSI as GA. I can't give you an exact timeline other than that it's a high priority for the team and for me personally.
Had a chance to take a first pass at this today. The code in question is
So there's two bugs here:
The bug fixes themselves are pretty trivial, so most of the remaining work is testing to make sure they don't introduce any regressions. I'll tackle that tomorrow.
I've opened #8505 with the patch but need to complete regression testing.
@tgross awesome! I'll get a build with the fix running tomorrow and test out the Cinder CSI driver.
Looking good on DO as well:
Ok, I've merged that patch and it'll ship in 0.12.2. (Just missed 0.12.1 that went out today, sorry folks!)
@tgross No worries, I'll apply the patch on top of 0.12.1 and wait. Thanks for the fix.
I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
👋
Nomad version
Unclear as I am filing this issue on behalf of a user running Nomad on DigitalOcean infrastructure.
Operating system and Environment details
OS: most likely Linux
Environment: DigitalOcean VMs (aka Droplets)
Issue
A user of DigitalOcean’s CSI driver (which I happen to maintain) opened a support ticket with DigitalOcean saying that they could not use volumes managed by a self-hosted Nomad cluster running on DigitalOcean infrastructure. They used version v1.3.0 of DigitalOcean’s CSI driver and saw the following error when our NodePublishVolume implementation was invoked (pardon the output format; we employ structured logging in our driver):
The most important part is the error message:
What is happening here is that the driver was asked to use a volume in raw block mode. The way this is implemented in our driver is that we create a file under the target path and afterwards bind-mount the device into it. The former step fails with Nomad as the CO (container orchestrator) because the target path already exists as a directory.
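The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not the DigitalOcean driver's actual code: `create_block_target` and `try_publish` are hypothetical names, the directory layout is made up, and the bind mount itself is left out because it needs root privileges. It only shows that creating a regular file at the target path works when the CO leaves that path alone and fails when the CO has already created it as a directory.

```python
import os
import tempfile


def create_block_target(target_path: str) -> None:
    """Create an empty regular file to serve as the bind-mount target.

    Hypothetical stand-in for the driver-side step; a real driver would
    bind-mount the block device onto this file afterwards.
    """
    fd = os.open(target_path, os.O_CREAT | os.O_WRONLY, 0o660)
    os.close(fd)


def try_publish(target_path: str) -> str:
    """Attempt the file-creation step and report the outcome as a string."""
    try:
        create_block_target(target_path)
    except OSError as err:
        # Opening an existing directory for writing is rejected by the OS.
        return f"failed: {type(err).__name__}"
    return "ok"


with tempfile.TemporaryDirectory() as root:
    # Case 1: the CO creates only the parent directory (assumed behaviour).
    absent = os.path.join(root, "case1", "target")
    os.makedirs(os.path.dirname(absent))
    result_absent = try_publish(absent)

    # Case 2: the CO pre-creates target_path itself as a directory.
    precreated = os.path.join(root, "case2", "target")
    os.makedirs(precreated)
    result_precreated = try_publish(precreated)

print("target_path absent:      ", result_absent)
print("target_path is directory:", result_precreated)
```

The second case is the situation reported with Nomad as the CO: the driver's first step cannot succeed, so the bind mount is never attempted.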
Here’s what the spec has to say about the target_path parameter on NodePublishVolume:
Note these specific parts of the description:
I reached out to the SIG Storage folks on the Kubernetes Slack (who are involved with the spec work) to confirm that the spec in this regard should be interpreted as “the CO MUST NOT create the target_path”. So it looks like Nomad might be doing too much here.
I briefly considered changing our CSI driver to address this scenario by deleting and recreating the target path. However, I felt somewhat hesitant about this move, as it seems a bit invasive / disruptive to me and could mess with the expectations that the CO may have. Overall, I am more inclined to resolve the matter in the way that is most compliant with the spec.
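To make that interpretation concrete, here is a minimal sketch of what CO-side preparation could look like under it. This is not Nomad's code: `prepare_for_node_publish`, the directory layout, and the permission bits are all assumptions made for illustration. The point is only that the CO stops at the parent directory and leaves the target path itself for the plugin to create.

```python
import os
import tempfile


def prepare_for_node_publish(target_path: str) -> None:
    """Hypothetical CO-side helper: create the parent of target_path only.

    The target path itself is deliberately left untouched so the plugin
    can create it as a directory (mount volumes) or a file (block volumes).
    """
    os.makedirs(os.path.dirname(target_path), mode=0o700, exist_ok=True)


with tempfile.TemporaryDirectory() as root:
    # Made-up layout; a real CO chooses its own paths.
    target = os.path.join(root, "volumes", "vol-1", "target")
    prepare_for_node_publish(target)

    parent_exists = os.path.isdir(os.path.dirname(target))
    target_exists = os.path.exists(target)

print("parent exists:", parent_exists)
print("target exists:", target_exists)
```

With this split, the plugin is free to pick the right kind of file system object for the requested access type, which is what the raw block case needs.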
That said, I'm more than happy to discuss options if you feel this should be tackled in any particular way.
Thanks!
Reproduction steps
Unfortunately, I'm not familiar enough with Nomad to reproduce things on my own. My guess, though, is that the problem should be reproducible by creating a Nomad cluster on DigitalOcean infrastructure and trying to use a volume in block access mode.