Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature Request] CSI Spec Volume Creation #8212

Closed
sbouts opened this issue Jun 19, 2020 · 13 comments
Closed

[Feature Request] CSI Spec Volume Creation #8212

sbouts opened this issue Jun 19, 2020 · 13 comments
Assignees
Labels
stage/accepted Confirmed, and intend to work on. No timeline committment though. theme/storage type/enhancement

Comments

@sbouts
Copy link

sbouts commented Jun 19, 2020

Currently, HashiCorp Nomad implements a subset of the CSI spec found here. This feature request is for the implementation of the volume related methods.

At the bottom of this feature request I have included an overview of the CSI spec methods as well as the ones Nomad has currently implemented.

Usecase

Ceph is an Open Source scalable distributed storage system that is widely adopted among enterprises and strongly liked for its scalability and performance.

Implementing the missing CSI spec methods would enable the use of the ceph-csi driver in Nomad to dynamically provision persistent volumes hosted on Ceph for Nomad workloads.

This would bring Nomad on par with other CO like Kubernetes and Mesos.

The Problem

There have been attempts by the community to try and get the ceph-csi <==> Nomad communication to work.

The problem boils down to Nomad not having fully implemented the CSI spec methods for all the Controller capabilities and Node capabilities the ceph-csi driver (and others) offers. The ceph-csi driver expects volumes to be dynamically provisioned and mounted on the fly (see here) and not being created beforehand and explicitly put into e.g. the Nomad Volume Specification.

Ceph-csi Capabilities

The ceph-csi driver currently implements the following CSI Spec capabilities:

Note: The CSI methods for RBD and CephFS may differ in implementation, but adhere to the same CSI specification.

Ceph component CSI Type CSI Capabilities Related CSI Methods Implemented by Nomad?
RBD Controller CREATE_DELETE_VOLUME CreateVolume
DeleteVolume
RBD Controller CREATE_DELETE_SNAPSHOT CreateSnapshot
DeleteSnapshot
RBD Controller CLONE_VOLUME CreateVolume
RBD Controller EXPAND_VOLUME ControllerExpandVolume
RBD Node STAGE_UNSTAGE_VOLUME NodeStageVolume ✔️
NodeUnstageVolume ✔️
RBD Node GET_VOLUME_STATS NodeGetVolumeStats
RBD Node EXPAND_VOLUME NodeExpandVolume
Ceph component CSI Type CSI Capabilities Related CSI Methods Implemented by Nomad?
CephFS Controller CREATE_DELETE_VOLUME CreateVolume
DeleteVolume
CephFS Controller EXPAND_VOLUME ControllerExpandVolume
CephFS Node STAGE_UNSTAGE_VOLUME NodeStageVolume ✔️
NodeUnstageVolume ✔️
CephFS Node GET_VOLUME_STATS NodeGetVolumeStats

The Fix

The easier part of the fix is the actual (dummy) implementation of the missing CSI methods in the Nomad client.

The harder part of the fix is how to implement a logical flow in e.g. the Job Stanza and the Volume Specification to service both the creation and mounting of NEW volumes through the CreateVolume API (like ceph-csi expects it) as well as the mounting of EXISTING volumes (like this Nomad example).

Of course I would be more than willing to give a shot at helping to implement the missing methods. Getting the flow right from HCL to the mounting of volumes in Nomad workloads ties more into the core of Nomad and I feel that should be left up to you guys to implement.

CSI Spec Method Overview And Current Nomad Adoption

Identity Method Implemented? Nomad Link
GetPluginInfo ✔️ GetPluginInfo
GetPluginCapabilities ✔️ GetPluginCapabilities
Probe ✔️ Probe
Controller Method Implemented? Nomad Link CSI Spec Link
CreateVolume CreateVolume
DeleteVolume DeleteVolume
ControllerPublishVolume ✔️ ControllerPublishVolume
ControllerUnpublishVolume ✔️ ControllerUnpublishVolume
ValidateVolumeCapabilities ✔️ ValidateVolumeCapabilities
ListVolumes ListVolumes
GetCapacity GetCapacity
ControllerGetCapabilities ✔️ ControllerGetCapabilities
CreateSnapshot CreateSnapshot
DeleteSnapshot DeleteSnapshot
ListSnapshots ListSnapshots
ControllerExpandVolume ControllerExpandVolume
ControllerGetVolume ControllerGetVolume
Node Method Implemented? Nomad Link CSI Spec Link
NodeStageVolume ✔️ NodeStageVolume
NodeUnstageVolume ✔️ NodeUnstageVolume
NodePublishVolume ✔️ NodePublishVolume
NodeUnpublishVolume ✔️ NodeUnpublishVolume
NodeGetVolumeStats NodeGetVolumeStats
NodeExpandVolume NodeExpandVolume
NodeGetCapabilities ✔️ NodeGetCapabilities
NodeGetInfo ✔️ NodeGetInfo
@tgross
Copy link
Member

tgross commented Jun 22, 2020

Hi @sbouts, thanks for opening this issue! Yes as you've noted we didn't implement the volume creation workflow. We had this slated for a potential Phase 2 of work but I don't think we realized this was a blocker for using plugins like Ceph.

As you've noted here, there's a good bit of work required to make this happen, so I'm going to tag my colleagues @galeep and @yishan-lin to make sure it's on their radar as some of our planning gets solidified.

@ryanmickler
Copy link
Contributor

This feature request from on the terraform nomad provider is also relevant for this work.
hashicorp/terraform-provider-nomad#102

@tgross
Copy link
Member

tgross commented Jul 8, 2020

Thanks for the link @ryanmickler. That feature request is to cover the existing Nomad registration APIs, not the CSI APIs being discussed here.

@tgross tgross added this to the unscheduled milestone Jul 9, 2020
@ryanmickler
Copy link
Contributor

Is there a seperate issue to implement the parameters block, not the entire volume creation spec? Currently, that's all thats limiting use of the ceph-csi plugin

see here:
#7668 (comment)

@tgross
Copy link
Member

tgross commented Jul 28, 2020

@ryanmickler reading that comment and then digging into container-storage-interface/spec#387 it doesn't look like this is part of the CSI spec yet for the RPCs we support, is it?

@ryanmickler
Copy link
Contributor

A quick inspection looks like NodeStageVolume (mounts the volume to a staging path on the node.)
https://github.com/ceph/ceph-csi/blob/47d5b60af8d48574ff6d11ca37dbff5a6f56815b/internal/rbd/nodeserver.go#L116
is calling genVolFromVolumeOptions on line 171
https://github.com/ceph/ceph-csi/blob/47d5b60af8d48574ff6d11ca37dbff5a6f56815b/internal/rbd/nodeserver.go#L171
Then inside that function, we are hitting our missing required parameter pool error here:
https://github.com/ceph/ceph-csi/blob/be9e7cf956c378227ff43e0194410468919766b7/internal/rbd/rbd_util.go#L694

@tgross
Copy link
Member

tgross commented Jul 28, 2020

Support for those fields were added in #7957

@ryanmickler
Copy link
Contributor

ryanmickler commented Jul 28, 2020

right, perhaps those parameters arent getting passed to NodeStageVolume properly. Because pool = "<value>" is set in my config, and I still get:
E0728 04:47:20.003465 1 utils.go:163] ID: 23 Req-ID: csi-test-0 GRPC error: rpc error: code = Internal desc = missing required parameter pool
Maybe i need in investigate further. thanks for the help and your work on this!

@tgross
Copy link
Member

tgross commented Jul 28, 2020

@ryanmickler we should dig into this but let's not clutter up this feature request issue with debugging that. Can you open a new issue and include your volume spec and any relevant logs?

@tgross
Copy link
Member

tgross commented Mar 5, 2021

Just an update on this long-awaited feature, I'll be working on this over the next few weeks.

@travisghansen
Copy link

As a csi driver author I had someone use cli tools to create the volumes etc and then copy responses into nomad to work that way. It’s not ideal but functional. Great to hear this is being worked on and if any questions arise happy to collaborate with the experience I have from the other side of the equation.

@tgross
Copy link
Member

tgross commented Apr 7, 2021

Just a heads up that I've split ExpandVolume, GetCapacity, and NodeGetVolumeStatus out to their own issues #10324 and #10325, because they're likely to miss 1.1.0 and have to get shipped in a patch version shortly thereafter. Once the create/snapshot E2E testing is done (#10322) and I've updated the API docs, this issue will be closed and shipped for Nomad 1.1.0 (public beta a few weeks out yet).

@github-actions
Copy link

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 21, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
stage/accepted Confirmed, and intend to work on. No timeline committment though. theme/storage type/enhancement
Projects
None yet
Development

No branches or pull requests

4 participants