
Nomad Support #40

Closed · aditsachde opened this issue Dec 28, 2020 · 28 comments · Fixed by #42
@aditsachde (Contributor)

Thank you for making this extremely useful project! This is kind of a shot in the dark, but I've been attempting to get this project working with Nomad via its CSI support. I've got the software running properly; however, Nomad does not currently support creating volumes. Instead, an existing volume has to be registered with Nomad. Is there any way to manually create a CSI volume that I can then register with Nomad?

@travisghansen (Member)

Interesting! Got a link to their approach I can read over? I adhered very strictly to the spec and purposely left out any k8s-isms in the code base, so I'm guessing this should be doable, yes.

Will Nomad invoke the ListVolumes method and 'import' anything returned? I'd love to get it working with Nomad and documented as appropriate!

@aditsachde (Contributor, Author)

Nomad has a page that covers some of the details of how it handles CSI plugins. Additionally, this comment details what Nomad implements. While this plugin implements the spec properly, Nomad has not yet implemented the parts that talk to the controller, which is why volumes have to be created manually.

At the moment, the best way to provide Nomad support is probably a Terraform plugin that can create a CSI volume, since Terraform can then handle registering the volume with Nomad automatically. However, that seems understandably out of scope for this project.

I have working Nomad job files to deploy the plugin and am happy to clean them up and provide them as part of the documentation.

@travisghansen (Member)

Got it! Let me read a bit more thoroughly. Does your deployment setup currently deploy the controller as well? (It looks like they do support/use/invoke the publish/unpublish bits, but seemingly not create.)

If you do deploy the controller, it's hypothetically possible to expose it via gRPC over TCP instead of (or rather, in addition to) the unix socket. At that point you could invoke the create methods with Terraform or whatever to handle that aspect (you would certainly want to secure that endpoint in some shape or form).

@travisghansen (Member)

Ok, I'm up to speed here. Without the volume provisioning aspect, using this driver with Nomad as it stands today doesn't really provide much of anything.

Each of the above could just as easily provide the features you would get from this driver with the current state of Nomad support.

However, if you want to be ready for the hypothetical future Nomad enhancements with the volumes you're provisioning now, then using this driver from the beginning would make that path seamless. We'll get some samples here for you to review of what creation would look like, and subsequently how to configure Nomad after creating a volume. Are you initially interested in iscsi or nfs, and are you using TrueNAS or ZoL?

@aditsachde (Contributor, Author)

I'm currently using TrueNAS and NFS. I've already got somewhat of a working setup with NFS; I was just interested in seeing if I could get this project working. I've found a couple of libraries, so I'll take a look at potentially putting together a generic CSI Terraform provider. I've gone ahead and attached the Nomad job files in case anyone else comes across this and tries to get it running with Nomad.

Controller:

job "storage-controller" {

  datacenters = ["dc1"]
  type = "service"

  group "controller" {
    task "controller" {
      driver = "docker"

      config {
        image = "democraticcsi/democratic-csi:latest"

        args = [
          "--csi-version=1.2.0",
          "--csi-name=org.democratic-csi.nfs",
          "--driver-config-file=/config/driver-config-file.yaml",
          "--log-level=debug",
          "--csi-mode=controller",
          "--server-socket=/csi-data/csi.sock",
        ]

        volumes = [
          "config/driver-config-file.yaml:/config/driver-config-file.yaml"
        ]
      }

      csi_plugin {
        id        = "truenas"
        type      = "controller"
        mount_dir = "/csi-data"
      }

      template {
        destination = "config/driver-config-file.yaml"
        data = <<EOH
insert config file here
EOH
      }

      resources {
        cpu = 30
        memory = 50
      }
    }
  }
}

Node:

job "storage-node" {

  datacenters = ["dc1"]
  type = "system"

  group "node" {
    task "node" {
      driver = "docker"

      config {
        image = "democraticcsi/democratic-csi:latest"

        args = [
          "--csi-version=1.2.0",
          "--csi-name=org.democratic-csi.nfs",
          "--driver-config-file=/config/driver-config-file.yaml",
          "--log-level=debug",
          "--csi-mode=node",
          "--server-socket=/csi-data/csi.sock",
        ]


        volumes = [
          "config/driver-config-file.yaml:/config/driver-config-file.yaml"
        ]

        privileged = true
      }

      csi_plugin {
        id        = "truenas"
        type      = "node"
        mount_dir = "/csi-data"
      }

      template {
        destination = "config/driver-config-file.yaml"
        data = <<EOH
insert config file here
EOH
      }

      resources {
        cpu = 30
        memory = 50
      }
    }
  }
}
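
With those two files saved locally (the file names below are arbitrary), deploying them is the standard Nomad job workflow:

nomad job run storage-controller.nomad
nomad job run storage-node.nomad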

@travisghansen (Member)

Yeah, perfect. I'll send over some examples of creating a volume, and then subsequently how you would format things in Nomad to consume/attach it (or at least how I think it would work, based on the examples I've seen floating around).

@aditsachde (Contributor, Author)

Sounds good. Registering a volume with Nomad is documented here, and it seems like it's mostly just the parameters from the CSI spec formatted as HCL; however, I am not very familiar with CSI.

@travisghansen (Member)

I think, in the case of how this project does things, we'll need mount_opts, parameters, and context. Secrets may be necessary if running iscsi with CHAP, but otherwise generally not.

@aditsachde (Contributor, Author)

I'm having a bit of trouble getting the controller to actually listen on a TCP socket. I tried "--server-socket=tcp://0.0.0.0:9000" but there seems to be some permissions issue. Excluding the tcp:// part results in it being interpreted as a unix socket.

E1228 04:39:36.209972013       1 server_chttp2.cc:40]        {"created":"@1609130376.209942730","description":"No address added out of total 1 resolved","file":"../deps/grpc/src/core/ext/transport/chttp2/server/chttp2_server.cc","file_line":394,"referenced_errors":[{"created":"@1609130376.209939300","description":"Failed to add port to server","file":"../deps/grpc/src/core/lib/iomgr/tcp_server_custom.cc","file_line":404,"referenced_errors":[{"created":"@1609130376.209933845","description":"Failed to bind","file":"../deps/grpc/src/core/lib/iomgr/tcp_uv.cc","file_line":80,"grpc_status":14,"os_error":"permission denied"}]}]}

@travisghansen (Member)

That arg is specifically for the unix socket. Use --server-address and --server-port.
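
For reference, a sketch of the resulting controller args with both listeners (the TCP address and port values are just placeholders):

        args = [
          "--csi-version=1.2.0",
          "--csi-name=org.democratic-csi.nfs",
          "--driver-config-file=/config/driver-config-file.yaml",
          "--csi-mode=controller",
          "--server-socket=/csi-data/csi.sock", # unix socket, used by Nomad itself
          "--server-address=0.0.0.0",           # additionally listen on TCP
          "--server-port=9000",
        ]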

@aditsachde (Contributor, Author)

Thank you for all your help! It's getting quite late where I am, but I did manage to get some things working, and it does seem like Nomad support is possible!

Using this CLI, I was able to create a new volume and get the following response back.

csc -e tcp://host:port controller create-volume --req-bytes 1000000 test

"test"	1000000	"node_attach_driver"="nfs"	"provisioner_driver"="freenas-nfs"	"server"="freenas.int.domain.tld"	"share"="/mnt/data/nomad/vols/test"

I then tried to register that volume in Nomad using the following config, and it worked perfectly!

id = "test"
name = "test"
type = "csi"
external_id = "test"
plugin_id = "truenas"
access_mode = "single-node-writer"
attachment_mode = "file-system"
mount_options {
   fs_type = "nfs"
}
context {
    node_attach_driver="nfs"
    provisioner_driver="freenas-nfs"
    server="freenas.int.domain.tld"
    share="/mnt/data/nomad/vols/test"
}
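
For reference, registration itself is a single command once that spec is saved to a file (the file name is arbitrary):

nomad volume register test-volume.hcl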

A couple of notes:

  1. democratic-csi was able to bind to a unix socket and a TCP socket at the same time, which is extremely useful, as Nomad communicates with the controller via the unix socket to make sure it is healthy.
  2. I'm not sure if democratic-csi supports any access_mode other than single-node-writer. Maybe multi-node-single-writer, multi-node-reader-only, or single-node-reader-only?
  3. Additional mount flags can be defined. When using Kubernetes, are any of them generally used, or is it usually just the defaults?

After going through the steps, I don't think there is any need for a Terraform provisioner or anything else. Nomad will eventually have native volume provisioning, so a simple set of shell scripts or similar is a good enough stopgap; see the sketch below.
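
For example, a stopgap along these lines (the host, port, plugin id, and share path pattern are placeholders drawn from the setup above):

#!/bin/sh
# Hypothetical stopgap script: create a volume via the CSI controller over TCP,
# then register it with Nomad. Host, port, plugin id, and paths are placeholders.
set -e

VOL_NAME="$1"
SIZE_BYTES="${2:-1000000000}"

# 1. create the volume against the controller's TCP endpoint
csc -e tcp://host:port controller create-volume \
  --cap SINGLE_NODE_WRITER,mount,nfs \
  --req-bytes "$SIZE_BYTES" "$VOL_NAME"

# 2. render a registration file; the context values must match what
#    create-volume actually returned (hardcoded here for brevity)
cat > "/tmp/${VOL_NAME}.hcl" <<EOF
id              = "$VOL_NAME"
name            = "$VOL_NAME"
type            = "csi"
external_id     = "$VOL_NAME"
plugin_id       = "truenas"
access_mode     = "single-node-writer"
attachment_mode = "file-system"
mount_options {
  fs_type = "nfs"
}
context {
  node_attach_driver = "nfs"
  provisioner_driver = "freenas-nfs"
  server             = "freenas.int.domain.tld"
  share              = "/mnt/data/nomad/vols/$VOL_NAME"
}
EOF

# 3. register with Nomad
nomad volume register "/tmp/${VOL_NAME}.hcl"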

I'm going to take all the discussion here and turn it into a proper How-To guide on using democratic-csi with Nomad in the next couple of days, so that future users can make use of this awesome project!

@aditsachde changed the title from "Manually provision a volume / Nomad Support" to "Nomad Support" on Dec 28, 2020
@travisghansen (Member)

Awesome! It looks like you mapped that out perfectly! The helm chart examples are actually the best source of documentation on this front currently.

Regarding 1:

  • yes, it's very helpful indeed
  • the container very purposely includes socat, which can be very helpful depending on the deployment scenario
    • for example, you could run the controller entirely externally and use socat to bind a socket in the container that bridges to the controller actually running elsewhere (see the sketch below)
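
A minimal sketch of that bridge (the socket path and external endpoint are placeholders):

socat UNIX-LISTEN:/csi-data/csi.sock,fork,unlink-early TCP:external-controller-host:9000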

Regarding 2:

Regarding 3:

I'll review the binary used to create the volumes to see if it's flexible enough to handle the common use cases; if not, we could include a binary/client in this project as well, which could easily be invoked using docker.

@aditsachde (Contributor, Author)

The csc utility supports all the CSI operations, including creating, deleting, and expanding volumes, as well as the snapshot operations. Since Nomad plans to add volume management capabilities in the future and everything seems to be working well, I don't see any reason to spend time adding Nomad-specific utilities to this project.
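
For example (subcommand names as I recall from csc's help output; worth verifying locally with csc controller --help):

csc -e tcp://host:port controller list-volumes
csc -e tcp://host:port controller delete-volume test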

I've gone ahead and created a PR to add Nomad docs, and once Nomad supports volume management, I'll update them. They aren't too great, but they should make it easier for anyone who wants Nomad support right now.

Everything seems to be working pretty well, thank you for all your help!

@travisghansen (Member)

hashicorp/nomad#8212

@mister2d (Contributor)

@aditsachde would you mind posting a gist example of your Nomad config for NFS? I'm having a bit of trouble piecing one together from the k8s examples in this project. I have already created an NFS share. Thanks!

@aditsachde (Contributor, Author)

aditsachde commented Aug 19, 2021

Hey, I'd written up some docs here that have some examples.

Unfortunately, I was experiencing some stability issues with Nomad and didn't really like certain aspects of it, so I am no longer using it.

@mister2d (Contributor)

> Hey, I'd written up some docs here that have some examples.
>
> Unfortunately, I was experiencing some stability issues with Nomad and didn't really like certain aspects of it, so I am no longer using it.

Yes, I did see the job spec in the docs, but I'm still a bit mystified by the actual config YAML.

@travisghansen (Member)

What can I help with on the config YAML? Do you need help with the actual contents, or with how to get the file injected into the container properly, etc.?

@dkowis

dkowis commented Aug 29, 2021

I have a question regarding the config YAML for an iscsi node job: what goes in driver-config-file.yaml in an iscsi setup?

I can't figure that part out. I was able to set it up for NFS, but I'd like to have both options, as iscsi works better in some cases.

@travisghansen (Member)

Use exactly the same files as you use for the controller aspect. Assuming FreeNAS, you can base it off of this: https://github.com/democratic-csi/democratic-csi/blob/master/examples/freenas-iscsi.yaml
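
To give a rough idea of the shape, an abridged, hypothetical sketch follows; the linked example file is authoritative for key names and current options:

driver: freenas-iscsi
httpConnection:
  protocol: http
  host: truenas.int.domain.tld
  port: 80
  username: root
  password: secret
sshConnection:
  host: truenas.int.domain.tld
  username: root
  password: secret
zfs:
  datasetParentName: tank/nomad/vols
  detachedSnapshotsDatasetParentName: tank/nomad/snaps
iscsi:
  targetPortal: "truenas.int.domain.tld:3260"
  namePrefix: csi-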

@dcarbone (Contributor)

dcarbone commented Nov 30, 2021

Sorry to keep posting in a closed issue, but I'm facing what I believe to be a similar issue to @dkowis's.

Before launching into a full diatribe, given that Nomad support is tertiary at best, this is what I see when executing GetCapabilities against a node container:

&{type:STAGE_UNSTAGE_VOLUME }
&{type:GET_VOLUME_STATS }
&{type:EXPAND_VOLUME }

The following is known-good (I think, ha):

  • dcsi controller is correctly creating the following:
    • zfs zvol
    • iscsi target
    • iscsi extent
  • nomad recognizes the following:
    • dcsi plugin registered with controller and appropriate count of nodes
    • volume registered and associated with above dcsi plugin
      • volume caps are as follows: --cap 1,2,xfs --cap 2,2,xfs or, more legibly, --cap SINGLE_NODE_WRITER,mount,xfs --cap SINGLE_NODE_READER_ONLY,mount,xfs
    • volume health is "Schedulable"

Some basics of the env:

  • Using same config between controller and node
    • this config file is injected into the container using an allocation-local template declaration with an inline definition of the config file itself.
    • the instance_id value is guaranteed to be unique between allocations as it is based off a combination of node_id and job_name, resulting in a value like {node_uuid}-csi-iscsi-controller
  • Single controller container, one node container per host
    • 3x arm64
    • 2x x86_64

However, the node processes never seem to actually claim a volume. I'm fairly positive I'm missing some small, stupid detail, and I suspect it has to do with what seems to me a truncated list of capabilities returned by a node when queried using csc (I forked csc to update it for the latest CSI spec, but I am not actually leveraging any of the 1.5.0 features).

I'm very new to utilizing CSI plugins in general, and I don't spend a ton of time in more modern javascript repos, so I'm having a bit of trouble determining where exactly the list of capabilities is built. I see lots of references to options.service.node.capabilities, but I'm not entirely certain what actually defines this (what I assume to be an) array.

Any help would be appreciated. I may eventually let Nomad go in favor of k8s, but it'd be great if I could get this going and perhaps help flesh out the Nomad documentation more.

Another thought: do I need to mount host iscsi resources into the container? I see lots of references to conditional mounts in the charts repo.

@travisghansen (Member)

@dcarbone welcome! Maybe let's open another issue with the details to discuss further.

Unfortunately, Nomad is not my forte, so I'm honestly at a bit of a loss. That's compounded by the fact that Nomad has been a bit of a moving target over the last year or so as they've more fully implemented the CSI spec.

The node capabilities mentioned above are correct (for iscsi), and the logic can be found here: https://github.com/democratic-csi/democratic-csi/blob/master/src/driver/controller-zfs-ssh/index.js#L110 (it's super-advanced stuff, but I actually have ways that operators (i.e. yourself) can override what is advertised, though there should be no reason to here). I'm interested in knowing why you think that list is truncated.

Regarding host mounts: yes, there are several things that are slightly different using iscsi vs NFS. I highly suggest reading the conversation over at #111 (and maybe just using that issue to continue the conversation if it makes sense), as it gives some details on running with iscsi.

On my end, I think it's time I set up a single-node Nomad install for testing and familiarizing myself. Perhaps a fresh set of eyes looking at the latest/greatest will help get it lifted off the ground and documented more appropriately. A very positive takeaway as it relates to Nomad is that the driver(s) are very staunchly CSI-compliant, without any k8s-isms (I was adamant about this point; many 'csi' drivers have hard dependencies on k8s).

@dcarbone (Contributor)

dcarbone commented Dec 1, 2021

@travisghansen that sounds fine with me! My suspicion is based entirely on ignorance, heh. I figured I'd see something about mounting, but I'm pretty sure that's just me conflating node and volume capabilities.

I'll review that issue, and if I can find something more tangible I'll open another ticket.

Thanks!

@travisghansen (Member)

Sounds good. I'm going to attempt a single-node Nomad install now from scratch and see how far I get :)

Yes, node and volume caps are distinct concepts. I don't think that particular thing is your issue.

@AidanHanda

I'm running into some issues with nfs-client. I'm trying to create a volume like @aditsachde did above with:

csc -e tcp://host:port controller create-volume --req-bytes 1000000 test

but I'm getting the error missing volume_capabilities. I was curious about that, because when I run csc -e tcp://host:port controller get-capabilities I get:

&{type:CREATE_DELETE_VOLUME }
&{type:CREATE_DELETE_SNAPSHOT }
&{type:CLONE_VOLUME }

which makes me feel as though the controller is able to create volumes. Any advice?

@dcarbone (Contributor)

dcarbone commented Feb 8, 2022

@AidanHanda you need to specify the capabilities of the volume itself :)

Have a look at the source here: https://github.com/rexray/gocsi/blob/master/csc/cmd/controller_create_volume.go#L34

An example would be:

csc -e tcp://host:port controller create-volume --cap SINGLE_NODE_WRITER,mount,xfs

@AidanHanda

Thanks @dcarbone! Obviously 🤦!

@travisghansen (Member)

Note that with recent versions of Nomad, csc is no longer required; you can manage everything end to end with Nomad-native assets/commands. For example:
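
A sketch of what that native flow can look like, assuming a volume specification along the lines of Nomad's docs (all values are placeholders):

id           = "test"
name         = "test"
type         = "csi"
plugin_id    = "truenas"
capacity_min = "1GiB"
capacity_max = "1GiB"

capability {
  access_mode     = "single-node-writer"
  attachment_mode = "file-system"
}

mount_options {
  fs_type = "nfs"
}

Saved as volume.hcl, nomad volume create volume.hcl then asks the plugin to create the volume and registers it with Nomad in one step.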
