
CSI: failed to setup alloc: pre-run hook "csi_hook" #7568

Closed

rkitron opened this issue Mar 31, 2020 · 5 comments

@rkitron

rkitron commented Mar 31, 2020

Good evening,
During this crisis, I spent some time trying out the new CSI feature :)

Nomad version

Nomad v0.11.0-beta1 (a7a7d12)

Operating system and Environment details

Linux ip-172-31-1-210 4.19.0-8-cloud-amd64 #1 SMP Debian 4.19.98-1 (2020-01-26) x86_64 GNU/Linux

Nomad volume

root@ip-172-31-1-210:~# nomad volume status block1
ID                   = block1
Name                 = block1
External ID          = vol-0bce339fcfbb53d8d
Plugin ID            = aws-ebs0
Provider             = ebs.csi.aws.com
Version              = v0.6.0-dirty
Schedulable          = true
Controllers Healthy  = 1
Controllers Expected = 1
Nodes Healthy        = 1
Nodes Expected       = 1
Access Mode          = single-node-writer
Attachment Mode      = file-system
Mount Options        = <none>
Namespace            = default

Allocations
No allocations placed

Issue

I followed this guide to set up the EBS CSI plugin: https://learn.hashicorp.com/nomad/stateful-workloads/csi-volumes#deploy-the-ebs-plugin
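
For context, the node-plugin half of that guide boils down to a system job with a csi_plugin stanza, roughly like the sketch below (the job name, image tag, and plugin arguments here are illustrative, not copied from my cluster):

job "plugin-aws-ebs-nodes" {
  datacenters = ["dc1"]
  type        = "system"

  group "nodes" {
    task "plugin" {
      driver = "docker"

      config {
        image = "amazon/aws-ebs-csi-driver:v0.6.0"
        args = [
          "node",
          "--endpoint=unix://csi/csi.sock",
          "--logtostderr",
          "--v=5",
        ]
        # the node plugin needs privileged access to mount devices on the host
        privileged = true
      }

      # registers this task with Nomad as the node half of plugin "aws-ebs0"
      csi_plugin {
        id        = "aws-ebs0"
        type      = "node"
        mount_dir = "/csi"
      }
    }
  }
}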

Reproduction steps

Nothing special, just run the job and the issue appears.
At the very beginning of the deployment, I can see in the AWS console that the EBS volume is attached, then detached a few seconds later.
I also had the same issue with Scaleway, so I don't know what is wrong. Maybe I should install the CSI plugin at the host level?

Job file (if appropriate)

job "httpd1" {
  datacenters = ["dc1"]

  group "httpd" {
    restart {
      attempts = 10
      interval = "5m"
      delay    = "25s"
      mode     = "delay"
    }
    volume "block1" {
      type      = "csi"
      read_only = false
      source    = "block1"
    }

    task "httpd" {
      driver = "docker"

      volume_mount {
        volume      = "block1"
        destination = "/srv"
        read_only   = false
      }

      config {
        image = "httpd:2.4.41-alpine"
      }
    }
  }
}

Nomad Client logs (if appropriate)

failed to setup alloc: pre-run hook "csi_hook" failed: rpc error: code = Internal desc = Failed to find device path /dev/xvdba. nvme path "/dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_block1" not found
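
A quick way to compare that expected path against the device symlinks that actually exist on the node (a generic diagnostic, not something from the plugin docs):

# list the NVMe device symlinks udev has created on the host
ls -l /dev/disk/by-id/ | grep -i nvme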

Nomad Server logs (if appropriate)

out.log

Thanks a lot for your time,

@tgross
Member

tgross commented Mar 31, 2020

Hi @benoitmenard! Thanks for trying out CSI! The error you're getting is bubbling up from the CSI node plugin, so if you can get the allocation logs for that plugin via nomad alloc logs -stderr :alloc_id, that would probably help figure out what the issue is.

(One of the things I've found is that it's hard to get the internal plugin errors up into the Nomad logs. I've got #7424 open to improve that observability story.)
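
For reference, that log-collection step looks roughly like this (a sketch only; the plugin ID aws-ebs0 comes from the volume status above, and the alloc ID is a placeholder):

# list the plugin's allocations to find the node plugin's alloc ID
nomad plugin status aws-ebs0

# pull stderr from the node plugin allocation
nomad alloc logs -stderr <node-plugin-alloc-id>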

@rkitron
Author

rkitron commented Mar 31, 2020

Thanks for your help,
I think it's something related to #7302.
When I change the volume registration to this:

# volume registration
type = "csi"
id = "vol-0bce339fcfbb53d8d"
name = "block2"
external_id = "vol-0bce339fcfbb53d8d"
access_mode = "single-node-writer"
attachment_mode = "file-system"
plugin_id = "aws-ebs0"

And the job to this:

job "httpd" {
  datacenters = ["dc1"]

  group "webserver" {
    restart {
      attempts = 10
      interval = "5m"
      delay    = "25s"
      mode     = "delay"
    }
    volume "vol-0bce339fcfbb53d8d" {
      type      = "csi"
      read_only = false
      source    = "vol-0bce339fcfbb53d8d"
    }

    task "httpd1" {
      driver = "docker"

      volume_mount {
        volume      = "vol-0bce339fcfbb53d8d"
        destination = "/srv"
        read_only   = false
      }

      config {
        image = "httpd:2.4.41-alpine"
      }
    }
  }
}

It works :)
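
For reference, a registration file like the one above is applied with the nomad volume register command (the filename volume.hcl is just an assumption here):

nomad volume register volume.hcl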

tgross added this to the 0.11.0 milestone Apr 2, 2020
@tgross
Member

tgross commented Apr 2, 2020

Thanks for the update on that @benoitmenard, and glad to see you've got things working.

It looks like in #7326 we made it so that if the external ID is set, we'd use it as the ID of the volume when we talk to plugins. So the first way you had it should have worked. @langmartin or I will see if we can reproduce the behavior and pull some logs from the EBS plugin to double-check.
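
For reference, that first registration style (reconstructed here from the volume status at the top of this issue, so treat it as a sketch) would look like:

# volume registration, using the volume name as the ID
type            = "csi"
id              = "block1"
name            = "block1"
external_id     = "vol-0bce339fcfbb53d8d"
access_mode     = "single-node-writer"
attachment_mode = "file-system"
plugin_id       = "aws-ebs0"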

@tgross
Member

tgross commented Apr 2, 2020

I was able to confirm that using the IDs as you did here originally should have worked. Looking at the server logs, it looks like you ran into a flaky bug around unmounting on the node. So the ID change was just a coincidence... it happened to work the second time.

Mar 31 19:54:24 ip-172-31-1-210 nomad[4724]:         * rpc error: code = Internal desc = Could not unmount "/csi/per-alloc/e19145ec-e5fe-9bd8-ae3e-2a1aebe39352/block1/rw-file-system-single-node-writer": unmount failed: exit status 32
Mar 31 19:54:24 ip-172-31-1-210 nomad[4724]: Unmounting arguments: /csi/per-alloc/e19145ec-e5fe-9bd8-ae3e-2a1aebe39352/block1/rw-file-system-single-node-writer
Mar 31 19:54:24 ip-172-31-1-210 nomad[4724]: Output: umount: /csi/per-alloc/e19145ec-e5fe-9bd8-ae3e-2a1aebe39352/block1/rw-file-system-single-node-writer: no mount point specified.

That should be fixed by the client-side changes in #7596 which I'm working on getting merged in now. For now I'm going to close this as effectively a dupe of #7180. But thanks so much for giving CSI a try... I'm really excited to have people try it out!

@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions bot locked as resolved and limited conversation to collaborators Nov 10, 2022