Error registering csi volume - Azure Disk #7812
Hi @carlosrbcunha! Thanks for reporting this! The error messages are definitely not great. We're hoping to improve that with #7424. But the error means that when Nomad tried to register the volume, the plugin rejected the volume definition. I see you provided the logs of the plugins, so lemme dig into that a bit more to see if I can see what the problem is. With some other plugins we've typically seen problems like the wrong ID or wrong permissions for the plugin to query the cloud provider.
I used the same credentials for every step, up to and including creating the disk in Azure with Terraform, as stated in the example.
Unfortunately not. You're hitting this bit on the client. So when the Nomad client is getting a response back from the plugin, there's no error message associated with it. The portion of the CSI spec I'm looking at is the volume validation response.
Can you please instruct me on how to get that log? I am running an all-in-one test server (server and client). I must be doing something wrong...
Oh! If it's the same instance of Nomad, the logs will be mixed together. We record errors for RPCs to the plugins at the debug log level.
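A minimal sketch of an agent configuration that surfaces those messages (the agent.hcl filename is illustrative; passing the -log-level flag to nomad agent does the same thing):

# agent.hcl: raise the log level so errors from plugin RPCs show up
# in the agent logs. TRACE is more verbose still, but DEBUG is enough
# to see the CSI plugin RPC errors discussed here.
log_level = "DEBUG"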
The logs Carlos sent me earlier, with the debug log level:
I'm working on another issue with the same symptom. The Azure plugin isn't setting the confirmed capabilities field in its validation response.
After a pass through the k8s code and the CSI spec, I'm seeing we are incorrectly validating this response. The spec says the confirmed capabilities field in the validation response SHALL only be set when the plugin has actually validated the capabilities, and that the field is OPTIONAL.
Which means that if the plugin has validated the capabilities, we should be checking to make sure they match what we expect; but if the plugin doesn't validate them, that's not actually an error condition. It just means the plugin doesn't care to give us a response. I might not have written the spec that way, but it's definitely a bug in Nomad. Should be a straightforward fix.
I've opened #7831 with a patch to fix the validation.
Success! This will ship in 0.11.2 shortly.
By the way, it looks like most of that big k8s creds block isn't needed. The jobs I ran were as follows:

controller.nomad:

job "plugin-azure-disk-controller" {
  datacenters = ["dc1"]

  group "controller" {
    task "plugin" {
      driver = "docker"

      # Render the Azure cloud-provider credentials file the plugin expects.
      template {
        change_mode = "noop"
        destination = "local/azure.json"

        data = <<EOH
{
  "cloud": "AzurePublicCloud",
  "tenantId": "REDACTED",
  "subscriptionId": "REDACTED",
  "aadClientId": "REDACTED",
  "aadClientSecret": "REDACTED",
  "resourceGroup": "nomad-testing",
  "location": "eastus"
}
EOH
      }

      env {
        AZURE_CREDENTIAL_FILE = "/etc/kubernetes/azure.json"
      }

      config {
        image = "mcr.microsoft.com/k8s/csi/azuredisk-csi"

        volumes = [
          "local/azure.json:/etc/kubernetes/azure.json",
        ]

        args = [
          "--nodeid=${attr.unique.hostname}",
          "--endpoint=unix://csi/csi.sock",
          "--logtostderr",
          "--v=5",
        ]
      }

      csi_plugin {
        id        = "az-disk0"
        type      = "controller"
        mount_dir = "/csi"
      }

      resources {
        cpu    = 500
        memory = 256
      }
    }
  }
}
node.nomad:

job "plugin-azure-disk-nodes" {
  datacenters = ["dev1"]

  # You can run node plugins as service jobs as well, but a system job
  # ensures that all nodes in the DC have a copy.
  type = "system"

  group "nodes" {
    task "plugin" {
      driver = "docker"

      template {
        change_mode = "noop"
        destination = "local/azure.json"

        data = <<EOH
{
  "cloud": "AzurePublicCloud",
  "tenantId": "REDACTED",
  "subscriptionId": "REDACTED",
  "aadClientId": "REDACTED",
  "aadClientSecret": "REDACTED",
  "resourceGroup": "nomad-testing",
  "location": "eastus"
}
EOH
      }

      env {
        AZURE_CREDENTIAL_FILE = "/etc/kubernetes/azure.json"
      }

      config {
        image = "mcr.microsoft.com/k8s/csi/azuredisk-csi"

        volumes = [
          "local/azure.json:/etc/kubernetes/azure.json",
        ]

        args = [
          "--nodeid=${attr.unique.hostname}",
          "--endpoint=unix://csi/csi.sock",
          "--logtostderr",
          "--v=5",
        ]

        # Node plugins must run as privileged jobs because they
        # mount disks to the host.
        privileged = true
      }

      csi_plugin {
        id        = "az-disk0"
        type      = "node"
        mount_dir = "/csi"
      }

      resources {
        cpu    = 500
        memory = 256
      }
    }
  }
}
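Once the volume registers, a job can claim it with a group-level volume stanza and a task-level volume_mount. A minimal sketch follows; the job name, image, and mount path are illustrative, not from this thread:

job "mysql-example" {
  datacenters = ["dc1"]

  group "db" {
    # Claim the registered CSI volume by its volume ID.
    volume "mysql" {
      type   = "csi"
      source = "mysql"
    }

    task "db" {
      driver = "docker"

      config {
        image = "mysql:8"
      }

      env {
        MYSQL_ROOT_PASSWORD = "REDACTED"
      }

      # Mount the claimed volume into the task's filesystem.
      volume_mount {
        volume      = "mysql"
        destination = "/var/lib/mysql"
      }

      resources {
        cpu    = 500
        memory = 512
      }
    }
  }
}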
I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
Nomad version
Nomad v0.11.1 (b434570)
Operating system and Environment details
Ubuntu 18.04
Docker 19.03.8
VM in Azure
Issue
When registering a CSI volume in Nomad:
Error registering volume: Unexpected response code: 500 (validate volume: Volume validation failed)
Reproduction steps
Started the jobs for the controller and node plugins and they appeared as healthy.
Job file (if appropriate)
volume register file (volume.hcl)
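A minimal sketch of such a registration file, assuming an Azure managed disk created out-of-band with Terraform; the external_id resource path and mount options here are illustrative assumptions, while the plugin_id matches the plugin jobs above:

# volume.hcl (sketch): register an existing Azure managed disk.
id   = "mysql"
name = "mysql"
type = "csi"

# Full Azure resource ID of the managed disk (illustrative path).
external_id = "/subscriptions/REDACTED/resourceGroups/nomad-testing/providers/Microsoft.Compute/disks/mysql"

# Must match the csi_plugin id in the controller and node jobs.
plugin_id = "az-disk0"

access_mode     = "single-node-writer"
attachment_mode = "file-system"

mount_options {
  fs_type = "ext4"
}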
Controller job
Node job
There is only one example, for AWS EBS, so I ported what I understood to an Azure configuration.
I cannot understand what my problem is with registering the volume. The error is very vague and I could not find any documentation that explains this behaviour. Can you please help me?
Nomad Client logs (if appropriate)
Archive.zip
Nomad server logs (the only relevant entry):
2020-04-27T15:33:06.188Z [ERROR] http: request failed: method=PUT path=/v1/volume/csi/mysql error="validate volume: Volume validation failed" code=500