Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Poor handling of "nomad volume status -verbose" for CSI plugins without LIST_VOLUMES capability #15040

Open
ejweber opened this issue Oct 25, 2022 · 1 comment
Labels
stage/accepted Confirmed, and intend to work on. No timeline committment though. theme/bad-ux theme/storage

Comments

@ejweber
Copy link
Contributor

ejweber commented Oct 25, 2022

Nomad version

Nomad v1.3.3 (428b2cd)
Tested with this version but v1.4.1 code has the same issues (and is linked below).

Operating system and Environment details

Red Hat Enterprise Linux release 8.4 (Ootpa)

Issue

The BeeGFS CSI Driver does not support the CSI LIST_VOLUMES capability. When nomad volume status -verbose is run, a raw HTTP error code and the message "unimplemented for this plugin" are shown.

Reproduction steps

  • Deploy the BeeGFS CSI driver (or some other driver that doesn't support the LIST_VOLUMES capability) to a Nomad cluster.
  • Run nomad volume status -verbose

Expected Result

Similar output to when nomad volume status is run, perhaps with an additional message indicating specifically that LIST_VOLUMES is not implemented or that listing external volumes is not supported by this plugin.

Actual Result

Similar output to when nomad volume status is run, followed by an raw HTTP error code and the message "unimplemented for this plugin".

Nomad command output

webere@webere-dev:~/beegfs-csi-briver$ nomad volume status
Container Storage Interface
ID                 Name               Plugin ID          Schedulable  Access Mode
beegfs-csi-volume  beegfs-csi-volume  beegfs-csi-plugin  true         multi-node-multi-writer

webere@webere-dev:~/beegfs-csi-briver$ nomad volume status --verbose
Container Storage Interface
ID                 Name               Plugin ID          Schedulable  Access Mode
beegfs-csi-volume  beegfs-csi-volume  beegfs-csi-plugin  true         multi-node-multi-writer
Error querying CSI external volumes for plugin "beegfs-csi-plugin": Unexpected response code: 500 (rpc error: unimplemented for this plugin)

Code deep dive

This only occurs because we execute "nomad volume status" with BOTH "-verbose" AND no specified ID.

No specified ID invokes "list mode":

https://github.com/hashicorp/nomad/blob/v1.4.1/command/volume_status_csi.go#L20-L23

No "-verbose" returns early:

https://github.com/hashicorp/nomad/blob/v1.4.1/command/volume_status_csi.go#L104-L106

Both conditions together ultimately result in the invocation of the ListVolumes RPC (and the documented failure message):

https://github.com/hashicorp/nomad/blob/v1.4.1/command/volume_status_csi.go#L128-L136

A Nomad server returns a 500 error code and the message "unimplemented for this plugin" when it recognizes that a driver doesn't advertise the LIST_VOLUMES capability.

https://github.com/hashicorp/nomad/blob/v1.4.1/nomad/csi_endpoint.go#L1189-L1191

We are supposed to see something like this for drivers that support the LIST_VOLUMES capability. Nomad already quietly moves on without printing the external list portion if it thinks nothing went wrong and doesn't have a list of volumes.

https://github.com/hashicorp/nomad/blob/v1.4.1/command/volume_status_csi.go#L137-L140

It seems like we just need a way for Nomad to understand to move on quietly in this circumstance as well. Maybe the Nomad server should just return an empty list instead of a 500 error when the LIST_VOLUMES capability isn't supported? Maybe the Nomad command should know to look for "unimplemented for this plugin" in the error output and move on quietly (or optionally print something more informative)?

@tgross
Copy link
Member

tgross commented Oct 26, 2022

Hi @ejweber! That's definitely not great UX in retrospect. The API error code is an unfortunate byproduct of the way we translate from RPC calls to HTTP API. I'm not sure I'd want to return an empty list here, which would confuse the cases of "this can never work" and "there's no results to be had". But there's no reason the CLI can't present that in a nicer way.

I'll mark this for roadmapping to clean it up on the CLI side of things.

@tgross tgross added the stage/accepted Confirmed, and intend to work on. No timeline committment though. label Oct 26, 2022
@tgross tgross removed their assignment Oct 26, 2022
@tgross tgross moved this to Needs Roadmapping in Nomad - Community Issues Triage Jun 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stage/accepted Confirmed, and intend to work on. No timeline committment though. theme/bad-ux theme/storage
Projects
Status: Needs Roadmapping
Development

No branches or pull requests

2 participants