Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

csi: unpublish workflow ID mismatches #7626

Closed
wants to merge 3 commits into from

Conversation

tgross
Copy link
Member

@tgross tgross commented Apr 4, 2020

Fixes for #7628

  • The CSI plugins uses the external volume ID for all operations, but the Client CSI RPCs uses the Nomad volume ID (human-friendly) for the mount paths. Pass the External ID as an arg in the RPC call so that the unpublish workflows have it without calling back to the server to find the external ID.
  • The controller CSI plugins need the CSI node ID (or in other words, the storage provider's view of node ID like the EC2 instance ID), not the Nomad node ID, to determine how to detach the external volume.

@tgross tgross changed the title csi: use internal ID for node unpublish csi: unpublish workflow ID mismatches Apr 4, 2020
@tgross tgross force-pushed the csi_use_internal_id_for_node_unpublish branch 3 times, most recently from e8a8fd1 to 4484a5f Compare April 4, 2020 18:49
@tgross tgross marked this pull request as ready for review April 4, 2020 19:18
@tgross tgross requested a review from langmartin April 4, 2020 19:18
@tgross
Copy link
Member Author

tgross commented Apr 4, 2020

I want to do some refactoring after the release with some of these names to disambiguate them to prevent this sort of error in the future, but for now this will do the job.

}
targetCSIInfo, ok := targetNode.CSINodePlugins[args.plug.ID]
if !ok {
return args.nodeClaims, fmt.Errorf("Failed to find NodeInfo for node: %s", targetNode.ID)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like this failure could be temporary, will we retry this again if the node plugin is in the process of coming back up? Also, is the node CSI id constant when the plugin comes back up or is it generated by the node plugin everytime it starts?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The GC job will get requeued (which works fine for us with #7632 where the Job.Deregister will use the same GC path).

And as far as I can tell the CSI Node ID is implementation-specific. But in practice it has to be a fixed identifier of the underlying host. So for example, in the EBS case it's the EC2 instance ID, because if it wasn't the controller wouldn't be able to attach the external volume to it.

@tgross
Copy link
Member Author

tgross commented Apr 5, 2020

#7632 includes all these changes, so we can either merge that in to pick these all up or merge this separately and rebase it from master once that's done.

@tgross
Copy link
Member Author

tgross commented Apr 6, 2020

Closing as these changes were rolled into #7632

@tgross tgross closed this Apr 6, 2020
@tgross tgross deleted the csi_use_internal_id_for_node_unpublish branch April 7, 2020 12:21
@github-actions
Copy link

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jan 10, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants