csi: retry controller client RPCs on next controller #8561

Merged (1 commit) on Aug 6, 2020

@tgross tgross commented Jul 29, 2020

Part of a fix for #8080, #8100, #8232 as summarized in #8232 (comment). Will help mitigate #8285, #8145, #8057. (1 of 4 PRs)

The documentation encourages operators to run multiple controller plugin
instances for HA, but the client RPCs don't take advantage of this: they don't
retry on another instance when an RPC fails because the plugin is unavailable
(for example, because the node has drained or the alloc has failed but we
haven't received an updated fingerprint yet).

This changeset tries all known controllers before giving up, and adds tests
that exercise the client RPC routing and retries.
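
To make the retry behavior concrete, here is a minimal sketch of "try every known controller before giving up". It is not the code from this PR: `tryAllControllers`, `controllerRPC`, and `errNoControllers` are hypothetical names, and retrying on every error (rather than only on some classes of error) is an assumption made for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
)

// controllerRPC is a hypothetical stand-in for one client RPC sent to the
// controller plugin running on the given client node.
type controllerRPC func(clientID string) error

// errNoControllers is a hypothetical sentinel returned when there are no
// candidate clients at all.
var errNoControllers = errors.New("no clients running controller plugin")

// tryAllControllers shuffles the candidate client IDs, then calls rpc on each
// one in turn until a call succeeds. It returns the number of attempts made
// and, if every attempt failed, the last error seen.
// Assumption: every error is treated as retryable.
func tryAllControllers(clientIDs []string, rpc controllerRPC) (int, error) {
	if len(clientIDs) == 0 {
		return 0, errNoControllers
	}

	// Copy before shuffling so the caller's slice is not reordered.
	ids := append([]string(nil), clientIDs...)
	rand.Shuffle(len(ids), func(i, j int) {
		ids[i], ids[j] = ids[j], ids[i]
	})

	var lastErr error
	for attempt, id := range ids {
		if err := rpc(id); err != nil {
			lastErr = fmt.Errorf("controller on client %q: %w", id, err)
			continue // fall through to the next controller
		}
		return attempt + 1, nil
	}
	return len(ids), lastErr
}

func main() {
	// Only client-b has a working controller plugin in this toy example.
	healthy := map[string]bool{"client-a": false, "client-b": true, "client-c": false}

	attempts, err := tryAllControllers(
		[]string{"client-a", "client-b", "client-c"},
		func(id string) error {
			if !healthy[id] {
				return errors.New("plugin unavailable")
			}
			return nil
		},
	)
	fmt.Printf("attempts=%d err=%v\n", attempts, err)
}
```

Because the order is shuffled, the number of attempts varies between runs, but the call succeeds as long as at least one controller is reachable.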

@tgross tgross force-pushed the b-csi-controller-retries branch from 9e06549 to 1c0e229 Compare July 30, 2020 12:41
This changeset tries all known controllers on ready nodes before giving up,
and adds tests that exercise the client RPC routing and retries.
@tgross tgross force-pushed the b-csi-controller-retries branch from 1c0e229 to d2b3ab3 Compare July 30, 2020 20:10
@tgross tgross marked this pull request as ready for review July 31, 2020 17:18
@tgross tgross requested a review from langmartin July 31, 2020 17:18
@langmartin langmartin left a comment

👍

nomad/client_csi_endpoint.go
return nil, fmt.Errorf("failed to find clients running controller plugin %q", pluginID)
}

rand.Shuffle(len(clientIDs), func(i, j int) {

good call

@github-actions
I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 27, 2022