client: retry RPC call when no server is available #15140

Merged: 4 commits, Nov 4, 2022

Conversation

lgfa29 (Contributor) commented Nov 3, 2022

When a Nomad service starts, it tries to establish a connection with the servers, but it also runs alloc runners to manage whatever allocations it needs to run.

The alloc runner invokes several hooks to perform actions, and some of them, such as Native Service Discovery registration, require access to the Nomad servers.

If the alloc runner starts before a connection is established, the alloc runner will fail, causing the allocation to be shut down. This is particularly problematic for disconnected allocations that are reconnecting, as they may fail as soon as the client reconnects.

This commit changes the RPC request logic so that, if no servers are available, the request is retried using the existing retry mechanism.

Closes #15139

@lgfa29 lgfa29 requested review from shoenig and jrasell November 3, 2022 23:14
@lgfa29 lgfa29 added the labels backport/1.3.x (backport to 1.3.x release line) and backport/1.4.x (backport to 1.4.x release line) Nov 3, 2022
@lgfa29 lgfa29 added this to the 1.4.x milestone Nov 3, 2022
lgfa29 (Contributor, Author) commented Nov 3, 2022

This looks like a lot of code change, but I just indented the main RPC call. I had a version that used goto and didn't require the indentation, but goto is 😬

But I'm also open to other implementation ideas 🙂

jrasell (Member) left a comment

LGTM!

shoenig (Member) commented Nov 4, 2022

> This looks like a lot of code change,

protip: there's a "No whitespace" option which helps a lot

[Image: Screenshot from 2022-11-04 09-02-15]

shoenig (Member) left a comment

LGTM; just the else suggestion

client/rpc.go (outdated, resolved)
github-actions bot commented Mar 5, 2023

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 5, 2023
Linked issue: Tasks may fail on reconnect if alloc and task runners run before client connects to servers
3 participants