Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Backport of client/metadata: fix crasher caused by AllowStale = false into release/1.5.x #16576

Conversation

hc-github-team-nomad-core
Copy link
Contributor

Backport

This PR is auto-generated from #16549 to be assessed for backporting due to the inclusion of the label backport/1.5.x.

The below text is copied from the body of the original PR.


Fixes #16517

Given a 3 Server cluster with at least 1 Client connected to Follower 1:

If a NodeMeta.{Apply,Read} for the Client request is received by Follower 1 with AllowStale = false the Follower will forward the request to the Leader.

The Leader, not being connected to the target Client, will forward the RPC to Follower 1.

Follower 1, seeing AllowStale=false, will forward the request to the Leader.

The Leader, not being connected to... well hoppefully you get the picture: an infinite loop occurs.

Manually testing looks good, but automated testing is still WIP.

Update: I'm pretty pleased with how the manual test turned out! It's pretty big but reliably exercises the bug without the patch. With the fix the test runs in <400ms locally which isn't bad for running 7 agents and waiting for each one to become ready!

If you're feeling brave (or like watching the world burn), you can exercise the bug by commenting out one of the AllowStale = true lines and running the test with go test -v -run NodeMeta in the nomad/ directory.

I highly recommend doing that in a container or VM so you don't mess up your machine. I used:

docker run --memory 4000m --oom-score-adj 500 --mount type=bind,source="$(pwd)",target=/nomad -ti golang:1.20.2
apt update
apt install iproute2  # required by clients
cd /nomad
go install -buildvcs=false # keep go/git from complaining
cd /nomad/nomad
go test -v -run NodeMeta

# takes 120s for my machine to fail the test, but leader elections start happening far sooner

@hc-github-team-nomad-core hc-github-team-nomad-core force-pushed the backport/b-clientrpc-loop/manually-splendid-jackass branch 2 times, most recently from 6b38a69 to 2dfea02 Compare March 20, 2023 23:32
@hc-github-team-nomad-core hc-github-team-nomad-core merged commit 2698eb4 into release/1.5.x Mar 20, 2023
@hc-github-team-nomad-core hc-github-team-nomad-core deleted the backport/b-clientrpc-loop/manually-splendid-jackass branch March 20, 2023 23:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants