Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Follower rejecting leader's pre-vote #606

Closed
k-jingyang opened this issue Aug 17, 2024 · 2 comments · Fixed by #605
Closed

Follower rejecting leader's pre-vote #606

k-jingyang opened this issue Aug 17, 2024 · 2 comments · Fixed by #605

Comments

@k-jingyang
Copy link
Contributor

k-jingyang commented Aug 17, 2024

Hello, thanks for the library! We were trying out raft v1.7.0, and found that the pre-vote implementation has a bug where a follower rejects the pre-vote req from node X even when when the follower thinks that X is the leader.

This is the offending line. https://github.com/hashicorp/raft/blob/main/raft.go#L1752

// Check if we have an existing leader [who's not the candidate] and also
var candidate ServerAddress // ISSUE: this is not set at all
candidateID := ServerID(req.ID)

// if the Servers list is empty that mean the cluster is very likely trying to bootstrap,
// Grant the vote
if len(r.configurations.latest.Servers) > 0 && !inConfiguration(r.configurations.latest, candidateID) {
	r.logger.Warn("rejecting pre-vote request since node is not in configuration",
		"from", candidate)
	return
}

if leaderAddr, leaderID := r.LeaderWithID(); leaderAddr != "" && leaderAddr != candidate { // ISSUE: so this will always return true as long as the follower has a leader
	r.logger.Warn("rejecting pre-vote request since we have a leader",
		"from", candidate,
		"leader", leaderAddr,
		"leader-id", string(leaderID))
	return
}
@banks
Copy link
Member

banks commented Aug 19, 2024

@k-jingyang great catch and thanks for reporting this and creating a PR.

I'm curious how you found this case. Did you observe this bug in a production or test settting? Or was this something you spotted in the code?

I agree we should probably fix that code as it's at least not doing what was intended, just trying to understand the scenarios where it would have an impact.

@k-jingyang
Copy link
Contributor Author

Hey @banks, we observed this bug in our test cluster of 3 nodes. For us, it happened on re-deployment, and it has only happened once so far.

There's an ongoing discussion in the PR regarding what actually happened and the scenarios.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants