client stalls when accessing a data dir that is already in use #6348
https://godoc.org/github.com/boltdb/bolt#Options
Unfortunately this doesn't sound like it works on Windows. Still a step in the right direction!
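A minimal sketch of how a lock timeout could be turned into an actionable message, assuming the state database is opened with the `Timeout` field of the Options linked above so that a held lock surfaces as bolt.ErrTimeout instead of blocking forever. To keep the example standard-library only, `errStateTimeout` is a hypothetical stand-in for bolt.ErrTimeout, and `explainOpenError` is an illustrative helper, not code from this PR.

```go
package main

import (
	"errors"
	"fmt"
)

// errStateTimeout is a hypothetical stand-in for bolt.ErrTimeout so that this
// sketch compiles with the standard library alone.
var errStateTimeout = errors.New("timeout")

// explainOpenError (hypothetical helper) adds a hint about a shared data_dir
// when the state database could not be locked in time. Any other error is
// returned unchanged.
func explainOpenError(err error, dataDir string) error {
	if err == nil {
		return nil
	}
	if errors.Is(err, errStateTimeout) {
		// %w keeps the original error available to errors.Is / errors.As.
		return fmt.Errorf("timed out opening state database in %q; another client may be running with the same data_dir: %w", dataDir, err)
	}
	return err
}

func main() {
	// Simulate the failure a second client would see.
	err := explainOpenError(errStateTimeout, "/var/lib/nomad")
	fmt.Println(err)
}
```

Wrapping with `%w` rather than replacing the error keeps the original sentinel matchable by callers further up the stack.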
It looks like the client exits from that error here: nomad/command/agent/command.go Lines 601 to 604 in 4ae2cd5
Perhaps after that logGate.Flush() we log an error so it shows up at the end. Or, even better, we just move the error that's already logged up to this block so it's logged after the flush.
@nickethier That is an excellent suggestion - I've hit this before. This is a good incremental improvement for sure. We can follow up by taking a Nomad process file lock with a cross-platform library (e.g. https://github.com/gofrs/flock) early in the initialization process. This can handle Windows and ensures that we never even attempt to start Consul goroutines in the first place. As for improving the error message, I'd suggest dropping the
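A rough sketch of the "claim the data dir before doing anything else" idea, using only the standard library. Note that this swaps in a simpler technique than the one suggested above: it creates a marker file with O_EXCL rather than taking an OS-level file lock as a library like gofrs/flock would, so a crashed process leaves a stale marker behind. The helper names and the `client.lock` file name are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// acquireDirLock (hypothetical helper) claims dataDir by creating a marker
// file with O_EXCL, which fails if the file already exists. It returns a
// function that releases the claim by removing the marker.
func acquireDirLock(dataDir string) (func() error, error) {
	path := filepath.Join(dataDir, "client.lock") // hypothetical file name
	f, err := os.OpenFile(path, os.O_CREATE|os.O_EXCL|os.O_WRONLY, 0o600)
	if err != nil {
		if os.IsExist(err) {
			return nil, fmt.Errorf("data_dir %q appears to be in use by another client (found %s)", dataDir, path)
		}
		return nil, err
	}
	// Record our PID so an operator can diagnose a stale marker.
	fmt.Fprintf(f, "%d\n", os.Getpid())
	if err := f.Close(); err != nil {
		os.Remove(path)
		return nil, err
	}
	return func() error { return os.Remove(path) }, nil
}

// demoTwoClients simulates two clients sharing one data dir, then a third
// attempt after the first client has released its claim.
func demoTwoClients() (first, second, afterRelease error) {
	dir, err := os.MkdirTemp("", "datadir-*")
	if err != nil {
		return err, nil, nil
	}
	defer os.RemoveAll(dir)

	release, first := acquireDirLock(dir)
	if first != nil {
		return first, nil, nil
	}
	_, second = acquireDirLock(dir) // expected to fail fast, not stall
	if err := release(); err != nil {
		return first, second, err
	}
	release2, afterRelease := acquireDirLock(dir)
	if afterRelease == nil {
		afterRelease = release2()
	}
	return first, second, afterRelease
}

func main() {
	first, second, afterRelease := demoTwoClients()
	fmt.Println("first client:", first)
	fmt.Println("second client:", second)
	fmt.Println("after release:", afterRelease)
}
```

Taking the claim before any other subsystem starts means the second client fails immediately with a clear message. A real OS-level lock has the added benefit of being released automatically when the process dies.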
e502d0e to 67d1a60
I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
Overview
When you run two clients with the same data_dir, the second one will stall indefinitely rather than failing.
Behavior
Before: Client setup stalls
After: Client setup fails
Reproduction
Usually, two clients with the same config will fail on port conflict first, but you can get into this stalled state in two ways
Implementation
bolt.ErrTimeout, suggest that another client may be running on the same data_dir
Todo