Tasks may fail on reconnect if alloc and task runners run before client connects to servers #15139
Labels
stage/accepted
Confirmed, and intend to work on. No timeline commitment though.
theme/client
type/bug
Nomad version
Nomad v1.4.1 (2aa7e66bdb526e25f59883952d74dad7ea9a014e)
Operating system and Environment details
macOS
Issue
Jobs whose alloc or task runner hooks communicate with Nomad servers (such as Nomad Native Service Discovery) may fail on reconnect if those hooks run before the client has established a connection with a server.
Reproduction steps
unknown
This is a race condition problem, so you may need to try this a few times 😅
Expected Result
Allocation reconnects properly.
Actual Result
Allocation fails on reconnect.
Job file (if appropriate)
Sample configuration files
Nomad Client logs (if appropriate)