docs for stop_on_client_disconnect stanza #7938
4714552 to 38755c8
The original copy is okay. I took a swing at it from scratch just to provide an alternate.
- `stop_after_client_disconnect` `(string: "")` - Specifies the
  duration to wait when the Nomad client is disconnected from the server
  after the [`heartbeat_grace`] window before terminating this group. By
  default, Nomad runs tasks indefinitely on clients even if the client
  has crashed or can't communicate with the server. After the
  [`heartbeat_grace`] window, the server marks allocations on the node
  as "lost" and reschedules them. But the task is left running on the
  client. For some workloads, operators may want to ensure that the
  allocations on the clients are also stopped (for example, they require
  exclusive access to an external resource). The group level
  `stop_after_client_disconnect` opts the group into being terminated.
An alternative version of the text for consideration. Take anything that you like from here, toss the rest.
- `stop_after_client_disconnect` `(string: "")` - Specifies a duration after
  which a Nomad client that is partitioned from the servers will stop
  allocations based on this task group. By default, a client will not stop an
  allocation until explicitly told to by a server. A client that fails to
  heartbeat to a server within the `heartbeat_grace` window and any allocations
  running on it will be marked "lost" and Nomad will schedule replacement
  allocations. However, these replaced allocations will continue to run on the
  non-responsive client; an operator may want these replaced allocations to be
  stopped as well in this case, for example when they require exclusive
  access to an external resource. When specified, the Nomad client will stop
  them after this duration. The Nomad client process must be running for this to
  occur.
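
For illustration, a minimal job specification sketch showing where the parameter would sit at the group level. The job, group, and task names, the Docker image, and the `"1h"` duration are all hypothetical values chosen for the example, not taken from this PR.

```hcl
# Hypothetical job; only stop_after_client_disconnect is the subject here.
job "example" {
  datacenters = ["dc1"]

  group "cache" {
    # Assumed value: once the client has been unable to reach the servers
    # for this long, it stops this group's allocations on its own.
    # Leaving it unset (the default "") keeps the allocations running.
    stop_after_client_disconnect = "1h"

    task "redis" {
      driver = "docker"

      config {
        image = "redis:6"
      }
    }
  }
}
```

Because the setting is per group, other groups in the same job that omit it would presumably keep the default behavior and continue running on a disconnected client.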
I took almost all of this in my last commit here, but I swapped out "is partitioned" for "that cannot communicate" to avoid using some narrow distsys language where I think we want something simpler and broader.
c19f834 to 3772580
The clock drift situation is less dire than full cluster synchronization (although it wouldn't hurt). I'm not sure we need the second paragraph, but I'll bet it'll come up.
3772580 to 4985310
Waiting till #7939 is merged to merge this.
I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
In support of #2185