docs for stop_on_client_disconnect stanza #7938

Merged: 5 commits, May 13, 2020


tgross
Member

@tgross tgross commented May 12, 2020

In support of #2185

@tgross tgross added theme/storage theme/docs Documentation issues and enhancements labels May 12, 2020
@tgross tgross added this to the 0.11.2 milestone May 12, 2020
@tgross tgross force-pushed the docs_stop_on_disconnect branch from 4714552 to 38755c8 Compare May 12, 2020 19:49
@tgross tgross requested review from angrycub and langmartin May 12, 2020 19:49
@tgross tgross marked this pull request as ready for review May 12, 2020 19:49
@tgross
Member Author

tgross commented May 12, 2020

Contributor

@angrycub angrycub left a comment

The original copy is okay. I took a swing at it from scratch just to provide an alternate.

website/pages/docs/job-specification/group.mdx (outdated)
Comment on lines 71 to 81
- `stop_after_client_disconnect` `(string: "")` - Specifies the
duration to wait when the Nomad client is disconnected from the server
after the [`heartbeat_grace`] window before terminating this group. By
default, Nomad runs tasks indefinitely on clients even if the client
has crashed or can't communicate with the server. After the
[`heartbeat_grace`] window, the server marks allocations on the node
as "lost" and reschedules them. But the task is left running on the
client. For some workloads, operators may want to ensure that the
allocations on the clients are also stopped (for example, they require
exclusive access to an external resource). The group level
`stop_after_client_disconnect` opts the group into being terminated.
Contributor

An alternative version of the text for consideration. Take anything that you like from here, toss the rest.

Suggested change
- `stop_after_client_disconnect` `(string: "")` - Specifies a duration after
which a Nomad client that is partitioned from the servers will stop
allocations based on this task group. By default, a client will not stop an
allocation until explicitly told to by a server. A client that fails to
heartbeat to a server within the `heartbeat_grace` window and any allocations
running on it will be marked "lost" and Nomad will schedule replacement
allocations. However, these replaced allocations will continue to run on the
non-responsive client; an operator may desire that these replaced allocations
are also stopped in this case—for example, allocations requiring exclusive
access to an external resource. When specified, the Nomad client will stop
them after this duration. The Nomad client process must be running for this to
occur.
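
For illustration, a minimal job sketch with the field set at the group level might look like the following. The job, group, and task names, the `5m` duration, and the Docker image are placeholder assumptions for the example, not values taken from this PR.

```hcl
# Hypothetical job: names, image, and duration are illustrative only.
job "example" {
  datacenters = ["dc1"]

  group "exclusive-worker" {
    # If the client cannot communicate with the servers, it stops this
    # group's allocations after this duration. The Nomad client process
    # must still be running for this to happen.
    stop_after_client_disconnect = "5m"

    task "worker" {
      driver = "docker"

      config {
        image = "redis:7"
      }
    }
  }
}
```

Leaving the field at its default of `""` keeps the default behavior described above, where the client does not stop an allocation until a server tells it to.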

Member Author

I took almost all of this in my last commit here, but I swapped out "is partitioned" for "that cannot communicate" to avoid narrow distsys language where I think we want something simpler and broader.

@tgross
Member Author

tgross commented May 13, 2020

I need to update this as the config field is `stop_after_client_disconnect`, not `stop_on_client_disconnect`. 🤦 Done.

@tgross tgross force-pushed the docs_stop_on_disconnect branch from c19f834 to 3772580 Compare May 13, 2020 14:32
Contributor

@langmartin langmartin left a comment

The clock drift situation is less dire than full cluster synchronization (although it wouldn't hurt). I'm not sure we need the second paragraph, but I'll bet it'll come up.

website/pages/docs/job-specification/group.mdx (outdated)
@tgross tgross force-pushed the docs_stop_on_disconnect branch from 3772580 to 4985310 Compare May 13, 2020 14:49
@tgross
Member Author

tgross commented May 13, 2020

Waiting till #7939 is merged to merge this.

@tgross tgross merged commit 2209ef3 into master May 13, 2020
@tgross tgross deleted the docs_stop_on_disconnect branch May 13, 2020 20:39