-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Track Tasks State #416
Track Tasks State #416
Conversation
// transistions. | ||
type TaskState struct { | ||
State string | ||
Events []*TaskEvent |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we bounding the size of Events somewhere? A service style job might potentially have a lot of events over the course of time if the node on which it is placed remains healthy for a really long time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will in setState as suggested below
@dadgar Overall looks a good step in the right direction with respect to TaskStates, and design looks fine too. Left you some questions. |
const ( | ||
TaskStatePending = "pending" // The task is waiting to be run. | ||
TaskStateRunning = "running" // The task is currently running. | ||
TaskStateDead = "dead" // Terminal state of task. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How are these different from the other states defined here or here?
I think it's a bit confusing having two notions of state declared in two places. It seems like the ones in the api are intended for display, but there is a difference in granularity for failure states. Can we consolidate these somehow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I have an open issue to do this: #329. But this is not something to tackle in this PR. It is across the whole project.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But even in this file there are two blocks of const
declarations for TaskState, one which describes 3 ways in which the task can be dead: https://github.com/hashicorp/nomad/blob/f-task-state/nomad/structs/structs.go#L1056-L1087
It seems like it would be clearer to do something like:
func (t *TaskState) IsAlive() bool {
return !(t.State == TaskDriverFailure || t.State == TaskTerminated || t.State == TaskKilled)
}
if task.IsAlive() {
// Do something
}
and remove TaskStateDead
, rather than
if task.State == TaskStateDead ...
// or
state.Events[last].Type == structs.TaskTerminated
Particularly with this switch
in syncStatus()
it seems like the semantics are buried and could be more easily expressed via state.IsRunning()
, state.IsAlive()
, etc.
With untyped constants it's not clear in the code which should be used where and I think it also creates an opportunity to show end-users an ambiguous "dead" state when there is more information available.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah there is a confusion. The State can only be pending
, running
, dead
. However there are a lot of ways a task can enter these states. The TaskEvent
are those. If you think of it as a finite state machine, the State are the nodes and the Events are the transitions.
I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions. |
This PR:
On a side note, this PR also fixes the alloc_runner and task_runner test regressions introduced with restart policies.