-
Notifications
You must be signed in to change notification settings - Fork 39.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replica set conditions API #33905
Replica set conditions API #33905
Conversation
Still needs a test |
And #33092 would be a nice-to-have. |
Jenkins Kubemark GCE e2e failed for commit 7a20df8c2c60f2088b5f7ce08255c29d76aae198. Full PR test history. The magic incantation to run this job again is |
@@ -1753,6 +1753,32 @@ type ReplicationControllerStatus struct { | |||
|
|||
// ObservedGeneration is the most recent generation observed by the controller. | |||
ObservedGeneration int64 `json:"observedGeneration,omitempty"` | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This API change LGTM - it is consistent with other objects that have conditions.
@kubernetes/api-review-team any disagreement?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my other remark about LastProbeTime
.
I would recommend splitting the change to actually set conditions to a separate PR, and have this only be the API change to support conditions. |
Sure, do you want something similar for the perma-failed PR? |
Updated to include just the API changes |
Jenkins unit/integration failed for commit 9b21223e8e0b00527af04003cc83132f13110323. Full PR test history. The magic incantation to run this job again is |
Jenkins verification failed for commit 9b21223e8e0b00527af04003cc83132f13110323. Full PR test history. The magic incantation to run this job again is |
Thanks for splitting - I have no objections to adding this field to ReplicaSets but needs a sign off to @kubernetes/api-review-team |
// Status of the condition, one of True, False, Unknown. | ||
Status ConditionStatus `json:"status"` | ||
// The last time the condition transitioned from one status to another. | ||
LastTransitionTime unversioned.Time `json:"lastTransitionTime,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Usually, there's also LastProbeTime unversioned.Time
which is the last time a check was actually performed. See PodCondition or JobCondition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But does LastProbeTime
make sense for the types of conditions we have for replica sets? Both Ready
and ReplicaFailure
conditions are updated only once their status changes. See #19343 (comment) for more info on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That comment and further discussion doesn't convince me to have just single timestamp. Having both seems more flexible, imho.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated with both types of timestamps
@@ -1753,6 +1753,32 @@ type ReplicationControllerStatus struct { | |||
|
|||
// ObservedGeneration is the most recent generation observed by the controller. | |||
ObservedGeneration int64 `json:"observedGeneration,omitempty"` | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my other remark about LastProbeTime
.
const ( | ||
// ReplicationControllerReplicaFailure is added in a replication controller when one of its pods | ||
// fails to be created or deleted. | ||
ReplicationControllerReplicaFailure ReplicationControllerConditionType = "ReplicaFailure" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just one condition? What about running or similar, at least?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can add an additional Ready
condition type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not convinced Ready is meaningful for an RC.
@kubernetes/kube-api ptal |
Jenkins GKE smoke e2e failed for commit 8ff78aa. Full PR test history. The magic incantation to run this job again is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
// ReplicationControllerReplicaFailure is added in a replication controller when one of its pods | ||
// fails to be created or deleted. | ||
ReplicationControllerReplicaFailure ReplicationControllerConditionType = "ReplicaFailure" | ||
// ReplicationControllerReady denotes whether all of the replicas for the controller are ready or not. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the intention that this flips if any replica is not yet ready? That can happen sort of arbitrarily - why is this useful?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this will flip. Actually you got me thinking that this information already exists in the resource printer of a replica set (READY pods). I wanted to make replica sets more usable but this condition doesn't add much. Removing...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree Ready doesn't belong here.
// These are valid conditions of a replication controller. | ||
const ( | ||
// ReplicationControllerReplicaFailure is added in a replication controller when one of its pods | ||
// fails to be created or deleted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we explain how/why this might happen? E.g. is it just bugs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eventually I want to surface ImagePull errors and crashlooping pods under this (or maybe a new one) Condition too.
const ( | ||
// ReplicationControllerReplicaFailure is added in a replication controller when one of its pods | ||
// fails to be created or deleted. | ||
ReplicationControllerReplicaFailure ReplicationControllerConditionType = "ReplicaFailure" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not convinced Ready is meaningful for an RC.
// fails to be created or deleted. | ||
ReplicaSetReplicaFailure ReplicaSetConditionType = "ReplicaFailure" | ||
// ReplicaSetReady denotes whether all of the replicas for the replica set are ready or not. | ||
ReplicaSetReady ReplicaSetConditionType = "Ready" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same questions as RC
@@ -28176,6 +28176,39 @@ | |||
} | |||
} | |||
}, | |||
"v1.ReplicationControllerCondition": { | |||
"description": "ReplicationControllerCondition describes the state of a replication controller at a certain point.", | |||
"required": [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i have been confused about the word Condition
until i found this PR. I was thinking something like a predicate or cirumstance that led to something. Should we rather call it ReplicationControllerState
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, that ship has sailed. The convention is "condition".
noun
1.
the state of something, especially with regard to its appearance, quality, or working order.
"State" is equally ambiguous.
@@ -28232,6 +28265,13 @@ | |||
"type": "integer", | |||
"format": "int32" | |||
}, | |||
"conditions": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
call this states
@kubernetes/sig-apps does anyone else feel confused about the use of the word |
Condition is what we use in this context in the API. It's an array of
conditions on the object. See Pod and Node.
|
"state" implies a state-machine, and this is designed to avoid a We do not, however, seem to have a guideline for whether all conditions On Sun, Oct 9, 2016 at 9:37 AM, Clayton Coleman [email protected]
|
@krmayankk there's #7856 as a background for your doubts. |
This is something I came across both here and in Deployment conditions with ReplicaFailure. Do we want to show it up by default to False and transition to True once we have a ReplicaFailure or do we want it to show up only on failures? Should we update api conventions about what is the right approach to authoring Conditions? |
Left to my own devices I would have probably made it an implies On Mon, Oct 10, 2016 at 3:29 AM, Michail Kargakis [email protected]
|
|
||
// These are valid conditions of a replica set. | ||
const ( | ||
// ReplicaSetReplicaFailure is added in a replication controller when one of its pods fails |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/replication controller/replica set/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
Initially we started with conditions like Ready, Available, Reachable, etc. -- true always meant "known good". False would assumed by default, though we always populated the conditions. However, most node conditions we wanted to report were problems, so "known bad" conditions were introduced. These conditions we didn't always populate. Even in this case, false could be assumed by default, however. ReplicaFailed appears to match this convention. I'm fine with it. I assume that ReplicaFailed would be present when the controller was unable to make current state match desired state. |
ok to test |
@Kargakis Documenting the convention would be useful. |
In the first pass, ReplicaFailure will be added only once a CREATE or DELETE on a pod fails. So the answer is yes. Long-term though I would like to surface conditions like ImagePullBackOff or CrashLoopBackOff which means that the current state (rs.status.replicas) will match the desired state (rs.spec.replicas) but we will still have pod failures. This will be very useful for infant mortality detection: #18568 |
@bgrant0607 @thockin can we move forward with this? I would like to have it in 1.5 |
cc @erictune |
This PR contains only the API changes as per @smarterclayton's request. I will open a follow-up with the implementation as soon as this merges. |
API LGTM |
Current yes, further haven't seen yet. |
@k8s-bot test this [submit-queue is verifying that this PR is safe to merge] |
Automatic merge from submit-queue |
Automatic merge from submit-queue Replica set conditions controller changes Follow-up to #33905, partially addresses #32863. @smarterclayton @soltysh @bgrant0607 @mfojtik I just need to add e2e tests
Partially addresses #32863
@kubernetes/sig-apps
This change is