-
Notifications
You must be signed in to change notification settings - Fork 5.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update Condition guidance #4521
Conversation
Hi @evankanderson. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
When looking at this across a bunch of types, I wondered how we could possibly reconcile the history of Conditions in a way that would be machine-parseable easily. The best I could come up with was recommending that new Conditions add a In no way intended to block this PR, which I fully support. |
/ok-to-test |
(I thought I'd sent an update from mobile while I was on vacation, but I either hallucinated it or technology ate it) I'd like to defer the discussion of how to migrate existing controllers into the new pattern to a subsequent PR, possibly as a KEP or introduction of common infrastructure for managing Conditions. However, since you brought it up, my preference would be to introduce a |
@evankanderson thank you for putting this PR together. kubernetes-sigs/gateway-api#104 is on hold until this merges. @smarterclayton @liggitt is anything needed to move this PR forward? |
These arguments are compelling, but run opposite of previous guidance used for projects like kstatus. Can we get sig-architecture clarification on whether healthy conditions should be true or false? |
Big fan of this change. |
An example of an oscillating condition type is `Ready` (despite it running | ||
afoul of current guidance), which indicates the object was believed to be fully |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if this changes the guidance around positive/negative conditions, we can drop the (despite it running afoul of current guidance)
parenthetical
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for catching this, updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you approve this, or is there someone else I should chase?
An example of an oscillating condition type is `Ready` (despite it running | ||
afoul of current guidance), which indicates the object was believed to be fully |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for catching this, updated.
@therealmitchconnors This PR is codifying the answers you'll get from people if you ask around into the official docs, so we can all be on the same page from now on. Any stuff that's already done will stay how it is, but when I asked sig-arch and sig-api-machinery, I was told this was the new way. I agree that having someone confirm that here would be ideal though. |
* Conditions should have a consistent polarity. **A "positive" polarity where | ||
"True" indicates normal operation and "False" indicates a failure is | ||
recommended.** (Note that this is an explicit reversal from earlier |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clarify this is a guide for choosing condition names and polarity, not a guide for consuming arbitrary conditions. As written, it sounds like I can write a client that assumes all conditions with a False
status represent failure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to reach that point sometime in the future, but you're correct that we can't do this for a long time, if ever.
Adjusted this to suggest that consistency within a resource was most important consideration, and positive polarity should be chosen if there isn't already a consistent polarity.
may require adjusting condition type names, for example changing the type | ||
name from "QuotaExhausted" to "ResourcesAvailable". | ||
|
||
* The `Unknown` status is the default, and typically indicates that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The term default
is overloaded... this makes it sound as though a condition that omits the status field has its status defaulted to Unknown
, which is not the case. The phrase later is clearer:
The absence of a condition should be interpreted the same as
Unknown
.
* It's helpful to have a common top-level condition which summarizes more | ||
detailed conditions. Simple consumers may simply query the top-level | ||
condition. The `Ready` and `Succeeded` condition types may be used for | ||
long-running and bounded-execution objects, respectively. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this also needs to be clearly oriented at the condition producer; as phrased it could be misunderstood by a simple consumer to mean that they can consistently expect to be able to use Ready
/ Succeeded
conditions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, this is the long-term situation that I'd like to steer the world towards, but you're correct that this could be read by many consumers to demand more from API implementers than is likely in the short term.
Amended, PTAL.
* For state transitions which take a long period of time, use a condition | ||
which indicates that the transition succeeded ("Provisioned" or | ||
"SizeAllocated") with an intermediate state of `Unknown`. Note that this | ||
pattern makes an explicit distinction between "condition not set" (which | ||
means the reconciler hasn't seen the latest state) and "condition set to | ||
Unknown" (which means that the reconciler has seen the state but hasn't | ||
finished the reconciliation). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess your goal is that some watcher can observe progress and know that it's done? You should document how long after the operation the condition should be left there.
But I'm not convinced this is the way to do it, what if there are two resizes back to back? I don't think you can really have conditions 1:1 with operations.
I think "Resizing=true" that is present during the resize and removed after the resize is also a legitimate--and possibly less confusing--way to represent this. Seeing "ResizingSucceeded=Unknown" is disconcerting, even if it's technically accurate.
Maybe the way to strengthen your point is to talk about how the failure of the operation will be represented. I think you can't really do that with a "Reszing" condition, but if you have another way of representing failure, it's OK. If you need a way to represent failure, then "ResizingSucceeded=false" works better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, my concern is that the watcher can determine whether progress is being made in a generic way. (Particularly in the presence of CRDs and a variety of types, detecting "controller is stuck" vs "backend system is slow" can be quite difficult.)
I'd suggest not removing the condition when the operation is done (hence, Provisioned
or SizeAllocated
). The goal is twofold:
-
Presence of the condition along with
LastTransitionTime
helps to debug whether the controller has seen the most recentspec
. -
Ability to signal success and failure of these operations in a generic way that will stick with the resource seems useful for the "hey, could you help debug this with me" case -- a resize failed condition that disappears after 2 hours might be gone by the time an expert takes a look at the correct resource in the cluster.
As for the human ergonomics, I was hoping that the Reason
and Message
fields would suffice:
status:
conditions:
- name: Resizing
status: True
Reason: FileSystemExtensionInProgress
Message: The system is currently extending the filesystem on the disk after resize.
status:
conditions:
- name: SizeAvailable
status: Unknown
Reason: FileSystemExtensionInProgress
Message: The system is currently extending the filesystem on the disk after resize.
(Note that I changed this to SizeAvailable
because there's probably a distinction between allocation failures and filesystem extension failures.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe a way to say this is something like "conditions should not be transient, they should more or less always be on the object with the appropriate value".
But this doesn't rule out something like "Resizing", you can just set it False when the object is the correct size. At first it seems like people would need to look at the Reason to know if it's a "good" false (no resize needed) or a "bad" false (resize failed), but in the latter case we can just leave it in "resizing" (Reason is e.g. "stuck", message is "resize operation can't complete due to error foo").
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That works for me -- my concern was more specifically that the set of signalled Conditions not be transient, which this addresses. PTAL!
As a followup to this PR, I notice that Eric's analysis highlights a number of cases where developers are prefixing their Condition types, ala
(I don't want to derail this PR with extra load at this point, but it's been bugging me since I came back to this.) |
could be `Failed`. A `True` status for `Failed` would imply failure with no | ||
retry. An object that was still active would generally not have a `Failed` | ||
condition. | ||
An example of an oscillating condition type is `Ready`, which indicates the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resizing
is another natural example, but no need to add it here. :)
/lgtm Let's get this in before anyone thinks of another thing to change. There can always be a follow-up! |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: evankanderson, lavalamp The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
I bet we can do a followup in less time than this PR took :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still super torn on this. This PR has been reduced almost beyond arguability, but still...
I don't think we have a clear rule for when some datum wants to be a field or a condition. Half of the examples I see here would, IMO, be better as fields. So why are we suggesting conditions for them?
I agree with that, but this PR has been pending for more than half a year and is an improvement over the existing text. |
@thockin if you want to throw some outline of what you want to see, i can try to convince @evankanderson to do a follow up :) |
I'd be happy to schedule a chat with whomever is interested to try to work out some better / additional guidance. I can say that we've found the structure of Conditions (short code plus human-readable text and the tri-state Status) very useful for communicating to humans, so I'd like to preserve that. |
@evankanderson let's line it up for next sig-arch? |
I mean, I get that field-level metadata is nice, but if that's the case, we
should have it on all fields. We shouldn't force things to fit into a
string-bool just to get metadata. Or else the metadata is "nice" and not
"critical"
…On Fri, Sep 25, 2020 at 2:59 PM Evan Anderson ***@***.***> wrote:
I'd be happy to schedule a chat with whomever is interested to try to work
out some better / additional guidance.
I can say that we've found the structure of Conditions (short code plus
human-readable text and the tri-state Status) very useful for communicating
to humans, so I'd like to preserve that.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#4521 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABKWAVHATNPK5GUGA3GZAOLSHUHF3ANCNFSM4KVRDSEQ>
.
|
Practical reasons not to apply this to all fields: that go doesn't do templates (yet), and objects would get very large. One practical reason to keep it a bool is that it's simple to express what you want to watch for with kubectl tools. My current guidance would be, conditions are good for:
I think the "too detailed" and "valid target to wait for" tests are subjective and require some taste. I think there's a bit of an open question when multiple instances of the same binary want to vote/set a condition; apiserver does this in multiple places and I'm not sure we're super happy about it. @deads2k |
Which issue(s) this PR fixes:
Fixes #4471
I took a first stab at updating these conventions based on kubernetes-sigs/cluster-api#1658 (comment) and kubernetes-sigs/gateway-api#47 (comment).
/assign @smarterclayton @liggitt
I'm happy to recommend different top-level conditions type names. Note that I recommended
Succeeded
overComplete
as a top-level condition, becauseComplete
doesn't indicate whether the Job succeeded or failed.