Update Condition guidance #4521

evankanderson · 2020-02-14T22:47:14Z

Which issue(s) this PR fixes:
Fixes #4471

I took a first stab at updating these conventions based on kubernetes-sigs/cluster-api#1658 (comment) and kubernetes-sigs/gateway-api#47 (comment).

/assign @smarterclayton @liggitt

I'm happy to recommend different top-level conditions type names. Note that I recommended Succeeded over Complete as a top-level condition, because Complete doesn't indicate whether the Job succeeded or failed.

k8s-ci-robot · 2020-02-14T22:47:22Z

Hi @evankanderson. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

youngnick · 2020-02-19T01:20:47Z

When looking at this across a bunch of types, I wondered how we could possibly reconcile the history of Conditions in a way that would be machine-parseable easily. The best I could come up with was recommending that new Conditions add a polarity field (or whatever the name ends up being) that indicates if Status True or False is the healthy condition. If that is assumed to be False when not present, then a library could make a fair guess at what the Status means regardless of when it was created.

In no way intended to block this PR, which I fully support.

mrbobbytables · 2020-02-23T13:57:16Z

/ok-to-test

evankanderson · 2020-02-24T21:31:21Z

(I thought I'd sent an update from mobile while I was on vacation, but I either hallucinated it or technology ate it)

I'd like to defer the discussion of how to migrate existing controllers into the new pattern to a subsequent PR, possibly as a KEP or introduction of common infrastructure for managing Conditions. However, since you brought it up, my preference would be to introduce a severity field which could be used for both ErrorIfTrue and ErrorIfFalse signalling, but also indicate that some Conditions provide information without signalling a fatal error.

danehans · 2020-03-18T23:02:33Z

@evankanderson thank you for putting this PR together. kubernetes-sigs/gateway-api#104 is on hold until this merges. @smarterclayton @liggitt is anything needed to move this PR forward?

therealmitchconnors · 2020-03-24T18:30:58Z

These arguments are compelling, but run opposite of previous guidance used for projects like kstatus. Can we get sig-architecture clarification on whether healthy conditions should be true or false?

erictune · 2020-03-27T14:41:43Z

Big fan of this change.

liggitt · 2020-03-27T20:23:17Z

contributors/devel/sig-architecture/api-conventions.md

 An example of an oscillating condition type is `Ready` (despite it running
 afoul of current guidance), which indicates the object was believed to be fully


if this changes the guidance around positive/negative conditions, we can drop the (despite it running afoul of current guidance) parenthetical

Thanks for catching this, updated.

evankanderson

Can you approve this, or is there someone else I should chase?

evankanderson · 2020-03-27T21:42:09Z

contributors/devel/sig-architecture/api-conventions.md

 An example of an oscillating condition type is `Ready` (despite it running
 afoul of current guidance), which indicates the object was believed to be fully


Thanks for catching this, updated.

youngnick · 2020-03-31T05:08:59Z

@therealmitchconnors This PR is codifying the answers you'll get from people if you ask around into the official docs, so we can all be on the same page from now on. Any stuff that's already done will stay how it is, but when I asked sig-arch and sig-api-machinery, I was told this was the new way. I agree that having someone confirm that here would be ideal though.

liggitt · 2020-04-01T04:30:54Z

cc @deads2k re kubernetes/enhancements#1624

liggitt · 2020-04-16T13:05:08Z

contributors/devel/sig-architecture/api-conventions.md

+* Conditions should have a consistent polarity. **A "positive" polarity where
+  "True" indicates normal operation and "False" indicates a failure is
+  recommended.** (Note that this is an explicit reversal from earlier


Clarify this is a guide for choosing condition names and polarity, not a guide for consuming arbitrary conditions. As written, it sounds like I can write a client that assumes all conditions with a False status represent failure.

I would like to reach that point sometime in the future, but you're correct that we can't do this for a long time, if ever.

Adjusted this to suggest that consistency within a resource was most important consideration, and positive polarity should be chosen if there isn't already a consistent polarity.

liggitt · 2020-04-16T13:07:33Z

contributors/devel/sig-architecture/api-conventions.md

+     may require adjusting condition type names, for example changing the type
+     name from "QuotaExhausted" to "ResourcesAvailable".
+
+   * The `Unknown` status is the default, and typically indicates that


The term default is overloaded... this makes it sound as though a condition that omits the status field has its status defaulted to Unknown, which is not the case. The phrase later is clearer:

The absence of a condition should be interpreted the same as Unknown.

liggitt · 2020-04-17T14:46:26Z

contributors/devel/sig-architecture/api-conventions.md

+* It's helpful to have a common top-level condition which summarizes more
+  detailed conditions. Simple consumers may simply query the top-level
+  condition. The `Ready` and `Succeeded` condition types may be used for
+  long-running and bounded-execution objects, respectively.


this also needs to be clearly oriented at the condition producer; as phrased it could be misunderstood by a simple consumer to mean that they can consistently expect to be able to use Ready / Succeeded conditions.

Again, this is the long-term situation that I'd like to steer the world towards, but you're correct that this could be read by many consumers to demand more from API implementers than is likely in the short term.

Amended, PTAL.

lavalamp · 2020-09-17T21:09:13Z

contributors/devel/sig-architecture/api-conventions.md

+  * For state transitions which take a long period of time, use a condition
+    which indicates that the transition succeeded ("Provisioned" or
+    "SizeAllocated") with an intermediate state of `Unknown`. Note that this
+    pattern makes an explicit distinction between "condition not set" (which
+    means the reconciler hasn't seen the latest state) and "condition set to
+    Unknown" (which means that the reconciler has seen the state but hasn't
+    finished the reconciliation).


I guess your goal is that some watcher can observe progress and know that it's done? You should document how long after the operation the condition should be left there.

But I'm not convinced this is the way to do it, what if there are two resizes back to back? I don't think you can really have conditions 1:1 with operations.

I think "Resizing=true" that is present during the resize and removed after the resize is also a legitimate--and possibly less confusing--way to represent this. Seeing "ResizingSucceeded=Unknown" is disconcerting, even if it's technically accurate.

Maybe the way to strengthen your point is to talk about how the failure of the operation will be represented. I think you can't really do that with a "Reszing" condition, but if you have another way of representing failure, it's OK. If you need a way to represent failure, then "ResizingSucceeded=false" works better.

Actually, my concern is that the watcher can determine whether progress is being made in a generic way. (Particularly in the presence of CRDs and a variety of types, detecting "controller is stuck" vs "backend system is slow" can be quite difficult.)

I'd suggest not removing the condition when the operation is done (hence, Provisioned or SizeAllocated). The goal is twofold:

Presence of the condition along with LastTransitionTime helps to debug whether the controller has seen the most recent spec.

Ability to signal success and failure of these operations in a generic way that will stick with the resource seems useful for the "hey, could you help debug this with me" case -- a resize failed condition that disappears after 2 hours might be gone by the time an expert takes a look at the correct resource in the cluster.

As for the human ergonomics, I was hoping that the Reason and Message fields would suffice:

status: conditions: - name: Resizing status: True Reason: FileSystemExtensionInProgress Message: The system is currently extending the filesystem on the disk after resize.

status: conditions: - name: SizeAvailable status: Unknown Reason: FileSystemExtensionInProgress Message: The system is currently extending the filesystem on the disk after resize.

(Note that I changed this to SizeAvailable because there's probably a distinction between allocation failures and filesystem extension failures.)

Maybe a way to say this is something like "conditions should not be transient, they should more or less always be on the object with the appropriate value".

But this doesn't rule out something like "Resizing", you can just set it False when the object is the correct size. At first it seems like people would need to look at the Reason to know if it's a "good" false (no resize needed) or a "bad" false (resize failed), but in the latter case we can just leave it in "resizing" (Reason is e.g. "stuck", message is "resize operation can't complete due to error foo").

That works for me -- my concern was more specifically that the set of signalled Conditions not be transient, which this addresses. PTAL!

evankanderson · 2020-09-24T05:25:18Z

As a followup to this PR, I notice that Eric's analysis highlights a number of cases where developers are prefixing their Condition types, ala AppDBInstanceReady for a resource named AppDBOperator. Are we tired of swinging at this, or would a subsequent PR along the lines of smarterclayton's comment be a good follow-on?

Conditions summarize characteristics of object lifecycle that must be generic across multiple types, have a need to convey both state and human message, and can be extended by third parties to add more nuance to existing lifecycle.

An object in Kube is a distributed state machine. The state specified to that object should always be fields. Some objects have consistent fields that describe status of the state machine. One of those is observedGeneration. Another one is conditions. Some conditions define generic, multi-object state machine characteristics like Available, Progressing, Ready.

(I don't want to derail this PR with extra load at this point, but it's been bugging me since I came back to this.)

lavalamp · 2020-09-24T16:37:36Z

contributors/devel/sig-architecture/api-conventions.md

-could be `Failed`. A `True` status for `Failed` would imply failure with no
-retry. An object that was still active would generally not have a `Failed`
-condition.
+An example of an oscillating condition type is `Ready`, which indicates the


Resizing is another natural example, but no need to add it here. :)

lavalamp · 2020-09-24T16:38:24Z

/lgtm
/approve

Let's get this in before anyone thinks of another thing to change. There can always be a follow-up!

k8s-ci-robot · 2020-09-24T16:38:39Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: evankanderson, lavalamp

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~contributors/devel/sig-architecture/OWNERS~~ [lavalamp]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

lavalamp · 2020-09-24T16:39:45Z

Are we tired of swinging at this, or would a subsequent PR along the lines of smarterclayton's comment be a good follow-on?

I bet we can do a followup in less time than this PR took :)

thockin

I'm still super torn on this. This PR has been reduced almost beyond arguability, but still...

I don't think we have a clear rule for when some datum wants to be a field or a condition. Half of the examples I see here would, IMO, be better as fields. So why are we suggesting conditions for them?

lavalamp · 2020-09-25T19:01:15Z

I agree with that, but this PR has been pending for more than half a year and is an improvement over the existing text.

dims · 2020-09-25T19:34:39Z

@thockin if you want to throw some outline of what you want to see, i can try to convince @evankanderson to do a follow up :)

evankanderson · 2020-09-25T21:59:41Z

I'd be happy to schedule a chat with whomever is interested to try to work out some better / additional guidance.

I can say that we've found the structure of Conditions (short code plus human-readable text and the tri-state Status) very useful for communicating to humans, so I'd like to preserve that.

dims · 2020-09-26T01:36:10Z

@evankanderson let's line it up for next sig-arch?

thockin · 2020-09-26T04:46:51Z

I mean, I get that field-level metadata is nice, but if that's the case, we should have it on all fields. We shouldn't force things to fit into a string-bool just to get metadata. Or else the metadata is "nice" and not "critical"

…

On Fri, Sep 25, 2020 at 2:59 PM Evan Anderson ***@***.***> wrote: I'd be happy to schedule a chat with whomever is interested to try to work out some better / additional guidance. I can say that we've found the structure of Conditions (short code plus human-readable text and the tri-state Status) very useful for communicating to humans, so I'd like to preserve that. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#4521 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ABKWAVHATNPK5GUGA3GZAOLSHUHF3ANCNFSM4KVRDSEQ> .

lavalamp · 2020-09-28T18:50:05Z

Practical reasons not to apply this to all fields: that go doesn't do templates (yet), and objects would get very large.

One practical reason to keep it a bool is that it's simple to express what you want to watch for with kubectl tools.

My current guidance would be, conditions are good for:

Top-level facts about the object,
Which can change over the lifetime of the object (i.e., could be a valid target to wait for),
And are not overly detailed (i.e., don't make a separate condition for the version of every image cached on the node),
And have a unique owner (i.e., each condition has a single setter, but it's ok to have different setters for different conditions).

I think the "too detailed" and "valid target to wait for" tests are subjective and require some taste.

I think there's a bit of an open question when multiple instances of the same binary want to vote/set a condition; apiserver does this in multiple places and I'm not sure we're super happy about it. @deads2k

Update Condition guidance

3a2c17b

k8s-ci-robot assigned liggitt and smarterclayton Feb 14, 2020

k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Feb 14, 2020

k8s-ci-robot requested review from derekwaynecarr and dims February 14, 2020 22:48

danehans mentioned this pull request Feb 17, 2020

Reverses Gateway Condition Polarity kubernetes-sigs/gateway-api#104

Closed

k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Feb 23, 2020

liggitt reviewed Mar 27, 2020

View reviewed changes

evankanderson added 2 commits March 27, 2020 14:38

Merge remote-tracking branch 'upstream/master' into unconventional

f0fdf1c

Drop positive-polarity exception for Ready

5cab966

evankanderson commented Mar 27, 2020

View reviewed changes

mortent mentioned this pull request Apr 10, 2020

Use Ready condition for computing status if it is available on a resource kubernetes-sigs/cli-utils#109

Merged

liggitt reviewed Apr 16, 2020

View reviewed changes

liggitt reviewed Apr 17, 2020

View reviewed changes

Add guidance on conditions for long-running reconciliations

dd082f4

lavalamp reviewed Sep 17, 2020

View reviewed changes

justinsb mentioned this pull request Sep 22, 2020

Refactors code to set and get status from object kubernetes-sigs/kubebuilder-declarative-pattern#118

Merged

This was referenced Sep 23, 2020

Should we deprecate status.conditions? kubernetes/kubernetes#50798

Closed

provide recommended .status.conditions schema kubernetes/enhancements#1624

Merged

Adjust with suggested language from lavalamp

0a9d1eb

lavalamp reviewed Sep 24, 2020

View reviewed changes

k8s-ci-robot assigned lavalamp Sep 24, 2020

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 24, 2020

k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 24, 2020

k8s-ci-robot merged commit 04531ba into kubernetes:master Sep 24, 2020

thockin reviewed Sep 25, 2020

View reviewed changes

github-actions bot mentioned this pull request Sep 29, 2020

Week Ending September 27, 2020 dev-obs/actus#234

Open

lavalamp mentioned this pull request May 27, 2021

Eliminate Phase and simplify Conditions kubernetes/kubernetes#7856

Closed

Nokel81 mentioned this pull request Aug 3, 2021

Fix custom resources not having a fallback details component lensapp/lens#3542

Merged

akutz mentioned this pull request Aug 18, 2021

Add a DeployedCondition for the VirtualMachine object vmware-archive/vm-operator-api#82

Closed

displague mentioned this pull request Oct 8, 2021

Deprecate the 'Synced' status condition crossplane/crossplane-runtime#198

Closed

metacosm mentioned this pull request Nov 4, 2021

Readiness doesn't account for metadata.observedGeneration fabric8io/kubernetes-client#3474

Closed

keithmattix mentioned this pull request Sep 20, 2022

Add GEP-1364 proposal kubernetes-sigs/gateway-api#1383

Merged

joshri mentioned this pull request Oct 5, 2022

Handle conditions without a type of "Ready" weaveworks/weave-gitops#2837

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Update Condition guidance #4521

Update Condition guidance #4521

evankanderson commented Feb 14, 2020

k8s-ci-robot commented Feb 14, 2020

youngnick commented Feb 19, 2020

mrbobbytables commented Feb 23, 2020

evankanderson commented Feb 24, 2020 •

edited

Loading

danehans commented Mar 18, 2020

therealmitchconnors commented Mar 24, 2020

erictune commented Mar 27, 2020

liggitt Mar 27, 2020

evankanderson Mar 27, 2020

evankanderson left a comment

evankanderson Mar 27, 2020

youngnick commented Mar 31, 2020

liggitt commented Apr 1, 2020

liggitt Apr 16, 2020

evankanderson Apr 17, 2020

liggitt Apr 16, 2020

liggitt Apr 17, 2020

evankanderson Apr 17, 2020

lavalamp Sep 17, 2020

evankanderson Sep 23, 2020

lavalamp Sep 23, 2020

evankanderson Sep 24, 2020

evankanderson commented Sep 24, 2020

lavalamp Sep 24, 2020

lavalamp commented Sep 24, 2020

k8s-ci-robot commented Sep 24, 2020

lavalamp commented Sep 24, 2020

thockin left a comment

lavalamp commented Sep 25, 2020

dims commented Sep 25, 2020

evankanderson commented Sep 25, 2020

dims commented Sep 26, 2020

thockin commented Sep 26, 2020 via email

lavalamp commented Sep 28, 2020

		An example of an oscillating condition type is `Ready` (despite it running
		afoul of current guidance), which indicates the object was believed to be fully

Update Condition guidance #4521

Update Condition guidance #4521

Conversation

evankanderson commented Feb 14, 2020

k8s-ci-robot commented Feb 14, 2020

youngnick commented Feb 19, 2020

mrbobbytables commented Feb 23, 2020

evankanderson commented Feb 24, 2020 • edited Loading

danehans commented Mar 18, 2020

therealmitchconnors commented Mar 24, 2020

erictune commented Mar 27, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

evankanderson left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

youngnick commented Mar 31, 2020

liggitt commented Apr 1, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

evankanderson commented Sep 24, 2020

Choose a reason for hiding this comment

lavalamp commented Sep 24, 2020

k8s-ci-robot commented Sep 24, 2020

lavalamp commented Sep 24, 2020

thockin left a comment

Choose a reason for hiding this comment

lavalamp commented Sep 25, 2020

dims commented Sep 25, 2020

evankanderson commented Sep 25, 2020

dims commented Sep 26, 2020

thockin commented Sep 26, 2020 via email

lavalamp commented Sep 28, 2020

evankanderson commented Feb 24, 2020 •

edited

Loading