-
Notifications
You must be signed in to change notification settings - Fork 5.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update Condition guidance #4521
Changes from all commits
3a2c17b
f0fdf1c
5cab966
511866a
49d3a5a
5f972cf
305d8d8
dd082f4
0a9d1eb
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | |||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
@@ -308,17 +308,77 @@ response reduces the complexity of these clients. | ||||||||||||||||||||||
|
|||||||||||||||||||||||
##### Typical status properties | |||||||||||||||||||||||
|
|||||||||||||||||||||||
**Conditions** represent the latest available observations of an object's | |||||||||||||||||||||||
state. They are an extension mechanism intended to be used when the details of | |||||||||||||||||||||||
an observation are not a priori known or would not apply to all instances of a | |||||||||||||||||||||||
given Kind. For observations that are well known and apply to all instances, a | |||||||||||||||||||||||
regular field is preferred. An example of a Condition that probably should | |||||||||||||||||||||||
have been a regular field is Pod's "Ready" condition - it is managed by core | |||||||||||||||||||||||
controllers, it is well understood, and it applies to all Pods. | |||||||||||||||||||||||
**Conditions** provide a standard mechanism for higher-level status reporting | |||||||||||||||||||||||
from a controller. They are an extension mechanism which allows tools and other | |||||||||||||||||||||||
controllers to collect summary information about resources without needing to | |||||||||||||||||||||||
understand resource-specific status details. Conditions should complement more | |||||||||||||||||||||||
detailed information about the observed status of an object written by a | |||||||||||||||||||||||
controller, rather than replace it. For example, the "Available" condition of a | |||||||||||||||||||||||
Deployment can be determined by examining `readyReplicas`, `replicas`, and | |||||||||||||||||||||||
other properties of the Deployment. However, the "Available" condition allows | |||||||||||||||||||||||
other components to avoid duplicating the availability logic in the Deployment | |||||||||||||||||||||||
controller. | |||||||||||||||||||||||
|
|||||||||||||||||||||||
Objects may report multiple conditions, and new types of conditions may be | |||||||||||||||||||||||
added in the future or by 3rd party controllers. Therefore, conditions are | |||||||||||||||||||||||
represented using a list/slice, where all have similar structure. | |||||||||||||||||||||||
represented using a list/slice of objects, where each condition has a similar | |||||||||||||||||||||||
structure. This collection should be treated as a map with a key of `type`. | |||||||||||||||||||||||
|
|||||||||||||||||||||||
Conditions are most useful when they follow some consistent conventions: | |||||||||||||||||||||||
|
|||||||||||||||||||||||
* Conditions should be added to explicitly convey properties that users and | |||||||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't really agree. I think there's room in the world for two kinds of conditions:
This assumes some sort of bridge controller which understands all the reportables and can summarize them into the consumables. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If we allow Condition types to be namespaced (Tim's idea), it would be possible to summarize all namespaced conditions with a consistent polarity using a "bridge controller". Existing Conditions would need some sort of annotation to indicate their expected polarity, since they are generally mixed. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think there's two ideas here?
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm lost on this thread - I don't know what a "bridge controller" is for? Regarding polarity, we could have an indicator, as Daniel suggests, so it is self-describing. But "good/bad" is a pretty vague proclamation. And we'd have to make it optional ANYWAY, so ... If the goal is to have machines understand arbitrary conditions, we need an indicator of normal or abnormal polarity. Is that the goal? Or do we expect that most consumers of conditions are either a) humans who can read the name and infer polarity; b) programs coded by humans to look for specific conditions with a priori known polarity? We can't simply say that Conditions are "summarizations" of other info, because they are ALSO the only extension point for external status. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'd like to reach the point where machines can manage arbitrary Conditions. Evolving that from the current state may require additional indicator fields, depending on implementation strategy. If there are existing status contributors with mixed polarity within an object, we'll definitely need a "Polarity" field. If we perform the status aggregation in the "main" controller for the object, we can do this object-by-object and might be able to avoid Polarity. I interpreted @lavalamp 's idea of a "bridge controller" as a single controller which could operate across all Kinds and perform this aggregation based on mechanical rules. Doing this aggregation within the object's status avoids duplicating the aggregation logic across many clients; doing this in a "bridge controller" avoids duplicating the aggregation in many controllers (which may or may not be as much of an issue). I'm not sure how many aggregated conditions are needed (vs creating additional objects for "consumable" measurements that are composite). We got away with one in Knative, but singletons always come back to greet you. The advantage of codifying the condition aggregation of "aspect of" in a controller is that it allows new controllers to contribute observations without needing to modify the logic of the main controller to recognize the new conditions. Playing with an experiment we did in Knative, it would be possible to add "Severity" and "Polarity" fields:
🗡️ - For legacy conditions, we might want to prioritize updating those that should be treated as "Error" to set a polarity, then those that should be prioritized as "Warning". If we proceed object-at-a-time, the Polarity field might be able to be dropped, but this would probably require a certain amount of explicit handling of legacy conditions. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm not inherently against more self-describing, extensible status, but tat's a significant change from Conditions today. Ready is a condition of its own and we don't have "virtual" conditions. So clients would need to be taught this new structure, which is obviously backwards-incompatible. CF the design around Pod There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In other words, if you want something like this, I think it needs to be totally additive (a new concept) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can we update the API conventions to reflect current actual usage (encompassing core and out of core) and then suggest new usage in a separate thread? I think this PR does a good job of the former. |
|||||||||||||||||||||||
components care about rather than requiring those properties to be inferred | |||||||||||||||||||||||
from other observations. Once defined, the meaning of a Condition can not be | |||||||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe worth expanding on "meaning"? Specifically roll-up sorts of signals are valuable because consumers do not have to do look at N raw signals and make the same inference AND because that set of N raw signals can evolve into M raw signals, and the roll-up is handled automatically. See for example pod readiness gates - the raw signals got more complex, but consumers only had to worry about the "Ready" condition. In that case the internal composition of "Ready" changed, but the external semantic did not. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We can't really enforce this in code, so users can't really rely on this. I'd therefore phrase it as a recommendation rather than requirement. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think "meaning" here is the "intent" of the Condition -- it may become possible to more accurately measure the intent. For example, adding "is there enough disk space" to the set of conditions that indicate "this Node is able to accept work" or "the Pods on this Node are likely to function properly" -- but it would not be acceptable to swap "able to accept work" intent for an "Pods function properly" intent; the latter would need to be a new Condition. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. My original comment was that the text could be clearer about what "meaning" means - how abstract that is vs how it is literally derived. But I think it also unclear if you mean that conditions should exclusively be used as "summaries" of other information or that "summaries" are one of several reasonable uses of conditions? And I am still looking for answers to my question on principle. Why is a "ready" pronouncement better a a condition than as a field? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. My read is that conditions have consistent metadata attached to them (at least as of #1624) as to when and why they last transitioned states, including now the |
|||||||||||||||||||||||
changed arbitrarily - it becomes part of the API, and has the same backwards- | |||||||||||||||||||||||
and forwards-compatibility concerns of any other part of the API. | |||||||||||||||||||||||
|
|||||||||||||||||||||||
* Controllers should apply their conditions to a resource the first time they | |||||||||||||||||||||||
visit the resource, even if the `status` is Unknown. This allows other | |||||||||||||||||||||||
components in the system to know that the condition exists and the controller | |||||||||||||||||||||||
is making progress on reconciling that resource. | |||||||||||||||||||||||
|
|||||||||||||||||||||||
* Not all controllers will observe the previous advice about reporting | |||||||||||||||||||||||
"Unknown" or "False" values. For known conditions, the absence of a | |||||||||||||||||||||||
condition status should be interpreted the same as `Unknown`, and | |||||||||||||||||||||||
typically indicates that reconciliation has not yet finished (or that the | |||||||||||||||||||||||
resource state may not yet be observable). | |||||||||||||||||||||||
|
|||||||||||||||||||||||
* For some conditions, `True` represents normal operation, and for some | |||||||||||||||||||||||
conditions, `False` represents normal operation. ("Normal-true" conditions | |||||||||||||||||||||||
are sometimes said to have "positive polarity", and "normal-false" conditions | |||||||||||||||||||||||
are said to have "negative polarity".) Without further knowledge of the | |||||||||||||||||||||||
conditions, it is not possible to compute a generic summary of the conditions | |||||||||||||||||||||||
on a resource. | |||||||||||||||||||||||
|
|||||||||||||||||||||||
* Condition type names should make sense for humans; neither positive nor | |||||||||||||||||||||||
negative polarity can be recommended as a general rule. A negative condition | |||||||||||||||||||||||
like "MemoryExhausted" may be easier for humans to understand than | |||||||||||||||||||||||
"SufficientMemory". Conversely, "Ready" or "Succeeded" may be easier to | |||||||||||||||||||||||
understand than "Failed", because "Failed=Unknown" or "Failed=False" may | |||||||||||||||||||||||
cause double-negative confusion. | |||||||||||||||||||||||
|
|||||||||||||||||||||||
* Condition type names should describe the current observed state of the | |||||||||||||||||||||||
resource, rather than describing the current state transitions. This | |||||||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sufficiently slow transitions are best thought of as states in and of themselves, so I'm not convinced about this paragraph. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm ready to sign off on the other changes, I'm on the fence about whether to ask for a change here or not. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I can update to mention slow transitions as states (LetsEncrypt cert provisioning strikes me as an example) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, this paragraph was the only one I bumped on. I'm thinking of conditions like There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I guess? I don't really see a distinction between There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Actually, it seems that
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I think
(original) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Added a new paragraph suggesting |
|||||||||||||||||||||||
typically means that the name should be an adjective ("Ready", "OutOfDisk") | |||||||||||||||||||||||
or a past-tense verb ("Succeeded", "Failed") rather than a present-tense verb | |||||||||||||||||||||||
("Deploying"). Intermediate states may be indicated by setting the status of | |||||||||||||||||||||||
the condition to `Unknown`. | |||||||||||||||||||||||
|
|||||||||||||||||||||||
* For state transitions which take a long period of time (rule of thumb: > 1 | |||||||||||||||||||||||
minute), it is reasonable to treat the transition itself as an observed | |||||||||||||||||||||||
state. In these cases, the Condition (such as "Resizing") itself should not | |||||||||||||||||||||||
be transient, and should instead be signalled using the | |||||||||||||||||||||||
`True`/`False`/`Unknown` pattern. This allows other observers to determine | |||||||||||||||||||||||
the last update from the controller, whether successful or failed. In cases | |||||||||||||||||||||||
where the state transition is unable to complete and continued | |||||||||||||||||||||||
reconciliation is not feasible, the Reason and Message should be used to | |||||||||||||||||||||||
indicate that the transition failed. | |||||||||||||||||||||||
|
|||||||||||||||||||||||
* When designing Conditions for a resource, it's helpful to have a common | |||||||||||||||||||||||
top-level condition which summarizes more detailed conditions. Simple | |||||||||||||||||||||||
consumers may simply query the top-level condition. Although they are not a | |||||||||||||||||||||||
consistent standard, the `Ready` and `Succeeded` condition types may be used | |||||||||||||||||||||||
by API designers for long-running and bounded-execution objects, respectively. | |||||||||||||||||||||||
|
|||||||||||||||||||||||
The `FooCondition` type for some resource type `Foo` may include a subset of the | |||||||||||||||||||||||
following fields, but must contain at least `type` and `status` fields: | |||||||||||||||||||||||
|
@@ -347,20 +407,13 @@ Use of the `Reason` field is encouraged. | ||||||||||||||||||||||
Use the `LastHeartbeatTime` with great caution - frequent changes to this field | |||||||||||||||||||||||
can cause a large fan-out effect for some resources. | |||||||||||||||||||||||
|
|||||||||||||||||||||||
Conditions should be added to explicitly convey properties that users and | |||||||||||||||||||||||
components care about rather than requiring those properties to be inferred from | |||||||||||||||||||||||
other observations. Once defined, the meaning of a Condition can not be | |||||||||||||||||||||||
changed arbitrarily - it becomes part of the API, and has the same backwards- | |||||||||||||||||||||||
and forwards-compatibility concerns of any other part of the API. | |||||||||||||||||||||||
Condition types should be named in PascalCase. Short condition names are | |||||||||||||||||||||||
preferred (e.g. "Ready" over "MyResourceReady"). | |||||||||||||||||||||||
|
|||||||||||||||||||||||
Condition status values may be `True`, `False`, or `Unknown`. The absence of a | |||||||||||||||||||||||
condition should be interpreted the same as `Unknown`. How controllers handle | |||||||||||||||||||||||
`Unknown` depends on the Condition in question. | |||||||||||||||||||||||
|
|||||||||||||||||||||||
Condition types should indicate state in the "abnormal-true" polarity. For | |||||||||||||||||||||||
example, if the condition indicates when a policy is invalid, the "is valid" | |||||||||||||||||||||||
case is probably the norm, so the condition should be called "Invalid". | |||||||||||||||||||||||
|
|||||||||||||||||||||||
The thinking around conditions has evolved over time, so there are several | |||||||||||||||||||||||
non-normative examples in wide use. | |||||||||||||||||||||||
|
|||||||||||||||||||||||
|
@@ -371,12 +424,12 @@ we define comprehensive state machines for objects, nor behaviors associated | ||||||||||||||||||||||
with state transitions. The system is level-based rather than edge-triggered, | |||||||||||||||||||||||
and should assume an Open World. | |||||||||||||||||||||||
|
|||||||||||||||||||||||
An example of an oscillating condition type is `Ready` (despite it running | |||||||||||||||||||||||
afoul of current guidance), which indicates the object was believed to be fully | |||||||||||||||||||||||
operational at the time it was last probed. A possible monotonic condition | |||||||||||||||||||||||
could be `Failed`. A `True` status for `Failed` would imply failure with no | |||||||||||||||||||||||
retry. An object that was still active would generally not have a `Failed` | |||||||||||||||||||||||
condition. | |||||||||||||||||||||||
An example of an oscillating condition type is `Ready`, which indicates the | |||||||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
|
|||||||||||||||||||||||
object was believed to be fully operational at the time it was last probed. A | |||||||||||||||||||||||
possible monotonic condition could be `Succeeded`. A `True` status for | |||||||||||||||||||||||
`Succeeded` would imply completion and that the resource was no longer | |||||||||||||||||||||||
active. An object that was still active would generally have a `Succeeded` | |||||||||||||||||||||||
condition with status `Unknown`. | |||||||||||||||||||||||
|
|||||||||||||||||||||||
Some resources in the v1 API contain fields called **`phase`**, and associated | |||||||||||||||||||||||
`message`, `reason`, and other status fields. The pattern of using `phase` is | |||||||||||||||||||||||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this really inverts the previous guidance, which was sort of agonized over (and with which I happen to agree). Available (or Ready in the previous edit) are omnipresent and (I think?) are never "unknown".
Why are those correct as Conditions rather than fields? What principal guides one to decide between those?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I have only one argument to make - it's this one. As written, many status fields should just become conditions, and I don't think that's right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we were going to have a distinct subresource for conditions, which we could RBAC distinctly, then I could see an argument forming around it. That doesn't answer the question for "core" conditions like ready.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with Tim.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at Deployment, the current set of Conditions are a bit subtle for a newcomer:
Oddly, a deployment which does not have any changes in flight is "Progressing=True", which suggests a positive polarity, but maybe a poor name choice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd need to look at those more carefully before I could agree that Deployment's conditions are a good example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
None of this speaks to the question of principle - what is the litmus test that helps me decide - field or condition?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd propose the following guidance:
Expose the underlying measurements used to make decisions as fields.
If the outcome of that decision is interesting to a human (roughly: does it affect the behavior or functionality of the object in an externally observable way), provide a single Condition which summarizes whether that aspect of the controller algorithm is operating correctly ("within the envelope").
If a controller is adding status to an object and cannot modify the object to set fields, it must summarize the decision and record it in a Condition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Conditions summarize characteristics of object lifecycle that must be generic across multiple types, have a need to convey both state and human message, and can be extended by third parties to add more nuance to existing lifecycle.
An object in Kube is a distributed state machine. The state specified to that object should always be fields. Some objects have consistent fields that describe status of the state machine. One of those is observedGeneration. Another one is conditions. Some conditions define generic, multi-object state machine characteristics like Available, Progressing, Ready.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The part about "must be generic across multiple types" may be true for core kubernetes types, but it does not appear to be true in the general cases of non-core types.
Since people use these guidelines for non-core types, I think we need to acknowledge how they are used outside of core.
I've looked at 364 different .go files and found that most conditions are only used in a single file. By inspection it appears that many of those are specific to a single kind.
Here is a list:
https://gist.github.com/erictune/47d5bfb0fd0c1a7273ecaa5a9fcf5d52#file-topconditions-txt
Details on the method at https://gist.github.com/erictune/47d5bfb0fd0c1a7273ecaa5a9fcf5d52