Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

provide recommended .status.conditions schema #1624

Merged
merged 3 commits into from
May 8, 2020
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
162 changes: 162 additions & 0 deletions keps/sig-api-machinery/1623-standardize-conditions/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,162 @@
# KEP-1623: Standardize Conditions.

<!-- toc -->
- [Release Signoff Checklist](#release-signoff-checklist)
- [Summary](#summary)
- [Motivation](#motivation)
- [Goals](#goals)
- [Non-Goals](#non-goals)
- [Proposal](#proposal)
- [Noteworthy choices](#noteworthy-choices)
- [Graduation Criteria](#graduation-criteria)
- [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
- [Version Skew Strategy](#version-skew-strategy)
- [Implementation History](#implementation-history)
- [Drawbacks](#drawbacks)
- [Alternatives](#alternatives)
<!-- /toc -->

## Release Signoff Checklist

<!--
**ACTION REQUIRED:** In order to merge code into a release, there must be an
issue in [kubernetes/enhancements] referencing this KEP and targeting a release
milestone **before the [Enhancement Freeze](https://git.k8s.io/sig-release/releases)
of the targeted release**.

For enhancements that make changes to code or processes/procedures in core
Kubernetes i.e., [kubernetes/kubernetes], we require the following Release
Signoff checklist to be completed.

Check these off as they are completed for the Release Team to track. These
checklist items _must_ be updated for the enhancement to be released.
-->

- [ ] Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
- [ ] KEP approvers have approved the KEP status as `implementable`
- [ ] Design details are appropriately documented
- [ ] Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
- [ ] Graduation criteria is in place
- [ ] "Implementation History" section is up-to-date for milestone
- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
- [ ] Supporting documentation e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes

<!--
**Note:** This checklist is iterative and should be reviewed and updated every time this enhancement is being considered for a milestone.
-->

[kubernetes.io]: https://kubernetes.io/
[kubernetes/enhancements]: https://git.k8s.io/enhancements
[kubernetes/kubernetes]: https://git.k8s.io/kubernetes
[kubernetes/website]: https://git.k8s.io/website

## Summary

While many Kuberentes APIs have `.status.conditions`, the schema of `condition` varies a lot between them.
There is very little commonality at the level of serialization, proto-encoding, and required vs optional.
Conditions are central enough to the API to make a common golang type with a fixed schema.
The schema can be a strong recommendation to all API authors.

## Motivation

Allow general consumers to expect a common schema for `.status.conditions` and share golang logic for common Get, Set, Is for `.status.conditions`.
The pattern is well-established and we have a good sense of the schema we now want.

### Goals

1. For all new APIs, have a common type for `.status.conditions`.
2. Provide common utility methods for `HasCondition`, `IsConditionTrue`, `SetCondition`, etc.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very excited to see this proposal!

As the list of the common utility methods is still in discussion, wanted to list a couple of condition util methods we found to be very frequently used at Rancher - callbacks DoUntilTrue/Once/Do and ReasonAndMessageFromError (method setting condition fields from error obj): https://github.com/rancher/norman/blob/146f45dafa623a34864cbeeb3a39d5903daa2995/condition/condition.go#L131

3. Provide recommended defaulting functions that set required fields and can be embedded into conversion/default functions.

### Non-Goals

1. Update all existing APIs to make use of the new condition type.

## Proposal

Introduce a type into k8s.io/apimachinery/pkg/apis/meta/v1 for `Condition` that looks like
```go
type Condition struct {
// Type of condition in CamelCase or in foo.example.com/CamelCase.
// Many .condition.type values are consistent across resources like Available, but because arbitrary conditions can be
liggitt marked this conversation as resolved.
Show resolved Hide resolved
// useful (see .node.status.conditions), the ability to deconflict is important.
// +required
Type string `json:"type" protobuf:"bytes,1,opt,name=type"`
// Status of the condition, one of True, False, Unknown.
// +required
Status ConditionStatus `json:"status" protobuf:"bytes,2,opt,name=status"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're normalizing, should we consider bigger changes - like making this an optional bool?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're normalizing, should we consider bigger changes - like making this an optional bool?

I think we've been happy with this as an enumerated string to cover Unknown as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Who is happy? All I see are calls to ParseBool. "Unknown" as a value is just weird because we also have to handle a non-specified condition, which is truly unknown.

TL;DR I am not super happy with Conditions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You called this out as unresolved.

I guess I'm arguing for a larger, deeper, not really compatible change.

I think the acknowledgment of unknown has worked well. It gives an option for acknowledging awareness of a condition without the ability to inspect the current value. For instance, an inability to reach an external system results in an Unknown status for the "is-it-installed" condition. You cannot do that with the absence of a condition. I think the concern about ParseBool is well handled by an IsConditionTrue and IsConditionFalse which comments here indicate are commonly used.

If we were rebooting Condition, I would argue harder for *bool as the totally unambiguous representation. Given that you're not rebooting, just normalizing, I will waive my objection. I just wanted it on the record, I guess. :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that everyone is already using "Unknown" as a value, I think we have to keep it. :/

Documentation should make clear that "Unknown" is a fact about the writer of the condition, and not a claim about the object.

Good: {Ready, Unknown} to mean: "it might or might not be ready, we don't know."

Good: {IPAssigned, Unknown} to mean: "it might or might not have gotten an IP address, we don't know."

Very bad: {IPAssigned, Unknown} to mean: "it has an IP address, but we don't know what it is."

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

But also <no key "IPAssigned"> is equivalent to {IPAssigned, Unknown} for most intents.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{IPAssigned, Unknown} gives a hint to other participants that the "IPAssigned" condition may still be reconciled, while <no key "IPAssigned"> makes it more difficult for other participants to know whether IP assignment is applicable in this case. But this is mostly about the case for *bool vs bool, rather than the string enum that we have vs what we wish.

Would it be reasonable to fill in status: "" as status: "Unknown" somewhere in the apiserver side for Conditions (read side or write side)? My gut says "no", but my heart says "please".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think status: "" should be canonicalized as status: "Unknown".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Default to", or just "assumed to mean"?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be reasonable to fill in status: "" as status: "Unknown" somewhere in the apiserver side for Conditions (read side or write side)? My gut says "no", but my heart says "please".

I would require condition writers to say what they mean and specify the status value.

I'd much rather have a condition-adding patch that typos "statuss":"True" get rejected with a "status is required" error than get accepted with status defaulted to "Unknown"

// If set, this represents the .metadata.generation that the condition was set based upon.
// For instance, if .metadata.generation is currently 12, but the .status.condition[x].observedGeneration is 9, the condition is out of date
// with respect to the current state of the instance.
// +optional
ObservedGeneration int64 `json:"observedGeneration,omitempty" protobuf:"varint,3,opt,name=observedGeneration"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

optionals should be pointers?

// Last time the condition transitioned from one status to another.
// This should be when the underlying condition changed. If that is not known, then using the time when the API field changed is acceptable.
// +required
LastTransitionTime metav1.Time `json:"lastTransitionTime" protobuf:"bytes,4,opt,name=lastTransitionTime"`
// The reason for the condition's last transition in CamelCase.
// The specific API may choose whether or not this field is considered a guaranteed API.
// This field may not be empty.
// +required
Reason string `json:"reason" protobuf:"bytes,5,opt,name=reason"`
// A human readable message indicating details about the transition.
// This field may be empty.
// +required
Comment on lines +103 to +104
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The combination of "This field may be empty." and "+required" is confusing to me. How should this be interpreted?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The combination of "This field may be empty." and "+required" is confusing to me. How should this be interpreted?

required is about the serialization format, empty is about the content of that serialization.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation. I get what it means now, but still think it's confusing. Is this how we should be implementing all new optional string fields?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the field is a non-pointer type -- can it even be optional? (how would the consumer on the controller side even know)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default value can be omitted in the serialization if it's "omitempty", so yes, it's possible.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it's not clear to me why @deads2k wants this field to be this way? Why not call it optional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it's not clear to me why @deads2k wants this field to be this way? Why not call it optional?

Because optional required a pointer and I will do nearly anything to avoid a pointer in a commonly used struct like this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like observedGeneration is an optional field that is also not a pointer. It sounds like the guidance for that is when "the zero value is sufficiently invalid that it doesn't create any ambiguity", would that also apply here? It seems like there's no point in distinguishing between empty and nil messages here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like observedGeneration is an optional field that is also not a pointer. It sounds like the guidance for that is when "the zero value is sufficiently invalid that it doesn't create any ambiguity", would that also apply here? It seems like there's no point in distinguishing between empty and nil messages here.

I'd rather fuss over this in the actual API PR. observedGeneration was not an easy sell.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can mark it +optional without making it omitempty so a client can submit a condition without the field without client-side validation complaining bitterly, but still always serialize it in server responses.

Agree we can review this bit during API review in the PR, when we have the benefit of fixture tests and generated openapi schemas showing us exactly the effect on clients and server output.

Message string `json:"message" protobuf:"bytes,6,opt,name=message"`
}
```

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may also want to define a type for metav1.Conditions []metav1.Condition, to support adding helper methods for operating on the set of conditions.

Cribbing from this interface, we've found the following useful:

type ConditionManager interface {
	// SetCondition sets or updates the Condition on Conditions for Condition.Type.
	// If there is an update, Conditions are stored back sorted.
	SetCondition(new Condition)

	// ClearCondition removes the non terminal condition that matches the ConditionType
	ClearCondition(t ConditionType) error

	// MarkTrue sets the status of t to true, and then marks the happy condition to
	// true if all dependents are true.
	MarkTrue(t ConditionType)

	// MarkTrueWithReason sets the status of t to true with the reason, and then marks the happy
	// condition to true if all dependents are true.
	MarkTrueWithReason(t ConditionType, reason, messageFormat string, messageA ...interface{})

	// MarkUnknown sets the status of t to Unknown and also sets the happy condition
	// to Unknown if no other dependent condition is in an error state.
	MarkUnknown(t ConditionType, reason, messageFormat string, messageA ...interface{})

	// MarkFalse sets the status of t and the happy condition to False.
	MarkFalse(t ConditionType, reason, messageFormat string, messageA ...interface{})
}

The ConditionManager interface also codifies patterns around having a single top-level "happy" Condition (aka "Ready" or "Succeeded") and supports feeding dependent conditions into the top-level condition. Note that the MarkUnknown and MarkFalse methods both require reason and message fields, but MarkTrue does not.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a great candidate for client-go and/or controller helper libraries.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a great candidate for client-go and/or controller helper libraries.

Yeah, probably something very similar to that interface above like https://github.com/openshift/library-go/blob/master/pkg/operator/v1helpers/helpers.go#L49-L108

Copy link
Member

@MikeSpreitzer MikeSpreitzer May 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Evan: does your suggestion require the helper methoda to be defined in the metav1 package? If so, then it does not suffice to define an interface; we would need to also write the implementations for metav1.Conditions in metav1.


This is not strictly compatible with any of our existing conditions because of either proto ordinals,
required vs optional, or omitEmpty or not.
However, it encapsulates the best of what we've learned and will allow new APIs to have a unified type.

### Noteworthy choices
1. `lastTransitionTime` is required.
Some current implementations allow this to be missing, but this makes it difficult for consumers.
By requiring it, the actor setting the field can set it to the best possible value instead of having clients try to guess.
2. `reason` is required and must not be empty.
The actor setting the value should always describe why the condition is the way it is, even if that value is "unknown unknowns".
No other actor has the information to make a better choice.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might want to mention whether reason needs to be non-empty when the condition is True. (For example, if Ready=True, the reason may be superfluous.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might want to mention whether reason needs to be non-empty when the condition is True. (For example, if Ready=True, the reason may be superfluous.)

It should always be non-empty. I'll update to be more clear.

3. `lastHeartbeatTime` is removed.
This field caused excessive write loads as we scaled.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, other resource types named this field differently, such as lastProbeTime. I agree that the cost has outweighed the benefit, though.

If an API needs this concept, it should codify it separately and possibly using a different resource.
4. `observedGeneration` does not follow the standard requirement of "all optional fields are pointers".
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worked through the reasoning of an optional, non-pointer field in this case with @lavalamp and @liggitt . This is where I think we landed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this makes sense to me

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we never start committing to this, we'll never reach consistency.

I waive my concern - it's not worth arguing about piece by piece. It doesn't matter much for JSON. I would expect it changes the on-wire proto representation, but on inspection it seems not. That is surprising to me, but the proto marshal/unmarshal code is human-hostile, so ... <shrug>

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rule is, the zero value is sufficiently invalid that it doesn't create any ambiguity. Omitted-on-the-wire needs to mean "I have no opinion". As long as that can't be confused for "I have a positive opinion that it should be the zero value", it's fine.

I also am a little unhappy about this making things more confusing. Basically the argument is that it's worth the convenience of not having the pointer in go.

This rule originated from the need to distinguish intent of zero value versus unset.
The `.metadata.generation` is never set to zero.
See the [CR strategy](https://github.com/kubernetes/apiextensions-apiserver/blob/release-1.18/pkg/registry/customresource/strategy.go#L88)
and [deployment strategy](https://github.com/kubernetes/kubernetes/blob/release-1.18/pkg/registry/apps/deployment/strategy.go)
as examples.
Because `.metadata.generation` is never zero-value, it is not necessary to distinguish between absent and zero-value observedGeneration.
Whether a client omits `observedGeneration` (because it is unaware of the new field) or explicitly sets it to 0, the
meaning is the same: the condition does not correspond to a known generation.
This also provides parity the `.metadata.generation` field [Generation int64 \`json:"generation,omitempty" protobuf:"varint,7,opt,name=generation"\`](https://github.com/kubernetes/apimachinery/blob/release-1.18/pkg/apis/meta/v1/types.go#L182-L185).

### Graduation Criteria

Because meta/v1 APIs are necessarily v1, this would go direct to GA.
Using a meta/v1beta1 isn't a meaningful distinction since this type is embedded into other types which own their own versions.
liggitt marked this conversation as resolved.
Show resolved Hide resolved

### Upgrade / Downgrade Strategy

This KEP isn't proposing that existing types be changed.
This means that individual upgrade/downgrade situations will be handled discretely.
By providing recommended defaulting functions, individual APIs will be able to more easily transition to the new condition type.

### Version Skew Strategy

Standard defaulting and conversion will apply.
APIs which have extra values for this type may have to go through an intermediate version that drops them or accept
that certain optional fields of their conditions will be dropped.
Depending on the individual APIs and when their extra fields are deprecated, this could be acceptable choice.

## Implementation History

## Drawbacks

1. There may be some one-time pain when new versions are created for APIs that wish to consume this common schema.
Switching is not strictly required, but it is encouraged.

## Alternatives

1. We could recommend a schema and not provide one. This doesn't seem very nice to consumers.

19 changes: 19 additions & 0 deletions keps/sig-api-machinery/1623-standardize-conditions/kep.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,19 @@
title: KEP Template
kep-number: 1623
authors:
- "@deads2k"
owning-sig: sig-api-machinery
participating-sigs:
- sig-api-machinery
- sig-apps
- sig-architecture
status: implementable
creation-date: 2020-03-23
reviewers:
- "@lavalamp"
- "@smarterclayton"
approvers:
- "@derekwaynecarr"
- "@liggitt"
see-also:
replaces: