Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crossplane custom resources status should be compatible with kstatus #2672

Open
erikgb opened this issue Oct 26, 2021 · 23 comments
Open

Crossplane custom resources status should be compatible with kstatus #2672

erikgb opened this issue Oct 26, 2021 · 23 comments
Labels

Comments

@erikgb
Copy link
Contributor

erikgb commented Oct 26, 2021

What problem are you facing?

We are attempting to provision crossplane custom resources (i.e. providers) with FluxCD, and seem to have issues enabling health assessment for crossplane custom resources. And it seems like the problem is related to the following requirement for custom resources:

A health check entry can reference one of the following types: Custom resources that are compatible with kstatus

And for crossplane providers, it seems like the status conditions are not compatible:

status:
  conditions:
  - lastTransitionTime: "2021-10-26T11:36:13Z"
    reason: HealthyPackageRevision
    status: "True"
    type: Healthy
  - lastTransitionTime: "2021-10-26T11:36:06Z"
    reason: ActivePackageRevision
    status: "True"
    type: Installed

The workaround is to disable health checks for crossplane resources in FluxCD - which is far from optimal.

How could Crossplane help solve your problem?

It would be convenient if all crossplane custom resources had status conditions compatible with kstatus.

I raised this question on Slack, and got a reply from @muvaf:

All managed resource CRDs and CRDs produced from XRDs have Ready condition. I believe we can have it in package APIs, too. Likely, we’d either rename Healthy to Ready or add another one.

@erikgb erikgb added the enhancement New feature or request label Oct 26, 2021
@negz negz added the proposal label Nov 22, 2021
@negz
Copy link
Member

negz commented Nov 22, 2021

I'm not sure about compatibility with all of kstatus - e.g. adding all of the "abnormal true" statuses mentioned at https://github.com/kubernetes-sigs/cli-utils/tree/45a0b5b7e6b292c0a59c899aa50a3662a0c1cf7d/pkg/kstatus#statuses.

That said, if tooling like FluxCD is starting to treat Ready as the canonical readiness condition it could make sense to use that everywhere. As @muvaf mentioned we already use it almost everywhere.

Breadcrumbs to crossplane/crossplane-runtime#198, which is proposing we remove the Synced condition managed resources currently have (in addition to Ready).

@erikgb erikgb changed the title Crossplane custom resources status should compatible with kstatus Crossplane custom resources status should be compatible with kstatus Nov 23, 2021
@negz
Copy link
Member

negz commented Dec 8, 2021

Likely, we’d either rename Healthy to Ready or add another one.

Conditions are intended to be machine parseable, so renaming would be a breaking change. I think we'd need to add a Ready condition in addition to what was already there. @hasheddan do you have any feelings about this WRT the package types? I think they and XRD are the only types that would be affected - there are some other types like Composition that don't have any status conditions but they also don't have any real concept of readiness.

My thinking is that duplicate conditions with the same meaning (e.g. Healthy and Ready) are a bit awkward and unfortunate, but not so bad if it increases compatibility. In a way I kind of like having one condition that may be more contextually meaningful (e.g. Healthy, or Offered) together with a more generic but consistent one (Ready).

@erikgb
Copy link
Contributor Author

erikgb commented Jan 21, 2022

To not break compatibility, I'll suggest adding a Ready condition with the same semantics as Healthy - at least for now. A cleanup might be considered for Crossplane 2.x. 😉 And also add an observedGeneration field to the status of all Crossplane resources. WDYT?

@negz
Copy link
Member

negz commented Feb 25, 2022

@erikgb I'm on board with adding a Ready condition to all resources (e.g. packages etc) alongside their existing ones. observedGeneration sounds good to me too.

@bgrant0607
Copy link

FWIW, Config Sync and kpt also assume kstatus.

@github-actions
Copy link

Crossplane does not currently have enough maintainers to address every issue and pull request. This issue has been automatically marked as stale because it has had no activity in the last 90 days. It will be closed in 7 days if no further activity occurs. Leaving a comment starting with /fresh will mark this issue as not stale.

@github-actions github-actions bot added the stale label Sep 23, 2022
@erikgb
Copy link
Contributor Author

erikgb commented Sep 23, 2022

/fresh

Still relevant.

@github-actions github-actions bot removed the stale label Sep 23, 2022
@github-actions
Copy link

Crossplane does not currently have enough maintainers to address every issue and pull request. This issue has been automatically marked as stale because it has had no activity in the last 90 days. It will be closed in 7 days if no further activity occurs. Leaving a comment starting with /fresh will mark this issue as not stale.

@yhrn
Copy link

yhrn commented Jun 4, 2023

/fresh

kstatus compatibility would be very much desirable for us as well. Especially the lack of observedGeneration on claims is problematic since it means we can't really determine if an object is truly "up-to-date" or if Crossplane is just slow/down.

@github-actions github-actions bot removed the stale label Jun 4, 2023
@yhrn
Copy link

yhrn commented Jun 4, 2023

@negz seems like marking the issue as fresh does not re-open it. Could you maybe open it again?

@erikgb
Copy link
Contributor Author

erikgb commented Jun 4, 2023

/reopen

@chlunde
Copy link
Contributor

chlunde commented Apr 23, 2024

To not break compatibility, I'll suggest adding a Ready condition with the same semantics as Healthy - at least for now. A cleanup might be considered for Crossplane 2.x. 😉 And also add an observedGeneration field to the status of all Crossplane resources. WDYT?

Is this just a "do"-task at this moment, no design or unknowns? In that case maybe a "good first issue"?

@erikgb
Copy link
Contributor Author

erikgb commented Apr 23, 2024

Is this just a "do"-task at this moment, no design or unknowns? In that case maybe a "good first issue"?

Well, I would say no to your question. The resources will NOT be compatible with KStatus with the current Healthy condition on successfully reconciled resources. KStatus defines a resource as successfully reconciled when .status.observedGeneration == .metadata.generation AND the resource has no conditions (or a single Ready condition with status true).

@jbw976
Copy link
Member

jbw976 commented Apr 23, 2024

There's actually a bit of discussion about kstatus now on this design PR:

Feel free to weigh in with opinions there too 🙇‍♂️

@chlunde
Copy link
Contributor

chlunde commented Apr 23, 2024

Would it be a good idea to handle some core crossplane resource types Provider, Function, Configuration, here? And keep the proposal is specifically about managed resources? At least I can check if it seems "doable" if you're OK with that, independent of the proposal.

My goal here is simply to detect that provider and function upgrades are ready in our CI cluster before we actually run the CI, and since we use flux, that would work almost automatically if we fix Provider and Function objects.

@chlunde
Copy link
Contributor

chlunde commented Apr 23, 2024

I just verified how kstatus would interpret different conditions if we add status.observedGeneration and a Ready-condition equal to the Healthy-condition, and that this would make sense. Looks OK to me (although Unknown -> InProgress is a bit surprising)

func TestKStatusCompatibility(t *testing.T) {
	observedGenerationNotUpdated := &v1.Configuration{}
	observedGenerationNotUpdated.SetGeneration(2)
	observedGenerationNotUpdated.SetObservedGeneration(1)
	observedGenerationNotUpdated.SetConditions(v1.Ready())

	ready := &v1.Configuration{}
	ready.SetGeneration(1)
	ready.SetObservedGeneration(1)
	ready.SetConditions(v1.Ready())

	notReady := &v1.Configuration{}
	notReady.SetGeneration(1)
	notReady.SetObservedGeneration(1)
	notReady.SetConditions(v1.NotReady())

	unknown := &v1.Configuration{}
	unknown.SetGeneration(1)
	unknown.SetObservedGeneration(1)
	unknown.SetConditions(v1.UnknownReady())

	/* available kstatus Status-es:
		InProgressStatus  Status = "InProgress"
		FailedStatus      Status = "Failed"
		CurrentStatus     Status = "Current"
		TerminatingStatus Status = "Terminating"
		NotFoundStatus    Status = "NotFound"
		UnknownStatus     Status = "Unknown"
	*/

	type args struct {
		obj *v1.Configuration
	}

	type want struct {
		status status.Status
	}

	cases := map[string]struct {
		reason string
		args   args
		want   want
	}{
		"ObservedGenerationNotUpdated": {
			reason: "The condition should not be used when the observed generation is not updated",
			args: args{
				obj: observedGenerationNotUpdated,
			},
			want: want{
				status: status.InProgressStatus,
			},
		},
		"Ready": {
			reason: "Ready -> Current",
			args: args{
				obj: ready,
			},
			want: want{
				status: status.CurrentStatus,
			},
		},
		"NotReady": {
			reason: "NotReady -> InProgress",
			args: args{
				obj: notReady,
			},
			want: want{
				status: status.InProgressStatus,
			},
		},
		"Unknown": {
			reason: "unknown -> InProgress",
			args: args{
				obj: unknown,
			},
			want: want{
				status: status.InProgressStatus,
			},
		},
	}

	for name, tc := range cases {
		t.Run(name, func(t *testing.T) {
			unstructuredBody, err := runtime.DefaultUnstructuredConverter.ToUnstructured(&tc.args.obj)
			if err != nil {
				t.Fatal(err)
			}
			u := &unstructured.Unstructured{Object: unstructuredBody}

			kstatus, err := status.Compute(u)
			if err != nil {
				t.Fatal(err)
			}

			if kstatus.Status != tc.want.status {
				t.Errorf("expected status %s, got %s", tc.want.status, kstatus.Status)
			}
		})
	}

}

@jbw976
Copy link
Member

jbw976 commented Apr 24, 2024

Thanks for that practical example to explore how the Configuration.pkg.crossplane.io/v1 type could get closer to being kstatus compatible. I appreciate your hands-on approach of writing the test cases to demonstrate how it behaves, esp. after the kstatus overview confused me in #5453 (comment) 😂

There would be more consideration here across all the core crossplane types and MRs so we have a complete defined experience for everything, but so far this is informative. Some more areas to consider:

  • should the kstatus recommended Reconciling and Stalled conditions be adopted as well?
  • how could this be implemented without breaking any previous assumptions for conditions used by Crossplane?
  • should all core types (packages, XRs, claims, etc.) as well as MRs adopt a consistent experience, and if not, where should we diverge?
  • some other kstatus status values (e.g. Failed) are not shown in the test cases above - what properties/values trigger them too?

I wonder if we should take the full matrix of what kstatus supports, the matrix of Crossplane types (and MRs in general), and just holistically capture the state of each type that would make it all work - to be very complete about it. Perhaps that is the focus that #5453 should be taking? /cc @TerjeLafton 🤔

@TerjeLafton
Copy link
Contributor

Sounds good! Implementing kstatus without any breaking changes would be perfect.
I did however comment one concern here, but it's not the most relevant for this issue.

I have mostly been thinking about managed resources. Of course, I would always vote for consistency, but I don't know enough about other Crossplane resources to say what it would mean to make them compliant too.

@erikgb
Copy link
Contributor Author

erikgb commented Apr 24, 2024

Implementing kstatus without any breaking changes would be perfect.

Sadly, I don't think this is possible.

@GMoraesGarcia
Copy link

Integrating kstatus compatibility would be really nice. I'm currently working with FluxCD kustomization health checks to receive notifications not just from FluxCD CRDs, but from the entire environment

@bobh66
Copy link
Contributor

bobh66 commented Jul 22, 2024

I could see adding the Ready condition to the resources that don't have it, and supporting observedGeneration, but I'm not convinced that any of the other items make sense. I'm not a fan of the "abnormal=true" pattern which I've seen too many (non-Crossplane) resources implement.

@erikgb
Copy link
Contributor Author

erikgb commented Jul 22, 2024

I'm not a fan of the "abnormal=true" pattern which I've seen too many (non-Crossplane) resources implement.

Could you explain what you don't like about it, @bobh66? We have events in Kubernetes to report important state changes. And what's the point of keeping old conditions? If we need more permanent status information on resources, we should add fields for it IMO. And following a standard, that will allow tools to integrate more easily is much better than not following a standard. Even if you think the standard is wrong. 😉

@bobh66
Copy link
Contributor

bobh66 commented Jul 23, 2024

Could you explain what you don't like about it, @bobh66?

Mostly I don't like the fact that I can't know whether a particular condition being set is good or bad without analyzing the content of the condition. I have no problem with extra conditions to expose internal state, but for consistency they should all evaluate to True in the "good" case, and False in the "failed" case. Clients should not need to implement custom logic to analyze a list of conditions to determine readiness.

The spec mentions:

it's helpful to have a common top-level condition which summarizes more detailed conditions.

but provides no method of classifying/ranking/prioritizing conditions to know which ones are the top-level and which are the details.

In many cases I have seen a list of possible failure conditions listed as condition types, possibly along with a Ready condition, but there is no way to know whether Ready=True means that the other conditions can be ignored. If both Ready=True and ValidationFailed=True are both set, what does that mean?

If I thought it was valid to assume that when Ready=True none of the other conditions matter, that would be fine, but I'm reluctant to assume that. When a resource has a list of abnormal condition types that are essentially bad things that might happen, I have to check them all to make sure they didn't happen before I know the resource is actually Ready.

Abnormal conditions can be reported as Events or in the reason field of the Ready condition when it is set to False. I don't think there is any value in supporting a list of failure condition types that should always be False except in error conditions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

10 participants