Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

adds paginating lister for evaluating CRs' upgrade fitness versus new CRDs. #3387

Merged
merged 2 commits into from
Sep 5, 2024

Conversation

grokspawn
Copy link
Contributor

Description of the change:
adds a paging lister for listing CRs deployed in the cluster, used for fitness testing new CRD versions during upgrade.

Motivation for the change:
scaling issues in clusters where listing CRs for an operator would exceed apiserver timeouts

Architectural changes:
since previously we attempted to load the list of all related CRs into memory, this should also significantly help with memory usage by OLM during upgrade evaluation.

Testing remarks:

Reviewer Checklist

  • Implementation matches the proposed design, or proposal is updated to match implementation
  • Sufficient unit test coverage
  • Sufficient end-to-end test coverage
  • Bug fixes are accompanied by regression test(s)
  • e2e tests and flake fixes are accompanied evidence of flake testing, e.g. executing the test 100(0) times
  • tech debt/todo is accompanied by issue link(s) in comments in the surrounding code
  • Tests are comprehensible, e.g. Ginkgo DSL is being used appropriately
  • Docs updated or added to /doc
  • Commit messages sensible and descriptive
  • Tests marked as [FLAKE] are truly flaky and have an issue
  • Code is properly formatted

Signed-off-by: Jordan Keister <[email protected]>
@openshift-ci openshift-ci bot requested review from anik120 and dtfranz September 4, 2024 20:58
for _, cr := range crList.Items {
err = validation.ValidateCustomResource(field.NewPath(""), cr.UnstructuredContent(), validator).ToAggregate()
pager := pager.New(pager.SimplePageFunc(func(opts metav1.ListOptions) (runtime.Object, error) {
return dynamicClient.Resource(gvr).List(context.TODO(), metav1.ListOptions{})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return dynamicClient.Resource(gvr).List(context.TODO(), metav1.ListOptions{})
return dynamicClient.Resource(gvr).List(context.TODO(), opts)

Copy link
Contributor

@tmshort tmshort Sep 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ListOptions argument opts isn't being used...
Presumably that's how it's paging/restricting output?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved in ca452be

Signed-off-by: Jordan Keister <[email protected]>
@grokspawn grokspawn requested a review from tmshort September 5, 2024 14:08
Copy link
Contributor

@tmshort tmshort left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Sep 5, 2024
@grokspawn grokspawn added this pull request to the merge queue Sep 5, 2024
Merged via the queue into operator-framework:master with commit 48a856a Sep 5, 2024
11 of 12 checks passed
@grokspawn grokspawn deleted the paginate branch September 5, 2024 20:00
@jianzhangbjz
Copy link
Contributor

LGTM, test pass: https://issues.redhat.com/browse/OCPBUGS-35358
/lgtm

grokspawn added a commit to grokspawn/operator-lifecycle-manager that referenced this pull request Dec 9, 2024
We started seeing some issues with folks who had spurious CRD incompatibility claims when updating operators.  It is a failure in OLM code which validates existing CRs against incoming CRDs, recently updated in operator-framework#3387.

This manifested in `InstallPlan` `.status.Message` something like:
```
retrying execution due to error: error validating existing CRs against new CRD's schema for \"pgadmins.postgres-operator.crunchydata.com\": error validating postgres-operator.crunchydata.com/v1beta1, Kind=PGAdmin \"openshift-operators/example-pgadmin\": updated validation is too restrictive: [].spec.tolerations[0].tolerationSeconds: Invalid value: \"number\": spec.tolerations[0].tolerationSeconds in body must be of type integer: \"number\"
```

The difference between the predecessor calling convention and the one introduced in operator-framework#3387 appears to be that one is a pointer and the other is concrete.

old
```golang
unstructured.Unstructured{Object:map[string]interface...
```

new
```golang
&unstructured.Unstructured{Object:map[string]interface...
```

so it would seem that merely type-asserting the value and de-referencing it would yield the appropriate result, but it appears instead that it effectively disables all CR vs CRD reconciliation checks (evidenced by the fact that the unit tests multiply fail).

But k8s already dereferences pointer parameters [here](https://github.com/kubernetes/kube-openapi/blob/master/pkg/validation/validate/schema.go#L139-L141) during validation.  So that isn't it.

And the `validate.ValidateCustomResource` interface is terrifyingly permissive in allowing `customResource` as `interface{}` [here](https://pkg.go.dev/k8s.io/[email protected]/pkg/apiserver/validation#ValidateCustomResource). So we cannot derive guidance from it.

Taking a page from k8s' use of the validation API, which uses `unstructured.UnstructuredContent()` to convert the `unstructured.Unstructured` into a `map[string]interface{}` [here](https://github.com/kubernetes/kubernetes/blob/1504f10e7946f95a8b1da35e28e4c7453ff62775/staging/src/k8s.io/apiextensions-apiserver/pkg/registry/customresource/validator.go#L54) then we achieve the desired results.

Signed-off-by: Jordan Keister <[email protected]>
github-merge-queue bot pushed a commit that referenced this pull request Dec 9, 2024
* fixed to pass map[string]interface instead of unstructured.Unstructured

We started seeing some issues with folks who had spurious CRD incompatibility claims when updating operators.  It is a failure in OLM code which validates existing CRs against incoming CRDs, recently updated in #3387.

This manifested in `InstallPlan` `.status.Message` something like:
```
retrying execution due to error: error validating existing CRs against new CRD's schema for \"pgadmins.postgres-operator.crunchydata.com\": error validating postgres-operator.crunchydata.com/v1beta1, Kind=PGAdmin \"openshift-operators/example-pgadmin\": updated validation is too restrictive: [].spec.tolerations[0].tolerationSeconds: Invalid value: \"number\": spec.tolerations[0].tolerationSeconds in body must be of type integer: \"number\"
```

The difference between the predecessor calling convention and the one introduced in #3387 appears to be that one is a pointer and the other is concrete.

old
```golang
unstructured.Unstructured{Object:map[string]interface...
```

new
```golang
&unstructured.Unstructured{Object:map[string]interface...
```

so it would seem that merely type-asserting the value and de-referencing it would yield the appropriate result, but it appears instead that it effectively disables all CR vs CRD reconciliation checks (evidenced by the fact that the unit tests multiply fail).

But k8s already dereferences pointer parameters [here](https://github.com/kubernetes/kube-openapi/blob/master/pkg/validation/validate/schema.go#L139-L141) during validation.  So that isn't it.

And the `validate.ValidateCustomResource` interface is terrifyingly permissive in allowing `customResource` as `interface{}` [here](https://pkg.go.dev/k8s.io/[email protected]/pkg/apiserver/validation#ValidateCustomResource). So we cannot derive guidance from it.

Taking a page from k8s' use of the validation API, which uses `unstructured.UnstructuredContent()` to convert the `unstructured.Unstructured` into a `map[string]interface{}` [here](https://github.com/kubernetes/kubernetes/blob/1504f10e7946f95a8b1da35e28e4c7453ff62775/staging/src/k8s.io/apiextensions-apiserver/pkg/registry/customresource/validator.go#L54) then we achieve the desired results.

Signed-off-by: Jordan Keister <[email protected]>

* adding tests

Signed-off-by: Jordan Keister <[email protected]>

---------

Signed-off-by: Jordan Keister <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants