Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add finalizers for specialresources add metrics.go framework and two basic metrics #3

Merged
merged 24 commits into from
Feb 10, 2021

Conversation

dagrayvid
Copy link
Collaborator

@dagrayvid dagrayvid commented Dec 10, 2020

This PR adds finalizers to specialresources and metrics to SRO.
The implementation of finalizers is based heavily on the Operator SDK example.

The initial metrics added are:

  • Number of specialresources being managed
  • Which states have been completed per specialresource

More metrics can be added in the future. See metrics.go

Currently the only cleanup step done before finalizers are reconciled on specialresource deletion is to clear the completed states metric, however the scaffolding is there to later potentially add more complex cleanup later if necessary. We may want to later add some way to specify custom cleanup steps to the CRD.

@dagrayvid
Copy link
Collaborator Author

/cc @zvonkok
/cc @ArangoGutierrez

… added sample specialresource which will get used as alm-example when running make bundle
@@ -181,3 +218,57 @@ func createSpecialResourceFrom(r *SpecialResourceReconciler, name string) (srov1

return specialresource, errs.New("Created new SpecialResource we need to Reconcile")
}

func (r *SpecialResourceReconciler) finalizeSpecialResource(toFinalize srov1beta1.SpecialResource) error {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As well as this function to finalizers.go

@dagrayvid dagrayvid marked this pull request as ready for review February 1, 2021 18:37
@dagrayvid
Copy link
Collaborator Author

/hold

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 3, 2021
@ArangoGutierrez
Copy link

/label tide/merge-method-squash

@ArangoGutierrez
Copy link

Is this ready for review @dagrayvid ? should we unhold?

@dagrayvid
Copy link
Collaborator Author

Is this ready for review @dagrayvid ? should we unhold?

@ArangoGutierrez I don't expect any major changes so should be ready for review, but still trying to investigate some weird behavior with the reconciliation loop before unholding

@dagrayvid
Copy link
Collaborator Author

/unhold

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 4, 2021
@ArangoGutierrez
Copy link

/label tide/merge-method-squash

@openshift-ci-robot openshift-ci-robot added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Feb 4, 2021
@ArangoGutierrez
Copy link

This PR is also adding finalizers, could to document that in the PR description so it is properly stored on the merge commit

@dagrayvid dagrayvid changed the title Add metrics.go framework for adding metrics and two basic metrics. Add finalizers for specialresources add metrics.go framework and two basic metrics Feb 4, 2021
@zvonkok
Copy link

zvonkok commented Feb 10, 2021

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Feb 10, 2021
@zvonkok
Copy link

zvonkok commented Feb 10, 2021

/approve

@openshift-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dagrayvid, zvonkok

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 10, 2021
@openshift-merge-robot openshift-merge-robot merged commit ecbdbd8 into openshift-psap:master Feb 10, 2021
dagrayvid added a commit to dagrayvid/special-resource-operator that referenced this pull request Jul 12, 2021
* Adding github actions OpenShift CI CD Jobs (openshift-psap#2)

* Added super-linter

* Initial OpenShift CI CD Job
Testing Makefile
New make file target with upstream Dockerfile
Upstream Dockerfile added
Fixed Typo
Added more steps
Using UBI image to have all tools
Using UBI8 base golang to reduce size
Added self-runner
Makefile simplification
Updated Dockerfile ubi go toolset does not work
Switchign to auth.json
Set the right braces for vars in makefile
Added golint and remove superlinter

* Some refactoring and created a deploy binary

* e2e deploy undeploy (openshift-psap#6)

* Added super-linter

* Initial OpenShift CI CD Job
Testing Makefile
New make file target with upstream Dockerfile
Upstream Dockerfile added
Fixed Typo
Added more steps
Using UBI image to have all tools
Using UBI8 base golang to reduce size
Added self-runner
Makefile simplification
Updated Dockerfile ubi go toolset does not work
Switchign to auth.json
Set the right braces for vars in makefile
Added golint and remove superlinter

* Added deploy and undeploy function and some SRO restructuring

* Fixed Typo

* Added missing logs for client package

* Fixed another typo

* Added parallel worflow (openshift-psap#8)

* Added parallel worflow

* Renamed the workflows and enabled vet

* Renamed OpenShift workflow

* Added sync events for workflows

* Added sync events for workflows

* Renamed event

* Changed to a more despretive dispatch name

* Making the linter happy

* Added depedency image-build first then e2e tests

* Set timeout for linter

* Wait for image build

* Added paths-ignore README.md no CI run on README updates

* Changed to passwd and username

* Increase interval secs

* Only create one CR for one specialresource

* Added undeploy

* Added manual dispatch (openshift-psap#11)

* Fix the MOC part of driver-container-base which is failing on 4.7 (openshift-psap#12)

* Add e2e test cases for the operator (openshift-psap#5)

* Added 3 test cases to verify that the operator is running and reporting status

* Refactored cluster operator e2e tests. Modified Upgradeable condition check

* Removing e2e test for upgradeable condition until that feature is added to the operator

* Refactoring deploy and undeploy to add the create-from-yaml features to the framework pkg. Refactoring basic e2e tests. WIP simple-kmod e2e test added

* Fixed typo in test/framework/create_from_yaml.go

* Implemented the simple-kmod e2e test, updated lustre-client for new api

* Correcting error messages for util.GetPodForNode

* Removing line used for testing in lustre-client

* Changes needed to deploy SRO from OperatorHub as a community operator (openshift-psap#13)

* Changes needed to enable installation from OperatorHub / OLM

* Added NFD dependency. Tested that this will install NFD if not present.

* Add dagray to maintainers in CSV

Co-authored-by: Zvonko Kaiser <[email protected]>

* Update controllers/status.go error message for clusteroperator

Co-authored-by: Zvonko Kaiser <[email protected]>

Co-authored-by: Zvonko Kaiser <[email protected]>

* Update 0000-nvidia-gpu-cr.yaml

* Update 0000-nvidia-gpu-cr.yaml

* Add finalizers for specialresources add  metrics.go framework and two basic metrics (openshift-psap#3)

* Add metrics.go framework for adding metrics and two basic metrics. (rebased from deprecated repository)

* First draft at finalizers, untested

* Corrected two issues in servicemonitor, added minor fixes to CSV, and added sample specialresource which will get used as alm-example when running make bundle

* Fixing problems in finalization logic, added metric cleanup code to finalization, fixed a few typos

* Reorganizing finalizer code into separate file

* Remove servicemonitor namespaceSelector which is hardcoded for a namespace

* Moving finalizer name to finalizers.go

* Adding Get() before update in clusterOperatorStatusUpdate to avoid common error in logs

* Reverting clusteroperator status change

* Moving r.specialresource = r.parent to before finalization block

* Update to new tag for testing

* Reorganized metrics

* Fixing golint errors

* Refactored conditions

* Added upgrade skeleton

* Added missing package statement

* Added pull_request keyword

* Added handling of fork PRs

* Moved env to the right location

* Added new evn handling some funcs are deprecated

* Added recreate of manifests when running e2e tests

* Adding robot account to the runner

* Add the right token

Co-authored-by: Zvonko Kaiser <[email protected]>

* Added new KernelPatchVersion (openshift-psap#15)

* Added new KernelPatchVersion

* Removed github action no push

* Reworked openshift example

* Added merge job to keep the master image up to date

* Moving away from self hosting (openshift-psap#17)

* Moving away from self hosting
Add pull_request_target and check for label
Build master image only on a push
Added better check for deleting cRS
Added automatic removal of images after PR closed/merged
Fixed typo

* Added kube-linter

* We only need to look into manifests and config/recipes

* Make kube lint happy

* Added cpu and mem limit to kube proxy

* Fixed git tag

* Reverted Makfeilfe fix

* Re-adding undeploy change to remove CR before everything else, with 728045(..) fix

* Fixed kube-lint

* Increasing the timeout since we have lower powered machines

* Adding driver-container-base build logs to e2e test, changing to 10s and 10min polling for e2e test

* Fixed go-lint errors and renamed some e2e test steps to be more clear

Co-authored-by: David Gray <[email protected]>

* Removed uneeded pull_request target (openshift-psap#20)

* Removed unneeded pull_request target

* testing new label func

* Test

* Test

* Test

* Test

* Test

* Test

* Test

* Test

* Test

* Test

* Test

* Test

* Test

* Test

* Test

* Test

* Test

* Test

* Test

* Test

* Test

* Test

* Test

* Adding working test of ok-to-test label to image-build and openshift checks

* Make image-build always run from forks on openshift-psap/ repo

* Changes to test job

* Fixing typo in test.yaml

* Putting test.yaml back to pull_request

* Renamed second test job

* Test image-build github context dump

* Adding test for pull_request_target

* Fixed mistake in new test-target.yaml

* Testing a different command in test-target.yaml

* Removing test workflows

* Changing conditional for image-master and image-delete

Co-authored-by: David Gray <[email protected]>

* Adding conditional checks for main and forked repositories (openshift-psap#22)

* test

* Adding conditional check to each workflow

* Increasing max time for simple-kmod e2e test

Co-authored-by: David Gray <[email protected]>

* First draft of run-tests.yaml to kick off other tests when running from a fork

* fixing error in run-tests

* Testing change to conditionals

* Syntax error

* Syntax error in image-master

* Update run-tests.yaml

* Update run-tests.yaml

* Update run-tests.yaml

* Update run-tests.yaml

* Update run-tests.yaml

* Testing new image-build workflow conditional

* renaming run-tests.yaml github actions to be less confusing

* Removing whitespace in run-tests.yaml

* Trying new ref for image-build in run-tests

* testing new ref and repo for run-tests image-build

* Simplifying conditional for all workflows

* Fixing typo in conditional check for labeled action

* Fixing typo in openshift.yaml github action

* Cleaning up if statement, image-build context dump, and removing test.yaml

* Removing context dump from image-build (again)

* Try workflow_run for dependency between openshift.yaml and image-build.yaml

* Combining image-build.yaml and openshift.yaml into one file e2e.yaml with dependency between jobs

* Add relatedObjects to ClusterOperator for must-gather (openshift-psap#16)

* Created getRelatedObjects() in controllers/status.go to ensure relevant objects and logs are gathered by must-gather

* Refactoring must-gather code. Turning getRelatedObjects into a method to clean up clusterOperatorStatusReconcile()

* Added e2e test to verify relatedObjects set

* Formatting change to comment

* Fix bug in runtime.go causing exit onError when waiting for secret creation, and bug in 1100-lustre-csi-secret, add driver-container-base dependency to lustre (openshift-psap#30)

* Update lustre-client (openshift-psap#34)

* Updating lustre-client driver container to use weak-modules, and also fixing graceful termination

* Adding comment

* Changing lustre-client to use kvc

* Added Rolling Upgrade Support (openshift-psap#33)

* Added Rolling Upgrade Support

* Added some refactoring

* Make kubelinter happy

* Reorder as per Davids request

* Update controllers/resources.go

Co-authored-by: David Gray <[email protected]>

* Update controllers/resources.go

Co-authored-by: David Gray <[email protected]>

* Update controllers/resources.go

Co-authored-by: David Gray <[email protected]>

Co-authored-by: David Gray <[email protected]>

* Fixing actions/checkouts that should be using the PR code

* Bug fixes: finalizers, e2e tests, runtime.go, status.go (openshift-psap#37)

* Changing when we check for specialresource deletion to fix make undeploy bug

* Fixing 2 bugs in CO status reconciliation

* Added a new e2e test to check NFD is working, and used that to get the name of pods, added rhel8.3 for ocp4.7+ in runtime.go

* Fix go-lint errors

* Refactor/rewrite renderOperatingSystem into a separate pkg

* Fixing some formatting/whitespace

* Add support for cert-manager and an example SpecialResource (openshift-psap#36)

* Add support for cert-manager Issuer and Certificates in recipes, and add a dummy SR recipe as an example

* Remove the key/cert in the secret (this key was only used for this app, not a leak) and add instructions to generate it

* Moving to go1.16 (openshift-psap#42)

* Moving to go1.16 and pinning kube-lint version

* Ran go mod vendor again to fix inconsistent vendoring... now another problem

* Patched zapr.go

* Fixing clusteroperator status bug and renaming simple-kmod buildconfig (openshift-psap#41)

* Fix a requeue bug in reconciliation loop and expand simple-kmod e2e test (openshift-psap#43)

* Fix a requeue bug in reconciliation loop which resulted in some resources not being created, and added an e2e test for this case

* Simplifying conditional in bug fix, thanks to linter

* Changing WaitForDaemonsetReady(...) in test/e2e/basic/util.go to check for ds.Status.NumberAvailable

* Add preStop hook to unload kmods loaded by KVC (openshift-psap#40)

* Add preStop hook to unload kmods loaded by kvc

* Switching to systemctl stop

* Adding rename of buildconfig file to config/recipes/simple-kmod/manifests/kustomization to fix make assets

* Added e2e test for simple-kmod unloading

* Update NFD dependency CRD version (openshift-psap#46)

* e2e-test fixes: namespace the oc debug and clarify logs when no operator deployment found (openshift-psap#47)

* Add logging, and a wait for when no deployments are created yet

* Adding namespace to oc debug node/ command in e2e test

* Update osversion.go to correctly set rhel version in OCP4.8 (openshift-psap#48)

* Update OWNERS

* Adding helm chart support (openshift-psap#45)

* Removing Preamble CR

* First running version

* Added missing files

* Added helm object kind ordering for OCP

* More charts working

* use the latest ubi8 image

* Added repos

* Added missing files

* Remove duplicate manifests

* Fixing kube-lint paths

* Update simple-kmod.yaml

* Added lastlayer fetching

* Added better SR detection

* Update e2e test

* Update e2e test higher timeout

* Update e2e test higher timeout 60min

* Configure probe

* Added shipwright chart and intelligent updates

* Added shipwright chart and intelligent updates

* New version with DTK support

* Make linters happy

* Setting the right go version

* Added helm linst

* Making all linters happey

* caching on emptyDir

Co-authored-by: David Gray <[email protected]>

* Added ping-pong example with cert-manager dependency (openshift-psap#54)

* Added ping-pong example with cert-manager dependency

* Excluding experimental charts from linting

* Fixed recipe for simple-kmod

* Added ping pong test

* Added ping pong test

* Added test-e2e locking

* Update charts/olm/cert-manager-0.0.1/kustomization.yaml

Co-authored-by: David Gray <[email protected]>

* Update charts/olm/cert-manager-0.0.1/Chart.yaml

Co-authored-by: David Gray <[email protected]>

* removed not needed file

* Update charts/olm/cert-manager-0.0.1/templates/0000_namespace.yaml

Co-authored-by: David Gray <[email protected]>

* Update .github/workflows/e2e.yaml

* Update test

* Update test

* Return version even if DTK not found

* Update test

* Updated Test and Docs

* Updated Test and Docs

* IgnoreNotFound on deletion

* IgnoreNotFound on deletion

* Removing old podlogs

* Added better initialization functions to custom SRs

* Removed unneccessary files

Co-authored-by: David Gray <[email protected]>

* Change e2e test lock to work with bash -e

* Adding compound command to only exit 1 when first command fails

* Fixing Makefile for make bundle, adding bundle for ART (openshift-psap#55)

* Fixing Makefile for make bundle, adding bundle for ART

* Updating bundle after rebase

* Update e2e.yaml

* Update e2e.yaml

* Update e2e.yaml

Co-authored-by: Zvonko Kaiser <[email protected]>
Co-authored-by: Zvonko Kaiser <[email protected]>
Co-authored-by: Zvonko Kaiser <[email protected]>
openshift-merge-robot pushed a commit that referenced this pull request Aug 9, 2022
Fix markdown and syntax mistakes

Fix markdown and syntax mistakes #2

Fix markdown format #3

Fix markdown format #4
openshift-merge-robot pushed a commit that referenced this pull request Aug 18, 2022
Fix markdown errors

Fixes in syntax #2

Fix syntax #3

Fix broken anchor link
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants