[blocker] Nexus Deployment stuck on OpenShift 4.x #186

Closed
ricardozanini opened this issue Nov 1, 2020 · 4 comments · Fixed by #189

@ricardozanini
Member

Describe the bug
When trying to create a simple Nexus3 instance with the example "CentOS No Persistence", I got:

2020-11-01T14:29:25.381Z	INFO	controllers.Nexus	Reconciling Nexus
2020-11-01T14:29:25.389Z	DEBUG	Fetching the latest micro from minor 28
2020-11-01T14:29:25.389Z	DEBUG	Replacing 'spec.image' (docker.io/sonatype/nexus3:3.28.1) with 'docker.io/sonatype/nexus3:3.28.1'
2020-11-01T14:29:25.396Z	INFO	Generating required resources
2020-11-01T14:29:25.396Z	DEBUG	Generating required Deployment
2020-11-01T14:29:25.396Z	DEBUG	Generating required Service
2020-11-01T14:29:25.396Z	DEBUG	Generating required Service Account
2020-11-01T14:29:25.396Z	DEBUG	Generating required Secret
2020-11-01T14:29:25.396Z	INFO	Fetching deployed resources
2020-11-01T14:29:25.396Z	INFO	Attempting to fetch	{"deployed": "Deployment"}
2020-11-01T14:29:25.396Z	DEBUG	There is no deployed Deployment
2020-11-01T14:29:25.396Z	INFO	Attempting to fetch	{"deployed": "Service"}
2020-11-01T14:29:25.396Z	DEBUG	There is no deployed Service
2020-11-01T14:29:25.396Z	INFO	Attempting to fetch	{"deployed": "Persistent Volume Claim"}
2020-11-01T14:29:25.396Z	DEBUG	There is no deployed Persistent Volume Claim
2020-11-01T14:29:25.396Z	INFO	Attempting to fetch	{"deployed": "Secret"}
2020-11-01T14:29:25.396Z	DEBUG	There is no deployed Secret
2020-11-01T14:29:25.396Z	INFO	Attempting to fetch	{"deployed": "Service Account"}
2020-11-01T14:29:25.396Z	DEBUG	There is no deployed Service Account
2020-11-01T14:29:25.396Z	INFO	Attempting to fetch	{"deployed": "Route"}
2020-11-01T14:29:25.396Z	DEBUG	There is no deployed Route
2020-11-01T14:29:25.396Z	INFO	Attempting to fetch	{"deployed": "Ingress"}
2020-11-01T14:29:25.396Z	DEBUG	There is no deployed Ingress
2020-11-01T14:29:25.396Z	INFO	controllers.Nexus	Will 	{"create ": 1, ", update ": 0, ", delete ": 0, " instances of ": "v1.Deployment"}
2020-11-01T14:29:25.410Z	INFO	controllers.Nexus	Updating application status before leaving
2020-11-01T14:29:25.410Z	INFO	controllers.Nexus	Checking Deployment Status
2020-11-01T14:29:25.410Z	INFO	controllers.Nexus	Controller finished reconciliation
2020-11-01T14:29:25.410Z	ERROR	controller	Reconciler error	{"reconcilerGroup": "apps.m88i.io", "reconcilerKind": "Nexus", "controller": "nexus", "name": "nexus3", "namespace": "nexus", "error": "deployments.apps \"nexus3\" is forbidden: cannot set blockOwnerDeletion if an ownerReference refers to a resource you can't set finalizers on: , <nil>"}
github.com/go-logr/zapr.(*zapLogger).Error
	/go/pkg/mod/github.com/go-logr/[email protected]/zapr.go:128
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:246
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:218
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:197
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:155
k8s.io/apimachinery/pkg/util/wait.BackoffUntil
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:156
k8s.io/apimachinery/pkg/util/wait.JitterUntil
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:133
k8s.io/apimachinery/pkg/util/wait.Until
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:90

It gets stuck at the Deployment creation.

To Reproduce
Steps to reproduce the behavior:

  1. Deploy the operator with the nexus-operator.yaml file
  2. Deploy the example "CentOS No Persistence"

Expected behavior
Deployment and other resources to be created

Environment
OpenShift 4.5

@ricardozanini ricardozanini added bug 🐛 Something isn't working openshift This issue/PR is related to OpenShift deployments only labels Nov 1, 2020
@ricardozanini ricardozanini added this to the v0.4.0 milestone Nov 1, 2020
@ricardozanini
Member Author

The same error happens for whatever resource we try to create:

2020-11-01T14:35:00.133Z	ERROR	controller	Reconciler error	{"reconcilerGroup": "apps.m88i.io", "reconcilerKind": "Nexus", "controller": "nexus", "name": "nexus3", "namespace": "nexus", "error": "secrets \"nexus3\" is forbidden: cannot set blockOwnerDeletion if an ownerReference refers to a resource you can't set finalizers on: , <nil>"}
2020-11-01T14:35:00.133Z	ERROR	controller	Reconciler error	{"reconcilerGroup": "apps.m88i.io", "reconcilerKind": "Nexus", "controller": "nexus", "name": "nexus3", "namespace": "nexus", "error": "secrets \"nexus3\" is forbidden: cannot set blockOwnerDeletion if an ownerReference refers to a resource you can't set finalizers on: , <nil>"}

Not only deployments.

@LCaparelli
Member

Looks like we need to add permissions for finalizers.

It's strange, because we already have permission to use the update verb on deployment/finalizers, see this. Perhaps that's not enough on newer OCP versions? Not sure why, though.

We should be able to slowly add the permissions (the API object is in the format of <kind>/finalizers and belongs to <kind>'s group from what I looked around) and fix the issue by only adding what's necessary.

We could start by adding all verbs to deployment/finalizers and secret/finalizers (and whatever other Kinds we see on the logs, mentioning these ones because they're the only ones mentioned in the issue), to first make sure they will work correctly and we're really just facing a permission issue. Once done and confirmed, we can try adding one verb at a time.

Unfortunately I can't test this myself as I can't run CRC, but let me know if I can be of any help some other way.

@ricardozanini
Member Author

Not sure if it's a problem in the permissions or the way we are defining the owner and the options for the finalizers for each resource. I'll take a look later. 🤤

@ricardozanini ricardozanini self-assigned this Nov 6, 2020
@ricardozanini
Member Author

ricardozanini commented Nov 6, 2020

SDK 1.0.1 left this behind in their docs:
https://sdk.operatorframework.io/docs/building-operators/golang/tutorial/#specify-permissions-and-generate-rbac-manifests

Here are more details about this problem:
operator-framework/operator-sdk#3477

Admission controller on OpenShift is enforcing this:
https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/#ownerreferencespermissionenforcement

Won't happen on vanilla Kubernetes.
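
To illustrate why the existing deployment/finalizers permission did not help, here is a rough Go model of the rule that admission plugin enforces: setting blockOwnerDeletion requires the update verb on the finalizers subresource of the referenced owner, not of the object being created. This is a simplified sketch, not the plugin's actual code. The `PolicyRule` and `Owner` types and the helper functions are hypothetical, wildcard handling is reduced to a bare `*`, and the plural resource name `nexus` is an assumption (only the group `apps.m88i.io` appears in the logs).

```go
package main

import "fmt"

// PolicyRule is a pared-down stand-in for an RBAC rule
// (hypothetical type, not the k8s.io/api/rbac/v1 struct).
type PolicyRule struct {
	APIGroups []string
	Resources []string
	Verbs     []string
}

// Owner identifies the resource that an ownerReference points to.
type Owner struct {
	APIGroup string // taken from the reconciler log: "apps.m88i.io"
	Resource string // plural resource name; "nexus" is assumed here
}

// contains reports whether list holds want or the "*" wildcard.
func contains(list []string, want string) bool {
	for _, s := range list {
		if s == want || s == "*" {
			return true
		}
	}
	return false
}

// allowed reports whether any rule grants verb on resource in group.
func allowed(rules []PolicyRule, group, resource, verb string) bool {
	for _, r := range rules {
		if contains(r.APIGroups, group) && contains(r.Resources, resource) && contains(r.Verbs, verb) {
			return true
		}
	}
	return false
}

// canBlockOwnerDeletion models the check: the caller needs "update" on the
// OWNER's finalizers subresource, regardless of the dependent's kind.
func canBlockOwnerDeletion(rules []PolicyRule, owner Owner) bool {
	return allowed(rules, owner.APIGroup, owner.Resource+"/finalizers", "update")
}

func main() {
	owner := Owner{APIGroup: "apps.m88i.io", Resource: "nexus"}

	// Permissions on the dependent (Deployment) and on the owner itself,
	// but nothing for the owner's finalizers subresource.
	before := []PolicyRule{
		{APIGroups: []string{"apps"}, Resources: []string{"deployments", "deployments/finalizers"}, Verbs: []string{"update"}},
		{APIGroups: []string{"apps.m88i.io"}, Resources: []string{"nexus"}, Verbs: []string{"*"}},
	}

	// Same rules plus the owner's finalizers subresource. With controller-gen
	// this might be expressed as a marker such as (assumed resource name):
	// +kubebuilder:rbac:groups=apps.m88i.io,resources=nexus/finalizers,verbs=update
	after := append([]PolicyRule{}, before...)
	after = append(after, PolicyRule{
		APIGroups: []string{"apps.m88i.io"},
		Resources: []string{"nexus/finalizers"},
		Verbs:     []string{"update"},
	})

	fmt.Println(canBlockOwnerDeletion(before, owner)) // false
	fmt.Println(canBlockOwnerDeletion(after, owner))  // true
}
```

In this model the dependent's kind never enters the check, which is consistent with the same error showing up for both deployments and secrets.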

ricardozanini added a commit that referenced this issue Nov 6, 2020
ricardozanini added a commit that referenced this issue Nov 7, 2020
* Fix #186 - Add Nexus finalizers to ClusterRole

Signed-off-by: Ricardo Zanini <[email protected]>

* Reverting back to Controller Gen 0.3.0

Signed-off-by: Ricardo Zanini <[email protected]>