Jaeger Operator generating secrets on a regular basis #521

Closed
samcrutt9900 opened this issue Jul 12, 2019 · 21 comments · Fixed by #526
Labels
bug Something isn't working

Comments

@samcrutt9900

I have deployed Jaeger Operator version 1.13.1 onto OpenShift version 3.11.

The operator appears to be generating secrets for msa-jaeger-ui-proxy-token, msa-jaeger-ui-proxy-dockercfg, and msa-jaeger-token on a regular basis, so the number of secrets in the namespace is constantly growing.

I see there was a related issue, #286, which has been marked as resolved, but I am still seeing this behaviour.

@objectiser
Contributor

How often are the secrets being created?

@samcrutt9900
Author

They are being created quite frequently. In the last 20 hours, for example, 193 msa-jaeger-ui-proxy-token secrets have been created.

@jpkrohling added the bug label Jul 12, 2019
@jpkrohling
Contributor

@samcrutt9900 would you be able to set --log-level=debug and share the (anonymized) logs? I'll try to reproduce this, but a log would certainly make it easier to spot the problem.
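For reference, the flag is passed via the operator's Deployment args; a minimal sketch, assuming the stock manifest layout (container name and image tag here are illustrative):

containers:
  - name: jaeger-operator
    image: jaegertracing/jaeger-operator:1.13.1
    args: ["start", "--log-level=debug"]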

@samcrutt9900
Author

samcrutt9900 commented Jul 12, 2019

@jpkrohling Attached is the log. I believe you can see a new secret being created here:
time="2019-07-12T12:20:06Z" level=debug msg="updating service account"

jaeger_operator.log

@jpkrohling
Contributor

Interesting. I assume you did not update your Jaeger CR, and yet, there are changes to some objects:

time="2019-07-12T12:20:06Z" level=debug msg="updating service account" account=msa-jaeger-dev-ui-proxy instance=msa-jaeger-dev namespace=observability
time="2019-07-12T12:20:06Z" level=debug msg="updating service account" account=msa-jaeger-dev instance=msa-jaeger-dev namespace=observability
time="2019-07-12T12:20:06Z" level=debug msg="updating config maps" configMap=msa-jaeger-dev-ui-configuration instance=msa-jaeger-dev namespace=observability
time="2019-07-12T12:20:06Z" level=debug msg="updating config maps" configMap=msa-jaeger-dev-sampling-configuration instance=msa-jaeger-dev namespace=observability
time="2019-07-12T12:20:06Z" level=debug msg="updating service" instance=msa-jaeger-dev namespace=observability service=msa-jaeger-dev-collector-headless
time="2019-07-12T12:20:06Z" level=debug msg="updating service" instance=msa-jaeger-dev namespace=observability service=msa-jaeger-dev-collector
time="2019-07-12T12:20:06Z" level=debug msg="updating service" instance=msa-jaeger-dev namespace=observability service=msa-jaeger-dev-query
time="2019-07-12T12:20:06Z" level=debug msg="updating deployment" deployment=msa-jaeger-dev-collector instance=msa-jaeger-dev namespace=observability
time="2019-07-12T12:20:06Z" level=debug msg="updating deployment" deployment=msa-jaeger-dev-query instance=msa-jaeger-dev namespace=observability

There might be a bug in the logic that compares the expected vs. existing objects. I'm looking into this now.

@samcrutt9900
Author

samcrutt9900 commented Jul 12, 2019

Thanks for looking into this, and no, I did not update the Jaeger CR.

@samcrutt9900
Author

@jpkrohling One thing to add, in case it has any bearing on the situation: I have created my own Kubernetes secret in the observability (Jaeger) namespace. It is a standard secret with none of the jaeger-operator annotations, so it should not be managed by the operator.

@jpkrohling
Contributor

I think I need more info to reproduce this. Would you be able to come up with a minimal CR that triggers the problem? Do you have access to an OpenShift 4.x cluster?
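For reference, the minimal "simplest" example CR from the operator repository is roughly the following; everything beyond the name is left to defaults:

apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: simplest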

I tried both OpenShift 4.x with CRC (CodeReady Containers) and OpenShift 3.11 with minishift, and I couldn't reproduce this. I deployed the simplest CR and got 4 service account tokens for each service account after 5 minutes, all of them created at around the same time:

$ kubectl get secrets | grep simplest
simplest-dockercfg-h47p4            kubernetes.io/dockercfg               1         5m
simplest-dockercfg-nf5t9            kubernetes.io/dockercfg               1         5m
simplest-token-88qn8                kubernetes.io/service-account-token   4         5m
simplest-token-jcsq2                kubernetes.io/service-account-token   4         5m
simplest-token-nr6cb                kubernetes.io/service-account-token   4         5m
simplest-token-shfm4                kubernetes.io/service-account-token   4         5m
simplest-ui-oauth-proxy-tls         kubernetes.io/tls                     2         5m
simplest-ui-proxy-dockercfg-7xbnp   kubernetes.io/dockercfg               1         5m
simplest-ui-proxy-dockercfg-z2s4z   kubernetes.io/dockercfg               1         5m
simplest-ui-proxy-token-8nbkw       kubernetes.io/service-account-token   4         5m
simplest-ui-proxy-token-g9v7b       kubernetes.io/service-account-token   4         5m
simplest-ui-proxy-token-mztzf       kubernetes.io/service-account-token   4         5m
simplest-ui-proxy-token-t4sjk       kubernetes.io/service-account-token   4         5m

I then changed the CR to trigger a reconciliation, and got two more:

$ kubectl get secrets  | grep simplest
simplest-dockercfg-h47p4            kubernetes.io/dockercfg               1         8m
simplest-dockercfg-nf5t9            kubernetes.io/dockercfg               1         8m
simplest-dockercfg-p5tzt            kubernetes.io/dockercfg               1         30s
simplest-token-88qn8                kubernetes.io/service-account-token   4         8m
simplest-token-jcsq2                kubernetes.io/service-account-token   4         8m
simplest-token-nr6cb                kubernetes.io/service-account-token   4         8m
simplest-token-qw7dk                kubernetes.io/service-account-token   4         30s
simplest-token-shfm4                kubernetes.io/service-account-token   4         8m
simplest-token-sk9rm                kubernetes.io/service-account-token   4         30s
simplest-ui-oauth-proxy-tls         kubernetes.io/tls                     2         8m
simplest-ui-proxy-dockercfg-7xbnp   kubernetes.io/dockercfg               1         8m
simplest-ui-proxy-dockercfg-tmwxq   kubernetes.io/dockercfg               1         30s
simplest-ui-proxy-dockercfg-z2s4z   kubernetes.io/dockercfg               1         8m
simplest-ui-proxy-token-8nbkw       kubernetes.io/service-account-token   4         8m
simplest-ui-proxy-token-g9v7b       kubernetes.io/service-account-token   4         8m
simplest-ui-proxy-token-mqh4n       kubernetes.io/service-account-token   4         30s
simplest-ui-proxy-token-mztzf       kubernetes.io/service-account-token   4         8m
simplest-ui-proxy-token-t4sjk       kubernetes.io/service-account-token   4         8m
simplest-ui-proxy-token-z974b       kubernetes.io/service-account-token   4         30s

Every change then seems to create two tokens, but there's no constant stream of new tokens appearing for me, as the bug report suggests. I guess I'm just not triggering the bug.

For the record: ideally, we'd have only one token per change, but the current version of the Jaeger Operator persists a complete CR with empty fields after the first reconciliation loop, which explains the duplicate reconciliation (one for the original CR, one for the CR with empty fields). This should be fixed by #517, as we'd then omit the empty fields: I expect the CR after the first reconciliation to be the same as what the user specified, so no changes would be triggered. In any case, this does not explain why there are hundreds of service account tokens there.
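As a rough illustration of the kind of change (hypothetical field names, not the actual Jaeger CRD types): with omitempty tags, and pointers for nested structs, fields the user never set are dropped when the CR is marshalled and persisted, so the stored object stays identical to what was originally applied and does not look like a change on the next reconciliation.

package v1

// Hypothetical sketch of CR spec types, not the real Jaeger definitions.
// Unset fields are omitted on serialization instead of being persisted as
// empty values that differ from the CR the user applied.
type JaegerSpec struct {
	Strategy string           `json:"strategy,omitempty"`
	Agent    *JaegerAgentSpec `json:"agent,omitempty"`
}

type JaegerAgentSpec struct {
	Strategy string `json:"strategy,omitempty"`
}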

@samcrutt9900
Author

I will try out the simplest CR on our platform and see if I get the same issue.
If I do not see the issue, I guess it could be specific to the CR I've created for our deployment, which is based on the production strategy.
I will report back.

@samcrutt9900
Author

I tried deploying the same simple CR as you, @jpkrohling, but got the same result as before.
When the instance was first created, I had 6 tokens created rather than 4 like you had. After roughly 15 minutes, new ones were created:

simplest-dockercfg-7j2kx            kubernetes.io/dockercfg               1         14m
simplest-dockercfg-l78rd            kubernetes.io/dockercfg               1         14m
simplest-dockercfg-r5jh9            kubernetes.io/dockercfg               1         14m
simplest-dockercfg-rgp62            kubernetes.io/dockercfg               1         1m
simplest-token-62t7f                kubernetes.io/service-account-token   4         14m
simplest-token-jxtzq                kubernetes.io/service-account-token   4         14m
simplest-token-l4dth                kubernetes.io/service-account-token   4         14m
simplest-token-l8zhj                kubernetes.io/service-account-token   4         14m
simplest-token-nfldw                kubernetes.io/service-account-token   4         14m
simplest-token-rnkjp                kubernetes.io/service-account-token   4         1m
simplest-token-slb6h                kubernetes.io/service-account-token   4         1m
simplest-token-vb6fb                kubernetes.io/service-account-token   4         14m
simplest-ui-oauth-proxy-tls         kubernetes.io/tls                     2         14m
simplest-ui-proxy-dockercfg-5pwkb   kubernetes.io/dockercfg               1         14m
simplest-ui-proxy-dockercfg-bn7h8   kubernetes.io/dockercfg               1         14m
simplest-ui-proxy-dockercfg-c7wkn   kubernetes.io/dockercfg               1         1m
simplest-ui-proxy-dockercfg-mrnsv   kubernetes.io/dockercfg               1         14m
simplest-ui-proxy-token-42rb5       kubernetes.io/service-account-token   4         14m
simplest-ui-proxy-token-58mcj       kubernetes.io/service-account-token   4         14m
simplest-ui-proxy-token-gnqp6       kubernetes.io/service-account-token   4         14m
simplest-ui-proxy-token-l5wck       kubernetes.io/service-account-token   4         1m
simplest-ui-proxy-token-n2xtn       kubernetes.io/service-account-token   4         14m
simplest-ui-proxy-token-qgzp7       kubernetes.io/service-account-token   4         1m
simplest-ui-proxy-token-t2p8r       kubernetes.io/service-account-token   4         14m
simplest-ui-proxy-token-xmspl       kubernetes.io/service-account-token   4         14m

@samcrutt9900
Author

@jkandasa After some more investigation, I believe I am seeing the same behaviour as you, whereby extra service account secrets are created for a change to the CR. The difference is that the Jaeger CR reconciliation on my deployment seems to be triggered on a regular basis by this:

W0712 12:20:05.561159       1 reflector.go:256] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:93: watch of *v1.Jaeger ended with: The resourceVersion for the provided watch is too old.

So I guess I need to understand why I get so many "watch too old" messages, and whether the change in #517 will resolve it.
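One way to gauge how frequently this happens (the deployment name and namespace here are assumptions) is to count the occurrences in the operator log:

kubectl logs deployment/jaeger-operator -n observability | grep -c "The resourceVersion for the provided watch is too old"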

@jkandasa
Member

@jpkrohling I guess the previous comment is for you.

@jpkrohling
Contributor

@samcrutt9900 I'll give this issue some more attention today, but while I try to figure out why you are getting so many reconciliation loops, would you be willing to test a temporary image built from master plus #517?

@samcrutt9900
Author

@jpkrohling Yes, I can test a temporary image with #517. What is the best way to build this image?

@jpkrohling
Contributor

Awesome. I just fixed the merge conflicts from #517 and published an image for you: jpkroehling/jaeger-operator:466-ValidateCR

@samcrutt9900
Author

@jpkrohling I am running the image provided above, but I see some issues in the log around the cluster role that the operator SA uses.

time="2019-07-15T08:55:35Z" level=info msg="The service account running this operator does not have the role 'system:auth-delegator', consider granting it for additional capabilities"

And then I repeatedly see this:

E0715 08:55:36.083534 1 reflector.go:125] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:93: Failed to list *v1.ClusterRoleBinding: clusterrolebindings.rbac.authorization.k8s.io is forbidden: User "system:serviceaccount:observability:jaeger-operator" cannot list clusterrolebindings.rbac.authorization.k8s.io at the cluster scope: no RBAC policy matched

Is there a change to the cluster role that I need to apply as part of this change?

@jpkrohling
Contributor

Sorry, yes, there's a new rule in the role.yaml file. Just apply deploy/role.yaml again (kubectl apply -f https://raw.githubusercontent.com/jaegertracing/jaeger-operator/master/deploy/role.yaml) and the second message should disappear.
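The authoritative definition is in deploy/role.yaml; as a sketch, the extra permission that silences the second message would look something like the rule below (whether it lives in a Role or a ClusterRole, and the exact verbs, are determined by the real file):

- apiGroups:
  - rbac.authorization.k8s.io
  resources:
  - clusterrolebindings
  verbs:
  - get
  - list
  - watch
  - create
  - delete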

The first message is OK; it's just the operator telling you that it might enable extra features if given more permissions. If you are curious about this feature, look for ClusterRoleBinding in the README.

@samcrutt9900
Author

Unfortunately I am still seeing secrets being generated when a change to the CR happens.
The reconciliation was again triggered by a "watch too old" message:

W0715 09:32:57.563862       1 reflector.go:256] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:93: watch of *v1.Jaeger ended with: The resourceVersion for the provided watch is too old.

Which resulted in this being logged by the operator:

time="2019-07-15T09:32:58Z" level=debug msg="Reconciling Jaeger" execution="2019-07-15 09:32:58.571917019 +0000 UTC" instance=msa-jaeger-dev namespace=observability
time="2019-07-15T09:32:58Z" level=debug msg="Strategy chosen" instance=msa-jaeger-dev namespace=observability strategy=production
time="2019-07-15T09:32:58Z" level=debug msg="Assembling the UI configmap" instance=msa-jaeger-dev namespace=observability
time="2019-07-15T09:32:58Z" level=debug msg="Assembling the Sampling configmap" instance=msa-jaeger-dev namespace=observability
time="2019-07-15T09:32:58Z" level=debug msg="skipping agent daemonset" instance=msa-jaeger-dev namespace=observability strategy=
time="2019-07-15T09:32:58Z" level=debug msg="assembling a collector deployment" instance=msa-jaeger-dev namespace=observability
time="2019-07-15T09:32:58Z" level=debug msg="Assembling a query deployment" instance=msa-jaeger-dev namespace=observability
time="2019-07-15T09:32:58Z" level=debug msg="injecting sidecar" deployment=msa-jaeger-dev-query instance=msa-jaeger-dev namespace=observability
time="2019-07-15T09:32:58Z" level=debug msg="updating service account" account=msa-jaeger-dev-ui-proxy instance=msa-jaeger-dev namespace=observability
time="2019-07-15T09:32:58Z" level=debug msg="updating service account" account=msa-jaeger-dev instance=msa-jaeger-dev namespace=observability
time="2019-07-15T09:32:58Z" level=debug msg="updating config maps" configMap=msa-jaeger-dev-ui-configuration instance=msa-jaeger-dev namespace=observability
time="2019-07-15T09:32:58Z" level=debug msg="updating config maps" configMap=msa-jaeger-dev-sampling-configuration instance=msa-jaeger-dev namespace=observability
time="2019-07-15T09:32:58Z" level=debug msg="updating service" instance=msa-jaeger-dev namespace=observability service=msa-jaeger-dev-query
time="2019-07-15T09:32:58Z" level=debug msg="updating service" instance=msa-jaeger-dev namespace=observability service=msa-jaeger-dev-collector-headless
time="2019-07-15T09:32:58Z" level=debug msg="updating service" instance=msa-jaeger-dev namespace=observability service=msa-jaeger-dev-collector
time="2019-07-15T09:32:58Z" level=debug msg="updating deployment" deployment=msa-jaeger-dev-collector instance=msa-jaeger-dev namespace=observability
time="2019-07-15T09:32:58Z" level=debug msg="updating deployment" deployment=msa-jaeger-dev-query instance=msa-jaeger-dev namespace=observability
time="2019-07-15T09:32:58Z" level=debug msg="Deployment has stabilized" desired=1 name=msa-jaeger-dev-collector namespace=observability ready=1
time="2019-07-15T09:32:58Z" level=debug msg="Deployment has stabilized" desired=1 name=msa-jaeger-dev-query namespace=observability ready=1
time="2019-07-15T09:32:58Z" level=debug msg="updating route" instance=msa-jaeger-dev namespace=observability route=msa-jaeger-dev
time="2019-07-15T09:32:58Z" level=debug msg="Reconciling Jaeger completed - reschedule in 5 seconds" execution="2019-07-15 09:32:58.571917019 +0000 UTC" instance=msa-jaeger-dev namespace=observability

And the creation of these additional secrets:

msa-jaeger-dev-ui-proxy-token-m9kgw       kubernetes.io/service-account-token   4         5m
msa-jaeger-dev-ui-proxy-token-7s658       kubernetes.io/service-account-token   4         5m
msa-jaeger-dev-token-r6ptt                kubernetes.io/service-account-token   4         5m
msa-jaeger-dev-token-72wqq                kubernetes.io/service-account-token   4         5m
msa-jaeger-dev-dockercfg-pjj8h            kubernetes.io/dockercfg               1         5m
msa-jaeger-dev-ui-proxy-dockercfg-pjtcc   kubernetes.io/dockercfg               1         5m

@jpkrohling
Contributor

generation of secrets when a change to the CR happens

Depending on the change, this should indeed happen. I just found the root cause for this behavior: OpenShift is injecting the token into the ServiceAccount object, and the operator will detect that as something that needs to be "reconciled". In this case, the operator is removing the service account token from the service account, which is causing OpenShift to create a new one and update the SA again.

I'll publish a new image soon fixing this.
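A minimal sketch of the idea behind the fix, assuming a controller-runtime style client (this is not the operator's actual code, and updateServiceAccount is a hypothetical helper): copy the server-populated fields from the live ServiceAccount into the desired one before updating, so the injected token and dockercfg references are not wiped.

package controller

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// updateServiceAccount preserves the fields that OpenShift manages on the
// live object (injected token and dockercfg secret references) so that the
// update does not remove them and force OpenShift to generate new secrets.
func updateServiceAccount(ctx context.Context, c client.Client, desired *corev1.ServiceAccount) error {
	existing := &corev1.ServiceAccount{}
	key := client.ObjectKey{Namespace: desired.Namespace, Name: desired.Name}
	if err := c.Get(ctx, key, existing); err != nil {
		return err
	}

	// Carry over server-managed fields instead of overwriting them with
	// empty values from the freshly assembled (desired) object.
	desired.Secrets = existing.Secrets
	desired.ImagePullSecrets = existing.ImagePullSecrets
	desired.ResourceVersion = existing.ResourceVersion

	return c.Update(ctx, desired)
}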

@samcrutt9900
Author

samcrutt9900 commented Jul 15, 2019

@jpkrohling I tested with the image built from the above pull request and can confirm that I no longer see secrets being created each time a change to the Jaeger CR is picked up.
Thanks very much for all your help on this.

@jpkrohling
Contributor

Thanks for the confirmation, for the bug report and for the patience :)
