Lock reconciliation per instance #280

Closed
jpkrohling opened this issue Mar 7, 2019 · 4 comments
Labels
bug Something isn't working

Comments

@jpkrohling
Contributor

Currently, two reconciliation iterations can run concurrently for a given instance. One way to trigger this is to execute an apply followed by a delete, like:

$ kubectl apply -f deploy/examples/simple-prod.yaml 
jaeger.jaegertracing.io/simple-prod created
$ kubectl delete -f deploy/examples/simple-prod.yaml 
jaeger.jaegertracing.io "simple-prod" deleted

This is a sample debug log from the operator for the situation above:

DEBU[0075] Reconciling Jaeger                            execution="2019-03-07 14:35:42.874115 +0000 UTC" instance=simple-prod namespace=default
DEBU[0075] Strategy chosen                               instance=simple-prod namespace=default strategy=production
DEBU[0075] Assembling the Sampling configmap             instance=simple-prod namespace=default
DEBU[0075] skipping agent daemonset                      instance=simple-prod namespace=default strategy=
DEBU[0075] assembling a collector deployment             instance=simple-prod namespace=default
DEBU[0075] Assembling a query deployment                 instance=simple-prod namespace=default
DEBU[0075] injecting sidecar                             deployment=simple-prod-query instance=simple-prod namespace=default
DEBU[0075] creating service account                      account=simple-prod instance=simple-prod namespace=default
DEBU[0075] creating config maps                          configMap=simple-prod-sampling-configuration instance=simple-prod namespace=default
DEBU[0075] creating cronjob                              cronjob=simple-prod-spark-dependencies instance=simple-prod namespace=default
DEBU[0075] creating cronjob                              cronjob=simple-prod-es-index-cleaner instance=simple-prod namespace=default
DEBU[0075] creating service                              instance=simple-prod namespace=default service=simple-prod-collector
DEBU[0075] creating service                              instance=simple-prod namespace=default service=simple-prod-query
DEBU[0075] creating deployment                           deployment=simple-prod-collector instance=simple-prod namespace=default
DEBU[0075] Reconciling Deployment                        name=simple-prod-collector namespace=default
DEBU[0075] annotation not present, not injecting         deployment=simple-prod-collector namespace=default
DEBU[0075] creating deployment                           deployment=simple-prod-query instance=simple-prod namespace=default
DEBU[0075] Deployment doesn't exist yet.                 name=simple-prod-query namespace=default
DEBU[0075] Reconciling Deployment                        name=simple-prod-query namespace=default
DEBU[0075] Reconciling Deployment                        name=simple-prod-collector namespace=default
DEBU[0075] annotation not present, not injecting         deployment=simple-prod-collector namespace=default
DEBU[0075] Reconciling Deployment                        name=simple-prod-query namespace=default
DEBU[0075] Reconciling Deployment                        name=simple-prod-collector namespace=default
DEBU[0075] annotation not present, not injecting         deployment=simple-prod-collector namespace=default
DEBU[0075] Reconciling Deployment                        name=simple-prod-query namespace=default
DEBU[0075] Reconciling Deployment                        name=simple-prod-collector namespace=default
DEBU[0075] annotation not present, not injecting         deployment=simple-prod-collector namespace=default
DEBU[0075] Reconciling Deployment                        name=simple-prod-query namespace=default
DEBU[0075] Reconciling Deployment                        name=simple-prod-query namespace=default
DEBU[0075] Reconciling Deployment                        name=simple-prod-collector namespace=default
DEBU[0076] Deployment doesn't exist yet.                 name=simple-prod-query namespace=default
DEBU[0077] Deployment doesn't exist yet.                 name=simple-prod-query namespace=default
DEBU[0078] Deployment doesn't exist yet.                 name=simple-prod-query namespace=default
DEBU[0079] Deployment doesn't exist yet.                 name=simple-prod-query namespace=default
DEBU[0080] Deployment doesn't exist yet.                 name=simple-prod-query namespace=default
DEBU[0081] Deployment doesn't exist yet.                 name=simple-prod-query namespace=default
DEBU[0082] Deployment doesn't exist yet.                 name=simple-prod-query namespace=default
DEBU[0083] Deployment doesn't exist yet.                 name=simple-prod-query namespace=default
DEBU[0084] Deployment doesn't exist yet.                 name=simple-prod-query namespace=default
DEBU[0085] Deployment doesn't exist yet.                 name=simple-prod-query namespace=default
DEBU[0086] Deployment doesn't exist yet.                 name=simple-prod-query namespace=default
DEBU[0087] Deployment doesn't exist yet.                 name=simple-prod-query namespace=default
DEBU[0088] Deployment doesn't exist yet.                 name=simple-prod-query namespace=default
DEBU[0089] Deployment doesn't exist yet.                 name=simple-prod-query namespace=default
DEBU[0090] Deployment doesn't exist yet.                 name=simple-prod-query namespace=default

(the logs are based on #279)
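The title asks for reconciliation to be locked per instance. The thread does not show how the operator implements this, so the following is only a minimal Go sketch of the idea, assuming a hypothetical `reconcile` wrapper and keying instances by `namespace/name`: one mutex per instance serializes work on the same Jaeger CR, while different instances can still reconcile in parallel.

```go
package main

import (
	"fmt"
	"sync"
)

// instanceLocks hands out one mutex per Jaeger instance, keyed by
// "namespace/name" (an assumption made for this sketch).
type instanceLocks struct {
	mu    sync.Mutex             // guards the locks map itself
	locks map[string]*sync.Mutex // one lock per instance key
}

func newInstanceLocks() *instanceLocks {
	return &instanceLocks{locks: make(map[string]*sync.Mutex)}
}

// lockFor returns the mutex for the given instance, creating it on first use.
func (l *instanceLocks) lockFor(namespace, name string) *sync.Mutex {
	key := namespace + "/" + name
	l.mu.Lock()
	defer l.mu.Unlock()
	m, ok := l.locks[key]
	if !ok {
		m = &sync.Mutex{}
		l.locks[key] = m
	}
	return m
}

// reconcile is a hypothetical stand-in for the operator's reconcile step:
// only one goroutine at a time runs work for a given instance.
func (l *instanceLocks) reconcile(namespace, name string, work func()) {
	m := l.lockFor(namespace, name)
	m.Lock()
	defer m.Unlock()
	work()
}

func main() {
	locks := newInstanceLocks()
	var wg sync.WaitGroup
	var mu sync.Mutex // protects the counters below
	active, maxActive := 0, 0

	// Simulate an "apply" and a "delete" event for the same instance.
	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			locks.reconcile("default", "simple-prod", func() {
				mu.Lock()
				active++
				if active > maxActive {
					maxActive = active
				}
				mu.Unlock()
				// ... reconciliation work would happen here ...
				mu.Lock()
				active--
				mu.Unlock()
			})
		}()
	}
	wg.Wait()
	fmt.Println("max concurrent reconciliations:", maxActive) // prints 1
}
```

In this sketch the map only grows; a real implementation would presumably drop an instance's entry once the instance is deleted.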

jpkrohling added the bug (Something isn't working) label on Apr 2, 2019
@jpkrohling
Contributor Author

This is still happening. Here is what the logs currently look like:

DEBU[0016] Reconciling Jaeger                            execution="2019-07-16 12:01:32.199478249 +0000 UTC" instance=simplest namespace=default
INFO[0016] Storage type not provided. Falling back to 'memory'  instance=simplest namespace=default
DEBU[0016] Strategy chosen                               instance=simplest namespace=default strategy=allInOne
DEBU[0016] Creating all-in-one deployment                instance=simplest namespace=default
DEBU[0016] Assembling the UI configmap                   instance=simplest namespace=default
DEBU[0016] Assembling the Sampling configmap             instance=simplest namespace=default
DEBU[0016] Assembling an all-in-one deployment           instance=simplest namespace=default
DEBU[0016] skipping agent daemonset                      instance=simplest namespace=default strategy=
DEBU[0016] creating service account                      account=simplest-ui-proxy instance=simplest namespace=default
DEBU[0016] creating service account                      account=simplest instance=simplest namespace=default
DEBU[0016] creating config maps                          configMap=simplest-ui-configuration instance=simplest namespace=default
DEBU[0016] creating config maps                          configMap=simplest-sampling-configuration instance=simplest namespace=default
DEBU[0017] creating service                              instance=simplest namespace=default service=simplest-collector
DEBU[0017] creating service                              instance=simplest namespace=default service=simplest-query
DEBU[0017] creating service                              instance=simplest namespace=default service=simplest-agent
DEBU[0017] creating service                              instance=simplest namespace=default service=simplest-collector-headless
DEBU[0017] creating deployment                           deployment=simplest instance=simplest namespace=default
DEBU[0017] Reconciling Deployment                        name=simplest namespace=default
DEBU[0017] annotation not present, not injecting         deployment=simplest namespace=default
DEBU[0017] Deployment has stabilized                     desired=0 name=simplest namespace=default ready=0
DEBU[0017] Reconciling Deployment                        name=simplest namespace=default
DEBU[0017] annotation not present, not injecting         deployment=simplest namespace=default
DEBU[0017] Reconciling Deployment                        name=simplest namespace=default
DEBU[0017] annotation not present, not injecting         deployment=simplest namespace=default
DEBU[0017] Reconciling Deployment                        name=simplest namespace=default
DEBU[0017] creating route                                instance=simplest namespace=default route=simplest
ERRO[0017] failed to store back the current CustomResource  error="Operation cannot be fulfilled on jaegers.jaegertracing.io \"simplest\": StorageError: invalid object, Code: 4, Key: /kubernetes.io/jaegertracing.io/jaegers/default/simplest, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: 73a09455-a7c1-11e9-9b16-52fdfc072182, UID in object meta: " execution="2019-07-16 12:01:32.199478249 +0000 UTC" instance=simplest namespace=default
DEBU[0018] Reconciling Jaeger                            execution="2019-07-16 12:01:34.364945973 +0000 UTC" instance=simplest namespace=default
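The ERRO line above appears to come from the operator writing back a CR that was deleted while it was still being reconciled (note `ResourceVersion: 0` and the empty UID on the stored object). A minimal sketch of a guard, assuming a hypothetical `instanceGetter` lookup and a reduced `storedInstance` type: re-read the instance before storing it back, and skip the write if it is gone or has been replaced by a new incarnation with a different UID.

```go
package main

import "fmt"

// storedInstance is a hypothetical, minimal view of a Jaeger CR: just
// enough to identify one specific incarnation of a named instance.
type storedInstance struct {
	Name string
	UID  string
}

// instanceGetter is a hypothetical lookup against the API server;
// found is false when the instance no longer exists.
type instanceGetter func(name string) (inst storedInstance, found bool)

// shouldStoreBack reports whether the reconciler may write the CR back.
// It refuses when the instance is gone, or when a new instance with the
// same name but a different UID has replaced the one being reconciled.
func shouldStoreBack(reconciled storedInstance, get instanceGetter) bool {
	current, found := get(reconciled.Name)
	if !found {
		return false // deleted mid-reconciliation: nothing to store back
	}
	return current.UID == reconciled.UID
}

func main() {
	reconciled := storedInstance{Name: "simplest", UID: "73a09455-a7c1-11e9-9b16-52fdfc072182"}
	// Simulate "kubectl delete" having already removed the instance.
	deleted := func(string) (storedInstance, bool) { return storedInstance{}, false }
	fmt.Println(shouldStoreBack(reconciled, deleted)) // prints false
}
```

The check is inherently racy (the instance can still disappear between the check and the write), so the write itself must still tolerate failure; the guard only turns the common apply-then-delete case into a clean skip instead of an error.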

@jkandasa
Member

@jpkrohling I hit this issue when invalid options are supplied to the Jaeger services.
Steps to reproduce:

  • Deploy the CR file (oc create -f cr-file.yaml).
  • Wait until all the services come up (the Jaeger query service will not come up).
  • Undeploy the Jaeger services (oc delete -f cr-file.yaml).
  • Note the jaeger-operator log.
  • This error continues for approximately 5 minutes, during which Jaeger services cannot be created.

CR file (an invalid option is supplied for the Jaeger query service):

apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: jaegerqe
spec:
  strategy: production
  query:
    replicas: 1
    options:
      invalidOption: "invalid option"
  storage:
    type: elasticsearch
    elasticsearch:
      nodeCount: 1
      resources:

Operator log:

time="2019-09-20T16:54:35Z" level=debug msg="Waiting for deployment to stabilize" desired=1 name=jaegerqe-query namespace=jkandasa ready=0
time="2019-09-20T16:54:36Z" level=debug msg="The 'platform' option is explicitly set" platform=openshift
time="2019-09-20T16:54:36Z" level=debug msg="The 'es-provision' option is explicitly set" es-provision=true
time="2019-09-20T16:54:36Z" level=debug msg="Waiting for deployment to stabilize" desired=1 name=jaegerqe-query namespace=jkandasa ready=0
time="2019-09-20T16:54:37Z" level=debug msg="Waiting for deployment to stabilize" desired=1 name=jaegerqe-query namespace=jkandasa ready=0
time="2019-09-20T16:54:38Z" level=debug msg="Waiting for deployment to stabilize" desired=1 name=jaegerqe-query namespace=jkandasa ready=0
time="2019-09-20T16:54:39Z" level=debug msg="Waiting for deployment to stabilize" desired=1 name=jaegerqe-query namespace=jkandasa ready=0
time="2019-09-20T16:54:40Z" level=debug msg="Waiting for deployment to stabilize" desired=1 name=jaegerqe-query namespace=jkandasa ready=0
time="2019-09-20T16:54:41Z" level=debug msg="Reconciling Deployment" name=jaegerqe-collector namespace=jkandasa
time="2019-09-20T16:54:41Z" level=debug msg="Reconciling Deployment" name=jaegerqe-query namespace=jkandasa
time="2019-09-20T16:54:41Z" level=debug msg="Reconciling Deployment" name=elasticsearch-cdm-jkandasajaegerqe-1 namespace=jkandasa
time="2019-09-20T16:54:41Z" level=debug msg="The 'platform' option is explicitly set" platform=openshift
time="2019-09-20T16:54:41Z" level=debug msg="The 'es-provision' option is explicitly set" es-provision=true
time="2019-09-20T16:54:41Z" level=debug msg="Deployment doesn't exist yet." name=jaegerqe-query namespace=jkandasa
time="2019-09-20T16:54:42Z" level=debug msg="Deployment doesn't exist yet." name=jaegerqe-query namespace=jkandasa
time="2019-09-20T16:54:43Z" level=debug msg="Deployment doesn't exist yet." name=jaegerqe-query namespace=jkandasa
time="2019-09-20T16:54:44Z" level=debug msg="Deployment doesn't exist yet." name=jaegerqe-query namespace=jkandasa
time="2019-09-20T16:54:45Z" level=debug msg="Deployment doesn't exist yet." name=jaegerqe-query namespace=jkandasa
time="2019-09-20T16:54:46Z" level=debug msg="The 'platform' option is explicitly set" platform=openshift
time="2019-09-20T16:54:46Z" level=debug msg="The 'es-provision' option is explicitly set" es-provision=true
time="2019-09-20T16:54:46Z" level=debug msg="Deployment doesn't exist yet." name=jaegerqe-query namespace=jkandasa
time="2019-09-20T16:54:47Z" level=debug msg="Deployment doesn't exist yet." name=jaegerqe-query namespace=jkandasa
time="2019-09-20T16:54:48Z" level=debug msg="Deployment doesn't exist yet." name=jaegerqe-query namespace=jkandasa
time="2019-09-20T16:54:49Z" level=debug msg="Deployment doesn't exist yet." name=jaegerqe-query namespace=jkandasa
time="2019-09-20T16:54:50Z" level=debug msg="Deployment doesn't exist yet." name=jaegerqe-query namespace=jkandasa
time="2019-09-20T16:54:51Z" level=debug msg="The 'platform' option is explicitly set" platform=openshift
time="2019-09-20T16:54:51Z" level=debug msg="The 'es-provision' option is explicitly set" es-provision=true
time="2019-09-20T16:54:51Z" level=debug msg="Deployment doesn't exist yet." name=jaegerqe-query namespace=jkandasa
time="2019-09-20T16:54:52Z" level=debug msg="Deployment doesn't exist yet." name=jaegerqe-query namespace=jkandasa
time="2019-09-20T16:54:53Z" level=debug msg="Deployment doesn't exist yet." name=jaegerqe-query namespace=jkandasa
time="2019-09-20T16:54:54Z" level=debug msg="Deployment doesn't exist yet." name=jaegerqe-query namespace=jkandasa
time="2019-09-20T16:54:55Z" level=debug msg="Deployment doesn't exist yet." name=jaegerqe-query namespace=jkandasa
time="2019-09-20T16:54:56Z" level=debug msg="The 'platform' option is explicitly set" platform=openshift
time="2019-09-20T16:54:56Z" level=debug msg="The 'es-provision' option is explicitly set" es-provision=true
time="2019-09-20T16:54:56Z" level=debug msg="Deployment doesn't exist yet." name=jaegerqe-query namespace=jkandasa
time="2019-09-20T16:54:57Z" level=debug msg="Deployment doesn't exist yet." name=jaegerqe-query namespace=jkandasa
time="2019-09-20T16:54:58Z" level=debug msg="Deployment doesn't exist yet." name=jaegerqe-query namespace=jkandasa
time="2019-09-20T16:54:59Z" level=debug msg="Deployment doesn't exist yet." name=jaegerqe-query namespace=jkandasa
time="2019-09-20T16:55:00Z" level=debug msg="Deployment doesn't exist yet." name=jaegerqe-query namespace=jkandasa
time="2019-09-20T16:55:01Z" level=debug msg="The 'platform' option is explicitly set" platform=openshift
time="2019-09-20T16:55:01Z" level=debug msg="The 'es-provision' option is explicitly set" es-provision=true
time="2019-09-20T16:55:01Z" level=debug msg="Deployment doesn't exist yet." name=jaegerqe-query namespace=jkandasa
time="2019-09-20T16:55:02Z" level=debug msg="Deployment doesn't exist yet." name=jaegerqe-query namespace=jkandasa
time="2019-09-20T16:55:03Z" level=debug msg="Deployment doesn't exist yet." name=jaegerqe-query namespace=jkandasa
time="2019-09-20T16:55:04Z" level=debug msg="Deployment doesn't exist yet." name=jaegerqe-query namespace=jkandasa
time="2019-09-20T16:55:05Z" level=debug msg="Deployment doesn't exist yet." name=jaegerqe-query namespace=jkandasa
time="2019-09-20T16:55:06Z" level=debug msg="The 'platform' option is explicitly set" platform=openshift
time="2019-09-20T16:55:06Z" level=debug msg="The 'es-provision' option is explicitly set" es-provision=true
time="2019-09-20T16:55:06Z" level=debug msg="Deployment doesn't exist yet." name=jaegerqe-query namespace=jkandasa
time="2019-09-20T16:55:07Z" level=debug msg="Deployment doesn't exist yet." name=jaegerqe-query namespace=jkandasa
time="2019-09-20T16:55:08Z" level=debug msg="Deployment doesn't exist yet." name=jaegerqe-query namespace=jkandasa
time="2019-09-20T16:55:09Z" level=debug msg="Deployment doesn't exist yet." name=jaegerqe-query namespace=jkandasa
time="2019-09-20T16:55:11Z" level=debug msg="Deployment doesn't exist yet." name=jaegerqe-query namespace=jkandasa
time="2019-09-20T16:55:11Z" level=debug msg="The 'platform' option is explicitly set" platform=openshift
time="2019-09-20T16:55:11Z" level=debug msg="The 'es-provision' option is explicitly set" es-provision=true
time="2019-09-20T16:55:11Z" level=debug msg="Deployment doesn't exist yet." name=jaegerqe-query namespace=jkandasa
time="2019-09-20T16:55:12Z" level=debug msg="Deployment doesn't exist yet." name=jaegerqe-query namespace=jkandasa
time="2019-09-20T16:55:13Z" level=debug msg="Deployment doesn't exist yet." name=jaegerqe-query namespace=jkandasa
time="2019-09-20T16:55:14Z" level=debug msg="Deployment doesn't exist yet." name=jaegerqe-query namespace=jkandasa
time="2019-09-20T16:55:15Z" level=debug msg="Deployment doesn't exist yet." name=jaegerqe-query namespace=jkandasa
time="2019-09-20T16:55:16Z" level=debug msg="The 'platform' option is explicitly set" platform=openshift

@jpkrohling
Contributor Author

@jkandasa although the underlying problem is similar, the causes are different. This issue should be closed, as the lock has already been implemented ("ManagedBy" annotation, IIRC), but we've recently seen the same problem arise from a different cause during e2e tests (cc @kevinearls). So far, every case where this happened was an edge case, but yours seems more likely to occur in the real world.

Could you please open a new issue with your last comment?
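The comment above recalls, from memory, that the lock was implemented with a "ManagedBy" annotation. The exact key and semantics are not given in this thread, so the following is only a hypothetical sketch of annotation-based ownership, assuming a made-up key `example.io/managed-by`: an instance is reconciled only if it is unclaimed or already claimed by the current operator.

```go
package main

import "fmt"

// managedByKey is a hypothetical annotation key; the thread only mentions
// a "ManagedBy" annotation without giving its exact name.
const managedByKey = "example.io/managed-by"

// claim decides whether the operator identified by self may reconcile an
// instance with the given annotations. An unclaimed instance is claimed by
// returning a copy with self recorded as owner; an instance owned by
// someone else is left untouched and reported as not claimable.
func claim(annotations map[string]string, self string) (map[string]string, bool) {
	if owner, ok := annotations[managedByKey]; ok && owner != "" && owner != self {
		return annotations, false
	}
	updated := make(map[string]string, len(annotations)+1)
	for k, v := range annotations {
		updated[k] = v
	}
	updated[managedByKey] = self
	return updated, true
}

func main() {
	annotations, ok := claim(map[string]string{managedByKey: "operator-a"}, "operator-b")
	fmt.Println(ok, annotations[managedByKey]) // prints "false operator-a"
}
```

For this to act as a lock, writing the updated annotations back must go through the API server's optimistic concurrency (the object's `resourceVersion`), so that two claimants racing on the same instance cannot both succeed.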

@jkandasa
Member

@jpkrohling new issue: #670
