Fixed deployment of Elasticsearch via its operator #234
Running locally with this PR causes this when deploying
Codecov Report
@@            Coverage Diff            @@
##           master    #234     +/-   ##
=========================================
- Coverage   90.48%   90.19%   -0.3%
=========================================
  Files          59       61      +2
  Lines        2680     2753     +73
=========================================
+ Hits         2425     2483     +58
- Misses        164      172      +8
- Partials       91       98      +7
Continue to review full report at Codecov.
I have tested this now.
The first time, I got:
ploffay ~/projects/golang/src/github.com/jaegertracing/jaeger-operator PR234 make run WATCH_NAMESPACE=myproject 10:27
customresourcedefinition.apiextensions.k8s.io/jaegers.io.jaegertracing created
INFO[0000] Versions arch=amd64 operator-sdk=v0.4.1 os=linux version=go1.11.1
INFO[0000] Auto-detected the platform platform=openshift
INFO[0000] Starting the Cmd.
ERRO[0113] failed to apply the changes error="no matches for kind \"Elasticsearch\" in version \"logging.openshift.io/v1alpha1\"" instance=simple-prod namespace=myproject
ERRO[0114] failed to apply the changes error="no matches for kind \"Elasticsearch\" in version \"logging.openshift.io/v1alpha1\"" instance=simple-prod namespace=myproject
ERRO[0115] failed to apply the changes error="no matches for kind \"Elasticsearch\" in version \"logging.openshift.io/v1alpha1\"" instance=simple-prod namespace=myproject
ERRO[0116] failed to apply the changes error="no matches for kind \"Elasticsearch\" in version \"logging.openshift.io/v1alpha1\"" instance=simple-prod namespace=myproject
ERRO[0117] failed to apply the changes error="no matches for kind \"Elasticsearch\" in version \"logging.openshift.io/v1alpha1\"" instance=simple-prod namespace=myproject
ERRO[0118] failed to apply the changes error="no matches for kind \"Elasticsearch\" in version \"logging.openshift.io/v1alpha1\"" instance=simple-prod namespace=myproject
ERRO[0120] failed to apply the changes error="no matches for kind \"Elasticsearch\" in version \"logging.openshift.io/v1alpha1\"" instance=simple-prod namespace=myproject
ERRO[0121] failed to apply the changes error="no matches for kind \"Elasticsearch\" in version \"logging.openshift.io/v1alpha1\"" instance=simple-prod namespace=myproject
ERRO[0122] failed to apply the changes error="no matches for kind \"Elasticsearch\" in version \"logging.openshift.io/v1alpha1\"" instance=simple-prod namespace=myproject
ERRO[0123] failed to apply the changes error="no matches for kind \"Elasticsearch\" in version \"logging.openshift.io/v1alpha1\"" instance=simple-prod namespace=myproject
ERRO[0126] failed to apply the changes error="no matches for kind \"Elasticsearch\" in version \"logging.openshift.io/v1alpha1\"" instance=simple-prod namespace=myproject
ERRO[0131] failed to apply the changes error="no matches for kind \"Elasticsearch\" in version \"logging.openshift.io/v1alpha1\"" instance=simple-prod namespace=myproject
The second time:
INFO[0000] Versions arch=amd64 operator-sdk=v0.4.1 os=linux version=go1.11.1
INFO[0000] Auto-detected the platform platform=openshift
INFO[0000] Starting the Cmd.
ERRO[0003] failed to apply the changes error="CronJob.batch \"simple-prod-es-index-cleaner\" is invalid: spec.jobTemplate.spec.template.spec.containers[0].volumeMounts[0].name: Not found: \"certs\"" instance=simple-prod namespace=myproject
ERRO[0005] failed to apply the changes error="CronJob.batch \"simple-prod-es-index-cleaner\" is invalid: spec.jobTemplate.spec.template.spec.containers[0].volumeMounts[0].name: Not found: \"certs\"" instance=simple-prod namespace=myproject
ERRO[0006] failed to apply the changes error="CronJob.batch \"simple-prod-es-index-cleaner\" is invalid: spec.jobTemplate.spec.template.spec.containers[0].volumeMounts[0].name: Not found: \"certs\"" instance=simple-prod namespace=myproject
ERRO[0007] failed to apply the changes error="CronJob.batch \"simple-prod-es-index-cleaner\" is invalid: spec.jobTemplate.spec.template.spec.containers[0].volumeMounts[0].name: Not found: \"certs\"" instance=simple-prod namespace=myproject
ERRO[0008] failed to apply the changes error="CronJob.batch \"simple-prod-es-index-cleaner\" is invalid: spec.jobTemplate.spec.template.spec.containers[0].volumeMounts[0].name: Not found: \"certs\"" instance=simple-prod namespace=myproject
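The CronJob error above says that a container mounts a volume named "certs" that the pod template never declares. A minimal sketch of that check, assuming simplified stand-in types (`PodSpec`, `Container`, `Volume`, `VolumeMount` here are hypothetical and are not the Kubernetes API types, and the `/certs` mount path is made up for illustration):

```go
package main

import "fmt"

// Simplified, hypothetical stand-ins for the pod spec fields named in the error.
type Volume struct{ Name string }

type VolumeMount struct{ Name, MountPath string }

type Container struct {
	Name         string
	VolumeMounts []VolumeMount
}

type PodSpec struct {
	Volumes    []Volume
	Containers []Container
}

// missingVolumes returns one message per volumeMount whose name
// does not match any volume declared in the same pod spec.
func missingVolumes(spec PodSpec) []string {
	declared := make(map[string]bool, len(spec.Volumes))
	for _, v := range spec.Volumes {
		declared[v.Name] = true
	}
	var problems []string
	for ci, c := range spec.Containers {
		for mi, m := range c.VolumeMounts {
			if !declared[m.Name] {
				problems = append(problems, fmt.Sprintf(
					"containers[%d].volumeMounts[%d].name: Not found: %q", ci, mi, m.Name))
			}
		}
	}
	return problems
}

func main() {
	// A container that mounts "certs" while no such volume is declared.
	spec := PodSpec{
		Containers: []Container{{
			Name:         "es-index-cleaner",
			VolumeMounts: []VolumeMount{{Name: "certs", MountPath: "/certs"}}, // path is illustrative
		}},
	}
	fmt.Println(missingVolumes(spec))

	// Declaring the volume clears the problem.
	spec.Volumes = append(spec.Volumes, Volume{Name: "certs"})
	fmt.Println(len(missingVolumes(spec)))
}
```

Running a check like this on the generated spec before submitting it would surface the mismatch without a round trip to the API server.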
To test it make sure that |
I got it working now:
The collector and query were in a failed state for quite some time because they don't reconnect to Elasticsearch upon failure, so I had to kill the pods manually. Kubernetes then created new pods for the deployments, and those worked. This feature should be marked as experimental, as it isn't well polished yet, especially because the collector/query should wait for ES to be ready before they start. If Jaeger could reconnect to ES upon failure, as it does with Cassandra, this wouldn't be a big issue, but right now there is nothing else we can do from the Operator's perspective...
Signed-off-by: Juraci Paixão Kröhling <[email protected]>
K8s should reschedule pods once they fail. At least, I saw this behavior before the update PR was merged. I didn't have to kill pods manually: they restarted 2-3 times until ES was in the ready state. There is an issue, #216, which handles the initialization properly.
That PR shouldn't change this behavior. The pod was in a "healthy" state, which is why Kubernetes didn't kill it. This is what the logs look like:
The query and collector exit with code 1 if there are no ES nodes available. Maybe you should wait longer? I have tested on minishift and it worked like before. However, I get the following errors when I edit the Jaeger instance, e.g. with `oc edit jaeger simple-prod`:
Not sure: it does get into a failed state when the initial connection cannot be made, with logs like this:
The case I got into was later on ( |
By the way, the PR has been updated, to fix the error you reported before when doing
Thanks @jpkrohling for looking into this! #235 seems to be a side effect
Fixes #233 by adding the ES type to the controller's reconcile loop and inventory.
Signed-off-by: Juraci Paixão Kröhling [email protected]
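
One way to read "inventory" in the description above is as a diff between the objects that already exist and the objects the operator wants, which the reconcile loop then applies. The sketch below illustrates that idea only; `Resource`, `Inventory`, and `diff` are hypothetical names, and how the operator actually models the Elasticsearch type is an assumption here.

```go
package main

import (
	"fmt"
	"sort"
)

// Resource is a hypothetical, minimal stand-in for a managed object,
// such as an Elasticsearch custom resource.
type Resource struct {
	Name string
	Spec string
}

// Inventory lists what a reconcile pass would have to do.
type Inventory struct {
	Create, Update, Delete []Resource
}

// diff compares existing and desired resources by name.
// Resources that exist unchanged appear in none of the three lists.
func diff(existing, desired []Resource) Inventory {
	current := make(map[string]Resource, len(existing))
	for _, r := range existing {
		current[r.Name] = r
	}
	var inv Inventory
	for _, d := range desired {
		cur, found := current[d.Name]
		switch {
		case !found:
			inv.Create = append(inv.Create, d)
		case cur.Spec != d.Spec:
			inv.Update = append(inv.Update, d)
		}
		delete(current, d.Name)
	}
	// Whatever is left exists but is no longer desired.
	for _, r := range current {
		inv.Delete = append(inv.Delete, r)
	}
	sort.Slice(inv.Delete, func(i, j int) bool { return inv.Delete[i].Name < inv.Delete[j].Name })
	return inv
}

func main() {
	existing := []Resource{{Name: "old-es", Spec: "v1"}, {Name: "simple-prod", Spec: "v1"}}
	desired := []Resource{{Name: "simple-prod", Spec: "v2"}, {Name: "new-es", Spec: "v1"}}
	inv := diff(existing, desired)
	fmt.Println(len(inv.Create), len(inv.Update), len(inv.Delete))
}
```

Keeping the diff separate from the code that applies it makes each resource type straightforward to unit-test without a cluster.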