Requirement - what kind of business use case are you trying to solve?
I'm trying out the Jaeger Operator v1.25.0 with the streaming strategy and the Jaeger ClickHouse storage plugin. When the container doesn't have enough resources, the plugin process in the Ingester Pod can be killed with the following error message:
2021-08-19T15:52:11.780Z [DEBUG] plugin process exited: path=/plugin/jaeger-clickhouse pid=11 error="signal: killed"
In that case the Ingester Pod keeps running but no longer writes any messages from Kafka to ClickHouse. Therefore it would be good if we could adjust the liveness probe of the Pod so that it is restarted once it stops working.
Problem - what in Jaeger blocks you from solving the requirement?
When messages from Kafka are no longer consumed because the plugin was killed, we currently have to restart the Pod by hand (e.g. by deleting it with kubectl). A better approach would be for the Pod to be restarted automatically.
For that it must be possible to adjust the liveness probe of the Pod via the created Jaeger CR, which is currently not possible.
Proposal - what do you suggest to solve the problem or improve the existing situation?
Extend the Jaeger CRD with a new field where a user can set a custom liveness probe for the Jaeger Ingester:
---
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: jaeger
  namespace: tracing
spec:
  strategy: streaming
  ingester:
    autoscale: false
    replicas: 5
    resources:
      limits:
        cpu: 2000m
        memory: 1024Mi
      requests:
        cpu: 500m
        memory: 256Mi
    # First approach: add a liveness probe that checks the port of the
    # storage plugin. (Only one of the two livenessProbe variants would be
    # set; both are shown here for illustration.)
    livenessProbe:
      failureThreshold: 5
      httpGet:
        # Maybe a health check endpoint can be added to the plugin, which
        # can then be used instead of the metrics endpoint.
        path: /metrics
        port: 9090
        scheme: HTTP
      initialDelaySeconds: 5
      periodSeconds: 15
      successThreshold: 1
      timeoutSeconds: 1
    # Second approach: check if the plugin process is running.
    livenessProbe:
      exec:
        command:
          - sh
          - -ec
          - ps -ef | grep "/plugin/jaeger-clickhouse --config /plugin-config/config.yaml" | grep -v grep
    options:
      log-level: warn
      ingester:
        deadlockInterval: 300s
      kafka:
        consumer:
          topic: jaeger-spans
          brokers: kafka-kafka-0.kafka-kafka-brokers.tracing.svc.cluster.local:9092,kafka-kafka-1.kafka-kafka-brokers.tracing.svc.cluster.local:9092,kafka-kafka-2.kafka-kafka-brokers.tracing.svc.cluster.local:9092
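On the operator side, a minimal sketch of what the new field might look like in the API types, assuming the existing Ingester spec is simply extended and the upstream core/v1 Probe type is reused (the field name and placement are my assumption, not an existing API):

package v1

import corev1 "k8s.io/api/core/v1"

// JaegerIngesterSpec is the operator's Ingester spec type; only the
// hypothetical new LivenessProbe field is shown here.
type JaegerIngesterSpec struct {
    // ... existing fields elided ...

    // LivenessProbe, when set, replaces the default liveness probe of the
    // Ingester container. Reusing the upstream core/v1 Probe type means
    // httpGet, exec and tcpSocket probes are all supported without
    // defining any new schema.
    // +optional
    LivenessProbe *corev1.Probe `json:"livenessProbe,omitempty"`
}

Reusing corev1.Probe would keep the CR aligned with what users already know from regular Pod specs, so both approaches above (httpGet and exec) would work out of the box.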
Any open questions to address
Maybe this could also be checked by the Jaeger Ingester itself, so that the liveness probe of the Ingester fails when the storage plugin was killed. If you think this would be the better approach, I can also create an issue in the jaegertracing/jaeger repository.
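For that variant, a rough sketch of how the Ingester could surface the plugin's death, assuming it keeps a handle on the hashicorp/go-plugin client that spawned the storage plugin (Client.Exited() is a real go-plugin method; the handler and its wiring into Jaeger are hypothetical):

package main

import (
    "net/http"

    "github.com/hashicorp/go-plugin"
)

// pluginHealthHandler reports 503 once the plugin subprocess has exited,
// so a plain httpGet liveness probe against this endpoint would restart
// the Pod without any ps/grep tricks in the probe command.
func pluginHealthHandler(client *plugin.Client) http.HandlerFunc {
    return func(w http.ResponseWriter, r *http.Request) {
        if client.Exited() {
            http.Error(w, "storage plugin process exited", http.StatusServiceUnavailable)
            return
        }
        w.WriteHeader(http.StatusOK)
    }
}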
Attachments (contents not preserved in this extraction): processes when everything looks fine; processes after the storage plugin was killed; Jaeger Ingester logs.