Jaerger Ingester - Unable to create consumer","error":"kafka: client has run out of available brokers to talk to #3059

Closed
dgoscn opened this issue Jun 4, 2021 · 6 comments

dgoscn commented Jun 4, 2021

Describe the bug
My team and I are trying to bring up a Jaeger Ingester on EKS, passing the Jaeger flags as args. When the pod is created, the logs show that it was not able to create a Kafka consumer. To troubleshoot, we wrote a Python script, which is able to connect to the Kafka cluster.

To Reproduce
Steps to reproduce the behavior:

  1. This is the deployment file that we use to deploy:
     kubectl apply -f jaeger-ingester.yml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: jaeger
    component: ingester
  name: jaeger-ingester
  namespace: jaeger
spec:
  selector:
    matchLabels:
      app: jaeger
      component: ingester
  replicas: 1
  template:
    metadata:
      labels:
        app: jaeger
        component: ingester
    spec:
     containers:
        - name: jaeger-ingester
          args:
          - --log-level=debug
          - --span-storage.type=elasticsearch
          - --es.server-urls=https://vpc-es-cluster-ZZZ-XXX.us-east-1.es.amazonaws.com/
          - --kafka.consumer.brokers=kafka-url:31101
          - --kafka.consumer.protocol-version=SSL
          - --kafka.consumer.tls.enabled=true
          - --kafka.consumer.tls.cert=/opt/keys/SOME-cert.pem
          - --kafka.consumer.tls.key=/opt/keys/SOME-key.pem
          - --kafka.consumer.tls.ca=/opt/keys/CARoot
          - --kafka.consumer.topic=observability-topic
          - --kafka.consumer.group-id=group_consumer_jaeger
          - --ingester.deadlockInterval=2m
          - --ingester.parallelism=5
          image: jaegertracing/jaeger-ingester:1.22.0 #we use our own ECR Repository
          env:
            - name: "SPAN_STORAGE_TYPE"
              value: "elasticsearch"
          ports:
            # IngesterAdminHTTP is the default admin HTTP port (health check, metrics, etc.)
            - name: jaeger-ingester
              containerPort: 14270
          # volume secrets
          volumeMounts:
          - name: kafka-ing
            mountPath: "/opt/keys"
            readOnly: true
     volumes:
     - name: kafka-ing
       secret:
         secretName: kafka-key
         items:
         - key: SOME-key.pem
           path: ./SOME-key.pem
         - key: SOME-cert.pem
           path: ./SOME-cert.pem
         - key: CARoot
           path: ./CARoot
    # These SOME-* files are in our local manifests directory when we run the deployment.
    #kubectl create secret generic kafka-key -n jaeger --from-file=./SOME-key.pem --from-file=./SOME-cert.pem --from-file=./CARoot
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: jaeger
    component: ingester
  name: jaeger-ingester
  namespace: jaeger
spec:
  ports:
    - name: jaeger-ingester
      port: 14270
      protocol: TCP
      targetPort: 14270
  selector:
    app: jaeger
    component: ingester
  type: ClusterIP
  2. kubectl logs pods/jaeger-pods-generated -n jaeger

Expected behavior
kubectl logs pods/jaeger-ingester-POD -n jaeger

2021/06/04 22:01:00 maxprocs: Leaving GOMAXPROCS=2: CPU quota undefined
{"level":"info","ts":1622844060.9307475,"caller":"flags/service.go:117","msg":"Mounting metrics handler on admin server","route":"/metrics"}
{"level":"info","ts":1622844060.9308012,"caller":"flags/service.go:123","msg":"Mounting expvar handler on admin server","route":"/debug/vars"}
{"level":"info","ts":1622844060.9309163,"caller":"flags/admin.go:105","msg":"Mounting health check on admin server","route":"/"}
{"level":"info","ts":1622844060.9309664,"caller":"flags/admin.go:111","msg":"Starting admin HTTP server","http-addr":":14270"}
{"level":"info","ts":1622844060.930975,"caller":"flags/admin.go:97","msg":"Admin server started","http.host-port":"[::]:14270","health-status":"unavailable"}
{"level":"info","ts":1622844061.055035,"caller":"config/config.go:189","msg":"Elasticsearch detected","version":7}
{"level":"debug","ts":1622844062.216302,"caller":"consumer/deadlock_detector.go:147","msg":"Global deadlock detector disabled"}
{"level":"info","ts":1622844062.216349,"caller":"healthcheck/handler.go:129","msg":"Health Check state change","status":"ready"}
{"level":"info","ts":1622844062.2163684,"caller":"consumer/consumer.go:79","msg":"Starting main loop"}
{"level":"info","ts":1622844064.5668762,"caller":"consumer/consumer.go:167","msg":"Starting error handler","partition":1}
{"level":"info","ts":1622844064.5669289,"caller":"consumer/consumer.go:110","msg":"Starting message handler","partition":1}
{"level":"debug","ts":1622844064.5670233,"caller":"consumer/deadlock_detector.go:98","msg":"Partition deadlock detector disabled"}
{"level":"debug","ts":1622844064.5750434,"caller":"consumer/consumer.go:138","msg":"Got msg","msg":{"Headers":null,"Timestamp":"0001-01-01T00:00:00Z","BlockTimestamp":"0001-01-01T00:00:00Z","Key":"zKSauSG41837BRHUE==","Value":"KZ01Vd3Flenc9Iiwib3BlcmF0aW9uTmFtZSI6IkdldERyaXZlciIsInJlZ","Topic":"observability-topic","Partition":1,"Offset":4332888}}

Version (please complete the following information):

  • OS: cat /etc/os-release
    NAME="Amazon Linux"
    VERSION="2"
    ID="amzn"
    ID_LIKE="centos rhel fedora"
    VERSION_ID="2"
    PRETTY_NAME="Amazon Linux 2"
    ANSI_COLOR="0;33"
    CPE_NAME="cpe:2.3:o:amazon:amazon_linux:2"
    HOME_URL="https://amazonlinux.com/"

  • Jaeger version: 1.22.0

  • Deployment: Kubernetes, 1.17

What troubleshooting steps did you try?
We tested the connection with telnet, which worked, so we then used Python code to test connectivity with the Kafka broker:

from kafka import KafkaConsumer

try:
    # Connect over SSL with the same certificate files the ingester uses.
    consumer = KafkaConsumer(
        bootstrap_servers='kafka-url:31101',
        group_id='group_consumer_jaeger',
        ssl_cafile='/opt/keys/CARoot',
        ssl_certfile='/opt/keys/SOME-cert.pem',
        ssl_keyfile='/opt/keys/SOME-key.pem',
        security_protocol='SSL',
    )
except Exception as e:
    print('Error creating consumer: %s' % str(e))
    raise  # without a consumer, the calls below cannot run

consumer.subscribe(['observability-topic'])
# Print the partition ids the broker reports for the topic.
print(consumer.partitions_for_topic('observability-topic'))

This is the output:

python test-kafka-brokers.py
set([0, 1, 2])

We also commented out the Jaeger flags in the deployment, but we received the same error.
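
One further check, sketched here under assumptions, is to load the same three files with Python's standard ssl module while running as the same user as the ingester container. The /opt/keys paths come from the deployment above; the helper name load_tls_files is hypothetical. This only shows whether the files can be read and parsed by the current user, not whether the broker accepts them.

```python
import ssl


def load_tls_files(ca_file, cert_file, key_file):
    """Return None if all three files load, otherwise a short error string.

    Hypothetical helper: it only checks that the current user can read
    and parse the files; it does not contact the Kafka broker.
    """
    try:
        # Load the CA bundle, then the client certificate and private key.
        context = ssl.create_default_context(cafile=ca_file)
        context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    except OSError as exc:  # ssl.SSLError is a subclass of OSError
        return "{}: {}".format(type(exc).__name__, exc)
    return None


if __name__ == "__main__":
    # Paths taken from the deployment manifest in this issue.
    error = load_tls_files(
        "/opt/keys/CARoot",
        "/opt/keys/SOME-cert.pem",
        "/opt/keys/SOME-key.pem",
    )
    print(error or "TLS files loaded")
```

If this prints an error when run inside the pod but succeeds elsewhere, the difference is likely in how the files are mounted or who can read them rather than in the broker.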

@dgoscn dgoscn added the bug label Jun 4, 2021
@dgoscn dgoscn changed the title Jun 4, 2021
  from: Jaerger Ingeste - Unable to create consumer","error":"kafka: client has run out of available brokers to talk to
  to: Jaerger Ingester - Unable to create consumer","error":"kafka: client has run out of available brokers to talk to
@Ashmita152
Contributor

Hi,

I am not sure, but I think you're missing the --kafka.consumer.authentication=tls flag in your jaeger-ingester arguments.

Can you try adding it and see if it helps? Let us know how it goes.

Thanks!

@dgoscn
Author

dgoscn commented Jun 7, 2021

Hi @Ashmita152. Thank you very much for your time.

I had already tried this approach and tried it again now, but I still get the same error:

"level":"fatal","ts":1623072056.8520513,"caller":"command-line-arguments/main.go:75","msg":"Unable to create consumer","error":"kafka: client has run out of available brokers to talk to (Is your cluster reachable?)","stacktrace":"main.main.func1\n\tcommand-line-arguments/main.go:75\ngithub.com/spf13/cobra.(...)

Thanks

@Ashmita152
Contributor

Hi @dgoscn

May I ask which version of Kafka you're running? Your issue looks similar to the one reported here: #2950

@dgoscn
Author

dgoscn commented Jun 7, 2021

Thanks one more time @Ashmita152.

We are running Kafka 2.3. As for the issue you mentioned, it's a bit confusing: it contains only a log of the error and no other useful information.

Regards

@dgoscn
Author

dgoscn commented Jun 10, 2021

Hello @Ashmita152. How are you?

My sincere apologies, and special thanks for your time on this issue.

I hope this helps anyone who runs into this error one day.

This error is not necessarily a bug.

SOLUTION:
It is all about permissions. On my /opt/keys path and on the root, a chmod 600 was necessary (or another mode, depending on your needs).
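
For anyone checking the same thing, here is a minimal sketch that reports the permission bits of each key file and whether the current user can read it. The function names are hypothetical, and the paths are the ones from the deployment above; adjust them to your own mount.

```python
import os
import stat


def describe_key_file(path):
    """Return (permission bits such as '0o600', readable-by-current-user)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return oct(mode), os.access(path, os.R_OK)


def check_key_files(paths):
    """Map each path to its (mode, readable) pair, or (None, False) if stat fails."""
    report = {}
    for path in paths:
        try:
            report[path] = describe_key_file(path)
        except OSError:
            # Missing file, or a parent directory that cannot be traversed.
            report[path] = (None, False)
    return report


if __name__ == "__main__":
    # Paths assumed from the volume mount in this issue.
    key_files = [
        "/opt/keys/CARoot",
        "/opt/keys/SOME-cert.pem",
        "/opt/keys/SOME-key.pem",
    ]
    for path, (mode, readable) in check_key_files(key_files).items():
        print(path, mode, "readable" if readable else "NOT readable")
```

Running this as the same user as the ingester process shows whether the key files are actually readable from inside the container.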

Feel free to close the issue.

Thanks one more time @Ashmita152 and Jaeger team

@Ashmita152
Contributor

Nice debugging! Happy to see that you solved the issue.

@dgoscn dgoscn closed this as completed Jun 12, 2021