Jaegertracing with Docker #286

Closed
de-robat opened this issue Jul 26, 2017 · 14 comments

@de-robat

de-robat commented Jul 26, 2017

Hello, first of all Kudos and Thanks for this awesome tool.

I fiddled around with the standalone setup and managed to get some traces up and running. Since I need more performance for my evaluation tests, I wanted to switch to the proper setup, backed by Cassandra instead of the in-memory database.

What I'm trying to achieve is a Jaeger setup with one agent, one collector, a Cassandra instance, and the query interface, all running on one Docker server. My plan is to eventually package it as a Rancher catalog entry to deploy it to more serious environments, but right now I'm struggling to get it running in the first place.

Specs: Linux (Debian), Docker version 17.05.0-ce, build 89658be, Jaeger version 0.5.2, Cassandra 3.11

My starting point is the following docker-compose file:

# don't forget to initialize the Cassandra schema
# docker run -d --network=jaeger_jaeger jaegertracing/jaeger-cassandra-schema:0.5.2
version: '2'
services:
  cassandra:
    image: cassandra:3.11
    networks:
      - jaeger
    ports:
      - "7000:7000"
      - "7001:7001"
      - "7199:7199"
      - "9042:9042"
      - "9160:9160"
    volumes:
      - cassandra-data:/var/lib/cassandra
      - cassandra-logs:/var/log/cassandra
    environment:
      MAX_HEAP_SIZE: 512M
      HEAP_NEWSIZE: 100M
      CASSANDRA_CLUSTER_NAME: "jaeger"
      CASSANDRA_DC: "dc1"
      CASSANDRA_RACK: "rack1"
      CASSANDRA_ENDPOINT_SNITCH: "GossipingPropertyFileSnitch"
      CASSANDRA_START_RPC: "true"
  jaeger-collector:
    image: jaegertracing/jaeger-collector:0.5.2
    networks:
      - jaeger
    ports:
      - "14267:14267"
      - "14268:14268"
    command: /go/bin/collector-linux -cassandra.servers=cassandra -cassandra.keyspace=jaeger_v1_dc1
  jaeger-agent:
    image: jaegertracing/jaeger-agent:0.5.2
    networks: 
      - jaeger
    ports:
      - "5775:5775"
      - "5778:5778"
      - "6831:6831"
      - "6832:6832"
    command: /go/bin/agent-linux -collector.host-port=jaeger-collector:14267
  jaeger-query:
    image: jaegertracing/jaeger-query:0.5.2
    networks:
      - jaeger
    ports:
      - "16686:16686"
    command: /go/bin/query-linux -cassandra.servers=cassandra -cassandra.keyspace=jaeger_v1_dc1 --query.static-files=/go/jaeger-ui/
networks:
  jaeger: #defaultnetwork
volumes:
  cassandra-data: {}
  cassandra-logs: {}

This still has some issues. First of all, running it successfully starts only Cassandra and the agent on the first go. Once Cassandra is up, I run docker run -d --network=jaeger_jaeger jaegertracing/jaeger-cassandra-schema:0.5.2 to set up the keyspace as expected. After that I restart the collector and query containers, which then start up healthy.
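
One way to avoid guessing when Cassandra is up is to poll its port before running the schema container and restarting the collector and query services. The following is a minimal Python sketch, assuming the CQL port 9042 from the compose file above; wait_for_port is a hypothetical helper, not part of Jaeger or Docker. An open port only shows that Cassandra is listening, not that the keyspace already exists.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError while the port is closed,
            # unreachable, or the host name cannot be resolved yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

From the Docker host, a call such as wait_for_port("localhost", 9042, timeout=120) could gate the docker run command for the schema container, since 9042 is published in the compose file.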

I then run a little test script that sends traces to the agent via the Node.js jaeger-client. I verified that the UDP packets are arriving on the machine the agent is running on.

The agent itself reports that it successfully connected to the collector:

{"level":"info","ts":1501081173.181871,"caller":"peerlistmgr/peer_list_mgr.go:172","msg":"Trying to connect to peer","host:port":"jaeger-collector:14267"}
{"level":"info","ts":1501081173.2294185,"caller":"peerlistmgr/peer_list_mgr.go:177","msg":"Connected to peer","host:port":"[::]:14267"}

Unfortunately, I can't see any traces when I open the UI. Accordingly, the API endpoint ...:16686/api/services yields:

{"data":null,"total":0,"limit":0,"offset":0,"errors":null}

Checking the service endpoint of the agent (...5778/sampling?service=import-test) gives me a suspicious message:

tcollector error: no peers available

I just had a quick chat with @yurishkuro, who helped me narrow things down this far, and we agreed that I should open an issue to properly track any progress we make on this.

I can't figure out how to enable more verbose logging along the chain of components involved, so I'm a little clueless about where to look next. Any help would be much appreciated!

Ty in advance!

@jpkrohling
Contributor

Are you able to enter the Agent's container and contact the Collector? The very first test could be something like:

$ docker exec -it CONTAINER_NAME bash
bash$ ping jaeger-collector

If that works, a curl to jaeger-collector:14268 would be a useful next test.

@jpkrohling
Contributor

Actually, it looks like the connection was made:

{"level":"info","ts":1501081173.181871,"caller":"peerlistmgr/peer_list_mgr.go:172","msg":"Trying to connect to peer","host:port":"jaeger-collector:14268"}
{"level":"info","ts":1501081173.2294185,"caller":"peerlistmgr/peer_list_mgr.go:177","msg":"Connected to peer","host:port":"[::]:14268"}

@de-robat
Author

de-robat commented Jul 26, 2017

OK, I just double-checked and noticed that I had tinkered with the port. When I set the agent's collector option to
-collector.host-port=jaeger-collector:14267, then everything works:

{"level":"info","ts":1501085415.011374,"caller":"peerlistmgr/peer_list_mgr.go:172","msg":"Trying to connect to peer","host:port":"jaeger-collector:14267"}
{"level":"info","ts":1501085415.0903237,"caller":"peerlistmgr/peer_list_mgr.go:177","msg":"Connected to peer","host:port":"[::]:14267"}

When I go with -collector.host-port=jaeger-collector:14268,
the log never says it connected; it just keeps printing:

{"level":"info","ts":1501085384.3659136,"caller":"peerlistmgr/peer_list_mgr.go:165","msg":"Not enough connected peers","connected":0,"required":1}
{"level":"info","ts":1501085384.3659976,"caller":"peerlistmgr/peer_list_mgr.go:172","msg":"Trying to connect to peer","host:port":"jaeger-collector:14268"}
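
Since the agent writes one JSON object per log line, the two cases above can be told apart with a small script instead of by eye. This is a rough Python sketch; peer_connected is a hypothetical helper, and the message strings are copied from the log output quoted in this thread, so they may differ in other agent versions.

```python
import json


def peer_connected(log_lines):
    """Return True if the most recent peer-related entry reports a connection."""
    connected = False
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore anything that is not a JSON log entry
        if not isinstance(entry, dict):
            continue
        msg = entry.get("msg")
        if msg == "Connected to peer":
            connected = True
        elif msg == "Not enough connected peers":
            connected = False
    return connected
```

The function accepts any iterable of lines, so the output of docker logs for the agent container could be piped in and passed as sys.stdin.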

I adjusted the compose file above to reflect my setup more accurately.

Entering the agent's container and running the commands you suggested yields:

[root@8a73f12339e2 /]# ping jaeger-collector
PING jaeger-collector (192.168.0.2) 56(84) bytes of data.
64 bytes from jaeger_jaeger-collector_1.jaeger_jaeger (192.168.0.2): icmp_seq=1 ttl=64 time=0.132 ms
64 bytes from jaeger_jaeger-collector_1.jaeger_jaeger (192.168.0.2): icmp_seq=2 ttl=64 time=0.165 ms
64 bytes from jaeger_jaeger-collector_1.jaeger_jaeger (192.168.0.2): icmp_seq=3 ttl=64 time=0.129 ms
64 bytes from jaeger_jaeger-collector_1.jaeger_jaeger (192.168.0.2): icmp_seq=4 ttl=64 time=0.137 ms
--- jaeger-collector ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 2999ms
rtt min/avg/max/mdev = 0.129/0.140/0.165/0.020 ms
[root@8a73f12339e2 /]# curl jaeger-collector:14268
404 page not found
[root@8a73f12339e2 /]# curl jaeger-collector:14267
[TIMEOUT]

@mabn

mabn commented Jul 27, 2017

FYI - this is what I use, works fine:

version: '2'
services:
  cassandra:
    image: cassandra:3.11
    environment:
      CASSANDRA_BROADCAST_ADDRESS: localhost
    volumes:
      - /opt/cassandra:/var/lib/cassandra
    network_mode: host
    logging:
      options:
        max-size: 5m

  jaeger_query:
    image: jaegertracing/jaeger-query
    network_mode: host
    command: /go/bin/query-linux --query.static-files=/go/jaeger-ui/ -cassandra.keyspace jaeger_v1_test -cassandra.servers localhost -cassandra.connections-per-host 2 -query.port 8080

  jaeger_collector:
    image: jaegertracing/jaeger-collector
    logging:
      options:
        max-size: 5m
    network_mode: host
    command: /go/bin/collector-linux -cassandra.keyspace jaeger_v1_test -cassandra.servers localhost -cassandra.connections-per-host 2

I added max-size for the logs because Cassandra has a tendency to go OOM; query and collector then start spamming errors, fill up the volume with Docker logs, and everything dies. I have no clue why, it just looks like it leaks memory (in 3.10). Setting memtable_allocation_type: offheap_objects helped. In 3.11, things also seem fine with the default "heap_buffers".

net=host makes things simple

@yurishkuro
Member

@de-robat it would be nice to have a proven docker-compose file in the main repo that people could "just run" - it's certainly simpler than deploying to k8s. Do you think you could do a pull request?

Regarding the keyspace initialization, I think you can use depends_on to ensure that jaeger components don't start until Cassandra is running and the keyspace has been installed.

@de-robat
Author

Yes, I'll do a pull request once I've got it working to my satisfaction.

@de-robat
Author

@jpkrohling I had time to do more testing today. In reply to your suggestions: I can ping the collector from within the agent's container, and a curl to ...:14268 yields a "404".

@phal0r

phal0r commented Jul 28, 2017

@mabn
How do you initialize the keyspace? We use https://hub.docker.com/r/jaegertracing/jaeger-cassandra-schema/ without ENV params (which means that the defaults of the script apply -> mode: test, keyspace name: jaeger_v1_dc1)

@de-robat
Author

@mabn if you don't mind, would you please add the versions of the Jaeger components you used? The compose file you provided doesn't work for me either :/

@jpkrohling
Contributor

@de-robat I promise I'll try to get your setup running locally soon, but in the meantime, would you try also running on OpenShift/Kubernetes, just to rule out possible issues specific to your environment?

https://github.com/jaegertracing/jaeger-kubernetes
https://github.com/jaegertracing/jaeger-openshift

Both do use Docker at the lower levels (but not compose!).

@mabn

mabn commented Jul 28, 2017

I initialized the keyspace manually:

docker run -ti --rm -e MODE=test -e DATACENTER=datacenter1 -e CQLSH_HOST=localhost -e CASSANDRA_WAIT_TIMEOUT=5 \
  jaegertracing/jaeger-cassandra-schema /cassandra-schema/create.sh > schema.cql

Then open the file and remove the unnecessary comment at the beginning (a few lines).

Then:

docker run -ti --rm --net=host -v $(pwd)/schema.cql:/schema.cql cassandra:3.11 cqlsh localhost -f /schema.cql

There was some problem with just running the jaeger-cassandra-schema container, but I don't remember what it was; maybe it works now.

I didn't specify the versions so it was using "latest" which is something from early July (not the most recent).

I'm testing jaeger on ECS now and things seem to work correctly there as well. Setting up Cassandra is the tricky part.

@de-robat
Author

Hey guys, just a quick update: right now I'm seeing traces in the UI. So the setup seems to work, even though ...:5778/sampling?service=import-test still gives me tcollector error: no peers available.

What did we change? Basically, we are using a Docker-configured network, and our jaeger-agent's port bindings were just wrong. The clients send their spans to the agent via UDP, and a port mapping without a protocol suffix is published as TCP only, so we needed to publish those ports as UDP to get it running:

    ports:
      - "5775:5775/udp"
      - "5778:5778"
      - "6831:6831/udp"
      - "6832:6832/udp"
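
This kind of mistake is easy to catch mechanically. Below is a minimal Python sketch that flags agent port mappings lacking the /udp suffix; missing_udp is a hypothetical helper, and the default set of UDP ports (5775, 6831, 6832) is taken from the mapping above. It only understands simple short-form mappings, not port ranges or the long compose syntax.

```python
def missing_udp(port_mappings, udp_ports=(5775, 6831, 6832)):
    """Return the mappings whose container port should be UDP but is not marked /udp."""
    flagged = []
    for mapping in port_mappings:
        # "6831:6831/udp" -> spec "6831:6831", proto "udp"; no suffix means TCP
        spec, _, proto = mapping.partition("/")
        container_port = int(spec.split(":")[-1])
        if container_port in udp_ports and proto.lower() != "udp":
            flagged.append(mapping)
    return flagged
```

Called with the agent's ports list from the original compose file, it reports the 5775, 6831, and 6832 entries; with the corrected list it returns an empty list.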

From here on out we'll work further to get a proper compose-file "production" ready. Thanks again for the support!

@pavolloffay
Member

done in #493

@ervinb

ervinb commented Feb 5, 2019

Just a quick note about the 404 response:

...
[root@8a73f12339e2 /]# curl jaeger-collector:14268
404 page not found
...

The collector responds to POST requests only, and that's why it's returning a 404. You'd need to run the following, to get a response:

curl -X POST jaeger-collector:14268/api/traces
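
The same GET versus POST comparison can be scripted. This is a rough Python sketch using only the standard library; http_status is a hypothetical helper, and the URL in the usage note is the one from the curl command above. It sends an empty POST body, so it only shows whether the route exists, and which status the collector returns for that is not stated in this thread.

```python
import urllib.error
import urllib.request


def http_status(url, method="GET", data=None, timeout=5.0):
    """Return the HTTP status code of a request, including 4xx/5xx responses."""
    request = urllib.request.Request(url, data=data, method=method)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx, but the status code is what we want here
        return err.code
```

For example, comparing http_status("http://jaeger-collector:14268/api/traces") with http_status("http://jaeger-collector:14268/api/traces", method="POST") from inside the agent's container would show whether the endpoint treats the two methods differently.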

7 participants