From dc19ef73a35ee9f028a029a1fd441a0d285d9af0 Mon Sep 17 00:00:00 2001
From: Michael Perel
Date: Tue, 25 May 2021 22:28:31 -0400
Subject: [PATCH] update README

---
 README.md | 52 ++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 48 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index 5232d12..aad5918 100644
--- a/README.md
+++ b/README.md
@@ -1,19 +1,27 @@
 # What is this?
-This is a demo of Open Telemetry's distributed tracing capabilities.
-In `docker-compose.yml` there are variety of services:
+This is a demo of Open Telemetry's distributed tracing capabilities, using
+a dummy application described by this architecture diagram:
+
+![Architecture](./docs/architecture.png)
+
+> Note: The solid arrows describe how the services send all non-telemetry data
+  to each other (for instance, HTTP requests). The dotted arrows describe the
+  flow of all telemetry data (traces).
+
+The depicted services can be found in `docker-compose.yml`:
 * `client` - a service that sends a few requests to the server
 * `server` - a service that implements an HTTP server and publishes a message
   per request via [redis' pubsub](https://redis.io/topics/pubsub)
 * `worker` - a service that listens for messages on redis' pubsub and does
   work when a message is published
+* `redis` - an open source key value store that is used for its
+  lightweight pubsub message broker capabilities
 * `jaeger` - an open source telemetry backend
 * `zipkin` - an open source telemetry backend
 * `otel-agent` - a service that receives traces from `server` and `client`
 * `otel-collector` - a service that receives traces forwarded from
   `otel-agent` and exports them to `jaeger` and `zipkin`
 
-![Architecture](./docs/architecture.png)
-
 # Why is this interesting?
 1. By using Open Telemetry with the collector, backends are swappable and
    all services handle tracing in the same way, regardless of programming
@@ -68,6 +76,42 @@ a trace, you can see the distributed spans that make up the trace:
 
 ![Spans](./docs/jaeger-span.png)
 
+# Advice for running in production
+## Do I *really* need agents/collectors?
+In my opinion, the answer is almost always no. Here are cases where I believe
+they are needed:
+  1. You would like to change the telemetry backend without redeploying your
+     application.
+  2. You would like to limit your points of egress. With the collector,
+     the only time a network request is made to a third party is when
+     the collector exports traces to a commercial backend, such as
+     Datadog.
+
+Keep in mind the added complexity of using agents/collectors.
+
+Deploying an agent alongside every service introduces more configuration and
+requires more compute. It also increases the surface area for bugs.
+What if something goes wrong in the agent? If it is deployed as a sidecar,
+would it affect the main service?
+
+Managing a collector in a large application may actually mean managing a
+cluster of collectors behind a load balancer. This comes with the typical
+headaches of managing any cluster, including extra responsibility and cost.
+
+## How can I use Open Telemetry without agents/collectors?
+Determine whether an exporter library that does not require agents/collectors
+exists for your desired backend(s) by searching
+[the official registry](https://opentelemetry.io/registry/). When searching for
+"Azure Monitor", one of the first results links to the
+[Azure Monitor exporter for Python](https://github.com/microsoft/opentelemetry-azure-monitor-python).
+
+If this repo were written in Python, you would use that library by replacing
+the agent exporter code in `pkg/tracer` with the Azure Monitor exporter. You
+would no longer need agents or collectors.
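+
+Since this repo is written in Go, a rough equivalent is sketched below. This is
+only an illustration, not code from `pkg/tracer`: it assumes the OpenTelemetry
+Go SDK (v1.x) and its Jaeger exporter module, and the `newDirectTracerProvider`
+name and the `http://jaeger:14268/api/traces` endpoint are made up for the
+example.
+
+```go
+package tracer
+
+import (
+	"go.opentelemetry.io/otel"
+	"go.opentelemetry.io/otel/exporters/jaeger"
+	sdktrace "go.opentelemetry.io/otel/sdk/trace"
+)
+
+// newDirectTracerProvider sends spans straight to a Jaeger backend,
+// skipping otel-agent and otel-collector entirely.
+func newDirectTracerProvider() (*sdktrace.TracerProvider, error) {
+	// Export directly to Jaeger's collector HTTP endpoint (illustrative URL).
+	exp, err := jaeger.New(
+		jaeger.WithCollectorEndpoint(jaeger.WithEndpoint("http://jaeger:14268/api/traces")),
+	)
+	if err != nil {
+		return nil, err
+	}
+
+	// Batch and export spans in-process; no agent is needed for buffering.
+	tp := sdktrace.NewTracerProvider(sdktrace.WithBatcher(exp))
+	otel.SetTracerProvider(tp)
+	return tp, nil
+}
+```
+
+The trade-off is case 1 above: the service now talks to the backend directly,
+so switching backends means changing code and redeploying.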
+
+With serverless, this can be especially useful because it is often harder
+to deploy agents and collectors.
+
 # How to navigate the code?
 Start by reading the comments in `cmd/client/client.go`. They describe how
 to create a trace that propagates to the server via