-
Notifications
You must be signed in to change notification settings - Fork 5.6k
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
1 parent
ce12913
commit 137b312
Showing
18 changed files
with
2,469 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,164 @@ | ||
# Zipkin Plugin | ||
|
||
This plugin implements the Zipkin http server to gather trace and timing data needed to troubleshoot latency problems in microservice architectures. | ||
|
||
*Please Note: This plugin is experimental; Its data schema may be subject to change | ||
based on its main usage cases and the evolution of the OpenTracing standard.* | ||
|
||
## Configuration: | ||
```toml | ||
[[inputs.zipkin]] | ||
path = "/api/v1/spans" # URL path for span data | ||
port = 9411 # Port on which Telegraf listens | ||
``` | ||
|
||
## Tracing: | ||
|
||
This plugin uses Annotations tags and fields to track data from spans | ||
|
||
- __TRACE:__ is a set of spans that share a single root span. | ||
Traces are built by collecting all Spans that share a traceId. | ||
|
||
- __SPAN:__ is a set of Annotations and BinaryAnnotations that correspond to a particular RPC. | ||
|
||
- __Annotations:__ for each annotation & binary annotation of a span a metric is output. *Records an occurrence in time at the beginning and end of a request.* | ||
|
||
Annotations may have the following values: | ||
|
||
- __CS (client start):__ beginning of span, request is made. | ||
- __SR (server receive):__ server receives request and will start processing it | ||
network latency & clock jitters differ it from cs | ||
- __SS (server send):__ server is done processing and sends request back to client | ||
amount of time it took to process request will differ it from sr | ||
- __CR (client receive):__ end of span, client receives response from server | ||
RPC is considered complete with this annotation | ||
|
||
### Tags | ||
* __"id":__ The 64 bit ID of the span. | ||
* __"parent_id":__ An ID associated with a particular child span. If there is no child span, the parent ID is set to ID. | ||
* __"trace_id":__ The 64 or 128-bit ID of a particular trace. Every span in a trace shares this ID. Concatenation of high and low and converted to hexadecimal. | ||
* __"name":__ Defines a span | ||
|
||
##### Annotations have these additional tags: | ||
|
||
* __"service_name":__ Defines a service | ||
* __"annotation":__ The value of an annotation | ||
* __"endpoint_host":__ Listening port concat with IPV4, if port is not present it will not be concatenated | ||
|
||
##### Binary Annotations have these additional tag: | ||
|
||
* __"service_name":__ Defines a service | ||
* __"annotation":__ The value of an annotation | ||
* __"endpoint_host":__ Listening port concat with IPV4, if port is not present it will not be concatenated | ||
* __"annotation_key":__ label describing the annotation | ||
|
||
|
||
### Fields: | ||
* __"duration_ns":__ The time in nanoseconds between the end and beginning of a span. | ||
|
||
|
||
|
||
### Sample Queries: | ||
|
||
__Get All Span Names for Service__ `my_web_server` | ||
```sql | ||
SHOW TAG VALUES FROM "zipkin" with key="name" WHERE "service_name" = 'my_web_server' | ||
``` | ||
- __Description:__ returns a list containing the names of the spans which have annotations with the given `service_name` of `my_web_server`. | ||
|
||
__Get All Service Names__ | ||
```sql | ||
SHOW TAG VALUES FROM "zipkin" WITH KEY = "service_name" | ||
``` | ||
- __Description:__ returns a list of all `distinct` endpoint service names. | ||
|
||
__Find spans with longest duration__ | ||
```sql | ||
SELECT max("duration_ns") FROM "zipkin" WHERE "service_name" = 'my_service' AND "name" = 'my_span_name' AND time > now() - 20m GROUP BY "trace_id",time(30s) LIMIT 5 | ||
``` | ||
- __Description:__ In the last 20 minutes find the top 5 longest span durations for service `my_server` and span name `my_span_name` | ||
|
||
|
||
### Recommended InfluxDB setup | ||
|
||
This test will create high cardinality data so we reccomend using the [tsi influxDB engine](https://www.influxdata.com/path-1-billion-time-series-influxdb-high-cardinality-indexing-ready-testing/). | ||
#### How To Set Up InfluxDB For Work With Zipkin | ||
|
||
##### Steps | ||
1. ___Update___ InfluxDB to >= 1.3, in order to use the new tsi engine. | ||
|
||
2. ___Generate___ a config file with the following command: | ||
```sh | ||
influxd config > /path/for/config/file | ||
``` | ||
3. ___Add___ the following to your config file, under the `[data]` tab: | ||
```toml | ||
[data] | ||
index-version = "tsi1" | ||
``` | ||
|
||
4. ___Start___ `influxd` with your new config file: | ||
```sh | ||
influxd -config=/path/to/your/config/file | ||
``` | ||
|
||
5. ___Update___ your retention policy: | ||
```sql | ||
ALTER RETENTION POLICY "autogen" ON "telegraf" DURATION 1d SHARD DURATION 30m | ||
``` | ||
|
||
### Example Input Trace: | ||
|
||
- [Cli microservice with two services Test](https://github.com/openzipkin/zipkin-go-opentracing/tree/master/examples/cli_with_2_services) | ||
- [Test data from distributed trace repo sample json](https://github.com/mattkanwisher/distributedtrace/blob/master/testclient/sample.json) | ||
#### [Trace Example from Zipkin model](http://zipkin.io/pages/data_model.html) | ||
```json | ||
{ | ||
"traceId": "bd7a977555f6b982", | ||
"name": "query", | ||
"id": "be2d01e33cc78d97", | ||
"parentId": "ebf33e1a81dc6f71", | ||
"timestamp": 1458702548786000, | ||
"duration": 13000, | ||
"annotations": [ | ||
{ | ||
"endpoint": { | ||
"serviceName": "zipkin-query", | ||
"ipv4": "192.168.1.2", | ||
"port": 9411 | ||
}, | ||
"timestamp": 1458702548786000, | ||
"value": "cs" | ||
}, | ||
{ | ||
"endpoint": { | ||
"serviceName": "zipkin-query", | ||
"ipv4": "192.168.1.2", | ||
"port": 9411 | ||
}, | ||
"timestamp": 1458702548799000, | ||
"value": "cr" | ||
} | ||
], | ||
"binaryAnnotations": [ | ||
{ | ||
"key": "jdbc.query", | ||
"value": "select distinct `zipkin_spans`.`trace_id` from `zipkin_spans` join `zipkin_annotations` on (`zipkin_spans`.`trace_id` = `zipkin_annotations`.`trace_id` and `zipkin_spans`.`id` = `zipkin_annotations`.`span_id`) where (`zipkin_annotations`.`endpoint_service_name` = ? and `zipkin_spans`.`start_ts` between ? and ?) order by `zipkin_spans`.`start_ts` desc limit ?", | ||
"endpoint": { | ||
"serviceName": "zipkin-query", | ||
"ipv4": "192.168.1.2", | ||
"port": 9411 | ||
} | ||
}, | ||
{ | ||
"key": "sa", | ||
"value": true, | ||
"endpoint": { | ||
"serviceName": "spanstore-jdbc", | ||
"ipv4": "127.0.0.1", | ||
"port": 3306 | ||
} | ||
} | ||
] | ||
} | ||
``` |
75 changes: 75 additions & 0 deletions
75
plugins/inputs/zipkin/cmd/stress_test_write/stress_test_write.go
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,75 @@ | ||
/* | ||
This is a development testing cli tool meant to stress the zipkin telegraf plugin. | ||
It writes a specified number of zipkin spans to the plugin endpoint, with other | ||
parameters which dictate batch size and flush timeout. | ||
Usage as follows: | ||
`./stress_test_write -batch_size=<batch_size> -max_backlog=<max_span_buffer_backlog> -batch_interval=<batch_interval_in_seconds> -span_count<number_of_spans_to_write> -zipkin_host=<zipkin_service_hostname>` | ||
Or with a timer: | ||
`time ./stress_test_write -batch_size=<batch_size> -max_backlog=<max_span_buffer_backlog> -batch_interval=<batch_interval_in_seconds> -span_count<number_of_spans_to_write> -zipkin_host=<zipkin_service_hostname>` | ||
However, the flag defaults work just fine for a good write stress test (and are what | ||
this tool has mainly been tested with), so there shouldn't be much need to | ||
manually tweak the parameters. | ||
*/ | ||
|
||
package main | ||
|
||
import ( | ||
"flag" | ||
"fmt" | ||
"log" | ||
"time" | ||
|
||
zipkin "github.com/openzipkin/zipkin-go-opentracing" | ||
) | ||
|
||
var ( | ||
BatchSize int | ||
MaxBackLog int | ||
BatchTimeInterval int | ||
SpanCount int | ||
ZipkinServerHost string | ||
) | ||
|
||
const usage = `./stress_test_write -batch_size=<batch_size> -max_backlog=<max_span_buffer_backlog> -batch_interval=<batch_interval_in_seconds> -span_count<number_of_spans_to_write> -zipkin_host=<zipkin_service_hostname>` | ||
|
||
func init() { | ||
flag.IntVar(&BatchSize, "batch_size", 10000, usage) | ||
flag.IntVar(&MaxBackLog, "max_backlog", 100000, usage) | ||
flag.IntVar(&BatchTimeInterval, "batch_interval", 1, usage) | ||
flag.IntVar(&SpanCount, "span_count", 100000, usage) | ||
flag.StringVar(&ZipkinServerHost, "zipkin_host", "localhost", usage) | ||
} | ||
|
||
func main() { | ||
flag.Parse() | ||
var hostname = fmt.Sprintf("http://%s:9411/api/v1/spans", ZipkinServerHost) | ||
collector, err := zipkin.NewHTTPCollector( | ||
hostname, | ||
zipkin.HTTPBatchSize(BatchSize), | ||
zipkin.HTTPMaxBacklog(MaxBackLog), | ||
zipkin.HTTPBatchInterval(time.Duration(BatchTimeInterval)*time.Second)) | ||
defer collector.Close() | ||
if err != nil { | ||
log.Fatalf("Error intializing zipkin http collector: %v\n", err) | ||
} | ||
|
||
tracer, err := zipkin.NewTracer( | ||
zipkin.NewRecorder(collector, false, "127.0.0.1:0", "trivial")) | ||
|
||
if err != nil { | ||
log.Fatalf("Error: %v\n", err) | ||
} | ||
|
||
log.Printf("Writing %d spans to zipkin server at %s\n", SpanCount, hostname) | ||
for i := 0; i < SpanCount; i++ { | ||
parent := tracer.StartSpan("Parent") | ||
parent.LogEvent(fmt.Sprintf("Trace%d", i)) | ||
parent.Finish() | ||
} | ||
log.Println("Done. Flushing remaining spans...") | ||
} |
Oops, something went wrong.