Port servicegraphprocessor to servicegraphconnector (#18389)
djaglowski authored Feb 11, 2023
1 parent 2ce8945 commit feb6c08
Showing 19 changed files with 918 additions and 29 deletions.
16 changes: 16 additions & 0 deletions .chloggen/servicegraphconnector.yaml
@@ -0,0 +1,16 @@
# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
change_type: new_component

# The name of the component, or a single word describing the area of concern, (e.g. filelogreceiver)
component: servicegraphconnector

# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
note: Add preliminary implementation as a connector.

# One or more tracking issues related to the change
issues: [18389]

# (Optional) One or more lines of additional information to render under the primary note.
# These lines will be padded with 2 spaces and then inserted directly into the document.
# Use pipe (|) for multiline entries.
subtext:
1 change: 1 addition & 0 deletions .github/CODEOWNERS
@@ -29,6 +29,7 @@ cmd/telemetrygen/ @open-telemetry/collect
confmap/provider/s3provider/ @open-telemetry/collector-contrib-approvers @Aneurysm9

connector/countconnector/ @open-telemetry/collector-contrib-approvers @djaglowski @jpkrohling
connector/servicegraphconnector/ @open-telemetry/collector-contrib-approvers @jpkrohling @mapno

examples/demo/ @open-telemetry/collector-contrib-approvers @open-telemetry/collector-approvers

1 change: 1 addition & 0 deletions .github/ISSUE_TEMPLATE/bug_report.yaml
@@ -25,6 +25,7 @@ body:
- cmd/telemetrygen
- confmap/provider/s3provider
- connector/count
- connector/servicegraph
- examples/demo
- exporter/alibabacloudlogservice
- exporter/awscloudwatchlogs
1 change: 1 addition & 0 deletions .github/ISSUE_TEMPLATE/feature_request.yaml
@@ -19,6 +19,7 @@ body:
- cmd/telemetrygen
- confmap/provider/s3provider
- connector/count
- connector/servicegraph
- examples/demo
- exporter/alibabacloudlogservice
- exporter/awscloudwatchlogs
1 change: 1 addition & 0 deletions .github/ISSUE_TEMPLATE/other.yaml
@@ -19,6 +19,7 @@ body:
- cmd/telemetrygen
- confmap/provider/s3provider
- connector/count
- connector/servicegraph
- examples/demo
- exporter/alibabacloudlogservice
- exporter/awscloudwatchlogs
4 changes: 4 additions & 0 deletions .github/dependabot.yml
@@ -46,6 +46,10 @@ updates:
    directory: "/connector/countconnector"
    schedule:
      interval: "weekly"
  - package-ecosystem: "gomod"
    directory: "/connector/servicegraphconnector"
    schedule:
      interval: "weekly"
  - package-ecosystem: "gomod"
    directory: "/examples/demo/client"
    schedule:
1 change: 1 addition & 0 deletions connector/servicegraphconnector/Makefile
@@ -0,0 +1 @@
include ../../Makefile.Common
141 changes: 141 additions & 0 deletions connector/servicegraphconnector/README.md
@@ -0,0 +1,141 @@
# Service Graph Connector

| Status | |
|------------------------- |---------------------------------------------------------- |
| Stability | [in development] |
| Supported pipeline types | See [Supported Pipeline Types](#supported-pipeline-types) |
| Distributions | [] |

## Supported Pipeline Types

| [Exporter Pipeline Type] | [Receiver Pipeline Type] |
| ------------------------ | ------------------------ |
| traces | metrics |

## Overview

The service graphs connector builds a map representing the interrelationships between various services in a system.
The connector will analyse trace data and generate metrics describing the relationship between the services.
These metrics can be used by data visualization apps (e.g. Grafana) to draw a service graph.

Service graphs are useful for a number of use-cases:

* Infer the topology of a distributed system. As distributed systems grow, they become more complex. Service graphs can help you understand the structure of the system.
* Provide a high-level overview of the health of your system.
Service graphs show error rates and latencies, among other relevant data.
* Provide a historical view of a system’s topology.
Distributed systems change very frequently,
and service graphs offer a way of seeing how these systems have evolved over time.

This component is based on [Grafana Tempo's service graph processor](https://github.com/grafana/tempo/tree/main/modules/generator/processor/servicegraphs).

## How it works

Service graphs work by inspecting traces and looking for spans with a parent-child relationship that represents a request.
The connector uses the OpenTelemetry semantic conventions to detect different kinds of requests.
It currently supports the following request types:

* A direct request between two services where the outgoing and the incoming span must have `span.kind` client and server respectively.
* A request across a messaging system where the outgoing and the incoming span must have `span.kind` producer and consumer respectively.
* A database request; in this case the connector looks for spans that have `span.kind` client and also carry the `db.name` attribute.
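
The three request types above can be sketched as a small classification function. This is only an illustration under simplifying assumptions, not the connector's actual code: the `Span` and `SpanKind` types and the `connectionType` helper are hypothetical stand-ins, and the returned strings follow the `connection_type` values listed in the Metrics section.

```go
package main

import "fmt"

// SpanKind is a hypothetical, simplified mirror of the OpenTelemetry span kinds.
type SpanKind int

const (
	KindUnspecified SpanKind = iota
	KindClient
	KindServer
	KindProducer
	KindConsumer
)

// Span is a simplified stand-in for a trace span (assumption for this sketch).
type Span struct {
	Kind       SpanKind
	Attributes map[string]string
}

// connectionType classifies the outgoing side of a request.
// ok is false when the span cannot start an edge (it is not a client or producer span).
// An empty connType with ok == true means the connection type is unset (a direct request).
func connectionType(outgoing Span) (connType string, ok bool) {
	switch outgoing.Kind {
	case KindProducer:
		return "messaging_system", true
	case KindClient:
		// Reading from a nil map is safe in Go and yields the zero value.
		if _, isDB := outgoing.Attributes["db.name"]; isDB {
			return "database", true
		}
		return "", true
	default:
		return "", false
	}
}

func main() {
	spans := []Span{
		{Kind: KindClient},
		{Kind: KindProducer},
		{Kind: KindClient, Attributes: map[string]string{"db.name": "orders"}},
		{Kind: KindServer},
	}
	for _, s := range spans {
		ct, ok := connectionType(s)
		fmt.Printf("kind=%d connection_type=%q starts_edge=%t\n", s.Kind, ct, ok)
	}
}
```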

Every span that can be paired up to form a request is kept in an in-memory store,
until its corresponding pair span is received or the maximum waiting time has passed.
When either of these conditions is met, the request is recorded and removed from the local store.

Each emitted metric series has `client` and `server` labels, corresponding to the service making the request and the service receiving it.

```
traces_service_graph_request_total{client="app", server="db", connection_type="database"} 20
```

TL;DR: The connector tries to find spans belonging to the same request, as seen from the client and from the server, and creates a metric representing an edge in the graph.
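
As a rough illustration of the pairing described in this section, the following sketch keeps half-complete edges in a map until the other half arrives, the TTL elapses, or the store is full. All names (`edge`, `store`, `upsert`, `expire`) and the key format are assumptions made for this example; the connector's real store may differ.

```go
package main

import (
	"fmt"
	"time"
)

// edge holds both sides of a request (hypothetical type for this sketch).
type edge struct {
	client, server string
	expiresAt      time.Time
}

// complete reports whether both the client and the server side have been seen.
func (e *edge) complete() bool { return e.client != "" && e.server != "" }

// store is a minimal in-memory edge store bounded by a TTL and a maximum size.
type store struct {
	ttl      time.Duration
	maxItems int
	edges    map[string]*edge
}

func newStore(ttl time.Duration, maxItems int) *store {
	return &store{ttl: ttl, maxItems: maxItems, edges: make(map[string]*edge)}
}

// upsert applies update to the edge stored under key, creating it if needed.
// It returns the edge once both sides are present (removing it from the store),
// or dropped == true when a new edge cannot be stored because the store is full.
func (s *store) upsert(key string, now time.Time, update func(*edge)) (done *edge, dropped bool) {
	e, ok := s.edges[key]
	if !ok {
		if len(s.edges) >= s.maxItems {
			return nil, true
		}
		e = &edge{expiresAt: now.Add(s.ttl)}
		s.edges[key] = e
	}
	update(e)
	if e.complete() {
		delete(s.edges, key)
		return e, false
	}
	return nil, false
}

// expire removes and returns edges whose waiting time has passed (unpaired spans).
func (s *store) expire(now time.Time) []*edge {
	var expired []*edge
	for k, e := range s.edges {
		if !now.Before(e.expiresAt) {
			expired = append(expired, e)
			delete(s.edges, k) // deleting during range is safe in Go
		}
	}
	return expired
}

func main() {
	s := newStore(time.Second, 10)
	now := time.Now()

	// Client and server spans of the same request share a key (assumed format).
	key := "trace-1/span-1"
	s.upsert(key, now, func(e *edge) { e.client = "app" })
	if done, _ := s.upsert(key, now, func(e *edge) { e.server = "db" }); done != nil {
		fmt.Println("paired edge:", done.client, "->", done.server)
	}

	// A client span whose server span never arrives expires after the TTL.
	s.upsert("trace-2/span-7", now, func(e *edge) { e.client = "app" })
	fmt.Println("expired edges:", len(s.expire(now.Add(2*time.Second))))
}
```

In the example configuration later in this README, `store.ttl` and `store.max_items` play the roles of `ttl` and `maxItems` in this sketch.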

## Metrics

The following metrics are emitted by the connector:

| Metric | Type | Labels | Description |
|---------------------------------------------|-----------|---------------------------------|--------------------------------------------------------------|
| traces_service_graph_request_total | Counter | client, server, connection_type | Total count of requests between two nodes |
| traces_service_graph_request_failed_total | Counter | client, server, connection_type | Total count of failed requests between two nodes |
| traces_service_graph_request_server_seconds | Histogram | client, server, connection_type | Time for a request between two nodes as seen from the server |
| traces_service_graph_request_client_seconds | Histogram | client, server, connection_type | Time for a request between two nodes as seen from the client |
| traces_service_graph_unpaired_spans_total | Counter | client, server, connection_type | Total count of unpaired spans |
| traces_service_graph_dropped_spans_total | Counter | client, server, connection_type | Total count of dropped spans |

Duration is measured both from the client and the server sides.

Possible values for `connection_type`: unset, `messaging_system`, or `database`.

Additional labels can be included using the `dimensions` configuration option. Those labels will have a prefix to mark where they originate (client or server span kinds).
The `client_` prefix relates to the dimensions coming from spans with `SPAN_KIND_CLIENT`, and the `server_` prefix relates to the
dimensions coming from spans with `SPAN_KIND_SERVER`.
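
To illustrate the prefixing, here is a minimal sketch of how a label set for one edge could be assembled. The `buildLabels` helper and its parameters are hypothetical and only mirror the description above; the connector's own label handling may differ in details.

```go
package main

import (
	"fmt"
	"sort"
)

// buildLabels assembles the labels for one edge. Each configured dimension is
// looked up on the client span and the server span, and the resulting label is
// prefixed with "client_" or "server_" to mark its origin.
// This is a hypothetical helper for illustration only.
func buildLabels(client, server, connType string, dimensions []string,
	clientAttrs, serverAttrs map[string]string) map[string]string {

	labels := map[string]string{
		"client":          client,
		"server":          server,
		"connection_type": connType,
	}
	for _, d := range dimensions {
		if v, ok := clientAttrs[d]; ok {
			labels["client_"+d] = v
		}
		if v, ok := serverAttrs[d]; ok {
			labels["server_"+d] = v
		}
	}
	return labels
}

func main() {
	labels := buildLabels("app", "db", "database",
		[]string{"dimension-1"},
		map[string]string{"dimension-1": "from-client-span"},
		map[string]string{"dimension-1": "from-server-span"},
	)

	// Sort the keys so the printed order is stable.
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s=%q\n", k, labels[k])
	}
}
```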

Since the service graph connector has to process both sides of an edge,
it needs to process all spans of a trace to function properly.
If the spans of a trace are spread out over multiple instances, they cannot be paired up reliably.
A possible solution to this problem is using the [load balancing exporter](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter/loadbalancingexporter)
in a layer in front of the collector instances running this connector.

## Visualization

Service graph metrics are natively supported by Grafana since v9.0.4.
To enable it, configure the 'Service Graphs' section of a Tempo data source by linking it to the Prometheus backend where the metrics are being sent:

```yaml
apiVersion: 1
datasources:
  # Prometheus backend where metrics are sent
  - name: Prometheus
    type: prometheus
    uid: prometheus
    url: <prometheus-url>
    jsonData:
      httpMethod: GET
    version: 1
  - name: Tempo
    type: tempo
    uid: tempo
    url: <tempo-url>
    jsonData:
      httpMethod: GET
      serviceMap:
        datasourceUid: 'prometheus'
    version: 1
```

## Example configuration

```yaml
receivers:
  otlp:
    protocols:
      grpc:

connectors:
  servicegraph:
    latency_histogram_buckets: [1,2,3,4,5]
    dimensions:
      - dimension-1
      - dimension-2
    store:
      ttl: 1s
      max_items: 10

exporters:
  prometheus/servicegraph:
    endpoint: localhost:9090
    namespace: servicegraph

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [servicegraph]
    metrics/servicegraph:
      receivers: [servicegraph]
      exporters: [prometheus/servicegraph]
```
[in development]: https://github.com/open-telemetry/opentelemetry-collector#development
19 changes: 19 additions & 0 deletions connector/servicegraphconnector/factory.go
@@ -0,0 +1,19 @@
// Copyright The OpenTelemetry Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package servicegraphconnector // import "github.com/open-telemetry/opentelemetry-collector-contrib/connector/servicegraphconnector"

import "github.com/open-telemetry/opentelemetry-collector-contrib/processor/servicegraphprocessor"

var NewFactory = servicegraphprocessor.NewConnectorFactory
41 changes: 41 additions & 0 deletions connector/servicegraphconnector/go.mod
@@ -0,0 +1,41 @@
module github.com/open-telemetry/opentelemetry-collector-contrib/connector/servicegraphconnector

go 1.18

require github.com/open-telemetry/opentelemetry-collector-contrib/processor/servicegraphprocessor v0.71.0

require (
	github.com/gogo/protobuf v1.3.2 // indirect
	github.com/golang/protobuf v1.5.2 // indirect
	github.com/json-iterator/go v1.1.12 // indirect
	github.com/knadh/koanf v1.5.0 // indirect
	github.com/mitchellh/copystructure v1.2.0 // indirect
	github.com/mitchellh/mapstructure v1.5.0 // indirect
	github.com/mitchellh/reflectwalk v1.0.2 // indirect
	github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd // indirect
	github.com/modern-go/reflect2 v1.0.2 // indirect
	go.opencensus.io v0.24.0 // indirect
	go.opentelemetry.io/collector v0.71.0 // indirect
	go.opentelemetry.io/collector/component v0.71.0 // indirect
	go.opentelemetry.io/collector/confmap v0.71.0 // indirect
	go.opentelemetry.io/collector/consumer v0.71.0 // indirect
	go.opentelemetry.io/collector/featuregate v0.71.0 // indirect
	go.opentelemetry.io/collector/pdata v1.0.0-rc5 // indirect
	go.opentelemetry.io/collector/semconv v0.71.0 // indirect
	go.opentelemetry.io/otel v1.13.0 // indirect
	go.opentelemetry.io/otel/metric v0.36.0 // indirect
	go.opentelemetry.io/otel/trace v1.13.0 // indirect
	go.uber.org/atomic v1.10.0 // indirect
	go.uber.org/multierr v1.9.0 // indirect
	go.uber.org/zap v1.24.0 // indirect
	golang.org/x/net v0.5.0 // indirect
	golang.org/x/sys v0.4.0 // indirect
	golang.org/x/text v0.6.0 // indirect
	google.golang.org/genproto v0.0.0-20221202195650-67e5cbc046fd // indirect
	google.golang.org/grpc v1.52.3 // indirect
	google.golang.org/protobuf v1.28.1 // indirect
)

retract v0.65.0

replace github.com/open-telemetry/opentelemetry-collector-contrib/processor/servicegraphprocessor => ../../processor/servicegraphprocessor/