Add support for the spanmetrics connector #4452

Merged
merged 12 commits into from
May 27, 2023
48 changes: 48 additions & 0 deletions docker-compose/monitor/Makefile
Original file line number Diff line number Diff line change
@@ -0,0 +1,48 @@
.PHONY: build
build: export DOCKER_TAG = dev
build: clean-jaeger
	cd ../../ && \
	make build-all-in-one && \
	make docker-images-jaeger-backend

# run starts up the system required for SPM using the latest jaeger and otel images.
.PHONY: run
run: export JAEGER_IMAGE_TAG = latest
run: _run-connector

# run-dev starts up the system required for SPM using the latest otel image and a development jaeger image.
# Note: the jaeger "dev" image can be built with "make build".
.PHONY: run-dev
run-dev: export JAEGER_IMAGE_TAG = dev
run-dev: _run-connector

# _run-connector is the base target to bring up the system required for SPM using the new OTEL spanmetrics connector.
.PHONY: _run-connector
_run-connector: export OTEL_IMAGE_TAG = latest
_run-connector: export OTEL_CONFIG_SRC = ./otel-collector-config-connector.yml
_run-connector: export PROMETHEUS_QUERY_SUPPORT_SPANMETRICS_CONNECTOR = true
_run-connector:
	docker compose -f docker-compose.yml up

# run the older spanmetrics processor setup, for example,
# to test backwards compatibility of Jaeger with spanmetrics processor.
.PHONY: run-dev-processor
run-dev-processor: export JAEGER_IMAGE_TAG = dev
# Fix to a version before the breaking changes were introduced.
run-dev-processor: export OTEL_IMAGE_TAG = 0.70.0
run-dev-processor: export OTEL_CONFIG_SRC = ./otel-collector-config-processor.yml
run-dev-processor:
	docker compose -f docker-compose.yml up

.PHONY: clean-jaeger
clean-jaeger:
# Also cleans up intermediate cached containers.
	docker system prune -f

.PHONY: clean-all
clean-all: clean-jaeger
	docker rmi -f jaegertracing/all-in-one:dev ; \
	docker rmi -f jaegertracing/all-in-one:latest ; \
	docker rmi -f otel/opentelemetry-collector-contrib:latest ; \
	docker rmi -f prom/prometheus:latest ; \
	docker rmi -f grafana/grafana:latest
88 changes: 79 additions & 9 deletions docker-compose/monitor/README.md
Expand Up @@ -15,7 +15,7 @@ This environment consists the following backend components:

- [MicroSim](https://github.com/yurishkuro/microsim): a program to simulate traces.
- [Jaeger All-in-one](https://www.jaegertracing.io/docs/1.24/getting-started/#all-in-one): the full Jaeger stack in a single container image.
- [OpenTelemetry Collector](https://opentelemetry.io/docs/collector/): vendor agnostic integration layer for traces and metrics. Its main role in this particular development environment is to receive Jaeger spans, forward these spans untouched to Jaeger All-in-one while simultaneously aggregating metrics out of this span data. To learn more about span metrics aggregation, please refer to the [spanmetrics processor documentation](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/spanmetricsprocessor).
- [OpenTelemetry Collector](https://opentelemetry.io/docs/collector/): vendor agnostic integration layer for traces and metrics. Its main role in this particular development environment is to receive Jaeger spans, forward these spans untouched to Jaeger All-in-one while simultaneously aggregating metrics out of this span data. To learn more about span metrics aggregation, please refer to the [spanmetrics processor documentation][spanmetricsprocessor].
- [Prometheus](https://prometheus.io/): a metrics collection and query engine, used to scrape metrics computed by OpenTelemetry Collector, and presents an API for Jaeger All-in-one to query these metrics.
- [Grafana](https://grafana.com/): a metrics visualization, analytics & monitoring solution supporting multiple metrics databases.

Expand All @@ -26,11 +26,13 @@ The following diagram illustrates the relationship between these components:

# Getting Started

## Bring up/down the dev environment
## Quickstart

```bash
docker compose up
docker compose down
This brings up the system necessary to use the SPM feature locally.
It uses the latest image tags from both Jaeger and OpenTelemetry.

```shell
make run
```

**Tips:**
Expand All @@ -42,12 +44,37 @@ docker compose down
**Warning:** The included [docker-compose.yml](./docker-compose.yml) file uses the `latest` version of Jaeger and other components. If your local Docker registry already contains older versions, which may still be tagged as `latest`, you may want to delete those images before running the full set, to ensure consistent behavior:

```bash
docker rmi -f jaegertracing/all-in-one:latest
docker rmi -f otel/opentelemetry-collector-contrib:latest
docker rmi -f prom/prometheus:latest
docker rmi -f grafana/grafana:latest
make clean-all
```

## Development

These steps allow for running the system necessary for SPM, built from Jaeger's source.

The primary use case is for testing source code changes to the SPM feature locally.

### Build jaeger-all-in-one docker image

```shell
make build
```

### Bring up the dev environment

```bash
make run-dev
```

### Backwards compatibility testing with spanmetrics processor

```bash
make run-dev-processor
```

For each "run" make target, you should expect to see the following in the Monitor tab after a few minutes:

![Monitor Screenshot](images/startup-monitor-tab.png)

## Sending traces

It is possible to send traces to this SPM Development Environment from your own application and view their RED metrics.
Expand Down Expand Up @@ -83,6 +110,45 @@ Then navigate to the Monitor tab at http://localhost:16686/monitor to view the R

![My Service RED Metrics](images/my_service_metrics.png)

## Migrating to Span Metrics Connector

### Background

A new [Connector](https://pkg.go.dev/go.opentelemetry.io/collector/connector#section-readme) API was introduced
to the OpenTelemetry Collector to provide a means of receiving and exporting between any type of telemetry.

The existing [Span Metrics Processor][spanmetricsprocessor] was a good candidate to migrate over to the connector type,
resulting in the new [Span Metrics Connector][spanmetricsconnector] component.

The Span Metrics Connector variant introduces some [breaking changes][processor-to-connector], and the following
section aims to provide the instructions necessary to use the metrics produced by this component.

### Migrating

Assuming the OpenTelemetry Collector is running with the [Span Metrics Connector][spanmetricsconnector] correctly
configured, the minimum configuration required for jaeger-query or jaeger-all-in-one is as follows:

as command line parameters:
```shell
--prometheus.query.support-spanmetrics-connector=true
```

as environment variables:
```shell
PROMETHEUS_QUERY_SUPPORT_SPANMETRICS_CONNECTOR=true
```

If the Span Metrics Connector is configured with a namespace and/or an alternative duration unit,
the following configuration options are available, as both command line parameters and environment variables:

```shell
--prometheus.query.namespace=span_metrics
--prometheus.query.duration-unit=s

PROMETHEUS_QUERY_NAMESPACE=span_metrics
PROMETHEUS_QUERY_DURATION_UNIT=s
```
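
The flag and environment variable pairs above appear to follow a simple naming rule: drop the leading dashes, replace `.` and `-` with `_`, and upper-case the result. The Go sketch below illustrates that rule for the three options shown in this section. `envVarForFlag` is a hypothetical helper written for illustration and is not part of Jaeger; the rule itself is inferred from the examples above rather than taken from Jaeger's source.

```go
package main

import (
	"fmt"
	"strings"
)

// envVarForFlag is a hypothetical helper (not part of Jaeger) that converts a
// command line flag such as "--prometheus.query.duration-unit" into the
// environment variable form used in this README.
func envVarForFlag(flag string) string {
	// Drop the leading dashes of the flag.
	name := strings.TrimLeft(flag, "-")
	// Dots and dashes both become underscores.
	name = strings.NewReplacer(".", "_", "-", "_").Replace(name)
	return strings.ToUpper(name)
}

func main() {
	flags := []string{
		"--prometheus.query.support-spanmetrics-connector",
		"--prometheus.query.namespace",
		"--prometheus.query.duration-unit",
	}
	for _, flag := range flags {
		fmt.Printf("%s -> %s\n", flag, envVarForFlag(flag))
	}
}
```

Either form can be used on its own; the docker-compose setup in this directory passes the environment variable form to the Jaeger container.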

## Querying the HTTP API

### Example 1
Expand Down Expand Up @@ -247,3 +313,7 @@ $ curl http://localhost:16686/api/metrics/minstep | jq .
]
}
```

[spanmetricsprocessor]: https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/spanmetricsprocessor
[spanmetricsconnector]: https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/connector/spanmetricsconnector
[processor-to-connector]: https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/connector/spanmetricsconnector#span-to-metrics-processor-to-span-to-metrics-connector
10 changes: 7 additions & 3 deletions docker-compose/monitor/docker-compose.yml
Expand Up @@ -3,21 +3,25 @@ services:
jaeger:
networks:
- backend
image: jaegertracing/all-in-one:latest
image: jaegertracing/all-in-one:${JAEGER_IMAGE_TAG}
volumes:
- "./jaeger-ui.json:/etc/jaeger/jaeger-ui.json"
command: --query.ui-config /etc/jaeger/jaeger-ui.json
environment:
- METRICS_STORAGE_TYPE=prometheus
- PROMETHEUS_SERVER_URL=http://prometheus:9090
- LOG_LEVEL=debug
- PROMETHEUS_QUERY_SUPPORT_SPANMETRICS_CONNECTOR=${PROMETHEUS_QUERY_SUPPORT_SPANMETRICS_CONNECTOR}
- PROMETHEUS_QUERY_NAMESPACE=${PROMETHEUS_QUERY_NAMESPACE}
- PROMETHEUS_QUERY_DURATION_UNIT=${PROMETHEUS_QUERY_DURATION_UNIT}
ports:
- "16686:16686"
otel_collector:
networks:
- backend
image: otel/opentelemetry-collector-contrib:latest
image: otel/opentelemetry-collector-contrib:${OTEL_IMAGE_TAG}
volumes:
- "./otel-collector-config.yml:/etc/otelcol/otel-collector-config.yml"
- ${OTEL_CONFIG_SRC}:/etc/otelcol/otel-collector-config.yml
command: --config /etc/otelcol/otel-collector-config.yml
ports:
- "4317:4317"
37 changes: 37 additions & 0 deletions docker-compose/monitor/otel-collector-config-connector.yml
@@ -0,0 +1,37 @@
receivers:
  jaeger:
    protocols:
      thrift_http:
        endpoint: "0.0.0.0:14278"

  otlp:
    protocols:
      grpc:
      http:

exporters:
  prometheus:
    endpoint: "0.0.0.0:8889"

  jaeger:
    endpoint: "jaeger:14250"
    tls:
      insecure: true

connectors:
  spanmetrics:

processors:
  batch:

service:
  pipelines:
    traces:
      receivers: [otlp, jaeger]
      processors: [batch]
      exporters: [spanmetrics, jaeger]
    # The spanmetrics connector is listed as an exporter in the traces pipeline above
    # and as the receiver here, which feeds the generated span metrics to the prometheus exporter.
    metrics/spanmetrics:
      receivers: [spanmetrics]
      exporters: [prometheus]
4 changes: 4 additions & 0 deletions pkg/prometheus/config/config.go
Expand Up @@ -26,4 +26,8 @@ type Configuration struct {
ConnectTimeout time.Duration
TLS tlscfg.Options
TokenFilePath string

SupportSpanmetricsConnector bool
MetricNamespace string
LatencyUnit string
}
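
The three new fields can be read alongside the tests later in this change: the defaults are `false`, an empty namespace and `"ms"`, and an unsupported duration unit such as `milliseconds` is rejected at startup. The following is a minimal sketch of that behaviour, not the code from this PR. `QueryOptions`, `defaultQueryOptions` and `Validate` are hypothetical names, and the accepted set of units (`ms` and `s`) is an assumption drawn from the README and test values.

```go
package main

import "fmt"

// QueryOptions is a hypothetical stand-in for the three fields added to
// Configuration in this change.
type QueryOptions struct {
	SupportSpanmetricsConnector bool
	MetricNamespace             string
	LatencyUnit                 string
}

// defaultQueryOptions returns the defaults asserted in TestWithDefaultConfiguration:
// connector support off, no namespace, and latency reported in milliseconds.
func defaultQueryOptions() QueryOptions {
	return QueryOptions{LatencyUnit: "ms"}
}

// Validate rejects duration units outside the assumed allowed set ("ms", "s").
func (o QueryOptions) Validate() error {
	switch o.LatencyUnit {
	case "ms", "s":
		return nil
	default:
		return fmt.Errorf("unsupported duration unit %q: expected \"ms\" or \"s\"", o.LatencyUnit)
	}
}

func main() {
	opts := defaultQueryOptions()
	fmt.Println("defaults valid:", opts.Validate() == nil)

	opts.LatencyUnit = "milliseconds"
	fmt.Println("validation error:", opts.Validate())
}
```

Returning an error from validation, rather than terminating inside it, leaves the caller free to decide how to fail, which is what makes the failure path testable.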
2 changes: 1 addition & 1 deletion plugin/metrics/prometheus/factory.go
Expand Up @@ -48,7 +48,7 @@ func (f *Factory) AddFlags(flagSet *flag.FlagSet) {
// InitFromViper implements plugin.Configurable.
func (f *Factory) InitFromViper(v *viper.Viper, logger *zap.Logger) {
if err := f.options.InitFromViper(v); err != nil {
logger.Fatal("Failed to initialize metrics storage factory", zap.Error(err))
logger.Panic("Failed to initialize metrics storage factory", zap.Error(err))
}
}

39 changes: 37 additions & 2 deletions plugin/metrics/prometheus/factory_test.go
Expand Up @@ -52,10 +52,15 @@ func TestWithDefaultConfiguration(t *testing.T) {
f := NewFactory()
assert.Equal(t, "http://localhost:9090", f.options.Primary.ServerURL)
assert.Equal(t, 30*time.Second, f.options.Primary.ConnectTimeout)

// Ensure backwards compatibility with OTEL's spanmetricsprocessor.
assert.False(t, f.options.Primary.SupportSpanmetricsConnector)
assert.Empty(t, f.options.Primary.MetricNamespace)
assert.Equal(t, "ms", f.options.Primary.LatencyUnit)
}

func TestWithConfiguration(t *testing.T) {
t.Run("With custom configuration and no space in token file path", func(t *testing.T) {
t.Run("with custom configuration and no space in token file path", func(t *testing.T) {
f := NewFactory()
v, command := config.Viperize(f.AddFlags)
err := command.ParseFlags([]string{
Expand All @@ -69,7 +74,7 @@ func TestWithConfiguration(t *testing.T) {
assert.Equal(t, 5*time.Second, f.options.Primary.ConnectTimeout)
assert.Equal(t, "test/test_file.txt", f.options.Primary.TokenFilePath)
})
t.Run("With space in token file path", func(t *testing.T) {
t.Run("with space in token file path", func(t *testing.T) {
f := NewFactory()
v, command := config.Viperize(f.AddFlags)
err := command.ParseFlags([]string{
Expand All @@ -79,6 +84,36 @@ func TestWithConfiguration(t *testing.T) {
f.InitFromViper(v, zap.NewNop())
assert.Equal(t, "test/ test file.txt", f.options.Primary.TokenFilePath)
})
t.Run("with custom configuration of prometheus.query", func(t *testing.T) {
f := NewFactory()
v, command := config.Viperize(f.AddFlags)
err := command.ParseFlags([]string{
"--prometheus.query.support-spanmetrics-connector=true",
"--prometheus.query.namespace=mynamespace",
"--prometheus.query.duration-unit=ms",
})
require.NoError(t, err)
f.InitFromViper(v, zap.NewNop())
assert.True(t, f.options.Primary.SupportSpanmetricsConnector)
assert.Equal(t, "mynamespace", f.options.Primary.MetricNamespace)
assert.Equal(t, "ms", f.options.Primary.LatencyUnit)
})
t.Run("with invalid prometheus.query.duration-unit", func(t *testing.T) {
defer func() {
if r := recover(); r == nil {
t.Errorf("Expected a panic due to invalid duration-unit")
}
}()

f := NewFactory()
v, command := config.Viperize(f.AddFlags)
err := command.ParseFlags([]string{
"--prometheus.query.duration-unit=milliseconds",
})
require.NoError(t, err)
f.InitFromViper(v, zap.NewNop())
require.Empty(t, f.options.Primary.LatencyUnit)
})
}

func TestFailedTLSOptions(t *testing.T) {
29 changes: 23 additions & 6 deletions plugin/metrics/prometheus/metricsstore/dbmodel/to_domain.go
Expand Up @@ -23,37 +23,54 @@ import (
"github.com/jaegertracing/jaeger/proto-gen/api_v2/metrics"
)

// Translator translates Prometheus's metrics model to Jaeger's.
type Translator struct {
labelMap map[string]string
}

// New returns a new Translator.
func New(spanNameLabel string) Translator {
return Translator{
// "operation" is the label name that Jaeger UI expects.
labelMap: map[string]string{spanNameLabel: "operation"},
}
}

// ToDomainMetricsFamily converts Prometheus' representation of metrics query results to Jaeger's.
func ToDomainMetricsFamily(name, description string, mv model.Value) (*metrics.MetricFamily, error) {
func (d Translator) ToDomainMetricsFamily(name, description string, mv model.Value) (*metrics.MetricFamily, error) {
if mv.Type() != model.ValMatrix {
return &metrics.MetricFamily{}, fmt.Errorf("unexpected metrics ValueType: %s", mv.Type())
}
return &metrics.MetricFamily{
Name: name,
Type: metrics.MetricType_GAUGE,
Help: description,
Metrics: toDomainMetrics(mv.(model.Matrix)),
Metrics: d.toDomainMetrics(mv.(model.Matrix)),
}, nil
}

// toDomainMetrics converts Prometheus' representation of metrics to Jaeger's.
func toDomainMetrics(matrix model.Matrix) []*metrics.Metric {
func (d Translator) toDomainMetrics(matrix model.Matrix) []*metrics.Metric {
ms := make([]*metrics.Metric, matrix.Len())
for i, ss := range matrix {
ms[i] = &metrics.Metric{
Labels: toDomainLabels(ss.Metric),
Labels: d.toDomainLabels(ss.Metric),
MetricPoints: toDomainMetricPoints(ss.Values),
}
}
return ms
}

// toDomainLabels converts Prometheus' representation of metric labels to Jaeger's.
func toDomainLabels(promLabels model.Metric) []*metrics.Label {
func (d Translator) toDomainLabels(promLabels model.Metric) []*metrics.Label {
labels := make([]*metrics.Label, len(promLabels))
j := 0
for k, v := range promLabels {
labels[j] = &metrics.Label{Name: string(k), Value: string(v)}
labelName := string(k)
if newLabel, ok := d.labelMap[labelName]; ok {
labelName = newLabel
}
labels[j] = &metrics.Label{Name: labelName, Value: string(v)}
j++
}
return labels
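
The label renaming performed by `toDomainLabels` can be sketched in isolation. The example below is a simplified illustration using plain string maps instead of Prometheus's `model.Metric` and Jaeger's `metrics.Label`. The `Label` type and `translateLabels` function are hypothetical, and `span_name` is an assumed value for the span name label emitted by the connector.

```go
package main

import "fmt"

// Label is a simplified, hypothetical stand-in for Jaeger's metrics.Label.
type Label struct {
	Name  string
	Value string
}

// translateLabels copies every label, renaming those that have an entry in
// labelMap and leaving all others unchanged.
func translateLabels(promLabels, labelMap map[string]string) []Label {
	labels := make([]Label, 0, len(promLabels))
	for name, value := range promLabels {
		if newName, ok := labelMap[name]; ok {
			name = newName
		}
		labels = append(labels, Label{Name: name, Value: value})
	}
	return labels
}

func main() {
	// "span_name" is an assumption; "operation" is the name the Jaeger UI expects.
	labelMap := map[string]string{"span_name": "operation"}
	promLabels := map[string]string{
		"span_name":    "/checkout",
		"service_name": "frontend",
	}
	// Map iteration order is not deterministic, so the print order may vary.
	for _, l := range translateLabels(promLabels, labelMap) {
		fmt.Printf("%s=%s\n", l.Name, l.Value)
	}
}
```

Keeping the mapping in a lookup table means that supporting the older processor only requires constructing the translator with a different source label name.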