
[servicegraphprocessor] Index out of range panic in updateDurationMetrics method #16000

Closed
Frapschen opened this issue Nov 1, 2022 · 2 comments · Fixed by #16025
Labels
bug (Something isn't working) · priority:p2 (Medium) · processor/servicegraph (Service graph processor)

Comments

@Frapschen
Contributor

What happened?

Description

Index out of range panic in updateDurationMetrics method.

Steps to Reproduce

After the collector ran for a while with the servicegraphprocessor configured, it panicked.

Collector version

v0.63.0

Environment information

Environment

OS: macOS 12.3.1
Compiler (if manually compiled): go 1.8

OpenTelemetry Collector configuration

extensions:
  health_check:
receivers:
  otlp:
    protocols:
      grpc:
      http:
  otlp/servicegraph: # Dummy receiver for the metrics pipeline
    protocols:
      grpc:
        endpoint: localhost:12345

processors:
  batch:
  servicegraph:
    metrics_exporter: prometheus/servicegraph
    latency_histogram_buckets: [2ms, 4ms, 6ms, 8ms, 10ms, 50ms, 100ms, 200ms, 500ms, 800ms, 1s, 1400ms, 2s, 5s, 10s, 15s]
    dimensions:
      - k8s.cluster.id
      - k8s.namespace.name
    store:
      ttl: 10s
      max_items: 100000

exporters:
  logging:
  prometheus/servicegraph:
    endpoint: 0.0.0.0:8889

service:
  telemetry:
    metrics:
      address: 0.0.0.0:8888
  pipelines:
    traces:
      receivers: [otlp]
      processors: [servicegraph,batch]
      exporters: [logging]
    metrics/servicegraph:
      receivers: [otlp/servicegraph]
      processors: []
      exporters: [prometheus/servicegraph]
  extensions: [health_check]

Log output

panic: runtime error: index out of range [16] with length 16 [recovered]
        panic: runtime error: index out of range [16] with length 16

goroutine 2747 [running]:
go.opentelemetry.io/otel/sdk/trace.(*recordingSpan).End.func1()
        /Users/fraps/go/pkg/mod/go.opentelemetry.io/otel/[email protected]/trace/span.go:383 +0x30
go.opentelemetry.io/otel/sdk/trace.(*recordingSpan).End(0x14001838480, {0x0, 0x0, 0x10?})
        /Users/fraps/go/pkg/mod/go.opentelemetry.io/otel/[email protected]/trace/span.go:415 +0x6c8
panic({0x10abcb220, 0x14000dc95d8})
        /usr/local/go/src/runtime/panic.go:838 +0x204
github.com/open-telemetry/opentelemetry-collector-contrib/processor/servicegraphprocessor.(*processor).updateDurationMetrics(0x14000b433f0, {0x14001fa6240, 0x5d}, 0x40ed687b645a1cac)
        /Users/fraps/daocloud/github-code/opentelemetry-collector-contrib/processor/servicegraphprocessor/processor.go:331 +0x17c
github.com/open-telemetry/opentelemetry-collector-contrib/processor/servicegraphprocessor.(*processor).aggregateMetricsForEdge(0x14000b433f0, 0x14001a195f0)
        /Users/fraps/daocloud/github-code/opentelemetry-collector-contrib/processor/servicegraphprocessor/processor.go:301 +0x19c
github.com/open-telemetry/opentelemetry-collector-contrib/processor/servicegraphprocessor.(*processor).onComplete(0x14000b433f0, 0x14001a195f0)
        /Users/fraps/daocloud/github-code/opentelemetry-collector-contrib/processor/servicegraphprocessor/processor.go:273 +0x2f4
github.com/open-telemetry/opentelemetry-collector-contrib/processor/servicegraphprocessor/internal/store.(*store).UpsertEdge(0x14000cc6a80, {0x14001b0fc40, 0x31}, 0x1400184eaa0)
        /Users/fraps/daocloud/github-code/opentelemetry-collector-contrib/processor/servicegraphprocessor/internal/store/store.go:77 +0x154
github.com/open-telemetry/opentelemetry-collector-contrib/processor/servicegraphprocessor.(*processor).aggregateMetrics(0x14000b433f0, {0x10b1369c8, 0x14002465290}, {0x0?})
        /Users/fraps/daocloud/github-code/opentelemetry-collector-contrib/processor/servicegraphprocessor/processor.go:224 +0x9dc
github.com/open-telemetry/opentelemetry-collector-contrib/processor/servicegraphprocessor.(*processor).ConsumeTraces(0x14000b433f0, {0x10b1369c8, 0x14002465290}, {0x10884f9c7?})
        /Users/fraps/daocloud/github-code/opentelemetry-collector-contrib/processor/servicegraphprocessor/processor.go:145 +0x30
go.opentelemetry.io/collector/receiver/otlpreceiver/internal/trace.(*Receiver).Export(0x1400059b170, {0x10b1369c8, 0x14002465200}, {0x10b1163b8?})
        /Users/fraps/go/pkg/mod/go.opentelemetry.io/[email protected]/receiver/otlpreceiver/internal/trace/otlp.go:60 +0xb4
go.opentelemetry.io/collector/pdata/ptrace/ptraceotlp.rawTracesServer.Export({{0x10b0ec300?, 0x1400059b170?}}, {0x10b1369c8?, 0x14002465200?}, 0x10885c200?)
        /Users/fraps/go/pkg/mod/go.opentelemetry.io/collector/[email protected]/ptrace/ptraceotlp/grpc.go:72 +0xf8
go.opentelemetry.io/collector/pdata/internal/data/protogen/collector/trace/v1._TraceService_Export_Handler.func1({0x10b1369c8, 0x14002465200}, {0x10adc99c0?, 0x14001fda4b0})
        /Users/fraps/go/pkg/mod/go.opentelemetry.io/collector/[email protected]/internal/data/protogen/collector/trace/v1/trace_service.pb.go:310 +0x78
go.opentelemetry.io/collector/config/configgrpc.enhanceWithClientInformation.func1({0x10b1369c8?, 0x140024651a0?}, {0x10adc99c0, 0x14001fda4b0}, 0x2?, 0x14001fda4c8)
        /Users/fraps/go/pkg/mod/go.opentelemetry.io/[email protected]/config/configgrpc/configgrpc.go:415 +0x54
google.golang.org/grpc.chainUnaryInterceptors.func1.1({0x10b1369c8?, 0x140024651a0?}, {0x10adc99c0?, 0x14001fda4b0?})
        /Users/fraps/go/pkg/mod/google.golang.org/[email protected]/server.go:1162 +0x64
go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc.UnaryServerInterceptor.func1({0x10b1369c8, 0x140024650e0}, {0x10adc99c0, 0x14001fda4b0}, 0x14001b0d0e0, 0x14001b03fc0)
        /Users/fraps/go/pkg/mod/go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/[email protected]/interceptor.go:341 +0x34c
google.golang.org/grpc.chainUnaryInterceptors.func1.1({0x10b1369c8?, 0x140024650e0?}, {0x10adc99c0?, 0x14001fda4b0?})
        /Users/fraps/go/pkg/mod/google.golang.org/[email protected]/server.go:1165 +0x90
google.golang.org/grpc.chainUnaryInterceptors.func1({0x10b1369c8, 0x140024650e0}, {0x10adc99c0, 0x14001fda4b0}, 0x14001b0d0e0, 0x14001fda4c8)
        /Users/fraps/go/pkg/mod/google.golang.org/[email protected]/server.go:1167 +0x124
go.opentelemetry.io/collector/pdata/internal/data/protogen/collector/trace/v1._TraceService_Export_Handler({0x10a219640?, 0x14000cd0690}, {0x10b1369c8, 0x140024650e0}, 0x140007ce070, 0x140001ea8a0)
        /Users/fraps/go/pkg/mod/go.opentelemetry.io/collector/[email protected]/internal/data/protogen/collector/trace/v1/trace_service.pb.go:312 +0x13c
google.golang.org/grpc.(*Server).processUnaryRPC(0x14000c925a0, {0x10b154440, 0x14000142340}, 0x1400184a900, 0x14000d19ec0, 0x10ea5e350, 0x0)
        /Users/fraps/go/pkg/mod/google.golang.org/[email protected]/server.go:1340 +0xb90
google.golang.org/grpc.(*Server).handleStream(0x14000c925a0, {0x10b154440, 0x14000142340}, 0x1400184a900, 0x0)
        /Users/fraps/go/pkg/mod/google.golang.org/[email protected]/server.go:1713 +0x840
google.golang.org/grpc.(*Server).serveStreams.func1.2()
        /Users/fraps/go/pkg/mod/google.golang.org/[email protected]/server.go:965 +0x88
created by google.golang.org/grpc.(*Server).serveStreams.func1
        /Users/fraps/go/pkg/mod/google.golang.org/[email protected]/server.go:963 +0x298

Additional context

No response

@Frapschen added the bug (Something isn't working) and needs triage (New item requiring triage) labels on Nov 1, 2022
@Frapschen
Contributor Author

Frapschen commented Nov 1, 2022

The panic arises at:

func (p *processor) updateDurationMetrics(key string, duration float64) {
    index := sort.SearchFloat64s(p.reqDurationBounds, duration) // Search bucket index
    if _, ok := p.reqDurationSecondsBucketCounts[key]; !ok {
        p.reqDurationSecondsBucketCounts[key] = make([]uint64, len(p.reqDurationBounds))
    }
    p.reqDurationSecondsSum[key] += duration
    p.reqDurationSecondsCount[key]++
    p.reqDurationSecondsBucketCounts[key][index]++
}
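
For context, here is a minimal standalone sketch (not from the processor itself; the bounds below are my own approximation of the 16 latency_histogram_buckets configured above, converted to seconds) showing how the index can run past the slice: sort.SearchFloat64s returns len(bounds) when the value is larger than every bound, while reqDurationSecondsBucketCounts[key] is allocated with only len(p.reqDurationBounds) elements, so any request slower than the last bucket (15s here) triggers exactly the "index out of range [16] with length 16" seen in the log.

package main

import (
    "fmt"
    "sort"
)

func main() {
    // Approximation of the 16 configured latency_histogram_buckets, in seconds.
    bounds := []float64{0.002, 0.004, 0.006, 0.008, 0.01, 0.05, 0.1, 0.2, 0.5, 0.8, 1, 1.4, 2, 5, 10, 15}
    bucketCounts := make([]uint64, len(bounds)) // sized like reqDurationSecondsBucketCounts[key]

    duration := 20.0 // any duration above the largest bound (15s)
    index := sort.SearchFloat64s(bounds, duration)
    fmt.Println(index, len(bucketCounts)) // prints: 16 16

    bucketCounts[index]++ // panic: runtime error: index out of range [16] with length 16
}

Sizing the bucket-count slice to len(bounds)+1 (an implicit overflow bucket) or clamping the returned index would avoid the panic; this is a sketch of the failure mode, not necessarily the approach taken in #16025.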

@evan-bradley added the priority:p2 (Medium) and processor/servicegraph (Service graph processor) labels and removed the needs triage (New item requiring triage) label on Nov 1, 2022
@github-actions
Contributor

github-actions bot commented Nov 1, 2022

Pinging code owners: @jpkrohling @mapno. See Adding Labels via Comments if you do not have permissions to add labels yourself.
