
APM transactions broken in 7.16.3 #7374

Closed
paulgrav opened this issue Feb 22, 2022 · 3 comments

Comments


paulgrav commented Feb 22, 2022

APM Server version (apm-server version): 7.16.3

Description of the problem including expected versus actual behavior:

We have a simple Koa-based Node app. It’s instrumented with Jaeger and we use the OpenTelemetry collector to send traces to the Elastic stack so that we can view trace data in Kibana. Since upgrading to 7.16.3, trace data is missing for some apps.

Koa App > Jaeger Tracer (v3.19) thrift_binary > otel collector contrib (v0.45) otlp > elasticapm (v7.16.3) > es (v7.16.3)

I expect traces to be visible in Kibana.

Actual behaviour is that no traces are visible in Kibana.

Reverting the APM Server to v7.16.2 or below resolves this issue.

The trace pipeline described above should result in the creation of two records: one in the transaction index and one in the span index. With APM Server 7.16.3, the record in the transaction index is missing; it is present when using 7.16.2 and below.
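A minimal sketch of the mapping I expect, assuming (as the behaviour up to 7.16.2 suggests) that a root span with no parent becomes a transaction document and a child span becomes a span document. `classifySpan` is a hypothetical helper for illustration, not APM Server code:

```javascript
// Hypothetical helper illustrating the expected root-vs-child mapping;
// this is NOT APM Server code.
function classifySpan(span) {
  // A trace root (no parent ID) is expected to land in the transaction
  // index; any child span in the span index.
  return span.parentId ? "span" : "transaction";
}

// The two spans from the collector log further down in this issue:
const spans = [
  { id: "fb9436dec94f751c", parentId: "", name: "health_route" },
  { id: "5d270d86cf25bed5", parentId: "fb9436dec94f751c", name: "fetch auth token" },
];

spans.forEach((s) => console.log(`${s.name} -> ${classifySpan(s)} index`));
```

With 7.16.3 only the span document (for "fetch auth token") shows up; the transaction document for the root span ("health_route") is the one that goes missing.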

Steps to reproduce:


  1. Instrument a simple Koa application with jaeger-client 3.19
  2. Run docker-compose up
  3. Invoke a GET request on the Koa app to emit a trace.

Provide logs (if relevant):

docker-compose.yaml

```yaml
version: '3.5'
services:
  web:
    build:
        context: .
        target: build
    ports:
        - "8000:80"
    command: ["npm", "run", "start-dev"]

  jaeger-agent:
    image: otel/opentelemetry-collector-contrib:0.45.0
    volumes:
      - ./otel.yaml:/etc/otelcol-contrib/config.yaml

  apm:
    image: docker.elastic.co/apm/apm-server:7.16.3
    command:
      - --strict.perms=false
    volumes:
      - ./apm.yaml:/usr/share/apm-server/apm-server.yml
```

otel.yaml

```yaml
receivers:
  jaeger:
    protocols:
      thrift_binary:

processors:
  resource:
    attributes:
    - key: deployment.environment
      value: "test"
      action: upsert

exporters:
  otlp:
    endpoint: apm:8200
    tls:
      insecure: true
  logging:
    logLevel: debug

service:
  pipelines:
    traces:
      receivers: [jaeger]
      exporters: [logging, otlp]
```

apm.yaml

```yaml
apm-server:
  host: "0.0.0.0:8200"
logging.level: debug
output:
  elasticsearch:
    hosts: <redacted>
    username: <redacted>
    password: <redacted>
    protocol: https
```

logging output from otel:

```
InstrumentationLibrarySpans #0
InstrumentationLibrarySpans SchemaURL:
InstrumentationLibrary
Span #0
    Trace ID       : 0000000000000000fb9436dec94f751c
    Parent ID      : fb9436dec94f751c
    ID             : 5d270d86cf25bed5
    Name           : fetch auth token
    Kind           : SPAN_KIND_UNSPECIFIED
    Start time     : 2022-02-21 15:44:52.111 +0000 UTC
    End time       : 2022-02-21 15:44:52.116 +0000 UTC
    Status code    : STATUS_CODE_UNSET
    Status message :
Span #1
    Trace ID       : 0000000000000000fb9436dec94f751c
    Parent ID      :
    ID             : fb9436dec94f751c
    Name           : health_route
    Kind           : SPAN_KIND_UNSPECIFIED
    Start time     : 2022-02-21 15:44:52.107 +0000 UTC
    End time       : 2022-02-21 15:44:52.12 +0000 UTC
    Status code    : STATUS_CODE_UNSET
    Status message :
Attributes:
     -> sampler.type: STRING(const)
     -> sampler.param: BOOL(true)
     -> path: STRING(/healthz)
```
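The zero-padded trace ID in the log above is expected: the collector promotes Jaeger's 64-bit trace ID to the 128-bit OTLP form by left-padding with zeros. A quick sketch of that padding (illustrative helper, not collector code):

```javascript
// Left-pad a 16-hex-char (64-bit) Jaeger trace ID to the 32-hex-char
// (128-bit) form seen in the OTLP output above.
function padJaegerTraceId(id64) {
  return id64.padStart(32, "0");
}

console.log(padJaegerTraceId("fb9436dec94f751c"));
// -> 0000000000000000fb9436dec94f751c
```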

span record created:

```json
{
  "_index": "apm-7.16.3-span-000004",
  "_type": "_doc",
  "_id": "-Pz1HH8BAlLHiCM3CFdz",
  "_version": 1,
  "_score": 1,
  "_source": {
    "parent": {
      "id": "fb9436dec94f751c"
    },
    "agent": {
      "name": "Jaeger/Node",
      "ephemeral_id": "9ba35c6b-6774-4ab6-9cf2-08a35b8fd438",
      "version": "3.18.0"
    },
    "processor": {
      "name": "transaction",
      "event": "span"
    },
    "observer": {
      "hostname": "7d67b4a7359d",
      "id": "c86501a2-4a25-4d80-af58-1e8badba79f7",
      "ephemeral_id": "247f8e47-a7e6-4c03-9452-872d65bbac4f",
      "type": "apm-server",
      "version": "7.16.3",
      "version_major": 7
    },
    "trace": {
      "id": "0000000000000000fb9436dec94f751c"
    },
    "@timestamp": "2022-02-21T15:44:52.111Z",
    "ecs": {
      "version": "1.12.0"
    },
    "service": {
      "node": {
        "name": "9bbee29c7179"
      },
      "name": "image-on-demand-build",
      "language": {
        "name": "Node"
      }
    },
    "host": {
      "hostname": "9bbee29c7179",
      "ip": "172.21.0.2",
      "name": "9bbee29c7179"
    },
    "event": {
      "outcome": "unknown"
    },
    "span": {
      "duration": {
        "us": 5000
      },
      "name": "fetch auth token",
      "id": "5d270d86cf25bed5",
      "type": "app"
    },
    "timestamp": {
      "us": 1645458292111000
    }
  },
  "fields": {
    "span.name": [
      "fetch auth token"
    ],
    "service.node.name": [
      "9bbee29c7179"
    ],
    "host.hostname": [
      "9bbee29c7179"
    ],
    "service.language.name": [
      "Node"
    ],
    "host.ip": [
      "172.21.0.2"
    ],
    "trace.id": [
      "0000000000000000fb9436dec94f751c"
    ],
    "span.duration.us": [
      5000
    ],
    "processor.event": [
      "span"
    ],
    "agent.name": [
      "Jaeger/Node"
    ],
    "host.name": [
      "9bbee29c7179"
    ],
    "event.outcome": [
      "unknown"
    ],
    "service.name": [
      "image-on-demand-build"
    ],
    "processor.name": [
      "transaction"
    ],
    "span.id": [
      "5d270d86cf25bed5"
    ],
    "observer.version_major": [
      7
    ],
    "observer.hostname": [
      "7d67b4a7359d"
    ],
    "span.type": [
      "app"
    ],
    "observer.id": [
      "c86501a2-4a25-4d80-af58-1e8badba79f7"
    ],
    "timestamp.us": [
      1645458292111000
    ],
    "@timestamp": [
      "2022-02-21T15:44:52.111Z"
    ],
    "observer.ephemeral_id": [
      "247f8e47-a7e6-4c03-9452-872d65bbac4f"
    ],
    "observer.version": [
      "7.16.3"
    ],
    "ecs.version": [
      "1.12.0"
    ],
    "observer.type": [
      "apm-server"
    ],
    "agent.ephemeral_id": [
      "9ba35c6b-6774-4ab6-9cf2-08a35b8fd438"
    ],
    "parent.id": [
      "fb9436dec94f751c"
    ],
    "agent.version": [
      "3.18.0"
    ]
  }
}
```
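As a sanity check, the span document's `span.duration.us` of 5000 is consistent with the start/end timestamps in the collector log (15:44:52.111 to 15:44:52.116), so the span itself looks intact; only its parent transaction record is missing:

```javascript
// Recompute the span duration in microseconds from the collector's
// start/end timestamps for "fetch auth token".
const startUs = Date.parse("2022-02-21T15:44:52.111Z") * 1000;
const endUs = Date.parse("2022-02-21T15:44:52.116Z") * 1000;
console.log(endUs - startUs); // -> 5000
```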
@paulgrav paulgrav added the bug label Feb 22, 2022
axw (Member) commented Feb 23, 2022

Thanks for all the details @paulgrav!

I cannot see any commits between 7.16.2 and 7.16.3 that would explain this. Can you see any errors in the APM Server log or Elasticsearch log that might shed additional light?

Otherwise, if you're able to provide a standalone instrumented program so we can try to reproduce the issue, that would help speed things along.

simitt (Contributor) commented Mar 15, 2022

@paulgrav any chance you could provide more info on this, as suggested in #7374 (comment)?

axw (Member) commented Jun 16, 2022

@paulgrav if you are still experiencing this issue, please let us know and provide any additional info you can so we can try to reproduce. Closing in the meantime.

axw closed this as not planned Jun 16, 2022