
prometheusreceiver and statsdreceiver behave differently in terms of setting "OTelLib" when awsemfexporter is used #24298

Closed
mkielar opened this issue Jul 17, 2023 · 5 comments
Labels
bug, exporter/awsemf, needs triage, receiver/prometheus, receiver/statsd

Comments

@mkielar
Contributor

mkielar commented Jul 17, 2023

Component(s)

exporter/awsemf, receiver/prometheus, receiver/statsd

What happened?

Description

We have aws-otel-collector 0.30.0 running alongside a Java app (which exposes Prometheus metrics) and an AWS/Envoy sidecar (which exposes StatsD metrics). aws-otel-collector is configured to process both of those sources in separate pipelines and to push the metrics to AWS CloudWatch using awsemfexporter. We previously used version 0.16.1 of aws-otel-collector and are only now upgrading.

Previously, metrics from both sources were stored in CloudWatch "as-is". After the upgrade, however, we noticed that the Prometheus metrics gained a new dimension: OTelLib, with the value otelcol/prometheusreceiver. This obviously broke a few things on our end (such as CloudWatch Alarms).

After digging a bit, I found these two tickets, which were supposed to bring both of these receivers to the same place in terms of populating otel.library.name:

Unfortunately, I was not able to grasp how that translates into the OTelLib metric dimension set by awsemfexporter, but the two seem related at this point.

My understanding is that it is a de-facto standard for receivers to add the name and version of the library to the metrics they process, but I do not understand how, or why, that information ends up being added as a dimension. I also do not know whether that is the expected outcome, so it is hard for me to tell whether this is a bug in prometheusreceiver (for adding it as a dimension), in statsdreceiver (for not adding it), or in awsemfexporter. I'd be grateful for any guidance on this matter.
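
[Editor's note] For context, the sketch below shows roughly what "adding the library name and version" means in terms of the collector's pdata API. It is a minimal illustration, not the receivers' actual code: the version and metric name are placeholders, and the scope name is the value reported earlier in this issue, which per the behaviour described above is what surfaces as the OTelLib dimension.

package main

import "go.opentelemetry.io/collector/pdata/pmetric"

func main() {
	md := pmetric.NewMetrics()
	sm := md.ResourceMetrics().AppendEmpty().ScopeMetrics().AppendEmpty()
	// The receiver records itself as the instrumentation scope of the metrics
	// it produces; this name/version pair corresponds to the former
	// otel.library.name / otel.library.version attributes.
	sm.Scope().SetName("otelcol/prometheusreceiver") // value observed as OTelLib in this issue
	sm.Scope().SetVersion("0.78.0")                  // placeholder version
	// Metrics appended under this ScopeMetrics carry that scope with them.
	m := sm.Metrics().AppendEmpty()
	m.SetName("example_metric") // placeholder metric name
	m.SetEmptyGauge().DataPoints().AppendEmpty().SetDoubleValue(1)
}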

Steps to Reproduce

  1. Use the collector configuration below with two separate sources of metrics (StatsD and Prometheus).
  2. You can adjust (or disable) the metric filtering if your sources differ from mine.

Expected Result

I would expect the following:

  1. The receivers should produce metrics in the same way, so that awsemfexporter either adds the new OTelLib dimension regardless of where the metrics come from, or does not add it at all. I'm not sure what the "correct" behaviour is here, but I would expect it to be consistent across receivers.
  2. I'm not very proficient in Go, but from what I can make of the awsemfexporter code, it has dedicated logic for handling the OTelLib dimension. It would be useful to have a switch that controls whether the OTelLib dimension is added (see the hypothetical snippet after this list). In our case, forcefully adding this new dimension to all collected metrics will break a lot of things in our observability solution.
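
[Editor's note] Purely as an illustration of what such a switch could look like: add_otel_lib_dimension is a hypothetical key that does not exist in awsemfexporter today; the remaining keys mirror the configuration shown further below.

exporters:
  awsemf/prometheus/custom_metrics:
    # Hypothetical option, not implemented in awsemfexporter: when false,
    # the exporter would skip adding the OTelLib dimension to exported metrics.
    add_otel_lib_dimension: false
    dimension_rollup_option: NoDimensionRollup
    log_group_name: /aws/ecs/staging/kafka-snowflake-connector
    log_stream_name: emf/otel/prometheus/custom_metrics/{TaskId}
    namespace: staging/KafkaSnowflakeConnector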

Actual Result

  1. Metrics collected by prometheusreceiver are stored by awsemfexporter with an additional OTelLib dimension set to otelcol/prometheusreceiver.
  2. Metrics collected by statsdreceiver are stored by an identical awsemfexporter configuration without the OTelLib dimension.
  3. There is no way to configure awsemfexporter so that it does not add the OTelLib dimension.

Collector version

v0.78.0 (according to: https://github.com/aws-observability/aws-otel-collector/releases/tag/v0.30.0)

Environment information

Environment

OS: AWS ECS / Fargate
We're running a custom-built Docker image, based on amazonlinux:2, with a Dockerfile looking like the one below:

FROM amazonlinux:2 as appmesh-otel-collector
ARG OTEL_VERSION=0.30.0
RUN yum install -y \
        procps \
        shadow-utils \
        https://aws-otel-collector.s3.amazonaws.com/amazon_linux/amd64/v${OTEL_VERSION}/aws-otel-collector.rpm \
    && yum clean all
RUN useradd -m --uid 1337 sidecar && \
    echo "sidecar ALL=NOPASSWD: ALL" >> /etc/sudoers && \
    chown -R sidecar /opt/aws/aws-otel-collector
USER sidecar
ENV RUN_IN_CONTAINER="True"
ENV HOME="/home/sidecar"
ENTRYPOINT ["/opt/aws/aws-otel-collector/bin/aws-otel-collector"]

OpenTelemetry Collector configuration

"exporters":
  "awsemf/prometheus/custom_metrics":
    "dimension_rollup_option": "NoDimensionRollup"
    "log_group_name": "/aws/ecs/staging/kafka-snowflake-connector"
    "log_stream_name": "emf/otel/prometheus/custom_metrics/{TaskId}"
    "namespace": "staging/KafkaSnowflakeConnector"
  "awsemf/statsd/envoy_metrics":
    "dimension_rollup_option": "NoDimensionRollup"
    "log_group_name": "/aws/ecs/staging/kafka-snowflake-connector"
    "log_stream_name": "emf/otel/statsd/envoy_metrics/{TaskId}"
    "namespace": "staging/AppMeshEnvoy"
"processors":
  "batch/prometheus/custom_metrics":
    "timeout": "60s"
  "batch/statsd/envoy_metrics":
    "timeout": "60s"
  "filter/prometheus/custom_metrics":
    "metrics":
      "include":
        "match_type": "regexp"
        "metric_names":
        - "^kafka_consumer_consumer_fetch_manager_metrics_bytes_consumed_rate$"
        - "^kafka_consumer_consumer_fetch_manager_metrics_records_consumed_rate$"
        - "^kafka_connect_connect_worker_metrics_connector_running_task_count$"
        - "^kafka_connect_connect_worker_metrics_connector_failed_task_count$"
        - "^kafka_consumer_consumer_fetch_manager_metrics_records_lag_max$"
        - "^kafka_consumer_consumer_fetch_manager_metrics_records_lag$"
        - "^snowflake_kafka_connector_.*_OneMinuteRate$"
  "filter/statsd/envoy_metrics":
    "metrics":
      "include":
        "match_type": "regexp"
        "metric_names":
        - "^envoy\\.http\\.rq_total$"
        - "^envoy\\.http\\.downstream_rq_xx$"
        - "^envoy\\.http\\.downstream_rq_total$"
        - "^envoy\\.http\\.downstream_rq_time$"
        - "^envoy\\.cluster\\.upstream_cx_connect_timeout$"
        - "^envoy\\.cluster\\.upstream_rq_timeout$"
        - "^envoy\\.appmesh\\.RequestCountPerTarget$"
        - "^envoy\\.appmesh\\.TargetResponseTime$"
        - "^envoy\\.appmesh\\.HTTPCode_.+$"
  "resource":
    "attributes":
    - "action": "extract"
      "key": "aws.ecs.task.arn"
      "pattern": "^arn:aws:ecs:(?P<Region>.*):(?P<AccountId>.*):task/(?P<ClusterName>.*)/(?P<TaskId>.*)$"
  "resourcedetection":
    "detectors":
    - "env"
    - "ecs"
"receivers":
  "prometheus/custom_metrics":
    "config":
      "global":
        "scrape_interval": "1m"
        "scrape_timeout": "10s"
      "scrape_configs":
      - "job_name": "staging/KafkaSnowflakeConnector"
        "metrics_path": ""
        "sample_limit": 10000
        "static_configs":
        - "targets":
          - "localhost:9404"
  "statsd/envoy_metrics":
    "aggregation_interval": "60s"
    "endpoint": "0.0.0.0:8125"
"service":
  "pipelines":
    "metrics/prometheus/custom_metrics":
      "exporters":
      - "awsemf/prometheus/custom_metrics"
      "processors":
      - "resourcedetection"
      - "resource"
      - "filter/prometheus/custom_metrics"
      - "batch/prometheus/custom_metrics"
      "receivers":
      - "prometheus/custom_metrics"
    "metrics/statsd/envoy_metrics":
      "exporters":
      - "awsemf/statsd/envoy_metrics"
      "processors":
      - "resourcedetection"
      - "resource"
      - "filter/statsd/envoy_metrics"
      - "batch/statsd/envoy_metrics"
      "receivers":
      - "statsd/envoy_metrics"

Log output

N/A

Additional context

N/A

mkielar added the bug and needs triage labels on Jul 17, 2023
github-actions bot added the exporter/awsemf, receiver/prometheus and receiver/statsd labels on Jul 17, 2023
@github-actions
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@mkielar
Contributor Author

mkielar commented Jul 18, 2023

I think the reason for this may be a difference in implementation.
See this fragment of the prometheusreceiver implementation:
https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/prometheusreceiver/internal/transaction.go#L201-L202

vs. this implementation in statsdreceiver:
https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/statsdreceiver/protocol/statsd_parser.go#L249-L252

You can see that the objects/types on which the Name and Version attributes are set differ (pcommon.InstrumentationScope for statsdreceiver vs. pmetric.NewMetrics -> ResourceMetrics -> ScopeMetrics -> Scope for prometheusreceiver). It seems the latter makes awsemfexporter use the receiver name as a metric dimension, and the former does not.
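
[Editor's note] To make the comparison concrete, here is a minimal sketch of the two construction styles being contrasted, using the pdata API. It uses placeholder names and versions and is not the receivers' actual code.

package main

import (
	"go.opentelemetry.io/collector/pdata/pcommon"
	"go.opentelemetry.io/collector/pdata/pmetric"
)

func main() {
	// Style seen in the prometheusreceiver fragment: set name/version directly
	// on the scope that already lives inside the pmetric.Metrics tree.
	md := pmetric.NewMetrics()
	sm := md.ResourceMetrics().AppendEmpty().ScopeMetrics().AppendEmpty()
	sm.Scope().SetName("otelcol/prometheusreceiver") // value observed in this issue
	sm.Scope().SetVersion("0.78.0")                  // placeholder version

	// Style seen in the statsdreceiver fragment: build a standalone
	// pcommon.InstrumentationScope first, then copy it into the tree.
	is := pcommon.NewInstrumentationScope()
	is.SetName("otelcol/statsdreceiver") // placeholder scope name
	is.SetVersion("0.78.0")              // placeholder version

	md2 := pmetric.NewMetrics()
	sm2 := md2.ResourceMetrics().AppendEmpty().ScopeMetrics().AppendEmpty()
	is.CopyTo(sm2.Scope())

	// In both cases an exporter reads back the same type via
	// ScopeMetrics().Scope(), so the end result should be equivalent.
}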

@paologallinaharbur, you seem to be the author of both of those implementations, can you please take a look and/or comment on the issue?

We're also testing the behaviour of https://github.com/open-telemetry/opentelemetry-collector/tree/main/receiver/otlpreceiver with awsemfexporter; I should have some results later this week.

mkielar changed the title from "prometheusreceiver and statsdreceiver behave differently in terms of setting "OTelLib"" to "prometheusreceiver and statsdreceiver behave differently in terms of setting "OTelLib" when awsemfexporter is used" on Jul 18, 2023
@mkielar
Contributor Author

mkielar commented Jul 18, 2023

I've just realized that aws-otel-collector 0.30.0 uses the 0.78.0 release of opentelemetry-collector-contrib, and the changes introduced by #23563 were only merged in 0.81.0. I'm going to close this ticket, wait for aws-otel-collector to catch up with the latest changes, and then test again. Apologies for the noise...

mkielar closed this as completed on Jul 18, 2023
@paologallinaharbur
Member

paologallinaharbur commented Jul 18, 2023

@mkielar

I did some investigation that I'll dump here in case you need it (otherwise, ignore it).

You can see that the objects/types on which the Name and Version attributes are set differ (pcommon.InstrumentationScope for statsdreceiver vs. pmetric.NewMetrics -> ResourceMetrics -> ScopeMetrics -> Scope for prometheusreceiver). It seems the latter makes awsemfexporter use the receiver name as a metric dimension, and the former does not.

prometheusreceiver's SetName also acts on the pcommon.InstrumentationScope returned by scope() on the line you mentioned.

I ran the tests, and the way the scope is added seems exactly the same (I would say that the SetName and SetVersion calls are good safety nets). Moreover, you mentioned that:

Statsd:

[screenshot]

Prometheus:

[screenshot]

Redis:

[screenshot, 2023-07-18 14:38]

@mkielar
Contributor Author

mkielar commented Jul 18, 2023

@paologallinaharbur, I managed to set up a local workspace and debug the tests, and I saw exactly what you're showing in the screenshots. That led me to the conclusion that it's not the implementation, but simply an older version of the dependency in aws-otel-collector. As I said, I'll wait for AWS to catch up and try upgrading again in a month or two.

Anyway, thanks a lot for looking into that (and apologies for wasting your time).
