[exporter/datasetexporter]: Add metrics reporting #27487

Conversation


@martin-majlis-s1 (Contributor) commented Oct 9, 2023

Description: Add metrics reporting

The DataSet exporter maintains an internal queue. So far we have only been logging information about it. This PR additionally reports these queue metrics through the Meter object provided by the Collector.
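For illustration, wiring queue statistics into the Collector-provided meter looks roughly like the sketch below (a minimal sketch with hypothetical instrument names and a made-up `queueStats` snapshot; the real registration lives in the exporter's statistics code):

```go
package datasetexporter

import (
	"context"

	"go.opentelemetry.io/otel/metric"
)

// queueStats is a hypothetical snapshot of the exporter's internal queue state.
type queueStats struct {
	waiting   int64
	processed int64
}

// reportStatistics registers observable instruments on the meter obtained from
// the Collector (settings.TelemetrySettings.MeterProvider.Meter(...)). The
// callback is invoked by the SDK on every metric collection cycle.
func reportStatistics(meter metric.Meter, stats func() queueStats) error {
	waiting, err := meter.Int64ObservableGauge("dataset_queue_waiting")
	if err != nil {
		return err
	}
	processed, err := meter.Int64ObservableCounter("dataset_queue_processed")
	if err != nil {
		return err
	}
	_, err = meter.RegisterCallback(func(_ context.Context, o metric.Observer) error {
		s := stats()
		o.ObserveInt64(waiting, s.waiting)
		o.ObserveInt64(processed, s.processed)
		return nil
	}, waiting, processed)
	return err
}
```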

Link to tracking Issue: #27650

Testing:

  • In the contrib repository:
    • Build the docker image - make docker-otelcontribcol
  • In the demo repository:
    • Stop the running containers: docker compose down
    • Run without the feature gate:
      • Run: docker compose up --abort-on-container-exit 2>&1 | tee out-no-gate.log
      • There are 205 metrics
    • Run with metrics enabled:
      • Add the following configuration option - "--feature-gates=telemetry.useOtelForInternalMetrics" - to docker-compose.yml
      • Run: docker compose up --abort-on-container-exit 2>&1 | tee out-with-gate.log
      • There are 205 metrics

Documentation:

@crobert-1 (Member)

From the PR title, it looks like this change also references an issue in another repository. If that's a publicly accessible issue, please add a link; otherwise we can remove it from the title since you have a "local" issue as well.

@martin-majlis-s1 changed the title from "DSET-4469: Add metrics reporting" to "[exporter/datasetexporter]: Add metrics reporting" on Oct 17, 2023
@martin-majlis-s1 (Contributor, Author)

When I run https://opentelemetry.io/docs/demo/, the metrics that I have introduced are not reported. Although the call is copied from the example - https://pkg.go.dev/go.opentelemetry.io/otel/metric#example-Meter-Asynchronous_multiple - I can see that the callback is never called:

otel-col                          | 2023-10-25T11:55:09.285Z    info    [email protected]/statistics.go:16  AAAAA - reportStatistics - BEGIN        {"kind": "exporter", "data_type": "traces", "name": "dataset"}
otel-col                          | 2023-10-25T11:55:09.285Z    info    [email protected]/statistics.go:293 AAAAA - reportStatistics - REGISTRATION {"kind": "exporter", "data_type": "traces", "name": "dataset"}
otel-col                          | 2023-10-25T11:55:09.285Z    info    [email protected]/statistics.go:362 AAAAA - reportStatistics - END  {"kind": "exporter", "data_type": "traces", "name": "dataset", "err_is_nil": true}
otel-col                          | 2023-10-25T11:55:09.299Z    info    [email protected]/statistics.go:16  AAAAA - reportStatistics - BEGIN        {"kind": "exporter", "data_type": "logs", "name": "dataset"}
otel-col                          | 2023-10-25T11:55:09.299Z    info    [email protected]/statistics.go:293 AAAAA - reportStatistics - REGISTRATION {"kind": "exporter", "data_type": "logs", "name": "dataset"}
otel-col                          | 2023-10-25T11:55:09.299Z    info    [email protected]/statistics.go:362 AAAAA - reportStatistics - END  {"kind": "exporter", "data_type": "logs", "name": "dataset", "err_is_nil": true}

So it's not obvious why it's not working.
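For what it's worth, observable-instrument callbacks are only invoked when a metric reader actually collects from the meter provider, which is easy to verify with a minimal standalone sketch (outside the Collector, using a manual reader):

```go
package main

import (
	"context"
	"fmt"

	"go.opentelemetry.io/otel/metric"
	sdkmetric "go.opentelemetry.io/otel/sdk/metric"
	"go.opentelemetry.io/otel/sdk/metric/metricdata"
)

func main() {
	ctx := context.Background()

	// Manual reader: collection happens only when Collect is called explicitly.
	reader := sdkmetric.NewManualReader()
	provider := sdkmetric.NewMeterProvider(sdkmetric.WithReader(reader))
	meter := provider.Meter("callback-demo")

	counter, _ := meter.Int64ObservableCounter("demo_observable_counter")
	_, _ = meter.RegisterCallback(func(_ context.Context, o metric.Observer) error {
		fmt.Println("callback invoked") // runs once per collection, not at registration time
		o.ObserveInt64(counter, 42)
		return nil
	}, counter)

	// Without this Collect (or a periodic reader / Prometheus scrape),
	// the callback above never runs.
	var rm metricdata.ResourceMetrics
	_ = reader.Collect(ctx, &rm)
}
```

So if nothing ever collects from the meter provider that the exporter registers against, the registration succeeds but the callback is never called.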

Other metrics are still there:

Screenshot 2023-10-25 at 13 38 18

@martin-majlis-s1 (Contributor, Author)

cc: @sirianni since we have been discussing this on Slack

@sirianni (Contributor)

You may have to set the telemetry.useOtelForInternalMetrics feature gate
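(For reference, in the demo this means adding `--feature-gates=telemetry.useOtelForInternalMetrics` to the collector's `command:` in `docker-compose.yml`, as shown in the diff later in this thread.)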

@martin-majlis-s1 (Contributor, Author)

I have set it, and I can see in the logs:

otel-col                          | 2023-10-25T13:01:38.359Z    info    [email protected]/telemetry.go:84 Setting up own telemetry...
otel-col                          | 2023-10-25T13:01:38.367Z    info    [email protected]/telemetry.go:157        Serving metrics {"address": ":8888", "level": "Basic"}

But the function is not called. I will check this issue - open-telemetry/opentelemetry-collector#7454 - to see what I am missing.

@martin-majlis-s1 (Contributor, Author)

In the logs, I can see that I am updating the metric:

prometheus                        | ts=2023-10-31T09:32:52.756Z caller=write_handler.go:212 level=warn component=web msg="Error translating OTLP metrics to Prometheus write request" err="invalid temporality and type combination for metric \"app_currency_counter\""
otel-col                          | 2023-10-31T09:32:52.754Z    info    MetricsExporter {"kind": "exporter", "data_type": "metrics", "name": "debug", "resource metrics": 1, "metrics": 1, "data points": 1}
prometheus                        | ts=2023-10-31T09:32:52.956Z caller=write_handler.go:212 level=warn component=web msg="Error translating OTLP metrics to Prometheus write request" err="empty data points. http.server.duration is dropped; empty data points. http.client.duration is dropped; empty data points. db.client.connections.usage is dropped; empty data points. db.client.connections.usage is dropped"
otel-col                          | 2023-10-31T09:32:53.274Z    info    [email protected]/statistics.go:307       AAAAA - reportStatistics - inc Foo by   {"kind": "exporter", "data_type": "traces", "name": "dataset", "foo": 14}
otel-col                          | 2023-10-31T09:32:53.298Z    info    [email protected]/statistics.go:307       AAAAA - reportStatistics - inc Foo by   {"kind": "exporter", "data_type": "logs", "name": "dataset", "foo": 14}
otel-col                          | 2023-10-31T09:32:54.126Z    info    [email protected]/statistics.go:307       AAAAA - reportStatistics - inc Foo by   {"kind": "exporter", "data_type": "traces", "name": "dataset", "foo": 15}
otel-col                          | 2023-10-31T09:32:54.150Z    info    [email protected]/statistics.go:307       AAAAA - reportStatistics - inc Foo by   {"kind": "exporter", "data_type": "logs", "name": "dataset", "foo": 15}

There is no indication that the metrics produced by the datasetexporter are dropped, and no error is produced.

I have no idea where the problem could be.

@sirianni: Do you have any other idea where the problem could be?

@martin-majlis-s1 (Contributor, Author)

I have found 2 related issues:

Both of these issues contain some discussion based on which I would guess that it should already be working, although neither of them is closed.

@martin-majlis-s1 (Contributor, Author) commented Oct 31, 2023

Based on the comments in the issues above and the following documentation - https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/prometheusreceiver#getting-started - I have updated the configuration with:

receivers:
  ...
  prometheus:
    config:
      scrape_configs:
        - job_name: 'otel-collector'
          scrape_interval: 5s
          static_configs:
            - targets: ['0.0.0.0:8888']

...

    metrics:
      receivers: [httpcheck/frontendproxy, otlp, spanmetrics, prometheus]
      processors: [filter/ottl, transform, batch]
      exporters: [otlphttp/prometheus, debug]

When I then run it with the flag "--feature-gates=telemetry.useOtelForInternalMetrics", I can see that the callback is called:

otel-col                          | 2023-10-31T12:03:16.759Z    info    [email protected]/statistics.go:310       AAAAA - reportStatistics - SimpleCounter - down by      {"kind": "exporter", "data_type": "logs", "name": "dataset", "foo": 35}
otel-col                          | 2023-10-31T12:03:16.759Z    info    [email protected]/statistics.go:310       AAAAA - reportStatistics - SimpleCounter - down by      {"kind": "exporter", "data_type": "traces", "name": "dataset", "foo": 35}
otel-col                          | 2023-10-31T12:03:16.975Z    info    LogsExporter    {"kind": "exporter", "data_type": "logs", "name": "debug", "resource logs": 1, "log records": 25}
otel-col                          | 2023-10-31T12:03:17.170Z    info    [email protected]/statistics.go:330       AAAAA - reportStatistics - CALLBACK BEGIN       {"kind": "exporter", "data_type": "traces", "name": "dataset"}
otel-col                          | 2023-10-31T12:03:17.170Z    info    [email protected]/statistics.go:333       AAAAA - reportStatistics - STATS        {"kind": "exporter", "data_type": "traces", "name": "dataset", "stats_is_nil": false}
otel-col                          | 2023-10-31T12:03:17.172Z    info    [email protected]/statistics.go:369       AAAAA - reportStatistics - CALLBACK - END       {"kind": "exporter", "data_type": "traces", "name": "dataset"}
otel-col                          | 2023-10-31T12:03:17.172Z    info    [email protected]/statistics.go:330       AAAAA - reportStatistics - CALLBACK BEGIN       {"kind": "exporter", "data_type": "logs", "name": "dataset"}
otel-col                          | 2023-10-31T12:03:17.172Z    info    [email protected]/statistics.go:333       AAAAA - reportStatistics - STATS        {"kind": "exporter", "data_type": "logs", "name": "dataset", "stats_is_nil": false}
otel-col                          | 2023-10-31T12:03:17.172Z    info    [email protected]/statistics.go:369       AAAAA - reportStatistics - CALLBACK - END       {"kind": "exporter", "data_type": "logs", "name": "dataset"}
otel-col                          | 2023-10-31T12:03:17.178Z    warn    internal/transaction.go:123     Failed to scrape Prometheus endpoint    {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "scrape_timestamp": 1698753797164, "target_labels": "{__name__=\"up\", instance=\"0.0.0.0:8888\", job=\"otel-collector\"}"}
otel-col                          | 2023-10-31T12:03:17.274Z    info    MetricsExporter {"kind": "exporter", "data_type": "metrics", "name": "debug", "resource metrics": 1, "metrics": 5, "data points": 5}
otel-col                          | 2023-10-31T12:03:17.378Z    info    TracesExporter  {"kind": "exporter", "data_type": "traces", "name": "debug", "resource spans": 1, "spans": 8}
otel-col                          | 2023-10-31T12:03:17.580Z    info    TracesExporter  {"kind": "exporter", "data_type": "traces", "name": "debug", "resource spans": 1, "spans": 87}
otel-col                          | 2023-10-31T12:03:18.280Z    info    MetricsExporter {"kind": "exporter", "data_type": "metrics", "name": "debug", "resource metrics": 1, "metrics": 35, "data points": 78}
otel-col                          | 2023-10-31T12:03:18.389Z    info    TracesExporter  {"kind": "exporter", "data_type": "traces", "name": "debug", "resource spans": 1, "spans": 2}
otel-col                          | 2023-10-31T12:03:18.992Z    info    TracesExporter  {"kind": "exporter", "data_type": "traces", "name": "debug", "resource spans": 1, "spans": 2}
otel-col                          | 2023-10-31T12:03:20.219Z    info    LogsExporter    {"kind": "exporter", "data_type": "logs", "name": "debug", "resource logs": 1, "log records": 1}
otel-col                          | 2023-10-31T12:03:20.885Z    info    client/client.go:489    Buffers' Queue Stats:   {"kind": "exporter", "data_type": "traces", "name": "dataset", "processed": 32, "enqueued": 32, "dropped": 0, "broken": 0, "waiting": 0, "successRate": 1, "processingS": 168.675172925, "processing": 168.675172925}
otel-col                          | 2023-10-31T12:03:20.886Z    info    client/client.go:503    Events' Queue Stats:    {"kind": "exporter", "data_type": "traces", "name": "dataset", "processed": 6335, "enqueued": 6335, "dropped": 0, "broken": 0, "waiting": 0, "successRate": 1, "processingS": 168.675172925, "processing": 168.675172925}

But when I check prometheus:
Screenshot 2023-10-31 at 12 56 48

It's very strange that SimpleCounterLoop is there, but SimpleCounterCallback is not, nor are the other observable counters.
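For reference, a rough sketch of the two instrument kinds involved (a hypothetical reconstruction using names matching the log output above): a synchronous counter records a data point whenever `Add` is called, while an observable counter only produces data when its callback runs during collection.

```go
package main

import (
	"context"
	"sync/atomic"
	"time"

	"go.opentelemetry.io/otel/metric"
)

// registerDemoCounters is a hypothetical reconstruction of the two instruments
// seen in the logs above; value stands in for whatever state the callback reports.
func registerDemoCounters(meter metric.Meter, value *atomic.Int64) error {
	// Synchronous counter: a data point is recorded every time Add is called,
	// independent of any collection cycle.
	loopCounter, err := meter.Int64Counter("SimpleCounterLoop")
	if err != nil {
		return err
	}
	go func() {
		for {
			loopCounter.Add(context.Background(), 1)
			time.Sleep(time.Second)
		}
	}()

	// Observable counter: the value is only reported when the metric reader
	// collects and invokes this callback.
	cbCounter, err := meter.Int64ObservableCounter("SimpleCounterCallback")
	if err != nil {
		return err
	}
	_, err = meter.RegisterCallback(func(_ context.Context, o metric.Observer) error {
		o.ObserveInt64(cbCounter, value.Load())
		return nil
	}, cbCounter)
	return err
}
```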

github-actions (bot)

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions bot added the Stale label on Nov 15, 2023
codeboten pushed a commit that referenced this pull request Nov 28, 2023
Upgrade to a new version of the library.

This PR implements the following issues:

* #27650 - collect metrics via OpenTelemetry so that they can be
monitored. This is an improved version of the previous PR #27487, which was
not working.
* #27652 - whether `session_key` is included is now configurable with the
`debug` option

Another change is that fields specified as part of the `group_by`
configuration are now transferred as part of the session info.

**Link to tracking Issue:** #27650, #27652

**Testing:** 

1. Build the docker image - `make docker-otelcontribcol`
2. Checkout https://github.com/open-telemetry/opentelemetry-demo
3. Update the configuration in `docker-compose.yml` and in
`src/otelcollector/otelcol-config.yml`:
* In `docker-compose.yml`, switch the image to the one newly built in step 1
* In `docker-compose.yml`, enable the feature gate for collecting metrics -
`--feature-gates=telemetry.useOtelForInternalMetrics`
* In `src/otelcollector/otelcol-config.yml`, enable metrics scraping by
Prometheus
* In `src/otelcollector/otelcol-config.yml`, add the configuration for
dataset
```diff
diff --git a/docker-compose.yml b/docker-compose.yml
index 001f7c8..d7edd0d 100644
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -646,14 +646,16 @@ services:

   # OpenTelemetry Collector
   otelcol:
-    image: otel/opentelemetry-collector-contrib:0.86.0
+    image: otelcontribcol:latest
     container_name: otel-col
     deploy:
       resources:
         limits:
           memory: 125M
     restart: unless-stopped
-    command: [ "--config=/etc/otelcol-config.yml", "--config=/etc/otelcol-config-extras.yml" ]
+    command: [ "--config=/etc/otelcol-config.yml", "--config=/etc/otelcol-config-extras.yml", "--feature-gates=telemetry.useOtelForInternalMetrics" ]
     volumes:
       - ./src/otelcollector/otelcol-config.yml:/etc/otelcol-config.yml
       - ./src/otelcollector/otelcol-config-extras.yml:/etc/otelcol-config-extras.yml
diff --git a/src/otelcollector/otelcol-config.yml b/src/otelcollector/otelcol-config.yml
index f2568ae..9944562 100644
--- a/src/otelcollector/otelcol-config.yml
+++ b/src/otelcollector/otelcol-config.yml
@@ -15,6 +15,14 @@ receivers:
     targets:
       - endpoint: http://frontendproxy:${env:ENVOY_PORT}

+  prometheus:
+    config:
+      scrape_configs:
+        - job_name: 'otel-collector'
+          scrape_interval: 5s
+          static_configs:
+            - targets: ['0.0.0.0:8888']
+
 exporters:
   debug:
   otlp:
@@ -29,6 +37,22 @@ exporters:
     endpoint: "http://prometheus:9090/api/v1/otlp"
     tls:
       insecure: true
+  logging:
+  dataset:
+    api_key: API_KEY
+    dataset_url: https://SERVER.scalyr.com
+    debug: true
+    buffer:
+      group_by:
+        - resource_name
+        - resource_type
+    logs:
+      export_resource_info_on_event: true
+    server_host:
+      server_host: Martin
+      use_hostname: false
+  dataset/aaa:
+    api_key: API_KEY
+    dataset_url: https://SERVER.scalyr.com
+    debug: true
+    buffer:
+      group_by:
+        - resource_name
+        - resource_type
+    logs:
+      export_resource_info_on_event: true
+    server_host:
+      server_host: MartinAAA
+      use_hostname: false

 processors:
   batch:
@@ -47,6 +71,11 @@ processors:
           - set(description, "") where name == "queueSize"
           # FIXME: remove when this issue is resolved: open-telemetry/opentelemetry-python-contrib#1958
           - set(description, "") where name == "http.client.duration"
+  attributes:
+    actions:
+      - key: otel.demo
+        value: 29446
+        action: upsert

 connectors:
   spanmetrics:
@@ -55,13 +84,13 @@ service:
   pipelines:
     traces:
       receivers: [otlp]
-      processors: [batch]
-      exporters: [otlp, debug, spanmetrics]
+      processors: [batch, attributes]
+      exporters: [otlp, debug, spanmetrics, dataset, dataset/aaa]
     metrics:
-      receivers: [httpcheck/frontendproxy, otlp, spanmetrics]
+      receivers: [httpcheck/frontendproxy, otlp, spanmetrics, prometheus]
       processors: [filter/ottl, transform, batch]
       exporters: [otlphttp/prometheus, debug]
     logs:
       receivers: [otlp]
-      processors: [batch]
-      exporters: [otlp/logs, debug]
+      processors: [batch, attributes]
+      exporters: [otlp/logs, debug, dataset, dataset/aaa]
```
4. Run the demo - `docker compose up --abort-on-container-exit`
5. Check that metrics are in Grafana - http://localhost:8080/grafana/explore?
<img width="838" alt="Screenshot 2023-11-27 at 12 29 29" src="https://github.com/open-telemetry/opentelemetry-collector-contrib/assets/122797378/43d365dd-37d8-4528-b768-1d7f0ac34989">
6. Check some metrics
![Screenshot 2023-11-22 at 14 06 56](https://github.com/open-telemetry/opentelemetry-collector-contrib/assets/122797378/81306486-eb5e-49b1-87ed-25d1eb8afcf8)
<img width="1356" alt="Screenshot 2023-11-27 at 12 59 10" src="https://github.com/open-telemetry/opentelemetry-collector-contrib/assets/122797378/34c36e45-850e-4e74-a18a-0a54ce97cbd3">
7. Check that data are available in DataSet
![Screenshot 2023-11-22 at 13 33 50](https://github.com/open-telemetry/opentelemetry-collector-contrib/assets/122797378/77cb2f31-be14-463b-91a7-fd10f8dbfe3a)

**Documentation:** 

**Library changes:**
* Group By & Debug - scalyr/dataset-go#62
* Metrics  - scalyr/dataset-go#61

---------

Co-authored-by: Andrzej Stencel <[email protected]>