
[Chore] Add e2e test case for OpenTelemetry collector instance monitoring. #2246

Merged — 3 commits merged into open-telemetry:main on Nov 29, 2023

Conversation

@IshwarKanse (Contributor) commented on Oct 19, 2023

Testing:
Add e2e test case for OpenTelemetry collector instance monitoring.

$ kuttl test --test=monitoring tests/e2e-openshift/
2023/10/19 11:52:29 kutt-test config testdirs is overridden with args: [ tests/e2e-openshift/ ]
=== RUN   kuttl
    harness.go:462: starting setup
    harness.go:252: running tests using configured kubeconfig.
I1019 11:52:31.268065   15812 request.go:682] Waited for 1.049248792s due to client-side throttling, not priority and fairness, request: GET:https://REDACTED:6443/apis/migration.k8s.io/v1alpha1?timeout=32s
    harness.go:275: Successful connection to cluster at: https://REDACTED:6443
    harness.go:360: running tests
    harness.go:73: going to run test suite with timeout of 150 seconds for each step
    harness.go:372: testsuite: tests/e2e-openshift/ has 4 tests
=== RUN   kuttl/harness
=== RUN   kuttl/harness/monitoring
=== PAUSE kuttl/harness/monitoring
=== CONT  kuttl/harness/monitoring
    logger.go:42: 11:52:39 | monitoring | Ignoring check_metrics.sh as it does not match file name regexp: ^(\d+)-(?:[^\.]+)(?:\.yaml)?$
    logger.go:42: 11:52:39 | monitoring | Ignoring check_traces.sh as it does not match file name regexp: ^(\d+)-(?:[^\.]+)(?:\.yaml)?$
    logger.go:42: 11:52:39 | monitoring | Ignoring check_user_workload_monitoring.sh as it does not match file name regexp: ^(\d+)-(?:[^\.]+)(?:\.yaml)?$
    logger.go:42: 11:52:39 | monitoring | Creating namespace: kuttl-test-wise-marmoset
    logger.go:42: 11:52:39 | monitoring/0-install-jaeger | starting test step 0-install-jaeger
I1019 11:52:41.315264   15812 request.go:682] Waited for 1.345295172s due to client-side throttling, not priority and fairness, request: GET:https://REDACTED:6443/apis/network.operator.openshift.io/v1?timeout=32s
    logger.go:42: 11:52:43 | monitoring/0-install-jaeger | Namespace:/kuttl-monitoring created
    logger.go:42: 11:52:44 | monitoring/0-install-jaeger | Jaeger:kuttl-monitoring/jaeger-allinone created
    logger.go:42: 11:52:51 | monitoring/0-install-jaeger | test step completed 0-install-jaeger
    logger.go:42: 11:52:51 | monitoring/1-workload-monitoring | starting test step 1-workload-monitoring
I1019 11:52:52.663356   15812 request.go:682] Waited for 1.048057689s due to client-side throttling, not priority and fairness, request: GET:REDACTED:6443/apis/migration.k8s.io/v1alpha1?timeout=32s
    logger.go:42: 11:52:54 | monitoring/1-workload-monitoring | ConfigMap:openshift-monitoring/cluster-monitoring-config updated
    logger.go:42: 11:52:54 | monitoring/1-workload-monitoring | running command: [sh -c ./tests/e2e-openshift/monitoring/check_user_workload_monitoring.sh]
    logger.go:42: 11:52:57 | monitoring/1-workload-monitoring | test step completed 1-workload-monitoring
    logger.go:42: 11:52:57 | monitoring/2-otel-collector | starting test step 2-otel-collector
    logger.go:42: 11:53:01 | monitoring/2-otel-collector | OpenTelemetryCollector:kuttl-monitoring/cluster-collector created
    logger.go:42: 11:53:06 | monitoring/2-otel-collector | test step completed 2-otel-collector
    logger.go:42: 11:53:06 | monitoring/3-generate-traces | starting test step 3-generate-traces
I1019 11:53:08.412589   15812 request.go:682] Waited for 1.050214384s due to client-side throttling, not priority and fairness, request: GET:https://REDACTED:6443/apis/rbac.authorization.k8s.io/v1?timeout=32s
    logger.go:42: 11:53:10 | monitoring/3-generate-traces | Job:kuttl-test-wise-marmoset/telemetrygen-traces created
    logger.go:42: 11:53:11 | monitoring/3-generate-traces | test step completed 3-generate-traces
    logger.go:42: 11:53:11 | monitoring/4- | starting test step 4-
    logger.go:42: 11:53:14 | monitoring/4- | running command: [sh -c ./tests/e2e-openshift/monitoring/check_traces.sh]
    logger.go:42: 11:53:16 | monitoring/4- | Traces for telemetrygen exist in Jaeger.
    logger.go:42: 11:53:16 | monitoring/4- | test step completed 4-
    logger.go:42: 11:53:16 | monitoring/5- | starting test step 5-
I1019 11:53:18.448815   15812 request.go:682] Waited for 1.400235744s due to client-side throttling, not priority and fairness, request: GET:https://REDACTED:6443/apis/policy/v1?timeout=32s
    logger.go:42: 11:53:19 | monitoring/5- | running command: [sh -c ./tests/e2e-openshift/monitoring/check_metrics.sh]
    logger.go:42: 11:53:22 | monitoring/5- |   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
    logger.go:42: 11:53:22 | monitoring/5- |                                  Dload  Upload   Total   Spent    Left  Speed
100   620  100   620    0     0    850      0 --:--:-- --:--:-- --:--:--   849
    logger.go:42: 11:53:23 | monitoring/5- | Metric 'otelcol_exporter_enqueue_failed_log_records' with value is present.
    logger.go:42: 11:53:23 | monitoring/5- |   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
    logger.go:42: 11:53:23 | monitoring/5- |                                  Dload  Upload   Total   Spent    Left  Speed
100   621  100   621    0     0    868      0 --:--:-- --:--:-- --:--:--   867
    logger.go:42: 11:53:24 | monitoring/5- | Metric 'otelcol_exporter_enqueue_failed_metric_points' with value is present.
    logger.go:42: 11:53:24 | monitoring/5- |   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
    logger.go:42: 11:53:24 | monitoring/5- |                                  Dload  Upload   Total   Spent    Left  Speed
100   614  100   614    0     0    793      0 --:--:-- --:--:-- --:--:--   794
    logger.go:42: 11:53:24 | monitoring/5- | Metric 'otelcol_exporter_enqueue_failed_spans' with value is present.
    logger.go:42: 11:53:24 | monitoring/5- |   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
    logger.go:42: 11:53:24 | monitoring/5- |                                  Dload  Upload   Total   Spent    Left  Speed
100   611  100   611    0     0    838      0 --:--:-- --:--:-- --:--:--   838
    logger.go:42: 11:53:25 | monitoring/5- | Metric 'otelcol_exporter_queue_capacity' with value is present.
    logger.go:42: 11:53:25 | monitoring/5- |   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
    logger.go:42: 11:53:25 | monitoring/5- |                                  Dload  Upload   Total   Spent    Left  Speed
100   604  100   604    0     0    897      0 --:--:-- --:--:-- --:--:--   897
    logger.go:42: 11:53:26 | monitoring/5- | Metric 'otelcol_exporter_queue_size' with value is present.
    logger.go:42: 11:53:26 | monitoring/5- |   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
    logger.go:42: 11:53:26 | monitoring/5- |                                  Dload  Upload   Total   Spent    Left  Speed
100   605  100   605    0     0    764      0 --:--:-- --:--:-- --:--:--   764
    logger.go:42: 11:53:27 | monitoring/5- | Metric 'otelcol_exporter_sent_spans' with value is present.
    logger.go:42: 11:53:27 | monitoring/5- |   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
    logger.go:42: 11:53:27 | monitoring/5- |                                  Dload  Upload   Total   Spent    Left  Speed
100   589  100   589    0     0    870      0 --:--:-- --:--:-- --:--:--   871
    logger.go:42: 11:53:28 | monitoring/5- | Metric 'otelcol_process_cpu_seconds' with value is present.
    logger.go:42: 11:53:28 | monitoring/5- |   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
    logger.go:42: 11:53:28 | monitoring/5- |                                  Dload  Upload   Total   Spent    Left  Speed
100   592  100   592    0     0    817      0 --:--:-- --:--:-- --:--:--   817
    logger.go:42: 11:53:28 | monitoring/5- | Metric 'otelcol_process_memory_rss' with value is present.
    logger.go:42: 11:53:28 | monitoring/5- |   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
    logger.go:42: 11:53:28 | monitoring/5- |                                  Dload  Upload   Total   Spent    Left  Speed
100   606  100   606    0     0    921      0 --:--:-- --:--:-- --:--:--   920
    logger.go:42: 11:53:29 | monitoring/5- | Metric 'otelcol_process_runtime_heap_alloc_bytes' with value is present.
    logger.go:42: 11:53:29 | monitoring/5- |   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
    logger.go:42: 11:53:29 | monitoring/5- |                                  Dload  Upload   Total   Spent    Left  Speed
100   607  100   607    0     0    895      0 --:--:-- --:--:-- --:--:--   893
    logger.go:42: 11:53:30 | monitoring/5- | Metric 'otelcol_process_runtime_total_alloc_bytes' with value is present.
    logger.go:42: 11:53:30 | monitoring/5- |   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
    logger.go:42: 11:53:30 | monitoring/5- |                                  Dload  Upload   Total   Spent    Left  Speed
100   612  100   612    0     0    803      0 --:--:-- --:--:-- --:--:--   804
    logger.go:42: 11:53:31 | monitoring/5- | Metric 'otelcol_process_runtime_total_sys_memory_bytes' with value is present.
    logger.go:42: 11:53:31 | monitoring/5- |   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
    logger.go:42: 11:53:31 | monitoring/5- |                                  Dload  Upload   Total   Spent    Left  Speed
100   594  100   594    0     0    869      0 --:--:-- --:--:-- --:--:--   868
    logger.go:42: 11:53:31 | monitoring/5- | Metric 'otelcol_process_uptime' with value is present.
    logger.go:42: 11:53:31 | monitoring/5- |   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
    logger.go:42: 11:53:31 | monitoring/5- |                                  Dload  Upload   Total   Spent    Left  Speed
100   628  100   628    0     0    944      0 --:--:-- --:--:-- --:--:--   944
    logger.go:42: 11:53:32 | monitoring/5- | Metric 'otelcol_receiver_accepted_spans' with value is present.
    logger.go:42: 11:53:32 | monitoring/5- |   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
    logger.go:42: 11:53:32 | monitoring/5- |                                  Dload  Upload   Total   Spent    Left  Speed
100   626  100   626    0     0    878      0 --:--:-- --:--:-- --:--:--   877
    logger.go:42: 11:53:33 | monitoring/5- | Metric 'otelcol_receiver_refused_spans' with value is present.
    logger.go:42: 11:53:33 | monitoring/5- | test step completed 5-
    logger.go:42: 11:53:33 | monitoring | monitoring events from ns kuttl-test-wise-marmoset:
    logger.go:42: 11:53:33 | monitoring | 2023-10-19 11:53:10 +0530 IST	Normal	Pod telemetrygen-traces-f94dz	Binding	Scheduled	Successfully assigned kuttl-test-wise-marmoset/telemetrygen-traces-f94dz to ip-10-0-141-129.us-east-2.compute.internal	default-scheduler	
    logger.go:42: 11:53:33 | monitoring | 2023-10-19 11:53:10 +0530 IST	Normal	Job.batch telemetrygen-traces		SuccessfulCreate	Created pod: telemetrygen-traces-f94dz	job-controller	
    logger.go:42: 11:53:33 | monitoring | 2023-10-19 11:53:12 +0530 IST	Normal	Pod telemetrygen-traces-f94dz		AddedInterface	Add eth0 [10.128.2.22/23] from openshift-sdn		
    logger.go:42: 11:53:33 | monitoring | 2023-10-19 11:53:12 +0530 IST	Normal	Pod telemetrygen-traces-f94dz.spec.containers{telemetrygen-traces}		Pulling	Pulling image "ghcr.io/open-telemetry/opentelemetry-collector-contrib/telemetrygen:latest"	kubelet	
    logger.go:42: 11:53:33 | monitoring | 2023-10-19 11:53:13 +0530 IST	Normal	Pod telemetrygen-traces-f94dz.spec.containers{telemetrygen-traces}		Pulled	Successfully pulled image "ghcr.io/open-telemetry/opentelemetry-collector-contrib/telemetrygen:latest" in 1.037668933s (1.037678681s including waiting)	kubelet	
    logger.go:42: 11:53:33 | monitoring | 2023-10-19 11:53:13 +0530 IST	Normal	Pod telemetrygen-traces-f94dz.spec.containers{telemetrygen-traces}		Created	Created container telemetrygen-traces	kubelet	
    logger.go:42: 11:53:33 | monitoring | 2023-10-19 11:53:13 +0530 IST	Normal	Pod telemetrygen-traces-f94dz.spec.containers{telemetrygen-traces}		Started	Started container telemetrygen-traces	kubelet	
    logger.go:42: 11:53:33 | monitoring | 2023-10-19 11:53:22 +0530 IST	Normal	Job.batch telemetrygen-traces		Completed	Job completed	job-controller	
    logger.go:42: 11:53:34 | monitoring | Deleting namespace: kuttl-test-wise-marmoset
=== CONT  kuttl
    harness.go:405: run tests finished
    harness.go:513: cleaning up
    harness.go:570: removing temp folder: ""
--- PASS: kuttl (71.32s)
    --- PASS: kuttl/harness (0.00s)
        --- PASS: kuttl/harness/monitoring (61.47s)
PASS
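
For context, the two manifests behind the 1-workload-monitoring and 2-otel-collector steps look roughly like the sketch below; the values are illustrative, not the exact files in this PR. The ConfigMap enables OpenShift user workload monitoring, and `spec.observability.metrics.enableMetrics` has the operator create a ServiceMonitor for the collector's own metrics:

```yaml
# Sketch of step 1 (1-workload-monitoring): enable user workload monitoring.
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    enableUserWorkload: true
---
# Sketch of step 2 (2-otel-collector): collector with self-monitoring enabled.
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: cluster-collector
  namespace: kuttl-monitoring
spec:
  observability:
    metrics:
      enableMetrics: true   # operator-managed ServiceMonitor for collector metrics
  config: |
    receivers:
      otlp:
        protocols:
          grpc: {}
          http: {}
    exporters:
      debug: {}   # illustrative; the run shown above still exported traces to Jaeger
    service:
      pipelines:
        traces:
          receivers: [otlp]
          exporters: [debug]
```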

@IshwarKanse requested a review from a team on October 19, 2023, 06:35
@IshwarKanse changed the title from "[Chore] Add e2e test case for OpenTelemetry monitoring." to "[Chore] Add e2e test case for OpenTelemetry collector instance monitoring." on Oct 19, 2023
@swiatekm (Contributor) left a comment

I like the test, thanks for submitting it.

But it's a bit long and complex. Could we split it into two separate test cases?

  1. Generate and send traces to Jaeger
  2. Enable monitoring and check that otel metrics are exposed via Prometheus

Your current test case does both, but they seem independent.

@IshwarKanse (Contributor, Author)

@swiatekm-sumo, thank you for your feedback. After investigating, I found that when we run the collector without a trace store (Jaeger) and use the debug exporter, certain metrics are not exposed, especially those related to the exporter queue.

We have a few options to consider, and I'd like to hear your thoughts:

Option 1: Maintain the current setup, which generates traces and sends them to Jaeger for storage while also exploring all supported metric types for OpenTelemetry collector instance monitoring.

Option 2: Bypass Jaeger and use the debug exporter. This approach would require us to forego some of the metrics checks since they are not exposed in this configuration.

Option 3: Skip both Jaeger and trace generation, and focus solely on monitoring the metrics that are currently exposed. I haven't checked which metrics are available in this config without generating traces. 
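
A minimal sketch of the exporter configurations being weighed in Options 1 and 2 (the Jaeger service name and port are placeholders, not taken from the actual test):

```yaml
exporters:
  # Option 1: keep Jaeger as the trace store and send traces to it over OTLP.
  otlp/jaeger:
    endpoint: jaeger-allinone-collector.kuttl-monitoring.svc:4317  # placeholder
    tls:
      insecure: true
  # Option 2: drop Jaeger and discard traces with the debug exporter. It has no
  # sending queue, so otelcol_exporter_queue_* metrics are not emitted for it.
  debug: {}
```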

@pavolloffay (Member)

@IshwarKanse do we need Jaeger for this test? I would prefer to remove it and just check the metrics. The collector emits metrics that show how many spans were accepted/refused/dropped. We could also use the logging/debug exporter for traces if needed.
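
Roughly, the metrics check could be a kuttl step that queries the Thanos querier route for the collector's own counters; a sketch under the assumption that the kubeconfig user has a token and can read user-workload metrics (not the actual check_metrics.sh from this PR):

```yaml
apiVersion: kuttl.dev/v1beta1
kind: TestStep
commands:
  - script: |
      # Query user-workload metrics through the Thanos querier route.
      TOKEN=$(oc whoami -t)
      HOST=$(oc get route thanos-querier -n openshift-monitoring -o jsonpath='{.spec.host}')
      for metric in otelcol_receiver_accepted_spans otelcol_receiver_refused_spans; do
        curl -sk -H "Authorization: Bearer $TOKEN" \
          "https://$HOST/api/v1/query?query=$metric" | grep -q "\"$metric\"" \
          || { echo "Metric $metric not found"; exit 1; }
      done
```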

@IshwarKanse (Contributor, Author)

@pavolloffay @swiatekm-sumo I'll update this test case to skip Jaeger.

@swiatekm (Contributor)

Which metrics aren't available with debugexporter? I thought all the metrics we check for in this test are emitted by otel core itself.

If you just want to generate some traffic, there are also simpler ways to do that than telemetrygen. For example, you could just start a hostmetrics receiver.
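
For illustration, the hostmetrics alternative would look roughly like this in the collector config (the interval and scrapers are arbitrary choices):

```yaml
receivers:
  hostmetrics:
    collection_interval: 10s
    scrapers:
      cpu: {}
      memory: {}
exporters:
  debug: {}
service:
  pipelines:
    metrics:
      receivers: [hostmetrics]
      exporters: [debug]
```

With a metrics-only pipeline, the counters to assert on would be the otelcol_receiver_accepted_metric_points / otelcol_receiver_refused_metric_points variants rather than the span ones.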

@IshwarKanse (Contributor, Author)

@pavolloffay @swiatekm-sumo I've updated the test case and removed the dependency on Jaeger.

@swiatekm self-requested a review on October 25, 2023, 12:32
@swiatekm (Contributor) left a comment

Looks fine to me, but I think someone with more OpenShift expertise should also approve before we merge.

@swiatekm requested a review from pavolloffay on October 25, 2023, 12:32
@jaronoff97 merged commit e5f6ebd into open-telemetry:main on Nov 29, 2023
26 checks passed
@IshwarKanse deleted the monitoring branch on April 11, 2024, 09:08
ItielOlenick pushed a commit to ItielOlenick/opentelemetry-operator that referenced this pull request on May 1, 2024