fix: Optimize Prometheus scrape target discovery #1663

hisarbalik · 2024-12-07T17:27:52Z

Description

A MetricPipeline that uses the prometheus input on a cluster with a large number of pods results in a telemetry-agent DaemonSet in which every instance consumes a lot of memory, even if no data is processed.
Each agent processes only the pods located on its own node; however, the discovery mechanism does not respect this node sharding and discovers all pods in the cluster. This PR fixes the problem by adjusting the discovery so that pods not located on the same node as the agent are excluded up front.

Changes proposed in this pull request (what was done and why):

The metric agent should discover only the scrape targets deployed on the node where the agent instance is running. This significantly reduces memory and CPU consumption on larger clusters. Configuration documentation can be found here
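As a rough illustration of node-local discovery, the sketch below builds a pod discovery entry restricted to one node. It assumes the restriction is expressed through the `selectors` option of Prometheus `kubernetes_sd_configs` with the Kubernetes field selector `spec.nodeName`. The type and function names (`K8sSDConfig`, `K8sSDSelector`, `podDiscoveryForNode`) and the node name `node-a` are hypothetical and are not taken from the telemetry-manager code. JSON is used only to print the structure; the real receiver configuration is YAML.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// K8sSDSelector is a hypothetical, simplified mirror of one entry in the
// "selectors" list of a Prometheus kubernetes_sd_config.
type K8sSDSelector struct {
	Role  string `json:"role"`
	Field string `json:"field,omitempty"`
}

// K8sSDConfig is a hypothetical, simplified mirror of a kubernetes_sd_config.
type K8sSDConfig struct {
	Role      string          `json:"role"`
	Selectors []K8sSDSelector `json:"selectors,omitempty"`
}

// podDiscoveryForNode returns a pod discovery config whose field selector
// asks the API server only for pods scheduled on nodeName, so pods on other
// nodes are never listed or watched by this agent instance.
func podDiscoveryForNode(nodeName string) K8sSDConfig {
	return K8sSDConfig{
		Role: "pod",
		Selectors: []K8sSDSelector{
			{Role: "pod", Field: "spec.nodeName=" + nodeName},
		},
	}
}

func main() {
	// "node-a" is a placeholder; an agent would use the name of its own node.
	cfg := podDiscoveryForNode("node-a")
	out, err := json.MarshalIndent(cfg, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Filtering with a selector at discovery time differs from dropping targets later through relabeling: with relabeling, each agent still receives and caches every pod in the cluster before discarding most of them.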

Changes refer to particular issues, PRs or documents:

Traceability

The PR is linked to a GitHub issue.
The follow-up issues (if any) are linked in the Related Issues section.
If the change is user-facing, the documentation has been adjusted.
If a CRD is changed, the corresponding Busola ConfigMap has been adjusted.
The feature is unit-tested.
The feature is e2e-tested.

internal/otelcollector/config/metric/agent/prometheus_receiver.go

fix prometheus discovery job, optimize discovery for resources on same node as agent running

8b5ccc4

hisarbalik requested a review from a team as a code owner December 7, 2024 17:27

kyma-bot added the cla: yes (the PR's author has signed the CLA) and size/M (a PR that changes 30-99 lines, ignoring generated files) labels Dec 7, 2024

fix pod affinity for metric agent load test setup

60d3751

hisarbalik added this to the 1.30.0 milestone Dec 7, 2024

hisarbalik added the kind/bug (categorizes issue or PR as related to a bug) and area/metrics (MetricPipeline) labels Dec 7, 2024

hisarbalik added 3 commits December 7, 2024 23:00

Merge branch 'main' into fix-metric-agent-promethues-scrape-config

952e40c

Merge branch 'main' into fix-metric-agent-promethues-scrape-config

c9c4199

Merge branch 'main' into fix-metric-agent-promethues-scrape-config

fe62991

hisarbalik changed the title ~~fix: Optimize Prometheus resource discovery~~ fix: Optimize Prometheus scrape target discovery Dec 9, 2024

k15r self-assigned this Dec 9, 2024

hisarbalik added 2 commits December 10, 2024 08:10

Merge branch 'main' into fix-metric-agent-promethues-scrape-config

d13c836

move discovery config creation to dedicated function

3da7eaf

k15r approved these changes Dec 10, 2024

internal/otelcollector/config/metric/agent/prometheus_receiver.go Outdated

kyma-bot added the lgtm (Looks good to me!) label Dec 10, 2024

hisarbalik merged commit 502f50f into kyma-project:main Dec 10, 2024
41 of 42 checks passed

hisarbalik deleted the fix-metric-agent-promethues-scrape-config branch December 10, 2024 08:00

hisarbalik added a commit to hisarbalik/telemetry-manager that referenced this pull request Dec 11, 2024

fix: Optimize Prometheus scrape target discovery (kyma-project#1663)

8117239

hisarbalik mentioned this pull request Dec 11, 2024

fix: Optimize Prometheus scrape target discovery and prepare images for 1.29.1 release #1669

Merged

6 tasks

hisarbalik commented Dec 7, 2024 • edited by a-thaler
