Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add OTLP Ingestion endpoint #12571

Merged
merged 2 commits into from
Jul 28, 2023
Merged

Conversation

gouthamve
Copy link
Member

We copy files from the otel-collector-contrib. See the README in storage/remote/otlptranslator/README.md.

This supersedes: #11965

Copy link
Member

@jesusvazquez jesusvazquez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good, need to fix a few small lint issues and I think we're ready to merge this over.

Copy link
Member

@roidelapluie roidelapluie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this wait 2 days so we first release 2.46? that gives us 6 weeks to polish

@gouthamve
Copy link
Member Author

Can absolutely wait. No need to rush this :)

@gouthamve gouthamve force-pushed the otlp-gateway-take-2 branch 3 times, most recently from 0a36aa0 to 2f04ded Compare July 17, 2023 21:54
We copy files from the otel-collector-contrib. See the README in
`storage/remote/otlptranslator/README.md`.

This supersedes: prometheus#11965

Signed-off-by: gouthamve <[email protected]>
@gouthamve gouthamve force-pushed the otlp-gateway-take-2 branch from 2f04ded to c055cd1 Compare July 17, 2023 21:59
@@ -1,6 +1,7 @@
/web/ui @juliusv
/web/ui/module @juliusv @nexucis
/storage/remote @csmarchbanks @cstyan @bwplotka @tomwilkie
/storage/remote/otlptranslator @gouthamve @jesusvazquez
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@cstyan
Copy link
Member

cstyan commented Jul 17, 2023

Has the conversation been started within in OTEL about moving the translator code so that we can remove the copy/paste code?

@gouthamve
Copy link
Member Author

gouthamve commented Jul 18, 2023

Yes, see: https://cloud-native.slack.com/archives/C01LSCJBXDZ/p1689161965395529

They'd like to see more involvement from the Prometheus maintainers before they're okay with moving the code.

@gouthamve
Copy link
Member Author

I think this is ready to go. I've tested it by pushing metrics from a test application and the otel collector.

Found a bug in the OTEL Golang SDK in process too and updated the PR to account for it: open-telemetry/opentelemetry-go#4363

It is what the OTEL Golang SDK expect :(

open-telemetry/opentelemetry-go#4363

Signed-off-by: Goutham <[email protected]>
@gouthamve gouthamve force-pushed the otlp-gateway-take-2 branch from 6b6a924 to 08a2145 Compare July 25, 2023 21:17
@@ -380,6 +390,7 @@ func (api *API) Register(r *route.Router) {
r.Get("/status/walreplay", api.serveWALReplayStatus)
r.Post("/read", api.ready(api.remoteRead))
r.Post("/write", api.ready(api.remoteWrite))
r.Post("/otlp/v1/metrics", api.ready(api.otlpWrite))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did we go for this strange URL?
Is this standard?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is because typically the OTEL collector expects a single URL for all 3 signals (logs, traces, metrics) and then adds suffixes for each signal i.e v1/metrics.

Copy link
Member

@roidelapluie roidelapluie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I just added one question.

Copy link
Member

@jesusvazquez jesusvazquez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@roidelapluie
Copy link
Member

:shipit:

@gouthamve gouthamve merged commit ad4f514 into prometheus:main Jul 28, 2023
@gouthamve gouthamve deleted the otlp-gateway-take-2 branch July 28, 2023 10:35
@gouthamve
Copy link
Member Author

Now to write the documentation for this :)

@prathamesh-sonpatki
Copy link

@gouthamve and folks - Thanks for this ❤️ Please let me know if I can help in any way writing documentation for this :)


This files in the `prometheus/` and `prometheusremotewrite/` are copied from the OpenTelemetry Project[^1].

This is done instead of adding a go.mod dependency because OpenTelemetry depends on `prometheus/prometheus` and a cyclic dependency will be created. This is just a temporary solution and the long-term solution is to move the required packages from OpenTelemetry into `prometheus/prometheus`.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am the author the normalization of metrics and labels of the Prometheus exporters in the OpenTelemetry Contrib project.

Several comments:

  • I hope you guys intend to enable the code that used to be behind the feature gate, to make sure Otel metrics are converted to proper Prometheus metrics that follow the metrics naming conventions (with unit, etc.).
  • OpenTelemetry plans to use namespaces everywhere in metric attributes, which will inevitably lead to very long labels in Prometheus. We may want to consider dropping the namespace prefixes in Otel attributes to get "simpler" labels.
  • To improve the performance, I wanted to create a sort of cache in the the translation package, in the form of a dictionary associating a triplet (metric name/type/unit) to the corresponding Prometheus metric name. I think that would be a nice addition.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the original package and the review!

I hope you guys intend to enable the code that used to be behind the feature gate

Already done :) We don't want our users to disable the feature gate, hence its removed, and the functionality is enabled by default.

We may want to consider dropping the namespace prefixes in Otel attributes to get "simpler" labels.

Interesting, can you give an example?

I wanted to create a sort of cache in the the translation package, in the form of a dictionary associating a triplet (metric name/type/unit) to the corresponding Prometheus metric name.

That sounds great. We're using the same package in Mimir and I think we're seeing CPU cycles being spent on the string replaces. Looking it up using a cache sounds good. cc @bboreham @charleskorn

Do you still have plans to implement it upstream? If no, could you open an issue to track it, and we'll try to get someone to work on it?

Copy link

@bertysentry bertysentry Aug 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otel Technical Committee recently decided to enforce namespace prefixes for all metrics attributes. All metrics semantic conventions from OpenTelemetry are expected to be updated accordingly in the future.

For example, for JVM metrics: the process.runtime.jvm.gc.duration metric will come with the process.runtime.jvm.gc.name
and process.runtime.jvm.gc.action attributes. This will end up as process_runtime_jvm_gc_duration_seconds_counter{process_runtime_jvm_gc_name="...", process_runtime_jvm_gc_action="..."} in PromQL.

We could decide to simplify process_runtime_jvm_gc_name and process_runtime_jvm_gc_action labels to name and action respectively by removing the matching namespaces in metric name and its labels.

The benefit is that we get metrics and labels that look much like "classic" Prometheus metrics and labels. The problem is that the behavior could be counter-intuitive and difficult to predict in some cases (which labels get simplified or not?).

IMHO, everything would be better if OpenTelemetry didn't enforce namespaces in metrics attributes, but that boat has sailed 😅

Regarding the cache mechanism, I still had hopes to eventually implement it, but who am I kidding, I've been considering this for a year now, and haven't done a thing. I'll open an issue here ;-)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New issue: #12627

@bwplotka
Copy link
Member

bwplotka commented Aug 1, 2023

Thanks everybody for great work on this! 💪🏽

@mx-psi
Copy link
Contributor

mx-psi commented Aug 1, 2023

I filed #12623 so that Prometheus uses a more recent version of pdata on its implementation.

prometheustranslator "github.com/prometheus/prometheus/storage/remote/otlptranslator/prometheus"
)

// addSingleSumNumberDataPoint converts the Gauge metric data point to a

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

incorrect method name in the comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants