
Presto metrics for jmx exporter #1581

Open
ayush-chauhan opened this issue Sep 25, 2019 · 36 comments

@ayush-chauhan

ayush-chauhan commented Sep 25, 2019

I am trying to export Presto JMX metrics to Prometheus using the Prometheus JMX exporter.

Here is my jvm.config:

-server
 -Xmx1G
 -XX:-UseBiasedLocking
 -XX:+UseG1GC
 -XX:G1HeapRegionSize=32M
 -XX:+ExplicitGCInvokesConcurrent
 -XX:+HeapDumpOnOutOfMemoryError
 -XX:+UseGCOverheadLimit
 -XX:+ExitOnOutOfMemoryError
 -XX:ReservedCodeCacheSize=256M
 -Djdk.attach.allowAttachSelf=true
 -Djdk.nio.maxCachedBufferSize=2000000
 -javaagent:/usr/lib/presto/utils/jmx_prometheus_javaagent.jar=9000:/usr/lib/presto/utils/exporter_config.yaml

If I use an empty exporter_config.yaml, around 5,000 metrics are exposed.

Is there a recommended list of metrics to start from that would help us monitor the Presto process and also autoscale the cluster?

For autoscaling, I think we should focus on active and pending queries.

If I use patterns in my config, all Presto metrics are filtered out. Is something wrong with my exporter_config.yaml?

rules:
  - pattern: "presto.plugin.hive.s3<type=PrestoS3FileSystem,name=hive><>ActiveConnections.FifteenMinute.Rate"
    name: "presto_plugin_hive_s3_PrestoS3FileSystem_ActiveConnections_FifteenMinute_Rate"

  - pattern: "presto.plugin.hive.s3<type=PrestoS3FileSystem,name=hive><>MetadataCalls.OneMinute.Rate"
    name: "presto_hive_MetadataCalls_OneMinute_Rate"
@xingnailu

Same here.

@tooptoop4
Contributor

Use avgquerytime and longestrunningquery too.

@rmgpinto

@ayush-chauhan replace the double quotes (") in your patterns with single quotes (').
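For example, the first rule from the config above with the quoting swapped would look like this (a minimal sketch, untested):

rules:
  - pattern: 'presto.plugin.hive.s3<type=PrestoS3FileSystem,name=hive><>ActiveConnections.FifteenMinute.Rate'
    name: 'presto_plugin_hive_s3_PrestoS3FileSystem_ActiveConnections_FifteenMinute_Rate'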

@lozbrown
Contributor

lozbrown commented Nov 9, 2021

For anyone finding this thread (it's the only relevant one Google turns up), the following config seems to work for me. I've been unable to find anything matching the avgquerytime or longestrunningquery that @tooptoop4 mentioned among the JMX metrics, but I'm happy for someone to point out the correct metric there.

- pattern : presto.execution<name=TaskManager><>InputDataSize.OneMinute.Count
  name: presto_coord_Input_bytes_sci

- pattern : presto.execution<name=TaskManager><>InputPositions.OneMinute.Count
  name: presto_coord_Input_rows
  
- pattern : presto.execution<name=TaskManager><>OutputDataSize.OneMinute.Count
  name: presto_coord_Output_bytes_sci
  
- pattern : presto.execution<name=TaskManager><>OutputPositions.OneMinute.Count
  name: presto_coord_Output_rows
  
- pattern : presto.memory<type=ClusterMemoryPool, name=general><>TotalDistributedBytes
  name: presto_TotalDistributedBytes
  
- pattern : presto.memory<type=ClusterMemoryPool, name=general><>ReservedDistributedBytes
  name: Presto_ReservedDistributedBytes

- pattern : presto.execution<name=QueryManager><>FailedQueries.OneMinute.Count
  name: presto_Failed_Queries
  
- pattern : presto.execution<name=QueryManager><>RunningQueries
  name: presto_running_queries

- pattern : presto.failuredetector<name=HeartbeatFailureDetector><>ActiveCount
  name: presto_active_nodes

- pattern : presto.memory<type=ClusterMemoryPool, name=general><>FreeDistributedBytes
  name: Presto_cluster_free_memory
  
- pattern : presto.execution<name=QueryManager><>ManagementExecutor.QueuedTaskCount
  name: presto_queued_task_count

- pattern : presto.execution<name=QueryManager><>StartedQueries.FiveMinute.Count
  name: presto_started_queries

- pattern : java.lang<type=Memory><HeapMemoryUsage>committed
  name: presto_jvm_heap_memory_usage

- pattern : java.lang<type=Threading><>ThreadCount
  name: presto_jvm_thread_count

- pattern : presto.execution<name=QueryManager><>InternalFailures.OneMinute.Count
  name: presto_failed_queries_internal

- pattern : presto.execution<name=QueryManager><>ExternalFailures.OneMinute.Count
  name: presto_failed_queries_external

- pattern : presto.execution<name=QueryManager><>UserErrorFailures.OneMinute.Count
  name: presto_failed_queries_user

@tooptoop4
Contributor

Some ways to see those stats, from JMX or from custom aggregate queries:

  1. https://<presto_dns>:<presto_port>/v1/jmx/mbean

  2. jmx tables

  3. system tables
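As a rough illustration of options 2 and 3 (assuming the jmx catalog is enabled; object and column names may differ between versions):

-- option 2: the jmx tables
SELECT runningqueries, queuedqueries
FROM jmx.current."presto.execution:name=querymanager";

-- option 3: the system tables
SELECT query_id, state, query
FROM system.runtime.queries
WHERE state = 'RUNNING';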

@tomrijntjes

> (quoting @lozbrown's config from above)

If you want a reasonable starting point, use this config and replace presto with trino.

The catch-all config listed in the jmx exporter README leads to a generous 34,000 lines on the /metrics endpoint.
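For example, the first rule above would then read (untested sketch):

- pattern : trino.execution<name=TaskManager><>InputDataSize.OneMinute.Count
  name: trino_coord_Input_bytes_sci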

@ToanP

ToanP commented Jun 16, 2023

How do you push/pull all the Prometheus metrics exported from JMX to a Prometheus server so they can be graphed in Grafana?
@ayush-chauhan @tomrijntjes can you share your solution?

@martint
Member

martint commented Jun 16, 2023

cc @mattstep

@mattstep
Contributor

Trino exposes OpenMetrics (compatible with the Prometheus scrape API) at /metrics for anything registered with jmxutils. You can also add non-jmxutils metrics via the config property here: https://github.com/airlift/airlift/blob/master/openmetrics/src/main/java/io/airlift/openmetrics/MetricsConfig.java#L34

@lozbrown
Contributor

> Trino exposes OpenMetrics (compatible with the Prometheus scrape API) at /metrics for anything registered with jmxutils. You can also add non-jmxutils metrics via the config property here: https://github.com/airlift/airlift/blob/master/openmetrics/src/main/java/io/airlift/openmetrics/MetricsConfig.java#L34

@mattstep is this documented anywhere? Any chance of an example config?

If I wanted to expose SubmittedQueries.OneMinute.Count,

would it just be
openmetrics.jmx-object-names="SubmittedQueries.OneMinute.Count"

in a config file? Which config file? And what's the delimiter, space or comma?

@lozbrown
Contributor

@mattstep @wendigo

Any chance someone could throw us a bone here and give us an example of what the config might look like?

@mattstep
Contributor

That metric should be available without passing in any config. I just fixed a bug with some metrics not showing up, which should be landing in Trino soon.

@lozbrown
Contributor

lozbrown commented Sep 20, 2023

@mattstep I'm not currently seeing that one on v423 (checked via the jmx catalog).

I can't find anything similar when grepping the /metrics output,

but these metrics do exist from the QueryManager:

# TYPE trino_execution_name_QueryManager_QueuedQueries gauge
trino_execution_name_QueryManager_QueuedQueries 0.0

# TYPE trino_execution_name_QueryManager_RunningQueries gauge
trino_execution_name_QueryManager_RunningQueries 0.0

Either way, a small example of the config you mentioned would be lovely.

@lozbrown
Contributor

Also @mattstep, any chance of the commit ID of the bug fix you mentioned, so we can track whether it made it into a release?

Thanks

@mattstep
Contributor

It's a change in jmxutils; it's making its way into the Trino pom this week, hopefully today or Monday.

@lozbrown
Contributor

@mattstep

In 429 we get far more metrics, including most of the useful QueryManager stuff, such as
trino_execution_name_QueryManager_SubmittedQueries

but we don't get them as timed rate counters like
"submittedqueries.oneminute.count"
through the JMX catalog.

That's fine, because my monitoring system can compute rates itself.

This is REALLY useful, but it should be documented on the Trino website.
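For example, a rough PromQL sketch of that rate computation, assuming SubmittedQueries is exported as a cumulative counter (check the metric type on your /metrics output first):

rate(trino_execution_name_QueryManager_SubmittedQueries[5m]) * 60

This approximates queries submitted per minute, averaged over the last five minutes.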

@Akanksha-kedia

Hello,

startDelaySeconds: 0
ssl: false
lowercaseOutputName: false
lowercaseOutputLabelNames: false
includeObjectNames: ["java.lang:type=Threading"]
autoExcludeObjectNameAttributes: true
excludeObjectNameAttributes:
  "java.lang:type=OperatingSystem":
    - "ObjectName"
  "java.lang:type=Runtime":
    - "ClassPath"
    - "SystemProperties"
rules:
  - pattern: 'java.lang<type=Threading><(.*)>ThreadCount: (.*)'
    name: java_lang_Threading_ThreadCount
    value: '$2'
    help: 'ThreadCount (java.lang<type=Threading><>ThreadCount)'
    type: UNTYPED

On using this, I am able to expose only java_lang_Threading_ThreadCount on the JMX exporter's metrics endpoint.
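That is expected with the config above: includeObjectNames restricts collection to java.lang:type=Threading, so only that MBean is queried, and the single rule only matches ThreadCount. A sketch that would also let the Presto MBeans through (untested; the extra object-name pattern and catch-all rule are only examples):

includeObjectNames: ["java.lang:type=Threading", "presto.execution:*"]
rules:
  - pattern: ".*"

Attributes from the extra MBeans should then show up on the metrics endpoint with default names.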

@Akanksha-kedia

Akanksha-kedia commented Feb 25, 2024

-javaagent:/path/to/jmx_prometheus_javaagent-0.11.0.jar=port:/path/etc/presto.yml


@wendigo
Contributor

wendigo commented Feb 25, 2024

I do not recommend the JMX exporter, as it causes instabilities. Trino has built-in support for OpenMetrics.

@BrandenRice

@wendigo what kind of instabilities have you seen jmx-exporter causing?

I have an implementation of jmx-exporter, and I've noticed that some of the MBeans in the JMX I am interested in are not exported via the /metrics endpoint on the coordinator (e.g. trino.plugin.exchange.filesystem:name=filesystemexchangestats, java.lang:type=memory, io.trino.hdfs:name=iceberg,type=trinohdfsfilesystemstats). Are there plans to add these to the OpenMetrics endpoint? Currently, using jmx-exporter seems like the only way to export these as metrics, but I don't want to willingly introduce instability to our cluster.

Conversely, there are also a lot of metrics I'm not interested in, such as the trino.sql.planner.* ones. Is there a way to specify a configuration for what metrics/MBeans I am interested in? If so, I (and probably also @lozbrown) would love an example of this.

Thanks 👍🏻

@lozbrown
Contributor

lozbrown commented Feb 29, 2024

@BrandenRice

You can include additional metrics by adding

openmetrics.jmx-object-names

to your config.properties, with a list of the object names you want.

The config property is comma-separated; however, the object names themselves may contain commas. There's currently no way to escape them, but you can use * as a wildcard.

There's a PR in Airlift to resolve that, but it's still pending review from @martint:

airlift/airlift#1120

E.g.
openmetrics.jmx-object-names=trino.plugin.exchange.filesystem:name=filesystemexchangestats,io.trino.hdfs:name=iceberg*

I'll try and update this comment tomorrow with an example when I'm back in the office

@BrandenRice

Thanks @lozbrown, that's a huge help!

@BrandenRice

Continuing this thread here since I'm not sure where else to ask about this:

I have added the suggested config to expose additional JMX metrics in config.properties, but I am unable to see any related metrics being reported.

Trino Coordinator ConfigMap (config.properties):

config.properties:
----
coordinator=true
node-scheduler.include-coordinator=false
http-server.http.port=8080
query.max-memory=50GB
query.max-memory-per-node=7GB
memory.heap-headroom-per-node=3GB
discovery.uri=http://localhost:8080
shutdown.grace-period=60s
iterative-optimizer-timeout=20m
query.max-planning-time=30m
query.max-history=5000
query.client.timeout=30m
optimizer.join-reordering-strategy=NONE
join-distribution-type=BROADCAST
retry-policy=TASK
openmetrics.jmx-object-names=trino.plugin.exchange.filesystem:name=filesystemexchangestats
exchange.compression-codec=LZ4

Checking the startup logs, we can see that the default value of [] is replaced with a list containing the object names I supplied. (There is no "error invoking configuration method" message, so it's being loaded correctly.)

2024-03-11T15:32:31.419Z	INFO	main	Bootstrap	openmetrics.jmx-object-names                                                            []                                                                         [trino.plugin.exchange.filesystem:name=filesystemexchangestats]            JMX object names to include when retrieving all metrics.

However, there's nothing I can see in the metrics output containing this JMX property:

❯ curl -H "X-Trino-User: admin" "http://localhost:8080/metrics" | grep exchange
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 1081k    0 1081k    0     0  2094k      0 --:--:-- --:--:-- --:--:-- 2119k

I'm assuming that it would be mapped to a metric name such as trino_plugin_exchange_filesystem_name_FileSystemExchangeStats_<PROPERTY_NAME>. Perhaps this assumption is incorrect? Or perhaps there's some conflicting/missing configuration property that's not allowing this to get exported at /metrics properly?

Unsure of where else to look to debug this, so any help would be greatly appreciated 👍🏻

@BrandenRice

I've found the problem. It's important that, when you specify your additional JMX object names in the configuration file, the MBean attribute (i.e. the bit after the :) is in Pascal case (e.g. FileSystemExchangeStats, not filesystemexchangestats). Otherwise the configuration will silently fail on load, since the object name exists within the JMX but the property doesn't match anything.

It's also worth noting that the metric naming format changes slightly, to JMX_<JMX-PROPERTY>_TYPE_<PROPERTY-TYPE>_ATTRIBUTE_<ATTRIBUTE-NAME>.
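As a before/after illustration using the object name from the config earlier in this thread:

# silently ignored (name value not in Pascal case):
openmetrics.jmx-object-names=trino.plugin.exchange.filesystem:name=filesystemexchangestats
# picked up and exposed at /metrics:
openmetrics.jmx-object-names=trino.plugin.exchange.filesystem:name=FileSystemExchangeStats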

@lozbrown
Contributor

Cool

I'm not sure whether it's also important to put that on the workers; I'm not sure it's gathering stats for the whole cluster.

Please post your working config here...

Also, beware that the fix for the delimiter got soundly rejected, and there's talk of changing the delimiter to something else, so that might mess with your changes in a future version.

I was trying to mess with metrics this afternoon and somehow broke it, so I'm interested in working examples.

@BrandenRice

Hey @lozbrown

We deploy our Trino cluster through the official Helm chart (unsure how you're deploying yours), so anything we specify under additionalConfigProperties gets added to the worker and coordinator configmaps. They also provide server.worker/coordinator.extraConfig values for any config you want to keep separate. Adding these config properties is the only thing I need to set in the Helm chart to expose the extra JMX metrics.

In terms of whether a metric is useful on the worker or not, it depends on the metric. Each node (worker and coordinator) exposes metrics on the /metrics endpoint. Some JMX properties, such as trino.memory:name=general,type=memorypool, export values on a per-node basis, while something like trino.memory:name=general,type=clustermemorypool provides an overview of the whole cluster, so it's really only needed on the coordinator. We just use the same config for coordinator and worker alike, but it shouldn't matter too much, I don't think? Still in the process of tuning and tinkering 😅
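For reference, the Helm values notation for that comes out to roughly this (a sketch based on the chart's additionalConfigProperties list; double-check the exact format against your chart version):

additionalConfigProperties:
  - openmetrics.jmx-object-names=trino.plugin.exchange.filesystem:name=FileSystemExchangeStats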

My working config comes out looking like this (only the openmetrics value is really relevant here):

config.properties:
----
coordinator=true
node-scheduler.include-coordinator=false
http-server.http.port=8080
query.max-memory=50GB
query.max-memory-per-node=7GB
memory.heap-headroom-per-node=3GB
discovery.uri=http://localhost:8080
shutdown.grace-period=60s
iterative-optimizer-timeout=20m
query.max-planning-time=30m
query.max-history=5000
query.client.timeout=30m
optimizer.join-reordering-strategy=NONE
join-distribution-type=BROADCAST
retry-policy=TASK
openmetrics.jmx-object-names=trino.plugin.exchange.filesystem:name=FileSystemExchangeStats,trino.plugin.exchange.filesystem.s3:name=S3FileSystemExchangeStorageStats,io.trino.hdfs:*,java.lang:type=Memory
exchange.compression-codec=LZ4

Another thing to note is that something like io.trino.hdfs:name=iceberg* doesn't work, as I believe that doesn't allow for arbitrary keys (e.g. the type field for this JMX object). I had to do io.trino.hdfs:* to allow all keys and then drop everything but iceberg-related metrics during the Prometheus scrape with a regex (a rough sketch of that is below). Also, java.lang:type=Memory contains useful attributes about heap and non-heap memory usage (useful for detecting heap-space OOM issues on workers), but they are wrapped in some kind of composite data structure that doesn't seem to be getting translated by OpenMetrics 🤷🏻
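A sketch of what that scrape-time filtering can look like (the job name, target, and metric-name regex here are hypothetical; adjust them to the names that actually appear on your /metrics output):

scrape_configs:
  - job_name: trino
    static_configs:
      - targets: ['trino-coordinator:8080']
    metric_relabel_configs:
      # drop metric families we don't care about, e.g. the numerous planner metrics
      - source_labels: [__name__]
        regex: 'trino_sql_planner_.*'
        action: drop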

@lozbrown
Contributor

Hi @BrandenRice

> We deploy our Trino cluster through the official Helm chart (unsure how you're deploying yours), so anything we specify under additionalConfigProperties gets added to the [worker]

We are the same, although the notation format of additionalConfigProperties takes some working out, and I intend to test today whether the way I'd specified it had just fallen foul of that.

> Another thing to note is that something like io.trino.hdfs:name=iceberg* doesn't work, as I believe that doesn't allow for arbitrary keys (e.g. the type field for this JMX object). I had to do io.trino.hdfs:* to allow all keys and then drop everything but iceberg-related metrics during the Prometheus scrape with a regex.

That's useful info, thank you.

> Also, java.lang:type=Memory contains useful attributes about heap and non-heap memory usage (useful for detecting heap-space OOM issues on workers), but they are wrapped in some kind of composite data structure that doesn't seem to be getting translated by OpenMetrics 🤷🏻

Actually, these were some of the metrics I was trying to pull, because I'm having OOM issues on workers. Maybe this was the cause of my failure yesterday.

Either way, do you think it might be worth a separate issue asking for those specific metrics to be broken out into separate fields in OpenMetrics? They seem like extremely useful metrics.

@BrandenRice

Yeah, I was also hoping to have JVM data from the workers to monitor the same thing. It can be especially hard to investigate OOMs if you have something like autoscaling enabled, and the heap/non-heap memory attributes would be extremely useful for tweaking the memory.heap-headroom-per-node and JVM allocation parameters for optimal allocation of resources.

If you raise an issue, link it here and I'll back it as well, since this is something we'd also really like to have.

FYI, here is the format of that JMX property (fetched through Trino's JMX connector):

heapmemoryusage                | javax.management.openmbean.CompositeDataSupport(... contents={committed=7012876288, init=167772160, max=8589934592, used=6355872304})
nonheapmemoryusage             | javax.management.openmbean.CompositeDataSupport(... contents={committed=359661568, init=7667712, max=-1, used=343010232})
objectname                     | java.lang:type=Memory
objectpendingfinalizationcount | 0
verbose                        | false
node                           | trino-worker
object_name                    | java.lang:type=Memory

It's quite long, but you can see that there are init, committed, max, and used amounts for both heap and nonheap usage in the JVM. The data type it's wrapped in doesn't seem to get exported though, and the only attribute that shows up in /metrics is objectpendingfinalizationcount, which is far less useful.

@hashhar
Member

hashhar commented Mar 13, 2024

For the JVM heap stuff, in the case of the JMX exporter, I did use the following snippet in the past (but Trino was on JDK 11 then).

- pattern: 'java.lang<type=Memory><(Heap|NonHeap)MemoryUsage>(max|init|committed|used)'
  name: 'jvm_memory_$2_bytes'
  help: '$2 bytes in $1 memory'
  type: GAUGE
  labels:
    area: '$1'

Not sure if this is helpful at all.

@lozbrown
Contributor

This SQL may be helpful to extract some history:

select regexp_extract(split_part(nonheapmemoryusage, ',', 11), '\d+') nonheapinit
,regexp_extract(split_part(nonheapmemoryusage, ',', 13), '\d+') nonheap_used
,regexp_extract(split_part(nonheapmemoryusage, ',', 12), '\d+') nonheap_max
,regexp_extract(split_part(nonheapmemoryusage, ',', 10), '\d+') nonheap_committed
, regexp_extract(split_part(heapmemoryusage, ',', 11), '\d+') heapinit
,regexp_extract(split_part(heapmemoryusage, ',', 13), '\d+') heap_used
,regexp_extract(split_part(heapmemoryusage, ',', 12), '\d+') heap_max
,regexp_extract(split_part(heapmemoryusage, ',', 10), '\d+') heap_committed

, node, "timestamp"
from "jmx".history."java.lang:type=memory"

@lozbrown
Contributor

@BrandenRice

> If you raise an issue, link it here and I'll back it as well, since this is something we'd also really like to have.

#21056

@lozbrown
Contributor

@BrandenRice I've recently discovered from the Slack channel that we need to pull metrics from all the workers (I was previously only pulling from the coordinator HTTPS ingress).

I'm trying to get Prometheus service discovery working, but I keep getting 403s.

I wonder if you have this working and, if so, whether you could share a sanitized version of your Prometheus config?

Also, did you have to change much in the Helm charts to make it work?

@BrandenRice

Yeah, I also ran into this. The /metrics endpoint is secured behind authentication. If you're not running Trino with any kind of authentication, Trino still requires a username but rejects any password (you can see this if you port-forward to a worker and navigate to /metrics), which is why you might be getting a 403. You can provide a username to Trino with the X-Trino-User header, or by prepending your username to the domain, like username@localhost:8080.

However, Prometheus explicitly does not allow setting custom headers. So the options are either to secure the Trino cluster with password/LDAP auth and pass basic-auth credentials through Prometheus, or to inject a small NGINX proxy as a sidecar on each of your workers that adds the X-Trino-User header (a rough sketch follows below). Depending on what your Trino cluster looks like, the proxies might be the easiest (although certainly not the cleanest) way to get worker metrics up and running.

The sidecars can be set up with the sidecarContainers property in the Helm chart without too much hassle if you choose to go the proxy route. In either case, you'll want to change from a ServiceMonitor to a PodMonitor (assuming you're using Prometheus Operator) to scrape all Trino pods directly rather than just the service.
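For anyone going the proxy route, a rough sketch of the relevant NGINX server block (the listen port and username are arbitrary examples, not anything Trino mandates):

# sidecar proxy: Prometheus scrapes this port instead of Trino directly
server {
  listen 9090;
  location /metrics {
    # Trino accepts any username when no authenticator is configured
    proxy_set_header X-Trino-User prometheus;
    proxy_pass http://127.0.0.1:8080/metrics;
  }
}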

@mosabua
Member

mosabua commented Jan 15, 2025

This is a rather large issue in terms of the discussion here. I am finalizing my docs PR for OpenMetrics integration at #21089

After that I would like to close this issue, since it is pretty much a collection of random snippets now .. if anyone wants to add more info to the docs .. please submit PRs for the documentation or the helm charts or whatever else .. I can help with reviews.

@lozbrown
Contributor

I think we could raise another issue with something along the lines of:

"Allow anonymous OpenMetrics to simplify collection configuration"

as that's the remaining issue here that makes collection complicated.
