Presto metrics for jmx exporter #1581
Same as me.
Use avgquerytime and longestrunningquery too.
@ayush-chauhan replace your …
For anyone finding this thread, as it's the only relevant one Google finds, the following config seems to work for me. I've been unable to find anything in the JVM metrics for the avgquerytime or longestrunningquery metrics that @tooptoop4 mentioned, but I'm happy for someone to point out the correct metric names.
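A minimal sketch of such a whitelist-style jmx_exporter config, assuming the QueryManager bean shown later in this thread (the output metric names here are illustrative, not what the original commenter used):

startDelaySeconds: 0
lowercaseOutputName: true
# Only collect the beans we care about, instead of the ~34k-line catchall.
whitelistObjectNames:
  - "presto.execution:name=QueryManager"
rules:
  # Matches simple numeric attributes on the QueryManager bean,
  # e.g. RunningQueries and QueuedQueries.
  - pattern: 'presto.execution<name=QueryManager><>(RunningQueries|QueuedQueries)'
    name: presto_query_manager_$1
    type: GAUGE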
If you want a reasonable starting point, use this config and replace presto with trino. The catch-all config listed in the jmx exporter README leads to a generous 34,000 lines on the /metrics endpoint.
How do you push or pull all the metrics exported by the JMX exporter into a Prometheus server, so they can be graphed in Grafana?
cc @mattstep
Trino exposes OpenMetrics (compatible with the Prometheus scrape API) at /metrics for anything registered with jmxutils. You can also add non-jmxutils metrics via the config property here: https://github.com/airlift/airlift/blob/master/openmetrics/src/main/java/io/airlift/openmetrics/MetricsConfig.java#L34
@mattstep is this documented anywhere? Any chance of an example config? If I wanted to expose SubmittedQueries.OneMinute.Count, would it just go in a config file? Which config file? And what's the delimiter, space or comma?
That metric should be available without passing in any config. I just fixed a bug with some metrics not showing up, which should land in Trino soon.
@mattstep I'm not currently seeing that one on v423. I can't find anything similar grepping the output of /metrics, but these metrics from the query manager do exist:

# TYPE trino_execution_name_QueryManager_QueuedQueries gauge
trino_execution_name_QueryManager_QueuedQueries 0.0
# TYPE trino_execution_name_QueryManager_RunningQueries gauge
trino_execution_name_QueryManager_RunningQueries 0.0

Either way, a small example of the config you mentioned would be lovely.
Also @mattstep, any chance of the commit ID of the bug fix you mentioned, so we can track whether it made it into a release? Thanks.
It's a change in jmxutils; it's making its way into the Trino pom this week. Hopefully today or Monday.
In 429 we get far more metrics, including most of the useful QueryManager stuff, but we don't get them as the time-windowed rate counters the JMX catalog provides. That's fine, because my monitoring system can compute rates itself. This is REALLY useful, but it should be documented on the Trino website.
Hello,

startDelaySeconds: 0

-javaagent:/path/to/jmx_prometheus_javaagent-0.11.0.jar=port:/path/etc/presto.yml
I do not recommend the jmx exporter, as it causes instabilities. Trino has built-in support for OpenMetrics.
@wendigo what kind of instabilities have you seen jmx-exporter causing? I have a jmx-exporter deployment and I've noticed that some of the MBeans I am interested in are not exported. Conversely, there are also a lot of metrics I'm not interested in. Thanks 👍🏻
You can include additional metrics by adding openmetrics.jmx-object-names to your config.properties, with a list of the object names you want. The config property is comma separated; however, you may find the object names themselves contain commas. There's currently no way to escape those, but you can use * as a wildcard. There's a PR in airlift to resolve that, but it's still pending a review from @martint. I'll try to update this comment tomorrow with an example when I'm back in the office.
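For illustration, a sketch of that property with two comma-separated object names, one using the wildcard (the names here are just ones mentioned elsewhere in this thread):

openmetrics.jmx-object-names=trino.execution:name=QueryManager,java.lang:type=*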
Thanks @lozbrown, that's a huge help!
Continuing this thread here, since I'm not sure where else to ask about this: I have added the suggested config to expose additional JMX metrics in the Trino coordinator ConfigMap:

config.properties:
----
coordinator=true
node-scheduler.include-coordinator=false
http-server.http.port=8080
query.max-memory=50GB
query.max-memory-per-node=7GB
memory.heap-headroom-per-node=3GB
discovery.uri=http://localhost:8080
shutdown.grace-period=60s
iterative-optimizer-timeout=20m
query.max-planning-time=30m
query.max-history=5000
query.client.timeout=30m
optimizer.join-reordering-strategy=NONE
join-distribution-type=BROADCAST
retry-policy=TASK
openmetrics.jmx-object-names=trino.plugin.exchange.filesystem:name=filesystemexchangestats
exchange.compression-codec=LZ4

Checking the startup logs, we can see that the default value of openmetrics.jmx-object-names is overridden. However, there's nothing I can see within the metrics output containing this JMX property:

❯ curl -H "X-Trino-User: admin" "http://localhost:8080/metrics" | grep exchange
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 1081k    0 1081k    0     0  2094k      0 --:--:-- --:--:-- --:--:-- 2119k

I'm assuming it would be mapped to a metric name containing "exchange". Unsure of where else to look to debug this, so any help would be greatly appreciated 👍🏻
I've found the problem. When you specify your additional JMX object names in the configuration file, the MBean attribute (i.e. the bit after name=) is case sensitive, so it has to be name=FileSystemExchangeStats rather than name=filesystemexchangestats. Also worthwhile to note that the metric naming format changes slightly.
Cool. I'm not sure whether it's also important to put that in the workers; I'm not sure it's gathering stats for the whole cluster. Please post your working config here... Also beware that the fix for the delimiter got soundly rejected, and there's talk of changing the delimiter to something else, so that might mess with your changes in a future version. I was trying to mess with metrics this afternoon and somehow broke it, so I'm interested in working examples.
Hey @lozbrown. We deploy our Trino cluster through the official Helm chart (unsure how you're deploying yours), so anything we specify under additionalConfigProperties ends up in config.properties on the pods.

In terms of whether a metric is useful on the worker or not, it depends on the metric. Each node (worker and coordinator) exposes metrics on its own /metrics endpoint.

My working config comes out to look like this (only the openmetrics value is really relevant here):

config.properties:
----
coordinator=true
node-scheduler.include-coordinator=false
http-server.http.port=8080
query.max-memory=50GB
query.max-memory-per-node=7GB
memory.heap-headroom-per-node=3GB
discovery.uri=http://localhost:8080
shutdown.grace-period=60s
iterative-optimizer-timeout=20m
query.max-planning-time=30m
query.max-history=5000
query.client.timeout=30m
optimizer.join-reordering-strategy=NONE
join-distribution-type=BROADCAST
retry-policy=TASK
openmetrics.jmx-object-names=trino.plugin.exchange.filesystem:name=FileSystemExchangeStats,trino.plugin.exchange.filesystem.s3:name=S3FileSystemExchangeStorageStats,io.trino.hdfs:*,java.lang:type=Memory
exchange.compression-codec=LZ4

Another thing to note is that something like io.trino.hdfs:* uses the wildcard to pull in every MBean under that prefix.
Hi @BrandenRice,
We are the same, although the notation format of additionalConfigProperties takes some working out, and I intend to test today whether the way I'd specified it had just fallen foul of that.
That's useful info, thank you.
Actually, these were some of the metrics I was trying to pull, because I'm having OOM issues on workers; maybe this was the cause of my failure yesterday. Either way, do you think it might be worth a separate issue asking for those specific metrics to be broken out into separate fields in OpenMetrics? They seem like extremely useful metrics.
Yeah, I was also hoping to have JVM data from the workers to monitor the same thing. It can be especially hard to investigate OOMs if you have something like autoscaling enabled, and the heap/non-heap memory attributes would be extremely useful for tweaking the memory settings. If you raise an issue, link it here and I'll back it as well, since this is something we'd also really like to have.

FYI, here is the format of that JMX property (fetched through Trino's JMX connector): it's quite long, but you can see that the heap and non-heap figures are packed into single composite values rather than exposed as separate fields.
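For reference, a hedged example of pulling that attribute through the JMX catalog (assuming the JMX connector is mounted as catalog jmx; java.lang:type=Memory exposes HeapMemoryUsage and NonHeapMemoryUsage as composite values with committed/init/max/used keys):

SELECT node, heapmemoryusage, nonheapmemoryusage
FROM jmx.current."java.lang:type=memory";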
For JVM heap stuff, in the case of the jmx exporter, I used the following snippet in the past (but Trino was on JDK 11 then).
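A minimal sketch of such a snippet, assuming jmx_exporter's composite-attribute match format for java.lang:type=Memory (the output metric names are illustrative):

rules:
  # HeapMemoryUsage / NonHeapMemoryUsage are composite values; the second
  # bracket in the pattern matches the composite key path, and the tail
  # matches the key (committed, init, max, used).
  - pattern: 'java.lang<type=Memory><HeapMemoryUsage>(\w+)'
    name: jvm_memory_heap_$1
    type: GAUGE
  - pattern: 'java.lang<type=Memory><NonHeapMemoryUsage>(\w+)'
    name: jvm_memory_nonheap_$1
    type: GAUGE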
Not sure if this is helpful at all. |
This SQL may be helpful for extracting some history:

select regexp_extract(split_part(nonheapmemoryusage, ',', 11), '\d+') nonheapinit
     , node
     , "timestamp"
…
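A hedged completion of that query, assuming the JMX connector is mounted as catalog jmx with jmx.dump-tables configured for java.lang:type=Memory (which is what populates the history schema):

select regexp_extract(split_part(nonheapmemoryusage, ',', 11), '\d+') nonheapinit
     , node
     , "timestamp"
from jmx.history."java.lang:type=memory";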
@BrandenRice I've recently discovered from the Slack channel that we need to pull metrics from all the workers (we were previously only pulling from the coordinator HTTPS ingress). I'm trying to get Prometheus service discovery working, but I keep getting 403s. I wonder if you have this working and, if so, whether you could share a sanitized version of the Prometheus config? Also, did you have to change much in the Helm charts to make it work?
Yeah, I also ran into this. The /metrics endpoint still expects the request to identify a user (the X-Trino-User header in the curl example above); however, Prometheus explicitly does not allow setting custom headers. So the options are either to secure the Trino cluster with password/LDAP auth and then pass basic-auth credentials through Prometheus, or to inject a small NGINX proxy as a sidecar on each of your workers that injects the X-Trino-User header. The sidecars can be set up through the Helm chart.
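For illustration, a minimal sketch of a Prometheus scrape job taking the basic-auth route (the job name, credentials, and pod label are placeholders, not taken from any config in this thread):

scrape_configs:
  - job_name: trino                  # hypothetical job name
    metrics_path: /metrics
    scheme: https
    basic_auth:
      username: prometheus           # hypothetical service user
      password: changeme
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # keep only pods labelled app=trino (label name is an assumption)
      - source_labels: [__meta_kubernetes_pod_label_app]
        regex: trino
        action: keep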
This is a rather large issue in terms of the discussion here. I am finalizing my docs PR for OpenMetrics integration at #21089. After that, I would like to close this issue, since it is pretty much a collection of random snippets now. If anyone wants to add more info to the docs, please submit PRs for the documentation or the Helm charts or whatever else; I can help with reviews.
I think we could raise another issue along the lines of "Allow anonymous OpenMetrics to simplify collection configuration", as that's the remaining issue here that makes collection complicated.
I am trying to export Presto JMX metrics to Prometheus using the Prometheus JMX exporter.
Here is my JVM config:
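A typical jvm.config line for attaching the agent, with placeholder jar path, port, and config path (the original poster's actual values are unknown):

-javaagent:/opt/jmx_prometheus_javaagent-0.11.0.jar=9483:/etc/presto/exporter_config.yaml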
If I use an empty exporter_config.yaml, then around 5000+ metrics are exposed.
Do we have a list of metrics we should start from that will help us monitor the Presto process and also autoscale the cluster?
For autoscaling, I think we should focus on active and pending queries.
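A hedged PromQL sketch of such a signal, reusing the illustrative metric names from the whitelist config sketched earlier in the thread:

presto_query_manager_runningqueries + presto_query_manager_queuedqueries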
If I use patterns in my config.yaml, all Presto metrics are filtered out. Is something wrong with my exporter_config.yaml?