Expose fault tolerant execution and filesystem exchange metrics via JMX #12127

arhimondr · 2022-04-25T17:58:08Z

Description

Exposes operational metrics related to fault-tolerant execution via JMX to enable live monitoring.

Is this change a fix, improvement, new feature, refactoring, or other?

Improvement

Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)

Core, Exchange

How would you describe this change to a non-technical end user or system administrator?

N/A

Related issues, pull requests, and links

-

Documentation

(x) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

(x) No release notes entries required.
( ) Release notes entries required with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

arhimondr · 2022-04-25T17:59:31Z

I had to move the exchange plugin from the `io.trino.plugin.exchange` package to `io.trino.plugin.exchange.filesystem` to avoid potential naming clashes with future implementations when exposing JMX metrics. After changing the package name, I also renamed the module to keep it consistent with the package name.
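To illustrate the clash, here is a minimal sketch that assumes stats beans are named `<package>:name=<SimpleClassName>` (an assumption about the naming scheme, not a quote of the exporter's code). `JmxNaming`, `nameOf`, and `hasClash` are hypothetical helpers. Two implementations sharing one package with a same-named stats class would map to a single `ObjectName`, while separate sub-packages keep them distinct.

```java
import java.util.HashSet;
import java.util.Set;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Hypothetical helpers; assumes beans are named "<package>:name=<SimpleClassName>"
class JmxNaming
{
    static ObjectName nameOf(String packageName, String simpleClassName)
    {
        try {
            return new ObjectName(packageName + ":name=" + simpleClassName);
        }
        catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Each entry is {packageName, simpleClassName}; true if two entries map to the same ObjectName
    static boolean hasClash(String[][] beans)
    {
        Set<ObjectName> seen = new HashSet<>();
        for (String[] bean : beans) {
            if (!seen.add(nameOf(bean[0], bean[1]))) {
                return true;
            }
        }
        return false;
    }
}
```

Under this assumed scheme, giving each implementation its own sub-package lets them reuse a class name such as `ExchangeStats` without colliding.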

linzebing · 2022-04-25T18:58:36Z

core/trino-main/src/main/java/io/trino/execution/scheduler/FaultTolerantExecutionStats.java

+        {
+            ExecutionFailureInfo failureInfo = info.getTaskStatus().getFailures().stream()
+                    .findFirst()
+                    .orElse(toFailure(new TrinoException(GENERIC_INTERNAL_ERROR, "A task failed for an unknown reason")));


When will this happen in practice?

Rather unlikely, but we have similar safeguards in other places.
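A minimal sketch of the same safeguard, with a hypothetical `FailureInfo` record standing in for `ExecutionFailureInfo`. It swaps `orElse` for `orElseGet`: `orElse` evaluates its argument eagerly, so the snippet above builds the fallback `TrinoException` even when a failure was reported, whereas `orElseGet` runs its supplier only when the stream is empty.

```java
import java.util.List;

// Hypothetical stand-in for ExecutionFailureInfo
record FailureInfo(String message) {}

class FailureFallback
{
    // Returns the first reported failure, or a generic placeholder when none was reported
    static FailureInfo firstFailure(List<FailureInfo> failures)
    {
        return failures.stream()
                .findFirst()
                // The supplier runs only when the list is empty
                .orElseGet(() -> new FailureInfo("A task failed for an unknown reason"));
    }
}
```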

linzebing · 2022-04-25T20:50:40Z

.../trino-exchange/src/main/java/io/trino/plugin/exchange/FileSystemExchangeManagerFactory.java

+        Bootstrap app = new Bootstrap(
+                new MBeanModule(),
+                new MBeanServerModule(),
+                new PrefixObjectNameGeneratorModule("io.trino.plugin.exchange", "trino.plugin.exchange"),


What is this for?

This exports the JMX beans under a specific prefix (`trino.plugin.exchange` rather than the `io.trino.plugin.exchange` package name).
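As a rough illustration of prefix-based naming, the sketch below rewrites a class's package prefix before using it as the `ObjectName` domain. It is not the implementation of `PrefixObjectNameGeneratorModule`; `PrefixedNames.rename` and the `:name=<SimpleClassName>` key are assumptions made for the example.

```java
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Hypothetical sketch of prefix-based renaming, not PrefixObjectNameGeneratorModule itself
class PrefixedNames
{
    static ObjectName rename(String className, String fromPrefix, String toPrefix)
    {
        int lastDot = className.lastIndexOf('.');
        if (lastDot < 0) {
            throw new IllegalArgumentException("Expected a fully qualified class name: " + className);
        }
        String packageName = className.substring(0, lastDot);
        String simpleName = className.substring(lastDot + 1);
        // Replace the prefix only when it matches on a package boundary
        if (packageName.equals(fromPrefix) || packageName.startsWith(fromPrefix + ".")) {
            packageName = toPrefix + packageName.substring(fromPrefix.length());
        }
        try {
            return new ObjectName(packageName + ":name=" + simpleName);
        }
        catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

Matching on a package boundary keeps an unrelated package such as `io.trino.plugin.exchangefoo` from being rewritten by accident.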

linzebing · 2022-04-25T21:00:54Z

plugin/trino-exchange/src/main/java/io/trino/plugin/exchange/FileSystemExchange.java

+        ImmutableList.Builder<ListenableFuture<Void>> futures = ImmutableList.builder();
        for (Integer taskPartitionId : allSinks) {
-            exchangeStorage.deleteRecursively(getTaskOutputDirectory(taskPartitionId));
+            futures.add(exchangeStorage.deleteRecursively(getTaskOutputDirectory(taskPartitionId)));
        }
+        stats.getCloseExchange().record(Futures.allAsList(futures.build()));


This is not on the critical path. Do we need to record it here too?

It might be useful for tracking (for example, if for some reason it starts to fail).
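A minimal sketch of recording the combined cleanup as one operation, using the JDK's `CompletableFuture.allOf` in place of Guava's `Futures.allAsList` from the diff. `CloseStats`, `record`, and `closeAll` are hypothetical. One behavioral difference: `allAsList` fails as soon as any input fails, while `allOf` completes only after every input has completed. Counting failed outcomes is what would surface deletions that start to fail.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, simplified stand-in for an exchange stats object
class CloseStats
{
    final AtomicLong successes = new AtomicLong();
    final AtomicLong failures = new AtomicLong();

    // Counts the outcome of the future without altering it
    <T> CompletableFuture<T> record(CompletableFuture<T> future)
    {
        future.whenComplete((result, failure) -> {
            if (failure == null) {
                successes.incrementAndGet();
            }
            else {
                failures.incrementAndGet();
            }
        });
        return future;
    }

    // One deletion future per task output directory, recorded as a single operation
    CompletableFuture<Void> closeAll(List<CompletableFuture<Void>> deletions)
    {
        return record(CompletableFuture.allOf(deletions.toArray(new CompletableFuture<?>[0])));
    }
}
```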

linzebing · 2022-04-25T21:03:25Z

plugin/trino-exchange/src/main/java/io/trino/plugin/exchange/ExecutionStats.java

+    @Managed
+    @Nested


For my understanding, can you briefly explain what these two annotations do?

Methods annotated with @Managed are exported via JMX. @Nested tells the framework to recurse into the object returned by the method and export that object's methods annotated with @Managed.
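To make the semantics concrete, here is a toy reflective exporter. The `@Managed` and `@Nested` annotations below are local stand-ins for `org.weakref.jmx.Managed` and `org.weakref.jmx.Nested`; `MiniExporter`, `TimeCounter`, and `ExchangeStats` are hypothetical, and the dot-separated attribute names are an assumption, not necessarily the format the real exporter produces.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

// Local stand-ins for the jmxutils annotations, for illustration only
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Managed {}

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Nested {}

class TimeCounter
{
    private long count;

    void add()
    {
        count++;
    }

    @Managed
    public long getCount()
    {
        return count;
    }
}

class ExchangeStats
{
    private final TimeCounter closeExchange = new TimeCounter();

    @Managed
    @Nested
    public TimeCounter getCloseExchange()
    {
        return closeExchange;
    }
}

class MiniExporter
{
    // Collects attribute name -> value, recursing into @Nested getters
    static Map<String, Object> export(Object bean)
    {
        Map<String, Object> attributes = new TreeMap<>();
        collect(bean, "", attributes);
        return attributes;
    }

    private static void collect(Object bean, String prefix, Map<String, Object> out)
    {
        for (Method method : bean.getClass().getMethods()) {
            if (!method.isAnnotationPresent(Managed.class) || method.getParameterCount() != 0) {
                continue;
            }
            String name = prefix + method.getName().replaceFirst("^get", "");
            try {
                Object value = method.invoke(bean);
                if (method.isAnnotationPresent(Nested.class)) {
                    collect(value, name + ".", out);
                }
                else {
                    out.put(name, value);
                }
            }
            catch (ReflectiveOperationException e) {
                throw new IllegalStateException(e);
            }
        }
    }
}
```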

linzebing · 2022-04-25T21:32:11Z

plugin/trino-exchange/src/main/java/io/trino/plugin/exchange/FileSystemExchangeSource.java

    private final List<ExchangeStorageReader> readers;
+    private volatile CompletableFuture<Void> blocked;


Is this an optimization? Why do we need a class member blocked?

This is to make sure that only a single blocked future exists and is tracked.

I don't quite understand. Why not simply the following:

return stats.getExchangeSourceBlocked().record(
        toCompletableFuture(
                nonCancellationPropagating(
                        whenAnyComplete(readers.stream()
                                .map(ExchangeStorageReader::isBlocked)
                                .collect(toImmutableList())))));

Because isBlocked is called by multiple threads concurrently, a different future may be returned to each thread, skewing the metric. Ideally, we would like our measurement to stay as close as possible to the time it takes for the entire ExchangeSource to transition from the "blocked" state to the "non-blocked" state.
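A minimal sketch of the single-shared-future approach, assuming the per-reader blocked futures are available as a list. `SharedBlockedSource` and `futuresCreated` are hypothetical; the counter only exists to show that concurrent callers receive the same future, which is the one a stats recorder would time. It uses `CompletableFuture.anyOf` in place of `whenAnyComplete` and omits the `nonCancellationPropagating` wrapper from the suggestion above, so a caller that cancels the returned future would cancel it for everyone.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical simplification of an exchange source with several readers
class SharedBlockedSource
{
    private final List<CompletableFuture<Void>> readerBlocked;
    // Shared by all callers until it completes
    private volatile CompletableFuture<Void> blocked;
    private int futuresCreated; // guarded by this

    SharedBlockedSource(List<CompletableFuture<Void>> readerBlocked)
    {
        this.readerBlocked = List.copyOf(readerBlocked);
    }

    CompletableFuture<Void> isBlocked()
    {
        CompletableFuture<Void> current = blocked;
        if (current != null && !current.isDone()) {
            // Fast path: reuse the in-flight future so the blocked period is measured once
            return current;
        }
        synchronized (this) {
            current = blocked;
            if (current == null || current.isDone()) {
                // Completes as soon as any reader unblocks
                current = CompletableFuture.anyOf(readerBlocked.toArray(new CompletableFuture<?>[0]))
                        .thenApply(ignored -> null);
                futuresCreated++;
                blocked = current;
            }
            return current;
        }
    }

    synchronized int futuresCreated()
    {
        return futuresCreated;
    }
}
```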

losipiuk · 2022-04-26T09:46:30Z

core/trino-main/src/main/java/io/trino/execution/scheduler/FaultTolerantExecutionStats.java

+            case RUNNING:
+            case FLUSHING:
+            default:
+                log.error("Unexpected task state: %s", state);


why not throw?

Exceptions thrown from a listener are not logged, so it's more of a "log it or lose it" situation.
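A small illustration of how a thrown exception can vanish, using plain `CompletableFuture` as an analogy rather than Trino's listener dispatch: an exception thrown inside `thenAccept` only completes the derived future exceptionally, and `complete()` on the source returns normally. `ListenerFailures`, its `State` enum, and the `errors` list (a stand-in for `log.error`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical demo, not Trino's state machine
class ListenerFailures
{
    // Stand-in for log.error
    static final List<String> errors = new ArrayList<>();

    enum State { RUNNING, FINISHED }

    // Logging variant: the unexpected state is recorded somewhere visible
    static void onStateChange(State state)
    {
        switch (state) {
            case FINISHED:
                // record stats here
                break;
            default:
                errors.add("Unexpected task state: " + state);
        }
    }

    // Throwing variant: the exception ends up in the returned future,
    // and complete() on the source returns normally, so nobody sees it
    static CompletableFuture<Void> throwingListenerDemo()
    {
        CompletableFuture<State> state = new CompletableFuture<>();
        CompletableFuture<Void> listener = state.thenAccept(s -> {
            throw new IllegalStateException("Unexpected task state: " + s);
        });
        state.complete(State.RUNNING);
        return listener;
    }
}
```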

losipiuk · 2022-04-26T09:49:38Z

core/trino-main/src/main/java/io/trino/execution/scheduler/FaultTolerantExecutionStats.java

+            case CANCELED:
+            case ABORTED:
+                // ignore cancelled and aborted tasks
+                break;


I know that, currently, dependent tasks don't start executing until the upstream tasks complete. That will change, though. In that case, would we mark the downstream task as "FAILED" or "ABORTED"?
If the latter, then we should also compute stats for "ABORTED" tasks. It would be important to understand how much effort we are wasting on those.
Maybe we can just merge FAILED and ABORTED?

I'm going to add `private final ExecutionStats abortedTasks = new ExecutionStats();` and store the stats for both ABORTED and CANCELED tasks there. I'm not sure it makes sense to track them separately.
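A minimal sketch of routing terminal states into buckets, with CANCELED and ABORTED sharing one bucket as proposed. The `TaskState` enum is a local stand-in, and the `SUCCEEDED`/`FAILED`/`ABORTED` buckets with integer counters are hypothetical simplifications of the `ExecutionStats` objects.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical sketch; counters stand in for ExecutionStats buckets
class TerminalStateRouter
{
    enum TaskState { RUNNING, FLUSHING, FINISHED, FAILED, CANCELED, ABORTED }

    enum Bucket { SUCCEEDED, FAILED, ABORTED }

    private final Map<Bucket, Integer> counts = new EnumMap<>(Bucket.class);

    void onTerminalState(TaskState state)
    {
        Bucket bucket;
        switch (state) {
            case FINISHED:
                bucket = Bucket.SUCCEEDED;
                break;
            case FAILED:
                bucket = Bucket.FAILED;
                break;
            case CANCELED:
            case ABORTED:
                // Wasted work from canceled and aborted tasks shares one bucket
                bucket = Bucket.ABORTED;
                break;
            default:
                // Non-terminal states are not counted
                return;
        }
        counts.merge(bucket, 1, Integer::sum);
    }

    int count(Bucket bucket)
    {
        return counts.getOrDefault(bucket, 0);
    }
}
```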

losipiuk · 2022-04-26T09:55:12Z

core/trino-main/src/main/java/io/trino/execution/scheduler/FaultTolerantExecutionStats.java

+        private final TimeStat elapsedTime = new TimeStat(MILLISECONDS);
+        private final TimeStat scheduledTime = new TimeStat(MILLISECONDS);
+        private final TimeStat cpuTime = new TimeStat(MILLISECONDS);
+        private final TimeStat inputBlockedTime = new TimeStat(MILLISECONDS);
+        private final TimeStat outputBlockedTime = new TimeStat(MILLISECONDS);


Should we have memory and network stats here too?
It doesn't seem costly to add them, and then we can decide which ones are most important to track.

I'm not sure there's a reliable metric for network, as network traffic can occur at many different levels (connector / exchanges / coordinator-to-worker communication). It certainly makes sense to record peak memory utilization, though. Let me add it.
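A minimal sketch of what a per-task peak memory metric could track, assuming one sample per finished task. `PeakMemoryStat` is a hypothetical stand-in, not airlift's `DistributionStat`.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAccumulator;

// Hypothetical stand-in for a distribution stat over per-task peak memory
class PeakMemoryStat
{
    private final AtomicLong count = new AtomicLong();
    private final AtomicLong totalBytes = new AtomicLong();
    // Identity 0 is safe because byte counts are non-negative
    private final LongAccumulator maxBytes = new LongAccumulator(Math::max, 0);

    // Called once per finished task with that task's peak memory
    void add(long peakBytes)
    {
        count.incrementAndGet();
        totalBytes.addAndGet(peakBytes);
        maxBytes.accumulate(peakBytes);
    }

    long getCount()
    {
        return count.get();
    }

    long getMaxBytes()
    {
        return maxBytes.get();
    }

    // Count and total are read separately, so the average is approximate under concurrent updates
    double getAverageBytes()
    {
        long n = count.get();
        return n == 0 ? 0.0 : (double) totalBytes.get() / n;
    }
}
```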


losipiuk · 2022-04-26T10:07:00Z

plugin/trino-exchange/src/main/java/io/trino/plugin/exchange/ExecutionStats.java

+
+    public <T> CompletableFuture<T> record(CompletableFuture<T> future)
+    {
+        long start = System.currentTimeMillis();


nit: Use Ticker/Stopwatch pair instead.
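A minimal sketch of the suggested change. `System.currentTimeMillis()` follows the wall clock, which can jump when the system time is adjusted, while `System.nanoTime()` is monotonic. Guava's `Ticker`/`Stopwatch` wrap the latter and let tests inject a fake clock; here a `LongSupplier` plays the `Ticker` role so the example has no dependencies. `TimedRecorder` is hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Hypothetical sketch: LongSupplier plays the role of Guava's Ticker
class TimedRecorder
{
    private final LongSupplier ticker; // nanoseconds from a monotonic source
    private final AtomicLong totalMillis = new AtomicLong();
    private final AtomicLong count = new AtomicLong();

    TimedRecorder()
    {
        this(System::nanoTime);
    }

    TimedRecorder(LongSupplier ticker)
    {
        this.ticker = ticker;
    }

    <T> CompletableFuture<T> record(CompletableFuture<T> future)
    {
        long start = ticker.getAsLong();
        future.whenComplete((result, failure) -> {
            // Elapsed time is truncated to whole milliseconds per call
            long elapsedNanos = ticker.getAsLong() - start;
            totalMillis.addAndGet(TimeUnit.NANOSECONDS.toMillis(elapsedNanos));
            count.incrementAndGet();
        });
        return future;
    }

    long getTotalMillis()
    {
        return totalMillis.get();
    }

    long getCount()
    {
        return count.get();
    }
}
```

Injecting the ticker makes the measurement deterministic in tests: advance a fake clock, complete the future, and check the recorded duration.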

losipiuk

LGTM


arhimondr requested review from losipiuk and linzebing April 25, 2022 17:58

cla-bot bot added the cla-signed label Apr 25, 2022

github-actions bot added the tests:hive label Apr 25, 2022

arhimondr force-pushed the tardigrade-jmx-counters branch from 33de1e0 to 06986a0 April 25, 2022 20:43

linzebing reviewed Apr 25, 2022


losipiuk reviewed Apr 26, 2022


core/trino-main/src/main/java/io/trino/execution/scheduler/FaultTolerantExecutionStats.java (outdated, resolved)

losipiuk reviewed Apr 26, 2022


losipiuk approved these changes Apr 26, 2022


arhimondr added 4 commits April 26, 2022 13:28

Expose fault tolerant execution statistics via JMX

44e6fd1

Expose filesystem exchange execution statistics via JMX

220ee87

Change package name for filesystem exchange module

6cee5b9

Move from `io.trino.plugin.exchange` to `io.trino.plugin.exchange.filesystem`

Rename filesystem exchange module

6f8d936

From trino-exchange to trino-exchange-filesystem

arhimondr force-pushed the tardigrade-jmx-counters branch from 06986a0 to 6f8d936 April 26, 2022 17:45

linzebing approved these changes Apr 27, 2022


losipiuk approved these changes Apr 27, 2022


arhimondr merged commit aa7bf5d into trinodb:master Apr 27, 2022

arhimondr deleted the tardigrade-jmx-counters branch April 27, 2022 20:46

github-actions bot added this to the 379 milestone Apr 27, 2022

mosabua mentioned this pull request Apr 27, 2022

Add Trino 379 release notes #12106

Merged
