Configure the router policy via router provider module #98

vishalya · 2023-11-24T01:24:57Z

This PR provides a way to configure the queue-length-based router.
This router and its logic already existed; now it can be configured via the config file.
Please check the Router Policy section of configuration.md to understand how to configure the current default router vs. the queue-length-based router.
HaGatewayProviderModule needed to be refactored, and the DB dependencies were moved into another base module; because of that, the router module needs to be configured explicitly.
Hopefully this PR fixes Feature Request: Routing based on number of queued and running queries #77 and Load balancing scheme in Trino gateway? #84.

willmostly

Looks good overall. My only real concern is that as config gets more complex, we should make sure we're giving understandable error messages to users and avoiding returning hundreds of lines of Guice injection errors.

docs/configuration.md

willmostly · 2023-12-20T17:50:45Z

docs/configuration.md

+There are 2 routers you can choose from
+### Que length based
+- This distributed the queries based on the number of queries running


suggestion: Queries are routed to the backend with the fewest queued queries. If multiple backends have zero queued queries, the backend with the fewest running queries is chosen.

Yes, it should route it to the cluster with the fewest running queries.
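The suggested wording describes a two-key ordering. A minimal sketch of that rule, assuming a hypothetical `BackendStats` record rather than the gateway's real `ClusterStats` class: compare by queued queries first and break ties by running queries. Note that this also breaks ties among backends with equal non-zero queued counts, which is a slight generalization of the suggestion.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical snapshot of one backend's load; not the gateway's ClusterStats class.
record BackendStats(String name, int queuedQueries, int runningQueries) {}

final class FewestQueuedSelector
{
    // Order by queued queries first, then break ties by running queries.
    static final Comparator<BackendStats> LOAD_ORDER =
            Comparator.comparingInt(BackendStats::queuedQueries)
                    .thenComparingInt(BackendStats::runningQueries);

    private FewestQueuedSelector() {}

    static Optional<BackendStats> choose(List<BackendStats> backends)
    {
        // min() returns Optional.empty() when there are no candidate backends.
        return backends.stream().min(LOAD_ORDER);
    }
}
```

With backends `a` (0 queued, 30 running), `b` (0 queued, 10 running) and `c` (2 queued, 0 running), this picks `b`.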

willmostly · 2023-12-20T18:06:50Z

gateway-ha/gateway-ha-config.yml

+  - io.trino.gateway.ha.module.QueueLengthListenerModule
+  - io.trino.gateway.ha.module.QueueLengthRouterProvider


The QueueLengthRouterProvider requires that the QueueLengthListenerModule is configured (right?). If it isn't, what kind of error does the end user see?

If it's not easy to understand, we should add some checking in BaseApp to ensure that dependent modules are loaded. AppConfiguration.getModules() should give us the list of loaded modules.

In my latest change, I am providing the BasicRouter by default and config can override it. Is that ok?
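One way to get a readable error instead of a long Guice injection trace is a startup check over the configured module names. This is a sketch only: `ModuleDependencyCheck` and its dependency map are hypothetical, and only the two module names are taken from the config above.

```java
import java.util.List;
import java.util.Map;

// Hypothetical startup check: fail fast with one readable message instead of
// a long Guice injection error when a module's dependency is missing.
final class ModuleDependencyCheck
{
    // Maps a module to the module it needs; names taken from the PR's config example.
    private static final Map<String, String> REQUIRES = Map.of(
            "io.trino.gateway.ha.module.QueueLengthRouterProvider",
            "io.trino.gateway.ha.module.QueueLengthListenerModule");

    private ModuleDependencyCheck() {}

    static void validate(List<String> configuredModules)
    {
        for (Map.Entry<String, String> entry : REQUIRES.entrySet()) {
            String module = entry.getKey();
            String dependency = entry.getValue();
            if (configuredModules.contains(module) && !configuredModules.contains(dependency)) {
                throw new IllegalStateException(
                        module + " requires " + dependency + " to be listed under 'modules' in the config");
            }
        }
    }
}
```

The list passed in would come from something like AppConfiguration.getModules(), as suggested above.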

mosabua · 2024-01-05T17:44:27Z

This needs a rebase before we can look at it again.

vishalya · 2024-01-09T15:54:57Z

Yes, let me rebase, and then I can also add a new router that is much simpler. We are currently testing it internally.

vishalya · 2024-02-01T16:36:35Z

I have added a new router, QueryCountBasedRouter, which is provided by QueryCountBasedRouterProvider. The cluster stats are supplied to the routers and hence can be used by any new router in the future. The new router can be enabled by adding the provider to the config. The full modules section in the config would look like this:

modules:
  - io.trino.gateway.ha.module.HaGatewayProviderModule
  - io.trino.gateway.ha.module.ClusterStateListenerModule
  - io.trino.gateway.ha.module.ClusterStatsMonitorModule
  - io.trino.gateway.ha.module.QueryCountBasedRouterProvider

To use the existing router, replace the last line with BasicRouterProvider. We can also make the existing router the default if none is provided explicitly in the config.

modules:
  - io.trino.gateway.ha.module.HaGatewayProviderModule
  - io.trino.gateway.ha.module.ClusterStateListenerModule
  - io.trino.gateway.ha.module.ClusterStatsMonitorModule
  - io.trino.gateway.ha.module.BasicRouterProvider

TrinoQueueLengthRoutingTable didn't seem to work that well, so I am dropping it, and eventually we can get rid of that class altogether.
The background task that collects stats can now be configured in seconds instead of minutes, via the monitor section of the config. The parameter is optional: the previous default of 1 minute is preserved as a 60-second default. The old parameter taskDelayMin is removed.

monitor:
  connectionTimeout: 15
  taskDelaySeconds: 10
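A sketch of how a seconds-based delay with a 60-second default could drive the background task. `MonitorSettings` and `StatsRefresher` are hypothetical names; only `taskDelaySeconds` and its default come from the description above.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical holder for the monitor section; only taskDelaySeconds comes from the PR.
final class MonitorSettings
{
    static final int DEFAULT_TASK_DELAY_SECONDS = 60; // same as the old 1-minute default

    private int taskDelaySeconds = DEFAULT_TASK_DELAY_SECONDS;

    int getTaskDelaySeconds()
    {
        return taskDelaySeconds;
    }

    void setTaskDelaySeconds(int taskDelaySeconds)
    {
        if (taskDelaySeconds <= 0) {
            throw new IllegalArgumentException("taskDelaySeconds must be positive");
        }
        this.taskDelaySeconds = taskDelaySeconds;
    }
}

final class StatsRefresher
{
    private StatsRefresher() {}

    // Runs the stats collection task repeatedly, with the delay measured in seconds.
    static ScheduledFuture<?> start(ScheduledExecutorService executor, MonitorSettings settings, Runnable collectStats)
    {
        return executor.scheduleWithFixedDelay(
                collectStats, 0, settings.getTaskDelaySeconds(), TimeUnit.SECONDS);
    }
}
```

Using `scheduleWithFixedDelay` rather than a fixed rate means a slow stats collection run never overlaps with the next one.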

The logic for the QueryCountBasedRouter is in the comments.

willmostly

A few things to address, mainly around the locking and potential performance impacts

willmostly · 2024-02-07T21:28:04Z

gateway-ha/src/main/java/io/trino/gateway/ha/clustermonitor/ClusterStats.java

+        // The live stats refresh every few seconds, so we update the stats immediately
+        // so that they can be used for next queries to route
+        // We assume that if a user has queued queries then newly arriving queries
+        // for that user would also be queued


We assume that if a user has queued queries then newly arriving queries

This is a strong assumption given trino resource group options such as queryType and queryText.

My main concern however is putting a synchronized method in the getBackend critical path. This could cause performance issues at high concurrency.

I hear you on the assumption, but we can only assume in the absence of live stats. The cluster stats interval can now be set with a granularity of seconds (default is 60 seconds), and we are doing OK with a 10-second interval. The assumption only applies during those 10 seconds; when fresh stats arrive, we start afresh with them.

There is a tremendous overall performance gain: the BasicRouter (the one on the main branch) checks the DB on every request and is literally 50 to 100 times slower. I have verified that.

You mean this is faster compared to the TrinoQueueLengthRoutingTable, I assume?

Even compared to the BasicRouter it's very fast, since BasicRouter keeps checking the DB for the active/inactive state.

This synchronized usage is incorrect.
Reads and writes on userQueuedCount and runningQueryCount must be synchronized.

Changing from a record class to a normal class requires a lot of work reimplementing getters/setters/equals. I think we can continue to use a record class by using AtomicInteger and ConcurrentHashMap.
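A sketch of the record-plus-atomics idea applied to the "assume new queries queue if the user already has queued queries" heuristic. `LiveClusterStats` and `recordRoutedQuery` are hypothetical names. Each counter update is atomic, but choosing a cluster and then updating it are still two separate steps, so this alone does not make the whole routing decision atomic.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical record: the fields are final references, but the counters they
// point to are thread-safe and can be updated between stats refreshes.
record LiveClusterStats(
        String clusterId,
        AtomicInteger runningQueryCount,
        Map<String, AtomicInteger> userQueuedCount)
{
    static LiveClusterStats of(String clusterId, int running, Map<String, Integer> queuedByUser)
    {
        Map<String, AtomicInteger> queued = new ConcurrentHashMap<>();
        queuedByUser.forEach((user, count) -> queued.put(user, new AtomicInteger(count)));
        return new LiveClusterStats(clusterId, new AtomicInteger(running), queued);
    }

    int queuedFor(String user)
    {
        AtomicInteger count = userQueuedCount.get(user);
        return count == null ? 0 : count.get();
    }

    // Heuristic from the PR: if the user already has queued queries on this
    // cluster, assume the new query queues too; otherwise count it as running.
    void recordRoutedQuery(String user)
    {
        AtomicInteger queued = userQueuedCount.get(user);
        if (queued != null && queued.get() > 0) {
            queued.incrementAndGet();
        }
        else {
            runningQueryCount.incrementAndGet();
        }
    }
}
```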

willmostly · 2024-02-07T21:53:24Z

gateway-ha/src/main/java/io/trino/gateway/ha/module/BasicRouterProvider.java

+import io.trino.gateway.ha.router.HaRoutingManager;
+import io.trino.gateway.ha.router.RoutingManager;
+
+public class BasicRouterProvider


The class name should reflect the router it provides, so HaRouterProvider. Tbh the HaRoutingManager could have a more meaningful name such as StochasticRoutingManager, this would be a good time to update it.

So StochasticRoutingManager and StochasticRoutingManagerProvider? I am OK with that.

willmostly · 2024-02-07T22:11:09Z

gateway-ha/src/main/java/io/trino/gateway/ha/router/QueryCountBasedRouter.java

+        Lock readLock = lock.readLock();
+        try {
+            readLock.lock();
+            filteredList = clusterStats.stream()


This would be a good use case for a Map<Array<ClusterStats>> or potentially com.google.common.collect.Table if you need to look up on various attributes vs filtering a list

Yes, if it becomes more complicated. Maybe in the next iteration, when we get more stats for cost-based routing, we can use it. At this time the filter and sort are simple and clean, IMO.
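For comparison, a sketch of the map-based lookup the reviewer suggests, keyed by routing group and rebuilt once per stats refresh rather than filtering on every request. `GroupStats` and `StatsIndex` are hypothetical; the reviewer's `Map<Array<ClusterStats>>` is read here as a map from routing group to a list of stats, which is an interpretation.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical per-cluster snapshot; only the fields needed for the lookup.
record GroupStats(String clusterId, String routingGroup, int queuedQueryCount) {}

final class StatsIndex
{
    private final Map<String, List<GroupStats>> byRoutingGroup;

    // Build the index once per stats refresh instead of filtering on every request.
    StatsIndex(List<GroupStats> stats)
    {
        this.byRoutingGroup = stats.stream()
                .collect(Collectors.groupingBy(GroupStats::routingGroup, Collectors.toUnmodifiableList()));
    }

    List<GroupStats> forGroup(String routingGroup)
    {
        return byRoutingGroup.getOrDefault(routingGroup, List.of());
    }
}
```

With only a handful of clusters, the linear filter is likely just as fast, which supports keeping the simpler version for now.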

gateway-ha/src/main/java/io/trino/gateway/ha/router/QueryCountBasedRouter.java

alaturqua · 2024-02-13T23:08:04Z

Worked today on resolving the issues alongside @vishalya. The solution now operates as intended, distributing queries evenly across a routing group based on the number of queued queries. A merge would greatly enhance our current setup and would be a great feature for the community.

willmostly

Getting close, just spotted a few more code improvements to make.

Can you additionally remove TrinoQueueLengthRoutingTable as part of this PR? It will be orphaned once this merged.

willmostly · 2024-02-16T19:09:58Z

gateway-ha/src/main/java/io/trino/gateway/baseapp/BaseApp.java

+
+    private void validateModules(List<AppModule> modules, T configuration, Environment environment)
+    {
+        Optional routerProvider = modules.stream()


Add a <TypeParameter> to this declaration

willmostly · 2024-02-16T19:19:33Z

gateway-ha/src/main/java/io/trino/gateway/ha/clustermonitor/ClusterStatsMonitor.java

-    ClusterStats monitor(ProxyBackendConfiguration backend);
+    abstract ClusterStats monitor(ProxyBackendConfiguration backend);
+
+    protected void populateClusterStats(ClusterStats.Builder stats, ProxyBackendConfiguration backend)


This should return a ClusterStats.Builder instead of modifying one passed as an argument. It looks like the builder passed is initialized with ClusterStats.builder(backend.getName()), so just change this to

ClusterStats.Builder getClusterStatsBuilder(ProxyBackendConfiguration backend)
{
    ClusterStats.Builder builder = ClusterStats.builder(backend.getName());
    // ...
    return builder;
}

willmostly · 2024-02-16T19:20:42Z

gateway-ha/src/main/java/io/trino/gateway/ha/module/BasicRouterProvider.java

+import io.trino.gateway.ha.router.HaRoutingManager;
+import io.trino.gateway.ha.router.RoutingManager;
+
+public class BasicRouterProvider


willmostly · 2024-02-16T20:00:17Z

gateway-ha/src/main/java/io/trino/gateway/ha/router/QueryCountBasedRouter.java

+            return Optional.empty();
+        }
+
+        Collections.sort(filteredList, new Comparator<ClusterStats>()


Replace the sort and get(0) with Collections.max().

It would be Collections.min(), but I get the idea.
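The difference between the two approaches, in a generic sketch (`PickLeastLoaded` is a hypothetical helper): sorting a copy and taking element 0 is O(n log n), while `Collections.min` is a single O(n) pass that returns the same element.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

final class PickLeastLoaded
{
    private PickLeastLoaded() {}

    // Before: sort a copy of the list and take element 0, O(n log n).
    static <T> T viaSort(List<T> items, Comparator<? super T> order)
    {
        List<T> copy = new ArrayList<>(items);
        copy.sort(order);
        return copy.get(0);
    }

    // After: a single O(n) pass; throws NoSuchElementException on an empty list.
    static <T> T viaMin(List<T> items, Comparator<? super T> order)
    {
        return Collections.min(items, order);
    }
}
```

Both versions need a non-empty list, so the existing empty-list check before them still applies.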

willmostly · 2024-02-16T20:06:07Z

gateway-ha/src/main/java/io/trino/gateway/ha/router/QueryCountBasedRouter.java

+
+        Collections.sort(filteredList, new Comparator<ClusterStats>()
+        {
+            public int compare(ClusterStats lhs, ClusterStats rhs)


ClusterStats should implement Comparable; then you can implement compareTo using this logic. That way it can be reused in other routers.

That's true! The comparison involves user-specific stats, so it won't be a straightforward comparison, but I can add a compare function that can be reused and that also makes the router code more readable with a simple lambda.

looks good, can you remove the comment block of the previous code?
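Because the ordering depends on the user, a three-argument compare function fits better than `Comparable.compareTo`. A sketch with a hypothetical `UserAwareStats` record; the key order used here (the user's queued queries, then total queued, then running) is an assumption for illustration, not necessarily the order the merged router uses.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical snapshot; the field set and the tie-break order below are assumptions.
record UserAwareStats(String name, int runningQueryCount, int queuedQueryCount, Map<String, Integer> userQueuedCount)
{
    int queuedFor(String user)
    {
        return userQueuedCount.getOrDefault(user, 0);
    }
}

final class UserAwareOrder
{
    private UserAwareOrder() {}

    // Reusable comparison: the user's own queued queries, then total queued, then running.
    static int compareStats(UserAwareStats lhs, UserAwareStats rhs, String user)
    {
        int byUser = Integer.compare(lhs.queuedFor(user), rhs.queuedFor(user));
        if (byUser != 0) {
            return byUser;
        }
        int byQueued = Integer.compare(lhs.queuedQueryCount(), rhs.queuedQueryCount());
        if (byQueued != 0) {
            return byQueued;
        }
        return Integer.compare(lhs.runningQueryCount(), rhs.runningQueryCount());
    }

    static UserAwareStats choose(List<UserAwareStats> candidates, String user)
    {
        // The router body shrinks to one call with a short lambda.
        return Collections.min(candidates, (lhs, rhs) -> compareStats(lhs, rhs, user));
    }
}
```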

willmostly · 2024-02-16T20:11:09Z

gateway-ha/src/main/java/io/trino/gateway/ha/router/QueryCountBasedRouter.java

+    {
+        Optional<ClusterStats> cluster = getClusterToRoute(user, routingGroup);
+        if (cluster.isPresent()) {
+            cluster.orElseThrow().updateLocalStats(user);


Why is orElseThrow used? isPresent() was just checked. If this is needed then an exception supplier should be passed.

That's because the static check doesn't allow it, but I guess there should be a way around it.

you should be able to eliminate the isPresent condition if you use ifPresent for the updateLocalStats call and return a flatMap from cluster
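A sketch of the suggested shape, with a hypothetical `RoutedCluster` standing in for the router's stats object. `ifPresent` runs the side effect only when a cluster was chosen. `map` is enough here because `proxyTo()` returns a plain `String`; `flatMap` would be needed only if that accessor itself returned an `Optional`.

```java
import java.util.Optional;

// Hypothetical stand-in for the router's per-cluster stats.
final class RoutedCluster
{
    private final String proxyTo;
    private int routedQueries;

    RoutedCluster(String proxyTo)
    {
        this.proxyTo = proxyTo;
    }

    String proxyTo()
    {
        return proxyTo;
    }

    int routedQueries()
    {
        return routedQueries;
    }

    void updateLocalStats()
    {
        routedQueries++;
    }
}

final class OptionalRouting
{
    private OptionalRouting() {}

    // No isPresent()/orElseThrow() pair: ifPresent runs the side effect only when a
    // cluster was chosen, and map turns the choice into its URL.
    static Optional<String> backendFor(Optional<RoutedCluster> cluster)
    {
        cluster.ifPresent(RoutedCluster::updateLocalStats);
        return cluster.map(RoutedCluster::proxyTo);
    }
}
```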

willmostly

Just a few loose ends

willmostly · 2024-02-27T17:16:37Z

gateway-ha/src/main/java/io/trino/gateway/ha/router/QueryCountBasedRouter.java

+    {
+        Optional<ClusterStats> cluster = getClusterToRoute(user, routingGroup);
+        if (cluster.isPresent()) {
+            cluster.orElseThrow().updateLocalStats(user);


you should be able to eliminate the isPresent condition if you use ifPresent for the updateLocalStats call and return a flatMap from cluster

willmostly · 2024-02-27T17:21:57Z

gateway-ha/src/main/java/io/trino/gateway/baseapp/BaseApp.java

+                                        .findFirst();
+        if (routerProvider.isEmpty()) {
+            logger.warn("Router provider doesn't exist in the config, using the StochasticRoutingManagerProvider");
+            String clazz = "io.trino.gateway.ha.module.StochasticRoutingManagerProvider";


This hard-coded name could be missed in a refactor; use StochasticRoutingManagerProvider.class.getCanonicalName() instead.
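A sketch of the default-provider fallback with the name derived from the class literal, so an IDE rename updates it too. The empty `StochasticRoutingManagerProvider` class is a local stub, and `RouterProviderDefaults` and the `isRouterProvider` predicate are hypothetical; how BaseApp actually recognizes a router provider is not shown in the snippet above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.logging.Logger;

// Stub standing in for the real provider class; only its name matters here.
class StochasticRoutingManagerProvider {}

final class RouterProviderDefaults
{
    private static final Logger log = Logger.getLogger(RouterProviderDefaults.class.getName());

    private RouterProviderDefaults() {}

    // Returns the configured modules unchanged if a router provider is present;
    // otherwise appends the default provider, named via the class literal.
    static List<String> withDefaultRouter(List<String> modules, Predicate<String> isRouterProvider)
    {
        if (modules.stream().anyMatch(isRouterProvider)) {
            return modules;
        }
        String clazz = StochasticRoutingManagerProvider.class.getCanonicalName();
        log.warning("Router provider doesn't exist in the config, using " + clazz);
        List<String> result = new ArrayList<>(modules);
        result.add(clazz);
        return result;
    }
}
```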

willmostly · 2024-02-27T18:42:33Z

gateway-ha/src/main/java/io/trino/gateway/ha/router/QueryCountBasedRouter.java

+
+        Collections.sort(filteredList, new Comparator<ClusterStats>()
+        {
+            public int compare(ClusterStats lhs, ClusterStats rhs)


looks good, can you remove the comment block of the previous code?

willmostly · 2024-02-27T18:46:55Z

gateway-ha/src/main/java/io/trino/gateway/ha/router/QueryCountBasedRouter.java

+        Optional<String> proxyUrl = getBackendForRoutingGroup(routingGroup, user);
+        if (proxyUrl.isPresent()) {
+            return proxyUrl.orElseThrow();
+        }
+        return provideAdhocBackend(user);


Suggested change

-        Optional<String> proxyUrl = getBackendForRoutingGroup(routingGroup, user);
-        if (proxyUrl.isPresent()) {
-            return proxyUrl.orElseThrow();
-        }
-        return provideAdhocBackend(user);
+        return getBackendForRoutingGroup(routingGroup, user).orElse(provideAdhocBackend(user));
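One subtlety with the one-liner: `orElse` evaluates its argument before the call, so the ad hoc fallback would run even when a routing-group backend was found. If that fallback has side effects (for example, if choosing a backend also updates local stats, as discussed for this router), `orElseGet` with a supplier avoids that. A sketch with a hypothetical `FallbackDemo` class that counts how often the fallback runs:

```java
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;

final class FallbackDemo
{
    // Counts how often the (hypothetical) ad hoc fallback is evaluated.
    static final AtomicInteger adhocCalls = new AtomicInteger();

    private FallbackDemo() {}

    static String provideAdhocBackend(String user)
    {
        adhocCalls.incrementAndGet(); // stands in for a side effect such as updating local stats
        return "http://adhoc:8080";
    }

    // orElse evaluates its argument before the call, even when a value is present.
    static String eager(Optional<String> routed, String user)
    {
        return routed.orElse(provideAdhocBackend(user));
    }

    // orElseGet only invokes the supplier when the Optional is empty.
    static String lazy(Optional<String> routed, String user)
    {
        return routed.orElseGet(() -> provideAdhocBackend(user));
    }
}
```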

ebyhr

Just skimmed.

gateway-ha/src/main/java/io/trino/gateway/baseapp/BaseApp.java

gateway-ha/src/main/java/io/trino/gateway/ha/clustermonitor/ClusterStats.java

gateway-ha/src/test/java/io/trino/gateway/ha/router/TestQueryCountBasedRouter.java

gateway-ha/src/main/java/io/trino/gateway/ha/router/QueryCountBasedRouter.java

gateway-ha/src/main/java/io/trino/gateway/ha/clustermonitor/ClusterStats.java

vishalya · 2024-03-07T17:15:22Z

I have taken care of the review comments, the rebase is coming next.

vishalya · 2024-03-08T13:22:40Z

@ebyhr please take a look when you get a chance; the changes and rebase are done.

oneonestar

Some comments on the implementation of concurrency.

oneonestar · 2024-03-13T02:24:33Z

gateway-ha/src/main/java/io/trino/gateway/ha/clustermonitor/ClusterStats.java

+        // The live stats refresh every few seconds, so we update the stats immediately
+        // so that they can be used for next queries to route
+        // We assume that if a user has queued queries then newly arriving queries
+        // for that user would also be queued


This synchronized usage is incorrect.
Reads and writes on userQueuedCount and runningQueryCount must be synchronized.

Changing from a record class to a normal class requires a lot of work reimplementing getters/setters/equals. I think we can continue to use a record class by using AtomicInteger and ConcurrentHashMap.

oneonestar · 2024-03-13T02:43:28Z

gateway-ha/src/main/java/io/trino/gateway/ha/router/QueryCountBasedRouter.java

+    private ArrayList<ClusterStats> clusterStats;
+    private static final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);


The operations on clusterStats are read all and replace all.
ReentrantReadWriteLock can be replaced by AtomicReference<List<ClusterStats>>.
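A sketch of the read-all/replace-all pattern with `AtomicReference`, assuming a hypothetical generic `StatsSnapshotHolder`. Readers take no lock and always see a complete list, either the old one or the new one.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical holder: stats are only ever read in full or replaced in full.
final class StatsSnapshotHolder<T>
{
    private final AtomicReference<List<T>> snapshot = new AtomicReference<>(List.of());

    // Writer (the stats monitor): publish a new immutable list in one atomic step.
    void replaceAll(List<T> fresh)
    {
        snapshot.set(List.copyOf(fresh));
    }

    // Readers (routing requests): no lock; they see either the old or the new list, never a mix.
    List<T> readAll()
    {
        return snapshot.get();
    }
}
```

This only holds while the elements are never mutated in place; once the router needs to update counters between refreshes, as the following replies explain, a different approach is required.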

Good catch on the read, thanks!
Both suggestions for the synchronization are good.
I might need to modify the flow a bit to use the record; with a record I won't be able to modify the member attributes.

I am going back to a class for ClusterStats because the router needs to update it. That could be achieved by copying the records back and forth, but it wouldn't be performant.

I have moved the reading/sorting and updating of the cluster stats to the router level (inside QueryCountBasedRouter) for 2 reasons:

1. The logic is specific to the router.
2. The logic needs to read the stats, sort them, and then update the counters in the ClusterStats in a single transaction. This avoids race conditions while processing simultaneous requests.
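A sketch of what "a single transaction" means here: choosing the least-loaded cluster and bumping its counter happen under one lock, so two concurrent requests cannot both pick the same cluster from identical, stale counts. `CountingRouter` and its nested `LocalStats` are hypothetical simplifications, and the "assume the new query starts running" update is a stand-in for the router's heuristic.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical router: choosing a cluster and updating its counters share one lock.
final class CountingRouter
{
    static final class LocalStats
    {
        final String proxyTo;
        int queued;
        int running;

        LocalStats(String proxyTo, int queued, int running)
        {
            this.proxyTo = proxyTo;
            this.queued = queued;
            this.running = running;
        }
    }

    private List<LocalStats> clusterStats = new ArrayList<>();

    // Called by the stats monitor with fresh objects: replace the local copy.
    synchronized void updateBackEndStats(List<LocalStats> fresh)
    {
        clusterStats = new ArrayList<>(fresh);
    }

    // Read, pick, and update in a single synchronized block.
    synchronized Optional<String> route()
    {
        Optional<LocalStats> chosen = clusterStats.stream()
                .min(Comparator.<LocalStats>comparingInt(s -> s.queued).thenComparingInt(s -> s.running));
        chosen.ifPresent(s -> s.running++); // assume the new query starts running
        return chosen.map(s -> s.proxyTo);
    }
}
```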

oneonestar · 2024-03-20T11:24:30Z

> I have moved the reading/sorting and updating of the cluster stats to the router level (inside QueryCountBasedRouter) for 2 reasons.
> The logic is specific to the router.
> The logic needs to read the stats, sort them, and then update the counters in the ClusterStats in a single transaction. This avoids race conditions while processing simultaneous requests.

Thanks for the refactoring. The logic is easier to follow now.

If I understand correctly, QueryCountBasedRouter does the following:

* store new ClusterStats received from the observer
* choose a backend for a query based on clusterStats
* update clusterStats with the heuristic load

QueryCountBasedRouter is the only class that needs to update ClusterStats and read the updated ClusterStats.
The purpose of updating ClusterStats is to store the heuristic load of a new routed query.

We can keep ClusterStats as a Record class and store the heuristic load in another variable.
In this way, ClusterStats will stay as a read-only snapshot of a cluster's stats.

public class QueryCountBasedRouter {
    private List<ClusterStats> clusterStats;
    
    private Map<String, Integer> heuristicLoad; // store heuristic load on cluster
    // maybe HashBasedTable to store heuristic load on cluster per user
    
    public String provideBackendForRoutingGroup(...) {
        // make decision using clusterStats and heuristicLoad
        // update heuristicLoad after assigned a query to backend
    }
    public synchronized void updateBackEndStats(...) {
        // update clusterStats and clear heuristicLoad
    } 
}
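A runnable sketch of the proposal above, with the read-only snapshot kept as a record and the estimated extra load held in a separate map that is cleared on every refresh. `Snapshot`, `HeuristicLoadRouter`, and the load formula (queued + running + queries routed since the last refresh) are assumptions for illustration.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Read-only snapshot, as proposed: never mutated after the monitor creates it.
record Snapshot(String clusterId, int runningQueryCount, int queuedQueryCount) {}

// Hypothetical router following the proposal: estimated extra load lives in a
// separate map that is cleared whenever fresh stats arrive.
final class HeuristicLoadRouter
{
    private List<Snapshot> clusterStats = List.of();
    private final Map<String, Integer> heuristicLoad = new HashMap<>();

    private int estimatedLoad(Snapshot s)
    {
        // Assumption: queued + running + queries routed since the last refresh.
        return s.queuedQueryCount() + s.runningQueryCount() + heuristicLoad.getOrDefault(s.clusterId(), 0);
    }

    synchronized Optional<String> provideBackend()
    {
        Optional<Snapshot> chosen = clusterStats.stream().min(Comparator.comparingInt(this::estimatedLoad));
        chosen.ifPresent(s -> heuristicLoad.merge(s.clusterId(), 1, Integer::sum));
        return chosen.map(Snapshot::clusterId);
    }

    synchronized void updateBackEndStats(List<Snapshot> fresh)
    {
        clusterStats = List.copyOf(fresh);
        heuristicLoad.clear(); // the new snapshot already reflects earlier routing decisions
    }

    synchronized int heuristicLoadFor(String clusterId)
    {
        return heuristicLoad.getOrDefault(clusterId, 0);
    }
}
```

Keeping the snapshot immutable means it can be logged or exposed for debugging without copying, while all mutable state stays in one map guarded by the router's lock.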

Update: Discussed the following in the contributor meeting. The current implementation works well in production.
We could update this later if it doesn't work for some edge cases, but for now it's all good.

~~Another concern is how we choose a backend.~~
~~The current decision-tree style of decision could be off the mark by a lot.~~

For example, we have backend-1 and backend-2.

Case 1:
backend-1: 1 queued query for UserA
backend-2: 0 queued queries for UserA
All the following queries from UserA (could be thousands) will route to backend-2 for the next 60 seconds, until the stats are refreshed.
The next time we refresh the stats, backend-2 will likely have UserA's queries queued, so all queries now route to backend-1, and so on.

Case 2:
backend-1: 20 running queries, 1 queued query
backend-2: 100 running queries, 0 queued queries
All the following queries will route to backend-2.

~~We have to assume a lot of the things to estimate the load.~~
* Number of worker / processing power of the backend clusters are roughly identical
* Resource group setting of the backend clusters are roughly identical
* The workload of each query are roughly identical

There are many assumptions that could go wrong. I think some kind of weighted round-robin is a safer choice.
How about something like this?

int clusterLoadForBackend1 = runningQueryCount()
                             + queuedQueryCount() * QUEUED_QUERY_WEIGHT
                             + userQueuedCount(user) *  USER_QUEUED_WEIGHT
                             + heuristicLoad;
...
return Integer.compare(clusterLoadForBackend1, clusterLoadForBackend2);
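A runnable version of the weighted-load idea, assuming a hypothetical `WeightedStats` record. The weight values are placeholders to be tuned, not recommendations.

```java
import java.util.Comparator;
import java.util.Map;

// Hypothetical inputs; the weights are placeholders to be tuned, not recommended values.
record WeightedStats(String clusterId, int runningQueryCount, int queuedQueryCount,
        Map<String, Integer> userQueuedCount, int heuristicLoad)
{
    static final int QUEUED_QUERY_WEIGHT = 5;
    static final int USER_QUEUED_WEIGHT = 10;

    int loadFor(String user)
    {
        return runningQueryCount
                + queuedQueryCount * QUEUED_QUERY_WEIGHT
                + userQueuedCount.getOrDefault(user, 0) * USER_QUEUED_WEIGHT
                + heuristicLoad;
    }

    static Comparator<WeightedStats> byLoadFor(String user)
    {
        return (lhs, rhs) -> Integer.compare(lhs.loadFor(user), rhs.loadFor(user));
    }
}
```

With these placeholder weights, Case 2 scores backend-1 at 20 + 1 × 5 = 25 and backend-2 at 100, so new queries go to backend-1 instead of all piling onto backend-2.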

vishalya · 2024-03-28T19:49:11Z

I have pushed the changes to copy the stats locally and the router now works with them as discussed.

oneonestar

Just some minor comments. Others LGTM.

I think coding style can be improved over time.
Hope we can get this merged soon to avoid conflicts on migration to JDBI & Airlift.

oneonestar · 2024-03-29T11:50:32Z

gateway-ha/src/main/java/io/trino/gateway/ha/router/QueryCountBasedRouter.java

+        return clusterStats;
+    }
+
+    class LocalStats


oneonestar · 2024-03-29T12:14:39Z

gateway-ha/src/main/java/io/trino/gateway/ha/router/QueryCountBasedRouter.java

+    List<LocalStats> clusterStats()
+    {
+        return clusterStats;
+    }


Let's prevent anyone from misusing this function in the future.

Suggested change

-    List<LocalStats> clusterStats()
-    {
-        return clusterStats;
-    }
+    @VisibleForTesting
+    synchronized List<LocalStats> clusterStats()
+    {
+        return new ArrayList<>(clusterStats);
+    }
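Why the copy matters, in a minimal sketch with a hypothetical `StatsOwner` class: returning the field itself would let a caller mutate the router's list without holding its lock, while a copy made under `synchronized` is isolated from later changes.

```java
import java.util.ArrayList;
import java.util.List;

final class StatsOwner
{
    private final List<String> clusterStats = new ArrayList<>();

    synchronized void add(String stats)
    {
        clusterStats.add(stats);
    }

    // Returning the field itself would let callers mutate router state without the lock.
    synchronized List<String> clusterStats()
    {
        return new ArrayList<>(clusterStats);
    }
}
```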

oneonestar · 2024-03-29T12:17:36Z

gateway-ha/src/main/java/io/trino/gateway/ha/router/QueryCountBasedRouter.java

+        return Optional.of(Collections.min(filteredList, (lhs, rhs) -> compareStats(lhs, rhs, user)));
+    }
+
+    private void updateLocalStats(LocalStats stats, String user)


Add synchronized.

mosabua

Looks good. As agreed with @willmostly @Chaho12 and others in the dev sync, we are merging this now and will address any code style and other minor issues in follow up PRs.

cla-bot bot added the cla-signed label Nov 24, 2023

willmostly approved these changes Dec 20, 2023

vishalya force-pushed the router_config branch from 9c5bea3 to 41fa781 Compare February 1, 2024 16:07

vishalya changed the title ~~configure the router policy via router provider module~~ Configure the router policy via router provider module Feb 2, 2024

willmostly requested changes Feb 7, 2024

alaturqua mentioned this pull request Feb 13, 2024

Feature Request: Routing based on number of queued and running queries #77

Closed

vishalya force-pushed the router_config branch 3 times, most recently from e3501bf to 819af8f Compare February 16, 2024 18:44

willmostly reviewed Feb 16, 2024

vishalya force-pushed the router_config branch from cab9b71 to c39d411 Compare February 27, 2024 17:07

willmostly requested changes Feb 27, 2024

vishalya force-pushed the router_config branch from c39d411 to 067f2d8 Compare February 27, 2024 21:26

ebyhr reviewed Feb 27, 2024

vishalya force-pushed the router_config branch from 067f2d8 to 288cdce Compare February 29, 2024 16:15

vishalya force-pushed the router_config branch 2 times, most recently from 51997af to 9fbc3e7 Compare March 7, 2024 17:13

vishalya force-pushed the router_config branch from 9fbc3e7 to 6d7da79 Compare March 7, 2024 21:51

vishalya force-pushed the router_config branch from 6d7da79 to 6ebd81c Compare March 12, 2024 12:47

oneonestar requested changes Mar 13, 2024

vishalya force-pushed the router_config branch from 6ebd81c to 3590138 Compare March 19, 2024 20:35

Configurable routers, add query count based router

c7d5c38

vishalya force-pushed the router_config branch from 3590138 to c7d5c38 Compare March 28, 2024 19:09

oneonestar reviewed Mar 29, 2024

mosabua approved these changes Apr 3, 2024

mosabua merged commit aa2cbed into trinodb:main Apr 3, 2024
2 checks passed

github-actions bot added this to the 8 milestone Apr 3, 2024

mosabua mentioned this pull request Apr 3, 2024

Add release notes for Trino Gateway 8 #298

Merged
