Correct service time parameter in ARS formula #70283
Conversation
As mentioned in the issue, for moderate queue sizes the existing behavior roughly ranks nodes based on response time, so in many cases we may not see a real difference with this change. However, based on my understanding of the formula, there are possible improvements.
Note that I have not run large-scale end-to-end tests to assess the impact. I was hoping for feedback here: what level of end-to-end testing do we think is appropriate for this change? The ARS algorithm has a few moving parts (for example, stats adjustments), so it can be hard to fully understand the behavior with only unit tests.
Pinging @elastic/es-search (Team:Search)
Force-pushed from c1ec0ee to 70c00bc
ClusterService clusterService = ClusterServiceUtils.createClusterService(threadPool);
ResponseCollectorService collector = new ResponseCollectorService(clusterService);
Map<String, Long> outstandingRequests = new HashMap<>();
When a 'winner' node is selected, we update the local 'outstanding requests' map to increment its value by 1. This only affects the local copy; it doesn't update global statistics. While this has a purpose in the ARS logic, it doesn't make sense in the context of this test, so I removed the shared outstandingRequests map here and added another test that explicitly checks this behavior.
Great catch @jtibshirani! The formula looks much better now ;).
ExponentiallyWeightedMovingAverage avgServiceTime = new ExponentiallyWeightedMovingAverage(
    ResponseCollectorService.ALPHA, stats.serviceTime);
avgServiceTime.addValue((minStats.serviceTime + stats.serviceTime) / 2);
final long updatedService = (long) avgServiceTime.getAverage();
As discussed offline, the "adjustment" seems odd. We should think about updating the node statistics explicitly. That's out of scope for this PR, but it would be a good follow-up.
@jtibshirani I'm curious: in NodeStatistics we have

final ExponentiallyWeightedMovingAverage queueSize;
final ExponentiallyWeightedMovingAverage responseTime;
double serviceTime;

Why doesn't serviceTime also use the type ExponentiallyWeightedMovingAverage, rather than being adjusted here? It seems the effect would be the same.
Thanks @jimczi for the review! I also tagged @henningandersen to get input from the distributed area.
Thanks for looking into this. I have a question; probably I just need some clarification.

Also, I wonder if we could add an integration test to validate that the overall mechanism works as we expect. For instance, after doing two searches in a row, we would want to see them hit two different nodes (I think).
// The final formula
double rank = rS - (1.0 / muBarS) + (Math.pow(qHatS, queueAdjustmentFactor) / muBarS);
return rank;
return rS - muBarSInverse + Math.pow(qHatS, queueAdjustmentFactor) * muBarSInverse;
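To make the before/after concrete, here is a rough standalone sketch (not the Elasticsearch source; the inputs are made up) showing that with the corrected formula, a node with a higher service time receives a worse (higher) rank:

```java
// Hedged sketch: the corrected C3 rank with hypothetical inputs.
// In C3, muBarS is the service *rate*, so 1/muBarS is the measured service
// time. The old code divided by service time (treating it as muBarS);
// the fix multiplies by it.
public class RankSketch {
    static double rank(double rS, double serviceTime,
                       double qHatS, double queueAdjustmentFactor) {
        double muBarSInverse = serviceTime; // service time plays the role of 1/muBarS
        return rS - muBarSInverse
            + Math.pow(qHatS, queueAdjustmentFactor) * muBarSInverse;
    }

    public static void main(String[] args) {
        // Same response time and queue estimate, different service times:
        // the slower node gets the higher (worse) rank.
        double fast = rank(100.0, 10.0, 2.0, 3.0); // 100 - 10 + 8*10 = 170.0
        double slow = rank(100.0, 50.0, 2.0, 3.0); // 100 - 50 + 8*50 = 450.0
        System.out.println(fast < slow); // true
    }
}
```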
This change looks good. But I wonder if the clientNum above represents the right number? As far as I can see, it is the number of nodes that this node has gotten a search response from. Compared to the paper, this sounds more like "servers" than "clients". For instance, in setups with a dedicated coordinating tier, this could be somewhat dynamic and the number of clients could differ between the nodes in the tier.

I wonder if you know the reasoning behind how clientNum is derived?
I also found this surprising and don't understand the reasoning. I am guessing it is meant as a loose approximation to the number of clients, since by default every node can serve as a coordinating node. Since we were dividing by service time before, this clientNum approximation didn't have a big impact.

I can see a couple of options. We could avoid making changes to clientNum in this PR to keep it well-scoped, while recognizing that it may give too much weight to the queue size factor. Or we could always set clientNum to 1 for now, which can underestimate the queue size but is simpler and makes the calculation more predictable.

In any case, I will make sure we track this through an issue or other means.
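For context, the queue estimate in the C3 paper is q_hat = 1 + outstanding * n + q, where n is the number of clients. A rough sketch (illustrative names; not the Elasticsearch implementation) of how the choice of clientNum scales the estimate:

```java
// Hedged sketch of the C3 queue estimate: q_hat = 1 + outstanding * n + q.
// Names and values are illustrative, not taken from the Elasticsearch source.
public class QueueEstimateSketch {
    static double qHat(long outstandingRequests, int clientNum, double queueEwma) {
        return 1.0 + outstandingRequests * clientNum + queueEwma;
    }

    public static void main(String[] args) {
        // With clientNum approximated by the number of responding nodes (say 10),
        // each outstanding request is scaled up sharply; with clientNum = 1 it
        // is counted once, which can underestimate the true queue.
        System.out.println(qHat(2, 10, 5.0)); // 26.0
        System.out.println(qHat(2, 1, 5.0));  // 8.0
    }
}
```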
My worry is primarily that if we split changes here, we end up with some customers seeing multiple changes to the behavior over releases.

On the other hand, my intuition on this matter is not really strong enough that I think it is a show-stopper to merge this first and then deal with the other part later. It seems likely enough that clientNum will stabilize around a similar number over time for "client" nodes in the cluster. It might be too high (or low) though, and building up the numClients could take time after restarts.

The dedicated role for coordinating nodes might come in handy here.
This change identified a few aspects of ARS that could use improvement (including the 'stats adjustment' @jimczi mentioned above). So there will likely be more changes even apart from clientNum. To me it seems okay to introduce fixes/improvements incrementally instead of assembling a single large update to the algorithm.
LGTM.
When computing a node’s ARS rank, we use the number of outstanding search requests to the node. If there are no connections to the node, we consider there to be 1 outstanding request. This isn’t accurate; the default should be 0, to indicate no outstanding requests. In fact, the ARS rank we return in node stats already uses 0 instead of 1. This small fix lets us remove a test workaround, and ensures the ARS ranks we return in node stats match the ranks we use to select shards during search. Follow-up to #70283.
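A minimal sketch of the default-value change (hypothetical names; not the actual Elasticsearch code):

```java
import java.util.HashMap;
import java.util.Map;

// Hedged sketch: a node missing from the outstanding-requests map should
// count as 0 outstanding requests, not 1.
public class OutstandingSketch {
    static long outstandingFor(Map<String, Long> outstanding, String nodeId) {
        // Before the fix, the missing-entry default was effectively 1.
        return outstanding.getOrDefault(nodeId, 0L);
    }

    public static void main(String[] args) {
        Map<String, Long> outstanding = new HashMap<>();
        outstanding.put("node-a", 3L);
        System.out.println(outstandingFor(outstanding, "node-a")); // 3
        System.out.println(outstandingFor(outstanding, "node-b")); // 0
    }
}
```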
The adaptive replica selection algorithm implements the C3 algorithm for ranking nodes. The formula defines service time as the quantity 1/muBarS. Our implementation accidentally plugs in service time for muBarS instead of 1/muBarS. This commit corrects the formula and adds invariant tests to confirm it behaves as expected.
This change also fixes a bug in how we adjust node statistics. To ensure that nodes with high ranks occasionally get selected, every time we select a 'winner' node, we average the node's stats with the best node's stats and add the result to the moving average. For service time, we were accidentally overwriting the whole moving average with the new value, which caused the ranks to adjust too quickly. This issue has a much bigger impact now that the formula correctly incorporates service time, so it is important to fix to keep the behavior reasonable.
Fixes #65838.
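The service-time adjustment bug can be illustrated with a minimal moving-average sketch (assumed alpha and values; this is not the Elasticsearch ExponentiallyWeightedMovingAverage class):

```java
// Hedged sketch: folding an adjusted value into an existing moving average
// moves it gradually, while rebuilding the average from the adjusted value
// (the bug) discards all history and jumps immediately.
public class EwmaSketch {
    private final double alpha;
    private double average;

    EwmaSketch(double alpha, double initial) {
        this.alpha = alpha;
        this.average = initial;
    }

    void addValue(double v) {
        average = alpha * v + (1 - alpha) * average;
    }

    double getAverage() {
        return average;
    }

    public static void main(String[] args) {
        double winnerServiceTime = 100.0;
        double bestServiceTime = 10.0;
        double adjustment = (bestServiceTime + winnerServiceTime) / 2; // 55.0

        // Buggy behavior: the average is replaced outright by the adjustment.
        EwmaSketch overwritten = new EwmaSketch(0.3, adjustment);

        // Fixed behavior: the adjustment is folded into the existing average.
        EwmaSketch adjusted = new EwmaSketch(0.3, winnerServiceTime);
        adjusted.addValue(adjustment);

        System.out.println(overwritten.getAverage()); // jumps straight to the adjustment
        System.out.println(adjusted.getAverage());    // moves only partway toward it
    }
}
```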