-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Save some RoutingNodes Instantiations #79941
Save some RoutingNodes Instantiations #79941
Conversation
This commit introduces the ability to create a mutable copy of `RoutingNodes` instead of having to re-compute the data structures from scratch which is much faster and allows for some reuse of immutable parts of a routing nodes instance. Also, this sets up further improvements that avoid some more redundant routing node instantiations in `RoutingAllocation`.
Pinging @elastic/es-distributed (Team:Distributed) |
@elasticmachine update branch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a few smaller comments. My main question is about the benefits of this and whether we are actually saving any significant amounts of instantiation by doing this. Do you have some data on that?
One option I was wondering about is whether the normal RoutingNodes
constructor can simply be tuned enough to have equivalent performance, but it might not be viable. Did you examine that direction?
return true; | ||
} | ||
final RoutingNodes expected = new RoutingNodes(routingTable, nodes); | ||
assert routingNodes.equals(expected) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
equals
does not look like it compares all fields, for instance on ShardRouting
. Also, it would be even nicer to verify that all the ShardRouting instances are the same?
But then again, that really does not change with this PR, so feel free to disregard this.
assert invariant(); | ||
} | ||
|
||
public RoutingNode copy() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be package private instead?
Also, perhaps rename to mutableCopy
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, though these instances are all mutable to begin with, so maybe leave the name? :)
this.ignoredPrimaries = ignoredPrimaries; | ||
} | ||
|
||
public UnassignedShards copyTo(RoutingNodes newNodes) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps rename to copyFor
, at least to me To
signals that it writes into the argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++ renamed
if (version == 0 && indicesRouting.isEmpty()) { | ||
table = EMPTY_ROUTING_TABLE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure I understand when this helps us, can you elaborate and/or comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm this just leaked in here from the other branch. It doesn't really give a big performance boost, it's more of a cleanup and I suppose saves a couple of cycles around this code in tests :) I can back it out if it's too noisy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+ε to revert
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reverted :)
@@ -931,7 +931,7 @@ public void testCanRemainWithShardRelocatingAway() { | |||
ClusterState clusterState = ClusterState.builder(baseClusterState).routingTable(builder.build()).build(); | |||
RoutingAllocation routingAllocation = new RoutingAllocation( | |||
null, | |||
new RoutingNodes(clusterState), | |||
new RoutingNodes(clusterState.routingTable(), clusterState.nodes()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would clusterState.getRoutingNodes()
not work here too (and in the other tests using the constructor)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, literally the only reason I didn't use it was because that would instantiate that field on the ClusterState
and thus change the behavior of the test. I can't see how that would be relevant here in practice, but I didn't want to make any changes to this in this test (could have just called that method before as well).
@@ -185,7 +185,7 @@ static Decision executeAllocation(ClusterState clusterState, ShardRouting shardR | |||
final AllocationDecider decider = new CcrPrimaryFollowerAllocationDecider(); | |||
final RoutingAllocation routingAllocation = new RoutingAllocation( | |||
new AllocationDeciders(List.of(decider)), | |||
new RoutingNodes(clusterState), | |||
new RoutingNodes(clusterState.routingTable(), clusterState.nodes()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use clusterState.getRoutingNodes()
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same answer as above, probably fine to make that change but I didn't want to change test behavior here.
public RoutingNodes(RoutingTable routingTable, DiscoveryNodes discoveryNodes) { | ||
this(routingTable, discoveryNodes, true); | ||
} | ||
|
||
public RoutingNodes(ClusterState clusterState, boolean readOnly) { | ||
public RoutingNodes(RoutingTable routingTable, DiscoveryNodes discoveryNodes, boolean readOnly) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should make these private and add factory methods with names signaling the read-only-ness as well as that they are only to be called by ClusterState
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure moved to two separate factory methods here, looks much cleaner indeed.
…g-temporary-routing-nodes
Yes, this is one of the most expensive steps in re-route today (and becomes the most expensive step in it once other fixes in #77466 to index settings have made it in). This makes the part where we create mutable routing nodes more than 50% cheaper in many cases which is a 10% saving on the full operation even today (and a larger one relatively speaking once the other fixes hit).
I have and #77466 contains a number of improvements on that front as well, there's limits on how far optimizations can go here though without fundamentally changing things. |
…uting-nodes' into avoid-instantiating-temporary-routing-nodes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
We should probably have different classes for mutable and immutable RoutingNodes
, given how many of the methods don't make sense if readOnly
is set.
if (version == 0 && indicesRouting.isEmpty()) { | ||
table = EMPTY_ROUTING_TABLE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+ε to revert
this.relocatingShards = routingNodes.relocatingShards; | ||
this.activeShardCount = routingNodes.activeShardCount; | ||
this.totalShardCount = routingNodes.totalShardCount; | ||
this.attributeValuesByAttribute = new HashMap<>(routingNodes.attributeValuesByAttribute.size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a lazy cache of some immutable properties of the immutable set of nodes so it's never meaningfully mutated, so really we could just re-use the same map, or else instantiate it as an empty map. Also if it's only used in AwarenessAllocationDecider
it should be empty on immutable instances anyway. But maybe it's nicer to explicitly copy it here anyway just for clarity.
Maybe we should move this to DiscoveryNodes
so it doesn't get invalidated on every cluster state update too. Probably not a big deal, just thinking out loud here.
public RoutingNodes mutableCopy() { | ||
// we should not call this on mutable instances, it's still expensive to create the copy and callers should instead mutate a single | ||
// instance | ||
assert readOnly : "tried to create a mutable copy from a mutable instance"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we assert this in the (private) constructor instead, just in case we inadvertently add another caller?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++ done
Jenkins run elasticsearch-ci/part-1 (known failure) |
Thanks David & Henning! |
This commit introduces the ability to create a mutable copy of
RoutingNodes
instead of having to re-compute the data structures from scratch which is much
faster and allows for some reuse of immutable parts of a routing nodes instance.
Also, this sets up further improvements that avoid some more redundant routing node
instantiations in
RoutingAllocation
.relates #77466