-
Notifications
You must be signed in to change notification settings - Fork 25.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make MasterService.patchVersions not Rebuild the Full CS #79860
Make MasterService.patchVersions not Rebuild the Full CS #79860
Conversation
This makes the method really just patch the version via a cheap copy constructor. Moreover, it makes the cluster state builder smarter when it comes to updating the routing nodes so they aren't rebuilt so often as well.
Pinging @elastic/es-distributed (Team:Distributed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Asking for an assertion; could we also have some tests that we are preserving instance equality in the places on which we're relying for the efficiency gains here?
@@ -136,6 +145,7 @@ public ClusterState(ClusterName clusterName, long version, String stateUUID, Met | |||
this.blocks = blocks; | |||
this.customs = customs; | |||
this.wasReadFromDiff = wasReadFromDiff; | |||
this.routingNodes = routingNodes; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we assert that this is the routingNodes
we would have got from a fresh rebuild here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not easily. Neither RoutingNodes
nor ShardRouting
have any equals methods.
I could try to add those here, but that would make it a much larger change.
For now it's also not entirely trivial to test the gains made here in isolation I think. I was thinking of just taking this chunk in isolation, then doing a follow-up with the remaining chunks that eliminate the remainder of the unnecessary routing node rebuilds from the big PR and just adding an instance equality assertion in ClusterChangedEvent
that makes sure that if the routing table doesn't change, then the nodes instance didn't change.
This change seemed safe enough to not require additional tests in isolation (to me that is:)).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ehh I really don't like having this invariant not be enforced. This is a public constructor, there's a risk that a future caller gets this wrong in future. They might just pass in a RoutingNodes
that they happen to have on hand (especially since the argument isn't marked as @Nullable
).
Instance equality on ShardRouting
should be enough right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instance equality on ShardRouting should be enough right?
Right ... I think. You convinced me :) I'll do it right. Working out a proper equals now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, added the assertion and all the necessary equals methods I did exploit some obvious invariants to not have to compare all fields, but I erred on the side of caution here and there might be possible optimizations to these methods, but since they're only used for the assertion it's probably irrelevant.
As for ShardRouting
, we unfortunately needed an equals
there as well because of the case where we re-create the instance with a null
DiscoveryNode
when building RoutingNodes
(hence the fix to its toString
here).
Seems to work fine now though :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Thanks David! |
This makes the method really just patch the version via a cheap copy constructor. Moreover, it makes the cluster state builder smarter when it comes to updating the routing nodes so they aren't rebuilt so often as well.
…9900) This makes the method really just patch the version via a cheap copy constructor. Moreover, it makes the cluster state builder smarter when it comes to updating the routing nodes so they aren't rebuilt so often as well.
This makes the method really just patch the version via a cheap
copy constructor. Moreover, it makes the cluster state builder smarter
when it comes to updating the routing nodes so they aren't rebuilt
so often as well.
relates #77466
extracted from #79692 when I benchmarked this to show a pretty hefty speedup by saving lots of routing nodes rebuilds.