Make MasterService.patchVersions not Rebuild the Full CS #79860

Merged

@original-brownbear original-brownbear commented Oct 26, 2021

This makes the method really just patch the version via a cheap
copy constructor. It also makes the cluster state builder smarter
about updating the routing nodes, so they are rebuilt less often.

relates #77466

Extracted from #79692 after I benchmarked this and saw a pretty hefty speedup from saving lots of routing nodes rebuilds.
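
To make the idea in the description concrete, here is a minimal sketch of a version bump that goes through a cheap copy and only rebuilds the derived routing nodes when the routing table instance actually changes. All class and method names below (`SketchRoutingTable`, `SketchRoutingNodes`, `SketchClusterState`, `withVersion`, `withRoutingTable`) are hypothetical stand-ins invented for illustration; they are not the real Elasticsearch classes or the code merged in this PR.

```java
import java.util.List;

// Hypothetical, heavily simplified stand-ins; NOT the real Elasticsearch classes.
final class SketchRoutingTable {
    final List<String> shardAssignments;

    SketchRoutingTable(List<String> shardAssignments) {
        this.shardAssignments = List.copyOf(shardAssignments);
    }
}

final class SketchRoutingNodes {
    // Counts the (assumed expensive) rebuilds, for illustration only.
    static int builds = 0;

    // The routing table instance this view was derived from.
    final SketchRoutingTable source;

    SketchRoutingNodes(SketchRoutingTable source) {
        builds++;
        this.source = source;
    }
}

final class SketchClusterState {
    final long version;
    final SketchRoutingTable routingTable;
    final SketchRoutingNodes routingNodes;

    SketchClusterState(long version, SketchRoutingTable routingTable, SketchRoutingNodes routingNodes) {
        this.version = version;
        this.routingTable = routingTable;
        // Reuse the passed-in routing nodes only if they were derived from
        // exactly this routing table instance; otherwise rebuild them.
        this.routingNodes = (routingNodes != null && routingNodes.source == routingTable)
            ? routingNodes
            : new SketchRoutingNodes(routingTable);
    }

    // Cheap copy: only the version changes, every other reference is shared.
    SketchClusterState withVersion(long newVersion) {
        return new SketchClusterState(newVersion, routingTable, routingNodes);
    }

    // Changing the routing table invalidates the derived routing nodes.
    SketchClusterState withRoutingTable(SketchRoutingTable newTable) {
        return new SketchClusterState(version, newTable, routingNodes);
    }
}
```

In this sketch, `withVersion` never triggers a rebuild, while `withRoutingTable` does, because the old routing nodes no longer match the new table instance.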

@original-brownbear original-brownbear added :Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. v8.0.0 v7.16.1 labels Oct 26, 2021
@elasticmachine elasticmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Oct 26, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

Contributor

@DaveCTurner DaveCTurner left a comment

Asking for an assertion; could we also have some tests that we are preserving instance equality in the places on which we're relying for the efficiency gains here?

@@ -136,6 +145,7 @@ public ClusterState(ClusterName clusterName, long version, String stateUUID, Met
this.blocks = blocks;
this.customs = customs;
this.wasReadFromDiff = wasReadFromDiff;
this.routingNodes = routingNodes;
Contributor

Can we assert that this is the routingNodes we would have got from a fresh rebuild here?

Member Author

Not easily. Neither RoutingNodes nor ShardRouting has an equals method.
I could try to add those here, but that would make it a much larger change.

For now it's also not entirely trivial to test the gains made here in isolation, I think. I was thinking of taking just this chunk on its own, then doing a follow-up with the remaining chunks from the big PR that eliminate the rest of the unnecessary routing node rebuilds. That follow-up would add an instance equality assertion in ClusterChangedEvent to make sure that if the routing table doesn't change, then the routing nodes instance doesn't change either.

This change seemed safe enough to not require additional tests in isolation (to me, that is :)).
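
The instance equality assertion proposed above could look roughly like the following sketch. It assumes a simplified event object holding the previous and current state; the names `EvRoutingTable`, `EvRoutingNodes`, `EvState`, `EvClusterChangedEvent` and `routingNodesReused` are hypothetical and chosen only for illustration, not taken from the Elasticsearch code base.

```java
// Hypothetical, simplified stand-ins; NOT the real Elasticsearch classes.
final class EvRoutingTable {}

final class EvRoutingNodes {}

final class EvState {
    final EvRoutingTable routingTable;
    final EvRoutingNodes routingNodes;

    EvState(EvRoutingTable routingTable, EvRoutingNodes routingNodes) {
        this.routingTable = routingTable;
        this.routingNodes = routingNodes;
    }
}

final class EvClusterChangedEvent {
    final EvState previous;
    final EvState current;

    EvClusterChangedEvent(EvState previous, EvState current) {
        // Only checked when the JVM runs with assertions enabled (-ea).
        assert routingNodesReused(previous, current)
            : "routing table unchanged but routing nodes instance changed";
        this.previous = previous;
        this.current = current;
    }

    // Holds if the routing table changed, or if the same routing table
    // instance still comes with the same routing nodes instance.
    static boolean routingNodesReused(EvState previous, EvState current) {
        return previous.routingTable != current.routingTable
            || previous.routingNodes == current.routingNodes;
    }

    boolean routingTableChanged() {
        return previous.routingTable != current.routingTable;
    }
}
```

The check deliberately uses `==` rather than `equals`: it verifies that the instance was reused, which is what the efficiency gain relies on, not merely that an equivalent object was rebuilt.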

Contributor

@DaveCTurner DaveCTurner Oct 27, 2021

Ehh, I really don't like leaving this invariant unenforced. This is a public constructor, so there's a risk that a future caller gets this wrong. They might just pass in a RoutingNodes that they happen to have on hand (especially since the argument isn't marked as @Nullable).

Instance equality on ShardRouting should be enough right?

Member Author

> Instance equality on ShardRouting should be enough right?

Right ... I think. You convinced me :) I'll do it right. Working out a proper equals now.

Member Author

Alright, added the assertion and all the necessary equals methods. I did exploit some obvious invariants to avoid comparing all fields, but I erred on the side of caution. There might be possible optimizations to these methods, but since they're only used for the assertion it's probably irrelevant.

As for ShardRouting, we unfortunately needed an equals there as well because of the case where we re-create the instance with a null DiscoveryNode when building RoutingNodes (hence the fix to its toString here).
Seems to work fine now though :)
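
As a rough illustration of the approach described in this comment, the sketch below asserts in the constructor that any passed-in routing nodes equal a fresh rebuild. It is written under loud assumptions: `DemoShard`, `DemoRoutingNodes`, `DemoState` and the `build` method are hypothetical stand-ins, and the copying inside `build` only mimics the re-creation of shard instances mentioned above. That re-creation is why the shard type needs a value-based `equals` rather than relying on instance equality.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical, simplified stand-ins; NOT the real Elasticsearch classes.
final class DemoShard {
    final String index;
    final int shardId;
    final String nodeId;

    DemoShard(String index, int shardId, String nodeId) {
        this.index = index;
        this.shardId = shardId;
        this.nodeId = nodeId;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof DemoShard)) return false;
        DemoShard other = (DemoShard) o;
        return shardId == other.shardId && index.equals(other.index) && Objects.equals(nodeId, other.nodeId);
    }

    @Override
    public int hashCode() {
        return Objects.hash(index, shardId, nodeId);
    }
}

final class DemoRoutingNodes {
    final List<DemoShard> shards;

    private DemoRoutingNodes(List<DemoShard> shards) {
        this.shards = shards;
    }

    // Builds the derived view; copies each shard to mimic instance re-creation.
    static DemoRoutingNodes build(List<DemoShard> routingTable) {
        List<DemoShard> copies = new ArrayList<>();
        for (DemoShard s : routingTable) {
            copies.add(new DemoShard(s.index, s.shardId, s.nodeId));
        }
        return new DemoRoutingNodes(copies);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof DemoRoutingNodes && shards.equals(((DemoRoutingNodes) o).shards);
    }

    @Override
    public int hashCode() {
        return shards.hashCode();
    }
}

final class DemoState {
    final List<DemoShard> routingTable;
    final DemoRoutingNodes routingNodes;

    DemoState(List<DemoShard> routingTable, DemoRoutingNodes routingNodes) {
        // Only evaluated with -ea, so the extra rebuild costs nothing in production.
        assert routingNodes == null || routingNodes.equals(DemoRoutingNodes.build(routingTable))
            : "routing nodes do not match a fresh rebuild from the routing table";
        this.routingTable = routingTable;
        this.routingNodes = routingNodes != null ? routingNodes : DemoRoutingNodes.build(routingTable);
    }
}
```

Because the comparison lives inside an `assert`, the cost of the `equals` methods only matters in test runs, which matches the remark that optimizing them is probably irrelevant.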

Contributor

@DaveCTurner DaveCTurner left a comment

LGTM

@original-brownbear
Member Author

Thanks David!

@original-brownbear original-brownbear merged commit 0f9b4c4 into elastic:master Oct 27, 2021
@original-brownbear original-brownbear deleted the free-version-increment branch October 27, 2021 11:54
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Oct 27, 2021
original-brownbear added a commit that referenced this pull request Oct 27, 2021
…9900)

@original-brownbear original-brownbear restored the free-version-increment branch April 18, 2023 21:03
Labels
:Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. >non-issue Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. v7.16.0 v8.0.0-beta1