Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

optimize getIndices in IndicesSegmentResponse #80064

Merged

Conversation

mushao999
Copy link
Contributor

The getIndices method in IndicesSegmentResponse will traverse shards for n+1 times (n is indices count), which can be optimized to just traverse once by using Map and its computeIfAbsent method.
Same logic can be found in IndexSegments’ constructor, where we just traverse shard once.
This PR just want to make elasticsearch code better, thougth getIndices method may be called Infrequently and has no performance issue.

@elasticsearchmachine elasticsearchmachine added v8.1.0 external-contributor Pull request authored by a developer outside the Elasticsearch team labels Oct 29, 2021
@DaveCTurner DaveCTurner added :Data Management/Stats Statistics tracking and retrieval APIs >enhancement labels Oct 29, 2021
@elasticmachine elasticmachine added the Team:Data Management Meta label for data/management team label Oct 29, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@DaveCTurner
Copy link
Contributor

@elasticmachine ok to test

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like a nice improvement. I left a few suggestions.

@@ -61,19 +59,16 @@
}
Map<String, IndexSegments> indicesSegments = new HashMap<>();

Set<String> indices = new HashSet<>();
Map<String, List<ShardSegments>> tmpIndicesSegment = new HashMap<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested rename:

Suggested change
Map<String, List<ShardSegments>> tmpIndicesSegment = new HashMap<>();
final Map<String, List<ShardSegments>> segmentsByIndex = new HashMap<>();

Comment on lines 64 to 68
List<ShardSegments> indexSegments = tmpIndicesSegment.computeIfAbsent(
shard.getShardRouting().getIndexName(),
k -> new ArrayList<>()
);
indexSegments.add(shard);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest inlining the intermediate variable:

Suggested change
List<ShardSegments> indexSegments = tmpIndicesSegment.computeIfAbsent(
shard.getShardRouting().getIndexName(),
k -> new ArrayList<>()
);
indexSegments.add(shard);
segmentsByIndex.computeIfAbsent(shard.getShardRouting().getIndexName(), n -> new ArrayList<>()).add(shard);

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh also please make IndicesSegmentResponse#indicesSegments volatile. I don't see any places where we access it across threads today but it's possible we'd add one in future.

@mushao999
Copy link
Contributor Author

@DaveCTurner Thank you for your suggestion, I've updated the code.

@mushao999 mushao999 force-pushed the feature/optimize_indices_segment_response branch from e390ac4 to e652170 Compare October 29, 2021 13:32
@DaveCTurner
Copy link
Contributor

Thanks @mushao999, this looks good to me but CI is failing - you need to run ./gradlew spotlessapply to apply the correct formatting. Please don't force-push to PRs once they're being reviewed, it makes it hard to keep track of comments etc.

@mushao999
Copy link
Contributor Author

Thanks @mushao999, this looks good to me but CI is failing - you need to run ./gradlew spotlessapply to apply the correct formatting. Please don't force-push to PRs once they're being reviewed, it makes it hard to keep track of comments etc.

@DaveCTurner Thank you , I found this problem in jenkins log and have reformat the code. I am sorry for force pushing the code, I did this just to let it keep up with the latest master. No force push later.

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. It looks like we don't have a huge amount of testing of this API so I added a slightly stronger YAML test in 682e022. I also linked this from #77466 since it aligns closely with that work and I think this would be a problem in clusters with very large numbers of indices.

@DaveCTurner DaveCTurner merged commit 9233dc2 into elastic:master Oct 29, 2021
@mushao999 mushao999 deleted the feature/optimize_indices_segment_response branch October 30, 2021 06:38
@mushao999 mushao999 restored the feature/optimize_indices_segment_response branch November 5, 2021 07:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Data Management/Stats Statistics tracking and retrieval APIs >enhancement external-contributor Pull request authored by a developer outside the Elasticsearch team Team:Data Management Meta label for data/management team v8.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants