-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Speed up Building Indices Lookup in Metadata #83241
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -1792,7 +1792,7 @@ static SortedMap<String, IndexAbstraction> buildIndicesLookup( | |||||
ImmutableOpenMap<String, IndexMetadata> indices | ||||||
) { | ||||||
SortedMap<String, IndexAbstraction> indicesLookup = new TreeMap<>(); | ||||||
Map<String, DataStream> indexToDataStreamLookup = new HashMap<>(); | ||||||
Map<String, IndexAbstraction.DataStream> indexToDataStreamLookup = new HashMap<>(); | ||||||
// If there are no indices, then skip data streams. This happens only when metadata is read from disk | ||||||
if (dataStreamMetadata != null && indices.size() > 0) { | ||||||
Map<String, List<String>> dataStreamToAliasLookup = new HashMap<>(); | ||||||
|
@@ -1817,35 +1817,27 @@ static SortedMap<String, IndexAbstraction> buildIndicesLookup( | |||||
assert dataStream.getIndices().isEmpty() == false; | ||||||
|
||||||
List<String> aliases = dataStreamToAliasLookup.getOrDefault(dataStream.getName(), List.of()); | ||||||
final IndexAbstraction.DataStream dsAbstraction = new IndexAbstraction.DataStream(dataStream, aliases); | ||||||
IndexAbstraction existing = indicesLookup.put( | ||||||
dataStream.getName(), | ||||||
new IndexAbstraction.DataStream(dataStream, aliases) | ||||||
); | ||||||
assert existing == null : "duplicate data stream for " + dataStream.getName(); | ||||||
|
||||||
for (Index i : dataStream.getIndices()) { | ||||||
indexToDataStreamLookup.put(i.getName(), dataStream); | ||||||
indexToDataStreamLookup.put(i.getName(), dsAbstraction); | ||||||
} | ||||||
} | ||||||
} | ||||||
|
||||||
Map<String, List<IndexMetadata>> aliasToIndices = new HashMap<>(); | ||||||
for (var indexMetadata : indices.values()) { | ||||||
ConcreteIndex index; | ||||||
DataStream parent = indexToDataStreamLookup.get(indexMetadata.getIndex().getName()); | ||||||
if (parent != null) { | ||||||
assert parent.getIndices() | ||||||
.stream() | ||||||
.map(Index::getName) | ||||||
.collect(Collectors.toList()) | ||||||
.contains(indexMetadata.getIndex().getName()) | ||||||
: "Expected data stream [" + parent.getName() + "] to contain index " + indexMetadata.getIndex(); | ||||||
index = new ConcreteIndex(indexMetadata, (IndexAbstraction.DataStream) indicesLookup.get(parent.getName())); | ||||||
} else { | ||||||
index = new ConcreteIndex(indexMetadata); | ||||||
} | ||||||
|
||||||
IndexAbstraction existing = indicesLookup.put(indexMetadata.getIndex().getName(), index); | ||||||
for (var entry : indices) { | ||||||
final String name = entry.key; | ||||||
final IndexMetadata indexMetadata = entry.value; | ||||||
final IndexAbstraction.DataStream parent = indexToDataStreamLookup.get(name); | ||||||
assert parent == null || parent.getIndices().stream().anyMatch(index -> name.equals(index.getName())) | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. assertion change is unrelated, just felt the need to clean this up and make it a little clearer what we're asserting :) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. totally a nit and feel free to ignore, but since we are using streams, I would prefer to keep
Suggested change
|
||||||
: "Expected data stream [" + parent.getName() + "] to contain index " + indexMetadata.getIndex(); | ||||||
IndexAbstraction existing = indicesLookup.put(name, new ConcreteIndex(indexMetadata, parent)); | ||||||
assert existing == null : "duplicate for " + indexMetadata.getIndex(); | ||||||
|
||||||
for (var aliasCursor : indexMetadata.getAliases()) { | ||||||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we need to build
indextToDataStreamLookup
at all? Could we not simply populateindicesLookup
directly here instead and ignore those below (asserting that they are data stream owned instead of the duplicate check)?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right we don't ... now comes the interesting question of whether this would actually be faster.
Looking at the profiling for the full method (after my change here):
for a completely degenerate case where we have about as many indices as datastreams (one index per DS) ...
the hash map building barely shows up relative to the tree-map.
So the question becomes, is it cheaper to do 2 puts (or I guess one put and one computeIfAbsent) to the tree-map (the second put in the indices loop would be to a larger tree map than the one in the DS loop) or do one put into an on average smaller tree map and instead build the technically redundant hash-map.
Profiling suggests that this is faster I'd say. And in the real world, where you'd have multiple indices per DS the decision would be even more in favour of having the hash-map I think.
The problem we're running into is that the tree-map is super lob sided because we have the shared
.ds
prefix for all indices pretty much and then some more sharing for the specific DS, leading to loads of comparisons.We should've used a different naming schema for the internal DS indices I guess ... (in fact I think we still could).