On data sets with many series, large series keys and many shards,
the cost of parsing the key and re-indexing can be high.
TSM keys were being loaded into the index repeatedly for
series that an earlier TSM file had already loaded into the index. This was
wasted work and slowed down shard loading.
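As a rough illustration of skipping series that are already indexed, here is a minimal sketch. The `seriesIndex` type and its `addKeys` method are hypothetical names made up for this example, not the actual types in the codebase; the sketch only assumes that the index can be checked for a key before the more expensive indexing work is done.

```go
package main

// seriesIndex is a hypothetical stand-in for the in-memory index.
// It only records which series keys have been seen so far.
type seriesIndex struct {
	series map[string]struct{}
}

func newSeriesIndex() *seriesIndex {
	return &seriesIndex{series: make(map[string]struct{})}
}

// addKeys indexes the series keys read from one TSM file and returns
// how many of them were new. Keys that an earlier file already added
// are skipped, so no per-key indexing work is repeated for them.
func (idx *seriesIndex) addKeys(keys [][]byte) int {
	added := 0
	for _, k := range keys {
		if _, ok := idx.series[string(k)]; ok {
			continue // already indexed by an earlier TSM file
		}
		// The expensive part (parsing tags, creating series objects)
		// would happen here, only for keys not seen before.
		idx.series[string(k)] = struct{}{}
		added++
	}
	return added
}
```

With two files that share a series, the second call to `addKeys` only does work for the keys the first file did not contain.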
Parsing the key was also inefficient and allocated a new string
slice. This was simplified to remove that allocation.
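The difference between the two parsing styles can be sketched as follows. This is an illustration, not the code from this change: the function names are hypothetical, and the `#!~#` separator between series key and field name is an assumption made for the example. `strings.SplitN` returns a freshly allocated `[]string`, while `strings.Index` plus slicing returns substrings of the original key without allocating a slice.

```go
package main

import "strings"

// keyFieldSeparator is assumed for this sketch; it stands for whatever
// separates the series key from the field name in a composite key.
const keyFieldSeparator = "#!~#"

// splitKeyAlloc shows the slice-allocating style: SplitN builds a new
// []string on every call just to hand back two parts.
func splitKeyAlloc(key string) (series, field string) {
	parts := strings.SplitN(key, keyFieldSeparator, 2)
	if len(parts) < 2 {
		return parts[0], ""
	}
	return parts[0], parts[1]
}

// splitKey finds the separator with strings.Index and returns
// substrings of key, so no intermediate slice is allocated.
func splitKey(key string) (series, field string) {
	i := strings.Index(key, keyFieldSeparator)
	if i < 0 {
		return key, ""
	}
	return key[:i], key[i+len(keyFieldSeparator):]
}
```

Both functions return the same parts for the same input; only the allocation behavior differs, which matters when this runs once per key across hundreds of thousands of keys per shard.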
I tested this on a dataset with two databases containing 155 and 85 shards, with ~300k series keys in each DB and keys ~250 bytes in length.
0.13.0 (old/new)
5:32 -> 0:32
master (old/new)
4:25 -> 0:33
master cache flushed (old/new)
5:09 -> 1:42
This should help #6250 in some cases, although there may still be other bottlenecks that this data set does not bring out.