-
Notifications
You must be signed in to change notification settings - Fork 810
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use LazySeriesIterator with fuzzy metric name queries #442
Conversation
7163bcc
to
abe573d
Compare
249b65e
to
affeca8
Compare
pkg/chunk/chunk_store.go
Outdated
@@ -151,17 +152,35 @@ func (c *Store) calculateDynamoWrites(userID string, chunks []Chunk) (WriteBatch | |||
} | |||
|
|||
// Get implements ChunkStore | |||
func (c *Store) Get(ctx context.Context, from, through model.Time, allMatchers ...*metric.LabelMatcher) ([]Chunk, error) { | |||
func (c *Store) Get(ctx context.Context, from, through model.Time, allMatchers ...*metric.LabelMatcher) ([]local.SeriesIterator, error) { | |||
if through < from { | |||
return nil, fmt.Errorf("invalid query, through < from (%d < %d)", through, from) | |||
} | |||
|
|||
filters, matchers := util.SplitFiltersAndMatchers(allMatchers) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this should be pushed into getMetricNameIterators
and getFuzzyMetricLazySeriesIterators
.
pkg/chunk/chunk_store.go
Outdated
return c.lookupChunksByMetricName(ctx, from, through, matchers, metricNameMatcher.Value) | ||
} | ||
|
||
func (c *Store) getFuzzyMetricLazySeriesIterators(ctx context.Context, from, through model.Time, filters []*metric.LabelMatcher, matchers []*metric.LabelMatcher, metricNameMatcher *metric.LabelMatcher) ([]local.SeriesIterator, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably better names getSeriesIterators
?
pkg/chunk/chunk_store.go
Outdated
if ok && !metricNameMatcher.Match(metricName) { | ||
skippedMetricNames++ | ||
// Apply metricNameMatcher filter | ||
if metricNameMatcher != nil && !metricNameMatcher.Match(metric[metricNameMatcher.Name]) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this not redundant with the checks below?
Also, if you push the util.SplitFiltersAndMatchers
into getMetricNameIterators
, this section will just become:
for _, matcher := range matchers {
if !... {...}
}
And you won't need all three checks.
pkg/chunk/schema_test.go
Outdated
@@ -236,6 +237,7 @@ func TestSchemaHashKeys(t *testing.T) { | |||
const ( | |||
MetricNameRangeValue = iota + 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typically you do
const (
_ iota
foo
bar
)
If you want to start at 1.
"flip": "flop", | ||
}, | ||
"KrbXMezYneba+o7wfEdtzOdAWhbfWcDrlVfs1uOCX3M", | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you (for my sanity) add a test case for the same metric, with the pairs in a different order?
@@ -517,7 +532,7 @@ func (d *Distributor) queryIngesters(ctx context.Context, ingesters []*ring.Inge | |||
} | |||
fpToSampleStream[fp] = mss | |||
} | |||
mss.Values = util.MergeSamples(mss.Values, ss.Values) | |||
mss.Values = util.MergeSampleSets(mss.Values, ss.Values) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In such large PRs I generally try and avoid renamings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We changed it to MergeSampleSets
to keep it consistent with MergeNSampleSets
and it's only used in 2 places. I believe this was in a previous PR but we have now agreed to merge all PR's together.
@@ -0,0 +1,142 @@ | |||
package util |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a comment that this is copied from the prometheus code, and the PR to make it public?
883c061
to
940291e
Compare
Is this gtg now? I'll review this weekend if so.
…On Fri, 9 Jun 2017 at 14:52, Aaron Kirkbride ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In pkg/distributor/distributor.go
<#442 (comment)>:
> @@ -517,7 +532,7 @@ func (d *Distributor) queryIngesters(ctx context.Context, ingesters []*ring.Inge
}
fpToSampleStream[fp] = mss
}
- mss.Values = util.MergeSamples(mss.Values, ss.Values)
+ mss.Values = util.MergeSampleSets(mss.Values, ss.Values)
We changed it to MergeSampleSets to keep it consistent with
MergeNSampleSets and it's only used in 2 places. I believe this was in a
previous PR but we have now agreed to merge all PR's together.
—
You are receiving this because your review was requested.
Reply to this email directly, view it on GitHub
<#442 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AAbGhYlkomWhsneYk4Z6FCReP1hRzwcmks5sCU4jgaJpZM4NmCNS>
.
|
pkg/chunk/schema.go
Outdated
return nil, ErrNoMetricNameNotSupported | ||
} | ||
|
||
// v8Entries is the same as v7Entries however with a series index instead of a metric name index |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
v8Entries is the same as v7Entries
does not seem consistent with:
type v8Entries struct { v6Entries }
which one is correct? is just a typo in the comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
v8Entries
embedding v6Entries
is correct. The comment says it's the same as v7
because we are replacing the index v7 adds. However this comment is confusing because v8Entries GetWriteEntries is completely different.
I will update the comment - thanks
|
||
"fmt" | ||
|
||
"github.com/prometheus/common/model" | ||
) | ||
|
||
func metricSeriesID(m model.Metric) string { | ||
h := sha256.Sum256([]byte(m.String())) | ||
return string(encodeBase64Bytes(h[:])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of curiosity, is there a special reason we use base64
here vs., for example, hex.EncodeToString
which:
- is more natural for such hash functions, and
- could be more convenient to use when debugging? (or e.g. printing https://play.golang.org/p/IeHqPaaNk4)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No special reason. It's slightly more efficient to store (4 chars per 3 bytes instead of 2 chars per 2 bytes) and is seen more of an UID than a hash. Base64 is human readable and we also encode to base64 for the dynamo api (http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Programming.LowLevelAPI.html).
pkg/util/merger.go
Outdated
func MergeNSamples(sampleSets ...[]model.SamplePair) []model.SamplePair { | ||
l := len(sampleSets) | ||
switch l { | ||
case 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless I've missed it, and even though that's trivial, I don't think there is an unit test covering this case
statement.
If I understood well, we'd need this additional test case:
{
sampleSets: [][]model.SamplePair{},
expected: []model.SamplePair{},
},
} | ||
|
||
// MergeNSampleSets merges and dedupes n sets of already sorted sample pairs. | ||
func MergeNSampleSets(sampleSets ...[]model.SamplePair) []model.SamplePair { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
Are values within a
[]model.SamplePair
guaranteed to be sorted? (the comment above seems to indicate so) -
If so, would it be worth implementing:
- a n-way merge, instead of
- this 2-way merge for which we then enqueue results (and repeat)?
It feels like we would re-allocate quite a few arrays and process
model.SamplePair
s more than once for large number ofsampleSets
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case this could be useful, here is an example of doing the n-way merge using:
- a "dedup iterator" on
- a "merging iterator" on
- a collection of "iterators".
Lack of generics make this pretty ugly in Go (feedback welcome as I'm a beginner at Go), however:
- a similar approach could be done with pure arrays/slices;
- the "dedup" logic and "merging" logic could be merged for better performance.
The thing about it is that it is nicely composable and the overall complexity is O(n.log(k))
,
k
being the number of iterators merged, andn
the total number of items.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed offline, we are going to keep with the merge approach for now in this PR because the most common case is only a few sample sets and the code is simpler.
err = it.createSampleSeriesIterator() | ||
}) | ||
if err != nil { | ||
// TODO: Handle error. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a plan to handle this error before this PR gets merged? (I am just asking given the TODO
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not in this PR. I would like to discuss this - I'm unsure how to approach this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unsure as well right now... and it looks like it could fail in many ways 🙁
for _, it := range iterators { | ||
ss := &model.SampleStream{ | ||
Metric: it.Metric().Metric, | ||
Values: it.RangeValues(in), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to point out: RangeValues
could return nil
. Does it matter? What do we do with these SampleStream
s afterwards?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like we only pass these to ToQueryResponse
but we don't check for nil
there, so we could potentially "panic".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've had a look into this and it doesn't look like it will panic because we are ranging over a nil
slice which does 0 iterations. https://github.com/golang/go/blob/master/doc/go_spec.html#L5011
https://play.golang.org/p/4vvbbHMKUo
|
||
// ValueAtOrBeforeTime implements the SeriesIterator interface. | ||
func (it SampleStreamIterator) ValueAtOrBeforeTime(ts model.Time) model.SamplePair { | ||
// TODO: This is a naive inefficient approach - in reality, queries go mostly |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this something to resolve as part of this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was moved from pkg/querier/iterator.go
. We still use SampleStreamIterator
in some cases and the comment is still relevant. We still need improve the iterators to iterate through chunks - this PR focuses on allowing queries which do not fetch samples to execute efficiently.
Yes, it's ready for review if you get some time. Thanks :) |
Great! I'll look at it tomorrow afternoon. |
pkg/chunk/chunk.go
Outdated
|
||
return iterators, nil | ||
} | ||
|
||
// ChunksToMatrix converts a slice of chunks into a model.Matrix. | ||
func ChunksToMatrix(chunks []Chunk) (model.Matrix, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only used in tests now, could be moved into them.
// Fetch chunk descriptors (just ID really) from storage | ||
chunks, err := c.lookupChunksByMatchers(ctx, from, through, matchers) | ||
func (c *Store) getMetricNameIterators(ctx context.Context, from, through model.Time, allMatchers []*metric.LabelMatcher, metricName model.LabelValue) ([]local.SeriesIterator, error) { | ||
chunks, err := c.getMetricNameChunks(ctx, from, through, allMatchers, metricName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My preference would be to move the contents of getMetricNameChunks
into here, as this function seems overly short. I see that it is used from the tests, so you'd need to do some work there, in which case it might not be worth the effort.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, once getMetricNameIterators
converts the chunks from getMetricNameChunks
into iterators, we cannot access the chunks for testing through the iterator interface. We would like access to these to test whether the right chunks are fetched or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't test at the interval level as we are concerned about which chunks are fetched to create the iterators. We could expose the chunks, but it's a prometheus interface and it doesn't seem like the right thing to do, exposing something that is not related. This currently seems like the best way to get around testing constraints.
pkg/chunk/iterator.go
Outdated
func (it *LazySeriesIterator) createSampleSeriesIterator() error { | ||
metricName, ok := it.metric[model.MetricNameLabel] | ||
if !ok { | ||
return fmt.Errorf("series does not have a metric name") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you push this error into the creation of the LazySeriesIterator?
pkg/chunk/iterator.go
Outdated
return fmt.Errorf("series does not have a metric name") | ||
} | ||
|
||
ctx := context.Background() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you open a ticket upstream to add contexts and error returns to iterators. Will probably have to wait until post v2 though.
20f2e43
to
8b31b36
Compare
Rebased - will test before merging. Thanks for the review @tomwilkie . |
8b31b36
to
aa37aa9
Compare
Tested locally with We're seeing I will also be adding some more metrics to get information about how many pages we are seeing for each query. |
how? EDIT: I think this refers to the v7 schema which had an index "Userid:day - hash(metric name) -> metric name". See #416 (comment) |
Step towards fixing #416
Create v8Schema with series index (#430)
This index will be used to fetch series from the index. It will replace the metric name index and we will make the switch after the next steps are complete.
Decided to encode the model.Metric as JSON because it already implements the json.Unmarshaler interface.
Used sha256 for identifying the series as the variant of fnv64a which if used for fingerprints is not unique enough for indexing and using sha256 keeps it simple (we don't have to deal with collision logic).
Move iterators inside chunk store using MergeSeriesIterator (#438)
Use LazySeriesIterator with fuzzy metric name queries
There is not opportunity for our iterator to return an error if values are request, so we are just returning no values for now.
Test plan:
count({__name__=~".+"}) by (__name__)
ran locally