Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use LazySeriesIterator with fuzzy metric name queries #442

Merged
merged 38 commits into from
Jul 12, 2017

Conversation

aaron7
Copy link
Contributor

@aaron7 aaron7 commented May 25, 2017

Step towards fixing #416

Create v8Schema with series index (#430)

This index will be used to fetch series from the index. It will replace the metric name index and we will make the switch after the next steps are complete.

Decided to encode the model.Metric as JSON because it already implements the json.Unmarshaler interface.

Used sha256 for identifying the series as the variant of fnv64a which if used for fingerprints is not unique enough for indexing and using sha256 keeps it simple (we don't have to deal with collision logic).

Move iterators inside chunk store using MergeSeriesIterator (#438)

  • Move iterators inside the chunk store
  • Create MergeSeriesIterator which is created from multiple iterators and implements SeriesIterator.
  • Move iterator code to util as it is used across packages and cyclic imports would exist in some cases if it was kept in querier.

Use LazySeriesIterator with fuzzy metric name queries

There is not opportunity for our iterator to return an error if values are request, so we are just returning no values for now.

Test plan:

  • count({__name__=~".+"}) by (__name__) ran locally

@aaron7 aaron7 force-pushed the 416-lazy-series-when-no-metric-name branch from 7163bcc to abe573d Compare May 25, 2017 10:11
@aaron7 aaron7 changed the title [WIP] Use LazySeriesIterator with fuzzy metric name queries Use LazySeriesIterator with fuzzy metric name queries May 25, 2017
@aaron7 aaron7 requested a review from tomwilkie May 25, 2017 16:30
@aaron7 aaron7 force-pushed the 416-lazy-series-when-no-metric-name branch from 249b65e to affeca8 Compare May 30, 2017 13:30
@@ -151,17 +152,35 @@ func (c *Store) calculateDynamoWrites(userID string, chunks []Chunk) (WriteBatch
}

// Get implements ChunkStore
func (c *Store) Get(ctx context.Context, from, through model.Time, allMatchers ...*metric.LabelMatcher) ([]Chunk, error) {
func (c *Store) Get(ctx context.Context, from, through model.Time, allMatchers ...*metric.LabelMatcher) ([]local.SeriesIterator, error) {
if through < from {
return nil, fmt.Errorf("invalid query, through < from (%d < %d)", through, from)
}

filters, matchers := util.SplitFiltersAndMatchers(allMatchers)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this should be pushed into getMetricNameIterators and getFuzzyMetricLazySeriesIterators.

return c.lookupChunksByMetricName(ctx, from, through, matchers, metricNameMatcher.Value)
}

func (c *Store) getFuzzyMetricLazySeriesIterators(ctx context.Context, from, through model.Time, filters []*metric.LabelMatcher, matchers []*metric.LabelMatcher, metricNameMatcher *metric.LabelMatcher) ([]local.SeriesIterator, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably better names getSeriesIterators?

if ok && !metricNameMatcher.Match(metricName) {
skippedMetricNames++
// Apply metricNameMatcher filter
if metricNameMatcher != nil && !metricNameMatcher.Match(metric[metricNameMatcher.Name]) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this not redundant with the checks below?

Also, if you push the util.SplitFiltersAndMatchers into getMetricNameIterators, this section will just become:

for _, matcher := range matchers {
    if !... {...}
}

And you won't need all three checks.

@@ -236,6 +237,7 @@ func TestSchemaHashKeys(t *testing.T) {
const (
MetricNameRangeValue = iota + 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typically you do

const (
   _ iota
   foo
   bar
)

If you want to start at 1.

"flip": "flop",
},
"KrbXMezYneba+o7wfEdtzOdAWhbfWcDrlVfs1uOCX3M",
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you (for my sanity) add a test case for the same metric, with the pairs in a different order?

@@ -517,7 +532,7 @@ func (d *Distributor) queryIngesters(ctx context.Context, ingesters []*ring.Inge
}
fpToSampleStream[fp] = mss
}
mss.Values = util.MergeSamples(mss.Values, ss.Values)
mss.Values = util.MergeSampleSets(mss.Values, ss.Values)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In such large PRs I generally try and avoid renamings.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We changed it to MergeSampleSets to keep it consistent with MergeNSampleSets and it's only used in 2 places. I believe this was in a previous PR but we have now agreed to merge all PR's together.

@@ -0,0 +1,142 @@
package util
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment that this is copied from the prometheus code, and the PR to make it public?

@tomwilkie
Copy link
Contributor

Let close #430 and #438 and put all the changes in one PR (this one). That way we won't miss any comments.

Done the first pass - looks good! Not obvious errors, just a bunch of nits.

Have you tested locally? I didn't see any changes to the checked-in k8s config to enable this.

@tomwilkie
Copy link
Contributor

tomwilkie commented Jun 9, 2017 via email

return nil, ErrNoMetricNameNotSupported
}

// v8Entries is the same as v7Entries however with a series index instead of a metric name index
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

v8Entries is the same as v7Entries
does not seem consistent with:
type v8Entries struct { v6Entries }
which one is correct? is just a typo in the comment?

Copy link
Contributor Author

@aaron7 aaron7 Jun 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

v8Entries embedding v6Entries is correct. The comment says it's the same as v7 because we are replacing the index v7 adds. However this comment is confusing because v8Entries GetWriteEntries is completely different.

I will update the comment - thanks


"fmt"

"github.com/prometheus/common/model"
)

func metricSeriesID(m model.Metric) string {
h := sha256.Sum256([]byte(m.String()))
return string(encodeBase64Bytes(h[:]))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, is there a special reason we use base64 here vs., for example, hex.EncodeToString which:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No special reason. It's slightly more efficient to store (4 chars per 3 bytes instead of 2 chars per 2 bytes) and is seen more of an UID than a hash. Base64 is human readable and we also encode to base64 for the dynamo api (http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Programming.LowLevelAPI.html).

func MergeNSamples(sampleSets ...[]model.SamplePair) []model.SamplePair {
l := len(sampleSets)
switch l {
case 0:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless I've missed it, and even though that's trivial, I don't think there is an unit test covering this case statement.
If I understood well, we'd need this additional test case:

		{
			sampleSets: [][]model.SamplePair{},
			expected:   []model.SamplePair{},
		},

}

// MergeNSampleSets merges and dedupes n sets of already sorted sample pairs.
func MergeNSampleSets(sampleSets ...[]model.SamplePair) []model.SamplePair {
Copy link
Contributor

@marccarre marccarre Jun 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Are values within a []model.SamplePair guaranteed to be sorted? (the comment above seems to indicate so)

  2. If so, would it be worth implementing:

    • a n-way merge, instead of
    • this 2-way merge for which we then enqueue results (and repeat)?

    It feels like we would re-allocate quite a few arrays and process model.SamplePairs more than once for large number of sampleSets.

Copy link
Contributor

@marccarre marccarre Jun 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case this could be useful, here is an example of doing the n-way merge using:

  • a "dedup iterator" on
  • a "merging iterator" on
  • a collection of "iterators".

Lack of generics make this pretty ugly in Go (feedback welcome as I'm a beginner at Go), however:

  • a similar approach could be done with pure arrays/slices;
  • the "dedup" logic and "merging" logic could be merged for better performance.

The thing about it is that it is nicely composable and the overall complexity is O(n.log(k)),

  • k being the number of iterators merged, and
  • n the total number of items.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed offline, we are going to keep with the merge approach for now in this PR because the most common case is only a few sample sets and the code is simpler.

err = it.createSampleSeriesIterator()
})
if err != nil {
// TODO: Handle error.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a plan to handle this error before this PR gets merged? (I am just asking given the TODO)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not in this PR. I would like to discuss this - I'm unsure how to approach this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unsure as well right now... and it looks like it could fail in many ways 🙁

for _, it := range iterators {
ss := &model.SampleStream{
Metric: it.Metric().Metric,
Values: it.RangeValues(in),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to point out: RangeValues could return nil. Does it matter? What do we do with these SampleStreams afterwards?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like we only pass these to ToQueryResponse but we don't check for nil there, so we could potentially "panic".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've had a look into this and it doesn't look like it will panic because we are ranging over a nil slice which does 0 iterations. https://github.com/golang/go/blob/master/doc/go_spec.html#L5011
https://play.golang.org/p/4vvbbHMKUo


// ValueAtOrBeforeTime implements the SeriesIterator interface.
func (it SampleStreamIterator) ValueAtOrBeforeTime(ts model.Time) model.SamplePair {
// TODO: This is a naive inefficient approach - in reality, queries go mostly
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this something to resolve as part of this PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was moved from pkg/querier/iterator.go. We still use SampleStreamIterator in some cases and the comment is still relevant. We still need improve the iterators to iterate through chunks - this PR focuses on allowing queries which do not fetch samples to execute efficiently.

@aaron7
Copy link
Contributor Author

aaron7 commented Jun 13, 2017

@tomwilkie

Is this gtg now? I'll review this weekend if so.

Yes, it's ready for review if you get some time. Thanks :)

@tomwilkie
Copy link
Contributor

Great! I'll look at it tomorrow afternoon.


return iterators, nil
}

// ChunksToMatrix converts a slice of chunks into a model.Matrix.
func ChunksToMatrix(chunks []Chunk) (model.Matrix, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only used in tests now, could be moved into them.

// Fetch chunk descriptors (just ID really) from storage
chunks, err := c.lookupChunksByMatchers(ctx, from, through, matchers)
func (c *Store) getMetricNameIterators(ctx context.Context, from, through model.Time, allMatchers []*metric.LabelMatcher, metricName model.LabelValue) ([]local.SeriesIterator, error) {
chunks, err := c.getMetricNameChunks(ctx, from, through, allMatchers, metricName)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My preference would be to move the contents of getMetricNameChunks into here, as this function seems overly short. I see that it is used from the tests, so you'd need to do some work there, in which case it might not be worth the effort.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, once getMetricNameIterators converts the chunks from getMetricNameChunks into iterators, we cannot access the chunks for testing through the iterator interface. We would like access to these to test whether the right chunks are fetched or not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't test at the interval level as we are concerned about which chunks are fetched to create the iterators. We could expose the chunks, but it's a prometheus interface and it doesn't seem like the right thing to do, exposing something that is not related. This currently seems like the best way to get around testing constraints.

func (it *LazySeriesIterator) createSampleSeriesIterator() error {
metricName, ok := it.metric[model.MetricNameLabel]
if !ok {
return fmt.Errorf("series does not have a metric name")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you push this error into the creation of the LazySeriesIterator?

return fmt.Errorf("series does not have a metric name")
}

ctx := context.Background()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you open a ticket upstream to add contexts and error returns to iterators. Will probably have to wait until post v2 though.

@aaron7 aaron7 force-pushed the 416-lazy-series-when-no-metric-name branch from 20f2e43 to 8b31b36 Compare July 11, 2017 13:18
@aaron7
Copy link
Contributor Author

aaron7 commented Jul 11, 2017

Rebased - will test before merging. Thanks for the review @tomwilkie .

@aaron7 aaron7 force-pushed the 416-lazy-series-when-no-metric-name branch from 8b31b36 to aa37aa9 Compare July 11, 2017 14:55
@aaron7
Copy link
Contributor Author

aaron7 commented Jul 12, 2017

Tested locally with http_request_duration_microseconds, up and count({__name__=~".+"}) by (__name__).

We're seeing Error: QueryPages error: table=cortex, err=SerializationError: failed decoding JSON RPC response caused by: header line too long errors locally for the fuzzy queries when we have a few chunks in DynamoDB, however we believe this could be a problem with localstack. The agreed plan is to roll out to dev and test against DynamoDB and file an issue with localstack if it is inconsistent.

I will also be adding some more metrics to get information about how many pages we are seeing for each query.

@aaron7 aaron7 merged commit aea9137 into master Jul 12, 2017
@aaron7 aaron7 deleted the 416-lazy-series-when-no-metric-name branch July 12, 2017 16:05
@bboreham bboreham mentioned this pull request Mar 4, 2018
@bboreham
Copy link
Contributor

bboreham commented Mar 4, 2018

It will replace the metric name index

how?

EDIT: I think this refers to the v7 schema which had an index "Userid:day - hash(metric name) -> metric name". See #416 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants