Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes store duplicate label names and values. #1790

Merged
merged 3 commits into from
Nov 14, 2019

Conversation

cyriltovena
Copy link
Contributor

@cyriltovena cyriltovena commented Nov 5, 2019

When we query label names or values across multiple stores from different schema period we can lose the unicity or ordering.

This changes sort and filter label in the composite store. I've also added some test to verify the output.

@cyriltovena cyriltovena changed the title Fixes store label name and values duplicate. Fixes store duplicate label names and values. Nov 5, 2019
@cyriltovena
Copy link
Contributor Author

see grafana/loki#1220.

Copy link
Contributor

@bboreham bboreham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not very sure what's going on here.
Looks like it was a bug in recently added support to fetch labels?

pkg/chunk/composite_store_test.go Outdated Show resolved Hide resolved
pkg/chunk/chunk_store_utils.go Show resolved Hide resolved
pkg/chunk/chunk_store.go Show resolved Hide resolved
pkg/chunk/composite_store.go Outdated Show resolved Hide resolved
@cyriltovena
Copy link
Contributor Author

cyriltovena commented Nov 7, 2019

I'm not very sure what's going on here.

Have you read the PR description ? The composite store needs to sort and make labels or values unique.

Copy link
Contributor

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job @cyriltovena. To my understanding of the problem, changes look good to me. I left a couple of last minor comments.

pkg/chunk/composite_store_test.go Outdated Show resolved Hide resolved
pkg/chunk/chunk_store.go Outdated Show resolved Hide resolved
pkg/ingester/flush.go Outdated Show resolved Hide resolved
for _, entry := range entries {
_, labelValue, _, _, err := parseChunkTimeRangeValue(entry.RangeValue, entry.Value)
if err != nil {
return nil, err
}
result = append(result, string(labelValue))
result = appendUniqueStrings(result, values, string(labelValue))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this change is (now) unrelated to the bugfix, I would prefer it goes in a different commit with its own explanation.

You may want to make the same change in parseIndexEntries(), which would let you remove uniqueStrings().

It's worth checking the benchmark numbers and/or making sure the benchmarks are using a realistic data set.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch ! I'll remove it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

voila !

Copy link
Contributor Author

@cyriltovena cyriltovena Nov 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

before :

pkg: github.com/cortexproject/cortex/pkg/chunk
BenchmarkParseIndexEntries500-16      	    4441	    259747 ns/op	   96427 B/op	    1503 allocs/op
BenchmarkParseIndexEntries2500-16     	     891	   1363771 ns/op	  482397 B/op	    7503 allocs/op
BenchmarkParseIndexEntries10000-16    	     206	   5654314 ns/op	 1928897 B/op	   30005 allocs/op
BenchmarkParseIndexEntries50000-16    	      39	  30095557 ns/op	 9607707 B/op	  150006 allocs/op
PASS
ok  	github.com/cortexproject/cortex/pkg/chunk	8.054s

after:

pkg: github.com/cortexproject/cortex/pkg/chunk
BenchmarkParseIndexEntries500-16      	    3678	    318019 ns/op	  130810 B/op	    1522 allocs/op
BenchmarkParseIndexEntries2500-16     	     747	   1632040 ns/op	  612466 B/op	    7563 allocs/op
BenchmarkParseIndexEntries10000-16    	     172	   6779048 ns/op	 2441477 B/op	   30219 allocs/op
BenchmarkParseIndexEntries50000-16    	      31	  36258715 ns/op	11602340 B/op	  151867 allocs/op
PASS
ok  	github.com/cortexproject/cortex/pkg/chunk	8.108s

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding half duplicate in the mix:

before:

pkg: github.com/cortexproject/cortex/pkg/chunk
BenchmarkParseIndexEntries500-16      	    4485	    262828 ns/op	   96346 B/op	    1503 allocs/op
BenchmarkParseIndexEntries2500-16     	     838	   1379892 ns/op	  482316 B/op	    7503 allocs/op
BenchmarkParseIndexEntries10000-16    	     201	   5907672 ns/op	 1928821 B/op	   30005 allocs/op
BenchmarkParseIndexEntries50000-16    	      39	  30373427 ns/op	 9607658 B/op	  150007 allocs/op
PASS
ok  	github.com/cortexproject/cortex/pkg/chunk	8.056s

after:

pkg: github.com/cortexproject/cortex/pkg/chunk
BenchmarkParseIndexEntries500-16      	    4454	    265512 ns/op	  109322 B/op	    1514 allocs/op
BenchmarkParseIndexEntries2500-16     	     878	   1356005 ns/op	  526739 B/op	    7536 allocs/op
BenchmarkParseIndexEntries10000-16    	     211	   5586487 ns/op	 2107155 B/op	   30116 allocs/op
BenchmarkParseIndexEntries50000-16    	      37	  29155700 ns/op	10205072 B/op	  150868 allocs/op
PASS
ok  	github.com/cortexproject/cortex/pkg/chunk	7.917s

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it looks like if there is duplicates the new append with the map unique values is faster, no surprise here. However if there is no duplicates the old version is faster.

Memory wise I think the increase is negligible. (25B per op).

I will submit the benchmark now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After some discussions with Goutham, let's not take any risk and leave the parseIndexEntries as is. Other functions (Fetching labels) are not used by Cortex and should be fine to change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done ! I think we should be good to go here.

Copy link
Contributor

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @cyriltovena for addressing my feeback. Changes LGTM!

Copy link
Contributor

@jtlisi jtlisi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@gouthamve gouthamve merged commit 37c1f17 into cortexproject:master Nov 14, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants