fix(blooms): Fully deduplicate chunks from FilterChunkRef responses #12807

Merged
merged 9 commits into from
Apr 30, 2024

@chaudum chaudum commented Apr 26, 2024

What this PR does / why we need it:

This PR aims for full de-duplication of chunks and series from filter requests from the index gateway to the bloom gateway.

Whenever we merge/de-duplicate slices, the inputs need to be sorted. It appears that the Removals (chunks) from the v1.Output are not guaranteed to be sorted.

When comparing ShortRefs, all three of From, Through, and Checksum need to be used.
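
A three-field comparison could look roughly like the sketch below. The `shortRef` type and its field types are assumptions made for illustration; they stand in for Loki's actual ShortRef definition, which is not shown on this page.

```go
package main

import (
	"cmp"
	"fmt"
)

// shortRef is a hypothetical stand-in for ShortRef; field types are assumed.
type shortRef struct {
	From, Through int64
	Checksum      uint32
}

// Cmp orders by From, then Through, then Checksum.
// Two refs are only equal when all three fields match.
func (r shortRef) Cmp(other shortRef) int {
	if c := cmp.Compare(r.From, other.From); c != 0 {
		return c
	}
	if c := cmp.Compare(r.Through, other.Through); c != 0 {
		return c
	}
	return cmp.Compare(r.Checksum, other.Checksum)
}

// Less is derived from Cmp so that both always agree.
func (r shortRef) Less(other shortRef) bool { return r.Cmp(other) < 0 }

func main() {
	a := shortRef{From: 10, Through: 20, Checksum: 1}
	b := shortRef{From: 10, Through: 20, Checksum: 2}
	// Same time range, different checksum: the refs are distinct.
	fmt.Println(a.Cmp(b), b.Cmp(a), a.Cmp(a)) // prints "-1 1 0"
}
```

If Checksum were left out of the comparison, `a` and `b` above would compare as equal and one of them would be dropped during de-duplication.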

Checklist

  • Reviewed the CONTRIBUTING.md guide (required)
  • Documentation added
  • Tests updated
  • Title matches the required conventional commits format, see here
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/setup/upgrade/_index.md
  • For Helm chart changes bump the Helm chart version in production/helm/loki/Chart.yaml and update production/helm/loki/CHANGELOG.md and production/helm/loki/README.md. Example PR
  • If the change is deprecating or removing a configuration option, update the deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory. Example PR

@chaudum chaudum force-pushed the chaudum/full-dedupe branch from 2a084f3 to 3d4c66c Compare April 29, 2024 07:41
@pull-request-size pull-request-size bot added size/L and removed size/M labels Apr 29, 2024
@chaudum chaudum force-pushed the chaudum/full-dedupe branch from 3281995 to 95b011b Compare April 29, 2024 19:22
@chaudum chaudum marked this pull request as ready for review April 29, 2024 19:52
@chaudum chaudum requested a review from a team as a code owner April 29, 2024 19:52
chaudum added 6 commits April 30, 2024 08:12
for the case when chunks have same from/through time, but different
checksums

Signed-off-by: Christian Haudum <[email protected]>
@chaudum chaudum force-pushed the chaudum/full-dedupe branch from 75517c9 to 628f1c2 Compare April 30, 2024 06:13
@chaudum chaudum added type/bug Somehing is not working as expected backport k200 labels Apr 30, 2024
@grafanabot

This PR must be merged before a backport PR will be created.

@@ -383,6 +383,8 @@ func (g *Gateway) consumeTask(ctx context.Context, task Task, tasksCh chan<- Tas
	case <-ctx.Done():
		// do nothing
	default:
		// chunks may not be sorted
		sort.Slice(res.Removals, func(i, j int) bool { return res.Removals[i].Less(res.Removals[j]) })

nit: We tend to use sort very often and I'm worried we end up sorting already sorted slices (quicksort can degrade to O(n^2) for sorted inputs). Should we check if the list is sorted before sorting it?

good point.

afaik, pdqsort, which is used by sort.Slice(), avoids quicksort's O(n^2) worst case and runs in O(n log n) instead (which is still bad).

if a.Less(b) {
	result = append(result, a)
	i++
} else {

nit: it's more idiomatic to add a continue inside the if than to have an else.
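
For illustration, a merge loop written in the `continue` style might look like the sketch below. The `ref` type and `mergeRefs` function are hypothetical and simplified; they are not the code from this PR.

```go
package main

import "fmt"

// ref is a hypothetical, simplified chunk reference.
type ref struct {
	From     int64
	Checksum uint32
}

// Less orders by From, then by Checksum.
func (r ref) Less(o ref) bool {
	if r.From != o.From {
		return r.From < o.From
	}
	return r.Checksum < o.Checksum
}

// mergeRefs merges two sorted slices, keeping one copy of equal refs.
// Each branch ends in continue, so no else blocks are needed.
func mergeRefs(a, b []ref) []ref {
	result := make([]ref, 0, len(a)+len(b))
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		if a[i].Less(b[j]) {
			result = append(result, a[i])
			i++
			continue
		}
		if b[j].Less(a[i]) {
			result = append(result, b[j])
			j++
			continue
		}
		// Neither is less, so the refs are equal: keep one, advance both.
		result = append(result, a[i])
		i++
		j++
	}
	result = append(result, a[i:]...)
	return append(result, b[j:]...)
}

func main() {
	a := []ref{{From: 1, Checksum: 1}, {From: 3, Checksum: 1}}
	b := []ref{{From: 1, Checksum: 1}, {From: 2, Checksum: 1}}
	fmt.Println(mergeRefs(a, b))
}
```

The early `continue` keeps the loop body flat, so the "equal" case reads as the fall-through path rather than a nested else.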

chaudum added 3 commits April 30, 2024 11:19
Signed-off-by: Christian Haudum <[email protected]>
Signed-off-by: Christian Haudum <[email protected]>
@salvacorts salvacorts left a comment

Approving but leaving a nit here

// chunks may not be sorted
sort.Slice(res.Removals, func(i, j int) bool { return res.Removals[i].Less(res.Removals[j]) })
// chunks may not always be sorted
if !slices.IsSortedFunc(res.Removals, func(a, b v1.ChunkRef) int { return a.Cmp(b) }) {

nit: we end up using the same cmp function all over the place. It would be nice to have another function next to func (r *ChunkRef) Cmp(other ChunkRef) int that does this:

func CmpChunkRefs(a, b ChunkRef) int { return a.Cmp(b) }

So we can use IsSortedFunc and SortFunc as:

if !slices.IsSortedFunc(a.Refs, CmpChunkRefs) {
	slices.SortFunc(a.Refs, CmpChunkRefs)
}
if !slices.IsSortedFunc(b.Refs, CmpChunkRefs) {
	slices.SortFunc(b.Refs, CmpChunkRefs)
}

@chaudum chaudum merged commit a0f358f into main Apr 30, 2024
59 checks passed
@chaudum chaudum deleted the chaudum/full-dedupe branch April 30, 2024 11:33
grafanabot pushed a commit that referenced this pull request Apr 30, 2024
…12807)


Signed-off-by: Christian Haudum <[email protected]>
(cherry picked from commit a0f358f)
chaudum added a commit that referenced this pull request Apr 30, 2024
chaudum added a commit that referenced this pull request May 3, 2024
shantanualsi pushed a commit that referenced this pull request May 6, 2024
…12807)


Signed-off-by: Christian Haudum <[email protected]>
Labels
backport k200 size/L type/bug Somehing is not working as expected

3 participants