Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: improve performance of first_over_time and last_over_time queries by sharding them #11605

Merged
merged 39 commits into from
May 13, 2024

Conversation

jeschkies
Copy link
Contributor

@jeschkies jeschkies commented Jan 8, 2024

What this PR does / why we need it:
With the introduction of sending query plans with custom nodes to the querier we can now shard first_over_time queries.

Checklist

  • Reviewed the CONTRIBUTING.md guide (required)
  • Documentation added
  • Tests updated
  • CHANGELOG.md updated
    • If the change is worth mentioning in the release notes, add add-to-release-notes label
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/setup/upgrade/_index.md
  • For Helm chart changes bump the Helm chart version in production/helm/loki/Chart.yaml and update production/helm/loki/CHANGELOG.md and production/helm/loki/README.md. Example PR
  • If the change is deprecating or removing a configuration option, update the deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory. Example PR

Copy link
Contributor

github-actions bot commented Jan 8, 2024

Trivy scan found the following vulnerabilities:

  • HIGH, Target: docker.io/grafana/loki:main-61a4205 (alpine 3.18.4), Type: alpine openssl: Incorrect cipher key and IV length processing in libcrypto3 v3.1.3-r0. Fixed in v3.1.4-r0
  • HIGH, Target: docker.io/grafana/loki:main-61a4205 (alpine 3.18.4), Type: alpine openssl: Incorrect cipher key and IV length processing in libssl3 v3.1.3-r0. Fixed in v3.1.4-r0
    \nTo see more details on these vulnerabilities, and how/where to fix them, please run docker build -t grafana/loki:main-61a4205 -f cmd/loki/Dockerfile .
    trivy i grafana/loki:main-61a4205 on your branch. If these were not introduced by your PR, please considering fixing them in via a subsequent PR. Thanks!

@jeschkies jeschkies assigned cstyan and unassigned cstyan Jan 9, 2024
@jeschkies jeschkies requested a review from cstyan January 9, 2024 17:11
@jeschkies jeschkies force-pushed the karsten/first-over-time branch from e70ccb5 to 8f589fe Compare January 11, 2024 09:45
Copy link
Contributor

@cstyan cstyan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm extended this to include implementing last_over_time here.

My only comments would be:

  • that renaming first_over_time.go to something more generic, or first_last_over_time.go like I did in my branch, probably makes sense
  • some comments would be useful (such as where I commented on the PR) for first time readers of the code

pkg/logql/first_over_time.go Outdated Show resolved Hide resolved
pkg/logql/first_over_time.go Outdated Show resolved Hide resolved
T: ts,
//T: ts,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we can do it like this. What do you think, @cstyan?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what you mean?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current code on main overrides he timestamps in each vector.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't the timestamp from the actual vector samples more accurate than the one got from stepEvaluator.Next()?

Can you help me understand why it won't make sense?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does make sense for us. I'm just not sure some other component is relying on this overridden timestamp.

pkg/logql/first_over_time.go Outdated Show resolved Hide resolved
Copy link
Contributor

@kavirajk kavirajk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. thanks for addressing the feedbacks.

Still this one applies I think.

I see you added tests in TestMappingEquivalence. But do you think should be also added on other tests like TestRangeMappingEquivalence on downstream engine tests? Rationale being the later tests for more cases of different range aggregation.

approving to unblock.

@cstyan
Copy link
Contributor

cstyan commented Feb 20, 2024

hey @kavirajk, actually something fun I learned this past week when finally finding the cause of the issues with our quantile_over_time, TestMappingEquivalence is actually testing the results equivalency for each query as a range query. What we don't have (AFAICT) is tests for the results equivalency for instant queries. The test names here are actually a bit misleading; TestMappingEquivalence is testing for equivalency with sharded queries while TestRangeMappingEquivalence is testing for equivalency with time range split queries.

@cstyan cstyan changed the title Shard first_over_time and last_over_time queries. feat: improve performance of first_over_time and last_over_time queries by sharding them Apr 23, 2024
@cstyan
Copy link
Contributor

cstyan commented Apr 23, 2024

@jeschkies any opinions on merging this as is vs as behind a feature flag? given that it's not being executed via a probabilistic data structure I don't think we need a feature flag

@cstyan cstyan merged commit f66172e into grafana:main May 13, 2024
58 checks passed
cstyan added a commit that referenced this pull request May 13, 2024
…ueries by sharding them (#11605)

Signed-off-by: Callum Styan <[email protected]>
Co-authored-by: Callum Styan <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants