Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Interval parameter for log queries #1779

Closed
slim-bean opened this issue Mar 6, 2020 · 7 comments
Closed

Interval parameter for log queries #1779

slim-bean opened this issue Mar 6, 2020 · 7 comments
Labels
keepalive An issue or PR that will be kept alive and never marked as stale.

Comments

@slim-bean
Copy link
Collaborator

Recently I created #1773 and after merging realized that Grafana defaults to sending a step with log queries. I would consider this a bug but be that as it may keeping #1773 would result in some pretty nasty unexpected behavior when using Grafana against Loki (even if we fixed Grafana old versions would still be sending this parameter)

#1773 was reverted by #1777 however I would still like to add this feature and I can think of a couple ways to do it:

  • Use a new parameter, I am suggesting log_step
  • Use a new header such as: Accept: application/json;version=2

Some more thoughts:

I think the step functionality is worth having in Loki. While Loki is a logs database, more specifically it is a string database extended from a metric database (Cortex) and I think it's a valid use case to store string style metrics in Loki (e.g. GPS coordinates, or a more complex json doc of metrics), as such the step parameter is of great value at query time to allow range queries which return sane result sizes over different time periods.

At this point it's much too dangerous to just start using the "step" for log style queries, we use it correctly for metric style queries but previously it was just ignored by Loki for log queries.

I think of the two approaches mentioned above, I prefer the new parameter log_step for a few reasons, first it makes URL's easy to write and send (setting a header is tedious), second within our code there were also assumptions made about the use of step only being for metric queries and as a result it has specific rules applied to it which don't apply to log queries. #1773 handled this by parsing the query very early to determine the type before applying these rules to the incoming step parameter. Using a new parameter would make it more obvious which rules are applied and why within the Loki codebase.

Looking for thoughts/feedback.

@slim-bean
Copy link
Collaborator Author

I've been waffling about the idea of supporting both actually, that way if clients want to be able to issue queries using step as intended for both log and metric queries they can do so by setting the header correctly. I worry that supporting both would be confusing however

@rfratto
Copy link
Member

rfratto commented Mar 6, 2020

Personally between the two, I think the header is the right way to go. In general, I agree that setting a header is a little tedious but tooling helps with making this easy. Arguably, I'd say URLs are already not easy to write if your query is sufficiently complicated.

Another option might be to introduce the functionality as another API endpoint, something like /loki/api/v1/query_logs (as opposed to query_range). We could also introduce /loki/api/v2/query_range, although I'm less of a fan here, even though your initial PR was technically backwards incompatible.

@slim-bean
Copy link
Collaborator Author

After some discussions and thinking about this for a bit. I'm moving farther away from using step or anything similar to step.

It seems something more like interval might be appropriate here, or perhaps sample as we are basically looking to sample the values at a given interval.

@stale
Copy link

stale bot commented Apr 22, 2020

This issue has been automatically marked as stale because it has not had any activity in the past 30 days. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale A stale issue or PR that will automatically be closed. label Apr 22, 2020
@cyriltovena cyriltovena removed the stale A stale issue or PR that will automatically be closed. label Apr 23, 2020
@slim-bean slim-bean changed the title Handling step parameter for log queries interval parameter for log queries Apr 27, 2020
@slim-bean slim-bean changed the title interval parameter for log queries Interval parameter for log queries Apr 27, 2020
@slim-bean slim-bean added the keepalive An issue or PR that will be kept alive and never marked as stale. label Apr 27, 2020
@slim-bean
Copy link
Collaborator Author

#1965 adds support for an interval query parameter which implements the original outlined use case:

** Downsampling of a log stream to return results >= requested interval **

Note, it does not fill in missing values, the logic just looks for the next entry which is >= interval.

After some discussions with @cyriltovena we both agreed there is probably a better store here for implementing this via LogQL, something like a function entry_over_time or entry_at_time where you can specify entry_at_time({job="app"}[1m]) and a step of say 1m which would evaluate the function every 1m and look back at most 1m for an entry and just return a single entry.

In this implementation step can be used to accomplish what interval is accomplishing in this PR and there would no longer be a need for a dedicated parameter to perform this function.

This feels like a more natural solution to the problem within the LogQL language.

That being said, we are going to merge #1965 in an experimental state to get feedback and make this capability available now.

Any users of interval please add your use cases to this issue! This will help us decide what to do with this in the future.

@slim-bean
Copy link
Collaborator Author

Will revist this at a later date, closing as it's not relevant at this time.

@jfolz
Copy link
Contributor

jfolz commented Mar 13, 2022

My use case for interval:
An intermediate API service that mimics Prometheus/Simplejson datasources. This service is currently necessary since LogQL can only extract one numeric key per query and said key needs to be known ahead of time. Our logs contain and a variable number of arbitrary keys. We also generate XY-plots from logs where the X value is automatically selected from the available keys according to some rules (special keys, monotonicity, etc).

Some way to limit the number of samples per time is necessary to make sure that even if the line limit is reached the full time range of the query is covered. We currently implement this with a dynamic values for the interval parameter. Sadly it doesn't really work right now.

Edit: turns out interval is broken for our use case, see #5613

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
keepalive An issue or PR that will be kept alive and never marked as stale.
Projects
None yet
Development

No branches or pull requests

4 participants