'sum by' inconsistent when grouped by multiple labels (Loki as Prometheus data source) #2334
Comments
Hello @Kayakflo! Let's start by updating your Loki instance to the latest version; we have fixed a couple of those bugs recently. If you still see the same behavior, I'll dig into it more. Thanks!
Hi @cyriltovena, thank you very much for your fast response! Please let me know if I can provide more information here!
Well, it's on my list now. If you get a chance, try the latest version too.
Sorry for misreading your first response. Thank you for noting it down!
Sorry about that, I guess I always ordered all my labels correctly and never stumbled upon this one.
Thanks again @Kayakflo for the detailed issue! This was a great bug to find and fix!
Awesome, thanks a lot for your help and the quick responses!
Environment:
OS: Ubuntu 18.04
Docker: 5:19.03.12
Loki: 1.4.1
Promtail: 1.4.1
Grafana: 6.7.3
Describe the bug
Disclaimer: Loki is my first contact with LogQL and Prometheus functions.
This might not be a bug after all, but I have not found any online resource that helped me understand the observed behavior.
I am using Loki as a Prometheus data source in Grafana to display the rate of certain messages and add an alert to it.
The following behavior was visible in several different queries, so here is just one example query, in which I want to monitor the rate of messages that include the label rh_unknown="true":
sum by (rh_customer, rh_stage, severity) (rate({app="my-app",rh_unknown="true",rh_customer=~"customerA|customerB",severity=~"$Severity", rh_stage=~"$Stage"}[15s])) * 15
For this example, you can assume that all variables are set to "All".
The query above works fine; you can find its output in the attached file 'customer-stage-severity.json'.
The data displayed matches the actual log lines.
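For reference, the "* 15" at the end of the query turns the per-second rate back into a count per 15-second window. LogQL documents rate() over a log range as the number of entries per second, so multiplying a [15s] rate by 15 should give the number of matching lines in each window, i.e. what count_over_time(...[15s]) returns directly. A minimal Python sketch, assuming rate() is simply the entry count divided by the range length in seconds and that the window excludes its start time; the helper names and timestamps are hypothetical:

```python
def count_in_window(timestamps, t, window):
    """Count log lines with t - window < ts <= t (left-open window is an assumption)."""
    return sum(1 for ts in timestamps if t - window < ts <= t)


def rate(timestamps, t, window):
    """Per-second rate over the window, assumed to be count / window seconds."""
    return count_in_window(timestamps, t, window) / window


# Hypothetical log-line timestamps, in seconds.
lines = [1, 3, 5, 14, 16, 29]

per_second = rate(lines, t=15, window=15)
# Equivalent of "rate(...[15s]) * 15": back to lines per 15-second window.
per_window = per_second * 15
```

At t=15 the window (0, 15] contains four lines, so per_window is 4 up to floating-point rounding.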
If I change the order inside 'sum by' to the following, the displayed number of messages per group changes every time I run the query:
sum by (rh_stage, severity, rh_customer) (rate({app="my-app",rh_unknown="true",rh_customer="customerA|customerB",severity=~"$Severity", rh_stage=~"$Stage"}[15s])) * 15
It looks as if Loki (or Grafana?) no longer knows for sure which severity each message has, so the distribution changes every time. The more severity types are present, the more often this variation happens.
You can find the output in the attached files 'stage-severity-customer-1' and 'stage-severity-customer-2', which contain the same query run twice in a row.
Things look even more concerning if I change the order to this:
sum by (severity, rh_stage, rh_customer) (rate({app="my-app",rh_unknown="true",rh_customer="customerA|customerB",severity=~"$Severity", rh_stage=~"$Stage"}[15s])) * 15
This time, data is not only switching between severities within the same customer but also between customers.
In that sense, it looks like completely randomized data on each query, which is also reflected by the changing colors in Grafana.
The more customers are selected, the more often this variation happens.
You can find the output in the attached files 'severity-stage-customer-1' and 'severity-stage-customer-2', which contain the same query run twice in a row.
The total sum of all counters remains steady and correct, so no data is added or removed between queries.
The distribution over time also remains steady and correct.
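A simplified model of how these symptoms can arise (an assumption for illustration, not Loki's actual code): Prometheus label helpers such as Labels.HashForLabels expect the grouping names in ascending order, and the first, working query happens to list its labels alphabetically, which fits the maintainer's remark about always ordering labels. The Python sketch builds each group key with a single merge pass over the sorted series labels and the grouping list. If the grouping list is not sorted, some labels are silently skipped, different series fall into the same group, and the labels shown for that group come from whichever series arrived first. All function names, label values, and sample numbers are hypothetical.

```python
def group_key(series, grouping):
    """Pick the grouping labels out of `series` with one merge pass.

    Both name lists are walked in ascending order, so the result is only
    correct when `grouping` is itself sorted (hypothetical helper).
    """
    names = sorted(series)
    key = []
    i = j = 0
    while i < len(names) and j < len(grouping):
        if names[i] == grouping[j]:
            key.append((names[i], series[names[i]]))
            i += 1
            j += 1
        elif names[i] < grouping[j]:
            i += 1  # series label sorts before the current grouping name: skip it
        else:
            j += 1  # grouping name not found at this position: skip it


    return tuple(key)


def sum_by(samples, grouping, sort_grouping=True):
    """Hypothetical 'sum by (grouping)': sum sample values per group key.

    Returns a sorted list of (labels, total) pairs. The labels shown for a
    group are copied from the first series that lands in it.
    """
    if sort_grouping:
        grouping = sorted(grouping)  # the fix: always walk names in order
    groups = {}
    for series, value in samples:
        key = group_key(series, grouping)
        if key not in groups:
            shown = tuple(sorted((n, series.get(n, "")) for n in grouping))
            groups[key] = [shown, 0.0]
        groups[key][1] += value
    return sorted((shown, total) for shown, total in groups.values())


# Hypothetical series and values.
series_a = {"app": "my-app", "rh_customer": "customerA", "rh_stage": "prod", "severity": "error"}
series_b = {"app": "my-app", "rh_customer": "customerB", "rh_stage": "prod", "severity": "error"}
grouping = ["rh_stage", "severity", "rh_customer"]  # same order as the second query

run1 = [(series_a, 3.0), (series_b, 5.0)]
run2 = [(series_b, 5.0), (series_a, 3.0)]  # same data, different arrival order
```

With sort_grouping=True, both runs yield two groups (3 for customerA, 5 for customerB). With sort_grouping=False, the merge pass skips rh_customer, so both series collapse into one group with total 8. Which customer that group is labeled with then depends only on arrival order, while the overall total stays the same.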
To Reproduce
sum by
Expected behavior
Unless I have just not found the right documentation, the order of the labels in 'sum by' should have no influence on the results, and the results should remain constant after changing it.
Screenshots, Promtail config, or terminal output
I have attached named responses in a ZIP archive.
I can provide screenshots if needed.
20200709 - Loki Findings.zip