
Loki count_over_time null handling goes wrong #4420

Closed
BitProcessor opened this issue Sep 22, 2021 · 20 comments · Fixed by #4457
Labels
component/logql · stale (A stale issue or PR that will automatically be closed.) · type/bug (Something is not working as expected)

Comments

@BitProcessor

What happened:
When running the following Loki query in an Explore window:

sum(count_over_time({namespace="my-namespace", container="my-container"} | json | __error__ = "" | level="ERROR" [$__range]))

(or sum(count_over_time({namespace="my-namespace", container="my-container"} | json | level="ERROR" | __error__ = "" [$__range])) )

where the resulting set is empty (as in: no logs found with level ERROR in the selected range), an error occurs:

Both Safari 15.0 and FF 92.0 throw a JS error:
[Screenshot 2021-09-22 at 10 29 47]

[Screenshot 2021-09-22 at 10 32 51]

Resulting in (Safari vs Firefox):
[Screenshot 2021-09-22 at 10 49 16]

[Screenshot 2021-09-22 at 10 49 32]

Adding the same query to a Stat panel results in:
[image]
with the error in the top left, showing the same error as mentioned above.

What you expected to happen:
A sum over a count of logs should result in 0 when no logs are found?
(instead of null)

How to reproduce it (as minimally and precisely as possible):
Run a Loki sum(count_over_time()) query over a set of logs that returns no results in the selected range

Anything else we need to know?:

Environment:

  • Grafana version: 8.1.2
  • Data source type & version: Loki 2.3
  • OS Grafana is installed on: MacOS 11.6
  • User OS & Browser: mentioned above
  • Grafana plugins: none
  • Others:
@BitProcessor
Author

FYI: originally posted in the #loki Slack channel: https://grafana.slack.com/archives/CEPJRLQNL/p1632298704148900

@ifrost

ifrost commented Sep 27, 2021

Grafana version: 8.1.2
Data source type & version: Loki 2.3

@BitProcessor I wasn't able to reproduce it with Loki 2.3.0, but it happens with the latest Loki Docker image. It's caused by a difference in the Loki response: the result property is null instead of []. @grafana/loki-team is this expected behaviour?

Example API call: http://localhost:3100/loki/api/v1/query_range?query=count_over_time({foo=%22bar%22}[1m])

Results:
2.3.0:

{
  "status": "success",
  "data": {
    "resultType": "matrix",
    "result": [],
    "stats": {
      "summary": {
        "bytesProcessedPerSecond": 0,
        "linesProcessedPerSecond": 0,
        "totalBytesProcessed": 0,
        "totalLinesProcessed": 0,
        "execTime": 0.0029421
      },
      "store": {
        "totalChunksRef": 0,
        "totalChunksDownloaded": 0,
        "chunksDownloadTime": 0,
        "headChunkBytes": 0,
        "headChunkLines": 0,
        "decompressedBytes": 0,
        "decompressedLines": 0,
        "compressedBytes": 0,
        "totalDuplicates": 0
      },
      "ingester": {
        "totalReached": 1,
        "totalChunksMatched": 0,
        "totalBatches": 0,
        "totalLinesSent": 0,
        "headChunkBytes": 0,
        "headChunkLines": 0,
        "decompressedBytes": 0,
        "decompressedLines": 0,
        "compressedBytes": 0,
        "totalDuplicates": 0
      }
    }
  }
}

latest:

{
  "status": "success",
  "data": {
    "resultType": "matrix",
    "result": null,
    "stats": {
      "summary": {
        "bytesProcessedPerSecond": 0,
        "linesProcessedPerSecond": 0,
        "totalBytesProcessed": 0,
        "totalLinesProcessed": 0,
        "execTime": 0.0099657
      },
      "store": {
        "totalChunksRef": 0,
        "totalChunksDownloaded": 0,
        "chunksDownloadTime": 0,
        "headChunkBytes": 0,
        "headChunkLines": 0,
        "decompressedBytes": 0,
        "decompressedLines": 0,
        "compressedBytes": 0,
        "totalDuplicates": 0
      },
      "ingester": {
        "totalReached": 1,
        "totalChunksMatched": 0,
        "totalBatches": 0,
        "totalLinesSent": 0,
        "headChunkBytes": 0,
        "headChunkLines": 0,
        "decompressedBytes": 0,
        "decompressedLines": 0,
        "compressedBytes": 0,
        "totalDuplicates": 0
      }
    }
  }
}
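
For context, one common way a null-vs-[] difference like this arises in a Go service is the distinction encoding/json makes between a nil slice and an initialized empty slice. A minimal sketch, using illustrative struct names rather than the actual Loki response types:

package main

import (
	"encoding/json"
	"fmt"
)

// queryData mirrors the relevant part of the query_range response body;
// the field names are illustrative only, not the actual Loki structs.
type queryData struct {
	ResultType string        `json:"resultType"`
	Result     []interface{} `json:"result"`
}

func main() {
	// A nil slice marshals to JSON null ...
	d := queryData{ResultType: "matrix", Result: nil}
	b, _ := json.Marshal(d)
	fmt.Println(string(b)) // {"resultType":"matrix","result":null}

	// ... while an initialized empty slice marshals to [].
	d.Result = []interface{}{}
	b, _ = json.Marshal(d)
	fmt.Println(string(b)) // {"resultType":"matrix","result":[]}
}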

@cyriltovena
Contributor

No, it's not expected.

@ifrost

ifrost commented Sep 27, 2021

No, it's not expected.

Can I move this issue to Loki's repo?

@ifrost ifrost transferred this issue from grafana/grafana Oct 6, 2021
kavirajk added a commit that referenced this issue Oct 11, 2021
The query HTTP handler is wrapped with many middlewares for different purposes.
We want `serverutil.JSONMiddleware` to be wrapped on top of `serverutil.NewPrepopulatedMiddleware`,
the reason being that we need `PrepopulatedMiddleware.next` to have the proper `Content-Type` set to `application/json; charset=UTF-8`.
Without that, empty array responses are converted into a `null` value in the HTTP response.
Fixes: #4420

Signed-off-by: Kaviraj <[email protected]>
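
The wrapping-order dependency this commit message describes is generic to stacked HTTP middleware. Below is a minimal sketch with plain net/http of one way it could play out; setJSONContentType, normalizeEmptyResult, and the rewrite logic are placeholders for illustration, not the actual serverutil code (the real change is in #4457):

// Placeholder middleware sketch; not the actual Loki serverutil implementation.
package main

import (
	"bytes"
	"net/http"
)

const contentTypeJSON = "application/json; charset=UTF-8"

// setJSONContentType sets the JSON Content-Type before calling the next handler.
func setJSONContentType(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", contentTypeJSON)
		next.ServeHTTP(w, r)
	})
}

// normalizeEmptyResult rewrites "result":null to "result":[], but only when the
// JSON Content-Type has already been set by the time it runs, which is why the
// wrapping order matters.
func normalizeEmptyResult(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if w.Header().Get("Content-Type") != contentTypeJSON {
			next.ServeHTTP(w, r) // not recognized as JSON yet: pass the body through untouched
			return
		}
		rec := &bufferingWriter{ResponseWriter: w}
		next.ServeHTTP(rec, r)
		w.Write(bytes.ReplaceAll(rec.buf.Bytes(), []byte(`"result":null`), []byte(`"result":[]`)))
	})
}

// bufferingWriter captures the response body so it can be rewritten before sending.
type bufferingWriter struct {
	http.ResponseWriter
	buf bytes.Buffer
}

func (b *bufferingWriter) Write(p []byte) (int, error) { return b.buf.Write(p) }

func main() {
	final := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte(`{"status":"success","data":{"resultType":"matrix","result":null}}`))
	})
	// With the JSON middleware on the outside, normalizeEmptyResult sees the
	// Content-Type and rewrites null to []; swap the two and the null leaks through.
	http.ListenAndServe(":3100", setJSONContentType(normalizeEmptyResult(final)))
}
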
@kavirajk
Contributor

@BitProcessor @ifrost Can you confirm you are seeing this issue when running Loki in single-binary mode?

This PR #4457 fixes it!

@BitProcessor
Author

BitProcessor commented Oct 12, 2021

@kavirajk @cyriltovena see #4457 (comment)
I was encountering the issue in distributed mode

@kavirajk kavirajk reopened this Oct 13, 2021
@kavirajk
Contributor

kavirajk commented Oct 13, 2021

@BitProcessor Thanks for confirming! It's odd, though, that I couldn't reproduce the issue with v2.3.0 (only on the tip of the main branch, and even then only in single-binary mode). I re-opened the issue.

I will dig deeper to see if I missed anything!

@ifrost

ifrost commented Oct 13, 2021

@kavirajk it was happening for me on the Docker image when running a query with a narrow time range. I just checked the latest Docker image and it's working fine 👍 Just to double-check, I also quickly compared with an older image (7-day-old main-04ab08d) and the same query fails there. Thanks!

@stale

stale bot commented Mar 3, 2022

Hi! This issue has been automatically marked as stale because it has not had any
activity in the past 30 days.

We use a stalebot among other tools to help manage the state of issues in this project.
A stalebot can be very useful in closing issues in a number of cases; the most common
is closing issues or PRs where the original reporter has not responded.

Stalebots are also emotionless and cruel and can close issues which are still very relevant.

If this issue is important to you, please add a comment to keep it open. More importantly, please add a thumbs-up to the original issue entry.

We regularly sort for closed issues which have a stale label sorted by thumbs up.

We may also:

  • Mark issues as revivable if we think it's a valid issue but isn't something we are likely
    to prioritize in the future (the issue will still remain closed).
  • Add a keepalive label to silence the stalebot if the issue is very common/popular/important.

We are doing our best to respond, organize, and prioritize all issues, but it can be a challenging task;
our sincere apologies if you find yourself at the mercy of the stalebot.

@stale stale bot added the stale A stale issue or PR that will automatically be closed. label Mar 3, 2022
@stale stale bot closed this as completed Apr 18, 2022
@BitProcessor
Author

If this was fixed, a bit of feedback might have been nice?

@darxriggs
Contributor

This was not fixed. It was automatically closed by the stale bot. It should be reopened.

@slim-bean
Collaborator

Hey @darxriggs, the original issue was reported on quite old versions of Grafana and Loki, are you still seeing it on the latest versions of both?

@BitProcessor
Author

BitProcessor commented Apr 19, 2022

I tested it this morning on Grafana Cloud (8.4.6) and the result is now reported as an empty array instead of null:
[Screenshot 2022-04-19 at 11 42 39]
Which fixes the problem.

My point was that it would have been nice to see some feedback in a ticket, instead of just a bot closing it because of staleness.

@darxriggs
Contributor

Loki 2.5.0 in single binary mode returns "result": [] in case of no matching logs.

I have tried queries like:

  • count_over_time(...)
  • sum(count_over_time(...))
  • sum by (...) (count_over_time(...))

After reading the details of this issue again, I understand that returning an empty array instead of null is the expected output. So I can confirm that this is fixed.

I also had #5074 in mind, which was already mentioned here. It is about the same function but a different expected output.

@slim-bean
Collaborator

hey @darxriggs @BitProcessor thanks for confirming the fix!

Apologies about the stalebot; we try our best to keep up, but sometimes we don't get to everything. The stalebot does help us when issues go abandoned, but it's an imperfect science.

@rafaelpirolla

Is there a way to make the resulting [] be displayed as 0 on a Time Series widget?

@BitProcessor
Author

@rafaelpirolla Try adding a second Loki query that outputs a value of 0 at all times, then calculate the result and replace the final value.

[Screenshot 2022-05-03 at 14 16 08]
[Screenshot 2022-05-03 at 14 16 12]

@jonathanrgreene

This doesn't work when showing multiple lines, e.g. with the 'by' operator.

@PRIHLOP

PRIHLOP commented Aug 31, 2022

I also think that no data needs to be counted as null, because it can be used in alerting for no data from pods/containers.
For example:

(sum by(pod,container)(count_over_time({namespace="app"} [60m])) != bool 0) < 1

In this case, the query will return the containers and pods that have no logs.

@PRIHLOP

PRIHLOP commented Sep 9, 2022

Maybe implemented by #7023

@chaudum chaudum added the type/bug Something is not working as expected label Jun 14, 2023