Grafana dashboard not displaying any logs in loki panels #761

Closed
candlerb opened this issue Apr 17, 2024 · 8 comments
@candlerb (Contributor) commented Apr 17, 2024

Required information

  • Distribution: Ubuntu
  • Distribution version: 22.04
  • Incus: 6.0-202404040310-ubuntu22.04

Issue description

This is an issue with the Grafana dashboard, https://grafana.com/grafana/dashboards/19727-incus/

The metrics panels work fine, but the Loki panels at the bottom are empty. This is because their LogQL queries are wrong.

They have {app="incus",type="lifecycle",instance="$job"} for the first panel, and {app="incus",type="logging",instance="$job"} for the second.

However, the "instance" label in logs doesn't contain the job name or the container name. Also, the $job variable in Grafana is the Prometheus scrape job name and has nothing to do with Loki logs.

Here are some example logs:

# logcli query '{app="incus"}' --since=1h --limit=2000
2024/04/17 08:18:06 http://localhost:3100/loki/api/v1/query_range?direction=BACKWARD&end=1713341886833710643&limit=1000&query=%7Bapp%3D%22incus%22%7D&start=1713338286833710643
2024/04/17 08:18:06 Common labels: {app="incus", instance="none", location="none"}
2024-04-17T07:57:25Z {name="netbox4", project="default", type="lifecycle"} requester-username="root" action="instance-started" source="/1.0/instances/netbox4" requester-address="@" requester-protocol="unix" instance-started
2024-04-17T07:57:25Z {type="logging"}                                      context-action="start" context-created="2024-04-03 13:50:03.830202559 +0000 UTC" context-ephemeral="false" context-instance="netbox4" context-instanceType="container" context-project="default" context-stateful="false" context-used="2024-04-03 13:50:09.473557693 +0000 UTC" level="info" Started instance
2024-04-17T07:57:24Z {type="logging"}                                      context-action="start" context-created="2024-04-03 13:50:03.830202559 +0000 UTC" context-ephemeral="false" context-instance="netbox4" context-instanceType="container" context-project="default" context-stateful="false" context-used="2024-04-03 13:50:09.473557693 +0000 UTC" level="info" Starting instance
2024-04-17T07:57:23Z {name="netbox4", project="default", type="lifecycle"} requester-protocol="unix" requester-username="root" action="instance-shutdown" source="/1.0/instances/netbox4" requester-address="@" instance-shutdown
2024-04-17T07:57:20Z {type="logging"}                                      context-action="shutdown" context-created="2024-04-03 13:50:03.830202559 +0000 UTC" context-ephemeral="false" context-instance="netbox4" context-instanceType="container" context-project="default" context-timeout="10m0s" context-used="2024-04-03 13:50:09.473557693 +0000 UTC" level="info" Shutting down instance
2024-04-17T07:55:50Z {name="nfsen", project="default", type="lifecycle"}   source="/1.0/instances/nfsen" context-command="[su -l]" action="instance-exec" instance-exec
2024-04-17T07:41:45Z {type="logging"}                                      level="info" Done updating images
2024-04-17T07:41:45Z {type="logging"}                                      context-err="Failed getting remote image info: Failed getting image: The requested image couldn't be found" context-fingerprint="d9802562f513aa78eb45bfb68a47194b144da9a08639e6fcf8c54f9b261e3c65" level="error" Failed to update the image
2024-04-17T07:41:44Z {type="logging"}                                      context-err="Failed getting remote image info: Failed getting image: The requested image couldn't be found" context-fingerprint="c533845b5db1747674ee915cbb20df6eb47c953bb7caf1fec5b35ae9ccf98c18" level="error" Failed to update the image
2024-04-17T07:41:41Z {type="logging"}                                      level="info" Done pruning expired backups
2024-04-17T07:41:41Z {type="logging"}                                      level="info" Pruning expired backups
2024-04-17T07:41:41Z {type="logging"}                                      level="info" Updating images
2024-04-17T07:40:43Z {type="logging"}                                      level="info" Done updating images
2024-04-17T07:40:43Z {type="logging"}                                      context-err="Failed getting remote image info: Failed getting image: The requested image couldn't be found" context-fingerprint="d9802562f513aa78eb45bfb68a47194b144da9a08639e6fcf8c54f9b261e3c65" level="error" Failed to update the image
2024-04-17T07:40:42Z {type="logging"}                                      level="info" Done pruning expired backups
2024-04-17T07:40:42Z {type="logging"}                                      level="info" Updating images
2024-04-17T07:40:42Z {type="logging"}                                      level="info" Pruning expired backups
2024-04-17T07:35:46Z {type="logging"}                                      level="info" Done pruning expired backups
2024-04-17T07:35:46Z {type="logging"}                                      level="info" Done updating images
2024-04-17T07:35:46Z {type="logging"}                                      level="info" Updating images
2024-04-17T07:35:46Z {type="logging"}                                      level="info" Pruning expired backups
2024/04/17 08:18:06 http://localhost:3100/loki/api/v1/query_range?direction=BACKWARD&end=1713339346929598024&limit=1000&query=%7Bapp%3D%22incus%22%7D&start=1713338286833710643
2024/04/17 08:18:06 Common labels: {app="incus", instance="none", location="none", type="logging"}
#

Notice how some logs have the name of the container as a label (name="netbox4"), but some other logs relating to this container don't. They may have it buried in the logfmt data though, e.g. context-instance="nfsen" or source="/1.0/instances/nfsen". If you filter logs by container, you still want to see those logs.
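To illustrate what "buried in the logfmt data" means, here is a minimal Python sketch (not Loki's actual parser; it ignores unquoted values, escaped quotes, and the trailing free-text message) that extracts the key="value" pairs from such a line:

```python
import re

def parse_logfmt(line: str) -> dict:
    # Grab key="value" pairs; real logfmt also allows unquoted values
    # and escape sequences, which this sketch ignores.
    return dict(re.findall(r'([\w-]+)="([^"]*)"', line))

line = ('context-action="start" context-instance="netbox4" '
        'context-project="default" level="info" Started instance')
fields = parse_logfmt(line)
print(fields["context-instance"])  # netbox4
```

So even when the stream has no name label, the container name is still recoverable from fields like context-instance.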

I propose that at simplest, the queries need to change to:

# option 1 (simple)
{app="incus", type="lifecycle", name=~"|$name", project=~"|$project"}
{app="incus", type="logging", name=~"|$name", project=~"|$project"}

The vertical bar inside the regexp is because the "name" and "project" labels may be missing (even for logs specific to one container), so we must allow through lines where this label is missing.
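In Python terms (Loki anchors the =~ matcher to the full label value, like re.fullmatch), with $name expanded to the hypothetical container name netbox4:

```python
import re

# "|netbox4" has an empty first alternative, so an absent label
# (which Loki treats as the empty string) also matches.
pattern = "|netbox4"  # what name=~"|$name" expands to for $name=netbox4
assert re.fullmatch(pattern, "")           # label missing: line kept
assert re.fullmatch(pattern, "netbox4")    # selected container: line kept
assert not re.fullmatch(pattern, "nfsen")  # other container: line dropped
```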

However, that will also show logs for other containers, when those logs have no name or project label. A bit of additional filtering can ensure that the container name appears somewhere in the log line:

# option 2 (more selective)
{app="incus", type="lifecycle", name=~"|$name", project=~"|$project"} |~ "$name"
{app="incus", type="logging", name=~"|$name", project=~"|$project"} |~ "$name"
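The |~ filter is an unanchored regex match over the whole log line, which is both the benefit and the caveat here. A quick Python illustration (container names are hypothetical):

```python
import re

# |~ "$name" keeps any line that mentions the name anywhere, so lines
# that only carry the container in the logfmt data still pass.
line = 'context-instance="netbox4" level="info" Started instance'
assert re.search("netbox4", line)  # kept when $name=netbox4
# The catch: a name that is a prefix of another name also matches.
assert re.search("netbox", line)   # "netbox" would also keep this line
```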

This now works as expected:

(screenshots: Loki panels now showing logs)

However, if one container name is a prefix of another container name, or two projects have containers with the same name, it may show some logs for another container. A more sophisticated filter is possible:

# option 3 (complex)
{app="incus", type="lifecycle", name=~"|$name", project=~"|$project"}  | logfmt | context_instance=~"|$name" | context_project=~"|$project"
{app="incus", type="logging", name=~"|$name", project=~"|$project"}  | logfmt | context_instance=~"|$name"  | context_project=~"|$project"

This assumes that every log relating to container X either has label name="X" or the log message contains context_instance="X". (Note that hyphens in logfmt attributes are converted to underscores, so that they become valid LogQL label names)
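A rough sketch of that key sanitization (not Loki's exact implementation, which also has rules for things like leading digits):

```python
import re

def sanitize_label(key: str) -> str:
    # Replace every character that is not valid in a LogQL label name
    # with "_"; this is why context-instance is queried as context_instance.
    return re.sub(r"[^A-Za-z0-9_]", "_", key)

print(sanitize_label("context-instance"))  # context_instance
```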

This is true for the logs shown above. In fact, in these examples the lifecycle logs all have name="X",project="Y" and the event logs have context_instance="X",context_project="Y", so the queries can simplify to:

# option 4 (final)
{app="incus", type="lifecycle", name=~"|$name", project=~"|$project"}
{app="incus", type="logging"}  | logfmt | context_instance=~"|$name" | context_project=~"|$project"

I've tried this and it works for me. However, I'm not sure whether those conditions hold in general for all possible logs from Incus. It could be argued that this hard-codes too much information about the log attributes.

Steps to reproduce

  1. Install loki
  2. On incus host, turn on Loki logging: incus config set loki.api.url=http://loki.example.net:3100
  3. Start and stop a container, check it creates some logs in Loki
  4. Install grafana, add Loki as data source. Use "Explore" to browse Loki logs and check you can see the incus logs.
  5. Install the incus grafana dashboard, 19727
  6. Open the incus dashboard, find that no Loki logs are visible
  7. Edit panels and adjust the queries to those given above; logs appear
  8. Try changing the selected project and/or container, and check that logs are filtered appropriately.

Cross-reference

Issue appears to be inherited from lxd dashboard, raised previously: canonical/lxd#13165

@stgraber (Member)

You need to set loki.instance to match the name of your prometheus job as that's what we get through the dropdown at the top of the dashboard.

@candlerb (Contributor, Author)

But if you do that, you lose the visibility of which host each log message came from. If your prometheus job for scraping metrics is called "incus" then you'll have to set loki.instance="incus" on all nodes, and then all logs will just say instance="incus". That isn't very useful.

Whereas for metrics, instance="XXX" tells you the host which is scraped, and therefore which host the container is running on. This is true even for clusters according to the documentation:

In a cluster environment, Incus returns only the values for instances running on the server that is being accessed. Therefore, you must scrape each cluster member separately.

It would be strange if metrics had instance="nuc1", instance="nuc2", instance="nuc3" but logs all had instance="incus".

@stgraber (Member)

You don't, the location field still tells you where things are.

@stgraber (Member)

(screenshot)

@candlerb (Contributor, Author)

candlerb commented Apr 17, 2024

I get location="none" in my loki logs (you can see it in the examples above), and I don't see a way to override it. Unless you mean I should use loki.labels to set it?

I also get instance="none" when loki.instance is unset (reported separately at #762). Documentation does say it should default to the hostname, which implies it should work like a prometheus instance label.

@stgraber (Member)

Yeah, those two fields seem wrong in the non-clustered case: both should default to your hostname, and instance should be overridable through loki.instance.

@stgraber (Member)

From your suggested changes above, I'll be taking the project=~"|$project" part, as that makes sense and works well.

The $name part doesn't work because when you select All, it will pass a regexp of all individual instances through the filter rather than passing an empty string. So in my case it means passing several thousand instance names through, which seriously impacts Loki, and it also means that any log or lifecycle event which doesn't have a name field set won't work.

Also name in a Loki entry doesn't necessarily mean instance name, if the event is network or storage related, it may refer to a network or storage pool. So I think we need to stay away from filtering based on instances for now.

@candlerb (Contributor, Author)

The $name part doesn't work because when you select All, it will pass a regexp of all individual instances through the filter rather than passing an empty string. So in my case it means passing several thousand instance names through, which seriously impacts Loki, and it also means that any log or lifecycle event which doesn't have a name field set won't work.

Oops, I had hoped it would do something sensible like empty string or ".*".

However, apparently there is a "custom all value" setting for dashboard variables.
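For reference, the custom all value is set per dashboard variable; in the dashboard JSON the relevant fragment would look roughly like this (field names from Grafana's dashboard variable model; the variable name and value here are illustrative):

```json
{
  "name": "name",
  "includeAll": true,
  "allValue": ".*"
}
```

With allValue set to ".*", selecting All expands $name to a match-everything regex instead of an enumerated list of every instance name.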

Also name in a Loki entry doesn't necessarily mean instance name, if the event is network or storage related, it may refer to a network or storage pool. So I think we need to stay away from filtering based on instances for now.

OK, fair enough - although if "All" did match .* then it would be OK.

stgraber added a commit to stgraber/incus that referenced this issue Apr 17, 2024
@tych0 tych0 closed this as completed in 9d31814 Apr 17, 2024
simondeziel pushed a commit to simondeziel/lxd that referenced this issue May 3, 2024
Closes canonical#13165 and lxc/incus#761

Signed-off-by: Stéphane Graber <[email protected]>
(cherry picked from commit 9d31814a93669e38d6b6a2a8215175f546f582d1)
Signed-off-by: Simon Deziel <[email protected]>
License: Apache-2.0
stgraber added a commit that referenced this issue May 27, 2024
tomponline pushed a commit to tomponline/lxd that referenced this issue Jun 6, 2024