[Loki-distributed] query error open /var/loki/chunks/ #1111
This started working after switching to S3 storage with the following loki-distributed config:
I'm experiencing the same thing with a very similar config to yours, but using Azure Blob Storage.
Thanks @danielserrao
I have the same problem after restarting some components. Does anyone have a solution for this?
The file exists and all permissions are OK.
When I use loki-simple-scalable with an NFS storageClass, I sometimes get `open /var/loki/chunks/fake/755005aa5e414340/MTgxMTNjOGM5MGI6MTgxMTQzNmE2NTI6M2RkYjQzYmQ=: no such file or directory`, yet when I enter the write pod, the file exists! The error occurs intermittently.
This is mentioned in the chart README, I think:
> Using filesystem storage in the multi pod setup would require multiple pods to access the same volume, so data is only queryable as long as it's cached in memory.
I could get things working by configuring the volumes:
and creating that folder with write permissions for the pods. Of course, these settings are for local directories, not for volumes on GCS or S3, for example.
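For reference, a minimal sketch of what that kind of volume wiring can look like in the loki-distributed values. The `extraVolumes`/`extraVolumeMounts` keys and the `/data/loki` hostPath are illustrative assumptions, not the poster's actual values; the point is that both the ingester and the querier must see the same directory:

```yaml
# Sketch only: mount one shared host directory into both the ingester and
# the querier so they read and write the same /var/loki/chunks.
# Paths and key names are assumptions for illustration.
ingester:
  extraVolumes:
    - name: loki-chunks
      hostPath:
        path: /data/loki
  extraVolumeMounts:
    - name: loki-chunks
      mountPath: /var/loki
querier:
  extraVolumes:
    - name: loki-chunks
      hostPath:
        path: /data/loki
  extraVolumeMounts:
    - name: loki-chunks
      mountPath: /var/loki
```

A hostPath only works when all pods land on the same node; for a multi-node cluster you would need a shared (RWX) volume instead.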
@aberenshtein Hi, have you solved this issue? I'm hitting the same one. I don't use object storage, just the filesystem (lvm-localpv).
Yes, but I see that the references I put in the value files are outdated.
I'm getting this error when there's high traffic in the cluster. I managed to reproduce it by running the benchmark tool wrk. It seems that when ... Any solution for this?
UPDATE: I'm running the following
For me, the problem was solved by removing the default filesystem configuration. Here is the snippet that removes the extra block:

```yaml
# values.yaml
loki:
  annotations: {}
  ...
  storageConfig:
    boltdb_shipper:
      shared_store: s3
    aws:
      s3: s3://${cluster_region}
      bucketnames: ${bucket_name}
    filesystem: null
```

Notice that the latest generated ConfigMap no longer renders the filesystem section:

```diff
# generated configMap
apiVersion: v1
data:
  config.yaml: |
    auth_enabled: false
    ...
    storage_config:
      aws:
        bucketnames: bucket-for-logs
        s3: s3://${region}
      boltdb_shipper:
        active_index_directory: /var/loki/index
        cache_location: /var/loki/cache
        cache_ttl: 168h
        shared_store: s3
-     filesystem:
-       directory: /var/loki/chunks
```
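One way to confirm that kind of fix took effect is to parse the rendered `config.yaml` and check that `storage_config` no longer carries a `filesystem` block. A small sketch, assuming you have already parsed the generated ConfigMap into a dict (the literal below stands in for the output of a YAML parser on the rendered config):

```python
# Sketch: verify the rendered Loki config no longer falls back to local
# filesystem chunks. The dict below is a stand-in for the parsed
# config.yaml of the generated ConfigMap (assumption: you would obtain
# it by parsing the real ConfigMap, e.g. from `kubectl get cm ... -o yaml`).
rendered = {
    "auth_enabled": False,
    "storage_config": {
        "aws": {"bucketnames": "bucket-for-logs", "s3": "s3://${region}"},
        "boltdb_shipper": {
            "active_index_directory": "/var/loki/index",
            "cache_location": "/var/loki/cache",
            "cache_ttl": "168h",
            "shared_store": "s3",
        },
    },
}


def uses_object_storage_only(config: dict) -> bool:
    """True when chunks go to S3 and no filesystem store remains."""
    storage = config.get("storage_config", {})
    shipper = storage.get("boltdb_shipper", {})
    return "filesystem" not in storage and shipper.get("shared_store") == "s3"


print(uses_object_storage_only(rendered))  # True for the fixed config
```

If the check returns False, the chart is still merging the default filesystem block into the rendered config.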
I have the distributed microservices working in one cluster, but in production I'm facing issues after a couple of weeks. I added a PVC to Grafana and restarted it, and now I am not able to get labels in the Grafana UI: "failed to call resource".
I still have this problem when using the distributed chart to query logs.
As pointed out in a previous comment, the querier and the ingester need access to the same directory. I got it working by creating a PVC outside of the Helm chart, because the chart hard-codes the access mode of the data PVC; I therefore had to create a RWX PVC.
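A sketch of such a manually created claim; the name, size, and `storageClassName` are assumptions for illustration, and the key point is `accessModes: ReadWriteMany` backed by a storage class that actually supports RWX (e.g. NFS):

```yaml
# Sketch: a ReadWriteMany PVC created outside the chart so that ingester
# and querier pods can mount the same chunk directory. Name, size, and
# storageClassName below are assumptions.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: loki-chunks-rwx
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: nfs-client
  resources:
    requests:
      storage: 50Gi
```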
I have Grafana with the Loki datasource pointing to the Loki query-frontend, but I get the following error when making queries.
Sometimes it works, and then it returns the same error for reasons that are not clear to me.
In the logs of the query-frontend pod I can see:
```
caller=logging.go:72 traceID=5c8361c04594c7a2 orgID=fake msg="GET /loki/api/v1/query_range?direction=BACKWARD&limit=1000&query=%7Bjob%3D%22fbit_k8s%22%7D&start=1647619419284000000&end=1647630219285000000&step=5 (500) 53.767877ms Response: \"open /var/loki/chunks/ZmFrZS9kOGU4OGYwOTg3ZTM0NWUyOjE3ZjllMTk0NmE4OjE3ZjllMTk1NmVkOmMwMWFiYmNm: no such file or directory\\n\" ws: false; Accept: application/json, text/plain, */*; Accept-Encoding: gzip, deflate, br; Accept-Language: en-GB,en;q=0.9,en-US;q=0.8; Sec-Ch-Ua: \" Not A;Brand\";v=\"99\", \"Chromium\";v=\"99\", \"Microsoft Edge\";v=\"99\"; Sec-Ch-Ua-Mobile: ?0; Sec-Ch-Ua-Platform: \"Windows\"; Sec-Fetch-Dest: empty; Sec-Fetch-Mode: cors; Sec-Fetch-Site: same-origin; User-Agent: Grafana/8.3.5; X-Forwarded-For: 127.0.0.1, 127.0.0.1; X-Grafana-Org-Id: 1; "
```
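The base64 blob in that path is just the encoded chunk key, so decoding it shows which tenant and chunk the querier was looking for. Reading the decoded fields as `tenant/fingerprint:from:through:checksum` (with hex millisecond timestamps) is my assumption about how the filesystem store names chunks:

```python
import base64

# The filesystem chunk store base64-encodes the chunk key to build the
# on-disk filename; decoding recovers the key from the error message.
encoded = "ZmFrZS9kOGU4OGYwOTg3ZTM0NWUyOjE3ZjllMTk0NmE4OjE3ZjllMTk1NmVkOmMwMWFiYmNm"
key = base64.b64decode(encoded).decode()
print(key)  # fake/d8e88f0987e345e2:17f9e1946a8:17f9e1956ed:c01abbcf

# Assumed field layout: tenant/fingerprint:from:through:checksum,
# with from/through as hex Unix-millisecond timestamps.
tenant, rest = key.split("/", 1)
print(tenant)  # fake -- the single-tenant default when auth_enabled is false
```

Here the `fake` tenant matches `orgID=fake` in the log line, which is consistent with running Loki with `auth_enabled: false`.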
When running `helm template`, the K8s manifest (which is applied) is the following:
test.txt
I have already tried multiple types of configurations, but I always get this annoying error.
Any help would be much appreciated.