
[Bug] Dashboards created by a GrafanaDashboard resource are not cleaned up when the resource is removed #1581

Closed
ak185158 opened this issue Jun 13, 2024 · 10 comments
Assignees
Labels
bug Something isn't working triage/needs-information Indicates an issue needs more information in order to work on it.

Comments


ak185158 commented Jun 13, 2024

Describe the bug

When a GrafanaDashboard custom resource is used to create/manage a dashboard, it is expected that the resulting dashboard in Grafana would be cleaned up when the resource is removed. This does not appear to be the case, and it results in stale/orphaned dashboards that persist.

Version

v5.9.2

To Reproduce
Steps to reproduce the behavior:

  1. Create a GrafanaDashboard custom resource
  2. Verify the corresponding dashboard instance is created in Grafana from the GrafanaDashboard resource
  3. Remove the GrafanaDashboard custom resource
  4. Verify the dashboard instance persists even though the originating custom resource that created it has been removed
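The steps above can be sketched as a minimal script. The dashboard name, namespace, and `instanceSelector` label here are hypothetical placeholders and must be adapted to match an existing Grafana CR in your cluster:

```shell
# 1. Create a minimal GrafanaDashboard CR (names/labels are placeholders)
kubectl apply -n my-grafana-resources -f - <<'EOF'
apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaDashboard
metadata:
  name: example-dashboard
spec:
  instanceSelector:
    matchLabels:
      dashboards: gitops
  json: |
    {
      "title": "example-dashboard",
      "uid": "example-dashboard"
    }
EOF

# 2. Verify the dashboard appears in the matching Grafana instance (UI or HTTP API)

# 3. Remove the custom resource again
kubectl delete -n my-grafana-resources grafanadashboard/example-dashboard

# 4. Check whether the dashboard is gone from Grafana; per this report, it persists
```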

Expected behavior
The Grafana operator should remove the dashboard that was created by the custom resource once the resource is no longer present. Not doing so leaves stale, orphaned dashboards behind after the underlying resource that created them has been removed.
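One way to narrow this down is to check whether the operator attached a finalizer to the CR before deleting it: without a finalizer, the CR is removed immediately and cleanup depends entirely on the operator observing the deletion event. This is a generic Kubernetes inspection, not documented operator behavior; the resource name and namespace are placeholders:

```shell
# Show any finalizers on the GrafanaDashboard CR (names are placeholders)
kubectl get grafanadashboard example-dashboard -n my-grafana-resources \
  -o jsonpath='{.metadata.finalizers}{"\n"}'

# Delete without waiting, then re-read the object: if it lingers with a
# deletionTimestamp set, a finalizer is blocking until cleanup runs
kubectl delete grafanadashboard example-dashboard -n my-grafana-resources --wait=false
kubectl get grafanadashboard example-dashboard -n my-grafana-resources -o yaml
```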

@ak185158 ak185158 added bug Something isn't working needs triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jun 13, 2024
@theSuess theSuess self-assigned this Jun 17, 2024
@theSuess
Member

Hey, I was unable to reproduce this issue. Maybe this has something to do with the permissions of your setup. How did you deploy the Grafana operator?

@theSuess theSuess added triage/needs-information Indicates an issue needs more information in order to work on it. and removed needs triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jun 17, 2024
@chaijunkin

I have a similar issue when deploying Grafana dashboards (operator-managed) via Argo CD. I'm not sure I can reproduce the steps exactly, but I'll list them below:
Steps:
1 - original dashboard
2 - upgrade the dashboard version (change the original folder path name and remove the original dashboard)
3 - the dashboard is not deleted


mkyc commented Jul 5, 2024

Exactly the same issue here, but it is inconsistent: during tests I encounter it on random occasions.

Here are steps to reproduce (I'm copying from my k3d setup script):

setup

  1. install operator
kubectl create namespace pmon-grafana-operator || true
helm upgrade -i grafana-operator oci://ghcr.io/grafana/helm-charts/grafana-operator --version v5.9.2 --namespace pmon-grafana-operator --values grafana-operator.values.yaml --wait

grafana-operator.values.yaml:

serviceMonitor:
  enabled: true

  2. install Grafana
kubectl create namespace pmon-grafana || true
kubectl apply -f grafana.yaml --namespace pmon-grafana

grafana.yaml:

---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: grafana-var-lib-grafana-pv
spec:
  storageClassName: manual
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /tmp/var-lib-grafana
...
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: grafana-var-lib-grafana-pvc
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
...
---
apiVersion: grafana.integreatly.org/v1beta1
kind: Grafana
metadata:
  name: grafana
  labels:
    dashboards: gitops
spec:
  deployment:
    spec:
      template:
        spec:
          containers:
            - name: grafana
              volumeMounts:
                - name: grafana-var-lib-grafana-pv
                  mountPath: /var/lib/grafana
          volumes:
            - name: grafana-var-lib-grafana-pv
              persistentVolumeClaim:
                claimName: grafana-var-lib-grafana-pvc
  service:
    spec:
      type: NodePort
    metadata:
      labels:
        app: grafana
  config:
    log:
      mode: "console"
    security:
      admin_user: root
      admin_password: secret
      disable_gravatar: "true"
    auth.anonymous:
      enabled: "false"
...
  3. install Grafana resources:
kubectl create namespace pmon-grafana-resources || true
kubectl apply -f grafana-resources.yaml --namespace pmon-grafana-resources

grafana-resources.yaml:

---
apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaDatasource
metadata:
  name: loki-datasource
spec:
  allowCrossNamespaceImport: true
  instanceSelector:
    matchLabels:
      dashboards: gitops
  datasource:
    name: loki
    type: loki
    uid: loki1
    access: proxy
    url: http://lgtm-loki-gateway.pmon-lgtm.svc.cluster.local
    isDefault: true
    jsonData:
      timeout: 60
      maxLines: 1000
...
---
apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaDatasource
metadata:
  name: mimir-datasource
spec:
  allowCrossNamespaceImport: true
  instanceSelector:
    matchLabels:
      dashboards: gitops
  datasource:
    name: mimir
    uid: mimir1
    type: prometheus
    access: proxy
    url: http://lgtm-mimir-nginx.pmon-lgtm.svc.cluster.local/prometheus
    isDefault: false
...
---
apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaFolder
metadata:
  name: test-folder
spec:
  allowCrossNamespaceImport: true
  instanceSelector:
    matchLabels:
      dashboards: gitops

  # If title is not defined, the value will be taken from metadata.name
  title: lalala/lilili
  # When permissions value is empty/absent, a folder is created with default permissions
  # When empty JSON is passed ("{}"), the access is stripped for everyone except for Admin (default Grafana behaviour)
  permissions: |
    {
      "items": [
        {
          "role": "Admin",
          "permission": 4
        },
        {
          "role": "Editor",
          "permission": 2
        }, 
        {
          "role": "Viewer",
          "permission": 1
        }
      ]
    }
...
---
apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaDashboard
metadata:
  name: coredns-test-dashboard
spec:
  allowCrossNamespaceImport: true
  instanceSelector:
    matchLabels:
      dashboards: gitops
  grafanaCom:
    id: 15762
    revision: 18
...

result

as expected:

[Screenshot 2024-07-05 at 13 23 55]

("Logs/App" was added manually to check that the operator doesn't interfere with dashboards it does not manage.)

remove

Option 1:

kubectl delete -f grafana-resources.yaml --namespace pmon-grafana-resources || true

with grafana-resources.yaml from the previous step:

[Screenshot 2024-07-05 at 13 41 58]

There are errors regarding the Loki datasource during the reconciliation loop, but they eventually go away and are, I assume, unrelated.

Option 2:

kubectl delete --namespace pmon-grafana-resources GrafanaDashboard/coredns-test-dashboard 

not even a single log message, and:

[Screenshot 2024-07-05 at 13 47 21]

so nothing was removed, and it looks like the operator didn't even notice that the resource was deleted.

But... sometimes it works. If I run the same sequence of steps 3-5 times:

kubectl apply -f grafana-resources.yaml --namespace pmon-grafana-resources
kubectl delete --namespace pmon-grafana-resources GrafanaDashboard/coredns-test-dashboard 

eventually it will start removing that dashboard:

[Screenshot 2024-07-05 at 13 55 49]

and it keeps adding and removing correctly on subsequent repeats.

It looks to me like the operator is sometimes not receiving events for removed dashboards. I didn't notice this for folders, though, only for dashboards.
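To check whether the operator sees the deletion at all, one could tail its logs while deleting the CR. The namespaces and resource names come from the repro above; the deployment name is assumed to match the Helm release and may differ in other setups:

```shell
# Terminal 1: follow operator logs, filtering for dashboard-related lines
kubectl logs -n pmon-grafana-operator deploy/grafana-operator -f | grep -i dashboard

# Terminal 2: delete the dashboard CR and watch for a corresponding log line;
# no log output at all would suggest the deletion event was never handled
kubectl delete --namespace pmon-grafana-resources grafanadashboard/coredns-test-dashboard
```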

Collaborator

pb82 commented Jul 8, 2024

Thanks @mkyc, I'll try to reproduce from the provided steps now.


github-actions bot commented Aug 8, 2024

This issue hasn't been updated for a while, marking as stale, please respond within the next 7 days to remove this label

@github-actions github-actions bot added the stale label Aug 8, 2024
@Fantaztig

As I read @mkyc's example, the commands delete both the dashboard and its containing folder at once, which results in neither being deleted from the Grafana instance.
This looks like the same behavior described in #1626, right? @ak185158, do you see the same issue when deleting only the dashboard?
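If the simultaneous folder deletion is indeed masking the dashboard deletion (as in #1626), deleting the resources one at a time, dashboard first, should isolate the behavior. The resource names are taken from the repro earlier in this thread:

```shell
# Delete only the dashboard first and give the operator time to reconcile
kubectl delete --namespace pmon-grafana-resources grafanadashboard/coredns-test-dashboard
# ...then check in Grafana whether the dashboard is gone...

# Only afterwards delete the containing folder separately
kubectl delete --namespace pmon-grafana-resources grafanafolder/test-folder
```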

@github-actions github-actions bot removed the stale label Aug 15, 2024
@yurii-kryvosheia

We recently added a PVC to our instance and noticed that some dashboards were still hanging around in the UI even though their custom resources had been deleted long ago. We guessed this was related to persistence, but a few creation/deletion tests didn't confirm that.
I can confirm the behavior is inconsistent.

@theSuess
Member

With #1728, folder deletions are now forced, which solves this issue.
