Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PV.spec.ClaimRef.uid removed from PersistentVolume after reconciliation. #2250

Closed
1 task done
a-mcf opened this issue Dec 28, 2021 · 16 comments
Closed
1 task done

PV.spec.ClaimRef.uid removed from PersistentVolume after reconciliation. #2250

a-mcf opened this issue Dec 28, 2021 · 16 comments

Comments

@a-mcf
Copy link

a-mcf commented Dec 28, 2021

Describe the bug

I have several statically provisioned PV/PVCs. When the PV/PVCs are first applied by either flux or via kubectl, the PVCs show up as "Bound" and everything looks good. However whenever flux next reconciles the repository, the logs say that flux detected a change and the PVC is now reported as lost:

NAME            STATUS   VOLUME         CAPACITY   ACCESS MODES   STORAGECLASS   AGE
nfs-homer-pvc   Lost     nfs-homer-pv   0                                        11m

After digging around a bit, I figured out that spec.ClaimRef.uid is being removed from the PV. If I manually specify PV.spec.ClaimRef.uid with the value of the uid from the PersistentVolumeClaim it survives reconciliation.

This behavior does NOT occur if I apply the configuration manually with kubectl.

Here is my configuration

---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-homer-pv
  namespace: default
spec:
  capacity:
    storage: 128Mi    
  accessModes:
  - ReadWriteMany
  nfs:
    path: /tank/appdata/homer
    server: 192.168.1.10
  persistentVolumeReclaimPolicy: Retain
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    namespace: default
    name: nfs-homer-pvc
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-homer-pvc
  namespace: default
spec:
  storageClassName: ""
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 128Mi

Steps to reproduce

  1. Commit the configuration in the description above to git.
  2. Reconcile the repository with flux reconcile or just wait for it to do it on it's own.
  3. The PVC will show up as bound.
  4. Make another change and force flux reconciliation or wait. The PVC status will now be "Lost" and the uid will be removed from claimRef on the PV.

Expected behavior

The dynamically added uid should not be removed from PV.spec.claimRef.uid when the repository is reconciled.

Screenshots and recordings

No response

OS / Distro

Ubuntu 20.04

Flux version

flux: v0.24.1

Flux check

► checking prerequisites
✔ Kubernetes 1.22.5+k3s1 >=1.19.0-0
► checking controllers
✔ notification-controller: deployment ready
► ghcr.io/fluxcd/notification-controller:v0.19.0
✔ helm-controller: deployment ready
► ghcr.io/fluxcd/helm-controller:v0.14.1
✔ kustomize-controller: deployment ready
► ghcr.io/fluxcd/kustomize-controller:v0.18.2
✔ source-controller: deployment ready
► ghcr.io/fluxcd/source-controller:v0.19.2
✔ all checks passed

Git provider

GitHub

Container Registry provider

n/a

Additional context

Here is a blurb from the logs. Most of the PVs here have already had the uid removed and their associated PVCs in a "Lost" state which is why they are marked as unchanged. I was working with home, so note the "PersistentVolume/nfs-homer-pv":"configured" in the json below.

{"level":"info","ts":"2021-12-28T00:50:00.912Z","logger":"controller.kustomization","msg":"server-side apply completed","reconciler group":"kustomize.toolkit.fluxcd.io","reconciler kind":"Kustomization","name":"apps","namespace":"flux-system","output":{"Certificate/networking/mcf-ninja":"unchanged","ConfigMap/monitoring/loki-chunks-dashboard":"unchanged","ConfigMap/monitoring/loki-deletion-dashboard":"unchanged","ConfigMap/monitoring/loki-logs-dashboard":"unchanged","ConfigMap/monitoring/loki-mixin-recording-rules-dashboard":"unchanged","ConfigMap/monitoring/loki-operational-dashboard":"unchanged","ConfigMap/monitoring/loki-reads-dashboard":"unchanged","ConfigMap/monitoring/loki-reads-resources-dashboard":"unchanged","ConfigMap/monitoring/loki-retention-dashboard":"unchanged","ConfigMap/monitoring/loki-writes-dashboard":"unchanged","ConfigMap/monitoring/loki-writes-resources-dashboard":"unchanged","HelmRelease/default/heimdall":"unchanged","HelmRelease/default/homer":"configured","HelmRelease/default/paperless":"unchanged","HelmRelease/default/photoprism":"unchanged","HelmRelease/home/home-assistant":"unchanged","HelmRelease/media/plex":"unchanged","HelmRelease/monitoring/grafana":"unchanged","HelmRelease/monitoring/kube-prometheus-stack":"unchanged","HelmRelease/monitoring/loki":"unchanged","HelmRelease/monitoring/promtail":"unchanged","HelmRelease/networking/ingress-nginx":"unchanged","HelmRelease/nextcloud/nextcloud":"unchanged","PersistentVolume/nfs-heimdall-pv":"unchanged","PersistentVolume/nfs-home-assistant-config-pv":"unchanged","PersistentVolume/nfs-homer-pv":"configured","PersistentVolume/nfs-loki-pv":"unchanged","PersistentVolume/nfs-movies-pv":"unchanged","PersistentVolume/nfs-music-pv":"unchanged","PersistentVolume/nfs-nextcloud-data-pv":"unchanged","PersistentVolume/nfs-nextcloud-mariadb-pv":"unchanged","PersistentVolume/nfs-paperless-consume-pv":"unchanged","PersistentVolume/nfs-paperless-data-pv":"unchanged","PersistentVolume/nfs-paperless-export-pv":"unchanged","PersistentVolume/nfs-paperless-media-pv":"unchanged","PersistentVolume/nfs-photoprism-import-pv":"unchanged","PersistentVolume/nfs-photoprism-originals-pv":"unchanged","PersistentVolume/nfs-photoprism-storage-pv":"unchanged","PersistentVolume/nfs-plex-pv":"unchanged","PersistentVolume/nfs-prometheus-pv":"unchanged","PersistentVolume/nfs-tv-pv":"unchanged","PersistentVolumeClaim/default/nfs-heimdall-pvc":"unchanged","PersistentVolumeClaim/default/nfs-homer-pvc":"unchanged","PersistentVolumeClaim/default/nfs-paperless-consume-pvc":"unchanged","PersistentVolumeClaim/default/nfs-paperless-data-pvc":"unchanged","PersistentVolumeClaim/default/nfs-paperless-export-pvc":"unchanged","PersistentVolumeClaim/default/nfs-paperless-media-pvc":"unchanged","PersistentVolumeClaim/default/nfs-photoprism-import-pvc":"unchanged","PersistentVolumeClaim/default/nfs-photoprism-originals-pvc":"unchanged","PersistentVolumeClaim/default/nfs-photoprism-storage-pvc":"unchanged","PersistentVolumeClaim/home/nfs-home-assistant-config-pvc":"unchanged","PersistentVolumeClaim/media/nfs-movies-pvc":"unchanged","PersistentVolumeClaim/media/nfs-music-pvc":"unchanged","PersistentVolumeClaim/media/nfs-plex-pvc":"unchanged","PersistentVolumeClaim/media/nfs-tv-pvc":"unchanged","PersistentVolumeClaim/monitoring/nfs-loki-pvc":"unchanged","PersistentVolumeClaim/monitoring/prometheus-kube-prometheus-stack-prometheus-db-prometheus-kube-prometheus-stack-prometheus-0":"unchanged","PersistentVolumeClaim/nextcloud/nfs-nextcloud-data-pvc":"unchanged","PersistentVolumeClaim/nextcloud/nfs-nextcloud-mariadb-pvc":"unchanged","Plan/system-upgrade/k3s-agent":"unchanged","Plan/system-upgrade/k3s-server":"unchanged"}}

Code of Conduct

  • I agree to follow this project's Code of Conduct
@kingdonb
Copy link
Member

kingdonb commented Dec 28, 2021

I set up persistent volumes with Flux in a similar way, but I do not write any claimRef on the persistent volume itself.

Instead, I use volumeName on the persistentVolume, and I believe it is the cluster which fills out claimRef when it does the binding. I think it is usually considered an anti-pattern for a Custom Resource to write back to its own spec, but as this is PersistentVolume in the canonical Kubernetes API, I wouldn't think myself able to put up a convincing argument for or against why it is this way. I think this is where I'm supposed to hold the phone up to my ear and accuse you of holding it wrong...

(⎈ )$ k get pvc -n minio-stage minio-stage
NAME          STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
minio-stage   Bound    pvc-61decf71-3bed-406f-8836-18dcbb3ef89e   10Gi       RWO            local-path     8d

(⎈ )$ k get pvc -n minio-stage minio-stage -oyaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    pv.kubernetes.io/bind-completed: "yes"
    volume.beta.kubernetes.io/storage-provisioner: rancher.io/local-path
    volume.kubernetes.io/selected-node: msigaming
  creationTimestamp: "2021-12-19T21:59:19Z"
  finalizers:
  - kubernetes.io/pvc-protection
  labels:
    app.kubernetes.io/instance: minio-stage
    app.kubernetes.io/name: minio-stage
    kustomize.toolkit.fluxcd.io/name: 12-persistence
    kustomize.toolkit.fluxcd.io/namespace: flux-system
  name: minio-stage
  namespace: minio-stage
  resourceVersion: "7392290"
  uid: caf3690d-3d2f-4de7-bb5c-48a05c1a41ee
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: local-path
  volumeMode: Filesystem
  volumeName: pvc-61decf71-3bed-406f-8836-18dcbb3ef89e
status:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 10Gi
  phase: Bound

(⎈ )$ k get pv pvc-61decf71-3bed-406f-8836-18dcbb3ef89e
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                     STORAGECLASS   REASON   AGE
pvc-61decf71-3bed-406f-8836-18dcbb3ef89e   10Gi       RWO            Retain           Bound    minio-stage/minio-stage   local-path              8d

(⎈ )$ k get pv pvc-61decf71-3bed-406f-8836-18dcbb3ef89e -oyaml
apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    pv.kubernetes.io/bound-by-controller: "yes"
    pv.kubernetes.io/provisioned-by: rancher.io/local-path
  creationTimestamp: "2021-12-19T21:59:19Z"
  finalizers:
  - kubernetes.io/pv-protection
  labels:
    kustomize.toolkit.fluxcd.io/name: 12-persistence
    kustomize.toolkit.fluxcd.io/namespace: flux-system
  name: pvc-61decf71-3bed-406f-8836-18dcbb3ef89e
  resourceVersion: "7392287"
  uid: 0e72847e-f54a-4c4a-b116-0f9adc6861a4
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 10Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: minio-stage
    namespace: minio-stage
    resourceVersion: "7392156"
    uid: caf3690d-3d2f-4de7-bb5c-48a05c1a41ee
  hostPath:
    path: /opt/local-path-provisioner/pvc-61decf71-3bed-406f-8836-18dcbb3ef89e_minio-stage_minio-stage
    type: DirectoryOrCreate
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - msigaming
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-path
  volumeMode: Filesystem
status:
  phase: Bound

So you can see what I actually include and don't include, here is the example in context, as a source (with the claimRef notably absent from my persistent volume spec, compare to above where the cluster has filled it in),

I didn't find any manual for how to harvest this kind of manifest from an existing cluster, I just guessed at how, and if I remember correctly my first guess was exactly what you did, I remember running into a similar "Lost" behavior although I am using local-path volumes so I'm not sure it's exactly comparable. The claimRef issue rings a bell for me, I think you should exclude it from your saved manifests and use volumeName in your pvc instead. Then let the cluster populate claimRef.

(Hope this helps!)

https://github.com/kingdonb/bootstrap-repo/blob/5ed47718684c2b47f2291bdd9d80428e8c0d6662/persistence/minio-stage-pvc.yaml#L19

@kingdonb
Copy link
Member

This thread may also help, it is from someone who specifically did use NFS as I guess you are, and the thread asserts you can and should be able to set claimRef, although I don't know for sure, it is definitely relevant material I'm not sure if it's very helpful:

kubernetes/kubernetes#23615 (comment)

It's not clear to me if my way is correct or incorrect, but it hasn't failed me so far and I have been testing this approach for a while.

@a-mcf
Copy link
Author

a-mcf commented Dec 28, 2021

@kingdonb - Thank you! I'll give your approach a whirl. It seems simpler than what I'm doing, so if it works, I'll be happy to roll with that approach.

FWIW, this was working flawlessly for months, so SOMETHING recently changed cause the issue. I didn't notice it right away because they initially show up as "bound" and only become "lost" after flux reconciliation.

Regardless, if I'm "holding it wrong" than I'm more than happy to try other approaches! I'm pretty new to all this and am always happy to learn. more. I'll report back with my results.

@a-mcf
Copy link
Author

a-mcf commented Dec 28, 2021

@kingdonb - Thank you for the help. Your approach does in fact work, and I think it's cleaner than what I was doing. I'm happy.

That said, I DO think that what I'm doing is supposed to work, especially given the other issue you linked above (it sounds like it's basically about getting what I was trying to do working) and my volumes DO stay bound absent flux reconciliation. This is true even if I kubectl apply -f ./config-pvc.yaml, so the behavior seems specific to flux and may be worth pursuing further, but I'll leave that up to your discretion. I'll close it out for now.

Thanks again!

@a-mcf a-mcf closed this as completed Dec 28, 2021
@hfreire
Copy link

hfreire commented Dec 28, 2021

I've observed the exact same behaviour on a similar setup as @a-mcf. The above fix seems to work. I would dare to pinpoint that this behaviour might have been introduced in flux v0.24.1.

@kingdonb
Copy link
Member

I'd like to reopen the issue so that someone sees it after the holiday, if it remains unexplained. However, due to a bug in GitHub right now I can only close the issue (even though it is already closed!)

Glad the issue is resolved for you 💭 👍

@kingdonb
Copy link
Member

I opened an issue to follow this up. In writing up #2279 I got somewhat more ambitious; we should explore "what is there to know" about persistent volumes. I would be happy if we had a document that explained just what we covered in this issue.

@kingdonb
Copy link
Member

If the behavior of Flux changed and it was a regression, let's address it with a separate issue. It can refer to this issue if the report here describes some symptoms that are important to the regression, but I haven't seen any direct evidence of that yet and it's unlikely to get much attention now, if perhaps there's an issue but nobody is blocked on it.

@tarioch
Copy link
Contributor

tarioch commented Jan 31, 2022

I'm seeing the same thing since I upgraded from kubernetes 1.21 to 1.22 (without chaning flux version). So I'm guessing this is something that is only present on 1.22+

One thing to note is that in my case, the PVCs are created by helm charts and in most of the cases I haven't found a way to set the volumeName (I know for some of the charts I can specify my own pvc, but in a lot of cases that's not possible). So until now it worked very well in my local env, to have the helm chart create a pvc that I would then just fill with my local pv.

@kingdonb
Copy link
Member

kingdonb commented Jan 31, 2022

@tarioch There are different behaviors with server-side apply from K8s 1.20.x to 1.23.x, and we've observed the variability between how Kubernetes behaves from one version to the next, and in turn how Flux behaves around it primarily here:

There are a number of related issues that may (should, hopefully will) cause many edge cases to reify and converge again, when the Flux 0.26.0 release is out. Please stay tuned and keep us updated if you notice any difference when that release is out (it should be within a couple of days at most.)

As far as, how to add volumeName to a volume that is created by a helmrelease, that is a tricky issue for cases where the Helm chart did not include any existingVolume provision. Usually, with Helm and at-large with respect to cloud providers, you don't create a PV manually: you might need to update it in order to set the reclaimPolicy or it might be set upstream in the storage class, but the PV is created dynamically by the PVC, and the important thing to note is that claimRef should never be filled out by a user. (The controller-manager will populate those fields when the persistent volume is being bound, unless things are not functioning as they should.)

The harbor chart is a good model: (https://github.com/goharbor/harbor-helm/blob/73248da3f25b024d27d4f395aacd116fc6770cbe/templates/redis/statefulset.yaml#L86-L104 and https://github.com/goharbor/harbor-helm/blob/73248da3f25b024d27d4f395aacd116fc6770cbe/templates/redis/statefulset.yaml#L61-L70) because it provides this existingClaim values parameter for every persistent volume claim that is used by the chart, so you can bring your own.

For cases where the chart doesn't help you at all, YSK that volumes like containers is a well-known type according to patchesStrategicMerge as I understand it, and you can use this to address the issue you described with those helm charts, by patching your own volume into the existing resources through the postRenderer kustomization feature of a HelmRelease.

Here's the best example I could find of that case in particular: https://github.com/kingdonb/bootstrap-repo/blob/698d43dab8d8a1cc151ac16bc4a9e3c9a87139ae/apps/minio/stage/minio-helmrelease.yaml#L18-L36

So in this case, the volume is an emptydir in the base chart, and I wanted to make it a permanent/persistent volume to use for cluster storage (that I'm using to back my Harbor instance.) I added a named persistent volume claim (you can create this however you want, use one that has been created by helm, etc.) and ensured the PV reclaimPolicy was set to Retain.

And that was it. I captured "cleaned up" versions (eg. kubectl neat and delete any claimRef as described before) of my PV and PVC into the git repo, so they will be able to be made available again if the cluster is destroyed and recreated, and then I write them into the resources, either using existingClaim or a bit more manually as shown above (for cases where that wasn't an option the original authors provided for us in the helm chart.)

The details around that workflow in particular are what I was hoping to explain in #2279. Off-the-cuff, that's about what I think the document that closes that issue will have to cover in a bit more depth.

@tarioch
Copy link
Contributor

tarioch commented Feb 2, 2022

Tried out with 0.26.0, unfortunately still fails.

At the moment I'm no longer managing the PV with flux (by adding kustomize.toolkit.fluxcd.io/reconcile=disabled), will see if I go the route with post renderers (will be a big change as I have to adjust lots and lots of configs).

geraldwuhoo added a commit to geraldwuhoo/homelab-iac that referenced this issue Feb 6, 2022
@jmriebold
Copy link

jmriebold commented Apr 26, 2022

Just ran into this same issue after upgrading to EKS 1.22, using Flux v0.29.3. Unfortunately removing the claimRef isn't an option, as we use a number of manually created PVs with matching PVCs (backed by AWS EFS).

At the moment we're unable to manage these volumes with Flux, because after Flux reconciles them, Kubernetes decides there are multiple PVCs referencing the same PV, and all PVCs change to status "lost", rendering them unusable.

EDIT: To add a little more detail, we're doing what's described here under "A Simpler Approach", since we have a relatively small number of static persistent volumes, and don't need to dynamically create them.

@Alan01252
Copy link

Same issue here, it feels like a regression in flux because this worked without problems when the cluster was running 1.21 but I understand the complications.

@kingdonb
Copy link
Member

kingdonb commented Apr 30, 2022

I have tried a number of different CSI providers and all of them seem to permit managing PV and PVC statically without setting ownerRef as a user. Those fields are meant to be set by the kubernetes controller manager at bind time.

In the PersistentVolumeClaim I make sure that spec.volumeName is set:

volumeName: pvc-1589d7e1-7ef2-458b-a74f-defc61caf13f

In the PV, which I captured through kubectl neat, there are a few annotations that still needed to be removed, those signifying that the controller manager has already taken ownership of this PV and bound it to the PVC.

pv.kubernetes.io/bind-completed: yes

I have an example up here: https://github.com/kingdonb/github-actions-demo/tree/main/intermission/synology/persistence in this example, Flux applies the PV and PVC alongside to any pods being created with an attempt to bind to a named PVC. It seems to work for me every time. No conflicts, as I'm not setting any fields that kube-controller-manager needs to own.

If EFS doesn't support this process for attaching existing persistent volumes, then it must be different. I am not sure because I don't have an EFS implementation that I can test with handy, I only have local-path storage, the synology CSI, and azure/gcp which I have tested all of these, and they seem to behave the same.

Sorry you are experiencing an issue here and are apparently blocked on this. If you are not able to work past this on your own, could you open a new issue and describe the process you are running, so we can try to help you through it there?

@kingdonb
Copy link
Member

Maybe the discussion can continue in #2456 as there seem to be others with similar issues, (please check in there or open a new issue.)

@jmriebold
Copy link

Thanks for the suggestions @kingdonb. I thought it was required to set owner, but maybe I was wrong. I'll do some testing and report back in the other issue thread.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants