
Can't resize PVC associated with Standalone instance #558

Closed
vzabawski opened this issue Oct 28, 2021 · 9 comments

vzabawski commented Oct 28, 2021

I've created a standalone instance whose var disk was too small, so I decided to resize it, but wasn't able to.

How to reproduce:

  1. Create Standalone instance:
apiVersion: enterprise.splunk.com/v1
kind: Standalone
metadata:
  name: indexer-z2
spec:
  etcVolumeStorageConfig:
    storageClassName: default
    storageCapacity: 10Gi
  varVolumeStorageConfig:
    storageClassName: default
    storageCapacity: 25Gi
  2. Wait until the instance is deployed and its volumes are allocated
  3. Change the PVC size in the CR:
apiVersion: enterprise.splunk.com/v1
kind: Standalone
metadata:
  name: indexer-z2
spec:
  etcVolumeStorageConfig:
    storageClassName: default
    storageCapacity: 10Gi
  varVolumeStorageConfig:
    storageClassName: default
    storageCapacity: 1.5Ti
  4. Observe that there are no changes to the PVCs after 10 minutes:
$ kubectl get pvc
NAME                                     STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
pvc-etc-splunk-indexer-z2-standalone-0   Bound    pvc-281ca01c-e2ce-4332-a76b-3701bb9e7df8   15Gi       RWO            default        105m
pvc-var-splunk-indexer-z2-standalone-0   Bound    pvc-3a200242-9f03-4bc2-b707-e3f53d51742c   25Gi       RWO            default        105m

Expected result: PVC size is changed to the new value.
Actual result: PVC size stays the same.
Seems like the operator doesn't track this change. After manually modifying the PVC, the size change was detected, but the resize failed due to Azure limitations.
An additional issue, not directly related to PVC resizing:
I've tried to scale down the standalone instance, but it didn't work; the number of pods stayed the same (even after 10 minutes) after running the following command:

$ kubectl scale standalone/indexer-z2 --replicas=0
$ kubectl get pods
NAME                                       READY   STATUS    RESTARTS   AGE
ingress-nginx-controller-899f4b66f-xzjkw   1/1     Running   0          104m
splunk-default-monitoring-console-0        1/1     Running   0          93m
splunk-indexer-z2-standalone-0             1/1     Running   0          5m42s
splunk-operator-845b9ccb45-f2p65           1/1     Running   0          96m
$ kubectl get standalone
NAME         PHASE   DESIRED   READY   AGE
indexer-z2   Ready   1         1       99m

At the same time, the number of replicas is set to 0 inside indexer-z2 object:

$ kubectl get standalone -o yaml
<...>
    etcVolumeStorageConfig:
      storageCapacity: 10Gi
      storageClassName: default
    replicas: 0
    varVolumeStorageConfig:
      storageCapacity: 1.5Ti
      storageClassName: default
<...>

So, additionally, it would be nice to be able to scale down to zero Splunk replicas, so that no pods using the PVCs are running.

Environment:

  • splunk-operator 1.0.1
  • AKS cluster (1.20.7)
Collaborator

pogdin commented Oct 28, 2021

Hi, can you do a kubectl get sc and post the output? Thanks.

Author

vzabawski commented Oct 29, 2021

Hi, sure. Here it is:

$ kubectl get sc
NAME                PROVISIONER                RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
azurefile           kubernetes.io/azure-file   Delete          Immediate              true                   2d17h
azurefile-premium   kubernetes.io/azure-file   Delete          Immediate              true                   2d17h
default (default)   kubernetes.io/azure-disk   Delete          WaitForFirstConsumer   true                   2d17h
managed-premium     kubernetes.io/azure-disk   Delete          WaitForFirstConsumer   true                   2d17h

Also, here's what the default storage class looks like:

$ kubectl get sc/default -o yaml
allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
  creationTimestamp: "2021-10-26T14:18:55Z"
  labels:
    addonmanager.kubernetes.io/mode: EnsureExists
    kubernetes.io/cluster-service: "true"
  managedFields:
  - apiVersion: storage.k8s.io/v1beta1
    fieldsType: FieldsV1
    fieldsV1:
      f:allowVolumeExpansion: {}
      f:metadata:
        f:annotations:
          .: {}
          f:storageclass.kubernetes.io/is-default-class: {}
        f:labels:
          .: {}
          f:addonmanager.kubernetes.io/mode: {}
          f:kubernetes.io/cluster-service: {}
      f:parameters:
        .: {}
        f:cachingmode: {}
        f:kind: {}
        f:storageaccounttype: {}
      f:provisioner: {}
      f:reclaimPolicy: {}
      f:volumeBindingMode: {}
    manager: kubectl-create
    operation: Update
    time: "2021-10-26T14:18:55Z"
  name: default
  resourceVersion: "347"
  uid: 5c311fec-7681-40b2-a32b-3f6bb82f5970
parameters:
  cachingmode: ReadOnly
  kind: Managed
  storageaccounttype: StandardSSD_LRS
provisioner: kubernetes.io/azure-disk
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer

Author

vzabawski commented Oct 29, 2021

Just to clarify, I believe the bug on the operator's side is that it doesn't pick up the change, because if the volume size change had been picked up, I would see an error from the kubectl describe pvc/<name> command, like I did after manually modifying the PVC.
The error itself seems to be unrelated to splunk-operator; it looks like an Azure issue where the volume can't be extended even though the allowVolumeExpansion flag is set on the storage class. I can address that part with Azure support.
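
For reference, the manual modification above was along these lines (names and size taken from this issue; a sketch, not the exact commands used):

$ kubectl edit pvc pvc-var-splunk-indexer-z2-standalone-0    # raise spec.resources.requests.storage to 1.5Ti
$ kubectl describe pvc pvc-var-splunk-indexer-z2-standalone-0
# The Events section is where the provisioner reports whether the resize was accepted or why it failed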

@vzabawski vzabawski reopened this Oct 29, 2021
@vzabawski
Author

Hi @pogdin. Do you need any additional information?

@BGrasnick

I am experiencing the same issue with a SearchHeadCluster where I adjusted the varVolumeStorageConfig => storageCapacity and updated the CR. The change gets picked up by the SearchHeadCluster but doesn't get propagated down to the StatefulSets and therefore no change is applied.

Manually changing the StatefulSet is not allowed; it fails with the error: spec: Forbidden: updates to statefulset spec for fields other than 'replicas', 'template', and 'updateStrategy' are forbidden

I have to uninstall and reinstall the whole environment.
Is this a known issue, and will it hopefully be fixed in the future?
Or what is the best approach to increase the volume sizes of CRs like Standalone, SearchHeadCluster or IndexerCluster?

@akondur akondur self-assigned this Apr 6, 2022
Collaborator

akondur commented Apr 18, 2022

Hi @vzabawski @BGrasnick ,

CR spec and operator:
When the CR common spec parameters varVolumeStorageConfig and etcVolumeStorageConfig are configured, they are absorbed into the StatefulSet as resource requests in its volumeClaimTemplates. When the common spec is later updated to request more storage, the operator currently does not check for changes to the volumeClaimTemplates and therefore does not update the StatefulSet.
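
For illustration, roughly how that mapping looks in the generated StatefulSet (a simplified sketch; the template names pvc-etc/pvc-var are inferred from the PVC names above, and the remaining fields are assumed rather than copied from the operator's actual output):

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: splunk-indexer-z2-standalone
spec:
  volumeClaimTemplates:
  - metadata:
      name: pvc-etc
    spec:
      storageClassName: default      # from etcVolumeStorageConfig.storageClassName
      accessModes: [ReadWriteOnce]
      resources:
        requests:
          storage: 10Gi              # from etcVolumeStorageConfig.storageCapacity
  - metadata:
      name: pvc-var
    spec:
      storageClassName: default      # from varVolumeStorageConfig.storageClassName
      accessModes: [ReadWriteOnce]
      resources:
        requests:
          storage: 25Gi              # from varVolumeStorageConfig.storageCapacity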

StatefulSet limitation:
The StatefulSet K8s resource currently does not allow modification of any parameters of the volumeClaimTemplates. An attempt to update them fails with the error:

"StatefulSet.apps \"splunk-test3-standalone\" is invalid: spec: Forbidden: updates to statefulset spec for fields other than 'replicas', 'template', and 'updateStrategy' are forbidden"}

There have been multiple attempts (kubernetes/enhancements#2842) from the open-source community to allow reconfiguration of PVC storage size via the StatefulSet volumeClaimTemplate config, but there has not been any progress there at the moment.

Alternatives with PVCs:
As per this article, from Kubernetes 1.11 onwards PVC volumes can be expanded after creation by updating the spec. However, some block storage volume types such as GCE-PD and AWS-EBS need a restart of the pods in a StatefulSet for the file system resize to happen; network file systems are an exception here.
On managed K8s clusters like GKE, volume expansion is still a beta feature that is not fully supported (https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/volume-expansion).
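
A rough sketch of that expansion path (illustrative PVC/pod names; exact behavior varies by provisioner):

# Request more storage directly on the PVC (the StorageClass must have allowVolumeExpansion: true)
$ kubectl patch pvc <pvc-name> -p '{"spec":{"resources":{"requests":{"storage":"50Gi"}}}}'

# For block volumes the file system resize may be deferred until the pod restarts;
# the PVC then reports a FileSystemResizePending condition
$ kubectl get pvc <pvc-name> -o jsonpath='{.status.conditions}'

# Deleting the pod lets the StatefulSet recreate it and the kubelet grow the file system
$ kubectl delete pod <pod-name>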

Conclusion:
Due to the multitude of different file storage types and managed K8s cluster types, finding a one-stop solution to resize PVCs dynamically via the operator is not possible at the moment. We will follow further advancements on the StatefulSet update limitation, as well as standardization of the PVC expansion process across file storage types and managed clusters, to assess whether the operator can provide a solution.

@adamrushuk

This limitation on updating the PVC size within a StatefulSet is a PITA, but there is a workaround. I did a blog post on two methods - one with downtime, and one without downtime: https://adamrushuk.github.io/increasing-the-volumeclaimtemplates-disk-size-in-a-statefulset-on-aks/

I imagine this could be orchestrated within the Splunk Operator, although I appreciate it may not be the easiest, especially across multiple cloud providers when it comes to the external disk resize operation. That being said, I know Azure has its LiveResize feature in public preview. This worked very well during my testing.
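
The no-downtime method is roughly the following (a sketch of the general technique rather than a verbatim copy of the blog post; names and size are from this issue):

# 1. Delete the StatefulSet but keep its pods and PVCs running
$ kubectl delete statefulset splunk-indexer-z2-standalone --cascade=orphan

# 2. Expand each PVC directly (the StorageClass must allow volume expansion)
$ kubectl patch pvc pvc-var-splunk-indexer-z2-standalone-0 \
    -p '{"spec":{"resources":{"requests":{"storage":"1.5Ti"}}}}'

# 3. Recreate the StatefulSet with the larger size in its volumeClaimTemplates
#    (with the Splunk Operator, updating the CR and letting the operator recreate it would play this role)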

Collaborator

akondur commented Apr 22, 2022

Hi @adamrushuk, thank you for sharing the blog post on resizing PVCs. While we have assessed a number of workarounds for resizing PVCs on multiple platforms, there doesn't seem to be a one-stop solution that addresses all of them. As mentioned here, many managed K8s services like GKE still have this feature in beta, where orchestration from the operator might not work. Going forward, with standardization of the PVC expansion process across file storage types, a solution can be carved out by the operator.

Collaborator

akondur commented May 2, 2022

@BGrasnick We will keep an eye out for updates from cloud providers and K8s on dynamic PVC resizing, and provide a solution via the operator at an appropriate time. For now, closing the issue for the reasons explained above.

@akondur akondur closed this as completed May 2, 2022