
Velero fails to restore statefulsets.apps #4782

Closed
son-la opened this issue Mar 28, 2022 · 14 comments · Fixed by #5247
Assignees: reasonerjt
Labels: Helm (Issues related to Helm charts), Needs info (Waiting for information), staled

son-la commented Mar 28, 2022

What steps did you take and what happened:

  • Create a backup including a statefulset
  • Verify backup is created successfully with velero describe --details: Statefulset object is in the resource list
  • Verify statefulset object exists in S3 folder
  • Create restore from backup.
  • All other resources are restored except the statefulset object. Even the pods are restored successfully, but they are not grouped under the statefulset. An error message is seen in velero restore describe:
    [screenshot: error message from velero restore describe]

What did you expect to happen:
Statefulset object is restored successfully
The following information will help us better understand what's going on:

The restore log seems normal. There's only one statefulset object, and the log says it restored successfully: https://gist.github.com/son-la/f02d546f9e0d68cfdc9f4bfef279f480

Anything else you would like to add:
The restore did succeed for other stateful sets but fails for this one. I tried to spot the difference between the working one and the broken one, but I can't find anything special. The error message here is too cryptic for me to know where to look next.

Environment:

  • Velero version (use velero version):
    Client:
    Version: v1.8.1
    Git commit: 18ee078
    Server:
    Version: v1.8.1

  • Velero features (use velero client config get features): No

  • Kubernetes version (use kubectl version):

Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.0", GitCommit:"cb303e613a121a29364f75cc67d3d580833a7479", GitTreeState:"clean", BuildDate:"2021-04-08T16:31:21Z", GoVersion:"go1.16.1", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.11", GitCommit:"27522a29febbcc4badac257763044d0d90c11abd", GitTreeState:"clean", BuildDate:"2021-09-15T19:16:25Z", GoVersion:"go1.15.15", Compiler:"gc", Platform:"linux/amd64"}

  • Kubernetes installer & version: Rancher RKE
  • Cloud provider or hardware configuration: VMWare
  • OS (e.g. from /etc/os-release): RHEL8.4

Vote on this issue!

This is an invitation to the Velero community to vote on issues; you can see the project's top-voted issues listed here.
Use the "reaction smiley face" at the top right of this comment to vote.

  • 👍 for "I would like to see this bug fixed as soon as possible"
  • 👎 for "There are more important bugs to focus on right now"
reasonerjt (Contributor) commented:

@son-la
There should be more error detail in Velero's log to help us understand where the nil pointer error is thrown.

Please reproduce the issue, use velero debug to generate the log bundle, and attach it to the issue.

@reasonerjt reasonerjt added the Needs info Waiting for information label Mar 31, 2022
son-la (Author) commented Mar 31, 2022

velero-bundle.zip
Thanks for the reply. I generated the velero bundle specifically for that failed restore.

I searched and replaced some sensitive information (S3 endpoint, bucket name) and zipped it again. Otherwise, everything should be there.

In this bundle, the failed restore is zeebe-restore.
[screenshot]

@reasonerjt reasonerjt self-assigned this Apr 1, 2022
@reasonerjt reasonerjt added Needs investigation and removed Needs info Waiting for information labels Apr 1, 2022
reasonerjt (Contributor) commented:

Found such messages in velero's log:

time="2022-03-31T09:02:41Z" level=info msg="Executing ChangeStorageClassAction" cmd=/velero logSource="pkg/restore/change_storageclass_action.go:68" pluginName=velero restore=velero/zeebe-restore
time="2022-03-31T09:02:41Z" level=debug msg="Getting plugin config" cmd=/velero logSource="pkg/restore/change_storageclass_action.go:71" pluginName=velero restore=velero/zeebe-restore
time="2022-03-31T09:02:41Z" level=info msg="Done executing ChangeStorageClassAction" cmd=/velero logSource="/usr/local/go/src/runtime/panic.go:1038" pluginName=velero restore=velero/zeebe-restore

Did you set up a configmap to change the storage class, and does it have a data field?

https://velero.io/docs/v1.8/restore-reference/#changing-pvpvc-storage-classes
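
For context on what the question above is probing: per the linked docs, the plugin finds its config by labels and reads the data field as an old-name to new-name mapping. Below is a minimal client-go sketch (not the plugin's actual lookup code) that lists matching configmaps, assuming they live in the velero namespace and carry the labels shown later in this thread:

package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	// assumes we run inside the cluster; a kubeconfig-based setup works too
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// the docs linked above say the plugin config is matched by these labels
	cms, err := client.CoreV1().ConfigMaps("velero").List(context.TODO(),
		metav1.ListOptions{
			LabelSelector: "velero.io/plugin-config,velero.io/change-storage-class=RestoreItemAction",
		})
	if err != nil {
		panic(err)
	}
	for _, cm := range cms.Items {
		// data is expected to map old storage class name -> new name;
		// an empty or missing data field is what the question above probes for
		fmt.Printf("%s: data=%v\n", cm.Name, cm.Data)
	}
}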

@reasonerjt reasonerjt added Needs info Waiting for information and removed Needs investigation labels Apr 5, 2022
son-la (Author) commented Apr 6, 2022

Yes, there's a configmap mapping the storage class from the source to the destination cluster. This extra configmap is deployed when installing Velero to the destination cluster:

configMaps:
  change-storage-class-config:
    labels:
      velero.io/plugin-config: ""  
      velero.io/change-storage-class: RestoreItemAction
    data:
      vmware-volume: vmware

reasonerjt (Contributor) commented:

@son-la
Thanks for the reply.
I think based on the logs, the nil pointer happens in this func:

func (a *ChangeStorageClassAction) Execute(input *velero.RestoreItemActionExecuteInput) (*velero.RestoreItemActionExecuteOutput, error) {

Unfortunately, the log does not contain the stack trace. I was thinking it happened in this line:

if config == nil || len(config.Data) == 0 {

Is the snippet from the output of kubectl? Based on the indentation, data should not be at the same level as labels. Could you double-check?

If this is not the problem, the best next step is to add more logging in the func to find where the nil pointer is thrown.
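
For what it's worth, one plausible way such a panic can arise (a hypothetical, self-contained sketch with stand-in types, not the actual plugin code): a StatefulSet's volumeClaimTemplates carry an optional storage class name, and dereferencing that optional field without a nil check produces exactly this kind of runtime error.

package main

import "fmt"

// pvcSpec is a stand-in for the Kubernetes PVC spec (an assumption for
// illustration, not the real type): storageClassName is optional in the
// API, so it is modeled as a pointer.
type pvcSpec struct {
	StorageClassName *string
}

func main() {
	mapping := map[string]string{"vmware-volume": "vmware"}
	spec := pvcSpec{} // a volumeClaimTemplate that omits storageClassName

	// guarded lookup: safe even when the field is absent
	if spec.StorageClassName != nil {
		if newSC, ok := mapping[*spec.StorageClassName]; ok {
			fmt.Println("would remap to", newSC)
		}
	}

	// unguarded dereference: panics with "invalid memory address or nil
	// pointer dereference", the same runtime error reported later in this
	// thread
	_ = mapping[*spec.StorageClassName]
}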

son-la (Author) commented Apr 7, 2022

Thanks for the troubleshooting effort.

The snippet is actually from the Helm values.yml file used when installing Velero.
[screenshot: values.yml snippet]

So does it mean that if I keep the storage class name the same, statefulsets.apps can be restored successfully?

@reasonerjt reasonerjt added the Helm Issues related to Helm charts label Apr 8, 2022
reasonerjt (Contributor) commented:

I don't use the Helm chart very much.

If you use kubectl to check the actual configmap, it will give us a better understanding of how it looks.

If you choose not to change the storage class, you will probably not hit the same nil pointer issue.

son-la (Author) commented Apr 13, 2022

Thanks for the answer. Here's the configmap created by the Helm installation:

➜ kubectl get configmap/velero-change-storage-class-config -n velero -o yaml
apiVersion: v1
data:
  vmware-volume: vmware
kind: ConfigMap
metadata:
  annotations:
    meta.helm.sh/release-name: velero
    meta.helm.sh/release-namespace: velero
  creationTimestamp: "2022-04-13T07:03:39Z"
  labels:
    app.kubernetes.io/instance: velero
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: velero
    helm.sh/chart: velero-2.29.4
    velero.io/change-storage-class: RestoreItemAction
    velero.io/plugin-config: ""
  name: velero-change-storage-class-config
  namespace: velero
  resourceVersion: "387275"
  uid: 4c4e7501-b64f-40e9-afaa-a79615bdbd87

stale bot commented Jun 12, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the staled label Jun 12, 2022
stale bot commented Jun 27, 2022

Closing the stale issue.

@stale stale bot closed this as completed Jun 27, 2022
reasonerjt added a commit to reasonerjt/velero that referenced this issue Jul 10, 2022
Mitigate the issue mentioned in vmware-tanzu#4782
When there's a bug or misconfiguration that causes nil pointer there
will be more stack trace information to help us debug.

Signed-off-by: Daniel Jiang <[email protected]>
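
The mitigation this commit describes can be illustrated with a minimal sketch (an assumed shape, not the actual Velero plugin framework code): wrap the action in a deferred recover and log runtime/debug.Stack() so the panic's origin shows up in the log.

package main

import (
	"fmt"
	"runtime/debug"
)

// runAction stands in for invoking a restore item action that may panic
// (an assumption for illustration, not the real plugin interface).
func runAction(action func() error) (err error) {
	defer func() {
		if r := recover(); r != nil {
			// include the stack trace so the log shows where the panic was thrown
			err = fmt.Errorf("plugin panicked: %v\n%s", r, debug.Stack())
		}
	}()
	return action()
}

func main() {
	err := runAction(func() error {
		var sc *string
		_ = *sc // deliberate nil dereference for demonstration
		return nil
	})
	fmt.Println(err)
}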
UristMcMiner commented:

Hi, this issue still persists in 1.9.0, and changing storage classes is a crucial feature for migrating applications to different storage.

stefanrepl commented Aug 16, 2022

I am also running into this. I ran it after the stack-trace-on-panic change was merged and see this error log when trying to restore my StatefulSet:
time="2022-08-10T21:55:54Z" level=error msg="Namespace default, resource restore error: error preparing statefulsets.apps/default/kotsadm-postgres: rpc error: code = Aborted desc = plugin panicked: runtime error: invalid memory address or nil pointer dereference" logSource="pkg/controller/restore_controller.go:504" restore=velero/instance-fwkh5.kotsadm

I am going through the same steps the OP posted, and have confirmed my StatefulSet is present in the backup that is created.

For what it is worth, I do not see this issue when testing with version 1.7.1, but I do see it in versions 1.8.1 and 1.9.0; that is all I have tested so far.

Can we reopen this issue? @reasonerjt

divolgin (Contributor) commented:

I have confirmed that deleting the change-storage-class-config configmap allows the restore to complete successfully. The problem must be in ChangeStorageClassAction, as pointed out earlier.

Tested with 1.9.0 and 1.9.1

@reasonerjt Please re-open this issue.

sseago (Collaborator) commented Aug 25, 2022

Reopening because it was never resolved (closed by the stale bot) and there's now a PR submitted to fix the issue.

danfengliu pushed a commit to danfengliu/velero that referenced this issue Sep 13, 2022