
cross cluster restore failed because backup only backups v1beta1 crd #6796

Closed

half-life666 opened this issue Sep 8, 2023 · 5 comments

half-life666 (Contributor) commented Sep 8, 2023

What steps did you take and what happened:

In the setup below:

Source cluster: k8s 1.18, istio 1.5.9
Target cluster: k8s 1.23, istio 1.14
Using an application which has a virtualservices.networking.istio.io CR (e.g., the istio sample app bookinfo)

  1. Back up the app from the source cluster with the default namespace option
  2. Restore the app to the target cluster; the restore fails with errors, but the restored application state is OK

What did you expect to happen:
Restore should not fail

The following information will help us better understand what's going on:

If you are using velero v1.7.0+:
Please use velero debug --backup <backupname> --restore <restorename> to generate the support bundle and attach it to this issue; for more options, refer to velero debug --help

If you are using earlier versions:
Please provide the output of the following commands (Pasting long output into a GitHub gist or other pastebin is fine.)

  • kubectl logs deployment/velero -n velero
  • velero backup describe <backupname> or kubectl get backup/<backupname> -n velero -o yaml
  • velero backup logs <backupname>
  • velero restore describe <restorename> or kubectl get restore/<restorename> -n velero -o yaml
  • velero restore logs <restorename>

backup spec:

spec:
  defaultVolumesToRestic: false
  excludedResources:
  - nodes
  - events
  - events.events.k8s.io
  hooks: {}
  includedNamespaces:
  - bookinfo
  labelSelector:
    matchExpressions:
    - key: agent.jibudata.com/exporthandler
      operator: DoesNotExist
  metadata: {}
  snapshotVolumes: false

restore spec:

spec:
  backupName: bookinfo-0-resource-r8ndq-4mwgp
  excludedResources:
  - nodes
  - events
  - events.events.k8s.io
  - backups.velero.io
  - restores.velero.io
  - resticrepositories.velero.io
  hooks: {}
  includeClusterResources: true
  includedNamespaces:
  - bookinfo
  namespaceMapping:
    bookinfo: bookinfo
  preserveNodePorts: false
  restorePVs: true

backup logs:

time="2023-09-08T06:43:34Z" level=info msg="Executing RemapCRDVersionAction" backup=qiming-backend/bookinfo-0-resource-r8ndq-4mwgp cmd=/velero logSource="/pkg/backup/remap_crd_version_action.go:61" pluginName=velero
time="2023-09-08T06:43:34Z" level=info msg="CustomResourceDefinition gateways.networking.istio.io appears to be v1beta1, fetching the v1beta version" CRD=gateways.networking.istio.io backup=qiming-backend/bookinfo-0-resource-r8ndq-4mwgp cmd=/velero logSource="/pkg/backup/remap_crd_version_action.go:113" plugin=RemapCRDVersionAction pluginName=velero
time="2023-09-08T06:43:34Z" level=info msg="Found associated CRD virtualservices.networking.istio.io to add to backup" backup=qiming-backend/bookinfo-0-resource-r8ndq-4mwgp logSource="/pkg/backup/backup.go:513"
time="2023-09-08T06:43:34Z" level=info msg="Backing up item" backup=qiming-backend/bookinfo-0-resource-r8ndq-4mwgp logSource="/pkg/backup/item_backupper.go:98" name=virtualservices.networking.istio.io namespace= resource=customresourcedefinitions.apiextensions.k8s.io
time="2023-09-08T06:43:34Z" level=info msg="Executing custom action" backup=qiming-backend/bookinfo-0-resource-r8ndq-4mwgp logSource="/pkg/backup/item_backupper.go:304" name=virtualservices.networking.istio.io namespace= resource=customresourcedefinitions.apiextensions.k8s.io
time="2023-09-08T06:43:34Z" level=info msg="Executing RemapCRDVersionAction" backup=qiming-backend/bookinfo-0-resource-r8ndq-4mwgp cmd=/velero logSource="/pkg/backup/remap_crd_version_action.go:61" pluginName=velero
time="2023-09-08T06:43:34Z" level=info msg="CustomResourceDefinition virtualservices.networking.istio.io appears to be v1beta1, fetching the v1beta version" CRD=virtualservices.networking.istio.io backup=qiming-backend/bookinfo-0-resource-r8ndq-4mwgp cmd=/velero logSource="/pkg/backup/remap_crd_version_action.go:113" plugin=RemapCRDVersionAction pluginName=velero

restore logs:

time="2023-09-08T06:44:34Z" level=debug msg="APIGroupVersionsFeatureFlag Priority 1: Cluster preferred API group version v1beta1 found in backup for virtualservices.networking.istio.io" logSource="/pkg/restore/prioritize_group_version.go:98" restore=qiming-backend/bookinfo-0-resource-r8ndq-sk5hv
time="2023-09-08T06:44:34Z" level=info msg="Resource 'customresourcedefinitions.apiextensions.k8s.io' will be restored at cluster scope" logSource="/pkg/restore/restore.go:1947" restore=qiming-backend/bookinfo-0-resource-r8ndq-sk5hv
time="2023-09-08T06:44:34Z" level=info msg="object customresourcedefinitions.apiextensions.k8s.io/v1-preferredversion//gateways.networking.istio.io is parsed" logSource="/pkg/restore/restore.go:1980" restore=qiming-backend/bookinfo-0-resource-r8ndq-sk5hv
time="2023-09-08T06:44:34Z" level=info msg="object customresourcedefinitions.apiextensions.k8s.io/v1-preferredversion//virtualservices.networking.istio.io is parsed" logSource="/pkg/restore/restore.go:1980" restore=qiming-backend/bookinfo-0-resource-r8ndq-sk5hv

time="2023-09-08T06:44:34Z" level=info msg="Attempting to restore CustomResourceDefinition: virtualservices.networking.istio.io" logSource="/pkg/restore/restore.go:1256" restore=qiming-backend/bookinfo-0-resource-r8ndq-sk5hv
time="2023-09-08T06:44:34Z" level=error msg="error restoring virtualservices.networking.istio.io: the server could not find the requested resource" logSource="/pkg/restore/restore.go:1407" restore=qiming-backend/bookinfo-0-resource-r8ndq-sk5hv

backup tar:
bookinfo-0-resource-r8ndq-4mwgp.tar.gz

Anything else you would like to add:

Looking at both the backup and restore logs, and at the resources backed up, I think the problem is in the backup phase: while the virtualservices CRD is being backed up, the v1beta1 version is chosen instead of v1. So in the restore phase, the only version available to restore is v1beta1, which does not exist in the target cluster.

Looking at remap logic:

	// This plugin will exit if the CRD was installed via v1beta1 but the cluster does not support v1beta1 CRD
	supportv1b1 := false
CheckVersion:
	for _, g := range a.discoveryHelper.APIGroups() {
		if g.Name == apiextv1.GroupName {
			for _, v := range g.Versions {
				if v.Version == apiextv1beta1.SchemeGroupVersion.Version {
					supportv1b1 = true
					break CheckVersion
				}
			}

		}
	}
	if !supportv1b1 {
		a.logger.Info("Exiting RemapCRDVersionAction, the cluster does not support v1beta1 CRD")
		return item, nil, nil
	}

The logic only checks whether discovery APIGroups() includes v1beta1. Do we also need to check the cluster's preferred version, or is there any other clue we can use here?
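One possible extra signal, as a hedged sketch: also consult the discovery PreferredVersion for the apiextensions group, and only remap when the cluster actually prefers v1beta1. The types and function below are minimal stand-ins (hypothetical, not Velero code) just to illustrate the idea:

```go
package main

import "fmt"

// Minimal stand-ins for the discovery types (metav1.APIGroup et al.).
type GroupVersionForDiscovery struct {
	Version string
}

type APIGroup struct {
	Name             string
	Versions         []GroupVersionForDiscovery
	PreferredVersion GroupVersionForDiscovery
}

// shouldRemapToV1beta1 keeps the existing "does the cluster serve
// v1beta1 CRDs?" check, but additionally requires that v1beta1 is the
// cluster's *preferred* version for the apiextensions group.
func shouldRemapToV1beta1(groups []APIGroup) bool {
	for _, g := range groups {
		if g.Name != "apiextensions.k8s.io" {
			continue
		}
		servesV1beta1 := false
		for _, v := range g.Versions {
			if v.Version == "v1beta1" {
				servesV1beta1 = true
				break
			}
		}
		return servesV1beta1 && g.PreferredVersion.Version == "v1beta1"
	}
	return false
}

func main() {
	// A cluster like k8s 1.18 serves both versions but prefers v1, so
	// the remap would be skipped and the v1 CRD kept in the backup.
	groups := []APIGroup{{
		Name: "apiextensions.k8s.io",
		Versions: []GroupVersionForDiscovery{
			{Version: "v1"}, {Version: "v1beta1"},
		},
		PreferredVersion: GroupVersionForDiscovery{Version: "v1"},
	}}
	fmt.Println(shouldRemapToV1beta1(groups)) // prints false
}
```

Whether this breaks the original fix (clusters where v1beta1 really is the only usable version) would still need checking.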

In summary, there are 2 approaches to fix this, both of which might need more thought:

  • Add another signal to the remap logic so that it chooses the right version, but I don't know what signal could be used here, or whether that would break the original fix
  • Since APIGroupVersionsFeatureFlag was added, but backing up a namespace only backs up the CRD indirectly through backing up the CR, should we consider backing up both versions of the CRD here?

Environment:

  • Velero version (use velero version): 1.7 base with some backports
  • Velero features (use velero client config get features): APIGroupVersionsFeatureFlag, CSI
  • Kubernetes version (use kubectl version):
  • Kubernetes installer & version:
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release):

Vote on this issue!

This is an invitation to the Velero community to vote on issues; you can see the project's top voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.

  • 👍 for "I would like to see this bug fixed as soon as possible"
  • 👎 for "There are more important bugs to focus on right now"
draghuram (Contributor) commented:

I haven't checked this in detail but see if this helps:
https://velero.io/docs/main/enable-api-group-versions-feature/

half-life666 (Contributor, Author) commented Sep 10, 2023

I haven't checked this in detail but see if this helps: https://velero.io/docs/main/enable-api-group-versions-feature/

Thanks, but APIGroupVersionsFeatureFlag is already enabled here

half-life666 (Contributor, Author) commented Sep 10, 2023

Some other approaches:

  • No matter which CRD version is backed up, add conversion logic at restore time, e.g., convert a v1beta1 CRD to a v1 CRD, so that when the v1 CRD is restored we hit an "already exists" error instead of a "the server could not find the requested resource" error
  • Add a flag indicating that, when backing up a CR, the associated CRD should not be backed up, e.g., because the user knows that istio in the target cluster can handle the v1beta1 CR (in our real example, istio 1.14 can handle virtualservices v1beta1 CRs, which means the restore is actually successful in terms of application state)
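The structural part of the first approach could look roughly like the sketch below, operating on the unstructured CRD object. This is a hypothetical illustration, not Velero code: it only rewrites apiVersion and lifts the single v1beta1 spec.version into a v1 spec.versions list; a real conversion would also have to relocate validation schemas, subresources, and so on.

```go
package main

import "fmt"

// convertCRDToV1 sketches a v1beta1 -> v1 CRD conversion on a
// map-based unstructured object (hypothetical helper).
func convertCRDToV1(crd map[string]interface{}) map[string]interface{} {
	if crd["apiVersion"] != "apiextensions.k8s.io/v1beta1" {
		return crd // already v1 (or not a CRD); leave untouched
	}
	crd["apiVersion"] = "apiextensions.k8s.io/v1"
	spec, _ := crd["spec"].(map[string]interface{})
	if spec == nil {
		return crd
	}
	// v1beta1 allows a single top-level spec.version; v1 requires an
	// explicit spec.versions list with served/storage flags.
	if v, ok := spec["version"].(string); ok {
		delete(spec, "version")
		if _, has := spec["versions"]; !has {
			spec["versions"] = []interface{}{
				map[string]interface{}{
					"name": v, "served": true, "storage": true,
				},
			}
		}
	}
	return crd
}

func main() {
	crd := map[string]interface{}{
		"apiVersion": "apiextensions.k8s.io/v1beta1",
		"kind":       "CustomResourceDefinition",
		"spec": map[string]interface{}{
			"group":   "networking.istio.io",
			"version": "v1alpha3",
		},
	}
	out := convertCRDToV1(crd)
	fmt.Println(out["apiVersion"]) // prints apiextensions.k8s.io/v1
}
```

Restoring the converted v1 object into the 1.23 cluster would then hit an "already exists" conflict with the CRD istio installed, rather than a "server could not find the requested resource" error.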

half-life666 (Contributor, Author) commented:

Just saw same issue #5146

@qiuming-best qiuming-best added the Needs triage We need discussion to understand problem and decide the priority label Sep 12, 2023
@reasonerjt reasonerjt removed the Needs triage We need discussion to understand problem and decide the priority label Sep 13, 2023
reasonerjt (Contributor) commented:

Closing as this has been fixed.
