
Need a way to skip backup of volumes based on type #5035

Closed
gsadhani opened this issue Jun 20, 2022 · 21 comments

@gsadhani

Describe the problem/challenge you have
When taking a backup, Velero tries to back up all available pod volumes. Some volumes can be too large, or are already backed up using other solutions. It is desirable to skip backing up volumes based on their type.

Describe the solution you'd like
A way to configure the backup to skip certain types of volumes like NFS.
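A hypothetical sketch of what such a knob could look like on the Backup CR. The `skippedVolumeTypes` field below does not exist in Velero; it is invented here purely to illustrate the requested configuration surface:

```yaml
# Hypothetical sketch only: "skippedVolumeTypes" is an invented field
# illustrating the request; it is not part of the Velero Backup API.
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: nightly
  namespace: velero
spec:
  includedNamespaces:
    - "*"
  skippedVolumeTypes:   # invented field: volume types to skip
    - nfs
```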

Anything else you would like to add:
None

Environment:

  • Velero version (use velero version): 1.8.1
  • Kubernetes version (use kubectl version): 1.22
  • Kubernetes installer & version:
  • Cloud provider or hardware configuration: vSphere
  • OS (e.g. from /etc/os-release): Photon

Vote on this issue!

This is an invitation to the Velero community to vote on issues; you can see the project's top-voted issues listed here.
Use the "reaction smiley face" at the upper right of this comment to vote.

  • 👍 for "The project would be better with this feature added"
  • 👎 for "This feature will not enhance the project in a meaningful way"
@reasonerjt (Contributor)

@gsadhani Could you clarify why the opt-out approach can't solve the problem?
Is it because you don't want the user to update their workload to skip the backup?

@reasonerjt reasonerjt added the 1.10-candidate The label used for 1.10 planning discussion. label Jun 21, 2022
@gsadhani (Author)

@gsadhani Could you clarify why the opt-out approach can't solve the problem? Is it because you don't want the user to update their workload to skip the backup?

@reasonerjt Yes. Identifying each pod in the workload that contains the specific type of volume, and annotating it, could be cumbersome. It would be very convenient if we could skip volumes based on volume type, specified as a backup configuration parameter.
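For context, the per-pod opt-out being discussed looks roughly like this. The `backup.velero.io/backup-volumes-excludes` annotation is Velero's existing mechanism; the pod and volume names are made up:

```yaml
# The existing per-pod opt-out: every pod with an NFS volume must be
# annotated individually, which is what makes the approach cumbersome.
# Pod, volume, and server names here are illustrative.
apiVersion: v1
kind: Pod
metadata:
  name: app-pod
  annotations:
    backup.velero.io/backup-volumes-excludes: nfs-data
spec:
  containers:
    - name: app
      image: nginx
      volumeMounts:
        - name: nfs-data
          mountPath: /data
  volumes:
    - name: nfs-data
      nfs:
        server: nfs.example.com
        path: /exports/data
```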

@reasonerjt reasonerjt added the Needs Product Blocked needing input or feedback from Product label Jun 21, 2022
@reasonerjt reasonerjt self-assigned this Jun 21, 2022
@euclidsun

There are many factors that could differentiate volume types: storage class, size, NFS, or specific drivers.
Is it supported to add the label velero.io/exclude-from-backup=true to mark a PV as excluded?

@pradeepkchaturvedi commented Jun 22, 2022

We can consider only Kubernetes-supported volume types for this scenario: https://kubernetes.io/docs/concepts/storage/volumes/. Also, initially we can start with an allowed skip-volume-type list for a limited set of volume types. Typically, some volume types are backed by storage systems like nfs, fc, or iscsi, or are ephemeral/non-persistent volumes like emptyDir, hostPath, downwardAPI, local, etc.

This will reduce the manual effort of annotating all pods. If autoscaling is defined for the pods, users need to find a way to add the annotation automatically.
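One workaround for the autoscaling case is to place the annotation on the controller's pod template rather than on individual pods, so that newly scaled pods inherit it automatically. A sketch with made-up names:

```yaml
# Sketch: annotating the Deployment's pod template so that pods created
# by scaling inherit the exclusion automatically. Names are illustrative.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
      annotations:
        backup.velero.io/backup-volumes-excludes: nfs-data
    spec:
      containers:
        - name: web
          image: nginx
```

This still requires editing each workload's spec once, which is the limitation the thread is trying to remove.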

@euclidsun

Found more skip-backup requirements; we should consider all the cases and come up with a generic way to handle skipping:

#957 skip restic backup/restore: not backup the PV data, but backup the cluster state.
#4129 Exclude folders for backing up
#2413 Namespace skip (this could be enhanced as separate enhancement)

In a scenario where DBS excludes the Pods with volumes from the backup, those Pods will get stuck in a pending state during the recovery phase. What is a better approach to handle this and avoid manual effort? DBS tries to reduce the risk of failure or delay in backup & restore caused by volume-related workloads. E.g., 80% of workloads run without volumes and 20% of Pods run with volumes; during restore, that 20% becomes a risk for the recovery of the other 80%. The current approach the account team suggested is to create empty PVs manually: exclude all PVs when taking the Velero backup and create empty PVs manually for the 20%, or otherwise opt in to back up PVs selectively.

@euclidsun

So in summary, the requirements for skipping are:

  • option to skip all PV data
  • option to skip specified PV type (inline, nfs)
  • option to skip specified PV size
  • option to skip folders (I think this could be lower priority than the others, given we will evolve toward snapshot/CBT-based backup)
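The requirements above could, hypothetically, be expressed as a single filter block on the Backup spec. None of these fields exist in Velero; this is only a sketch of the shape such a filter might take (folder-level skipping is omitted since it operates below the volume level):

```yaml
# Purely hypothetical filter shape covering the listed requirements;
# no field below exists in the Velero Backup API.
spec:
  volumeSkipPolicy:
    skipAllPVData: false            # option 1: skip all PV data
    skipVolumeTypes:                # option 2: skip by type (inline, nfs, ...)
      - nfs
      - emptyDir
    skipVolumesLargerThan: 100Gi    # option 3: skip by size
```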

@reasonerjt (Contributor)

I understand the requirement: we may want to introduce a way for users to skip the backup of certain PVs without having to update the pod specs.

However, I'm not sure skipping PV backup based on type is a good idea. It may be handy for a certain user, but is it generic enough? Do users share a usage pattern where, for some type of PV, they have a separate backup process and Velero does not need to handle those volumes? Is it normal that a user has a separate backup process for two of their NFS PVs but still needs Velero's help for the other NFS PVs?

I think skipping PVs based on type is doable, but we should avoid adding features that are only useful to a very small group of users.

@euclidsun commented Jul 8, 2022

It does not have to be a specific type defined here: https://kubernetes.io/docs/concepts/storage/volumes/

Typical user story:

  • As a platform operator, I know what backend storage I am using, so I want an easy way to skip certain storage types being backed up by Velero.

Would it be better to use the name of the StorageClass? Platform operators should know well how many storage classes they offer to applications. That way, we do not worry about the specific type of PV, and we also give flexibility to the admin.

A StorageClass provides a way for administrators to describe the "classes" of storage they offer. Different classes might map to quality-of-service levels, or to backup policies, or to arbitrary policies determined by the cluster administrators. Kubernetes itself is unopinionated about what classes represent. This concept is sometimes called "profiles" in other storage systems.
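A sketch of how a StorageClass-name filter might look. The StorageClass itself is standard Kubernetes; the `excludedStorageClasses` field on the Backup is invented here for illustration:

```yaml
# Standard StorageClass an admin might offer for NFS-backed volumes.
# The provisioner name is illustrative.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-archive
provisioner: example.com/nfs
---
# Hypothetical Backup filter referencing the class by name;
# "excludedStorageClasses" is not a real Velero field.
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: nightly
  namespace: velero
spec:
  excludedStorageClasses:
    - nfs-archive
```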

@jerry-jibu

@euclidsun In our case, some PVCs are only used for logging while other PVCs are for DBs and configurations. I think it's difficult to classify them by StorageClass alone. Personally, I think we may need a more flexible way from Velero's point of view, such as included/excluded arrays of PVC/volume (in-tree) names, or a kind of "GVNR"-style resource selector.

Then the upper application can configure or calculate the required volume lists for Velero.

@ywk253100 (Contributor)

@blackpiglet Let's start the design in 1.10 timeframe

@blackpiglet (Contributor) commented Sep 1, 2022

So far, I have thought of two scenarios for skipping volume backup:

  • Volumes mounted multiple times in a backup should be handled only once. I think this will reduce the work of opting out volumes like NFS when the same volume is mounted many times.
  • The "backup.velero.io/backup-volumes" and "backup.velero.io/backup-volumes-excludes" annotations should be handled at the controller level, so that pods created by controller scaling are also handled.

@blackpiglet (Contributor)

I think filtering volumes by type also has some value, but there may not be many suitable use cases.
It is useful when the volume's source is not a PVC.
For example, if a customer wants to skip all emptyDir volumes, this would work. For NFS, I'm not sure whether customers use NFS volumes this way (declared inline rather than via a PVC).
Filtering by StorageClass name should be useful for the PVC case.
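An example of the non-PVC case mentioned above: an emptyDir volume is declared inline in the pod spec, so there is no PVC or StorageClass to filter on, and a type-based filter is the only handle. Names are illustrative:

```yaml
# Inline (non-PVC) volume: emptyDir has no PVC or StorageClass,
# so only a type-based filter could match it.
apiVersion: v1
kind: Pod
metadata:
  name: cache-pod
spec:
  containers:
    - name: app
      image: nginx
      volumeMounts:
        - name: scratch
          mountPath: /cache
  volumes:
    - name: scratch
      emptyDir: {}
```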

@blackpiglet (Contributor) commented Sep 2, 2022

Another scenario is stated here: #957 (comment).
Support backing up only one replica's volumes in the controller case.

@blackpiglet (Contributor)

So far, I have thought of two scenarios for skipping volume backup:

  • Volumes mounted multiple times in a backup should be handled only once. I think this will reduce the work of opting out volumes like NFS when the same volume is mounted many times.
  • The "backup.velero.io/backup-volumes" and "backup.velero.io/backup-volumes-excludes" annotations should be handled at the controller level, so that pods created by controller scaling are also handled.

Another scenario is stated here: #957 (comment). Support backing up only one replica's volumes in the controller case.

@gsadhani @pradeepkchaturvedi @reasonerjt
What's your opinion on these proposals?

@danfengliu danfengliu removed the 1.10-candidate The label used for 1.10 planning discussion. label Sep 7, 2022
@reasonerjt (Contributor)

I think the higher priority is to design a mechanism and format for the filter in the backup CR so that users can skip PVs during backup. We also need to decide whether this filter applies to restic only, or to both restic and snapshots.
We should also consider the case where the filter conflicts with the opt-in/opt-out annotations.

A design proposal will be drafted in v1.10 timeframe.

@sseago (Collaborator) commented Sep 7, 2022

@reasonerjt It was mentioned on yesterday's community call that we may also want to use the filter to decide which PVs to back up with restic/kopia vs. snapshots.

@blackpiglet (Contributor)

@sseago
Does that mean this filter is intended to select PVs for backup by both the uploader and snapshots?

@sseago (Collaborator) commented Sep 8, 2022

@blackpiglet I'm not sure what design/UX is best here -- separate filters for "what to back up via kopia/restic" vs. "what to back up via snapshot", or one filter for "include these volumes" and another (depending on opt-in vs. opt-out for kopia) for "among the volumes backed up, these use kopia". If we're not careful, this can get very confusing for users. But it was brought up on the call this week that some users wanted a way to identify restic/kopia vs. csi/snapshot volumes without having to individually annotate one category or the other.

@blackpiglet (Contributor)

@sseago
Thanks for the clarification. I will try to unify both in the design.

@blackpiglet (Contributor)

There are some more requirements related to this topic in #5340.

@qiuming-best (Contributor)

Closing this, since the implementation PR has been merged.
