-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Modify BackupStoreGetter to avoid BSL spec changes #5122
Conversation
Codecov Report
@@ Coverage Diff @@
## main #5122 +/- ##
=======================================
Coverage 41.33% 41.33%
=======================================
Files 211 211
Lines 18443 18443
=======================================
Hits 7624 7624
Misses 10247 10247
Partials 572 572
Continue to review full report at Codecov.
|
@blackpiglet #5101 is what caused issue with OADP. Each time revalidation is called objectStoreGetter.Get is called inside Reconcile function
location's spec.Config is modified velero/pkg/persistence/object_store.go Lines 134 to 135 in c46fe71
OADP will try to undo any unexpected spec changes with BSL inside our custom resource. If BSL inside OADP custom resource do not contain This PR has to be looked at carefully since it can break some plugins that may expect to read |
@sseago @shubham-pampattiwar a better fix might be from OADP side where we put the following logic into OADP itself. This would not break anything upstream or third party plugins and fixes our problem. Since the location.Spec.Config code is from 17 months ago I would expect there to be some niche plugins relying on it already. velero/pkg/persistence/object_store.go Lines 130 to 140 in c46fe71
|
@kaovilai It's not really only an OADP issue -- this will happen any time a BSL is managed by an external controller/operator, since the Velero BSL validation code was actually modifying spec and updating the resource. The fix in this PR is to validate a copy of the BSL resource, not the same in-memory object that we're going to update status on and make a patch API call. This PR shouldn't affect actually calling the plugin, since I only modified the call to BackupStoreGetter used for validation here. But yes, we should confirm that the plugin only ever uses in-memory BSL modified by this method rather than making a separate Get call for it later, but if the latter were the case, then OADP would already be broken with respect to BSLs, since any time this is done, the OADP controller will un-do the change. Passing a deep copy of the BSL into the getter feels like the right thing to do here, as Velero shouldn't need to modify the spec like that -- if it did, then everyone who manages BSLs automatically would have to hack around this like you suggested OADP do. That feels like an awful hack. |
I could be completely wrong here and prior versions of Velero did end up modifying the BSL in-cluster, and prior versions of OADP ended up in a continual modify-the-BSL battle between OADP operator and Velero controller. It would be easy enough to verify with OADP 1.0.3 by doing a watch on the BSL resource yaml. |
@kaovilai ok, so basically every time we need an objectstore to work with (backup controller, restore controller, etc.) we call ObjectBackupStoreGetter.Get() which modifies the passed-in BSL object and then uses that modified config to initialize the plugin. Of note is that the Init call takes a map with the contents of config in it (so it comes from memory, not the cluster), and that's used to initialize the plugin. The plugin doesn't work with the BSL kube resource directly, only the map of config values passed in. Also of note is that in every other controller that calls ObjectBackupStoreGetter.Get() does not issue a patch or update API call on the BSL afterwards. Only this one does that. This one also only uses the initialize object store for validation, so I do think that not modifying the resource on disk is the original intent of the code here, since none of the controllers that use the object store for real work modify the in-cluster resource -- and the one that does is only initializing the object store for validation, so it seems like we shouldn't be putting this change into the cluster. There is one remaining part of the BSL controller reconcile that could still trip OADP up here -- the controller also updates |
@@ -122,7 +122,7 @@ func (r *BackupStorageLocationReconciler) Reconcile(ctx context.Context, req ctr | |||
} | |||
}() | |||
|
|||
backupStore, err := r.BackupStoreGetter.Get(&location, pluginManager, log) | |||
backupStore, err := r.BackupStoreGetter.Get(location.DeepCopy(), pluginManager, log) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd add a comment on why we are doing DeepCopy() here. Alternatively do deep copy in the objectStoreGetter Get()
function and add a comment there instead. Fixing Get function (so that it actually just get something and not setting anything else) rather than fixing Reconcile function which innocently called Get not expecting change.
backupStore, err := r.BackupStoreGetter.Get(location.DeepCopy(), pluginManager, log) | |
// Using DeepCopy to avoid modifying in-cluster BackupStorageLocation. | |
// Get function modifies the location.Spec.Config which we don't want updated to the in-cluster version | |
backupStore, err := r.BackupStoreGetter.Get(location.DeepCopy(), pluginManager, log) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kaovilai Yeah, I'm leaning towards the latter. The only aspect of the BSL that's passed into the ObjectStore plugin is actually the map of config values. Lets copy that map instead of deep copying the whole BSL, and pass it in. No side effect on the kube object. As for 3rd party plugins relying on the BSL in-cluster mod, while I don't think any are doing that now, if they are, it's a dangerous assumption even with the released 1.9 code, since as we've seen with OADP, the change isn't reliably made. Better to ensure it's never made so that breakage is obvious rather than nondeterministic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A "Get" function should not generally have side effects. If it does have side effects, they had better be well-documented and made with good reason. In this case, they are neither documented nor made with any reason to do so.
I agree with using deepcopy to avoid overwritten, but |
May take more time to evaluate how to create a modification for all places, so agree with @kaovilai, adding more comments will be good enough for this case. |
@kaovilai Update PR leaves controller alone and modifies BackupStoreGetter @blackpiglet So in the original PR, we didn't need the DeepCopy elsewhere because the BSL passed in was never saved after modification. Still, it's confusing, so the updated PR simply makes a deep copy of BSL.Spec.Config and passes that into the object store, so there's no need for anyone to DeepCopy the entire BSL object before passing it into BackupStoreGetter. |
Pass in a new copy of the map of config values rather than modifying the BSL Spec.Config and then pass in that field. Signed-off-by: Scott Seago <[email protected]>
@@ -131,19 +131,25 @@ func (b *objectBackupStoreGetter) Get(location *velerov1api.BackupStorageLocatio | |||
return nil, errors.Errorf("backup storage location's bucket name %q must not contain a '/' (if using a prefix, put it in the 'Prefix' field instead)", location.Spec.ObjectStorage.Bucket) | |||
} | |||
|
|||
// Pass a new map into the object store rather than modifying the passed-in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Signed-off-by: Scott Seago [email protected]
Thank you for contributing to Velero!
Modify BackupStoreGetter to avoid BSL spec changes
Pass in a new copy of the map of config values rather than
modifying the BSL Spec.Config and then pass in that field.
Those extra fields in spec.config are only needed for the in-memory copy passed into plugins. Without this change to the BackupStoreGetter, the BSL spec is modified on disk and in the OADP case, the OADP controller then noticed the change and removed the fields, causing constant BSL spec updates, causing constant validation reconciles.
Does your change fix a particular issue?
Fixes #(issue)
Please indicate you've done the following:
/kind changelog-not-required
as a comment on this pull request.site/content/docs/main
.