Allow users to take GCS-based backups #302
This commit adds first-pass support for exposing Solr's GcsBackupRepository through our operator configuration. This WIP support has a number of caveats and downsides:
- GCS backups eschew the "persistence" step that currently follows normal backups.
- GCS backups are only included in Solr 8.9+, but there's currently no check for this.
- Operator logic currently assumes that exactly one type of backup config will be provided on a given solrcloud object (i.e. GCS backups and 'local' PV backups are mutually exclusive for a solrcloud).
- No automated tests have been added.
- No documentation has been added, beyond the examples on issue apache#301.
- changes CRDs to have explicit names for each backup repository
- multiple backup repositories now supported
- added "managed" vs unmanaged backup distinction
I've updated this to use different CRDs (see the latest comment on the related issue #301). This also adds support for defining multiple backup repositories in your solrcloud definition, which is pretty cool. Still needs tests and docs for the new CRD fields, and there's currently a bug where GCS backups succeed but this success is never persisted in the kube status. Hope to fix those shortly.
Hey @gerlowskija, I think this is an awesome start and pretty far along.
I've left some feedback, and I will definitely be helping more when time permits. I'm not sure this will get into the v0.4.0 release, but hopefully we can have another release soon after.
Thanks for the review Houston. Just getting back from some time off now; gonna try to address your comments in the next day or two, as well as getting the docs and tests in order.
Haven't touched this in a few weeks, both because of some competing priorities where I work and because I've been scrambling to get things done before I disappear for an upcoming paternity leave. But just wanted to say that this is still on my radar and I'm hoping to return to it as soon as I can.
Congrats!!! 🎉 No worries, I've been busy on the S3 Repo stuff and the v0.4.0 release. I do plan on taking this forward over the next few weeks. Mind if I make some changes while you are working on other stuff?
Thanks!
Not at all, thanks for the help! My plate cleared up a bit after posting the other day, so I'll actually be picking this up again in the short term (until paternity takes me away again). So if I'm unable to finish, hopefully I can at least whittle down what'll be left when you turn your attention to it.
This fixes a bug where 'managed' backups that didn't make use of their persistence capabilities would never be marked as 'finished'. Also contains some comment removal and typo corrections.
Covers local backups (with and without persistence), GCS backups, and deprecated-syntax local backups.
This adds a creation and deletion example similar to that in `docs/solr-cloud/README.md`, and covers some trickier details of the new GCS backup support.
Alright, quick update on my progress here. Since I last posted, I've
That's everything I had planned to do here, unless @HoustonPutman has other feedback or gets a chance to see my responses from his earlier review. But otherwise I think this is ready to go?
@gerlowskija you can run ‘make fmt’ to fix the formatting issues or ‘make prepare’ to auto-fix all possible linting issues that can be auto-fixed
Thanks Houston - I saw that 'make lint' advises using 'go fmt' to autocorrect issues as well. Lots of options it looks like!
This looks really good!
I love the documentation and the provided examples.
A few overall things:
- Most of the testing asserts don't have messages, which makes failures harder to debug.
- A bigger idea:
Instead of lists for each individual repository type, we structure it more like Volumes:

    repositories:
      - name: cloud
        gcs:
          location: ...
      - name: default
        managed:
          volume: ....

Where it’s a list of BackupRepositories, but each item has an option for each repository type.
Makes it easier to add new repository types and look for repos in the backup code.
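To make the Volumes-style suggestion concrete, here is a minimal Go sketch of what such a list could look like on the type side. All type and field names (`BackupRepository`, `GcsRepository`, `ManagedRepository`, `Location`, `Volume`, `FindRepository`, `RepositoryType`) are hypothetical and chosen only to mirror the YAML above; they are not the operator's actual CRD types.

```go
package main

import "fmt"

// GcsRepository and ManagedRepository are hypothetical option types,
// named after the YAML keys in the proposal above.
type GcsRepository struct {
	Location string
}

type ManagedRepository struct {
	Volume string
}

// BackupRepository carries a name plus one optional field per repository
// type, similar in spirit to how a Kubernetes Volume carries its source.
type BackupRepository struct {
	Name    string
	GCS     *GcsRepository
	Managed *ManagedRepository
}

// FindRepository returns the repository with the given name, or nil.
func FindRepository(repos []BackupRepository, name string) *BackupRepository {
	for i := range repos {
		if repos[i].Name == name {
			return &repos[i]
		}
	}
	return nil
}

// RepositoryType reports which option is set and returns an error
// unless exactly one option is set.
func RepositoryType(repo BackupRepository) (string, error) {
	var set []string
	if repo.GCS != nil {
		set = append(set, "gcs")
	}
	if repo.Managed != nil {
		set = append(set, "managed")
	}
	if len(set) != 1 {
		return "", fmt.Errorf("repository %q must set exactly one type, found %d", repo.Name, len(set))
	}
	return set[0], nil
}

func main() {
	repos := []BackupRepository{
		{Name: "cloud", GCS: &GcsRepository{Location: "placeholder-location"}},
		{Name: "default", Managed: &ManagedRepository{Volume: "placeholder-volume"}},
	}
	if repo := FindRepository(repos, "cloud"); repo != nil {
		kind, err := RepositoryType(*repo)
		fmt.Println(repo.Name, kind, err)
	}
}
```

With this shape, adding a new repository type would mean adding one more optional field and one more branch in `RepositoryType`, while the backup code keeps looking repositories up by name.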
This is mainly a cosmetic issue, so other than some code changes, it won't affect the bulk of the functionality here which looks really good.
A few of my comments can be done in separate tickets, as there's no need for this to be 100% in the first PR. (Watching for GCS Secret updates, Better exposure of which backup repositories are ready in the SolrCloud status)
@@ -89,6 +90,7 @@ func (r *SolrBackupReconciler) Reconcile(req ctrl.Request) (ctrl.Result, error)

	solrCloud, allCollectionsComplete, collectionActionTaken, err := reconcileSolrCloudBackup(r, backup)
	if err != nil {
		// TODO Should we be failing the backup for some sub-set of errors here?
Yeah, since we know the Solr errors, it would probably be good to handle some of them here. But I think we can do it in a future Issue/PR.
controllers/solrcloud_controller.go
Outdated
@@ -699,6 +694,41 @@ func reconcileCloudStatus(r *SolrCloudReconciler, solrCloud *solr.SolrCloud, log
	return outOfDatePods, outOfDatePodsNotStarted, availableUpdatedPodCount, nil
}

func isPodReadyForBackup(pod corev1.Pod, backupOptions *solr.SolrBackupRestoreOptions) bool {
Maybe instead of all this complex logic, when creating a podTemplate, we add an annotation for all the backupRepositories that pod is ready to handle... That will make it a lot easier to determine what repositories the cloud is "ready" for (just take an intersection of the lists from the pod annotations).
We can do this in a different PR though, don't want to complicate this one further.
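A rough sketch of the annotation idea, assuming each pod lists its supported repositories as a comma-separated annotation value. The annotation key `example.com/backup-repositories` and the function `ReadyRepositories` are made up for illustration, and plain maps stand in for pod annotations so the snippet runs without any Kubernetes imports.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// reposAnnotation is a hypothetical annotation key, not one the operator defines.
const reposAnnotation = "example.com/backup-repositories"

// ReadyRepositories takes the annotations of every pod in the cloud and
// returns the sorted repository names that appear on all of them, i.e.
// the intersection of the per-pod lists.
func ReadyRepositories(podAnnotations []map[string]string) []string {
	counts := map[string]int{}
	for _, annotations := range podAnnotations {
		seen := map[string]bool{}
		for _, name := range strings.Split(annotations[reposAnnotation], ",") {
			name = strings.TrimSpace(name)
			if name == "" || seen[name] {
				continue // skip blanks and duplicates within a single pod
			}
			seen[name] = true
			counts[name]++
		}
	}
	ready := []string{}
	for name, n := range counts {
		if n == len(podAnnotations) {
			ready = append(ready, name)
		}
	}
	sort.Strings(ready)
	return ready
}

func main() {
	pods := []map[string]string{
		{reposAnnotation: "cloud,default"},
		{reposAnnotation: "cloud"},
	}
	fmt.Println(ReadyRepositories(pods))
}
```

A pod that has not yet been restarted with the new template simply lacks the repository in its annotation, so that repository drops out of the result until the rollout finishes.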
controllers/util/solr_util.go
Outdated
	fals := false
	solrVolumes = append(solrVolumes, corev1.Volume{
		Name: gcsRepository.GetVolumeName(),
		VolumeSource: corev1.VolumeSource{
We might need to worry about restarting the Solr Pods when these secrets update. But again we can have a separate issue/PR for that.
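One common way to handle this in Kubernetes operators, sketched here as an assumption and not as what this PR does, is to hash the secret's contents and store the hash as a pod-template annotation, so that a changed secret changes the template and triggers a rolling restart. `SecretChecksum` is a hypothetical helper, and a plain `map[string][]byte` stands in for the secret's data.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// SecretChecksum returns a deterministic SHA-256 hex digest over a secret's
// key/value data. Keys are sorted so map iteration order does not matter,
// and lengths are written ahead of each key and value so that different
// key/value splits cannot produce the same byte stream.
func SecretChecksum(data map[string][]byte) string {
	keys := make([]string, 0, len(data))
	for k := range data {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	h := sha256.New()
	for _, k := range keys {
		fmt.Fprintf(h, "%d:%s=%d:", len(k), k, len(data[k]))
		h.Write(data[k])
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	// Placeholder data only; a real secret would hold the GCS credentials.
	before := SecretChecksum(map[string][]byte{"service-account-key.json": []byte("old")})
	after := SecretChecksum(map[string][]byte{"service-account-key.json": []byte("new")})
	fmt.Println("restart needed:", before != after)
}
```

The operator would still need to watch the secret for changes for this to take effect, which fits the suggestion of tracking it in a separate issue.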
Houston and I discussed this a bit offline and I'm 👍 on the alternate syntax he proposed. All of his inline comments look good to me as well. (I'm mostly away from my computer on some paternity leave, but I'll try to stay up to date here. If I go quiet though, anyone should feel free to make whatever changes seem necessary and merge when finished.)
Ok I have refactored to use a new more kube-style way of listing repositories. I've also updated the documentation and examples. One big difference is we are no longer supporting the legacy location for the backupRepository information. The SolrCloud.withDefaults() will auto-move this information to the I have added a warning for this in the upgrade-notes for I think there are a few things left to do:
Will try to wrap this up tomorrow. And hopefully do some integration tests afterwards. Would love help with the integration tests if anyone has time for that.
Ok I think this is good to go for now.
Will try to do some integration testing early next week, but there is definitely time to let this bake in and fix things before the next release.
Awesome work @gerlowskija, thanks for the great contribution! This makes the SolrBackup CRD far more usable, and will hopefully make the migration of SolrClouds to Kubernetes that much easier.
Thanks a ton for seeing this through in my absence @HoustonPutman. Will be awesome to see this in 0.5.0!
Add support for GCS storage to 'solrbackup' #301