Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow users to take GCS-based backups #302

Merged
merged 23 commits into from
Sep 27, 2021

Conversation

gerlowskija
Copy link
Contributor

This commit adds first-pass support for exposing Solr's
GcsBackupRepository through our operator configuration. This WIP
support has a number of caveats and downsides that'll need agreed
on or fixed prior to merging :

  • GCS backups eschew the "persistence" step that currently follows
    normal backups
  • GCS backups are only included in Solr 8.9+, but there's no check for
    this currently.
  • operator logic currently assumes that exactly 1 type of backup
    config will be provided on a given solrcloud object (i.e. GCS
    backups and 'local' PV backups are mutually exclusive for a
    solrcloud.
  • no automated tests have been added
  • no documentation of has been added, beyond the examples on issue
    Add support for GCS storage to 'solrbackup' #301

This commit adds first-pass support for exposing Solr's
GcsBackupRepository through our operator configuration.  This WIP
support has a number of caveats and downsides:

  - GCS backups eschew the "persistence" step that currently follows
    normal backups
  - GCS backups are only included in Solr 8.9+, but there's no check for
    this currently.
  - operator logic currently assumes that exactly 1 type of backup
    config will be provided on a given solrcloud object (i.e. GCS
    backups and 'local' PV backups are mutually exclusive for a
    solrcloud.
  - no automated tests have been added
  - no documentation of has been added, beyond the examples on issue
    apache#301
- changes CRDs to have explicit names for each backup repository
- multiple backup repositories now supported
- added "managed" vs unmanaged backup distinction
@gerlowskija
Copy link
Contributor Author

I've updated this to use different CRD's (see the latest comment on the related issue #301). This also adds support for defining multiple backup repositories in your solrcloud definition which is pretty cool.

Still needs tests, docs for the new CRD fields, and there's currently a bug where GCS backups succeed but this success is never persisted in the kube status. Hope to fix those shortly.

Copy link
Contributor

@HoustonPutman HoustonPutman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @gerlowskija , I think this is an awesome start and pretty far along.

I've made some feedback, but I will definitely be helping more when the time permits. I'm not sure this will get into the v0.4.0 release, but hopefully we can have another release soon after.

api/v1beta1/solrbackup_types.go Outdated Show resolved Hide resolved
api/v1beta1/solrbackup_types.go Show resolved Hide resolved
api/v1beta1/solrcloud_types.go Outdated Show resolved Hide resolved
api/v1beta1/solrcloud_types.go Outdated Show resolved Hide resolved
controllers/util/backup_util.go Outdated Show resolved Hide resolved
api/v1beta1/solrcloud_types.go Outdated Show resolved Hide resolved
controllers/solrcloud_controller.go Outdated Show resolved Hide resolved
api/v1beta1/solrcloud_types.go Outdated Show resolved Hide resolved
controllers/solrbackup_controller.go Outdated Show resolved Hide resolved
@gerlowskija
Copy link
Contributor Author

Thanks for the review Houston. Just getting back from some time off now; gonna try to address your comments in the next day or two, as well as getting the docs and tests in order.

@gerlowskija
Copy link
Contributor Author

Haven't touched this in a few weeks. Both because of some competing priorities where I work, and because I've been scrambling to get things done before I disappear for an upcoming paternity leave.

But just wanted to say that this is still on my radar and I'm hoping to return to it as soon as I can.

@HoustonPutman
Copy link
Contributor

before I disappear for an upcoming paternity leave

Congrats!!! 🎉

No worries, I've been busy on the S3 Repo stuff and the v0.4.0 release. I do plan on taking this forward over the next few weeks. Mind if I make some changes while you are working on other stuff?

@gerlowskija
Copy link
Contributor Author

Thanks!

Mind if I make some changes while you are working on other stuff?

Not at all, thanks for the help! My plate cleared up a bit after posting the other day, so I'll actually be picking this up again in the short term (until paternity takes me away again). So if I'm unable to finish, hopefully I can at least whittle down what'll be left when you turn your attention to it.

This fixes a bug where 'managed' backups that didn't make use of their
persistence capabilities would never be marked as 'finished'.

Also contains some comment removal and typo corrections.
Covers local backups (with and without persistence), GCS backups, and
deprecated-syntax local backups.
This adds a creation and deletion example similar to that in
`docs/solr-cloud/README.md`, as well as covering a trickier some details
of the new GCS backup support.
@gerlowskija
Copy link
Contributor Author

Alright, quick update on my progress here. Since I last posted, I've

  • addressed all of Houston's requests (except two where I asked for clarification on what he wanted done)
  • fixed a bug mentioned earlier around updating backup status when persistence was skipped.
  • added some unit tests
  • added examples (under examples/) and re-wrote the backup docs (in docs/solr-backup/README.md)
  • done misc cleanup (e.g. removing personal debug logging, etc.)

That's everything I had planned to do here, unless @HoustonPutman has other feedback or gets a chance to see my responses from his earlier review. But otherwise I think this is ready to go?

@HoustonPutman
Copy link
Contributor

@gerlowskija you can run ‘make fmt’ to fix the formatting issues or ‘make prepare’ to auto-fix all possible linting issues that can be auto-fixed

@gerlowskija
Copy link
Contributor Author

gerlowskija commented Sep 16, 2021

Thanks Houston - I saw that 'make lint' advises using 'go fmt' to autocorrect issues as well. Lots of options it looks like!

Copy link
Contributor

@HoustonPutman HoustonPutman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really good!

I love the documentation and the provided examples.

A few overall things:

  • Most of the testing asserts don't have messages which makes it harder to debug.
  • A bigger idea:
    Instead of lists for each individual repository type, we structure it more like Volumes:
    repositories:
  - name: cloud
    gcs:
      location: ...
  - name: default
    managed:
      volume: ....

Where it’s a list of BackupRepositories, but each item has an option for each repository type.
Makes it easier to add new repository types and look for repos in the backup code.

This is mainly a cosmetic issue, so other than some code changes, it won't affect the bulk of the functionality here which looks really good.

A few of my comments can be done in separate tickets, as there's no need for this to be 100% in the first PR. (Watching for GCS Secret updates, Better exposure of which backup repositories are ready in the SolrCloud status)

api/v1beta1/solrcloud_types.go Outdated Show resolved Hide resolved
api/v1beta1/solrcloud_types.go Outdated Show resolved Hide resolved
api/v1beta1/solrcloud_types.go Show resolved Hide resolved
api/v1beta1/solrcloud_types.go Outdated Show resolved Hide resolved
@@ -89,6 +90,7 @@ func (r *SolrBackupReconciler) Reconcile(req ctrl.Request) (ctrl.Result, error)

solrCloud, allCollectionsComplete, collectionActionTaken, err := reconcileSolrCloudBackup(r, backup)
if err != nil {
// TODO Should we be failing the backup for some sub-set of errors here?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah since we know the Solr errors, it would probably be good to do stuff here in case. But I think we can do it in a future Issue/PR.

@@ -699,6 +694,41 @@ func reconcileCloudStatus(r *SolrCloudReconciler, solrCloud *solr.SolrCloud, log
return outOfDatePods, outOfDatePodsNotStarted, availableUpdatedPodCount, nil
}

func isPodReadyForBackup(pod corev1.Pod, backupOptions *solr.SolrBackupRestoreOptions) bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe instead of all this complex logic, when creating a podTemplate, we add an annotation for all the backupRepositories that pod is ready to handle... That will make it a lot easier to determine what repositories the cloud is "ready" for (just take an intersection of the lists from the pod annotations).

We can do this in a different PR though, don't want to complicate this one further.

controllers/util/backup_util_test.go Outdated Show resolved Hide resolved
fals := false
solrVolumes = append(solrVolumes, corev1.Volume{
Name: gcsRepository.GetVolumeName(),
VolumeSource: corev1.VolumeSource{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might need to worry about restarting the Solr Pods when these secrets update. But again we can have a separate issue/PR for that.

@gerlowskija
Copy link
Contributor Author

gerlowskija commented Sep 23, 2021

Instead of lists for each individual repository type, we structure it more like Volumes

Houston and I discussed this a bit offline and I'm 👍 on the alternate syntax he proposed. All of his inline comments look good to me as well.

(I'm mostly away from my computer on some paternity leave, but I'll try to stay up to date here. If I go quiet though, anyone should feel free to make whatever changes seem necessary and merge when finished)

@HoustonPutman
Copy link
Contributor

HoustonPutman commented Sep 23, 2021

Ok I have refactored to use a new more kube-style way of listing repositories. I've also updated the documentation and examples.

One big difference is we are no longer supporting the legacy location for the backupRepository information. The SolrCloud.withDefaults() will auto-move this information to the backupRepositories list. However, we lose the old location where these directories were mounted. This is not an issue, since the Solr Operator will use this new location after it's upgrade. The only issue is when the Solr Operator has upgraded, but the SolrClouds are still in a rolling-restart to update the mountPath of the legacy backupRepo. Users will have to hold off of doing any backups until the SolrClouds they are using have been fully updated. Once this is done, there should be no issues with backwards-compatibility.

I have added a warning for this in the upgrade-notes for v0.5.0. I think it's an acceptable tradeoff to not need a lot of custom code. Also we will be able to remove the deprecated option completely in v0.6.0 without creating an issue for existing clouds (since the deprecated option will have been moved to backupRepositories).

I think there are a few things left to do:

  • Add helm chart options for the backupRepositories, update Helm documentation
  • Add a changelog entry.
  • Add assert messages in new tests.

Will try to wrap this up tomorrow. And hopefully do some integration tests afterwards. Would love help with the integration tests if anyone has time for that.

Copy link
Contributor

@HoustonPutman HoustonPutman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I think this is good to go for now.

Will try to do some integration testing early next week, but there is definitely time to let this bake in and fix things before the next release.

Awesome work @gerlowskija , thanks for the great contribution! This makes the SolrBackup CRD far more usable, and will hopefully make the migration of SolrClouds to Kubernetes that much easier.

@HoustonPutman HoustonPutman linked an issue Sep 27, 2021 that may be closed by this pull request
@HoustonPutman HoustonPutman merged commit f9002d4 into apache:main Sep 27, 2021
@gerlowskija
Copy link
Contributor Author

Thanks a ton for seeing this through in my absence @HoustonPutman . Will be awesome to see this in 0.5.0!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for GCS storage to 'solrbackup'
2 participants