solr cloud backup gcs #506

Open
vipul-06 opened this issue Dec 16, 2022 · 4 comments · May be fixed by #569
Labels
backup question Further information is requested

Comments

@vipul-06

I created a SolrCloud using the YAML below:

apiVersion: solr.apache.org/v1beta1
kind: SolrCloud
metadata:
  name: example
  namespace: dev-backend
spec:
  backupRepositories:
    - name: "gcs-backups-1"
      gcs:
        bucket: "vipul-test-bucket" # Required
        gcsCredentialSecret:
          name: "newsecret"
          key: "my-key.json"
  replicas: 3
  solrImage:
    tag: 9.0.0

I am backing up my Solr collections to a GCS bucket.

My SolrBackup YAML is below:

apiVersion: solr.apache.org/v1beta1
kind: SolrBackup
metadata:
  name: gcs-backup
  namespace: dev-backend
spec:
  solrCloud: example
  repositoryName: "gcs-backups-1"
  collections:
    - demo
    - test
    - new

The backup starts, but nothing is written to my bucket; it stays empty.
The SolrBackup also never completes. Its status is as follows:

NAME           CLOUD     STARTED   FINISHED   SUCCESSFUL   NEXTBACKUP   AGE
gcs-backup     example   60m                                            60m
local-backup   example   4h20m     true       true                      4h20m

I first created a PVC backup, which completed. The GCS backup has started but never finishes, and there is no data in my bucket.

@HoustonPutman
Contributor

As mentioned in your Slack thread, you are seeing the following error:

ERROR	controller-runtime.manager.controller.solrbackup	Error while taking SolrCloud backup	{"reconciler group": "solr.apache.org", "reconciler kind": "SolrBackup", "name": "gcs-backup", "namespace": "dev-backend", "error": "Recieved bad response code of 500 from solr with response: {\n  \"responseHeader\":{\n    \"status\":500,\n    \"QTime\":258},\n  \"error\":{\n    \"metadata\":[\n      \"error-class\",\"org.apache.solr.common.SolrException\",\n      \"root-error-class\",\"org.apache.solr.common.SolrException\"],\n    \"msg\":\"specified location / does not exist.\",\n

The Backup command requires a location field, and the operator uses "/" as the default location. This works nicely with S3, since "/" can be used as the root node. With GCS, both "/" and "" are valid starts to a path, so "/" is not implicitly the root and you would need to create the "/" path yourself.

Another option is to specify a real "location" in the SolrBackup, or in the GCS repository spec, and manually create that path in GCS before starting the backup.
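
A minimal sketch of that second option, reusing the SolrBackup from the original post and assuming the `location` field of the SolrBackup spec. The path `solr-backups` is a hypothetical name, not something from this thread, and it would have to exist in the bucket before the backup is created:

```yaml
apiVersion: solr.apache.org/v1beta1
kind: SolrBackup
metadata:
  name: gcs-backup
  namespace: dev-backend
spec:
  solrCloud: example
  repositoryName: "gcs-backups-1"
  # Hypothetical path: create it in the GCS bucket before applying this resource.
  location: "solr-backups"
  collections:
    - demo
```

Setting the path once on the repository in the SolrCloud spec, rather than on each SolrBackup, is the other variant mentioned above; which one fits better depends on whether every backup should share the same path.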

@HoustonPutman added the question (Further information is requested) and backup labels Dec 22, 2022
@pbackup12345

pbackup12345 commented Apr 8, 2023

I encountered the same problem with GCS backups. I solved the first part as you suggested, by first creating a path and then pointing baseLocation at it. Counter-intuitively, the path needs to be written without a leading slash. The backup files and folders are now created, but the backup still doesn't complete. It remains in a pending state indefinitely.
I see no errors in the log, but the created files do not seem to be a complete backup.
The installation is on a GKE Autopilot cluster.
There is no TLS between pods, but there is TLS on the ingress.

This is my backup file:

kind: SolrBackup
metadata:
  name: local-backup9
  namespace: sop030
spec:
  repositoryName: "gcs-backups-1"
  solrCloud: explore
  collections:
    - dsearch

And this is my gcs repository from the original setup:

spec:
  backupRepositories:
    - name: "gcs-backups-1"
      gcs:
        bucket: "backupbx"
        gcsCredentialSecret:
          name: "gcssecret1"
          key: "service-account-key.json"
        baseLocation: "d"

And this is the describe result on the backup:

Name:         local-backup9
Namespace:    sop030
Labels:       <none>
Annotations:  <none>
API Version:  solr.apache.org/v1beta1
Kind:         SolrBackup
Metadata:
  Creation Timestamp:  2023-04-08T12:37:18Z
  Generation:          1
  Managed Fields:
    API Version:  solr.apache.org/v1beta1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:kubectl.kubernetes.io/last-applied-configuration:
      f:spec:
        .:
        f:collections:
        f:repositoryName:
        f:solrCloud:
    Manager:      kubectl-client-side-apply
    Operation:    Update
    Time:         2023-04-08T12:37:18Z
    API Version:  solr.apache.org/v1beta1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        .:
        f:collectionBackupStatuses:
        f:solrVersion:
        f:startTimestamp:
    Manager:         solr-operator
    Operation:       Update
    Subresource:     status
    Time:            2023-04-08T12:37:18Z
  Resource Version:  330672
  UID:               23ea0f0b-0db7-412d-943f-e3ca58bda906
Spec:
  Collections:
    dsearch
  Repository Name:  gcs-backups-1
  Solr Cloud:       explore
Status:
  Collection Backup Statuses:
    Async Backup Status:  notfound
    Backup Name:          local-backup9-dsearch
    Collection:           dsearch
    In Progress:          true
    Start Timestamp:      2023-04-08T12:37:18Z
  Solr Version:           8.11.0
  Start Timestamp:        2023-04-08T12:37:18Z
Events:                   <none>

@cesarfm

cesarfm commented May 11, 2023

Hi, I am observing exactly the same symptoms described here, and in the same order: I am using GCS, I was first missing a location, and after adding it I got stuck at the same next step as the OP. I believe this is the same issue as #547, which also fully matches my situation.

@HoustonPutman
Contributor

To sum up the issue: users sometimes have backups that succeed in Solr, but the SolrBackup status shows the following two properties:

  • In Progress: true
  • Async Backup Status: notfound

From what I can discern, there are two possible reasons why the backup isn't being "finished" by the Solr Operator:

  • An error was never handled when the backup was started. In that case an asyncId was never actually created, which is why we get "notfound" when querying the status of the async command. (This can happen and needs to be fixed.)
  • The backup finished and the Solr Operator deleted the asyncId in Solr, but the backup status failed to update, so on the next iteration of the reconcile loop the operator could no longer find the status of the async command.

Given that the backup succeeded in Solr, the second option is more likely for the failures listed here.
Status updates are known to fail because of conflicts, but this should happen much, much less often starting in v0.7.0 because of #544.
However, given that this is happening to users every single time, I am less confident in that explanation: a race condition should only show up sporadically.

I will create a PR that starts to address these issues, so we can have y'all test it out and see what works.

@HoustonPutman HoustonPutman linked a pull request May 19, 2023 that will close this issue