Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: backup and restore sqlite database #21584

Merged
merged 10 commits into from
Jun 2, 2021
Merged

Conversation

williamhbaker
Copy link
Contributor

@williamhbaker williamhbaker commented Jun 1, 2021

Closes #21568
Partially closes #21567

Adds the following endpoints to the API:

  • api/v2/restore/sql: This endpoint will receive a new sqlite database to replace the existing sqlite database, intended to act as a mechanism for restoring backed up data.
  • api/v2/backup/metadata (partial): Will return a multipart/form-data response containing the bolt and sqlite database files. There is also a placeholder in the response to include a manifest describing the buckets & shards in the bolt metadata.

Future work will include updating the openapi specification to include these new endpoints, and completing the implementation for the manifest response at api/v2/backup/metadata. Once that is complete, the existing api/v2/backup/kv endpoint will be updated to return an error response and direct the client to the api/v2/backup/metadata endpoint.

@williamhbaker williamhbaker force-pushed the wb-metadata-backup-21567 branch from 8f9308c to f5771f9 Compare June 2, 2021 01:13
Copy link
Contributor

@danxmoran danxmoran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome progress! Some initial thoughts

baseName := time.Now().UTC().Format(influxdb.BackupFilenamePattern)

formWriter := multipart.NewWriter(w)
w.Header().Set("Content-Type", formWriter.FormDataContentType())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we use multipart/mixed instead, to match the API doc? All the descriptions I can find of multipart/form-data make it sounds like it's intended for uploads (i.e. this).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep! That makes more sense. Just have to be a little more explicit since the multipart package seems to really be set up for multipart/form-data by default. I updated the content type to specifically be multipart/mixed and everything still seems to work alright.

writeFn func(io.Writer) error
}{
{
"bolt",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The API doc currently has this as boltdb, so we'll need to match it for codegen.

Higher-level Q: I'm wondering if we should use kv and sql as the part names instead of bolt and sqlite, to match the API endpoints & pseudo-abstract over the specific DB formats used. What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a strong feeling, but having the parts named kv and sql seems like an improvement. I went ahead and updated the part names to kv and sql...it might make things a little more clear when parsing the response based on the endpoints, like you said.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 I've got a WIP branch to finish the openapi spec for the manifest & the restore endpoints, I'll update the part names as part of it.

"sqlite",
fmt.Sprintf("%s.sqlite", baseName),
func(fw io.Writer) error {
return h.SqlBackupRestoreService.BackupSqlStore(ctx, fw)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any way we could take a lock across both KV and SQL before running these commands? If not there's still a window where inconsistencies could creep in.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added locking mechanisms. It required a fair amount of modifications, but seems to work.

@williamhbaker williamhbaker marked this pull request as ready for review June 2, 2021 13:28
@@ -318,13 +320,22 @@ func (e *Engine) DeleteBucketRangePredicate(ctx context.Context, orgID, bucketID
return e.tsdbStore.DeleteSeriesWithPredicate(bucketID.String(), min, max, pred)
}

// LockKVStore locks the KV store as well as the engine in preparation for doing a backup.
func (e *Engine) LockKVStore() {
e.mu.Lock()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Locking the engine here might be overkill, but it was easy enough to do and might be prevent some edge issues

},
},
{
"buckets",
Copy link
Contributor Author

@williamhbaker williamhbaker Jun 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update this part to be named "buckets". It will have a file name with a .json extension like 20210602T171102Z.json

sqlite/sqlite.go Outdated
}
defer os.RemoveAll(tempDir)

destPath := tempDir + "/" + DefaultFilename
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this will cause problems on Windows, could we use filepath.Join instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, made the change!

}

// perform the backup
_, err = bk.Step(-1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Step(-1) means "backup everything"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, there's a pretty good explanation here: https://www.sqlite.org/backup.html

It's possible to use specific numbers instead of -1 if you want be able to pause the backup and give other writes a chance to work in while you are doing the backup, which we definitely don't want to do, so -1 just locks the db for writes and does the entire backup in one shot.

sqlite/sqlite.go Outdated
}
defer os.RemoveAll(tempDir)

tempFileName := fmt.Sprintf("%s/%s", tempDir, DefaultFilename)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, could we filepath.Join?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made this change as well. Also found it funny how I did the exact same thing two different ways in the same file 😆 - now it's consistent if nothing else.

h.BackupService.LockKVStore()
defer h.BackupService.UnlockKVStore()

h.SqlBackupRestoreService.LockSqlStore()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a chance we could end up w/ a deadlock here where the KV store is locked & waiting on sql, and some other process has sql locked & is waiting on KV?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so - at least not right now. The only other thing that uses the sql database is the notebooks service. Notebooks will get the write lock when modifying a notebook. The only way the modification of a notebook should interact with the KV is reading data from it during the request to get permissions info. This requires getting a read lock, and the KV store is locked here with a read lock as well. There currently isn't anything you can do with the sql database that would require a write to the KV. Except for backups, I guess!

So I can't think of a scenario where there would be a deadlock, really due to the fact that the two databases are almost completely decoupled. At least for notebooks...and annotations should work the same way. If we start doing things which do writes to both databases simultaneously I could see it being a problem. Then there could be a deadlock if the external request starts right between these two locks. So...hopefully we don't ever do that 😸.

@williamhbaker williamhbaker merged commit b0ea3b6 into master Jun 2, 2021
@williamhbaker williamhbaker deleted the wb-metadata-backup-21567 branch June 2, 2021 23:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement new restore API to include sqlite Implement new backup API to include sqlite
2 participants