-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: backup and restore sqlite database #21584
Conversation
8f9308c
to
f5771f9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome progress! Some initial thoughts
http/backup_service.go
Outdated
baseName := time.Now().UTC().Format(influxdb.BackupFilenamePattern) | ||
|
||
formWriter := multipart.NewWriter(w) | ||
w.Header().Set("Content-Type", formWriter.FormDataContentType()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep! That makes more sense. Just have to be a little more explicit since the multipart
package seems to really be set up for multipart/form-data
by default. I updated the content type to specifically be multipart/mixed
and everything still seems to work alright.
http/backup_service.go
Outdated
writeFn func(io.Writer) error | ||
}{ | ||
{ | ||
"bolt", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The API doc currently has this as boltdb
, so we'll need to match it for codegen.
Higher-level Q: I'm wondering if we should use kv
and sql
as the part names instead of bolt
and sqlite
, to match the API endpoints & pseudo-abstract over the specific DB formats used. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a strong feeling, but having the parts named kv
and sql
seems like an improvement. I went ahead and updated the part names to kv
and sql
...it might make things a little more clear when parsing the response based on the endpoints, like you said.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 I've got a WIP branch to finish the openapi spec for the manifest & the restore endpoints, I'll update the part names as part of it.
"sqlite", | ||
fmt.Sprintf("%s.sqlite", baseName), | ||
func(fw io.Writer) error { | ||
return h.SqlBackupRestoreService.BackupSqlStore(ctx, fw) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any way we could take a lock across both KV and SQL before running these commands? If not there's still a window where inconsistencies could creep in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added locking mechanisms. It required a fair amount of modifications, but seems to work.
@@ -318,13 +320,22 @@ func (e *Engine) DeleteBucketRangePredicate(ctx context.Context, orgID, bucketID | |||
return e.tsdbStore.DeleteSeriesWithPredicate(bucketID.String(), min, max, pred) | |||
} | |||
|
|||
// LockKVStore locks the KV store as well as the engine in preparation for doing a backup. | |||
func (e *Engine) LockKVStore() { | |||
e.mu.Lock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Locking the engine here might be overkill, but it was easy enough to do and might be prevent some edge issues
}, | ||
}, | ||
{ | ||
"buckets", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update this part to be named "buckets". It will have a file name with a .json
extension like 20210602T171102Z.json
sqlite/sqlite.go
Outdated
} | ||
defer os.RemoveAll(tempDir) | ||
|
||
destPath := tempDir + "/" + DefaultFilename |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if this will cause problems on Windows, could we use filepath.Join
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, made the change!
} | ||
|
||
// perform the backup | ||
_, err = bk.Step(-1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Step(-1)
means "backup everything"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, there's a pretty good explanation here: https://www.sqlite.org/backup.html
It's possible to use specific numbers instead of -1
if you want be able to pause the backup and give other writes a chance to work in while you are doing the backup, which we definitely don't want to do, so -1
just locks the db for writes and does the entire backup in one shot.
sqlite/sqlite.go
Outdated
} | ||
defer os.RemoveAll(tempDir) | ||
|
||
tempFileName := fmt.Sprintf("%s/%s", tempDir, DefaultFilename) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, could we filepath.Join
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made this change as well. Also found it funny how I did the exact same thing two different ways in the same file 😆 - now it's consistent if nothing else.
h.BackupService.LockKVStore() | ||
defer h.BackupService.UnlockKVStore() | ||
|
||
h.SqlBackupRestoreService.LockSqlStore() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a chance we could end up w/ a deadlock here where the KV store is locked & waiting on sql, and some other process has sql locked & is waiting on KV?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so - at least not right now. The only other thing that uses the sql database is the notebooks service. Notebooks will get the write lock when modifying a notebook. The only way the modification of a notebook should interact with the KV is reading data from it during the request to get permissions info. This requires getting a read lock, and the KV store is locked here with a read lock as well. There currently isn't anything you can do with the sql database that would require a write to the KV. Except for backups, I guess!
So I can't think of a scenario where there would be a deadlock, really due to the fact that the two databases are almost completely decoupled. At least for notebooks...and annotations should work the same way. If we start doing things which do writes to both databases simultaneously I could see it being a problem. Then there could be a deadlock if the external request starts right between these two locks. So...hopefully we don't ever do that 😸.
Closes #21568
Partially closes #21567
Adds the following endpoints to the API:
api/v2/restore/sql
: This endpoint will receive a new sqlite database to replace the existing sqlite database, intended to act as a mechanism for restoring backed up data.api/v2/backup/metadata
(partial): Will return amultipart/form-data
response containing the bolt and sqlite database files. There is also a placeholder in the response to include a manifest describing the buckets & shards in the bolt metadata.Future work will include updating the
openapi
specification to include these new endpoints, and completing the implementation for the manifest response atapi/v2/backup/metadata
. Once that is complete, the existingapi/v2/backup/kv
endpoint will be updated to return an error response and direct the client to theapi/v2/backup/metadata
endpoint.