Backup restore is slow. #5087

Closed
martinmr opened this issue Apr 2, 2020 · 2 comments
Labels
area/enterprise/backup: Related to binary backups
priority/P1: Serious issue that requires eventual attention (can wait a bit)
status/accepted: We accept to investigate/work on it.

Comments

@martinmr
Contributor

martinmr commented Apr 2, 2020

What version of Dgraph are you using?

master

Have you tried reproducing the issue with the latest release?

yes

Steps to reproduce the issue (command/config used to run Dgraph).

Daniel was trying to restore a backup and it took a while. I also thought restoring was a bit slow while doing some testing for online restores (but not as bad). But Santo was able to restore in a reasonable time, so I am not 100% sure this is a problem.

While investigating, I noticed that restores are being written to the database with a struct called KVLoader. The comments in the code say this struct was specifically created to deal with backups. This was done before I started working on backups, so I don't know the motivation for using KVLoader instead of the write batcher or the stream writer.

Assigning to @jarifibrahim for preliminary investigation. Do you have any idea why this struct is there? Is it necessary, or can it be replaced with another loader that is hopefully more performant? If something else can replace it, I can make the change.

@martinmr martinmr added area/enterprise/backup Related to binary backups priority/P1 Serious issue that requires eventual attention (can wait a bit) status/accepted We accept to investigate/work on it. labels Apr 2, 2020
@jarifibrahim
Contributor

@martinmr, the KVLoader struct is used to load unsorted data into badger. It is the fastest way to load unsorted data. Even if the data were sorted, we couldn't use the stream writer because the stream writer cannot work with a DB that has existing data (it clears the DB before loading data).

So the batch set API (KVLoader) is the fastest way to load data into badger right now. The slow writes we're seeing during restore might be related to hypermodeinc/badger#1283.
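
To make the trade-off above concrete, here is a minimal toy sketch in Go. It does not use badger's API: `Store`, `KV`, `BatchSet`, and `StreamLoad` are hypothetical names that only model the two behaviours described in this comment (a batch-set path that accepts unsorted keys and keeps existing data, and a stream-style path that needs sorted keys and clears the store first).

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// KV is a hypothetical key-value pair, standing in for a backup entry.
type KV struct{ Key, Value string }

// Store is a toy in-memory map, standing in for a DB. Not badger's API.
type Store struct{ data map[string]string }

func NewStore() *Store { return &Store{data: make(map[string]string)} }

// ErrUnsorted is returned when the stream-style loader gets unsorted keys.
var ErrUnsorted = errors.New("stream load requires keys in sorted order")

// BatchSet models the batch-set path: keys may arrive in any order and
// are written on top of whatever the store already contains.
func (s *Store) BatchSet(kvs []KV) {
	for _, kv := range kvs {
		s.data[kv.Key] = kv.Value
	}
}

// StreamLoad models the stream-writer constraints from the comment above:
// it rejects unsorted input, and it discards existing data before loading.
func (s *Store) StreamLoad(kvs []KV) error {
	sorted := sort.SliceIsSorted(kvs, func(i, j int) bool {
		return kvs[i].Key < kvs[j].Key
	})
	if !sorted {
		return ErrUnsorted
	}
	s.data = make(map[string]string, len(kvs)) // existing data is dropped
	for _, kv := range kvs {
		s.data[kv.Key] = kv.Value
	}
	return nil
}

func main() {
	backup := []KV{{"c", "3"}, {"a", "1"}, {"b", "2"}} // unsorted on purpose

	s := NewStore()
	s.BatchSet([]KV{{"existing", "0"}}) // data already in the store
	s.BatchSet(backup)                  // restore keeps the existing key
	fmt.Println("keys after BatchSet:", len(s.data))

	fmt.Println("StreamLoad on unsorted input:", s.StreamLoad(backup))

	sort.Slice(backup, func(i, j int) bool { return backup[i].Key < backup[j].Key })
	_ = s.StreamLoad(backup)
	_, ok := s.data["existing"]
	fmt.Println("existing key survives StreamLoad:", ok)
}
```

In this toy model a restore into a store that already holds data only works through `BatchSet`, which is the same reason given above for keeping KVLoader on the restore path.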

@martinmr
Contributor Author

martinmr commented Apr 8, 2020

Ok. This could be caused by that issue, since Santo didn't have any issues restoring the data. Thanks for the reply. I'll close the issue now since we are already tracking the deadlocks in hypermodeinc/badger#1283.

@martinmr martinmr closed this as completed Apr 8, 2020