Optimise createTable in stream_writer.go #1132
Got a couple of comments, looks alright to me. Get @jarifibrahim to approve this.
Reviewable status: 0 of 2 files reviewed, 2 unresolved discussions (waiting on @ashish-goswami, @balajijinnah, @jarifibrahim, and @manishrjain)
level_handler.go, line 148 at r2 (raw file):
// and after all addTables calls, we can sort table list(check sortTable method).
// NOTE: addTables and sortTables duplicate some code from replaceTables().
func (s *levelHandler) addTables(toAdd []*table.Table) {
Say that this is being used only by stream writer.
stream_writer.go, line 460 at r2 (raw file):
// We are not calling lhandler.replaceTables() here, as it sorts tables on every addition.
// We can sort all tables only once during Flush() call.
lhandler.addTables([]*table.Table{tbl})
lhandler.addTable() singular. This is the only usage.
Thank you for adding the profile and benchmarks. It really helps in reviewing the PR. Looks good to me!
Reviewable status: 0 of 2 files reviewed, 3 unresolved discussions (waiting on @ashish-goswami, @balajijinnah, and @jarifibrahim)
level_handler.go, line 147 at r2 (raw file):
// this can be avoided(such as stream writer). We can just add tables to levelHandler's table list
// and after all addTables calls, we can sort table list(check sortTable method).
// NOTE: addTables and sortTables duplicate some code from replaceTables().
Please add a comment here saying "levelhandler.Sort() should be called after all addTables calls are done."
Reviewable status: 0 of 2 files reviewed, 3 unresolved discussions (waiting on @balajijinnah, @jarifibrahim, and @manishrjain)
level_handler.go, line 147 at r2 (raw file):
Previously, jarifibrahim (Ibrahim Jarif) wrote…
Please add a comment here saying "levelhandler.Sort() should be called after all addTables calls are done."
Done.
level_handler.go, line 148 at r2 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
Say that this is being used only by stream writer.
Modified the function comment to mention that it is a special case used only by the stream writer.
stream_writer.go, line 460 at r2 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
lhandler.addTable() singular. This is the only usage.
Done.
Addressed all comments from Manish.
In the createTable method of StreamWriter, we call the levelHandler.replaceTables method. This method adds the table to the levelHandler's tables and sorts them by table.Smallest. This sorting is required if we are adding tables and querying Badger at the same time. In StreamWriter we only write data, so we can avoid sorting on every table addition. Once all tables have been added, we can sort the tables on all levels by table.Smallest a single time. This makes a huge difference with a large number of streams. I tested it with 100,000 streams: running the stream writer to completion took ~38 minutes on master vs ~6 minutes on this PR. (cherry picked from commit 407e5bd)
Found this while running the Dgraph bulk loader.
In the createTable method of StreamWriter, we call the levelHandler.replaceTables method. This method adds the table to the levelHandler's tables and sorts them by table.Smallest. This sorting is required if we are adding tables and querying Badger at the same time. In StreamWriter we only write data, so we can avoid sorting on every table addition. Once all tables have been added, we can sort the tables on all levels by table.Smallest a single time.
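The sketch below illustrates why reads need a sorted level and what contract the unsorted addTable path implies. It is a hypothetical, simplified model, not Badger's implementation: levelHandler, addTable, sortTables, tableFor and flushLevels here are illustrative names, and it assumes the tables within a level do not overlap. A lookup binary-searches the level by key range, which only works once the level is sorted, so addTable is safe only while nothing reads the level, and sortTables must run after all addTable calls are done.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
	"sync"
)

// table is a hypothetical stand-in for Badger's *table.Table,
// covering the key range [smallest, biggest].
type table struct {
	smallest, biggest []byte
}

// levelHandler is a hypothetical, simplified stand-in for Badger's levelHandler.
type levelHandler struct {
	sync.RWMutex
	tables []*table
}

// addTable appends without sorting. Only safe while nothing reads the level
// (the stream-writer case); sortTables must be called after all adds.
func (s *levelHandler) addTable(t *table) {
	s.Lock()
	defer s.Unlock()
	s.tables = append(s.tables, t)
}

// sortTables orders the level by smallest key, making it searchable.
func (s *levelHandler) sortTables() {
	s.Lock()
	defer s.Unlock()
	sort.Slice(s.tables, func(i, j int) bool {
		return bytes.Compare(s.tables[i].smallest, s.tables[j].smallest) < 0
	})
}

// tableFor binary-searches a sorted level for the table whose range may
// contain key; it returns nil if no table covers the key.
func (s *levelHandler) tableFor(key []byte) *table {
	s.RLock()
	defer s.RUnlock()
	i := sort.Search(len(s.tables), func(i int) bool {
		return bytes.Compare(s.tables[i].biggest, key) >= 0
	})
	if i < len(s.tables) && bytes.Compare(s.tables[i].smallest, key) <= 0 {
		return s.tables[i]
	}
	return nil
}

// flushLevels is a hypothetical Flush-time step: sort every level exactly once.
func flushLevels(levels []*levelHandler) {
	for _, lh := range levels {
		lh.sortTables()
	}
}

func main() {
	lh := &levelHandler{}
	// Tables arrive out of order, as they might from independent streams.
	lh.addTable(&table{smallest: []byte("m"), biggest: []byte("p")})
	lh.addTable(&table{smallest: []byte("a"), biggest: []byte("c")})
	lh.addTable(&table{smallest: []byte("e"), biggest: []byte("h")})
	flushLevels([]*levelHandler{lh})

	if t := lh.tableFor([]byte("f")); t != nil {
		fmt.Printf("key f -> table [%s, %s]\n", t.smallest, t.biggest)
	}
}
```

Running it prints `key f -> table [e, h]`. Calling tableFor before flushLevels could miss tables, which is why the sort has to happen before the level is queried.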
Ran the program below on master and on this PR.
Time to run:
Master:
2143.86s user 25.89s system 107% cpu 33:47.75 total
This PR:
63.28s user 22.79s system 23% cpu 6:13.93 total
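As a rough reading of the two time outputs above (simple arithmetic on the reported numbers, nothing more), the wall-clock and user-CPU ratios are:

$$
\frac{33{:}47.75}{6{:}13.93} = \frac{2027.75\,\text{s}}{373.93\,\text{s}} \approx 5.42,
\qquad
\frac{2143.86\,\text{s user}}{63.28\,\text{s user}} \approx 33.9
$$

The much larger drop in user CPU time than in wall-clock time is consistent with the removed work being CPU-bound sorting; the remaining run time on this PR presumably goes mostly to work outside user-space CPU (the run shows 23% CPU utilisation), such as I/O.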
Note: The time difference between master and this PR will grow as the number of streams increases.