-
-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fast add stuff #2039
Fast add stuff #2039
Conversation
License: MIT Signed-off-by: Jeromy <[email protected]>
License: MIT Signed-off-by: Jeromy <[email protected]>
License: MIT Signed-off-by: Jeromy <[email protected]>
License: MIT Signed-off-by: Jeromy <[email protected]>
License: MIT Signed-off-by: Jeromy <[email protected]>
License: MIT Signed-off-by: Jeromy <[email protected]>
added QmaowqjedBkUrMUXgzt9c2ZnAJncM9jpJtkFfgdFstGr5a planets/.charon.txt | ||
added QmU4zFD5eJtRBsWC63AvpozM9Atiadg9kPVTuTrnCYJiNF planets/.pluto.txt | ||
added QmZy3khu7qf696i5HtkgL2NotsCZ8wzvNZJ1eUdA5n8KaV planets/mars.txt | ||
added QmQnv4m3Q5512zgVtpbJ9z85osQrzZzGRn934AGh6iVEXz planets/venus.txt | ||
added Qmf6rbs5GF85anDuoxpSAdtuZPM9D2Yt3HngzjUVSQ7kDV planets/.asteroids |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should sort the files, otherwise non-determinism.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
they are sorted on input, but yeah. probably makes sense to sort it in the tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the outputs are based on concurrent behavior then I guess we have to give up sorting in the output and that's probably ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right, but theres no concurrency here at this point. Its pretty fast without it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah let's sort on tests. we should probably warn in ipfs add --help
that the output order is not deterministic.
License: MIT Signed-off-by: Jeromy <[email protected]>
@jbenet could use some CR here. I'll squash and rework the commits while addressing CR |
const notRecursiveFmtStr = "'%s' is a directory, use the '-%s' flag to specify directories" | ||
const dirNotSupportedFmtStr = "Invalid path '%s', argument '%s' does not support directories" | ||
|
||
func appendFile(fpath string, argDef *cmds.Argument, recursive bool) (files.File, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👏 thank you very much for the simplification
License: MIT Signed-off-by: Jeromy <[email protected]>
@@ -29,11 +29,6 @@ test_expect_success "'ipfs repo gc' succeeds" ' | |||
ipfs repo gc >gc_out_actual | |||
' | |||
|
|||
test_expect_success "'ipfs repo gc' looks good (patch root)" ' | |||
PATCH_ROOT=QmQXirSbubiySKnqaFyfs5YzziXRB5JEVQVjU6xsd7innr && | |||
grep "removed $PATCH_ROOT" gc_out_actual |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this removed? not understanding why it no longer removes this root node
@whyrusleeping some comments above -- otherwise this all LGTM. fantastic improvement! |
} | ||
if file == nil { | ||
break | ||
} | ||
|
||
node, err := params.AddFile(file) | ||
err = params.addFile(file) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok so i think directories should add every file in a new goroutine (with rate limiting). i had started on this earlier, but aborted. the general idea was every dir -> files fan out uses goroutines, like:
ctx, cancel := context.WithCancel(params.ctx)
defer cancel()
// this context will cancel out all the children if we error out early.
// would need to wire the addFile funcs with a ctx. OR we _could_ cancel
// the entire context of thea adder instead...
errs := make(chan error, 10) // some room to avoid waiting for errors
// dont think we have access to read len(dir) though
for {
file, err := dir.NextFile()
if err != nil && err != io.EOF {
return err // defer cancel unblocks other children
}
if file == nil {
break // done with files
}
<-params.concurrency // adder-global parameter
go func(file) {
defer func() {
params.concurrency<- struct{}{} // done
}()
err := params.addFile(ctx, file) // ctx cancel should unblock addFile
// return even nil errors, so we force the parent to wait
// until _all_ children are done during non-error ops.
// but bail on ctx.Done.
select {
case errs<- err:
case ctx.Done():
}
}(file)
}
for err := range errs {
if err == nil {
continue
}
if _, ok := err.(*hiddenFileError); ok {
continue // hidden file error, skip file
}
// error!
return err // defer cancel unblocks + terminates all children.
}
// all children done.
return nil
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is hard to do right, this code will deadlock when it encounters a directory deeper than your concurrency factor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could add something like:
if file.IsDir() {
params.addFile(ctx, file)
} else {
// concurrency thing
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah i realized that after posting, we can fix that by bumping up the concurrency level one notch in dirs (i.e. semaup at the top of inside addDir
, and defer semadown) -- better than this one o/ because this doesn't allow concurrent add of dirs-- and complicates the code by doing two different things (double the error handling, etc).
@whyrusleeping general comment: i think adding files should be concurrent because a single add should leverage as much concurrent io + hashing as possible (right now it sequentially trades off between io and hashing). I think adding https://github.com/ipfs/go-ipfs/pull/2039/files#r46767388 will fix it |
License: MIT Signed-off-by: Jeromy <[email protected]>
License: MIT Signed-off-by: Jeromy <[email protected]>
License: MIT Signed-off-by: Jeromy <[email protected]>
@jbenet would you be opposed to having the concurrency stuff done in a separate PR? This one is already getting pretty large |
@whyrusleeping my thought is "it's almost there!" -- but sure, if it's easier for you |
added QmQkib3f9XNX5sj6WEahLUPFpheTcwSRJwUCSvjcv8b9by _jo7 | ||
added Qme987pqNBhZZXy4ckeXiR7zaRQwBabB7fTgHurW2yJfNu 4r93 | ||
added QmVPwNy8pZegpsNmsjjZvdTQn4uCeuZgtzhgWhRSQWjK9x gnz66h |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still needs sorting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or, i guess it doesn't matter what order they're in? maybe just restore the original order to make sure/
good to merge? |
@whyrusleeping Still missing: |
License: MIT Signed-off-by: Jeromy <[email protected]>
License: MIT Signed-off-by: Jeromy <[email protected]>
It sure is fast However, it is still possible to get to ~O(git) soon if intermediate root nodes are not created (even if the hashings are delayed in this PR) #1964 (comment). |
great stats, thanks @rht. would be awesome to get these auto-generated by running one script. could On Tue, Dec 15, 2015 at 10:28 PM rht [email protected] wrote:
|
This is based on the flushing disable PR i put up the other day since I needed some changes from there.
This makes ipfs add use the mfs code under the hood to make adds a bit faster.