Fast add stuff #2039

whyrusleeping · 2015-12-05T01:46:15Z

This is based on the flushing-disable PR I put up the other day, since I needed some changes from there.

This makes ipfs add use the mfs code under the hood to make adds a bit faster.

License: MIT Signed-off-by: Jeromy <[email protected]>

jbenet · 2015-12-05T03:07:38Z

test/sharness/t0042-add-skip.sh

 			added QmaowqjedBkUrMUXgzt9c2ZnAJncM9jpJtkFfgdFstGr5a planets/.charon.txt
 			added QmU4zFD5eJtRBsWC63AvpozM9Atiadg9kPVTuTrnCYJiNF planets/.pluto.txt
 			added QmZy3khu7qf696i5HtkgL2NotsCZ8wzvNZJ1eUdA5n8KaV planets/mars.txt
 			added QmQnv4m3Q5512zgVtpbJ9z85osQrzZzGRn934AGh6iVEXz planets/venus.txt
+			added Qmf6rbs5GF85anDuoxpSAdtuZPM9D2Yt3HngzjUVSQ7kDV planets/.asteroids


we should sort the files; otherwise the output is non-deterministic.

they are sorted on input, but yeah, it probably makes sense to sort them in the tests

If the outputs are based on concurrent behavior, then I guess we have to give up sorting in the output, and that's probably ok.

right, but there's no concurrency here at this point. It's pretty fast without it

yeah, let's sort in tests. We should probably warn in ipfs add --help that the output order is not deterministic.
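The "sort in tests" fix can be sketched in Go as below. This is a minimal illustration, not the code from the PR: normalizeAddOutput is a hypothetical helper, and a sharness test would more likely just pipe both files through sort before comparing. The idea is the same: put the "added <hash> <name>" lines into a canonical order so the comparison no longer depends on the order in which the adder reports entries.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// normalizeAddOutput sorts the "added <hash> <name>" lines of an add run
// (byte-wise, whole lines) so two runs can be compared even if entries
// were reported in a different order.
func normalizeAddOutput(out string) string {
	lines := strings.Split(strings.TrimRight(out, "\n"), "\n")
	sort.Strings(lines)
	return strings.Join(lines, "\n") + "\n"
}

func main() {
	expected := "added QmZy3khu7qf696i5HtkgL2NotsCZ8wzvNZJ1eUdA5n8KaV planets/mars.txt\n" +
		"added QmQnv4m3Q5512zgVtpbJ9z85osQrzZzGRn934AGh6iVEXz planets/venus.txt\n"
	actual := "added QmQnv4m3Q5512zgVtpbJ9z85osQrzZzGRn934AGh6iVEXz planets/venus.txt\n" +
		"added QmZy3khu7qf696i5HtkgL2NotsCZ8wzvNZJ1eUdA5n8KaV planets/mars.txt\n"
	fmt.Println(normalizeAddOutput(expected) == normalizeAddOutput(actual)) // true
}
```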

whyrusleeping · 2015-12-05T05:37:21Z

@jbenet could use some code review (CR) here. I'll squash and rework the commits while addressing CR

jbenet · 2015-12-05T22:26:29Z

commands/cli/parse.go

+const notRecursiveFmtStr = "'%s' is a directory, use the '-%s' flag to specify directories"
+const dirNotSupportedFmtStr = "Invalid path '%s', argument '%s' does not support directories"
+
+func appendFile(fpath string, argDef *cmds.Argument, recursive bool) (files.File, error) {


👏 thank you very much for the simplification

jbenet · 2015-12-06T05:18:39Z

test/sharness/t0080-repo.sh

@@ -29,11 +29,6 @@ test_expect_success "'ipfs repo gc' succeeds" '
 	ipfs repo gc >gc_out_actual
 '

-test_expect_success "'ipfs repo gc' looks good (patch root)" '
-	PATCH_ROOT=QmQXirSbubiySKnqaFyfs5YzziXRB5JEVQVjU6xsd7innr &&
-	grep "removed $PATCH_ROOT" gc_out_actual


why is this removed? I don't understand why gc no longer removes this root node

jbenet · 2015-12-06T05:41:47Z

@whyrusleeping some comments above -- otherwise this all LGTM. fantastic improvement!

jbenet · 2015-12-06T07:07:10Z

core/coreunix/add.go

 		}
 		if file == nil {
 			break
 		}

-		node, err := params.AddFile(file)
+		err = params.addFile(file)


ok, so I think directories should add every file in a new goroutine (with rate limiting). I had started on this earlier, but aborted. The general idea was that every dir -> files fan-out uses goroutines, like:

```go
ctx, cancel := context.WithCancel(params.ctx)
defer cancel()
// this context will cancel out all the children if we error out early.
// would need to wire the addFile funcs with a ctx. OR we _could_ cancel
// the entire context of the adder instead...

errs := make(chan error, 10) // some room to avoid waiting for errors
// don't think we have access to read len(dir), so count children as we spawn them
children := 0
for {
	file, err := dir.NextFile()
	if err != nil && err != io.EOF {
		return err // defer cancel unblocks other children
	}
	if file == nil {
		break // done with files
	}

	<-params.concurrency // adder-global parameter: take a slot
	children++
	go func(file files.File) {
		err := params.addFile(ctx, file) // ctx cancel should unblock addFile
		// free the slot before reporting, so children blocked on a full
		// errs buffer can't hold every slot while the parent waits for one
		params.concurrency <- struct{}{}

		// return even nil errors, so we force the parent to wait
		// until _all_ children are done during non-error ops.
		// but bail on ctx.Done.
		select {
		case errs <- err:
		case <-ctx.Done():
		}
	}(file)
}

for i := 0; i < children; i++ {
	err := <-errs
	if err == nil {
		continue
	}
	if _, ok := err.(*hiddenFileError); ok {
		continue // hidden file error, skip file
	}
	return err // error! defer cancel unblocks + terminates all children.
}
return nil // all children done.
```

this is hard to do right: this code will deadlock when it encounters a directory nested deeper than your concurrency factor.

we could add something like:

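```go
if file.IsDir() {
	// add directories inline on this goroutine, without taking a slot
	if err := params.addFile(ctx, file); err != nil {
		return err
	}
} else {
	// concurrency thing
}
```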

yeah, I realized that after posting. We can fix that by bumping up the concurrency level one notch in dirs (i.e. semaup at the top of addDir, and defer semadown). That's better than the one above, because that one doesn't allow concurrent adds of dirs and complicates the code by doing two different things (double the error handling, etc.).

jbenet · 2015-12-06T07:08:24Z

@whyrusleeping general comment: I think adding files should be concurrent, because a single add should leverage as much concurrent I/O + hashing as possible (right now it sequentially trades off between I/O and hashing).

I think adding https://github.com/ipfs/go-ipfs/pull/2039/files#r46767388 will fix it

whyrusleeping · 2015-12-06T23:26:05Z

@jbenet would you be opposed to having the concurrency stuff done in a separate PR? This one is already getting pretty large

jbenet · 2015-12-06T23:39:18Z

@whyrusleeping my thought is "it's almost there!" -- but sure, if it's easier for you

jbenet · 2015-12-06T23:43:47Z

test/sharness/t0043-add-w.sh

 added QmQkib3f9XNX5sj6WEahLUPFpheTcwSRJwUCSvjcv8b9by _jo7
-added Qme987pqNBhZZXy4ckeXiR7zaRQwBabB7fTgHurW2yJfNu 4r93
+added QmVPwNy8pZegpsNmsjjZvdTQn4uCeuZgtzhgWhRSQWjK9x gnz66h


still needs sorting?

or, I guess it doesn't matter what order they're in? Maybe just restore the original order to make sure.

whyrusleeping · 2015-12-07T01:22:27Z

good to merge?

jbenet · 2015-12-07T05:22:42Z

@whyrusleeping Still missing:

rht · 2015-12-16T03:28:56Z

It sure is fast

[image: outdata]
https://cloud.githubusercontent.com/assets/395821/11830563/f2d7ad10-a3d8-11e5-868b-f43bce3e4441.png
(filesize 1KB each)

Memory growth:

[image: memory]
https://cloud.githubusercontent.com/assets/395821/11830592/3a881550-a3d9-11e5-877d-a85d6b11f485.png

However, it is still possible to get to ~O(git) soon if intermediate root nodes are not created (even if the hashing is delayed in this PR) #1964 (comment).

jbenet · 2015-12-16T09:56:09Z

great stats, thanks @rht.

would be awesome to get these auto-generated by running one script. Could add it under tests/bench/ or something.

whyrusleeping added 6 commits December 3, 2015 16:29

add option to disable flushing files structure on writes

a49e020

compute add size in background to not stall add operation

a1dca8c

use mfs for adds

e5c27e1

enfastify mfs

07e20d2

fix some tests

e81235d

slight cleanup

559860c

jbenet added the status/in-progress In progress label Dec 5, 2015

jbenet reviewed Dec 5, 2015
fixify tests

742f6da

jbenet reviewed Dec 5, 2015
Allow for gc during adds

7341486

jbenet reviewed Dec 6, 2015
whyrusleeping added 2 commits December 5, 2015 23:42

Add test for running gc during an add

32cbdac

sort output in tests

c25386e

cleanup and more testing

6af342c

whyrusleeping mentioned this pull request Dec 6, 2015

add option to disable flushing files structure on writes #2025

Closed

jbenet reviewed Dec 6, 2015
jbenet mentioned this pull request Dec 7, 2015

Sprint Nov 30 ipfs/team-mgmt#60

Closed

14 tasks

whyrusleeping mentioned this pull request Dec 8, 2015

flatten multipart transfers #2046

Merged

whyrusleeping added 2 commits December 7, 2015 22:19

feedback from CR

f04a791

log failure to check file size

9fc1a1a

jbenet added a commit that referenced this pull request Dec 8, 2015

Merge pull request #2039 from ipfs/fast-add-stuff

fba5fca

Fast add stuff

jbenet merged commit fba5fca into dev0.4.0 Dec 8, 2015

jbenet deleted the fast-add-stuff branch December 8, 2015 07:10

rht mentioned this pull request Dec 18, 2015

cdnjs ipfs-inactive/archives#35

Open

jbenet mentioned this pull request Jul 29, 2016

Unite the Files API 🗂 ipfs/specs#98

Closed
