This repository has been archived by the owner on Feb 12, 2024. It is now read-only.

feat: Add option to specify chunking algorithm when adding files #1469

Merged: 7 commits merged into ipfs:master on Aug 24, 2018

dordille
Contributor

This allows the chunking algorithm and its options to be specified when adding files.

The chunker and its options are specified the same way as in go-ipfs. The following formats are supported:

  • default
  • size-{size}
  • rabin
  • rabin-{avg}
  • rabin-{min}-{avg}-{max}
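
To make the accepted formats concrete, here is a minimal sketch of a parser for them. It is not the code from this PR: the return shape (`chunker` plus `chunkerOptions`), the names `avgChunkSize` and `maxChunkSize`, and the error messages are assumptions chosen for illustration. Only `parseChunkerString` and `minChunkSize` appear in the diff under review.

```javascript
'use strict'

// Hypothetical helper: accept only plain decimal integers.
function toInt (str, name) {
  if (!/^\d+$/.test(str)) {
    throw new Error(`Chunker parameter ${name} must be an integer`)
  }
  return parseInt(str, 10)
}

// Sketch of a parser for: default, size-{size}, rabin, rabin-{avg},
// rabin-{min}-{avg}-{max}. The return shape is an assumption.
function parseChunkerString (chunker) {
  const [name, ...params] = chunker.split('-')

  if (name === 'default' && params.length === 0) {
    return { chunker: 'fixed', chunkerOptions: {} }
  }
  if (name === 'size' && params.length === 1) {
    return {
      chunker: 'fixed',
      chunkerOptions: { maxChunkSize: toInt(params[0], 'size') }
    }
  }
  if (name === 'rabin' && params.length === 0) {
    return { chunker: 'rabin', chunkerOptions: {} }
  }
  if (name === 'rabin' && params.length === 1) {
    return {
      chunker: 'rabin',
      chunkerOptions: { avgChunkSize: toInt(params[0], 'avg') }
    }
  }
  if (name === 'rabin' && params.length === 3) {
    return {
      chunker: 'rabin',
      chunkerOptions: {
        minChunkSize: toInt(params[0], 'min'),
        avgChunkSize: toInt(params[1], 'avg'),
        maxChunkSize: toInt(params[2], 'max')
      }
    }
  }
  throw new Error(`Unrecognized chunker option: ${chunker}`)
}

console.log(parseChunkerString('rabin-48-96-192'))
```

With this sketch, `rabin-48-96-192` yields a rabin chunker with a minimum of 48, an average of 96 and a maximum of 192, while any other shape is rejected with an error.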

Example usage from the command line:

$ ./src/cli/bin.js add --chunker rabin-48-96-192 LICENSE
added QmZH1VRMjDD48A7uqzaEU6qdfk4ddNVKZLnBMN1Lc1iEic LICENSE
$ ./src/cli/bin.js object links QmZH1VRMjDD48A7uqzaEU6qdfk4ddNVKZLnBMN1Lc1iEic
QmamX3HDpMkE6NNun2pvz1UmBubA1nsuEzMEnBqUSVXy5E 180
QmdH3L1zu7hWu7nrpoCe33zfn3hDm6baYpTSENbRitpcFu 140
Qmaw7roFAKSj3V3QxmFUfpqU3dqAEMPMqHynqXxJmhhGhB 64
Qmewm77gB4V7cgnZLsuikXgrG8Q38M1TX8qPRBcKrw4yJ4 90
QmRd6EoLAyp6TXZmSRTByaAgEn5632J8u9NwFSiHfXurLx 95
QmeLhozoTP1z3RiLRBRUuLJ7xpmDh5oi1ShN91B4UPjRwA 172
QmdxhoXpK74gGbm2G8bAoCiuqDfbq44rN6RZKefCAFEhoZ 99
QmZNEhqU6uVjMc7p7xfqfwQK5YGxe1bdUNdG6Hq81149iw 139
Qmd1yFj2ZuNd28r5MaQaWkr3vcUwPF61i8oBdfRjaUC4V5 64
QmeJhcwipWbjXA5DdXVGe6Sh8kwBzADNnw3Zd3HKcPtPkm 82
Qme1G8Mk1PLThzCr11NzChQVy1FJoHCQkjK7rB2iTWKkaJ 58

Fixes #1283

@dordille dordille changed the title Add option to specify chunking algorithm when adding files (WIP) feat: Add option to specify chunking algorithm when adding files Jul 26, 2018
@vmx
Member

vmx commented Aug 6, 2018

This PR depends on ipfs-inactive/js-ipfs-unixfs-engine#223.

Member

@vmx vmx left a comment

Thanks a lot. Most of my comments are pretty minor ones.

@@ -135,6 +135,10 @@ module.exports = {
default: false,
describe: 'Only chunk and hash, do not write'
},
chunker: {
default: 'default',
describe: 'Chunking algorithm to use, formatted like [default, size-{size}, rabin, rabin-{avg}, rabin-{min}-{avg}-{max}]'
Member

The Go IPFS CLI uses size-262144 as default. Would it make sense to remove the default value from here?

Contributor Author

I think that makes sense, will make that change.

const sizeStr = chunker.split('-')[1]
const size = parseInt(sizeStr)
if (isNaN(size)) {
throw new Error('Parameter avg must be an integer')
Member

I think this message is wrong. The parameter isn't the average, but the fixed size.

}
break
case 4:
options.minChunkSize = parseSub(parts[1].split(':'), 'min')
Member

This looks like a port of the Go implementation. It seems to support something like rabin-min:123-avg:456-max:789. This isn't documented (also not in the Go version). I lean towards not supporting it, i.e. less code, less docs, less testing, fewer bugs :)

Though it would be good to check with the Go team what the original reason for supporting it was.

Contributor Author

I talked with @whyrusleeping over IRC and he said that we didn't have to worry too much about the alternative format, so I will remove it.

@@ -157,4 +157,51 @@ describe('utils', () => {
})
})
})

describe('parseChunkerString', () => {
it('handles an empty string', () => {
Member

Please add test cases for error cases like an unsupported chunker.

@alanshaw alanshaw changed the title (WIP) feat: Add option to specify chunking algorithm when adding files [WIP] feat: Add option to specify chunking algorithm when adding files Aug 7, 2018
@@ -135,6 +135,10 @@ module.exports = {
default: false,
describe: 'Only chunk and hash, do not write'
},
chunker: {
default: 'size-262144',
describe: 'Chunking algorithm to use, formatted like [default, size-{size}, rabin, rabin-{avg}, rabin-{min}-{avg}-{max}]'
Member

This is really a minor thing: I've seen that the Go implementation can parse default as a value, but the CLI help of ipfs add --help doesn't show this option. I'd remove default from this list and also from the code below. Fewer options, fewer bugs :)

* @return {Object} Chunker options for DAGBuilder
*/
function parseChunkerString (chunker) {
if (!chunker || chunker === '') {
Member

Do we ever pass in an empty string (or undefined)? If not, this if case could be removed (and also removed from the JSDocs).

Contributor Author

Looks safe to remove

Member

@vmx vmx left a comment

Thanks a lot! LGTM, I think it's ready to be merged. I'd prefer if someone from js-ipfs would do the merge and hence confirm that it's good to go.

@alanshaw alanshaw self-requested a review August 9, 2018 11:20
@alanshaw alanshaw changed the title [WIP] feat: Add option to specify chunking algorithm when adding files feat: Add option to specify chunking algorithm when adding files Aug 9, 2018
@alanshaw
Member

alanshaw commented Aug 9, 2018

This looks great @dordille. Do you have time to submit a PR to https://github.com/ipfs/interface-ipfs-core/blob/master/SPEC/FILES.md#filesadd to document this new option?

@alanshaw alanshaw mentioned this pull request Aug 9, 2018
22 tasks
@dordille
Contributor Author

dordille commented Aug 9, 2018

Yeah, I’ll update the docs

Member

@alanshaw alanshaw left a comment

This is great. Thank you! ✨

There are just a few issues to iron out before we can merge this. Comments inline.

Additionally, the option needs to also be added to the HTTP API here https://github.com/ipfs/js-ipfs/blob/master/src/http/api/resources/files.js#L152

@@ -133,12 +134,13 @@ class AddHelper extends Duplex {
}

module.exports = function files (self) {
- function _addPullStream (options) {
+ function _addPullStream (options = {}) {
+ const chunkerOptions = parseChunkerString(options.chunker)
Member

parseChunkerString can throw. add*Stream methods should never throw when called, but should return a stream that immediately errors. You need to do something like:

let chunkerOptions
try {
  chunkerOptions = parseChunkerString(options.chunker)
} catch (err) {
  return pull.map(() => { throw err })
}
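
The pattern in the snippet above, catching the parse error and returning a stream that fails on first read, can be sketched without any dependency. The example below hand-rolls the pull-stream calling convention (a source is `read(abort, cb)`, a through is a function from one `read` to another). Every helper name in it (`errorThrough`, `addPullStream`, `values`, `collect`) and the stand-in `parseChunkerString` are hypothetical, and it does not use the `pull-stream` package itself.

```javascript
'use strict'

// Hypothetical stand-in for the parser: throws on anything but "size-{n}".
function parseChunkerString (chunker) {
  const match = /^size-(\d+)$/.exec(chunker)
  if (!match) {
    throw new Error(`Unrecognized chunker option: ${chunker}`)
  }
  return { maxChunkSize: parseInt(match[1], 10) }
}

// A through stream that aborts upstream and reports `err` on the first read.
function errorThrough (err) {
  return (read) => (abort, cb) => {
    const reason = abort || err
    read(reason, () => cb(reason))
  }
}

// Never throws when called; a bad option surfaces through the stream instead.
function addPullStream (options = {}) {
  let chunkerOptions
  try {
    chunkerOptions = parseChunkerString(options.chunker)
  } catch (err) {
    return errorThrough(err)
  }
  // Placeholder for the real pipeline, which would receive chunkerOptions:
  // here values simply pass through untouched.
  return (read) => read
}

// Minimal source: emits each item, then signals the end with `true`.
function values (items) {
  let i = 0
  return (abort, cb) => {
    if (abort) return cb(abort)
    if (i >= items.length) return cb(true)
    cb(null, items[i++])
  }
}

// Minimal sink: gathers everything, or reports the first error.
function collect (read, done) {
  const out = []
  read(null, function next (end, data) {
    if (end === true) return done(null, out)
    if (end) return done(end)
    out.push(data)
    read(null, next)
  })
}

collect(addPullStream({ chunker: 'bogus' })(values([1, 2])), (err) => {
  console.log(err.message) // the parser's error, delivered via the stream
})
```

The point of the design is that callers only need one error path: whether the failure comes from option parsing or from the import itself, it arrives in the sink's callback.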

@@ -233,7 +233,8 @@ exports.add = {
onlyHash: request.query['only-hash'],
hashAlg: request.query['hash'],
wrapWithDirectory: request.query['wrap-with-directory'],
- pin: request.query.pin
+ pin: request.query.pin,
+ chunker: request.query['chunker']
Member

request.query['chunker'] => request.query.chunker

@dordille
Contributor Author

dordille commented Aug 13, 2018

@alanshaw Updated the HTTP spec to include the option. I can't retrigger a build though; it looks like there wasn't enough space to run the build.

@ghost ghost assigned alanshaw Aug 15, 2018
@ghost ghost added the status/in-progress In progress label Aug 15, 2018
@alanshaw
Member

I've rebased this against master - there were test failures due to changes in dependencies that have now been resolved. Let's see what CI says now 🤞

@alanshaw alanshaw force-pushed the rabin-chunker branch 2 times, most recently from a5eb1e7 to f9c9d13 Compare August 23, 2018 19:19
This allows the chunking algorithm and its options to be specified when adding files.
The chunker and its options are specified the same way as in go-ipfs. The following formats are supported:
default, size-{size}, rabin, rabin-{avg}, rabin-{min}-{avg}-{max}
This is required to achieve parity with go-ipfs.

Fixes ipfs#1283

License: MIT
Signed-off-by: Dan Ordille <[email protected]>
@alanshaw
Member

Tests all passing, only commitlint that failed - am merging! Thanks @dordille ❤️

@alanshaw alanshaw merged commit 4f805d3 into ipfs:master Aug 24, 2018
@ghost ghost removed the status/in-progress In progress label Aug 24, 2018