Hyperspark is a decentralized data processing tool for Dat, inspired by Spark. Basically, it's a fancy wrapper around Dat archives.

This is a work in progress; any idea or suggestion is welcome.

Hyperspark aims to:
- Reuse intermediate data.
- Minimize bandwidth usage.
- Share computation power.
How it works:

- Sharing data is simple: just publish it with Dat (`dat .`).
- Define your ideas as transforms and actions, without worrying about fetching or storing data.
- Run transformations defined by researchers. Intermediate data is cached and shared, so everyone can reuse the results without running their own computation cluster (see the reuse sketch after the example below).
Hyperspark defines RDDs on Dat with dat-transform. For example, word counting:
```js
const hs = require('hyperspark')

var rdd = hs('<DAT-ARCHIVE-KEY>')

// define transforms (lazy: nothing runs yet)
var result = rdd
  .splitBy(/[\n\s]/)         // split archive contents into tokens
  .filter(x => x !== '')     // drop empty tokens
  .map(word => kv(word, 1))  // kv(k, v) builds a key-value pair

// actions trigger the actual run
result.reduceByKey((x, y) => x + y)
  .toArray(res => {
    console.log(res) // [{bar: 2, baz: 1, foo: 1}]
  })
```
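Because transforms are lazy, a chain can be defined once and branched into several pipelines, which is where reusing intermediate data comes in. Below is a minimal sketch of that idea, using only the operations shown in the example above; the archive key is a placeholder, the long-word cutoff is arbitrary, and `kv` is assumed to be in scope as in the example.

```js
const hs = require('hyperspark')

var rdd = hs('<DAT-ARCHIVE-KEY>')

// shared intermediate chain: tokenize the archive once
var words = rdd
  .splitBy(/[\n\s]/)
  .filter(x => x !== '')

// pipeline 1: count every word
words.map(word => kv(word, 1))   // kv as in the example above
  .reduceByKey((x, y) => x + y)
  .toArray(res => console.log('all words:', res))

// pipeline 2: count only longer words (cutoff chosen for illustration)
words.filter(word => word.length > 5)
  .map(word => kv(word, 1))
  .reduceByKey((x, y) => x + y)
  .toArray(res => console.log('long words:', res))
```

In the Spark model that hyperspark borrows from, this is where caching pays off: `words` describes an intermediate dataset that can be computed once and fed to both consumers, and over Dat it can be shared with other peers as well.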
Related modules:

- dat-transform: RDD-style data transformation in JavaScript.
- dat-ipynb: analyze data inside a Dat archive with an RDD-style API, using nel.
- ipynb2md: convert IPython notebooks to Markdown.
- markdown-attachment-p2p: attach files to Markdown with Dat.