Hyperspark is a decentralized data processing tool for Dat, inspired by Spark. Basically, it's a fancy wrapper around Dat archives.

This is a work in progress; any idea or suggestion is welcome.

Hyperspark aims to:
- Reuse intermediate data.
- Minimize bandwidth usage.
- Share computation power.
How it works:

- Sharing data is simple: just publish it with Dat (`dat .`).
- Define your ideas as transforms and actions, without worrying about fetching or storing data.
- Run transformations defined by researchers. Intermediate data is cached and shared, so everyone can reuse the results without running their own computation cluster (see the reuse sketch after the example below).
Hyperspark defines RDDs on Dat with dat-transform. For example, word counting:
```js
const hs = require('hyperspark')

var rdd = hs('<DAT-ARCHIVE-KEY>')

// define transforms (lazy: nothing runs yet)
var result = rdd
  .splitBy(/[\n\s]/)         // split archive contents into tokens
  .filter(x => x !== '')     // drop empty tokens
  .map(word => kv(word, 1))  // kv(k, v) builds a key-value pair

// actions trigger the actual run
result.reduceByKey((x, y) => x + y)
  .toArray(res => {
    console.log(res) // [{bar: 2, baz: 1, foo: 1}]
  })
```
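Because transforms are lazy, a chain can be defined once and branched into several pipelines, which is where reusing intermediate data comes in. Below is a minimal sketch of that idea, using only the operations shown in the example above; the archive key is a placeholder, the long-word cutoff is arbitrary, and `kv` is assumed to be in scope as in the example.

```js
const hs = require('hyperspark')

var rdd = hs('<DAT-ARCHIVE-KEY>')

// shared intermediate chain: tokenize the archive once
var words = rdd
  .splitBy(/[\n\s]/)
  .filter(x => x !== '')

// pipeline 1: count every word
words.map(word => kv(word, 1))   // kv as in the example above
  .reduceByKey((x, y) => x + y)
  .toArray(res => console.log('all words:', res))

// pipeline 2: count only longer words (cutoff chosen for illustration)
words.filter(word => word.length > 5)
  .map(word => kv(word, 1))
  .reduceByKey((x, y) => x + y)
  .toArray(res => console.log('long words:', res))
```

In the Spark model that hyperspark borrows from, this is where caching pays off: `words` describes an intermediate dataset that can be computed once and fed to both consumers, and over Dat it can be shared with other peers as well.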
Related modules:

- dat-transform: RDD-style data transformation in JavaScript.
- dat-ipynb: analyze data inside a Dat archive with an RDD-style API, using nel.
- ipynb2md: convert IPython notebooks to Markdown.
- markdown-attachment-p2p: attach files to Markdown with Dat.