UBOdin/mimir-iskra: A lightweight Spark-compatible runtime

Iskra

Iskra: (Polish) The diminutive form of "spark"

Iskra is a local-only runtime for Apache Spark SQL, optimized for small data.

Dear god, why?

Spark has a lot to offer small data workloads (examples follow), but has a HUGE startup cost (cold queries on a few dozen rows can take upwards of 10 seconds). The intent of Iskra is to provide low-overhead access to most of the features offered by Spark, while retaining the same comfortable programming environment.

Infrastructure re-use

Spark is a fantastic JVM-based environment for machine learning and big-data processing. However, in getting to this point it's also built up a ton of infrastructure around itself, including:

Support for reading all sorts of formats like Excel and Google spreadsheets
A complex type system involving hierarchical types like Structures and Arrays
Streamlined support for UDFs in Scala, Python, etc.
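Here is a minimal Python sketch of nested struct and array types that makes the "hierarchical types" point concrete. The classes `Atomic`, `ArrayT` and `StructT` and the helpers `simple_string` and `conforms` are hypothetical illustrations. They are not part of Spark's or Iskra's API. The rendered strings are meant to mirror the compact `struct<...>` / `array<...>` notation that Spark SQL uses when printing types.

```python
from dataclasses import dataclass
from typing import Tuple, Union

# Hypothetical, minimal model of Spark-SQL-style types (not Spark's or Iskra's API).

@dataclass(frozen=True)
class Atomic:
    name: str  # e.g. "int", "string", "double", "boolean"

@dataclass(frozen=True)
class ArrayT:
    element: "DataType"

@dataclass(frozen=True)
class StructT:
    fields: Tuple[Tuple[str, "DataType"], ...]  # ordered (name, type) pairs

DataType = Union[Atomic, ArrayT, StructT]

# Python classes used to check atomic values in this sketch.
_PY_TYPES = {"int": int, "string": str, "double": float, "boolean": bool}

def simple_string(t) -> str:
    """Render a type in a compact form such as struct<a:int,b:array<string>>."""
    if isinstance(t, Atomic):
        return t.name
    if isinstance(t, ArrayT):
        return f"array<{simple_string(t.element)}>"
    if isinstance(t, StructT):
        inner = ",".join(f"{n}:{simple_string(ft)}" for n, ft in t.fields)
        return f"struct<{inner}>"
    raise TypeError(f"unknown type {t!r}")

def conforms(value, t) -> bool:
    """Recursively check a Python value against t.

    Structs are dicts with exactly the declared keys, arrays are lists,
    and None is accepted anywhere as a null.
    """
    if value is None:
        return True
    if isinstance(t, Atomic):
        py = _PY_TYPES[t.name]
        # bool is a subclass of int in Python, so reject it explicitly for "int".
        if py is int and isinstance(value, bool):
            return False
        return isinstance(value, py)
    if isinstance(t, ArrayT):
        return isinstance(value, list) and all(conforms(v, t.element) for v in value)
    if isinstance(t, StructT):
        return (isinstance(value, dict)
                and set(value) == {n for n, _ in t.fields}
                and all(conforms(value[n], ft) for n, ft in t.fields))
    raise TypeError(f"unknown type {t!r}")

person = StructT((("name", Atomic("string")), ("tags", ArrayT(Atomic("string")))))
print(simple_string(person))                                  # struct<name:string,tags:array<string>>
print(conforms({"name": "Ada", "tags": ["math", "code"]}, person))  # True
print(conforms({"name": "Ada", "tags": "math"}, person))      # False
```

The types are checked recursively, so a struct can contain arrays of structs to any depth.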

Automatic scaling

By using the same programming metaphors, you can rapidly iterate on a small sample of your data and immediately run the same code on a larger dataset deployed to a full Spark cluster.
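The following minimal sketch illustrates the "same code, different backend" idea. It assumes that a pipeline is written once against a small DataFrame-like interface. `LocalFrame` and `adults` are hypothetical names. The method names `filter`, `select` and `collect` are borrowed from Spark's DataFrame API. However, `filter` here takes a plain Python predicate, whereas Spark's takes a column expression or SQL string.

```python
from typing import Any, Callable, Dict, Iterable, List

Row = Dict[str, Any]

class LocalFrame:
    """Hypothetical in-memory stand-in for a DataFrame (not Iskra's API).

    Only the method names mimic Spark; every operation runs eagerly
    over a Python list of dict rows.
    """

    def __init__(self, rows: Iterable[Row]):
        self._rows: List[Row] = list(rows)

    def filter(self, predicate: Callable[[Row], bool]) -> "LocalFrame":
        # Keep only rows for which the predicate returns True.
        return LocalFrame(r for r in self._rows if predicate(r))

    def select(self, *cols: str) -> "LocalFrame":
        # Project each row down to the named columns.
        return LocalFrame({c: r[c] for c in cols} for r in self._rows)

    def collect(self) -> List[Row]:
        # Return a copy of the rows as a plain list.
        return list(self._rows)

def adults(df):
    """Pipeline written once; it only relies on filter and select."""
    return df.filter(lambda r: r["age"] >= 18).select("name")

sample = LocalFrame([{"name": "Ada", "age": 36}, {"name": "Tim", "age": 12}])
print(adults(sample).collect())  # [{'name': 'Ada'}]
```

`adults` depends only on the methods it calls. Swapping the small in-memory sample for a larger, cluster-backed frame with the same interface would therefore leave the pipeline itself unchanged.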

Precedent

See GraphChi
