Shiny Computing Machine

Next level

After experimenting with Polynote, I realized that it completely supersedes this project. It covers all the features that I originally aimed for, plus more. I urge you to check out their website, the provided example notebooks and this blog post.

If you are bound to Jupyter, please go ahead and continue using this project; I will not deprecate it. However, I do consider Polynote to be the right way to go as far as EDA with Scala is concerned.

You like Scala and you want to do some exploratory data analysis (EDA)? You want the quickest way to get on par with your Python, pandas and matplotlib fellows?

Look no further, my dear - the shiny computing machine is here!

Description

So what does one need to do fast, iterative data exploration? In my mind the necessary pieces are (a minimal sketch follows the list):

  • data processing - enables ingestion of data from any source and conversion into a format that can be passed to an SQL engine
  • SQL engine - enables issuing queries against your data, given that it is already in the right format
  • visualization - gives visual representations that aid the understanding of the numerical results
  • documentation - gives the ability to supplement the provided analysis with explanations in natural language
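
To make the first two pieces concrete, here is a minimal Spark sketch. The file name transactions.csv and the date/amount columns are illustrative assumptions, not something this project prescribes:

```scala
import org.apache.spark.sql.SparkSession

// Plain-Scala sketch; in an almond notebook a session may already be set up
val spark = SparkSession.builder()
  .appName("shiny-computing-machine")
  .master("local[*]")
  .getOrCreate()

// Data processing: ingest a CSV into a DataFrame
// (file name and schema are assumptions)
val transactions = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("transactions.csv")

// SQL engine: register the DataFrame so it can be queried by name
transactions.createOrReplaceTempView("transactions")
spark.sql("SELECT count(*) AS n FROM transactions").show()
```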

What's inside?

  • Spark
    • robust and elegant APIs that give you the ability to ingest and manipulate data from any relevant source or format
    • seamless conversion between powerful abstractions and SQL
    • ability to import custom Scala code
  • Scala
    • functional programming
    • mature ecosystem
    • type safety
  • Plotly
    • interactive embedded graphs that allow for dynamic zooming, export and navigation (see the sketch after this list)
  • Notebooks
    • fast iterative interaction with the data
    • IDE-like capabilities
    • Markdown
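
As a sketch of the visualization side: with the plotly-scala library and its almond module, a plot can be rendered straight from a notebook cell. The `$ivy` coordinates and version below are assumptions - check the library's README for the current ones.

```scala
// Inside an almond notebook cell; the coordinates/version are illustrative
import $ivy.`org.plotly-scala::plotly-almond:0.8.1`
import plotly._, plotly.element._, plotly.layout._, plotly.Almond._

val xs = (0 to 100).map(_ * 0.1)
val ys = xs.map(math.sin)

// Renders an interactive, zoomable graph inline in the notebook
Scatter(xs, ys).plot()
```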

How to set it up?

  1. Install almond - I used Docker for this task: `docker run -it --rm -p 8888:8888 almondsh/almond:latest`
  2. Open the 127.0.0.1/... link printed in the terminal once almond starts
  3. Drag and drop the .ipynb notebook from this repo into the folder structure in your browser and upload it
  4. Open the notebook, adjust it to your liking and start exploring

Example

As a real-world example, I have spent some time exploring a dataset of a Danske Bank account. The two simple questions that I got answers to are (sketched in Spark SQL after the list):

  • what is the overall trend of the transactions for the whole dataset?
  • what are the most expensive transactions per year and month in the dataset?
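
Assuming the CSV has been registered as a `transactions` view with `date` and `amount` columns that parse as a date and a number (the actual Danske Bank column names will differ), the two questions could look roughly like this:

```scala
// Overall trend: monthly totals across the whole dataset
val trend = spark.sql("""
  SELECT date_trunc('month', date) AS month, sum(amount) AS total
  FROM transactions
  GROUP BY date_trunc('month', date)
  ORDER BY month
""")

// Most expensive transaction per year and month; assuming debits are
// negative amounts, the most expensive one is the minimum
val topExpenses = spark.sql("""
  SELECT year(date) AS year, month(date) AS month, min(amount) AS most_expensive
  FROM transactions
  GROUP BY year(date), month(date)
  ORDER BY year, month
""")

trend.show()
topExpenses.show()
```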

If you feel like exploring your own data, simply:

  • save a given period as a CSV file with comma as the separator (otherwise you need to set the separator explicitly in the Spark code to whatever you chose - see the sketch after this list)
  • upload the file and the .ipynb to the root of the folder structure that you find when navigating to the localhost address printed when you start the Docker container
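
If your export does not use commas, set the separator explicitly when reading. A sketch, assuming a semicolon-separated file named export.csv:

```scala
// "sep" defaults to "," - override it to match whatever the export uses
val df = spark.read
  .option("header", "true")
  .option("sep", ";")
  .option("inferSchema", "true")
  .csv("export.csv")
```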
