freeman-lab/spark-ml-streaming

Visualize streaming machine learning in Spark

About

This Python app generates data, analyzes it in Spark Streaming, and visualizes the results with Lightning. The analyses use streaming machine learning algorithms included with Spark as of version 1.2. The demos are designed for local use, but the same algorithms can run at scale on a cluster with millions of records.

How to use

To run these demos, you need:

A working installation of Spark
A running Lightning server
An installation of Python with standard scientific computing libraries (NumPy, SciPy, scikit-learn)

With those three things in place, install using:

pip install spark-ml-streaming

Then set SPARK_HOME to the root directory of your Spark installation and start a demo executable:

streaming-kmeans -l <lightning_host>

Where <lightning_host> is the address of your Lightning server. Once the demo starts, your browser should open and data should appear shortly.
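If the demo fails to launch, a common cause is SPARK_HOME not pointing at a Spark distribution. The sketch below is a hypothetical helper, not part of spark-ml-streaming; it assumes the standard Unix-like Spark layout, where the launcher script lives at $SPARK_HOME/bin/spark-submit.

```python
import os
from pathlib import Path


def check_spark_home(env=None):
    """Hypothetical helper: return the Spark root if SPARK_HOME looks usable.

    Assumes the standard (Unix-like) Spark distribution layout, where the
    launcher script is $SPARK_HOME/bin/spark-submit.
    """
    env = os.environ if env is None else env
    spark_home = env.get("SPARK_HOME")
    if not spark_home:
        raise RuntimeError("SPARK_HOME is not set")
    root = Path(spark_home)
    if not (root / "bin" / "spark-submit").is_file():
        raise RuntimeError(f"no bin/spark-submit found under {root}")
    return root
```

Calling check_spark_home() with no arguments inspects the current environment and raises a RuntimeError describing what is missing.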

Try different settings. For example, to run a 1D version with 4 clusters and a half-life of 10 points:

streaming-kmeans -p <temporary_path> -l <lightning_host> -nc 4 -nd 1 -hl 10 -tu points

Where <temporary_path> is the directory where data will be written and read. If it is not specified, the system's default temporary directory is used (see Python's tempfile.gettempdir()).
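The documented fallback for -p amounts to the logic below. This is a minimal sketch with a hypothetical function name, not the package's own code; it is handy for finding where the demo will write its files when -p is omitted.

```python
import tempfile


def resolve_data_path(path=None):
    """Hypothetical helper mirroring the documented -p behaviour:
    use the given path, otherwise Python's default temporary directory."""
    return path if path else tempfile.gettempdir()


print(resolve_data_path())              # platform-dependent temp directory
print(resolve_data_path("/data/demo"))  # /data/demo
```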

2D data produces a scatter plot and 1D data produces a line plot. Set the dimensionality with -nd.
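The -hl and -tu flags control how quickly old data is forgotten. The sketch below assumes the demo follows the forgetful update described for Spark's StreamingKMeans rather than reproducing code from this package. Under that assumption, a half-life h gives a per-unit decay factor a = 0.5^(1/h), and each mini-batch blends the discounted old center with the new points assigned to it. The function names are hypothetical, and details such as handling of dying clusters are omitted.

```python
import math
import numpy as np


def decay_from_half_life(half_life):
    """Per-unit decay factor a such that a ** half_life == 0.5."""
    return math.exp(math.log(0.5) / half_life)


def forgetful_kmeans_update(centers, weights, batch, decay, time_unit="points"):
    """One mini-batch step of forgetful streaming k-means (illustrative sketch).

    centers: (k, d) current cluster centers
    weights: (k,) accumulated weight of each center
    batch:   (n, d) newly arrived points
    time_unit: "points" discounts old weight once per new point,
               "batches" discounts it once per mini-batch.
    """
    if time_unit not in ("points", "batches"):
        raise ValueError("time_unit must be 'points' or 'batches'")
    centers = np.array(centers, dtype=float)  # copies, so inputs are untouched
    weights = np.array(weights, dtype=float)
    batch = np.asarray(batch, dtype=float)

    discount = decay ** len(batch) if time_unit == "points" else decay
    weights *= discount  # fade the influence of older data

    # Assign each new point to its nearest center (squared Euclidean distance).
    dists = ((batch[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)

    for j in range(len(centers)):
        members = batch[labels == j]
        if len(members) == 0:
            continue  # no new points: center stays, its weight has still decayed
        new_weight = weights[j] + len(members)
        # Weighted average of the discounted old center and the new points.
        centers[j] = (centers[j] * weights[j] + members.sum(axis=0)) / new_weight
        weights[j] = new_weight
    return centers, weights
```

For example, with -hl 10 -tu points the decay factor is 0.5 ** 0.1, roughly 0.933, so a point's influence halves once 10 newer points have arrived.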

To see all options, run:

streaming-kmeans -h

Build

The demo relies on a pre-built Scala package included in python/mlstreaming/lib. To rebuild it, use sbt:

cd scala
sbt package
