Apache Zeppelin is a web-based notebook that enables you to do interactive data analytics. For example, you can use it as a front-end for Apache Spark.
- Estimated time for completion: 8 minutes
- Target audience: Data scientists and data engineers who want an interactive data analytics tool.
- Scope: Install and use Zeppelin in DC/OS.
Prerequisites:
- A running DC/OS 1.8 cluster with 2 agents (one private, one public) each with 2 CPUs and 2 GB of RAM available.
- DC/OS CLI installed.
You typically access Zeppelin from a web browser outside of the DC/OS cluster. For production use, Marathon-LB is the recommended way to expose the Zeppelin UI outside of the cluster.
In the following we will use the DC/OS Admin Router to provide access to the Zeppelin UI, which is fine for dev/test setups:
$ dcos package install zeppelin
Installing Marathon app for package [zeppelin] version [0.5.6]
DC/OS Zeppelin is being installed!
Documentation: https://docs.mesosphere.com/zeppelin/
Issues: https://docs.mesosphere.com/support/
After this, you should see the Zeppelin service running in the Services tab of the DC/OS UI.
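The same check can be sketched from the command line. This is a minimal example, assuming the DC/OS CLI from the prerequisites is on your PATH and already configured for your cluster. The `cli_status` variable is only an illustrative name, and the exact output of the two `dcos` commands depends on your CLI version.

```shell
#!/bin/sh
# Sketch: confirm from the CLI that the Zeppelin package was installed.
# Assumes the dcos CLI is configured for the target cluster.
if command -v dcos >/dev/null 2>&1; then
    # Installed packages; zeppelin should be listed after the install.
    dcos package list
    # Marathon apps; the Zeppelin app should appear here once deployed.
    dcos marathon app list
    cli_status="checked"
else
    echo "dcos CLI not found on PATH; install it first (see the prerequisites)." >&2
    cli_status="missing"
fi
```

If Zeppelin does not show up in either list, rerun the install command and check the Services tab for deployment errors.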
In the DC/OS UI, clicking the Open Service button in the upper-right corner opens the Zeppelin UI.
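If you prefer to open the UI directly, the address can be built by hand. The sketch below assumes that Admin Router exposes the service under the path `/service/zeppelin/` (the usual pattern for DC/OS services, but verify it on your cluster). The `zeppelin_url` helper and the `dcos.example.com` host are hypothetical placeholders.

```shell
#!/bin/sh
# Sketch: build the Admin Router URL for the Zeppelin UI.
# Assumption: the service is exposed at <cluster URL>/service/zeppelin/.
zeppelin_url() {
    # Strip one trailing slash from the cluster URL, then append the service path.
    printf '%s/service/zeppelin/\n' "${1%/}"
}

# DCOS_URL is a placeholder; set it to your own cluster address.
zeppelin_url "${DCOS_URL:-http://dcos.example.com}"
```

With a configured CLI, `dcos config show core.dcos_url` prints the cluster URL that you can pass to the helper.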
To get started with Zeppelin you can create a new Notebook and paste the following Spark snippet in Scala:
val rdd = sc.parallelize(1 to 5)
rdd.sum()
After you've pressed the Run all paragraphs button (the triangle/play button in the menu), you should see something like a result of 15.0, the sum of the numbers 1 through 5.
Next, you can check out the built-in tutorial, a notebook called Zeppelin Tutorial.
To uninstall Zeppelin:
$ dcos package uninstall zeppelin