This project contains examples of how to perform end-to-end analysis on mailing lists. Check out the project overview for more details on this project.
Interactive notebooks for this project are available for anyone to start using right now on the public JupyterHub instance on the Massachusetts Open Cloud (MOC)!
- To get started, access JupyterHub, select log in with `moc-sso`, and sign in using your Google Account.
- After signing in, on the spawner page, select the `Mailing list analysis toolkit` image from the dropdown in the JupyterHub Notebook Image section, select a `Medium` container size, and hit `Start` to start your server.
- Once your server has spawned, you should see a directory titled `mailing-list-analysis-toolkit-<current-timestamp>`.
- Go to `notebooks/`. To look into the data collection and pre-processing steps, explore the notebooks in the `01_collect_data` directory; to see examples of analyses on mailing lists, go through the notebooks in the `02_analyses` directory.
- Here's a video that can help familiarize you with the project.

video: https://www.youtube.com/watch?v=arvpVoTXgZg
If you need more help navigating the Operate First environment, we have a few short videos to help you get started.
This repository assumes that you have an existing Open Data Hub deployed on OpenShift that includes JupyterHub, Argo, Ceph, Hive, Cloudera Hue, and Apache Superset.
- Take a look at opendatahub.io for details on the Open Data Hub platform.
- Details of our existing public deployment can be found at operate-first.cloud.
The primary output and user interface for this application is a Superset dashboard. This tool allows us to define certain data visualization elements from our analysis that we would like to publish and share with others, while also including enough flexibility and interactivity to allow users to explore the data themselves.
Our application is designed to automatically re-run the analyses on a regular basis, ensuring that the dashboard and all of its plots stay up to date.
- The current Superset dashboard can be found here. It is currently accessible only internally at Red Hat, but there are plans to make the analysis publicly accessible on Operate First (see issue).
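To make the automated refresh concrete, here is a minimal sketch of how a notebook's output could be published for the dashboard, assuming results are written as Parquet files to the deployment's Ceph object storage via its S3-compatible API. The DataFrame contents, bucket name, object key, and environment variable names below are illustrative placeholders, not the project's actual configuration.

```python
import os

import boto3
import pandas as pd

# Hypothetical analysis result produced by one of the notebooks.
monthly_activity = pd.DataFrame(
    {"month": ["2021-01", "2021-02"], "emails": [120, 98]}
)

# Write the result locally as Parquet, then upload it to Ceph through the
# S3-compatible API. The endpoint, credentials, bucket, and key names are
# placeholders; real values come from the deployment's configuration.
monthly_activity.to_parquet("monthly_activity.parquet", index=False)

s3 = boto3.client(
    "s3",
    endpoint_url=os.environ["S3_ENDPOINT_URL"],
    aws_access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
    aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
)
s3.upload_file(
    "monthly_activity.parquet",
    os.environ.get("S3_BUCKET", "mailing-list-analysis"),
    "analyses/monthly_activity.parquet",
)
```

In a deployment like the one described above, Superset would query these stored results through the Hive layer, so each automated run refreshes what the dashboard shows.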
If you'd like to automate your Jupyter notebooks using Argo, please follow the steps outlined in this guide. Once you have completed them, your application will be fully set up and ready to be deployed via Argo CD.
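For orientation, the sketch below shows what a single automated step might look like: executing one notebook non-interactively with papermill, which is one common way to run notebooks from an Argo workflow. The notebook filename and parameter are hypothetical placeholders, not the project's actual pipeline definition; the linked guide describes the real setup.

```python
import papermill as pm

# Execute one pipeline notebook headlessly and save the executed copy.
# The filename and parameter below are illustrative placeholders; the
# actual notebooks live under notebooks/01_collect_data and
# notebooks/02_analyses in this repository.
pm.execute_notebook(
    "notebooks/01_collect_data/example_notebook.ipynb",  # hypothetical name
    "output/example_notebook-run.ipynb",
    parameters={"start_date": "2021-01-01"},  # hypothetical parameter
)
```

Scheduled by Argo as described in the guide, steps like this are what re-run the analyses automatically and keep the dashboard data current.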