Add steps to run the analysis on Helix Nebula

JavierCVilla · Dec 12, 2018 · d39d11a
1 parent 8ecae41
Showing 1 changed file with 19 additions and 0 deletions.
diff --git a/README.md b/README.md
@@ -1,6 +1,25 @@
 # RDataFrame-Totem
 ---
 
+## How to run the analysis on Helix Nebula
+
+1. Log in to SWAN Helix Nebula
+2. On the CERNBOX tab, open a new terminal (icon `>_` in the top-right corner)
+3. Clone this repo:
+
+   ```
+   git clone https://github.com/JavierCVilla/RDataFrame-Totem.git
+   ```
+
+4. Open the Python notebook (`DistillDistibution-AllDatasets.ipynb`) from the SWAN interface
+5. Start the Spark cluster connection; the default configuration is ready to run the analysis
+6. Once connected, execute cells 1 to 7; this should be fairly fast since no computation is triggered yet
+7. Cell 8 initializes the Spark job and starts the event loop:
+   - Creating the ranges may take a few minutes
+   - After a couple of minutes, the Spark monitoring display appears and shows the job progress
+8. Once the job has finished, the remaining cells show some results and save them to disk
+
+
 ## How to run `distill.py`
 
 **Requirements**