Add steps to run the analysis on Helix Nebula
JavierCVilla committed Dec 12, 2018
1 parent 8ecae41 commit d39d11a
Showing 1 changed file with 19 additions and 0 deletions.
19 changes: 19 additions & 0 deletions README.md
@@ -1,6 +1,25 @@
# RDataFrame-Totem
---

## How to run the analysis on Helix Nebula

1. Log in to SWAN Helix Nebula
2. On the CERNBOX tab, open a new terminal (the `>_` icon in the top-right corner)
3. Clone this repo:

```
git clone https://github.com/JavierCVilla/RDataFrame-Totem.git
```

4. Open the Python notebook (`DistillDistibution-AllDatasets.ipynb`) from the SWAN interface.
5. Start the Spark cluster connection. The default configuration is ready to run the analysis.
6. Once connected, execute cells 1 to 7. This should be fairly fast, since no computation is triggered yet.
7. Cell 8 initializes the Spark job and starts the event loop:
   - Creating the ranges may take a few minutes.
   - After a couple of minutes, the Spark monitoring display appears and shows the job progress.
8. Once the job has finished, the remaining cells show some results and save them to disk.
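
Steps 6 and 7 rely on lazy evaluation: the first cells only book operations, and the event loop runs when a result is actually requested. The sketch below illustrates that behaviour with a toy class in plain Python. `LazyFrame`, `define`, `filter`, and `count` are hypothetical names made up for this illustration; they are not the ROOT RDataFrame or Spark API, and the notebook itself may be organized differently.

```python
# Toy stand-in for a lazy dataframe. This is NOT the ROOT RDataFrame API;
# every name here is an assumption used only to illustrate lazy evaluation.
class LazyFrame:
    def __init__(self, rows, steps=()):
        self._rows = rows            # source data, untouched until a trigger
        self._steps = tuple(steps)   # booked operations, in booking order

    def define(self, name, func):
        # Booking a new column only records the request.
        return LazyFrame(self._rows, self._steps + (("define", name, func),))

    def filter(self, predicate):
        # Booking a filter only records the request as well.
        return LazyFrame(self._rows, self._steps + (("filter", predicate),))

    def count(self):
        # Trigger: the single loop over all rows (the "event loop") runs here.
        selected = 0
        for original in self._rows:
            row = dict(original)     # work on a copy of the row
            keep = True
            for step in self._steps:
                if step[0] == "define":
                    _, name, func = step
                    row[name] = func(row)
                elif not step[1](row):
                    keep = False
                    break
            if keep:
                selected += 1
        return selected


calls = {"n": 0}                     # counts how often the predicate runs


def is_positive(row):
    calls["n"] += 1
    return row["x"] > 0


rows = [{"x": x} for x in (-2, -1, 0, 1, 2, 3)]

# Comparable to cells 1 to 7: operations are booked, nothing is computed.
booked = LazyFrame(rows).define("x2", lambda r: r["x"] ** 2).filter(is_positive)
calls_before = calls["n"]

# Comparable to cell 8: requesting a result starts the loop over the data.
selected = booked.count()
calls_after = calls["n"]

print(calls_before, selected, calls_after)
```

In this toy version the predicate is never called while operations are being booked, and it is called once per row when `count()` is requested, which mirrors why the early cells return quickly and the triggering cell takes minutes on real data.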


## How to run `distill.py`

**Requirements**
