Event specific cluster setup and job information
These instructions are meant to be used on the day of the HackReduce event. The servers will not be accessible except at the venue.
Placeholders such as {CLUSTER_NUMBER} and {team name} should be replaced with the values assigned to your team at the event.
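To avoid retyping the placeholders in every command below, you can capture them in shell variables. A minimal sketch, with illustrative values only (use the cluster number and team name assigned to you at the event):

```shell
# Illustrative values -- substitute the ones assigned at the event.
CLUSTER_NUMBER=3
TEAM="hopper"
KEY=~/.ssh/hackreduce-cambridge.pem
MASTER="cluster-${CLUSTER_NUMBER}-master.gg.hackreduce.net"

# Later commands can then reuse them, e.g.:
echo "ssh -i $KEY hackreduce@$MASTER"
```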
git clone https://github.com/hackreduce/Hackathon.git
cd ~/.ssh
Obtain the key:
- OSX:
curl -O http://manager.hackreduce.org/hackreduce.tar
- Linux:
wget http://manager.hackreduce.org/hackreduce.tar
Extract the archive:
tar xvf hackreduce.tar
Restrict the key's permissions (ssh refuses private keys that other users can read):
chmod 600 hackreduce-cambridge.pem
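The chmod 600 step matters because ssh rejects a private key whose permissions allow access by other users. A quick local demonstration on a throwaway file (the filename is just for the demo):

```shell
# Create a throwaway file and restrict it the same way as the key.
touch demo.pem
chmod 600 demo.pem

# Mode 600 = owner read/write only; ls shows it as -rw-------.
ls -l demo.pem
```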
The team folders will be used for storing your code and data on the cluster's master node.
Log onto the cluster:
ssh -i ~/.ssh/hackreduce-cambridge.pem hackreduce@cluster-{CLUSTER_NUMBER}-master.gg.hackreduce.net
Create the code folder:
mkdir -p ~/users/{team name}
This is where you will store all of your team's files.
Starting on your local system:
cd {HackReduce project}
Compile your code with the following commands depending on whether you're using Gradle or Ant:
- Gradle:
gradle
- Ant:
ant
Copy your jar to the cluster's master node:
scp -i ~/.ssh/hackreduce-cambridge.pem build/libs/{HackReduce custom}.jar hackreduce@cluster-{CLUSTER_NUMBER}-master.gg.hackreduce.net:~/users/{team name}/
Log onto the cluster:
ssh -i ~/.ssh/hackreduce-cambridge.pem hackreduce@cluster-{CLUSTER_NUMBER}-master.gg.hackreduce.net
Launch your job:
hadoop jar ~/users/{team name}/{HackReduce custom}.jar {Java job class} /datasets/{dataset chosen} /users/{team name}/job/
e.g. hadoop jar ~/users/hopper/myjar.jar org.hackreduce.examples.bixi.RecordCounter /datasets/bixi-montreal-2011/bixi.xml /users/hopper/bixi_recordcounts
Track the progress of your job on the Hadoop MapReduce job tracker:
http://cluster-{CLUSTER_NUMBER}-master.gg.hackreduce.net:50030
When the job is finished, you can download the output from HDFS into the local file system:
hadoop dfs -copyToLocal /users/{team name}/job ~/users/{team name}/
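The output directory of a finished job contains one part-NNNNN file per reducer. The sketch below fakes that layout locally, just to show how you would concatenate the results after copying them out of HDFS; the directory name and record contents are made up:

```shell
# Simulate the directory layout that hadoop dfs -copyToLocal produces
# (one part-NNNNN file per reducer; contents here are fabricated).
mkdir -p job
printf 'recordcount\t42\n' > job/part-00000
printf 'recordcount\t17\n' > job/part-00001

# Concatenate all reducer outputs to see the full result set.
cat job/part-*
```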
Simply pressing CTRL+C in the command line won't kill the running job in MapReduce. Follow these steps to kill the job:
Find the job ID of your MapReduce job (shown on the job tracker page), e.g. job_201108131339_0001
Log onto the cluster:
ssh -i ~/.ssh/hackreduce-cambridge.pem hackreduce@cluster-{CLUSTER_NUMBER}-master.gg.hackreduce.net
Kill the job with the hadoop command line utility:
hadoop job -kill job_201108131339_0001
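If several jobs are running, `hadoop job -list` prints one line per job with the job ID in the first column (0.20-era CLI output; the sample line below is fabricated). The ID can be pulled out with awk:

```shell
# Fabricated sample of one 'hadoop job -list' output line.
sample='job_201108131339_0001   RUNNING   hackreduce'

# The job ID is the first whitespace-separated field.
echo "$sample" | awk '{print $1}'
```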
To browse the files stored in HDFS through the NameNode web interface:
Visit http://cluster-{CLUSTER_NUMBER}-master.gg.hackreduce.net:50070
Go to "Browse the filesystem"
Alternatively, from the command line: log onto your namenode (cluster-{CLUSTER_NUMBER}-master.gg.hackreduce.net)
Run hadoop dfs with no arguments to see the list of available filesystem commands (e.g. -ls, -cat, -copyToLocal):
hadoop dfs
The number of reducers used by a job needs to be defined manually by one of the following methods:
- Java: http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapreduce/Job.html#setNumReduceTasks(int)
- Streaming: http://hadoop.apache.org/common/docs/current/streaming.html#Specifying+the+Number+of+Reducers
- More information: http://hadoop.apache.org/common/docs/current/mapred_tutorial.html#Reducer (see the "How Many Reduces?" section)
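As a sketch of both methods for a 0.20-era cluster: in Java you would call job.setNumReduceTasks(n) in your driver before submitting, and on the command line you can pass the mapred.reduce.tasks property with -D, provided your job class parses generic options via ToolRunner. This is illustrative configuration, not verified against the event cluster; placeholders are as above:

```shell
# Illustrative: request 4 reducers via a generic option (requires the
# job's main class to use ToolRunner/GenericOptionsParser).
hadoop jar ~/users/{team name}/{HackReduce custom}.jar {Java job class} \
    -D mapred.reduce.tasks=4 \
    /datasets/{dataset chosen} /users/{team name}/job/
```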