
Event specific cluster setup and job information

vladgp edited this page Nov 17, 2012 · 12 revisions

These instructions are meant to be used on the day of the HackReduce event. The servers will not be accessible except at the venue.

{CLUSTER NUMBER}: Will be assigned to your team at the event

Getting Started

Download the project

git clone https://github.com/hackreduce/Hackathon.git

Setup SSH key for accessing the cluster

  1. cd ~/.ssh

  2. Obtain the key:

  • OSX: curl -O http://manager.hackreduce.org/hackreduce.tar
  • Linux: wget http://manager.hackreduce.org/hackreduce.tar

  3. tar xvf hackreduce.tar

  4. chmod 600 hackreduce-cambridge.pem
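The four steps above can be run as one sequence. The sketch below (using the Linux wget variant) only prints the commands, so nothing is fetched until you run them yourself:

```shell
# Dry-run sketch: print the key-setup commands instead of executing them.
for cmd in \
  "cd ~/.ssh" \
  "wget http://manager.hackreduce.org/hackreduce.tar" \
  "tar xvf hackreduce.tar" \
  "chmod 600 hackreduce-cambridge.pem"; do
  echo "$cmd"
done
```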

Create your team folders

The team folders will be used for storing your code and data on the cluster's master node.

  1. ssh -i ~/.ssh/hackreduce-cambridge.pem hackreduce@cluster-{CLUSTER_NUMBER}-master.gg.hackreduce.net

  2. Create the code folder: mkdir -p ~/users/{team name}. This is where you will be storing all your team's files.
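With hypothetical example values (cluster number 3, team name hopper), the hostname and the two commands above expand as follows. The block only prints the commands; it does not connect to anything:

```shell
# Hypothetical example values -- substitute your assigned cluster number and team name.
CLUSTER_NUMBER=3
TEAM=hopper
MASTER="cluster-${CLUSTER_NUMBER}-master.gg.hackreduce.net"

# Print the login and folder-creation commands rather than running them.
echo "ssh -i ~/.ssh/hackreduce-cambridge.pem hackreduce@${MASTER}"
echo "mkdir -p ~/users/${TEAM}"
```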

Executing jobs on the HackReduce clusters

Compile your jar (for Java participants)

Starting on your local system:

  1. cd {HackReduce project}

  2. Compile your code with one of the following commands, depending on whether you're using Gradle or Ant:

  • Gradle: gradle
  • Ant: ant

  3. Copy your jar to the cluster's master node:

    scp -i ~/.ssh/hackreduce-cambridge.pem build/libs/{HackReduce custom}.jar hackreduce@cluster-{CLUSTER_NUMBER}-master.gg.hackreduce.net:~/users/{team name}

  4. Log onto the cluster:

    ssh -i ~/.ssh/hackreduce-cambridge.pem hackreduce@cluster-{CLUSTER_NUMBER}-master.gg.hackreduce.net

  5. Launch your job:

    hadoop jar ~/users/{team name}/{HackReduce custom}.jar {Java job class} /datasets/{dataset chosen} /users/{team name}/job/

    e.g. hadoop jar ~/users/hopper/myjar.jar org.hackreduce.examples.bixi.RecordCounter /datasets/bixi-montreal-2011/bixi.xml /users/hopper/bixi_recordcounts

  6. Track the progress of your job on the Hadoop MapReduce job tracker:

    http://cluster-{CLUSTER_NUMBER}-master.gg.hackreduce.net:50030

  7. When the job has finished, you can download the output from HDFS to the local file system:

    hadoop dfs -copyToLocal /users/{team name}/job ~/users/{team name}/
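Using the same hypothetical values as the e.g. line above (team hopper, jar myjar.jar), the launch command expands like this. The block only prints the command; run the printed line on the master node:

```shell
# Hypothetical example values matching the e.g. line in the launch step.
TEAM=hopper
JAR=myjar.jar
JOB_CLASS=org.hackreduce.examples.bixi.RecordCounter
INPUT=/datasets/bixi-montreal-2011/bixi.xml
OUTPUT=/users/${TEAM}/bixi_recordcounts

# Build and print the fully expanded launch command (to be run on the master node).
LAUNCH="hadoop jar ~/users/${TEAM}/${JAR} ${JOB_CLASS} ${INPUT} ${OUTPUT}"
echo "$LAUNCH"
```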

Killing a MapReduce job

Pressing CTRL+C in the terminal won't kill the running MapReduce job — it only detaches your client. Follow these steps to kill the job:

  1. Find the jobid of your MR job (from the job tracker), e.g. job_201108131339_0001

  2. Log onto the cluster:

    ssh -i ~/.ssh/hackreduce-cambridge.pem hackreduce@cluster-{CLUSTER_NUMBER}-master.gg.hackreduce.net

  3. Kill the job with the hadoop command line utility:

    hadoop job -kill job_201108131339_0001
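You can also list running jobs from the command line instead of the job tracker. With the example job id from step 1, the commands look like this (the block only prints them; run the printed lines on the master node):

```shell
# Example job id from step 1 -- substitute your own.
JOBID=job_201108131339_0001

# On the master node, `hadoop job -list` shows running jobs and their ids.
echo "hadoop job -list"
echo "hadoop job -kill ${JOBID}"
```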

Browsing HDFS data

Web GUI

  1. Visit http://cluster-{CLUSTER_NUMBER}-master.gg.hackreduce.net:50070

  2. Go to "Browse the filesystem"

Command line

  1. Log onto your namenode (cluster-{CLUSTER_NUMBER}-master.gg.hackreduce.net)

  2. Run hadoop dfs with no arguments to see the list of available filesystem commands
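A few hadoop dfs subcommands are especially handy for browsing (the team name hopper and the part-00000 output file name are hypothetical examples; part-00000 is the typical name of a job's first output file). The block only prints the commands; run them on the master node:

```shell
# Hypothetical team name -- substitute your own.
TEAM=hopper

# Common HDFS browsing commands (run these on the master node).
echo "hadoop dfs -ls /datasets"
echo "hadoop dfs -ls /users/${TEAM}"
echo "hadoop dfs -cat /users/${TEAM}/job/part-00000"
```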

Important notes