This project runs Apache Spark on a single-node Hadoop/YARN cluster.
The install modifies your ~/.ssh folder, specifically the ~/.ssh/authorized_keys file.
This allows Hadoop to run the ssh localhost command using a passphraseless DSA key.
- Hadoop HDFS (Hadoop Distributed File System)
- YARN (MapReduce 2.0)
- Spark, a general engine for large-scale data processing
- Scala and Python
- wget
- Java (JVM)
- *nix: Darwin (Cygwin is not yet supported)
- Python (if running the Python examples)
- sbt to run the Scala examples
- Set the Hadoop Home
- Starting the services
Note: On macOS, make sure SSH is enabled: System Preferences > Sharing > Remote Login [ON]
- Checking Services are running
13049 NameNode (HDFS Name Node) -- Make sure this is running
13241 DataNode (HDFS Data Node)
22752 ResourceManager (YARN Resource Manager)
22894 NodeManager (YARN Node Manager)
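As a minimal sketch (not part of this project), the check above can be automated by parsing the output of jps, the JDK tool that lists running JVM processes as "<pid> <name>" lines. The names missing_daemons, check_local_daemons and EXPECTED are illustrative assumptions; the daemon names are the four listed above.

```python
import subprocess

# Daemon names taken from the list above.
EXPECTED = {"NameNode", "DataNode", "ResourceManager", "NodeManager"}


def missing_daemons(jps_output, expected=EXPECTED):
    """Return the expected daemon names that do not appear in jps output."""
    running = set()
    for line in jps_output.splitlines():
        parts = line.split()
        # Each useful jps line is "<pid> <main class name>"; skip anything else.
        if len(parts) >= 2 and parts[0].isdigit():
            running.add(parts[1])
    return sorted(set(expected) - running)


def check_local_daemons():
    """Run jps (assumed to be on the PATH) and return the missing daemons."""
    result = subprocess.run(["jps"], capture_output=True, text=True, check=True)
    return missing_daemons(result.stdout)
```

Calling check_local_daemons() on a node where all four services are up should return an empty list. If NameNode shows up in the result, HDFS is not running.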
Browsing the File System's health
- Start ResourceManager daemon and NodeManager daemon:
To look at running or already executed jobs (JobWatch equivalent)
- Create and Mount a new Hadoop DFS
${HDFS_HOME}/bin/hdfs namenode -format
Note: You need to restart HDFS after formatting.
- Create a directory in Hadoop DFS
Create the user directory along with the owner directory
${HDFS_HOME}/bin/hdfs dfs -mkdir -p /user/${USER}
sbt package
... [success] Total time: ...
${SPARK_HOME}/bin/spark-submit --class "SimpleApp" --master local[4] target/scala-2.10/egoeconometrics_2.10-0.1-SNAPSHOT.jar
... Lines with a: 41, Lines with b: 17
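The expected output has the shape printed by the Spark quick-start SimpleApp, which counts the lines of a text file containing the letter "a" and the letter "b". Assuming this project's SimpleApp does the same (its source is not shown here), the computation can be sketched in plain Python without Spark. This may help when checking the reported numbers against a given input file. The function names are illustrative.

```python
def count_lines_with(lines, letter):
    """Count how many lines contain the given letter at least once."""
    return sum(1 for line in lines if letter in line)


def summarize(lines):
    """Build the same style of summary line that the job prints."""
    lines = list(lines)  # allow any iterable, including an open file
    num_a = count_lines_with(lines, "a")
    num_b = count_lines_with(lines, "b")
    return "Lines with a: %d, Lines with b: %d" % (num_a, num_b)
```

For example, passing an open text file to summarize() returns the summary string for that file.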
- Scala
- Python
${SPARK_HOME}/bin/pyspark --master local[4]
- First, run Netcat (a small utility found in most Unix-like systems) as a data server:
nc -lk 9999
- Then, in a different terminal, start the example:
${SPARK_HOME}/bin/spark-submit --class "QuickStreamingApp" --master local[4] target/scala-2.10/egoeconometrics_2.10-0.1-SNAPSHOT.jar localhost 9999
When you're done, stop the daemons with:
- Installing SBT
- Install Java:
- Add JAVA_HOME to your profile (e.g. export JAVA_HOME=/usr/lib/jvm/java-8-oracle in .bashrc)
- After the install, replace ${JAVA_HOME} with /usr/lib/jvm/java-8-oracle
in ~/bin/local/bigdata/hadoop/etc/hadoop/
- Add a new spark user and allow remote connection
- Micro Instance out of memory - add swap
Here are the commands to add 1 GB of swap:
sudo /bin/dd if=/dev/zero of=/var/swap.1 bs=1M count=1024
sudo /sbin/mkswap /var/swap.1
sudo /sbin/swapon /var/swap.1
To turn off the swap do the following:
sudo /sbin/swapoff /var/swap.1
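In the commands for adding swap, count=1024 combined with bs=1M is what makes the swap file 1 GB. As a small sketch for other sizes, the hypothetical helper swap_commands below builds the same three commands for a chosen size in megabytes. It only returns the command strings and does not run them.

```python
def swap_commands(size_mb, path="/var/swap.1"):
    """Return the shell commands that create and enable a swap file."""
    if size_mb <= 0:
        raise ValueError("size_mb must be positive")
    return [
        # dd writes size_mb blocks of 1 MiB of zeros to the swap file.
        "sudo /bin/dd if=/dev/zero of=%s bs=1M count=%d" % (path, size_mb),
        "sudo /sbin/mkswap %s" % path,
        "sudo /sbin/swapon %s" % path,
    ]
```

Calling swap_commands(1024) reproduces the three commands shown for the 1 GB case.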
Recommended reading for AWS
- Not working under Cygwin
- Install and run SSH Daemon on Cygwin
Copyleft © 2014 EgoOyiri [AfricaCoin]
Distributed under the Eclipse Public License, either version 1.0 or (at your option) any later version.