This tutorial can be run either in spark-shell or in an IDE (IntelliJ or Scala IDE for Eclipse).
Below are the steps for the setup.
Java/JDK 1.7+ must be installed before proceeding with the steps below.
Download Spark 2.1.0 from here: http://spark.apache.org/downloads.html
Direct download link: http://d3kbcqa49mib13.cloudfront.net/spark-2.1.0-bin-hadoop2.7.tgz
On Mac/Linux:
- tar -zxvf spark-2.1.0-bin-hadoop2.7.tgz
- export PATH=$PATH:/Users/path_to_downloaded_spark/spark-2.1.0-bin-hadoop2.7/bin
- run spark-shell
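The PATH step above can be sketched as a self-contained snippet. The stub directory below is an assumption standing in for the real extracted Spark bin directory, so the mechanics can be seen without downloading anything:

```shell
# Stand-in for the extracted Spark bin directory (assumption: on a real
# machine this would be .../spark-2.1.0-bin-hadoop2.7/bin instead).
SPARK_BIN="$(mktemp -d)"
touch "$SPARK_BIN/spark-shell"
chmod +x "$SPARK_BIN/spark-shell"

# Appending the bin directory to PATH is what makes spark-shell
# resolvable by name from any working directory.
export PATH="$PATH:$SPARK_BIN"
command -v spark-shell
```

With the real Spark distribution in place of the stub, the same `export PATH=...` line can be added to your shell profile (e.g. `~/.bash_profile`) to make it permanent.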
On Windows:
- Unzip spark-2.1.0-bin-hadoop2.7.tgz
- Add the Spark bin directory to Path: ...\spark-2.1.0-bin-hadoop2.7\bin
- download winutils.exe from https://github.com/steveloughran/winutils/tree/master/hadoop-2.6.0/bin
- move it to c:\hadoop\bin
- set HADOOP_HOME in your environment variables:
- HADOOP_HOME = C:\hadoop
- run from command prompt:
- C:\hadoop\bin\winutils.exe chmod 777 /tmp/hive
- run spark-shell from the command prompt with extra conf parameters:
- spark-shell --driver-memory 2G --executor-memory 3G --executor-cores 2 --conf spark.sql.warehouse.dir=file:///c:/tmp/spark-warehouse
When pasting larger sections of code into spark-shell, enter paste mode first:
scala> :paste
Then paste the code and press Ctrl+D to evaluate it.
If you prefer an IDE over spark-shell, below are the steps.
You can use either IntelliJ or Scala IDE for Eclipse.
- Install IntelliJ from https://www.jetbrains.com/idea/download/
- Add the Scala language plugin
- Import the code as a maven project and let it build
- If using Eclipse, use Scala IDE for Eclipse, available at: http://scala-ide.org/download/sdk.html
- Import the code as a maven project and let it build
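Both IDE imports above drive a standard Maven build. If you want to verify the project from the command line first, a minimal sketch (assuming `mvn` is on your PATH and you run it from the project root) is:

```shell
# Guard so the snippet degrades gracefully where Maven isn't installed.
if command -v mvn >/dev/null 2>&1; then
  # Compile and package the project, as the IDE's Maven import would.
  mvn clean package
  BUILD_MSG="maven build attempted"
else
  BUILD_MSG="maven not found - install it or build from the IDE"
fi
echo "$BUILD_MSG"
```

Either way, the first build will download the project's dependencies, so it can take a while on a fresh machine.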
Have the following ready before the session:
- JDK installed (1.7+)
- Spark binaries
- The code from https://github.com/WhiteFangBuck/strata-sanjose-2017
Nice to have: