MinneAnalytics Big Data Tech 2016
Chad Dvoracek
Data Engineer, The Nerdery
R Studio Server on Amazon EMR
Explore the convenience of the popular IDE for R while harnessing the power of SparkR (R on Spark) for distributed processing. See how to quickly set up R Studio Server on an EMR cluster and access the IDE via any web browser.
R Studio, R Studio Server on EMR, Distributed Data Frames, Machine Learning, SQL Context, fast aggregation.
Used as a bootstrap step to initialize the rstudio user on each node in the cluster.
Example Bash script showing how to load the data and install R Studio Server and the scripts.
R script used for the tutorial.
Hive script used to create an external table on the cleaned data set.
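
As a rough illustration of the bootstrap step that initializes the rstudio user on each node, a minimal sketch might look like the following. It is not the script from the talk: the variable names RSTUDIO_USER, RSTUDIO_PASSWORD and DRY_RUN, the helper functions, and the placeholder password are all assumptions made for this example. By default it only prints the commands it would run, so it can be tried safely outside a cluster.

```shell
#!/usr/bin/env bash
# Hypothetical bootstrap sketch: create a login user for R Studio Server on a node.
# RSTUDIO_USER, RSTUDIO_PASSWORD and DRY_RUN are illustrative names, not from the talk.
set -euo pipefail

RSTUDIO_USER="${RSTUDIO_USER:-rstudio}"
RSTUDIO_PASSWORD="${RSTUDIO_PASSWORD:-changeme}"  # placeholder; replace before real use
DRY_RUN="${DRY_RUN:-1}"  # 1 = only print the commands; 0 = run them (needs sudo)

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

create_rstudio_user() {
  local user="$1" password="$2"
  # Skip creation if the account already exists, so the step is safe to re-run.
  if id -u "$user" >/dev/null 2>&1; then
    echo "user $user already exists"
    return 0
  fi
  # Create the account with a home directory.
  run sudo useradd -m "$user"
  # chpasswd reads "user:password" pairs from stdin; the password is never echoed in dry-run.
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ echo $user:*** | sudo chpasswd"
  else
    echo "$user:$password" | sudo chpasswd
  fi
}

create_rstudio_user "$RSTUDIO_USER" "$RSTUDIO_PASSWORD"
```

Running it with DRY_RUN=0 on a node would actually create the account. Because a bootstrap step runs on every node, checking for an existing user first keeps the step idempotent if it is ever re-run.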