- Author: Yue You
- Tutorial Up-to-Date as of: 2021
- Usage: For MAST30034 students only ...
This repository will house all R workshops.
The Python stream is available here.
On Campus:
- Monday: 13:15 - 15:15 (R - Yue)
- Tuesday: 14:15 - 16:15 (Python - Akira)
- Wednesday: 11:00 - 13:00 (Python - Akira)
- Thursday: 10:00 - 12:00 (Python - Calvin)
Online:
- Tuesday: 16:15 - 18:15 (R - Yue)
- Wednesday: 14:15 - 16:15 (Python - Calvin)
- Thursday: 13:00 - 15:00 (Python - Akira), 15:15 - 17:15 (Python - Akira)
The first few tutorials will have content, with the remainder of the semester treated as consultations or additional tutorials as outlined:
- Introduction and Project 1 Overview:
  - Using the JupyterHub server
  - Using GitHub Desktop vs Git CLI (Command Line Interface)
  - Project 1 Overview
  - R Revision
  - Data Serialization
  - Downloading Files using R
  - Advanced: spark installation
- Geospatial Visualization and Analysis:
  - HexBins (vs SquareBins), Choropleths
  - Descriptive statistics
  - Advanced: sparklyr data analysis
- Regression and Discussion:
  - Linear Regression
  - MSE vs R-Squared
  - Penalized Regression (LASSO and Ridge)
  - Generalized Linear Model example (Poisson for count data)
  - Advanced: sparklyr modeling
- Machine Learning and Working as a Team:
  - Discussion: Overfitting, Curse of Dimensionality, Feature Engineering, etc.
  - Dimensionality Reduction
  - Agile Methodology + Standups
- Project 2 Overview:
  - Introduction of themes
  - Getting into teams
  - Assessment Overview
  - Attendance is mandatory. Groups are excused one absence only.
  - The last 2 weeks of tutorials will be presentations; all groups must attend a designated tutorial.
  - The remaining tutorials will act as checkpoints, consultations, and a chance for your group to conduct standups at a fixed time slot.
Statistical Modeling / Machine Learning: glmnet
Data Engineering / End-to-End Pipelines: dplyr, sparklyr
Visualizations: ggplot2, pheatmap, corrplot, ggmap, tmap
.....