Repository for UniSA INFS5098 Kaggle Titanic Machine Learning Challenge
This WORKING repository contains my contributions to the Kaggle Titanic Machine Learning competition:
https://www.kaggle.com/c/titanic-gettingStarted
The R code in this repository audits, cleanses and reworks the Titanic passenger data, then builds models to predict which passengers survived. There are three files in this repository:
- '01_Titanic_Audit.rmd': R Markdown file using the "knitr" package to explore and describe the Titanic dataset [in progress]
- '02_Titanic_FeatureEngine.R': R script to audit, cleanse and create new variables [in progress]
- '03_Titanic_Model.R': R script with various models to predict survivors of the Titanic disaster [in progress]
Submissions using this code form part of the Kaggle team below:
https://www.kaggle.com/t/90000/unisa-masters
Note on the variables created for this exercise:
- below is the structure of the data frame after the '02_Titanic_FeatureEngine.R' script has been run.
- newly created variables are in lowercase
- refer to the attached R script for the objective of each variable
'data.frame': 891 obs. of 27 variables:
$ PassengerId : int 1 2 3 4 5 6 7 8 9 10 ...
$ Survived : Factor w/ 3 levels "0","1","model": 1 2 2 2 1 1 1 1 2 2 ...
$ Pclass : Factor w/ 3 levels "1","2","3": 3 1 3 1 3 3 1 3 3 2 ...
$ Name : chr "Braund, Mr. Owen Harris" "Cumings, Mrs. John Bradley (Florence Briggs Thayer)" "Heikkinen, Miss. Laina" "Futrelle, Mrs. Jacques Heath (Lily May Peel)" ...
$ Sex : Factor w/ 2 levels "female","male": 2 1 1 1 2 2 2 2 1 1 ...
$ Age : num 22 38 26 35 35 ...
$ SibSp : int 1 1 0 1 0 0 0 3 0 1 ...
$ Parch : int 0 0 0 0 0 0 0 1 2 0 ...
$ Ticket : chr "A/5 21171" "PC 17599" "STON/O2. 3101282" "113803" ...
$ Fare : num 7.25 71.28 7.92 53.1 8.05 ...
$ Cabin : Factor w/ 187 levels "A10","A11","A14",..: 187 107 187 71 187 187 164 187 187 187 ...
$ Embarked : Factor w/ 3 levels "C","Q","S": 3 1 3 3 3 2 3 3 3 1 ...
$ title : Factor w/ 18 levels " Capt"," Col",..: 13 14 10 14 13 13 13 9 14 14 ...
$ titlegroup : Factor w/ 6 levels "Master","Miss",..: 4 4 6 4 4 4 4 6 4 4 ...
$ lastname : chr "Braund" "Cumings" "Heikkinen" "Futrelle" ...
$ nickname : int 0 0 0 0 0 0 0 0 0 0 ...
$ altname : int 0 1 0 1 0 0 0 0 1 1 ...
$ iceberg : int 0 0 0 0 0 0 0 0 0 0 ...
$ deck : Factor w/ 9 levels "A","B","C","D",..: 9 3 9 3 9 9 5 9 9 9 ...
$ subclass : Factor w/ 15 levels "1A","1B","1C",..: 15 3 15 3 15 15 5 15 15 11 ...
$ side : Factor w/ 3 levels "port","starboard",..: 3 2 3 2 3 3 1 3 3 3 ...
$ familysize : Factor w/ 9 levels "1","2","3","4",..: 2 2 1 2 1 1 1 5 3 2 ...
$ farepp : num 3.62 35.64 7.92 26.55 8.05 ...
$ marriagelength: Factor w/ 3 levels "Long","Short",..: 2 1 3 1 3 3 3 3 3 2 ...
$ childage : Factor w/ 3 levels "Old","Unknown",..: 1 1 2 1 2 2 2 3 1 3 ...
$ faregroup : Factor w/ 4 levels "<10","10-20",..: 1 4 1 4 1 1 4 3 2 4 ...
$ classregion : Factor w/ 9 levels "C1","C2","C3",..: 9 1 9 7 9 6 7 9 9 2 ...
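For readers who want a feel for how a few of the lowercase variables above could be derived, here is a minimal sketch in base R. It is inferred from the printed values only (for example, farepp matches Fare divided by familysize for the rows shown) and is not copied from '02_Titanic_FeatureEngine.R', so treat every rule in it as an assumption. The toy Name, SibSp, Parch and Fare values are the first three rows as printed above; the Cabin values and the "Unknown" deck label are illustrative placeholders. Unlike the structure above, the sketch trims the leading space from title and keeps familysize as an integer rather than a factor.

```r
# Toy data: first three passengers as printed in the str() output.
# Cabin values are illustrative placeholders, not taken from the output above.
toy <- data.frame(
  Name  = c("Braund, Mr. Owen Harris",
            "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
            "Heikkinen, Miss. Laina"),
  SibSp = c(1L, 1L, 0L),
  Parch = c(0L, 0L, 0L),
  Fare  = c(7.25, 71.28, 7.92),
  Cabin = c("", "C85", ""),
  stringsAsFactors = FALSE
)

# Names look like "Lastname, Title. Given names", so split on commas and full stops.
parts <- strsplit(toy$Name, "[,.]")
toy$lastname <- vapply(parts, function(p) p[1], character(1))
toy$title    <- trimws(vapply(parts, function(p) p[2], character(1)))

# Assumed rule: the passenger plus siblings/spouses plus parents/children.
toy$familysize <- toy$SibSp + toy$Parch + 1L

# Assumed rule: fare per person is the ticket fare shared across the family.
toy$farepp <- toy$Fare / toy$familysize

# Assumed rule: deck is the first letter of the cabin code;
# "Unknown" is a hypothetical label for passengers with no cabin recorded.
toy$deck <- ifelse(toy$Cabin == "", "Unknown", substr(toy$Cabin, 1, 1))

print(toy[, c("lastname", "title", "familysize", "farepp", "deck")])
```

Working on a three-row toy frame keeps the sketch runnable without the Kaggle files; the same column assignments apply unchanged to the full 891-row training data frame once it is loaded.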
References:
https://github.com/wehrley/wehrley.github.io/blob/master/SOUPTONUTS.md
http://trevorstephens.com/post/72916401642/titanic-getting-started-with-r
http://www.slideshare.net/michellebanzondarling/final-pink-panthers0331
https://www.kaggle.com/c/titanic/prospector