This repository has been archived by the owner on Aug 29, 2020. It is now read-only.
---
author: "Olga Smolyar"
date: "December 15, 2016"
title: "Practical Machine Learning Course Project"
output: html_document
---
### Background
This project uses data from accelerometers on the belt, forearm, arm, and dumbbell of 6 participants, who were asked to perform barbell lifts correctly and incorrectly in 5 different ways. The goal is to predict the manner in which the participants in the testing set did the exercise. Two machine learning algorithms, Random Forest and Decision Tree, are fitted to the training set, which is first partitioned into a 'training' and a 'validation' set so that the models can be checked on held-out data.
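
### Load packages and libraries
```{r}
# Install only the packages that are missing; an unconditional
# install.packages() call would reinstall everything on every knit.
pkgs <- c("lazyeval", "lattice", "caret", "randomForest", "rpart.plot", "rattle")
missing_pkgs <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs, repos = "https://cloud.r-project.org")
}

library(lattice)
library(caret)
library(randomForest)
library(rpart)
library(rpart.plot)
library(rattle)
```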
### Load training and test data sets
```{r}
training <- read.csv("pml-training.csv", na.strings = c("NA", "#DIV/0!", ""))
testing <- read.csv("pml-testing.csv", na.strings = c("NA", "#DIV/0!", ""))
```
### Preliminary exploration of data sets
```{r}
summary(training)
summary(testing)
```
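The `summary()` output for a wide data set is hard to scan, so a more compact view can help before cleaning. The chunk below is a minimal sketch, not part of the original analysis: `na_fraction` is a hypothetical helper name, and the 50% cut-off is an arbitrary choice for illustration. It reports how many columns are complete or mostly missing, and how many observations fall in each `classe` level.

```{r}
# Hypothetical helper (an assumption, not from the original report):
# share of missing values in each column of a data frame.
na_fraction <- function(df) colMeans(is.na(df))

# Guarded so the chunk also runs on its own, before the data are loaded.
if (exists("training") && is.data.frame(training)) {
  # Group columns by how much of their content is missing
  print(table(cut(na_fraction(training),
                  breaks = c(-Inf, 0, 0.5, 1),
                  labels = c("complete", "up to 50% NA", "over 50% NA"))))
  # Number of observations per exercise class
  print(table(training$classe))
}
```

Columns that land in the "over 50% NA" group are the ones the cleaning step removes.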
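### Cleaning data sets
```{r}
# read.csv() already mapped "NA", "#DIV/0!" and "" to NA through na.strings,
# so no further recoding of missing values is needed here.
# Keep only the columns without missing values. A logical index is used
# because training[, -indx] would drop every column if indx were empty.
training <- training[, colSums(is.na(training)) == 0]
# Drop the first 7 columns: bookkeeping fields (row id, user name,
# timestamps, window markers) rather than sensor readings.
training <- training[, -(1:7)]
# The outcome must be a factor for classification; since R 4.0,
# read.csv() no longer converts strings to factors by default.
training$classe <- factor(training$classe)
# Apply the same column filtering to the test set
testing <- testing[, colSums(is.na(testing)) == 0]
testing <- testing[, -(1:7)]
```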
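### Partition the training data set into training and validation sets
```{r}
set.seed(12345)  # arbitrary seed so that the random split is reproducible
InTraining <- createDataPartition(y = training$classe, p = 0.60, list = FALSE)
# Take the validation rows first: once training is overwritten,
# -InTraining would index the already reduced data frame.
validation <- training[-InTraining, ]
training <- training[InTraining, ]
```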
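`createDataPartition` samples within each level of `classe`, so both pieces should show similar class proportions. The following sketch checks this; `class_share` is a hypothetical helper introduced here for illustration and is not part of the original analysis.

```{r}
# Hypothetical helper (an assumption): proportion of each class label,
# rounded to three decimals for easy side-by-side reading.
class_share <- function(y) round(prop.table(table(y)), 3)

# Guarded so the chunk also runs on its own, before the split exists.
if (exists("training") && exists("validation")) {
  print(dim(training))
  print(dim(validation))
  print(class_share(training$classe))
  print(class_share(validation$classe))
}
```

Clearly different proportions between the two tables would suggest that the split did not go as intended.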
### Prediction using Machine Learning Algorithm 1: Random Forest
Fit model:
```{r}
RFmodel <- train(classe~., data=training, method = "rf", tuneLength = 1, ntree = 30)
print(RFmodel)
```
Prediction:
```{r}
RFPrediction <- predict(RFmodel, testing, type = "raw")
```
Display important variables in model:
```{r}
varImp(RFmodel)
```
Confusion matrix:
```{r}
confusionMatrix(predict(RFmodel, validation), validation$classe)
```
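The accuracy in the confusion matrix can also be read as an error estimate: the share of misclassified validation rows approximates the out-of-sample error, provided the validation rows were not used to fit the model. A minimal sketch follows; `error_rate` is a hypothetical helper name, not something defined by caret or by the original report.

```{r}
# Hypothetical helper (an assumption): share of predictions that differ
# from the true labels. Labels are compared as character strings so that
# factors with different level sets can still be compared.
error_rate <- function(pred, truth) {
  mean(as.character(pred) != as.character(truth))
}

# Guarded so the chunk also runs on its own, before the model is fitted.
if (exists("RFmodel") && exists("validation")) {
  RFvalidation <- predict(RFmodel, validation)
  print(error_rate(RFvalidation, validation$classe))
}
```

The value printed is one minus the accuracy shown in the confusion matrix output.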
### Prediction using Machine Learning Algorithm 2: Decision Tree
Fit model:
```{r}
DTModel <- rpart(classe ~ ., data=training, method="class")
print(DTModel)
fancyRpartPlot(DTModel)
```
Prediction:
```{r}
DTprediction <- predict(DTModel, testing, type = "class")
```
Summary:
```{r}
summary(DTprediction)
```
Display important variables in model:
```{r}
varImp(DTModel)
```
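The decision tree is only applied to the test set above, so its accuracy is not yet measured. One way to put it on the same footing as the Random Forest is to score it on the validation set as well. This is a sketch under that assumption; `score_predictions` is a hypothetical helper introduced for illustration.

```{r}
# Hypothetical helper (an assumption): cross-tabulate predictions against
# the true labels and compute the share of correct predictions.
score_predictions <- function(pred, truth) {
  list(
    table = table(predicted = pred, actual = truth),
    accuracy = mean(as.character(pred) == as.character(truth))
  )
}

# Guarded so the chunk also runs on its own, before the tree is fitted.
if (exists("DTModel") && exists("validation")) {
  DTvalidation <- predict(DTModel, validation, type = "class")
  print(score_predictions(DTvalidation, validation$classe))
}
```

Comparing this accuracy with the Random Forest confusion matrix shows which of the two models generalizes better on the held-out rows.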
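### Conclusions
With a Kappa value of 0.9997 and a p-value of 2.2e-16 in its confusion matrix, the Random Forest model classified the validation set very accurately. Because the partition is drawn at random, these figures vary slightly from run to run.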