# Solutions ch. 9 - Decision trees and random forests {#solutions-decision-trees}
Solutions to exercises of chapter \@ref(decision-trees).
## Exercise 1
**Load the necessary packages**\
readr to read in the data\
dplyr to process data\
party and rpart for the classification tree algorithms\
rpart.plot to plot the rpart tree\
ROCR to draw ROC curves
```{r}
library(readr)
library(dplyr)
library(party)
library(rpart)
library(rpart.plot)
library(ROCR)
set.seed(100)
```
**Select features that may explain survival**
Each row in the data is a passenger. Columns are features:
survived: 0 if died, 1 if survived\
embarked: Port of Embarkation (Cherbourg, Queenstown, Southampton)\
sex: Gender\
sibsp: Number of Siblings/Spouses Aboard\
parch: Number of Parents/Children Aboard\
fare: Fare Paid
**Categorical features should be made into factors**
```{r}
titanic3 <- "https://goo.gl/At238b" %>%
  read_csv() %>% # read in the data
  select(survived, embarked, sex,
         sibsp, parch, fare) %>% # keep the outcome and candidate features
  mutate(embarked = factor(embarked), # convert categorical columns to factors
         sex = factor(sex))
```
**Split data into training and test sets**
```{r}
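# Draw one label per passenger with equal probability, then split the
# data frame into a list with elements "training" and "test"
.data <- c("training", "test") %>%
  sample(nrow(titanic3), replace = TRUE) %>%
  split(titanic3, .)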
```
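Because each passenger gets a label with equal probability, the split above is roughly 50/50 and the exact sizes depend on the seed. The sketch below shows the same `sample()` and `split()` idea on a toy data frame with an unequal target split. The data frame `toy`, its values, and the 70/30 proportions are assumptions made for illustration; they are not part of the original exercise.

```{r}
# Toy stand-in for titanic3 (made-up values, for illustration only)
toy <- data.frame(id = 1:100, fare = seq(5, 500, length.out = 100))

set.seed(100)
# One label per row; prob = c(0.7, 0.3) is an assumed 70/30 target
labels <- sample(c("training", "test"), nrow(toy),
                 replace = TRUE, prob = c(0.7, 0.3))
parts <- split(toy, labels)

# Rows per subset; the counts vary with the seed
sapply(parts, nrow)
```

The subset sizes are random rather than fixed, so it is worth checking them before fitting a model.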
**Recursive partitioning is implemented in the "rpart" package**
```{r}
# Fit a tree on the training set; model = TRUE keeps the model frame in the fit
rtree_fit <- rpart(survived ~ .,
                   data = .data$training, model = TRUE)
rpart.plot(rtree_fit)
```
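Since `read_csv` reads `survived` as a number, `rpart` picks its regression method here and the leaves report survival proportions. If a classification tree is wanted instead, one option is a factor outcome together with `method = "class"`. The sketch below illustrates this on the `ptitanic` data that ships with rpart.plot, not on the data used in this exercise. The choice of predictors and the name `class_fit` are assumptions.

```{r}
library(rpart)
library(rpart.plot)  # also provides the ptitanic example data

data(ptitanic)       # survived is a factor here: "died" / "survived"

# Explicitly request a classification tree
class_fit <- rpart(survived ~ sex + sibsp + parch,
                   data = ptitanic, method = "class")

# Class probabilities, one column per level of survived
head(predict(class_fit, type = "prob"))
```

With `type = "prob"`, the column for the "survived" level can serve as the score when drawing an ROC curve.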
**Conditional partitioning is implemented in the "ctree" function of the party package**
```{r}
tree_fit <- ctree(survived ~ .,
                  data = .data$training)
plot(tree_fit)
```
**Use the ROCR package to visualize the ROC curve and compare methods**
```{r}
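# Score the test set with the ctree fit, then compute the
# true positive rate against the false positive rate
tree_roc <- tree_fit %>%
  predict(newdata = .data$test) %>%
  prediction(.data$test$survived) %>%
  performance("tpr", "fpr")
plot(tree_roc)
```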
```
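The chunk above draws the ROC curve for the `ctree` fit only. To compare both methods, one possible approach is to overlay the two curves and compute the area under each. The sketch below does this on the `ptitanic` data from rpart.plot so that it runs on its own. The data frame `toy`, the chosen predictors, the colours, and the helper names `fits`, `preds` and `aucs` are assumptions, not part of the original solution.

```{r}
library(rpart)
library(rpart.plot)  # for the ptitanic data
library(party)
library(ROCR)

set.seed(100)
data(ptitanic)
# Recode the outcome as 0/1 and strip extra classes from the count columns
toy <- data.frame(
  survived = as.numeric(ptitanic$survived == "survived"),
  sex      = ptitanic$sex,
  sibsp    = as.numeric(ptitanic$sibsp),
  parch    = as.numeric(ptitanic$parch)
)
in_train <- sample(c(TRUE, FALSE), nrow(toy), replace = TRUE)
train <- toy[in_train, ]
test  <- toy[!in_train, ]

fits <- list(
  rpart = rpart(survived ~ ., data = train),
  ctree = ctree(survived ~ ., data = train)
)

# Turn each model's test-set scores into a ROCR prediction object
preds <- lapply(fits, function(fit) {
  scores <- as.numeric(predict(fit, newdata = test))
  prediction(scores, test$survived)
})

# Overlay the two ROC curves
plot(performance(preds$rpart, "tpr", "fpr"), col = "blue")
plot(performance(preds$ctree, "tpr", "fpr"), col = "red", add = TRUE)
legend("bottomright", legend = names(fits),
       col = c("blue", "red"), lty = 1)

# Area under each curve
aucs <- sapply(preds, function(p) performance(p, "auc")@y.values[[1]])
aucs
```

The same pattern should carry over to `rtree_fit`, `tree_fit` and `.data$test` from this exercise, since both fits there also return numeric scores.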
Acknowledgement: the code for this exercise is from http://bit.ly/2fqWKvK