---
title: "Reproducible Research: Peer Assessment 1"
output:
  html_document:
    keep_md: true
---
# Reproducible Research: Peer Assessment 1
```{r}
#Load required libraries
library(lattice)
```
## Loading and preprocessing the data
```{r}
# Extract the CSV from the archive only if it is not already present
if (!file.exists("activity.csv")) {
  unzip("activity.zip")
}
activity <- read.csv("activity.csv")
activity$date <- as.Date(activity$date, "%Y-%m-%d")
```
## What is the mean total number of steps taken per day?
```{r}
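# Total steps per day. With na.rm = TRUE, a day whose values are all NA
# gets a total of 0 rather than NA.
steps_per_day <- tapply(activity$steps,
                        activity$date,
                        sum,
                        na.rm = TRUE,
                        simplify = TRUE)
hist(steps_per_day,
     xlab = "Number of steps",
     main = "Histogram of the total number of steps taken each day",
     col = "red")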
```
```{r}
steps_per_day_mean <- mean(steps_per_day)
steps_per_day_median <- median(steps_per_day)
```
The mean of the total number of steps per day is
`r format(steps_per_day_mean, scientific = FALSE)`.

The median of the total number of steps per day is
`r format(steps_per_day_median, scientific = FALSE)`.
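
One detail of the totals above is worth illustrating: `sum(..., na.rm = TRUE)` returns 0 for a day with no recorded values, so such days pull the mean and median down. The sketch below uses a small hypothetical data frame (`toy`, with made-up day labels and step counts, not the assignment data) to compare that behaviour with dropping the missing rows first.

```r
# Toy data (hypothetical): day "B" has no recorded steps at all.
toy <- data.frame(
  date  = c("A", "A", "B", "B"),
  steps = c(10, 20, NA, NA),
  stringsAsFactors = FALSE
)

# sum(..., na.rm = TRUE) turns the all-NA day into a total of 0.
totals_with_zero <- tapply(toy$steps, toy$date, sum, na.rm = TRUE)

# Dropping NA rows first leaves that day out of the totals entirely.
observed <- toy[!is.na(toy$steps), ]
totals_observed <- tapply(observed$steps, observed$date, sum)

mean(totals_with_zero)  # 15
mean(totals_observed)   # 30
```

Which variant is appropriate depends on whether a day without measurements should count as a day with zero steps; this report uses the first.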
## What is the average daily activity pattern?
```{r}
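steps_per_interval <- tapply(activity$steps, activity$interval, mean,
                             na.rm = TRUE, simplify = TRUE)
# Plot against the interval identifiers (stored as names), not positions
plot(as.numeric(names(steps_per_interval)), steps_per_interval, type = "l",
     col = "red", xlab = "Interval", ylab = "Average number of steps")
# names() yields the interval identifier of the maximum, not its position
max_interval_index <- names(which.max(steps_per_interval))
max_interval_value <- steps_per_interval[[max_interval_index]]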
```
The interval labelled `r max_interval_index` contains the maximum number of
steps, averaged across all days: `r max_interval_value`.
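
The value reported as the interval "index" comes from `names(which.max(...))`, so it is the interval's label rather than its position in the vector. A minimal sketch of the difference, using a hypothetical named vector `avg_steps` with made-up averages:

```r
# Hypothetical per-interval averages, named by interval label.
avg_steps <- c("0" = 1.5, "5" = 0.2, "835" = 206.1, "840" = 195.9)

# which.max() returns a named integer: the position of the first maximum.
peak <- which.max(avg_steps)

peak_label    <- names(peak)              # "835": the interval identifier
peak_position <- unname(peak)             # 3: where it sits in the vector
peak_value    <- avg_steps[[peak_label]]  # 206.1
```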
## Imputing missing values
```{r}
missing_values_count <- sum(is.na(activity))
```
The total number of missing values is `r missing_values_count`.
```{r}
# Replace each missing step count with the mean for that interval
activity_without_nas <- activity
for (i in seq_len(nrow(activity_without_nas))) {
  obs <- activity_without_nas[i, ]
  if (is.na(obs$steps)) {
    obs$steps <- steps_per_interval[[as.character(obs$interval)]]
  }
  activity_without_nas[i, ] <- obs
}
```
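
The row-by-row loop above fills each missing value with the mean for its interval. As a hedged alternative, the same lookup can probably be written as a single vectorized assignment. The sketch below runs on a small hypothetical data frame (`toy`, with made-up values), and the names `interval_means`, `missing`, and `toy_filled` are illustrative only.

```r
# Toy data (hypothetical values) in the same shape as `activity`.
toy <- data.frame(
  steps    = c(NA, 10, 6, NA, 2, NA),
  interval = c(0, 5, 0, 5, 0, 5)
)

# Mean steps per interval, ignoring NAs; names are the interval labels.
interval_means <- tapply(toy$steps, toy$interval, mean, na.rm = TRUE)

# Look up the interval mean for every missing row in one indexing step.
missing <- is.na(toy$steps)
toy_filled <- toy
toy_filled$steps[missing] <-
  as.vector(interval_means[as.character(toy$interval[missing])])

toy_filled$steps  # 4 10 6 10 2 10
```

On a data set of this size the loop is fast enough, so the vectorized form is mainly a readability choice.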
```{r}
steps_per_day_no_nas <- tapply(activity_without_nas$steps,
                               activity_without_nas$date,
                               sum,
                               simplify = TRUE)
hist(steps_per_day_no_nas,
     xlab = "Number of steps",
     main = "Histogram of the total number of steps taken each day (NAs imputed)",
     col = "red")
```
```{r}
steps_per_day_mean_no_nas <- mean(steps_per_day_no_nas)
steps_per_day_median_no_nas <- median(steps_per_day_no_nas)
```
The mean of the total number of steps per day without NAs is
`r format(steps_per_day_mean_no_nas, scientific = FALSE)`.

The median of the total number of steps per day without NAs is
`r format(steps_per_day_median_no_nas, scientific = FALSE)`.
## Are there differences in activity patterns between weekdays and weekends?
```{r}
# Classify each date as weekend or weekday. POSIXlt's wday (0 = Sunday,
# 6 = Saturday) avoids the locale-dependent day names from weekdays().
activity_without_nas$day_level <- as.factor(
  ifelse(as.POSIXlt(activity_without_nas$date)$wday %in% c(0, 6),
         "Weekend",
         "Weekday"))
steps_per_interval <- aggregate(steps ~ interval + day_level,
                                data = activity_without_nas,
                                mean)
names(steps_per_interval) <- c("interval", "day_level", "steps")
xyplot(steps ~ interval | day_level,
       steps_per_interval,
       type = "l",
       layout = c(1, 2),
       xlab = "Interval", ylab = "Number of steps")
```