---
title: "Tidy Data"
output: html_notebook
---
```{r setup}
library(tidyverse)
library(babynames)
# Toy data
cases <- tribble(
  ~Country, ~"2011", ~"2012", ~"2013",
  "FR", 7000, 6900, 7000,
  "DE", 5800, 6000, 6200,
  "US", 15000, 14000, 13000
)

pollution <- tribble(
  ~city, ~size, ~amount,
  "New York", "large", 23,
  "New York", "small", 14,
  "London", "large", 22,
  "London", "small", 16,
  "Beijing", "large", 121,
  "Beijing", "small", 121
)

x <- tribble(
  ~x1, ~x2,
  "A", 1,
  "B", NA,
  "C", NA,
  "D", 3,
  "E", NA
)

# To avoid a distracting detail during class
names(who) <- stringr::str_replace(names(who), "newrel", "new_rel")
```

## Your Turn 1

On a sheet of paper, draw how the `cases` data set would look if it had the same values grouped into three columns: **country**, **year**, and **n**.

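
Once you have a drawing, one way to check it is to let `gather()` do the same reshaping. The chunk below is a minimal sketch, not the only approach: it repeats the toy `cases` table from the setup chunk so that it runs on its own, and the name `cases_long` is introduced here purely for illustration. Note that the toy data spells the first column `Country`.

```{r}
library(tidyr)
library(tibble)

# Same toy data as in the setup chunk, repeated so this chunk is self-contained
cases <- tribble(
  ~Country, ~"2011", ~"2012", ~"2013",
  "FR", 7000, 6900, 7000,
  "DE", 5800, 6000, 6200,
  "US", 15000, 14000, 13000
)

# Collect columns 2 through 4 (the years) into a key column named "year"
# and a value column named "n"; Country is repeated once per year
cases_long <- gather(cases, key = "year", value = "n", 2:4)
cases_long
```

The result has one row per country and year combination. In current tidyr releases `gather()` is marked as superseded by `pivot_longer()`, but it still works as shown here.
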

## Your Turn 2

Use `gather()` to reorganize `table4a` into three columns: **country**, **year**, and **cases**.

```{r}
```

## Your Turn 3

On a sheet of paper, draw how the `pollution` data set would look if it had the same values grouped into three columns: **city**, **large**, and **small**.

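
To compare your drawing against code, `spread()` can perform the reverse reshaping. This is a small sketch under the same assumptions as the setup chunk: the toy `pollution` table is repeated so the chunk stands alone, and `pollution_wide` is a name chosen here for illustration.

```{r}
library(tidyr)
library(tibble)

# Same toy data as in the setup chunk, repeated so this chunk is self-contained
pollution <- tribble(
  ~city, ~size, ~amount,
  "New York", "large", 23,
  "New York", "small", 14,
  "London", "large", 22,
  "London", "small", 16,
  "Beijing", "large", 121,
  "Beijing", "small", 121
)

# Each distinct value of size becomes a column name;
# the matching amount fills the cells of that column
pollution_wide <- spread(pollution, key = size, value = amount)
pollution_wide
```

Each city now occupies a single row. In current tidyr releases `spread()` is marked as superseded by `pivot_wider()`, but it still works as shown here.
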

## Your Turn 4

Use `spread()` to reorganize `table2` into four columns: **country**, **year**, **cases**, and **population**.

```{r}
```

## Your Turn 5

Gather the 5th through 60th columns of `who` into a key:value column pair named **codes** and **n**. Then select just the `country`, `year`, `codes` and `n` variables.

```{r}
```
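
Before moving on to splitting columns, here is a hedged illustration of how `separate()` behaves. It uses `table3`, a small example table that ships with tidyr and is not otherwise part of these exercises; the name `table3_split` is made up for this sketch.

```{r}
library(tidyr)

# table3 stores two values in one cell, e.g. "745/19987071" in the rate column.
# separate() splits that column at the "/" into two new columns.
table3_split <- separate(table3, rate, into = c("cases", "population"), sep = "/")
table3_split
```

The new columns are character vectors unless you also pass `convert = TRUE`. When `sep` is a number rather than a string, `separate()` treats it as a character position to split at, which is useful for codes that have no delimiter.
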

## Your Turn 6

Separate the `sexage` column into **sex** and **age** columns.

```{r}
```

## Your Turn 7

Reshape the layout of the data produced below. Calculate the percent of male (or female) children by year. Then plot the percent over time.

```{r}
babynames %>%
  group_by(year, sex) %>%
  summarise(n = sum(n))
```

***

# Take Aways

Data comes in many formats, but R prefers just one: _tidy data_.

A data set is tidy if and only if:

1. Every variable is in its own column
2. Every observation is in its own row
3. Every value is in its own cell (which follows from the above)

What counts as a variable or an observation may depend on your immediate goal.
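
As a brief illustration of why the tidy layout is convenient, the sketch below computes a rate from `table1`, an already tidy example table that ships with tidyr. The `rates` name and the per-10,000 scaling are choices made for this example only.

```{r}
library(dplyr)
library(tidyr)

# Because cases and population each sit in their own column,
# a new variable is a single vectorised column operation
rates <- table1 %>%
  mutate(rate = cases / population * 10000)
rates
```

The same calculation on `table4a`, where the case counts are spread across one column per year, would first need a reshaping step such as `gather()`.
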