---
title: "01_Basics"
author: "Elizabeth Bowman"
date: "2/16/2020"
output:
  pdf_document: default
  html_document: default
---
# Create a new project
We are going to create a new RStudio project on your desktop for this workshop. Projects let you keep the data and scripts for each workflow in their own folder.

1. Go to the 'File' menu.
2. Click 'New Project'.
3. Click 'New Directory'.
4. Click 'New Project'.
5. In 'Directory name', type TidyverseWS.
6. In 'Create project as subdirectory of:', type ~/Desktop

Now if you look at your desktop, you will see a new folder that contains our new RStudio project.
# Core tidy principles
1. Each variable has its own column.
2. Each observation has its own row.
3. Each value has its own cell.

|      | Dry_weight_g | Seeds | Treatment_type |
|------|--------------|-------|----------------|
| F45A | 1.4          | 13    | Heat           |

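As a minimal sketch, the single observation in the table above can be written as a base R data frame. The object name `plants` is an assumption made for illustration; it is not part of the workshop data.

```{r tidy-example}
# One observation (row) with three variables (columns), taken from the table above.
# 'plants' is a hypothetical name used only for this illustration.
plants <- data.frame(
  Dry_weight_g     = 1.4,
  Seeds            = 13,
  Treatment_type   = "Heat",
  stringsAsFactors = FALSE
)
rownames(plants) <- "F45A"   # the row name identifies the observation

plants
dim(plants)   # number of rows (observations), then number of columns (variables)
```

Each variable is a column, the observation is a row, and each value occupies exactly one cell.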
## Importance of consistency
An important principle is to keep all values for a single variable consistent. That is, all values for a variable have to be either numerical or categorical. You can't mix the two (e.g. 9, three, four, 49), as this will cause errors when your code is run. If a column in a data frame has a mix of numerical and categorical data, R will automatically classify the whole column as categorical (text). A simple way to check how your variables are classified within R is to use the str() command.
```{r structure}
str(9.4)
str('hello')
```
str() can also be used with data frames. We will do this in a moment.
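The classification of mixed values described above can be checked directly. This is a small sketch using the example values from the text; the object names `mixed` and `numbers` are made up for illustration.

```{r mixed-values}
# A vector that mixes numbers and words, as in the example above.
mixed <- c(9, "three", "four", 49)
str(mixed)      # every element, including 9 and 49, is stored as text
class(mixed)

# A vector that contains only numbers keeps its numeric type.
numbers <- c(9, 3, 4, 49)
str(numbers)
class(numbers)
```

Because `mixed` is stored as text, arithmetic such as `mean(mixed)` will not give a meaningful result, which is why consistent values matter.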
You can think of values as having a physical location within a data table. In the table above, the **Treatment_type** value 'Heat' sits at the intersection of row 1 and column 3. Notice that the column names and row names are not counted when working out the location.
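A short sketch of looking up a value by its location, assuming the table above is stored in a data frame. The name `plants` is hypothetical and is rebuilt here so the chunk runs on its own.

```{r cell-location}
# Rebuild the one-row example table ('plants' is an illustrative name).
plants <- data.frame(
  Dry_weight_g     = 1.4,
  Seeds            = 13,
  Treatment_type   = "Heat",
  stringsAsFactors = FALSE
)
rownames(plants) <- "F45A"

# Square brackets take the row first, then the column.
plants[1, 3]                        # by position: row 1, column 3
plants["F45A", "Treatment_type"]    # the same cell, by row and column name
```

Both lines point at the same cell; positions count only the data, not the row or column names.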
The tidyverse uses a specific type of data frame called a tibble, but it also works with the standard data frames that base R uses. We do not have time to cover tibbles here, but if you would like to learn more, see the [tibble documentation](http://tibble.tidyverse.org/).
# Load and install tidyverse
```{r install packages}
# Run the install line once (remove the leading '#'), then comment it out again.
#install.packages('tidyverse')
library(tidyverse)
library(magrittr) # provides the operators called 'pipes', e.g. %>%
```
Some of the packages loaded:
* readr: for reading data
* ggplot2: for making plots
* tibble: tidyverse version of data frames
* dplyr: for manipulating data frames: creating new variables, calculating summary statistics, etc.
* tidyr: for reshaping data
# Loading data
We are going to download a dataset that contains data from an animal survey. Below is a list of all the variables in the dataset. Remember that these variables will be the columns of our data frame.
Variables | Description
----------|------------
record_id | Unique id for observation
month | month of observation
day | day of observation
year | year of observation
plot_id | ID of a particular plot
species_id | 2-letter species code
sex | sex of animal ('M','F')
hindfoot_length | length of the hindfoot in mm
weight | weight of the animal in grams
genus | genus of animal
species | species of animal
taxa | e.g. Rodent, Reptile, Bird, Rabbit
plot_type | type of plot
### Download the sample data
```{r download}
download.file(url = "https://ndownloader.figshare.com/files/2292169",
              destfile = '~/Desktop/TidyverseWS/portal_data_joined.csv')
```
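The download above fails if the project folder does not exist yet, and it fetches the file again every time the document is knit. The chunk below is one possible sketch of a guard against both; the helper name `fetch_if_missing` is hypothetical and not part of any package.

```{r download-if-missing}
# Hypothetical helper: download a file only when it is not already on disk.
fetch_if_missing <- function(url, destfile) {
  # Create the destination folder if needed (no warning if it already exists).
  dir.create(dirname(destfile), recursive = TRUE, showWarnings = FALSE)
  if (!file.exists(destfile)) {
    download.file(url = url, destfile = destfile)
  }
  invisible(destfile)   # return the path so it can be passed on to read.csv()
}

# Example call with the workshop's URL and path (remove the '#' to run it):
# fetch_if_missing("https://ndownloader.figshare.com/files/2292169",
#                  '~/Desktop/TidyverseWS/portal_data_joined.csv')
```

Wrapping the download in a function keeps the check and the download together, so running the chunk a second time does nothing if the file is already there.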
### Load data into RStudio environment
```{r read in data}
surveys <- read.csv('~/Desktop/TidyverseWS/portal_data_joined.csv')
```
Now let's take a look at the data to ensure that it loaded correctly.
```{r check out data}
head(surveys)
str(surveys)
```