-
Notifications
You must be signed in to change notification settings - Fork 8
/
Copy pathExtra_content---The_Basics_of_R.Rmd
400 lines (300 loc) · 12.4 KB
/
Extra_content---The_Basics_of_R.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
---
title: "The basics of R"
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, comment = NULL, prompt = TRUE)
source("functions.R")
gloss <- make_glossary_function()
vector_def <- "a sequence of data elements of the same type"
function_def <- "a self-contained set of code that can be used many times"
package_def <- "a set of functions and data sets that are bundled together for distribution and download"
library_def <- "a folder/directory on your computer where your R packages are stored"
prompt_def <- "a symbol that R uses to let you know it's ready for you to enter commands"
assign_def <- "assigns the result of whatever's to the right of the operator to whatever is on the left"
```
Introduction
===============
This is a quick introduction to R. It's not meant to be a comprehensive
tutorial, but rather as a small, sturdy foothold to get you started. We will
cover concepts such as `r gloss("vectors", vector_def)`,
`r gloss("functions", function_def)`, and `r gloss("packages", package_def)`.
Fancy Calculator
===============
R is a programming language, but above all, it's a statistical programming
language that's interactive. The basic usage of R is to use it like a
calculator.
When you open R, the first thing you'll see is the version, license and then the
`r gloss('R prompt (>)', prompt_def)`:
```
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-apple-darwin13.4.0 (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
>
```
The R prompt tells us that it's ready for us to type commands. Unlike other
programming languages, it's designed to be interactive. As an example, we'll do
some simple calculations showing the relationship between 47 and 2. In your R
console, type `47 + 2` and then hit <kbd>Enter</kbd>. After that, type `47 * 2`
and hit <kbd>Enter</kbd>. You should see something like this:
```{r calculate_good_times}
47 + 2
47 * 2
```
Variables
===============
Unlike a calculator, you can save the results of your calculations into
variables by using the `r gloss("assignment operator (<-)", assign_def)`. This
looks like an arrow pointing the results of the calculation on the right to the
variable on the left.
```{r its_magic}
# 49 - 2 and 94/2 are both 47. It's magic!
its <- 49 - 2
magic <- 94 / 2
```
> The '#' symbol is a "comment". R will ignore anything that comes after a
> comment, allowing you to write notes to yourself in your R script
Now that we saved the results into a variable, how do we look at it? All we have
to do is type the variable:
```{r magic_revealed}
its
magic
```
Variables can be used in calculations.
```{r multiple_magic}
oh <- its + 2
oh
wow <- magic * 2
wow
```
You can even overwrite variables.
```{r written_magic}
oh <- "oh"
oh
wow <- "wow"
wow
```
But single variables are not all that you can do In R. Read more to find out
about vectors!
Vectors
===============
R can store more than single values. It has `r gloss("vectors", vector_def)`.
These are sequences of data that are all of the same type (i.e. integers,
decimals, text (characters), and logical values (TRUE/FALSE)). To construct a
vector, you can use the `c()` function (more about functions later):
```{r vector_me_this}
some_numbers <- c(10.0, pi, 47.5362, 3.50, 1.1111)
some_numbers
some_integers <- c(NA, 1, 1, 2, 3) # NA is code for "missing" in R
some_integers
more_integers <- 1:5 # same as c(1, 2, 3, 4), but a lot easier to type!
more_integers
some_characters <- c("work it", "harder", "make it", "better", # You can wrap commands on
"do it", "faster", "makes us", "stronger") # multiple lines.
some_characters
some_logic <- c(TRUE, FALSE, TRUE, TRUE, FALSE)
some_logic
```
### Vectorization
You can think about these as being similar to a column in an excel spreadhseet.
For example, if you wanted to take column A and square each number, you would
start by setting cell **B1** to `=A1^2` and then you would drag that down to
apply to the entire column.
Math works with vectors the same way in R:
```{r v_add_o}
some_integers ^ 2
```
> Note: "NA" is a missing value in R.
Again, if you tried to add two columns together in excel, you would type
something like `=A1+B1` and then drag to apply to the column. R behaves the
same way:
```{r v_drag}
some_integers + some_numbers
```
### Subsetting vectors
#### Using integers
Notice that each vector starts with `[1]` when you print it? This is telling you
what position this number is in the vector. You can get a specific element of a
vector by supplying a specific integer or sequence of integers:
```{r get_number}
some_numbers[2]
some_characters[c(1, 3, 5, 7)]
some_logic[1:3]
fib <- some_integers[-1] # remove the first element
fib
some_characters[fib] # you can also use other vectors!
```
Notice in the last example, we took our integer vector with the Fibbonacci
sequence and used it to subset the character vector.
#### Using logical values
We can also use logical values to subset a vector:
```{r truth_and_consequences}
some_numbers[some_logic]
```
This is a very powerful method of subsetting a vector because we can use logical comparison such as `>`, `<`, and `==`. We can get the same result as above by asking R to only return numbers less than ten:
```{r less_than_ten}
some_numbers # the first and second elements are not less than 10.
some_numbers[some_numbers < 10]
```
Data Import
===============
Of course, no one expects you to enter all of your data in R by hand. When you
have data, it's usually in [tabular format](03--Data_formatting.html). In this
example, we'll use data from the *agricolae* package assessing potato varieties
for resistance to late blight in locations in Peru (see
`help("ComasOxapampa", package = "agricolae")` for details)
To read these data into R, you can use `read.table()`:
```{r}
ComasOxapampa <- read.table("ex_data/ComasOxapampa.csv", sep = ",", head = TRUE)
```
Here we are telling R to read a table from the file
"[ComasOxapampa.csv](https://github.com/grunwaldlab/Reproducible-science-in-R/blob/master/ex_data/ComasOxapampa.csv)"
that is in the folder called "ex\_data" using the
`r gloss("function", function_def)` `read.table()` (more on functions in the
next section). What we get back is a
`r gloss("data frame", "a representation of a table in R")`. A data frame is
made up of one or more vectors that are all the same length in columns.
We can look at the structure of a data frame using the `str()` function:
```{r}
str(ComasOxapampa)
```
This output is telling us that we have 168 rows with 4 columns named "cultivar",
"replication", "comas", and "oxapampa". We can see that "cultivar" and
"replication" are both factors (a way of representing categorical variables in
R) and comas and oxapampa are both numeric values of the AUDPC. We can access
each column using the "$":
```{r}
ComasOxapampa$comas # vector of AUDPC values for Comas
```
You can even create new columns with the "$" symbol:
```{r}
ComasOxapampa$difference <- ComasOxapampa$comas - ComasOxapampa$oxapampa
head(ComasOxapampa) # Look at the top of the data (6 rows by default)
```
Functions
===============
One other feature of R is the fact that it has
`r gloss("functions", function_def)`. These are self-contained pieces of code
that can be run over and over. Functions can do anything from reading and
writing files, graphics, and calculations. We've seen a couple of them in the
previous section, `read.table()` and `str()`. Functions always take the form of `do_this(to_this_thing)`, so in the case of `read.table()`, it's "**read** the
**table** in the file called **ex\_data/ComasOxapampa.csv**".
There are other functions that help you assess your data. For example, with
data frames and matrices, you can ask R to give you the number of rows and
number of columns with `nrow()` and `ncol()`:
```{r}
nrow(ComasOxapampa)
ncol(ComasOxapampa)
```
You can also nest functions together using one as input for the other. Here, we
can find out how many varieties we have by first finding the unique cultivars
and then reporting the length
```{r}
length(unique(ComasOxapampa$cultivar))
```
The most useful functions in R are the functions for calculating statistics. For
example, you can test if there is a significan difference in disease between the
two regions with a t-test by using the function `t.test()`:
```{r}
region_t <- t.test(ComasOxapampa$comas, ComasOxapampa$oxapampa)
region_t
```
> Note: we are using a t-test for this for the sake of simplicity. This is not
> the appropriate test for these data.
Packages
===============
R would not be so widely used without `r gloss("packages", package_def)`. These
are a set of functions, documentation, and data sets that can be distributed to
anyone using R. If you can think of something you want R to do, there's probably
a package for that. Packages can be downloaded from online repositories such as
[CRAN](https://cran.r-project.org/) or [BioConductor](http://bioconductor.org/).
The easiest way to install a package is through your R console with the function
`install.packages()`:
```{r install_package, eval = FALSE}
install.packages("agricolae", repos = "https://cran.rstudio.org")
```
```{r install_package_shadow, echo = FALSE}
library('printr')
message("
Installing package into ‘/Users/zhian/R’
(as ‘lib’ is unspecified)
trying URL 'https://cran.r-project.org/bin/macosx/mavericks/contrib/3.3/agricolae_1.2-4.tgz'
Content type 'application/x-gzip' length 923645 bytes (901 KB)
==================================================
downloaded 901 KB
")
cat("
The downloaded binary packages are in
/var/folders/qd/dpdhfsz12wb3c7wz0xdm6dbm0000gn/T//RtmpWCcCSI/downloaded_packages
")
```
> You should only have to install a package once and then you can use it as many
> times as you need.
This package is stored in your `r gloss("R library", library_def)`, which is a
folder on your computer where your downloaded packages live. When you want to
use the functions in a package, you load it with the `library()` function. This
function tells R to look in your library for a package and load it.
```{r}
library("agricolae")
```
> Pro Tip! For better organization: load all of the packages you need at the
> beginning of your R script
Getting help
===============
One way that R shines above other languages is that R packages in CRAN are all
documented and easy to install. Help files are written in HTML and give the user
a brief overview of:
- the purpose of a function
- the parameters it takes
- the output it yields
- examples demonstrating its usage
To get help on any R function, type a question mark before the empty function.
Here's an example of how to get help about the `audpc()` function from the
*agricolae* package:
```{r}
library("agricolae") # The package with the audpc() function.
?audpc # open the R documentation of the function audpc()
```
Other ways of getting help:
```{r, eval = FALSE}
help(package = "agricolae") # Get help for a package.
help("audpc") # Get help for the audpc function
?audpc # same as above
??disease # Search for help with the keyword 'disease' in all packages
```
Examples in help
----------------
If you want to run the examples, you can either copy and paste the commands to
your R console, or you can run them all with:
```{r}
example("audpc", package = "agricolae")
```
Vignettes
---------
Some packages include vignettes that can have different formats such as being
introductions, tutorials, or reference cards in PDF format. You can look at a
list of vignettes in all packages by typing:
```{r, eval=FALSE}
browseVignettes() # see vignettes from all packages
browseVignettes(package = "tidyr") # see vignettes from a specific package.
```
and to look at a specific vignette you can type:
```{r, eval=FALSE}
vignette('mlg') # Multilocus Genotype vignette from the poppr package
```
```{r, echo = FALSE}
detach("package:printr", unload = TRUE)
```
# Glossary
```{r glossary, results = 'asis', echo = FALSE}
gloss(display = TRUE)
```