# R Bootcamp Tutorial
## Methods for reading in data
In this tutorial we'll cover:

1. Reading in data with the `cansim` package
2. Reading in data from a CSV file
3. Getting to know your data

---
## Introduction
R Markdown files (`.Rmd`) consist of chunks of code written in `R` and text written in `markdown`. You can run the code chunk by chunk or knit the entire document at once. Knitting runs all the code and combines it with the text to create a nicely formatted document in HTML, PDF, or Word.

In the following chunk we load the packages we will need. The chunk option `include = FALSE` keeps the chunk and its output out of the knitted document. Anything after a `#` symbol on a line is a comment and is ignored by R.

```{r, include = FALSE}
# Load libraries
library(cansim) # read in CODR/NDM tables
```
## 1. `cansim` package
Today we will be working with [Stocks of specified dairy products](https://www150.statcan.gc.ca/t1/tbl1/en/tv.action?pid=3210000101#tables). These data are stored in a CODR (Common Output Data Repository) table; the term is used synonymously with New Dissemination Model (NDM). These tables were formerly called CANSIM, which stands for Canadian Socio-Economic Information Management System.

To load the data we use a [package called `cansim`](https://cran.r-project.org/web/packages/cansim/index.html), which contains predefined functions and methods to access CANSIM/CODR tables. The table number we pass to `get_cansim()` below, `32-10-0001-01`, corresponds to the Product ID (PID) shown on the data webpage (the `pid=3210000101` in the URL).

```{r}
# Load the cansim package
library(cansim)
# Load our data
df <- get_cansim("32-10-0001-01")
```
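> Best Practice: Pulling data through an API is typically better than downloading a file and reading it in, because the code documents exactly where the data came from and picks up updates each time it runs.
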
## 2. CSV
Note that on the data [webpage](https://www150.statcan.gc.ca/t1/tbl1/en/tv.action?pid=3210000101#tables) you have the option to download the data in CSV format. This isn't the recommended method, but it is a reasonable option for smaller datasets.

```{r}
# Load our data from the static folder
df_csv <- read.csv("../static/prov_data.csv")
```
This is a small, hand-made dataset to illustrate reading in a CSV file. We'll use it later on in our analysis.

> If you need more information on a function, put `?` in front of its name to find out more! E.g., run `?read.csv` in your console and see what happens.

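
To see what `read.csv()` does without depending on any particular file, here is a minimal sketch of a round trip: write a small data frame to a temporary CSV file, then read it back. The `toy` data frame, its column names, and its values are made up for illustration and are not part of the tutorial's data.

```{r}
# Hypothetical toy data, invented for this example only
toy <- data.frame(
  province = c("Ontario", "Quebec", "Alberta"),
  value = c(10.5, 8.2, 3.1)
)

# Write the data to a temporary CSV file (no row names column)
toy_path <- tempfile(fileext = ".csv")
write.csv(toy, toy_path, row.names = FALSE)

# Read the file back in, exactly as we would with a downloaded CSV
toy_csv <- read.csv(toy_path)

# The column names and values should survive the round trip
print(toy_csv)
```

Using `tempfile()` keeps the example from leaving stray files in your project folder.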
## 3. Getting to know your data
The data webpage gives us a nice glance at how our data looks, but let's uncover some R methods to explore further.
```{r}
# print a summary of our data
print(summary(df))
```
Let's also look at our CSV data.
```{r}
# print a summary of our data
print(summary(df_csv))
```
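`summary()` is one of several base R functions for a first look at a dataset. The sketch below shows three others (`dim()`, `head()`, and `str()`) on a made-up data frame called `toy`; the column names and values are assumptions for illustration only. The same calls should work on `df` and `df_csv`.

```{r}
# Hypothetical toy data standing in for a real table
toy <- data.frame(
  product = c("Butter", "Cheese", "Butter", "Cheese"),
  tonnes = c(12.1, 30.4, 11.8, 29.9)
)

print(dim(toy))      # number of rows, then number of columns
print(head(toy, 2))  # the first two rows
str(toy)             # each column's name, type, and first few values
```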
Let's dig further into individual columns using square brackets, which *index* a dataset. Here we select the column in `df` called `UOM` (unit of measure).
```{r}
print(unique(df["UOM"])) # prints the unique values in our "UOM" column
```
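Single square brackets are not the only way to index a column. As a small sketch on a made-up data frame (the `toy` object and its contents are assumptions, not the tutorial's data), the chunk below contrasts `[ ]`, which keeps the result as a data frame, with `[[ ]]` and `$`, which pull the column out as a plain vector.

```{r}
# Hypothetical toy data, invented for this example only
toy <- data.frame(
  product = c("Butter", "Cheese", "Butter"),
  uom = c("Tonnes", "Tonnes", "Tonnes")
)

one_col_df <- toy["product"]     # single brackets: still a data frame
one_col_vec <- toy[["product"]]  # double brackets: a plain vector
same_vec <- toy$product          # `$` also extracts the column as a vector

print(class(one_col_df))
print(unique(one_col_vec))
```

Which form you want depends on what the next function expects: some work on whole data frames, others need a vector.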
We could write lines one-by-one to explore the unique values in each column, but let's try to be clever! Here we use a *for loop* that places the name of each column in `df` into the previous command.
```{r}
for (col in colnames(df)) {
  print(unique(df[col]))
}
```
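As a variation on the loop above, the sketch below labels each column before printing its unique values, then shows a loop-free alternative using `lapply()`. It runs on a made-up data frame (`toy` and its contents are assumptions for illustration) so that it works on its own.

```{r}
# Hypothetical toy data, invented for this example only
toy <- data.frame(
  product = c("Butter", "Cheese", "Butter"),
  uom = c("Tonnes", "Tonnes", "Tonnes")
)

# Loop over the column names, printing a label and then the unique values
for (col in colnames(toy)) {
  cat("Column:", col, "\n")
  print(unique(toy[[col]]))
}

# Loop-free alternative: apply unique() to every column at once,
# returning a named list with one element per column
unique_values <- lapply(toy, unique)
print(unique_values)
```

The `lapply()` version stores the results in a list, which is handy if you want to reuse them later instead of only printing them.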
What are some things you notice about the data? Take a look at the outputs surrounded by angle brackets (`< >`).
```{r}
```
<!-- Is there supposed to be something in that code chunk ^ -->