-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy path5_outputting_data.Rmd
139 lines (90 loc) · 4.41 KB
/
5_outputting_data.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
---
title: R Bootcamp Tutorial
output:
ioslides_presentation
---
# R Bootcamp Tutorial
## Outputting data
In this tutorial, we'll explore some options for outputting or presenting data. Please be aware that some of these topics can become quite advanced.
1. Writing out data
2. Rmarkdown
3. Quarto
---
To get started let's load in the data we've been working with throughout the bootcamp.
```{r, include = FALSE}
# Possible output of cleaned data from lesson 3
# Merge types
# Group_by
# summary stats
# Add one more viz to this one
library(cansim) # read in CODR/NDM tables
library(ggplot2)
library(plotly)
library(tidyverse)
knitr::opts_chunk$set(warning = FALSE, message = FALSE)
# Load cansim data
df <- get_cansim("32-10-0001-01")
df <- filter(df,Commodity == "Creamery butter", GEO != "Canada")
# Load province data
prov_data_csv <- read.csv("../static/prov_data.csv", fileEncoding = 'UTF-8-BOM')
# left join province info in
df_regions <- merge(x = df, y = prov_data_csv,
by.x = "GEO",
by.y = "Province",
all.x = TRUE)
# Fix any NA columns
for (i in 1:nrow(df_regions)){
if(is.na(df_regions$Region[i])){
df_regions$Region[i] <- df_regions$GEO[i]
}
}
# Make a year column
df_regions$year <- substr(df_regions$REF_DATE,1,4)
df_regions$Region <- str_replace_all(df_regions$Region, c("Atlantic provinces" = "Atlantic Provinces"))
# Avg value by year
df_summary <- df_regions %>% group_by(year, Region) %>%
summarize(mean = mean(VALUE, na.rm = TRUE))
```
### 1. Writing out data
Recall in our analysis tutorial we used a function to write a dataframe to a csv:
`write.csv()` comes from base R and by default includes row names
```{r}
write.csv(df_summary, file = "../static/df_summary.csv")
```
`write.csv2()` uses ';' as delimiters
```{r}
write.csv2(df_summary, file = "../static/df_summary2.csv")
```
`write_csv()` comes from the `readr` package. `readr` is included in the `tidyverse` collection. THe main differences are that this is about twice as fast as `write.csv()` and it doesn't write row names.
```{r}
write_csv(df_summary, file = "../static/df_summary3.csv")
```
To specify your own delimiters you need to use `write.table()`
```{r}
write.table(df_summary, file="../static/df_summary_table.csv", sep=",",dec = " ")
```
CSVs are great to work with because they're easy to read and can largely be treated like excel files, which is intuitive for many. However, for large data they are not the most efficient choice.
#### Other file types
For managing large data you can check out the surge team's [big dataset guide](https://gitlab.k8s.cloud.statcan.ca/surge-team/learning-material/big-datasets-guide).
#### SAS
We won't cover this here but you can read a guide on using SAS with R or Python [here](https://rpug.pages.cloud.statcan.ca/en/setup/r/saspy/).
<!-- This might need to be updated with the new website -->
### 2. Rmarkdown
As you know, we've been working with .Rmd files throughout the bootcamp.
#### Knitting an RMarkdown file
Press the `Knit` button near the top of RStudio and select Knit to HTML. Some notes:
- Typically HTML files don't go on Gitlab (*.html is in our .gitignore)
- You can open HTML files in browser for presentations or sharing
- Note the file sizes are a lot bigger than .Rmd files
#### Creating a slideshow
In the front matter at the top of this document, add the following lines:
```{}
output:
ioslides_presentation
```
Looks like it needs some work but it's an ok start! See [here](https://bookdown.org/yihui/rmarkdown/ioslides-presentation.html>) for further customization.
### 3. Quarto
Quarto is an open-source scientific and technical publishing system built on `Pandoc` and uses its variation of markdown as its underlying document syntax. Quarto is able to run Python, R, Julia, and Observable code. You can use Quarto to create book documents, presentations, websites, etc. Find guides on how to make some of these kinds of projects [here](https://quarto.org/docs/guide/).
This is a new tool that comes with v2022.07 of RStudio--which to date (2023-03-07) is not available to all employees at StatCan as we typically are a year behind in R and RStudio versions.
Find a cloneable repo for books [here](https://gitlab.k8s.cloud.statcan.ca/EDLP/documentation/cloneable-quarto-and-github-pages-repo) and for bilingual websites [here](https://gitlab.k8s.cloud.statcan.ca/surge-team/reusable-code/bilingual-websites-in-quarto).
More details on Quarto to come!