generated from carpentries/workbench-template-rmd
-
-
Notifications
You must be signed in to change notification settings - Fork 9
/
cache.Rmd
155 lines (112 loc) · 5.12 KB
/
cache.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
---
title: 'Loading Workflow Objects'
teaching: 10
exercises: 2
---
```{r}
#| label: setup
#| echo: FALSE
#| message: FALSE
#| warning: FALSE
library(targets)
source("https://raw.githubusercontent.com/joelnitta/targets-workshop/main/episodes/files/functions.R?token=$(date%20+%s)") # nolint
```
:::::::::::::::::::::::::::::::::::::: questions
- Where does the workflow happen?
- How can we inspect the objects built by the workflow?
::::::::::::::::::::::::::::::::::::::::::::::::
::::::::::::::::::::::::::::::::::::: objectives
- Explain where `targets` runs the workflow and why
- Be able to load objects built by the workflow into your R session
::::::::::::::::::::::::::::::::::::::::::::::::
::::::::::::::::::::::::::::::::::::: instructor
Episode summary: Show how to get at the objects that we built
:::::::::::::::::::::::::::::::::::::
## Where does the workflow happen?
So we just finished running our first workflow.
Now you probably want to look at its output.
But, if we just call the name of the object (for example, `penguins_data`), we get an error.
```{r}
#| label: error
penguins_data
```
Where are the results of our workflow?
::::::::::::::::::::::::::::::::::::: instructor
- To reinforce the concept of `targets` running in a separate R session, you may want to pretend trying to run `penguins_data`, then feigning surprise when it doesn't work and using it as a teaching moment (errors are pedagogy!).
::::::::::::::::::::::::::::::::::::::::::::::::
We don't see the workflow results because `targets` **runs the workflow in a separate R session** that we can't interact with.
This is for reproducibility---the objects built by the workflow should only depend on the code in your project, not any commands you may have interactively given to R.
Fortunately, `targets` has two functions that can be used to load objects built by the workflow into our current session, `tar_load()` and `tar_read()`.
Let's see how these work.
## tar_load()
`tar_load()` loads an object built by the workflow into the current session.
Its first argument is the name of the object you want to load.
Let's use this to load `penguins_data` and get an overview of the data with `summary()`.
```{r}
#| label: targets-run-hide
#| echo: FALSE
# When building the Rmd, each instance of the workflow is isolated, so need
# to re-run
tar_dir({
write_example_plan(1)
tar_make(reporter = "silent")
penguins_csv_file_hide <- tar_read(penguins_csv_file)
penguins_data_hide <- tar_read(penguins_data)
})
```
```{r}
#| label: targets-load-show
#| eval: FALSE
tar_load(penguins_data)
summary(penguins_data)
```
```{r}
#| label: targets-load-hide
#| echo: FALSE
summary(penguins_data_hide)
```
Note that `tar_load()` is used for its **side-effect**---loading the desired object into the current R session.
It doesn't actually return a value.
## tar_read()
`tar_read()` is similar to `tar_load()` in that it is used to retrieve objects built by the workflow, but unlike `tar_load()`, it returns them directly as output.
Let's try it with `penguins_csv_file`.
```{r}
#| label: targets-read-show
#| eval: FALSE
tar_read(penguins_csv_file)
```
```{r}
#| label: targets-read-hide
#| echo: FALSE
penguins_csv_file_hide
```
We immediately see the contents of `penguins_csv_file`.
But it has not been loaded into the environment.
If you try to run `penguins_csv_file` now, you will get an error:
```{r}
#| label: error-2
penguins_csv_file
```
## When to use which function
`tar_load()` tends to be more useful when you want to load objects and do things with them.
`tar_read()` is more useful when you just want to immediately inspect an object.
## The targets cache
If you close your R session, then re-start it and use `tar_load()` or `tar_read()`, you will notice that it can still load the workflow objects.
In other words, the workflow output is **saved across R sessions**.
How is this possible?
You may have noticed a new folder has appeared in your project, called `_targets`.
This is the **targets cache**.
It contains all of the workflow output; that is how we can load the targets built by the workflow even after quitting then restarting R.
**You should not edit the contents of the cache by hand** (with one exception).
Doing so would make your analysis non-reproducible.
The one exception to this rule is a special subfolder called `_targets/user`.
This folder does not exist by default.
You can create it if you want, and put whatever you want inside.
Generally, `_targets/user` is a good place to store files that are not code, like data and output.
Note that if you don't have anything in `_targets/user` that you need to keep around, it is possible to "reset" your workflow by simply deleting the entire `_targets` folder. Of course, this means you will need to run everything over again, so don't do this lightly!
::::::::::::::::::::::::::::::::::::: keypoints
- `targets` workflows are run in a separate, non-interactive R session
- `tar_load()` loads a workflow object into the current R session
- `tar_read()` reads a workflow object and returns its value
- The `_targets` folder is the cache and generally should not be edited by hand
::::::::::::::::::::::::::::::::::::::::::::::::