forked from hadley/ggplot2-book
-
Notifications
You must be signed in to change notification settings - Fork 0
/
guides.Rmd
455 lines (330 loc) · 20.4 KB
/
guides.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
```{r guides-chap, include = FALSE}
source("common.R")
columns(1, 2 / 3)
```
# Guides: legends and axes {#guides}
In Chapter \@ref(scales) I discussed the way a scale maps a variable to an aesthetic. This chapter is a natural extension of the last and discusses the role scales play in controlling the __guide__ (the axis or legend associated with the scale). Guides allow you to read observations from the plot and map them back to their original values. In ggplot2, guides are produced automatically based on the layers in your plot. This is very different to base R graphics, where you are responsible for drawing the legends by hand. In ggplot2, you don't directly control the legend; instead you set up the data so that there's a clear mapping between data and aesthetics, and a legend is generated for you automatically. This can be frustrating when you first start using ggplot2, but once you get the hang of it, you'll find that it saves you time, and there is little you cannot do. If you're struggling to get the legend you want, it's likely that your data is in the wrong form.
You might find it surprising that axes and legends are the same type of thing, but while they look very different there are many natural correspondences between the two, as shown in table below and in Figure \@ref(fig:guides). \index{Guides} \index{Legend} \index{Axis}
```{r guides, echo = FALSE, out.width = "100%", fig.cap="Axis and legend components."}
knitr::include_graphics("diagrams/scale-guides.png", dpi = 300, auto_pdf = TRUE)
```
| Axis | Legend | Argument name
|-------------------|---------------|-----------------
| Label | Title | `name`
| Ticks & grid line | Key | `breaks`
| Tick label | Key label | `labels`
The early sections of this chapter highlight functionality that is shared by axes and legends. Section \@ref(scale-name) discusses the `name` argument, while Section \@ref(breaks-labels) covers the `breaks` and `labels` arguments in more detail. However, legends are more complicated than axes because:
1. A legend can display multiple aesthetics (e.g. colour and shape), from
multiple layers, and the symbol displayed in a legend varies based on the
geom used in the layer.
1. Axes always appear in the same place. Legends can appear in different
places, so you need some global way of controlling them.
1. Legends have considerably more details that can be tweaked: should they
be displayed vertically or horizontally? How many columns? How big should
the keys be?
As a consequence there are some extra options that only apply to legends, and the later sections in the chapter focus on this legend-specific behaviour.
## Scale name {#scale-name}
The first argument to the scale function, `name`, is the axes/legend title. You can supply text strings (using `\n` for line breaks) or mathematical expressions in `quote()` (as described in `?plotmath`): \index{Axis!title} \index{Legend!title}
`r columns(2, 1 / 2)`
```{r guide-names}
df <- data.frame(x = 1:2, y = 1, z = "a")
p <- ggplot(df, aes(x, y)) + geom_point()
p + scale_x_continuous("X axis")
p + scale_x_continuous(quote(a + mathematical ^ expression))
```
Because tweaking these labels is such a common task, there are three
helpers that save you some typing: `xlab()`, `ylab()` and `labs()`:
```{r guide-names-helper}
p <- ggplot(df, aes(x, y)) + geom_point(aes(colour = z))
p +
xlab("X axis") +
ylab("Y axis")
p + labs(x = "X axis", y = "Y axis", colour = "Colour\nlegend")
```
There are two ways to remove the axis label. Setting it to `""` omits the label, but still allocates space; `NULL` removes the label and its space. Look closely at the left and bottom borders of the following two plots. I've drawn a grey rectangle around the plot to make it easier to see the difference.
```{r guide-names-remove}
p <- ggplot(df, aes(x, y)) +
geom_point() +
theme(plot.background = element_rect(colour = "grey50"))
p + labs(x = "", y = "")
p + labs(x = NULL, y = NULL)
```
## Scale breaks and labels {#breaks-labels}
The `breaks` argument controls which values appear as tick marks on axes and keys on legends. Each break has an associated label, controlled by the `labels` argument. If you set `labels`, you must also set `breaks`; otherwise, if data changes, the breaks will no longer align with the labels. \index{Axis!ticks} \index{Axis!breaks} \index{Axis!labels} \index{Legend!keys}
The following code shows some basic examples for both axes and legends.
`r columns(3, 2/3)`
```{r breaks-labels}
df <- data.frame(x = c(1, 3, 5) * 1000, y = 1)
axs <- ggplot(df, aes(x, y)) +
geom_point() +
labs(x = NULL, y = NULL)
axs
axs + scale_x_continuous(breaks = c(2000, 4000))
axs + scale_x_continuous(breaks = c(2000, 4000), labels = c("2k", "4k"))
```
```{r}
leg <- ggplot(df, aes(y, x, fill = x)) +
geom_tile() +
labs(x = NULL, y = NULL)
leg
leg + scale_fill_continuous(breaks = c(2000, 4000))
leg + scale_fill_continuous(breaks = c(2000, 4000), labels = c("2k", "4k"))
```
If you want to relabel the breaks in a categorical scale, you can use a named labels vector:
`r columns(2, 2/3)`
```{r}
df2 <- data.frame(x = 1:3, y = c("a", "b", "c"))
ggplot(df2, aes(x, y)) +
geom_point()
ggplot(df2, aes(x, y)) +
geom_point() +
scale_y_discrete(labels = c(a = "apple", b = "banana", c = "carrot"))
```
To suppress breaks (and for axes, grid lines) or labels, set them to `NULL`:
```{r axs-breaks-hide}
axs + scale_x_continuous(breaks = NULL)
axs + scale_x_continuous(labels = NULL)
```
```{r leg-breaks-hide}
leg + scale_fill_continuous(breaks = NULL)
leg + scale_fill_continuous(labels = NULL)
```
### Break and label functions
Additionally, you can supply a function to `breaks` or `labels`. The `breaks` function should have one argument, the limits (a numeric vector of length two), and should return a numeric vector of breaks. The `labels` function should accept a numeric vector of breaks and return a character vector of labels (the same length as the input). The scales package provides a number of useful labelling functions:
* `scales::comma_format()` adds commas to make it easier to read large numbers.
* `scales::unit_format(unit, scale)` adds a unit suffix, optionally scaling.
* `scales::dollar_format(prefix, suffix)` displays currency values, rounding
to two decimal places and adding a prefix or suffix.
* `scales::wrap_format()` wraps long labels into multiple lines.
See the documentation of the scales package for more details.
`r columns(3)`
```{r breaks-functions}
axs + scale_y_continuous(labels = scales::percent_format())
axs + scale_y_continuous(labels = scales::dollar_format(prefix = "$"))
leg + scale_fill_continuous(labels = scales::unit_format(unit = "k", scale = 1e-3))
```
You can adjust the minor breaks (the faint grid lines that appear between the major grid lines) by supplying a numeric vector of positions to the `minor_breaks` argument. This is particularly useful for log scales: \index{Minor breaks}
`r columns(2, 2/3)`
```{r}
df <- data.frame(x = c(2, 3, 5, 10, 200, 3000), y = 1)
ggplot(df, aes(x, y)) +
geom_point() +
scale_x_log10()
mb <- as.numeric(1:10 %o% 10 ^ (0:4))
ggplot(df, aes(x, y)) +
geom_point() +
scale_x_log10(minor_breaks = mb)
```
Note the use of `%o%` to quickly generate the multiplication table, and that the minor breaks must be supplied on the transformed scale. \index{Log!ticks}
### Exercises
1. Recreate the following graphic:
```{r, echo = FALSE}
ggplot(mpg, aes(displ, hwy)) +
geom_point() +
scale_x_continuous("Displacement", labels = scales::unit_format(suffix = "L")) +
scale_y_continuous(quote(paste("Highway ", (frac(miles, gallon)))))
```
Adjust the y axis label so that the parentheses are the right size.
1. List the three different types of object you can supply to the
`breaks` argument. How do `breaks` and `labels` differ?
1. Recreate the following plot:
```{r, echo = FALSE}
drv_labels <- c("4" = "4wd", "f" = "fwd", "r" = "rwd")
ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(colour = drv)) +
scale_colour_discrete(labels = drv_labels)
```
1. What label function allows you to create mathematical expressions?
What label function converts 1 to 1st, 2 to 2nd, and so on?
1. What are the three most important arguments that apply to both
axes and legends? What do they do? Compare and contrast their
operation for axes vs. legends.
### Dates: A special case {#date-scales}
An important special case arises when an aesthetic is mapped to date or date/time data, which are treated as continuous variables with special labels. The ggplot2 package supports the `Date` (for dates) and `POSIXct` (for date/times) classes, as well as the `hms` class provided by the hms package [@hms]. If your dates are in a different format you will need to convert them using `as.Date()`, `as.POSIXct()` or `hms::as_hms()`. For appropriately formatted data mapped to the x aesthetic, ggplot2 will default to `scale_x_date()` and `scale_x_datetime()`, and there are similar scales for other aesthetics. In most respects date scales behave the same way as other continuous scales, but they have special behaviour for dates that is worth noting. Specifically, in addition to the usual `breaks` and `labels` arguments, date scales have special `date_breaks` and `date_labels` arguments that allow you to work in date-friendly units:
\index{Date/times} \index{Data!date/time} \index{Time} \index{Scales!date/time} \indexf{scale\_x\_datetime}
* `date_breaks` and `date_minor_breaks()` allows you to position breaks by
date units (years, months, weeks, days, hours, minutes, and seconds).
For example, `date_breaks = "2 weeks"` will place a major
tick mark every two weeks.
* `date_labels` controls the display of the labels
using the same formatting strings as in `strptime()` and `format()`:
| String | Meaning
|---------------|-----------------------------------------
| `%S` | second (00-59)
| `%M` | minute (00-59)
| `%l` | hour, in 12-hour clock (1-12)
| `%I` | hour, in 12-hour clock (01-12)
| `%p` | am/pm
| `%H` | hour, in 24-hour clock (00-23)
| `%a` | day of week, abbreviated (Mon-Sun)
| `%A` | day of week, full (Monday-Sunday)
| `%e` | day of month (1-31)
| `%d` | day of month (01-31)
| `%m` | month, numeric (01-12)
| `%b` | month, abbreviated (Jan-Dec)
| `%B` | month, full (January-December)
| `%y` | year, without century (00-99)
| `%Y` | year, with century (0000-9999)
For example, if you wanted to display dates like 14/10/1979, you would use
the string `"%d/%m/%Y"`.
The code below illustrates some of these parameters.
`r columns(2, 1 / 2)`
```{r date-scale}
base <- ggplot(economics, aes(date, psavert)) +
geom_line(na.rm = TRUE) +
labs(x = NULL, y = NULL)
base # Default breaks and labels
base + scale_x_date(date_labels = "%y", date_breaks = "5 years")
```
```{r date-scale-2}
base + scale_x_date(
limits = as.Date(c("2004-01-01", "2005-01-01")),
date_labels = "%b %y",
date_minor_breaks = "1 month"
)
base + scale_x_date(
limits = as.Date(c("2004-01-01", "2004-06-01")),
date_labels = "%m/%d",
date_minor_breaks = "2 weeks"
)
```
## Legends for multiple layers {#sub-layers-legends}
\index{Legend}
The previous sections describe features of ggplot2 that are the same for both axes and legend, but because legends are more complex than axes, there are additional topics to discuss that pertain only to legends. First among these is the fact that a legend may need to draw symbols from multiple layers. For example, if you've mapped colour to both points and lines, the keys will show both points and lines. If you've mapped fill colour, you get a rectangle. Note the way the legend varies in the plots below:
`r columns(3)`
```{r legend-geom, echo = FALSE}
df <- data.frame(x = 1, y = 1:3, z = letters[1:3])
p <- ggplot(df, aes(x, y, colour = z))
p + geom_point()
p + geom_point() + geom_path(aes(group = 1))
p + geom_raster(aes(fill = z))
```
By default, a layer will only appear if the corresponding aesthetic is mapped to a variable with `aes()`. You can override whether or not a layer appears in the legend with `show.legend`: `FALSE` to prevent a layer from ever appearing in the legend; `TRUE` forces it to appear when it otherwise wouldn't. Using `TRUE` can be useful in conjunction with the following trick to make points stand out:
`r columns(2, 2/3)`
```{r}
ggplot(df, aes(y, y)) +
geom_point(size = 4, colour = "grey20") +
geom_point(aes(colour = z), size = 2)
ggplot(df, aes(y, y)) +
geom_point(size = 4, colour = "grey20", show.legend = TRUE) +
geom_point(aes(colour = z), size = 2)
```
Sometimes you want the geoms in the legend to display differently to the geoms in the plot. This is particularly useful when you've used transparency or size to deal with moderate overplotting and also used colour in the plot. You can do this using the `override.aes` parameter of `guide_legend()`, which you'll learn more about shortly. \indexf{override.aes}
```{r}
norm <- data.frame(x = rnorm(1000), y = rnorm(1000))
norm$z <- cut(norm$x, 3, labels = c("a", "b", "c"))
ggplot(norm, aes(x, y)) +
geom_point(aes(colour = z), alpha = 0.1)
ggplot(norm, aes(x, y)) +
geom_point(aes(colour = z), alpha = 0.1) +
guides(colour = guide_legend(override.aes = list(alpha = 1)))
```
ggplot2 tries to use the fewest number of legends to accurately convey the aesthetics used in the plot. It does this by combining legends where the same variable is mapped to different aesthetics. The figure below shows how this works for points: if both colour and shape are mapped to the same variable, then only a single legend is necessary. \index{Legend!merging}
`r columns(3)`
```{r legend-merge}
ggplot(df, aes(x, y)) + geom_point(aes(colour = z))
ggplot(df, aes(x, y)) + geom_point(aes(shape = z))
ggplot(df, aes(x, y)) + geom_point(aes(shape = z, colour = z))
```
In order for legends to be merged, they must have the same `name`. So if you change the name of one of the scales, you'll need to change it for all of them.
## Legend layout {#legend-layout}
A number of settings that affect the overall display of the legends are controlled through the theme system. You'll learn more about that in Section \@ref(themes), but for now, all you need to know is that you modify theme settings with the `theme()` function. \index{Themes!legend}
The position and justification of legends are controlled by the theme setting `legend.position`, which takes values "right", "left", "top", "bottom", or "none" (no legend). \index{Legend!layout}
`r columns(3, 2/3)`
```{r legend-position}
df <- data.frame(x = 1:3, y = 1:3, z = c("a", "b", "c"))
base <- ggplot(df, aes(x, y)) +
geom_point(aes(colour = z), size = 3) +
xlab(NULL) +
ylab(NULL)
base + theme(legend.position = "right") # the default
base + theme(legend.position = "bottom")
base + theme(legend.position = "none")
```
Switching between left/right and top/bottom modifies how the keys in each legend are laid out (horizontal or vertically), and how multiple legends are stacked (horizontal or vertically). If needed, you can adjust those options independently:
* `legend.direction`: layout of items in legends ("horizontal" or "vertical").
* `legend.box`: arrangement of multiple legends ("horizontal" or "vertical").
* `legend.box.just`: justification of each legend within the overall bounding
box, when there are multiple legends ("top", "bottom", "left", or "right").
Alternatively, if there's a lot of blank space in your plot you might want to place the legend inside the plot. You can do this by setting `legend.position` to a numeric vector of length two. The numbers represent a relative location in the panel area: `c(0, 1)` is the top-left corner and `c(1, 0)` is the bottom-right corner. You control which corner of the legend the `legend.position` refers to with `legend.justification`, which is specified in a similar way. Unfortunately positioning the legend exactly where you want it requires a lot of trial and error.
`r columns(3, 1)`
```{r legend-position-man}
base <- ggplot(df, aes(x, y)) +
geom_point(aes(colour = z), size = 3)
base + theme(legend.position = c(0, 1), legend.justification = c(0, 1))
base + theme(legend.position = c(0.5, 0.5), legend.justification = c(0.5, 0.5))
base + theme(legend.position = c(1, 0), legend.justification = c(1, 0))
```
There's also a margin around the legends, which you can suppress with `legend.margin = unit(0, "mm")`.
## Guide functions
The guide functions, `guide_colourbar()` and `guide_legend()`, offer additional control over the fine details of the legend. Legend guides can be used for any aesthetic (discrete or continuous) while the colour bar guide can only be used with continuous colour scales.
You can override the default guide using the `guide` argument of the corresponding scale function, or more conveniently, the `guides()` helper function. `guides()` works like `labs()`: you can override the default guide associated with each aesthetic.
```{r}
df <- data.frame(x = 1, y = 1:3, z = 1:3)
base <- ggplot(df, aes(x, y)) + geom_raster(aes(fill = z))
base
base + scale_fill_continuous(guide = guide_legend())
base + guides(fill = guide_legend())
```
Both functions have numerous examples in their documentation help pages that illustrate all of their arguments. Most of the arguments to the guide function control the fine level details of the text colour, size, font etc. You'll learn about those in the themes chapter. Here I'll focus on the most important arguments.
### `guide_legend()`
The legend guide displays individual keys in a table. The most useful options are: \index{Legend!guide}
* `nrow` or `ncol` which specify the dimensions of the table. `byrow`
controls how the table is filled: `FALSE` fills it by column (the default),
`TRUE` fills it by row.
`r columns(3)`
```{r legend-rows-cols}
df <- data.frame(x = 1, y = 1:4, z = letters[1:4])
# Base plot
p <- ggplot(df, aes(x, y)) + geom_raster(aes(fill = z))
p
p + guides(fill = guide_legend(ncol = 2))
p + guides(fill = guide_legend(ncol = 2, byrow = TRUE))
```
* `reverse` reverses the order of the keys. This is particularly useful when
you have stacked bars because the default stacking and legend orders are
different:
```{r}
p <- ggplot(df, aes(1, y)) + geom_bar(stat = "identity", aes(fill = z))
p
p + guides(fill = guide_legend(reverse = TRUE))
```
* `override.aes`: override some of the aesthetic settings derived from each
layer. This is useful if you want to make the elements in the legend
more visually prominent. See discussion in
Section \@ref(sub-layers-legends).
* `keywidth` and `keyheight` (along with `default.unit`) allow you to specify
the size of the keys. These are grid units, e.g. `unit(1, "cm")`.
### `guide_colourbar`
The colour bar guide is designed for continuous ranges of colors---as its name implies, it outputs a rectangle over which the color gradient varies. The most important arguments are: \index{Legend!colour bar} \index{Colour bar}
* `barwidth` and `barheight` (along with `default.unit`) allow you to specify
the size of the bar. These are grid units, e.g. `unit(1, "cm")`.
* `nbin` controls the number of slices. You may want to increase this from
the default value of 20 if you draw a very long bar.
* `reverse` flips the colour bar to put the lowest values at the top.
These options are illustrated below:
```{r}
df <- data.frame(x = 1, y = 1:4, z = 4:1)
p <- ggplot(df, aes(x, y)) + geom_tile(aes(fill = z))
p
p + guides(fill = guide_colorbar(reverse = TRUE))
p + guides(fill = guide_colorbar(barheight = unit(4, "cm")))
```
### Exercises
1. How do you make legends appear to the left of the plot?
1. What's gone wrong with this plot? How could you fix it?
`r columns(1, 2 / 3)`
```{r}
ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(colour = drv, shape = drv)) +
scale_colour_discrete("Drive train")
```
1. Can you recreate the code for this plot?
`r columns(1, 2 / 3)`
```{r, echo = FALSE}
ggplot(mpg, aes(displ, hwy, colour = class)) +
geom_point(show.legend = FALSE) +
geom_smooth(method = "lm", se = FALSE) +
theme(legend.position = "bottom") +
guides(colour = guide_legend(nrow = 1))
```