forked from hadley/ggplot2-book
-
Notifications
You must be signed in to change notification settings - Fork 0
/
scales.Rmd
603 lines (448 loc) · 32 KB
/
scales.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
```{r scales-chap, include = FALSE}
source("common.R")
columns(1, 2 / 3)
```
# Scales {#scales}
Scales in ggplot2 control the mapping from data to aesthetics. They take your data and turn it into something that you can see, like size, colour, position or shape. They also provide the tools that let you interpret the plot: the axes and legends. You can generate many plots without knowing how scales work, but understanding scales and learning how to manipulate them will give you much more control.
\index{Scales}
Formally, each scale is a function from a region in data space (the domain of the scale) to a region in aesthetic space (the range of the scale). The axis or legend is the inverse function: it allows you to convert visual properties back to data. In ggplot2 these two roles are controlled by the same toolkit (the scale functions), but for expository reasons they are discussed separately in this book. In this chapter I will focus on the mapping from the data to the aesthetics, and in Chapter \@ref(guides) I will return to axes and legends.
Scales can be divided roughly into four families:
* Position scales, used to map integer, numeric, and date/time
data to x and y position (Section \@ref(scale-position)).
* Colour scales, used to map continuous and discrete data to colours (Section \@ref(scale-colour)).
* Manual scales, used to map discrete variables to your choice of size,
line type, shape or colour (Section \@ref(scale-manual)).
* The identity scale, paradoxically used to plot variables _without_
scaling them. This is useful if your data is already a vector of
colour names (Section \@ref(scale-identity)).
Before diving into the details for each scale type, Section \@ref(scale-usage) provides an overview of how scales are specified in ggplot2 and Section \@ref(limits) shows how to adjust the limits of a scale.
## Specifying scales {#scale-usage}
An important property of ggplot2 is the principle that every aesthetic in your plot is associated with exactly one scale. For instance, when you write:
```{r default-scales, fig.show = "hide"}
ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(colour = class))
```
What actually happens is this:
```{r, fig.show = "hide"}
ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(colour = class)) +
scale_x_continuous() +
scale_y_continuous() +
scale_colour_discrete()
```
The original code did not specify any scales, so ggplot2 adds a default scale for each of the aesthetics used in the plot. The choice of default scale depends on both the aesthetic and the variable type. In this example `hwy` is a continuous variable mapped to the y aesthetic so the default scale is `scale_y_continuous()`; similarly `class` is discrete so when mapped to the colour aesthetic the default scale becomes `scale_colour_discrete()`. It would be tedious to manually add a scale every time you used a new aesthetic, so ggplot2 does it for you. But if you want to override the defaults, you'll need to add the scale yourself, like this: \index{Scales!defaults}
```{r, fig.show = "hide"}
ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(colour = class)) +
scale_x_continuous("A really awesome x axis label") +
scale_y_continuous("An amazingly great y axis label")
```
The use of `+` to "add" scales to a plot is a little misleading. Following the principle that every aesthetic has exactly one scale, when two scale specifications conflict the latter takes precedence. In other words, when you `+` a scale, you're not actually adding it to the plot, but overriding the existing scale. This means that the following two specifications are equivalent: \indexc{+}
```{r multiple-scales, fig.show = "hide", message = TRUE}
ggplot(mpg, aes(displ, hwy)) +
geom_point() +
scale_x_continuous("Label 1") +
scale_x_continuous("Label 2")
ggplot(mpg, aes(displ, hwy)) +
geom_point() +
scale_x_continuous("Label 2")
```
Note the message: if you see this in your own code, you need to reorganise your code specification to only add a single scale.
A second thing to note is that there are multiple ways to specify a valid scale for an aesthetic. In the examples above I have tweaked the options of the default scales, but you can also override them completely with new scales:
```{r, fig.show = "hide"}
ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(colour = class)) +
scale_x_sqrt() +
scale_colour_brewer()
```
You’ve probably already figured out the naming scheme for scales, but to be concrete, it’s made up of three pieces separated by "_":
1. `scale`
1. The name of the aesthetic (e.g., `colour`, `shape` or `x`)
1. The name of the scale (e.g., `continuous`, `discrete`, `brewer`).
### Exercises
1. What happens if you pair a discrete variable to a continuous scale?
What happens if you pair a continuous variable to a discrete scale?
1. Simplify the following plot specifications to make them easier to
understand.
```{r, eval = FALSE}
ggplot(mpg, aes(displ)) +
scale_y_continuous("Highway mpg") +
scale_x_continuous() +
geom_point(aes(y = hwy))
ggplot(mpg, aes(y = displ, x = class)) +
scale_y_continuous("Displacement (l)") +
scale_x_discrete("Car type") +
scale_x_discrete("Type of car") +
scale_colour_discrete() +
geom_point(aes(colour = drv)) +
scale_colour_discrete("Drive\ntrain")
```
## Adjusting scale limits {#limits}
Scales in ggplot2 are typically associated with limits that specify the domain over which the scale is defined and are usually derived from the range of the data. \index{Axis!limits} \index{Scales!limits} There are two reasons you might want to specify limits rather than relying on the data:
1. You want to shrink the limits to focus on an interesting area of the plot.
1. You want to expand the limits to make multiple plots match up.
It's most natural to think about the limits of position scales: they map directly to the ranges of the axes. But limits also apply to scales that have legends, like colour, size, and shape, and these limits are particularly important if you want colours to be consistent across multiple plots. To modify the limits of a scale, specify the `limits` parameter of the scale:
* For continuous scales, `limits` should be a numeric vector of length two.
If you only want to set the upper or lower limit, you can set the other value
to `NA`.
* For discrete scales, `limits` is a character vector which enumerates all possible
values.
A minimal example is shown below. In the left panel the limits of the x scale are set to the default values (the range of the data), the middle panel expands the limits, and the right panel shrinks them:
`r columns(3)`
```{r, messages = FALSE}
df <- data.frame(x = 1:3, y = 1:3)
base <- ggplot(df, aes(x, y)) + geom_point()
base
base + scale_x_continuous(limits = c(0, 4))
base + scale_x_continuous(limits = c(1.5, 2.5))
```
(The warning produced in the final example is discussed later in this section)
### Convenience functions
Because modifying scale limits is such a common task, ggplot2 provides some convenience functions to make this easier. For position scales the `xlim()` and `ylim()` helper functions inspect their input and then specify the appropriate scale for the x and y axes respectively. More generally, the `lims()` function takes name-value pairs as input, where the name specifies the aesthetic and the value specifies the limits. The results depend on the type of scale: \indexf{xlim} \indexf{ylim}
* `xlim(10, 20)`: a continuous scale from 10 to 20
* `ylim(20, 10)`: a reversed continuous scale from 20 to 10
* `xlim("a", "b", "c")`: a discrete scale
* `xlim(as.Date(c("2008-05-01", "2008-08-01")))`: a date scale from May 1 to August 1 2008 (date scales are discussed in Section \@ref(date-scales))
Three examples using the `base` plot are shown below:
```{r, messages = FALSE}
base + xlim(0, 4)
base + xlim(4, 0)
base + lims(x = c(0, 4))
```
### Visual range expansion
If you have eagle eyes, you'll have noticed that the visual range of the axes actually extends a little bit past the numerical limits that I have specified in the various examples. This ensures that the data does not overlap the axes, which is usually (but not always) desirable. To eliminate this space, set `expand = c(0, 0)`. One scenario where it is usually preferable to remove this space is when using `geom_raster()`: \index{Axis!expansion}
`r columns(2, 1, 0.66)`
```{r}
ggplot(faithfuld, aes(waiting, eruptions)) +
geom_raster(aes(fill = density)) +
theme(legend.position = "none")
ggplot(faithfuld, aes(waiting, eruptions)) +
geom_raster(aes(fill = density)) +
scale_x_continuous(expand = c(0,0)) +
scale_y_continuous(expand = c(0,0)) +
theme(legend.position = "none")
```
In the plot on the left the extra space is distracting, so the plot on the right would usually be preferable.
### Out of bounds values
Recall that an earlier plot in this section produced a warning about missing values when the scale limits were reduced. By default, ggplot2 converts data outside the limits to `NA`. This means that changing the limits of a scale is not precisely the same as visually zooming in to a region of the plot. If you want to do the latter, you should use the `xlim` and `ylim` arguments to `coord_cartesian()`, described in Section \@ref(cartesian), which performs purely visual zooming and does not affect the underlying data. \index{Zooming} To override the default behaviour you can set the `oob` (out of bounds) argument to the scale, a function that is applied to all observations outside the scale limits. The default `scales::censor()` which replaces any value outside the limits with `NA`. Another option is `scales::squish()` which squishes all values into the range. An example using a fill scale is shown below:
```{r}
df <- data.frame(x = 1:5)
p <- ggplot(df, aes(x, 1)) + geom_tile(aes(fill = x), colour = "white")
p
p + scale_fill_gradient(limits = c(2, 4))
p + scale_fill_gradient(limits = c(2, 4), oob = scales::squish)
```
On the left the default fill colours are shown. In the middle panel when the scale limits are reduced the values for the two tiles on the end are replace with `NA` and are mapped to a grey shade. In most cases this is not the desired behaviour, so the right panel fixes this by modifying the `oob` function appropriately.
### Exercises
1. The following code creates two plots of the mpg dataset. Modify the code
so that the legend and axes match, without using facetting!
`r columns(2, 2/3)`
```{r}
fwd <- subset(mpg, drv == "f")
rwd <- subset(mpg, drv == "r")
ggplot(fwd, aes(displ, hwy, colour = class)) + geom_point()
ggplot(rwd, aes(displ, hwy, colour = class)) + geom_point()
```
1. What does `expand_limits()` do and how does it work? Read the source code.
1. What happens if you add two `xlim()` calls to the same plot? Why?
1. What does `scale_x_continuous(limits = c(NA, NA))` do?
## Position scales {#scale-position}
Every plot in ggplot2 has two position scales corresponding to the x and y aesthetics. Typically the user specifies the variables mapped to x and y explicitly, but sometimes an aesthetic is mapped to a computed variable, as happens with `geom_histogram()`. The following plot specifications are equivalent:
\index{Scales!position} \index{Positioning!scales}
```{r, fig.show='hide', message = FALSE}
ggplot(mpg, aes(x = hwy)) + geom_histogram()
ggplot(mpg, aes(x = hwy, y = stat(count))) + geom_histogram()
```
In the first version the y aesthetic is not directly specified by the user but is instead supplied by the geom (or more precisely, the stat). In the second version this is made explicit, and the y aesthetic is mapped to the variable `count` computed by the stat. In both cases, the x and y aesthetics both exist and are mapped to some variable: accordingly they are both associated with a scale.
Position scales in ggplot2 can be used to represent continuous variables and discrete ones. I discuss continuous position scales first.
### Continuous position scales
The most common continuous position scales are `scale_x_continuous()` and `scale_y_continuous()`. By default, these scales map the data to the x and y axis linearly, but you can override this default using transformations. Every continuous scale takes a `trans` argument, allowing the use of a variety of transformations:
\index{Scales!position} \index{Transformation!scales} \indexf{scale\_x\_continuous}
```{r}
# Convert from fuel economy to fuel consumption
ggplot(mpg, aes(displ, hwy)) +
geom_point() +
scale_y_continuous(trans = "reciprocal")
# Log transform x and y axes
ggplot(diamonds, aes(price, carat)) +
geom_bin2d() +
scale_x_continuous(trans = "log10") +
scale_y_continuous(trans = "log10")
```
The transformation is carried out by a "transformer", which describes the transformation, its inverse, and how to draw the labels. You can construct your own transformer using `scales::trans_new()`, but --- as the plots above illustrate --- ggplot2 understands many common transformations supplied by the scales package. The following table lists the most common variants:
| Name | Function $f(x)$ | Inverse $f^{-1}(y)$
|-----------|-------------------------|------------------------
| asn | $\tanh^{-1}(x)$ | $\tanh(y)$
| exp | $e ^ x$ | $\log(y)$
| identity | $x$ | $y$
| log | $\log(x)$ | $e ^ y$
| log10 | $\log_{10}(x)$ | $10 ^ y$
| log2 | $\log_2(x)$ | $2 ^ y$
| logit | $\log(\frac{x}{1 - x})$ | $\frac{1}{1 + e(y)}$
| pow10 | $10^x$ | $\log_{10}(y)$
| probit | $\Phi(x)$ | $\Phi^{-1}(y)$
| reciprocal| $x^{-1}$ | $y^{-1}$
| reverse | $-x$ | $-y$
| sqrt | $x^{1/2}$ | $y ^ 2$
To simplify matters, ggplot2 provides wrapper functions for the most common transformations: `scale_x_log10()`, `scale_x_sqrt()` and `scale_x_reverse()` provide the relevant transformation on the x axis, with similar functions provided for the y axis. \index{Log!scale} \indexf{scale\_x\_log10}
Note that there is nothing preventing you from performing the transformation manually. For example, instead of using `scale_x_log10()` to transform the scale, you could transform the data instead and plot `log10(x)`. The appearance of the geom will be the same, but the tick labels will be different. Specifically, if you use a transformed scale, the axes will be labelled in the original data space; if you transform the data, the axes will be labelled in the transformed space. Regardless of which method you use, the transformation occurs before any statistical summaries. To transform _after_ statistical computation use `coord_trans()`. See Section \@ref(cartesian) for more details.
### Discrete position scales
The ggplot2 package also allows you to map discrete variables to position scales, with the default scales being `scale_x_discrete()` and `scale_y_discrete()` in this case. For example, the following two plot specifications are equivalent
```{r default-scales-discrete, fig.show = "hide"}
ggplot(mpg, aes(x = hwy, y = class)) +
geom_point()
ggplot(mpg, aes(x = hwy, y = class)) +
geom_point() +
scale_x_continuous() +
scale_y_discrete()
```
Internally, ggplot2 treats handles discrete scales by mapping each category to an integer value and then drawing the geom at the corresponding coordinate location. To illustrate this, we can add a custom annotation (see Section \@ref(custom-annotations)) to the plot:
```{r}
ggplot(mpg, aes(x = hwy, y = class)) +
geom_point() +
annotate("text", x = 5, y = 1:7, label = 1:7)
```
Another variation on discrete position scales are binned scales, in which a continuous variable is sliced into multiple bins and the discretised variable is plotted. For example, if we want to modify the plot above to show the number of observations at each location, we could use `geom_count()` instead of `geom_point()` so that the size of the dots scales with the number of observations. As the left plot below illustrates, this is an improvement but is still rather cluttered. To improve this, the plot on the right uses `scale_x_binned()` to cut the `hwy` values into 10 bins before passing them to the geom:
```{r}
ggplot(mpg, aes(hwy, class)) +
geom_count()
ggplot(mpg, aes(hwy, class)) +
geom_count() +
scale_x_binned(n.breaks = 10)
```
## Colour scales {#scale-colour}
After position, the most commonly used aesthetic is colour. There are quite a few different ways of mapping values to colours in ggplot2: four different gradient-based methods for continuous values, and two methods for mapping discrete values. But before we look at the details of the different methods, it's useful to learn a little bit of colour theory. Colour theory is complex because the underlying biology of the eye and brain is complex, and this introduction will only touch on some of the more important issues. An excellent and more detailed exposition is available online at <http://tinyurl.com/clrdtls>. \index{Colour} \index{Scales!colour}
At the physical level, colour is produced by a mixture of wavelengths of light. To characterise a colour completely, we need to know the complete mixture of wavelengths. Fortunately for us the human eye only has three different colour receptors, and so we can summarise the perception of any colour with just three numbers. You may be familiar with the RGB encoding of colour space, which defines a colour by the intensities of red, green and blue light needed to produce it. One problem with this space is that it is not perceptually uniform: the two colours that are one unit apart may look similar or very different depending on where they are in the colour space. This makes it difficult to create a mapping from a continuous variable to a set of colours. There have been many attempts to come up with colours spaces that are more perceptually uniform. We'll use a modern attempt called the HCL colour space, which has three components of **h**ue, **c**hroma and **l**uminance: \index{Colour!spaces}
* Hue is a number between 0 and 360 (an angle) which gives the "colour" of
the colour: like blue, red, orange, etc.
* Chroma is the purity of a colour. A chroma of 0 is grey, and the maximum
value of chroma varies with luminance.
* Luminance is the lightness of the colour. A luminance of 0 produces black,
and a luminance of 1 produces white.
Hues are not perceived as being ordered: e.g. green does not seem "larger" than red. The perception of chroma and luminance are ordered.
The combination of these three components does not produce a simple geometric shape. Figure \@ref(fig:hcl) attempts to show the 3d shape of the space. Each slice is a constant luminance (brightness) with hue mapped to angle and chroma to radius. You can see the centre of each slice is grey and the colours get more intense as they get closer to the edge.
```{r hcl, echo = FALSE, out.width = "100%", fig.cap="The shape of the HCL colour space. Hue is mapped to angle, chroma to radius and each slice shows a different luminance. The HCL space is a pretty odd shape, but you can see that colours near the centre of each slice are grey, and as you move towards the edges they become more intense. Slices for luminance 0 and 100 are omitted because they would, respectively, be a single black point and a single white point."}
knitr::include_graphics("diagrams/hcl-space.png", dpi = 300)
```
An additional complication is that many people (~10% of men) do not possess the normal complement of colour receptors and so can distinguish fewer colours than usual. \index{Colour!blindness} In brief, it's best to avoid red-green contrasts, and to check your plots with systems that simulate colour blindness. Visicheck is one online solution. Another alternative is the **dichromat** package [@dichromat] which provides tools for simulating colour blindness, and a set of colour schemes known to work well for colour-blind people. You can also help people with colour blindness in the same way that you can help people with black-and-white printers: by providing redundant mappings to other aesthetics like size, line type or shape.
### Continuous colour scales {#ssub:colour-continuous}
Colour gradients are often used to show the height of a 2d surface. In the following example we'll use the surface of a 2d density estimate of the `faithful` dataset [@azzalini:1990], which records the waiting time between eruptions and during each eruption for the Old Faithful geyser in Yellowstone Park. I hide the legends and set `expand` to 0, to focus on the appearance of the data. Remember: I'm illustrating these scales with filled tiles, but you can also use them with coloured lines and points.
\index{Colour!gradients} \index{Scales!colour}
`r columns(3, 1)`
```{r}
erupt <- ggplot(faithfuld, aes(waiting, eruptions, fill = density)) +
geom_raster() +
scale_x_continuous(NULL, expand = c(0, 0)) +
scale_y_continuous(NULL, expand = c(0, 0)) +
theme(legend.position = "none")
```
There are four continuous colour scales:
* `scale_colour_gradient()` and `scale_fill_gradient()`: a two-colour gradient,
low-high (light blue-dark blue). This is the default scale for continuous
colour, and is the same as `scale_colour_continuous()`. Arguments `low` and
`high` control the colours at either end of the gradient.
\indexf{scale\_colour\_gradient} \indexf{scale\_fill\_gradient}
Generally, for continuous colour scales you want to keep hue constant, and
vary chroma and luminance. The munsell colour system is useful for this
as it provides an easy way of specifying colours based on their hue,
chroma and luminance. Use `munsell::hue_slice("5Y")` to see the valid
chroma and luminance values for a given hue.
```{r}
erupt
erupt + scale_fill_gradient(low = "white", high = "black")
erupt + scale_fill_gradient(
low = munsell::mnsl("5G 9/2"),
high = munsell::mnsl("5G 6/8")
)
```
* `scale_colour_gradient2()` and `scale_fill_gradient2()`: a three-colour
gradient, low-med-high (red-white-blue). As well as `low` and `high`
colours, these scales also have a `mid` colour for the colour of the
midpoint. The midpoint defaults to 0, but can be set to any value with
the `midpoint` argument. \indexf{scale\_colour\_gradient2}
\indexf{scale\_fill\_gradient2}
It's artificial to use this colour scale with this dataset, but we can
force it by using the median of the density as the midpoint. Note that the
blues are much more intense than the reds (which you only see as a very pale
pink)
```{r}
mid <- median(faithfuld$density)
erupt + scale_fill_gradient2(midpoint = mid)
```
* `scale_colour_gradientn()` and `scale_fill_gradientn()`: a custom n-colour
gradient. This is useful if you have colours that are meaningful for your
data (e.g., black body colours or standard terrain colours), or you'd like
to use a palette produced by another package. The following code includes
palettes generated from routines in the **colorspace** package.
[@zeileis:2008] describes the philosophy behind these palettes and provides
a good introduction to some of the complexities of creating good colour
scales. \index{Colour!palettes} \indexf{scale\_colour\_gradientn}
\indexf{scale\_fill\_gradientn}
```{r colorspace}
erupt + scale_fill_gradientn(colours = terrain.colors(7))
erupt + scale_fill_gradientn(colours = colorspace::heat_hcl(7))
erupt + scale_fill_gradientn(colours = colorspace::diverge_hcl(7))
```
By default, `colours` will be evenly spaced along the range of the data. To
make them unevenly spaced, use the `values` argument, which should be a
vector of values between 0 and 1.
* `scale_color_distiller()` and `scale_fill_distiller()` apply the ColorBrewer
colour scales to continuous data. You use it the same way as
`scale_fill_brewer()`, described below:
```{r}
erupt + scale_fill_distiller()
erupt + scale_fill_distiller(palette = "RdPu")
erupt + scale_fill_distiller(palette = "YlOrBr")
```
All continuous colour scales have an `na.value` parameter that controls what colour is used for missing values (including values outside the range of the scale limits). By default it is set to grey, which will stand out when you use a colourful scale. If you use a black and white scale, you might want to set it to something else to make it more obvious. \indexc{na.value} \index{Missing values!changing colour}
```{r}
df <- data.frame(x = 1, y = 1:5, z = c(1, 3, 2, NA, 5))
p <- ggplot(df, aes(x, y)) + geom_tile(aes(fill = z), size = 5)
p
# Make missing colours invisible
p + scale_fill_gradient(na.value = NA)
# Customise on a black and white scale
p + scale_fill_gradient(low = "black", high = "white", na.value = "red")
```
### Discrete colour scales {#ssub:colour-discrete}
There are four colour scales for discrete data. We illustrate them with a barchart that encodes both position and fill to the same variable: \index{Colour!discrete scales}
`r columns(3, 1)`
```{r}
df <- data.frame(x = c("a", "b", "c", "d"), y = c(3, 4, 1, 2))
bars <- ggplot(df, aes(x, y, fill = x)) +
geom_bar(stat = "identity") +
labs(x = NULL, y = NULL) +
theme(legend.position = "none")
```
* The default colour scheme, `scale_colour_hue()`, picks evenly spaced hues
around the HCL colour wheel. This works well for up to about eight colours,
but after that it becomes hard to tell the different colours apart. You
can control the default chroma and luminance, and the range of hues, with
the `h`, `c` and `l` arguments:
```{r}
bars
bars + scale_fill_hue(c = 40)
bars + scale_fill_hue(h = c(180, 300))
```
One disadvantage of the default colour scheme is that because the colours
all have the same luminance and chroma, when you print them in black and
white, they all appear as an identical shade of grey.
\indexf{scale\_colour\_hue}
* `scale_colour_brewer()` uses handpicked "ColorBrewer" colours,
<http://colorbrewer2.org/>. These colours have been designed to work well
in a wide variety of situations, although the focus is on maps and so the
colours tend to work better when displayed in large areas. For categorical
data, the palettes most of interest are 'Set1' and 'Dark2' for points and
'Set2', 'Pastel1', 'Pastel2' and 'Accent' for areas. Use
`RColorBrewer::display.brewer.all()` to list all palettes.
\index{Colour!Brewer} \indexf{scale\_colour\_brewer}
```{r}
bars + scale_fill_brewer(palette = "Set1")
bars + scale_fill_brewer(palette = "Set2")
bars + scale_fill_brewer(palette = "Accent")
```
* `scale_colour_grey()` maps discrete data to grays, from light to dark.
\indexf{scale\_colour\_grey} \index{Colour!greys}
```{r}
bars + scale_fill_grey()
bars + scale_fill_grey(start = 0.5, end = 1)
bars + scale_fill_grey(start = 0, end = 0.5)
```
* `scale_colour_manual()` is useful if you have your own discrete colour
palette. The following examples show colour palettes inspired by
Wes Anderson movies, as provided by the wesanderson package,
<https://github.com/karthik/wesanderson>. These
are not designed for perceptual uniformity, but are fun!
\indexf{scale\_colour\_manual} \index{wesanderson}
```{r}
library(wesanderson)
bars + scale_fill_manual(values = wes_palette("GrandBudapest1"))
bars + scale_fill_manual(values = wes_palette("Zissou1"))
bars + scale_fill_manual(values = wes_palette("Rushmore1"))
```
Note that one set of colours is not uniformly good for all purposes: bright colours work well for points, but are overwhelming on bars. Subtle colours work well for bars, but are hard to see on points:
`r columns(3, 1)`
```{r brewer-pal}
# Bright colours work best with points
df <- data.frame(x = 1:3 + runif(30), y = runif(30), z = c("a", "b", "c"))
point <- ggplot(df, aes(x, y)) +
geom_point(aes(colour = z)) +
theme(legend.position = "none") +
labs(x = NULL, y = NULL)
point + scale_colour_brewer(palette = "Set1")
point + scale_colour_brewer(palette = "Set2")
point + scale_colour_brewer(palette = "Pastel1")
```
```{r}
# Subtler colours work better with areas
df <- data.frame(x = 1:3, y = 3:1, z = c("a", "b", "c"))
area <- ggplot(df, aes(x, y)) +
geom_bar(aes(fill = z), stat = "identity") +
theme(legend.position = "none") +
labs(x = NULL, y = NULL)
area + scale_fill_brewer(palette = "Set1")
area + scale_fill_brewer(palette = "Set2")
area + scale_fill_brewer(palette = "Pastel1")
```
## Manual scales {#scale-manual}
The discrete scales --- for example `scale_linetype()`, `scale_shape()`, and `scale_colour_discrete()` --- basically have no options. These scales are just a list of valid values that are mapped to the unique discrete values. \index{Shape} \index{Line type} \index{Size} \indexf{scale\_shape\_manual} \indexf{scale\_colour\_manual} \indexf{scale\_linetype\_manual}
If you want to customise these scales, you need to create your own new scale with the "manual" version of each: `scale_linetype_manual()`, `scale_shape_manual()`, `scale_colour_manual()`, etc. The manual scale has one important argument, `values`, where you specify the values that the scale should produce. If this vector is named, it will match the values of the output to the values of the input; otherwise it will match in order of the levels of the discrete variable. You will need some knowledge of the valid aesthetic values, which are described in `vignette("ggplot2-specs")`.
The following code demonstrates the use of `scale_colour_manual()`:
`r columns(2, 2/3)`
```{r scale-manual}
plot <- ggplot(msleep, aes(brainwt, bodywt)) +
scale_x_log10() +
scale_y_log10()
plot +
geom_point(aes(colour = vore)) +
scale_colour_manual(
values = c("red", "orange", "green", "blue"),
na.value = "grey50"
)
colours <- c(
carni = "red",
insecti = "orange",
herbi = "green",
omni = "blue"
)
plot +
geom_point(aes(colour = vore)) +
scale_colour_manual(values = colours)
```
<!-- DN: Really not sure what to do with this example... -->
<!-- The following example shows a creative use of `scale_colour_manual()` to display multiple variables on the same plot and show a useful legend. In most other plotting systems, you'd colour the lines and then add a legend: \index{Data!longitudinal} -->
<!-- `r columns(1, 5/12, 0.8)` -->
<!-- ```{r huron} -->
<!-- huron <- data.frame(year = 1875:1972, level = as.numeric(LakeHuron)) -->
<!-- ggplot(huron, aes(year)) + -->
<!-- geom_line(aes(y = level + 5), colour = "red") + -->
<!-- geom_line(aes(y = level - 5), colour = "blue") -->
<!-- ``` -->
<!-- That doesn't work in ggplot because there's no way to add a legend manually. Instead, give the lines informative labels: -->
<!-- ```{r huron2} -->
<!-- ggplot(huron, aes(year)) + -->
<!-- geom_line(aes(y = level + 5, colour = "above")) + -->
<!-- geom_line(aes(y = level - 5, colour = "below")) -->
<!-- ``` -->
<!-- And then tell the scale how to map labels to colours: -->
<!-- ```{r huron3} -->
<!-- ggplot(huron, aes(year)) + -->
<!-- geom_line(aes(y = level + 5, colour = "above")) + -->
<!-- geom_line(aes(y = level - 5, colour = "below")) + -->
<!-- scale_colour_manual("Direction", -->
<!-- values = c("above" = "red", "below" = "blue") -->
<!-- ) -->
<!-- ``` -->
## Identity scales {#scale-identity}
The identity scale is used when your data is already scaled, when the data and aesthetic spaces are the same. The code below shows an example where the identity scale is useful. `luv_colours` contains the locations of all R's built-in colours in the LUV colour space (the space that HCL is based on). A legend is unnecessary, because the point colour represents itself: the data and aesthetic spaces are the same. \index{Scales!identity} \indexf{scale\_identity}
`r columns(1, 1, 0.75)`
```{r scale-identity}
head(luv_colours)
ggplot(luv_colours, aes(u, v)) +
geom_point(aes(colour = col), size = 3) +
scale_color_identity() +
coord_equal()
```
### Exercises
1. Compare and contrast the four continuous colour scales with the four discrete scales.
1. Explore the distribution of the built-in `colors()` using the `luv_colours` dataset.