You should look at your data. Graphs and charts let you explore and learn about the structure of the information you collect. Good data visualizations also make it easier to communicate your ideas and findings to other people. Beyond that, producing effective plots from your own data is the best way to develop a good eye for reading and understanding graphs — good and bad — made by others, whether presented in research articles, business slide decks, public policy advocacy, or media reports. (Kieran Healy: Data Visualization)
To create a powerful graph, a good starting principle is that all of our decisions should be guided by the usage of the graph: a summary concept that captures what we want to show and to whom. Its main elements are purpose, focus, and audience. Once the usage is clear, the first set of decisions is about how to convey information: how to show what we want to show. For these decisions, it is helpful to think of the entire graph as the overlay of three graphical objects:
- Geometric object: the geometric visualization of the information we want to convey, such as a set of bars, a set of points, or a line; multiple geometric objects may be combined.
- Scaffolding: elements that support understanding the geometric object, such as axes, labels, and legends.
- Annotation: anything else added to emphasize specific values or explain more detail.
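The three layers map directly onto `ggplot2` code. This is a minimal sketch, not taken from the course scripts. It assumes the `mpg` data set that ships with `ggplot2` rather than the hotel data from the case study. `geom_point()` draws the geometric object, and `labs()` supplies the scaffolding. `geom_hline()` and `annotate()` add the annotation. The dashed line at the sample mean and its label are illustrative choices only.

```r
library(ggplot2)

# Geometric object: one point per car model (engine size vs. highway mileage)
p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(colour = "steelblue", alpha = 0.6) +
  # Scaffolding: axis titles and a plot title that help the reader decode the points
  labs(
    x = "Engine displacement (litres)",
    y = "Highway mileage (miles per gallon)",
    title = "Engine size and fuel efficiency"
  ) +
  # Annotation: a dashed reference line at the sample mean, plus a text label
  geom_hline(yintercept = mean(mpg$hwy), linetype = "dashed", colour = "grey40") +
  annotate("text", x = 6.5, y = mean(mpg$hwy) + 1.5, label = "sample mean")

print(p)
```

You can change each layer independently. For example, you can swap the geometry for `geom_line()` without touching the labels or the annotation.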
With these principles in mind, this lecture shows students how to create graphs that take them into account. It extends the tools for creating and manipulating plots with `ggplot2`. After the lecture, students should be able to create their own personalized theme and produce report-ready graphs for almost all cases.
Case studies used in or related to this lecture:
- Chapter 03, B: Comparing hotel prices in Europe: Vienna vs London is the basis for this lecture.
- Chapter 04, A: Management quality and firm size: describing patterns of association uses some of the tools.
- Understanding `theme_bg()`, the main theme used in `da_case_studies`.
After completing `ggplot_indepth.R`, students should be able to:
- use pre-written themes from `ggplot2` and `ggthemes`
- write their own theme and call it via the `source()` function
- set different colors for the background, axes, etc.
- set font sizes for different elements
- manipulate axes with `scale_*_continuous` and `scale_*_discrete`, where `*` stands for `y` or `x`:
  - set limits
  - set break points
- add annotations to a plot:
  - lines, dots and text
- create bar charts:
  - simple
  - stacked
  - stacked with percentages, using the `scales` package
- create box plots
- create violin plots
- import `theme_bg()` from a URL via `source_url()` from `devtools`
- (extra task) annotate a grouped box plot using:
  - the `grid` and `pBrackets` packages to place annotations with arrows
  - `color[x]` color values from `theme_bg()`
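Several of the outcomes above fit into a few lines of code. This is a minimal illustration, not the content of `ggplot_indepth.R`. It assumes the built-in `mpg` data. `theme_demo()` is a hypothetical theme built on `theme_minimal()` and only stands in for a personal theme such as `theme_bg()`.

```r
library(ggplot2)

# Hypothetical personal theme (illustrative only, not theme_bg()):
# start from theme_minimal() and override background, grid, axis and font settings
theme_demo <- function(base_size = 12) {
  theme_minimal(base_size = base_size) +
    theme(
      plot.background  = element_rect(fill = "white", colour = NA),  # background colour
      panel.grid.minor = element_blank(),                             # drop minor grid lines
      axis.line        = element_line(colour = "grey30"),             # axis colour
      axis.title       = element_text(size = rel(1.1)),               # size relative to base_size
      plot.title       = element_text(size = rel(1.3), face = "bold")
    )
}

# Axis control: fixed limits and explicit break points on both axes
p_scatter <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  scale_x_continuous(limits = c(1, 7), breaks = 1:7) +
  scale_y_continuous(limits = c(10, 50), breaks = seq(10, 50, by = 10)) +
  labs(x = "Engine displacement (litres)", y = "Highway mileage (mpg)") +
  theme_demo()

# Stacked bar chart with percentages: position = "fill" rescales every bar to a
# total of 1, and scales::label_percent() prints the 0-1 axis as percentages
p_share <- ggplot(mpg, aes(x = class, fill = drv)) +
  geom_bar(position = "fill") +
  scale_y_continuous(labels = scales::label_percent()) +
  labs(x = NULL, y = "Share of car models", fill = "Drive train") +
  theme_demo()

# Box plot and violin plot of the same distribution, one per drive train type
p_box    <- ggplot(mpg, aes(x = drv, y = hwy)) + geom_boxplot() + theme_demo()
p_violin <- ggplot(mpg, aes(x = drv, y = hwy)) + geom_violin() + theme_demo()
```

Axis limits censor any data outside the range. Choose limits that cover all observations, as here, or expect `ggplot2` to drop rows with a warning.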
Ideal overall time: 30-60 minutes. Presenting `ggplot_indepth.R` takes around 30 minutes, and the tasks take approximately 10-15 minutes. `theme_bluewhite.R` takes another 5-15 minutes.
Type: quick practice, approx. 15 mins
- Students need to create their own theme. Encourage them to use it during the course (and in other courses).
- Two files:
  - `homework_ggpplot_runfile.R` is the evaluation file, where students call their theme file and complete some partial code.
  - `theme_RENAMEME.R` is the skeleton for the theme; students need to rename this script. It contains the main task: creating the theme.
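The homework relies on a simple mechanic: the theme lives in its own file, and the run file loads it with `source()`. The sketch below is a minimal illustration under stated assumptions. `theme_student()` is a hypothetical theme name, and the actual skeleton in `theme_RENAMEME.R` may look different. So that the sketch runs on its own, the theme file is first written to a temporary file.

```r
library(ggplot2)

# Hypothetical theme file content. In the homework this would be the renamed
# copy of theme_RENAMEME.R; here it is written to a temporary file so that the
# sketch runs on its own.
theme_file <- tempfile(fileext = ".R")
writeLines(c(
  "theme_student <- function(base_size = 11, base_family = '') {",
  "  theme_bw(base_size = base_size, base_family = base_family) +",
  "    theme(",
  "      panel.background = element_rect(fill = 'aliceblue', colour = NA),",
  "      panel.grid.minor = element_blank(),",
  "      axis.text        = element_text(colour = 'navy', size = rel(0.9)),",
  "      legend.position  = 'bottom'",
  "    )",
  "}"
), theme_file)

# In the run file (such as homework_ggpplot_runfile.R), source() defines
# theme_student() in the current session
source(theme_file)

# The sourced theme is then added to a plot like any built-in theme
p <- ggplot(mpg, aes(x = cty, y = hwy, colour = drv)) +
  geom_point() +
  labs(x = "City mileage (mpg)", y = "Highway mileage (mpg)", colour = "Drive train") +
  theme_student()

print(p)
```

Keeping the theme in a separate file means you can reuse the same look in other scripts and courses with a single `source()` call.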
- Kieran Healy: Data Visualization. Chapter 3: ggplot in general; Chapter 4: lines, histograms, and bar graphs; Chapter 5: labels, coloring, and transforming; Chapter 8: custom colors, themes, and complex and stacked graphs.
- Hadley Wickham and Garrett Grolemund: R for Data Science. Chapter 3 introduces `ggplot` and shows some features of how to visualize data. Chapter 28 discusses some more advanced topics with `ggplot` and how to communicate with a good graph.
- Winston Chang: R Graphics Cookbook is a great book about graphics in R in general.
- Andrew Heiss: Data Visualization with R focuses on visualization with `ggplot` in general.
- The official `ggplot2` website is very well documented and can be handy. Also look at the references there, which point to online courses and YouTube material.
- Some useful materials on creating your own theme: datanovia, ggplot2's description, Peng, Kross and Anderson: Mastering Software Development in R, Chapter 4.6, or towardsdatascience.com.
- `raw_codes` includes code that is ready to use during the course but requires some live coding in class:
  - `ggplot_indepth.R` is the main course material.
  - `theme_bluewhite.R` shows an example of a user-defined theme for `ggplot`.
  - `homework_ggpplot_runfile.R` is the main homework file.
  - `theme_RENAMEME.R` is the skeleton for the theme; students need to rename this script.
- `complete_codes` includes one code file with solutions:
  - `ggplot_indepth.R_fin` is the completed file for `ggplot_indepth.R`.