A 2h30 crash course on scientific visualization...
Nicolas P. Rougier,
G-Node summer school,
University of Reading, 2016.
"Visualisation is a method of computing. It transforms the symbolic into the geometric, enabling researchers to observe their simulations and computations. Visualisation offers a method for seeing the unseen. It enriches the process of scientific discovery and fosters profound and unexpected insights."
Visualisation in Scientific Computing, NSF report, 1987.
Scientific visualization is classically defined as the process of graphically displaying scientific data. However, this process is far from direct or automatic. There are many different ways to represent the same data: scatter plots, line plots, bar plots, and pie charts, to name just a few. Furthermore, the same data, using the same type of plot, may be perceived very differently depending on who is looking at the figure. A more accurate definition of scientific visualization would be a graphical interface between people and data. But remember, there are two people in the loop: the one who produces the visualization and the one who looks at it. What you intend to show might be quite different from what is actually perceived...
The goal of this crash course is to introduce a few concepts in order for you to achieve better visualization (hopefully). If you want to go further, you'll have to look at the miscellaneous references given at the end of this document.
The visualization pipeline describes the process of creating visual representations of data, from the raw data up to the final rendering. There is no unique definition of such a pipeline, but most of the time you'll find at least three steps (filter, map, render).
- Raw data (whatever...)
- Filtered data (missing, noise, analytics, statistics, ...)
- Mapped data (geometry, attributes, colors, ...)
- Rendered data (static image, interactive display, ...)
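The four stages listed above can be sketched in a few lines of matplotlib. This is only an illustration: the noisy sine signal, the missing samples, and the 9-point moving average are made-up assumptions, not part of the course material.

```python
import io

import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# Raw data: a noisy signal with a few missing samples (invented for this sketch)
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 200)
raw = np.sin(x) + 0.2 * rng.normal(size=x.size)
raw[[20, 75, 140]] = np.nan

# Filter: drop the missing values, then smooth with a moving average
valid = ~np.isnan(raw)
xf, yf = x[valid], raw[valid]
kernel = np.ones(9) / 9
smooth = np.convolve(yf, kernel, mode="same")

# Map: decide which geometry and which attributes carry which data
fig, ax = plt.subplots(figsize=(6, 3))
ax.scatter(xf, yf, s=5, color="0.7", label="filtered samples")
ax.plot(xf, smooth, color="black", linewidth=1.5, label="moving average")
ax.legend(frameon=False)

# Render: produce a static image (pass a filename instead to write to disk)
buffer = io.BytesIO()
fig.savefig(buffer, format="png", dpi=150)
```

In practice the boundaries between stages are often blurred, for instance when a plotting call computes a histogram (filter) and draws its bars (map) in one go.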
The nature of the data has a great influence on the kind of visualization you can use. Traditionally, data are split into:
Quantitative (values or observations that can be measured)
- Continuous
- Discrete
Categorical (values or observations that can be sorted into groups or categories)
- Nominal
- Ordinal
- Interval
but you can also find more detailed descriptions in the literature.
In the end, a scientific figure can be fully described by a set of graphic primitives with different attributes:
- Points, markers, lines, areas, ...
- Position, color, shape, size, orientation, curvature, ...
- Helpers, text, axes, ticks, ...
- Interaction, animation
But describing a figure in terms of such graphic primitives would be a very tedious and complex task. This is exactly where visualization libraries or software are useful, because they automate most of the work, to a greater (e.g. seaborn) or lesser (e.g. matplotlib) extent depending on the library. In the ideal case, you only want to specify your data and let the library decide almost everything else (e.g. vega-lite).
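To make the difference concrete, here is a small sketch that draws the same curve twice with matplotlib: once by creating the primitives by hand, once with a single high-level call. The sine curve and the red marker are arbitrary choices made for this example.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D
from matplotlib.patches import Circle

x = np.linspace(0, 2 * np.pi, 50)
y = np.sin(x)

fig, (ax_low, ax_high) = plt.subplots(1, 2, figsize=(8, 3))

# Low level: create each primitive and set its attributes by hand
ax_low.add_line(Line2D(x, y, color="black", linewidth=1.0))
ax_low.add_patch(Circle((np.pi / 2, 1.0), radius=0.15,
                        facecolor="red", edgecolor="none"))
# Set the limits explicitly instead of relying on autoscaling
ax_low.set_xlim(0, 2 * np.pi)
ax_low.set_ylim(-1.2, 1.2)

# Higher level: one call, the library chooses limits, ticks and styling
ax_high.plot(x, y)
```

Even the "low level" half still leans on the library for axes, ticks and rendering, which gives an idea of how much a fully manual description would involve.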
From the Data visualization catalogue by Severino Ribecca.
Perfection is achieved not when there is nothing more to add, but when there is nothing left to take away – Antoine de Saint-Exupery
From Ten simple rules for better figures, N.P. Rougier, M. Droettboom, P.E. Bourne, 2014.
- Know Your Audience
- Identify Your Message
- Adapt the Figure to the Support Medium
- Captions Are Not Optional
- Do Not Trust the Defaults
- Use Color Effectively
- Do Not Mislead the Reader
- Avoid “Chartjunk”
- Message Trumps Beauty
- Get the Right Tool
See also https://github.com/rougier/ten-rules
Consider the following figure and, using matplotlib, try to remove as much ink as you can while keeping the most relevant information.
You can start from the following Python script:
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(123)
def gaussian(x, a, x0, sigma):
    return a*np.exp(-(x-x0)**2/(2*sigma**2))
# Clean data
X = np.linspace(15, 21, 100)
Y = gaussian(X, 0.65, 17.6, 1.)
# Noisy data
Xn = np.random.uniform(16, 20, 25)
Yn = gaussian(Xn, 0.65, 17.6, 1.) + 0.01 * np.random.normal(size=len(Xn))
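As a hint rather than a solution, the sketch below plots the data from the script above and removes two obvious sources of ink: the top and right parts of the frame, and most of the ticks. The colors, marker size, and tick positions are assumptions chosen for this example; the original figure may call for different choices.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

np.random.seed(123)

def gaussian(x, a, x0, sigma):
    return a * np.exp(-(x - x0)**2 / (2 * sigma**2))

# Same clean and noisy data as in the starting script
X = np.linspace(15, 21, 100)
Y = gaussian(X, 0.65, 17.6, 1.)
Xn = np.random.uniform(16, 20, 25)
Yn = gaussian(Xn, 0.65, 17.6, 1.) + 0.01 * np.random.normal(size=len(Xn))

fig, ax = plt.subplots(figsize=(6, 3))
ax.plot(X, Y, color="0.5", linewidth=1)
ax.scatter(Xn, Yn, s=10, color="black", zorder=10)

# Hide the top and right parts of the frame
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)

# Keep only a few ticks (positions picked by hand for this sketch)
ax.set_xticks([16, 17.6, 20])
ax.set_yticks([0, 0.65])
```

From there, you can keep asking whether each remaining element (frame, ticks, labels) still carries information.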
You have a nice image and you would like to show labeled, detailed sub-images alongside the main image (see below). What would be the easiest way to do that? Be careful with the labels: they must remain visible regardless of the image's color and contrast.
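For the label visibility part, one possible approach (not necessarily the one used in the provided solution) is to outline each label with a contrasting stroke using `matplotlib.patheffects`. The gradient image and the label positions below are stand-ins invented for this sketch.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import matplotlib.patheffects as path_effects

# Stand-in image: a gradient going from black to white
image = np.linspace(0, 1, 256 * 256).reshape(256, 256)

fig, ax = plt.subplots(figsize=(4, 4))
ax.imshow(image, cmap="gray", vmin=0, vmax=1)
ax.set_xticks([])
ax.set_yticks([])

# White text with a black outline stays readable on dark and light areas
outline = [path_effects.withStroke(linewidth=3, foreground="black")]
labels = {"A": (30, 30), "B": (220, 220)}  # hypothetical label positions
texts = []
for name, (col, row) in labels.items():
    text = ax.text(col, row, name, color="white", fontsize=16, weight="bold",
                   ha="center", va="center", path_effects=outline)
    texts.append(text)
```

Label "A" sits on the dark end of the gradient and "B" on the light end, so the same style can be checked against both extremes.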
What's wrong with this graphic? How would you correct it?
Your article has just been accepted, but the editor requests that figure 2 be at least 300 dpi. What does that mean? What is the minimum size (in pixels) of your figure? Is it relevant if your figure has been saved in vector format?
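The arithmetic behind the question is simply pixels = inches × dpi. The helper below is a minimal sketch; the function name `min_pixels` and the 3.5 in × 2.5 in printed size are hypothetical, so substitute the size your figure will actually have in the journal.

```python
import math

def min_pixels(width_in, height_in, dpi=300):
    """Smallest pixel dimensions that reach `dpi` at the given printed size."""
    return math.ceil(width_in * dpi), math.ceil(height_in * dpi)

# Hypothetical printed size: 3.5 in wide, 2.5 in high
print(min_pixels(3.5, 2.5))  # (1050, 750)
```

With matplotlib, the same relation links the `figsize` (in inches) of a figure and the `dpi` argument of `savefig` to the pixel size of the saved raster image.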
Look at Drawing a brain with Bokeh and try to replicate the final figure using matplotlib.
or
Pick one of your favorite graphics from the literature and try to replicate it using matplotlib (and fake data).
There are many online resources about scientific visualization and a lot of excellent books as well. Since you probably do not have time to read everything, I have collected a small set of resources that can be read relatively quickly.
Courses/Tutorials/Guides
- Matplotlib tutorial, N.P. Rougier, 2016.
- Ten simple rules for better figures, N.P. Rougier, M. Droettboom, P.E. Bourne, 2014.
- Scientific Visualization course, Paul Rosen, 2015.
- Information Visualization, T. Munzner, 2015.
- The Quartz guide to bad data, C. Groskopf, 2015.
- Quantitative vs. Categorical Data: A Difference Worth Knowing, S. Few, 2005.
- How to make beautiful data visualizations in Python with matplotlib, Randy Olson, 2014.
(Some) Tools
- Matplotlib, J. Hunter and M. Droettboom, 2010.
- 10 Useful Python Data Visualization Libraries for Any Discipline, M. Bierly, 2016.
- Datavisualization.ch, 2015.
- Data visualization catalogue, S. Ribecca, 2016.
- Fred's ImageMagick script, F. Weinhaus, 2016.
- TikZ and PGF, Stefan Kottwitz.
Books
- Visualization Analysis and Design, T. Munzner, 2014.
- Trees, maps, and theorems, J.-L. Doumont, 2009.
- The Visual Display of Quantitative Information, E.R. Tufte, 1983.
Good examples
- A Tour through the Visualization Zoo, J. Heer, M. Bostock, and V. Ogievetsky, 2010.
- The most misleading charts of 2015, fixed, K. Collins, 2015.
- Data is beautiful / reddit.
Bad examples (don't do that at home)
- Junk charts, K. Fung, 2005-2016.
- WTF Visualizations, community supported.
- How to Display Data Badly, H. Wainer, 1984.
- exercise-1-sol.py / exercise-1-sol.png (adapted from "Trees, maps, and theorems")
- exercise-2-sol.sh or exercise-2-sol.py
- exercise-3-sol.py / exercise-3-sol.png (adapted from "The most misleading charts of 2015, fixed")
- exercise-4-sol.md or exercise-4-sol.py
- exercise-5-sol.py / exercise-5-sol.png