Scientific visualization

A 2h30 crash course on scientific visualization...
Nicolas P. Rougier, G-Node summer school, University of Reading, 2016.

"Visualisation is a method of computing. It transforms the symbolic into the geometric, enabling researchers to observe their simulations and computations. Visualisation offers a method for seeing the unseen. It enriches the process of scientific discovery and fosters profound and unexpected insights."

Visualisation in Scientific Computing, NSF report, 1987.

Introduction

Scientific visualization is classically defined as the process of graphically displaying scientific data. However, this process is far from direct or automatic. There are many different ways to represent the same data: scatter plots, line plots, bar plots, and pie charts, to name just a few. Furthermore, the same data, shown with the same type of plot, may be perceived very differently depending on who is looking at the figure. A more accurate definition of scientific visualization would be a graphical interface between people and data. But remember, there are two people in the loop: the one who produces the visualization and the one who looks at it. What you intend to show might be quite different from what is actually perceived...

The goal of this crash course is to introduce a few concepts that will (hopefully) help you produce better visualizations. If you want to go further, you'll have to look at the references given at the end of this document.

Visualization pipeline

The visualization pipeline describes the process of creating visual representations of data, from the raw data up to the final rendering. There is no unique definition of such a pipeline, but most of the time you'll find at least three steps (filter, map, render).

Raw data (whatever...)
Filtered data (missing, noise, analytics, statistics, ...)
Mapped data (geometry, attributes, colors, ...)
Rendered data (static image, interactive display, ...)
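
The four stages above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the signal, the missing samples and the color rule are made up, and the "render" step simply writes a PNG image into memory.

```python
import io

import matplotlib
matplotlib.use("Agg")  # off-screen backend, so the script also runs without a display
import matplotlib.pyplot as plt
import numpy as np

# 1. Raw data: a noisy sine wave with a few missing samples (made-up values)
rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 50)
raw = np.sin(t) + 0.1 * rng.normal(size=t.size)
raw[[5, 20, 33]] = np.nan

# 2. Filter: drop the missing samples
valid = ~np.isnan(raw)
t_f, y_f = t[valid], raw[valid]

# 3. Map: data -> geometry (x, y positions) and attributes (color from the sign)
colors = np.where(y_f >= 0, "tab:blue", "tab:red").tolist()

# 4. Render: here a static PNG image kept in memory
fig, ax = plt.subplots(figsize=(4, 3))
ax.scatter(t_f, y_f, c=colors, s=15)
ax.set_xlabel("t")
ax.set_ylabel("signal")
buffer = io.BytesIO()
fig.savefig(buffer, format="png", dpi=100)
plt.close(fig)
```

In practice the filter step is often the longest one (outlier removal, smoothing, statistics), while the map and render steps are largely handled by the plotting library.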

Data type

The nature of the data has a great influence on the kind of visualization you can use. Traditionally, data are split into:

Quantitative (values or observations that can be measured)

Continuous
Discrete

Categorical (values or observations that can be sorted into groups or categories)

Nominal
Ordinal
Interval

but you can also find finer-grained descriptions in the literature.
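
As a small illustration of how the data type constrains the choice of plot, the sketch below draws a continuous quantitative variable as a line and a nominal categorical variable as separate bars. All values and names (temperature, species) are made up for the example.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend, so the script also runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Quantitative, continuous: temperature sampled over a day (made-up values)
hours = np.linspace(0, 24, 49)
temperature = 15 + 5 * np.sin((hours - 9) * np.pi / 12)

# Categorical, nominal: counts per species (made-up values, order is arbitrary)
species = ["cat", "dog", "bird"]
counts = [12, 7, 3]

fig, (ax_line, ax_bar) = plt.subplots(1, 2, figsize=(8, 3))

# A continuous variable has meaningful in-between values: a line is appropriate
ax_line.plot(hours, temperature)
ax_line.set_xlabel("Hour of day")
ax_line.set_ylabel("Temperature (°C)")

# Nominal categories have no in-between values: use separate bars, not a line
ax_bar.bar(species, counts)
ax_bar.set_ylabel("Count")

fig.tight_layout()
fig.canvas.draw()  # render off-screen; use fig.savefig("name.png") to write a file
```

Connecting the three species with a line would suggest a trend between categories that does not exist, which is why the bar chart is the safer default here.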

Graphical elements

In the end, a scientific figure can be fully described by a set of graphic primitives with different attributes:

Points, markers, lines, areas, ...
Position, color, shape, size, orientation, curvature, ...
Helpers, text, axes, ticks, ...
Interaction, animation

But describing a figure in terms of such graphic primitives would be a very tedious and complex task. This is exactly where visualization libraries and software are useful, because they automate most of the work, more (e.g. seaborn) or less (e.g. matplotlib) depending on the library. In the ideal case, you only specify your data and let the library decide almost everything (e.g. vega-lite).
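
To give a feeling for the difference, here is a minimal matplotlib sketch that draws the same four points twice: once by adding each primitive by hand, and once with a single high-level call. The data and the label text are arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend, so the script also runs without a display
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D

x = [0, 1, 2, 3]
y = [0, 1, 4, 9]

fig, (ax_low, ax_high) = plt.subplots(1, 2, figsize=(8, 3))

# Low level: every primitive and attribute is spelled out by hand
ax_low.add_line(Line2D(x, y, color="black", linewidth=1))             # line
ax_low.add_line(Line2D(x, y, linestyle="none", marker="o",
                       markersize=6, color="black"))                  # markers
ax_low.text(0.2, 8, "y = x^2")                                        # text
ax_low.set_xlim(-0.5, 3.5)   # limits set explicitly for hand-added artists
ax_low.set_ylim(-1, 10)

# High level: one call creates the line and markers and picks the limits
ax_high.plot(x, y, "o-", color="black")

fig.canvas.draw()  # render off-screen; use fig.savefig("name.png") to write a file
```

Both panels show the same curve, but the second one needed a single call, which is the kind of automation the paragraph above refers to.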

Visualization type

From the Data visualization catalogue by Severino Ribecca.

Less is more

Perfection is achieved not when there is nothing more to add, but when there is nothing left to take away – Antoine de Saint-Exupery

Ten simple rules

From Ten simple rules for better figures, N.P. Rougier, M. Droettboom, P.E. Bourne, 2014.

Know Your Audience
Identify Your Message
Adapt the Figure to the Support Medium
Captions Are Not Optional
Do Not Trust the Defaults
Use Color Effectively
Do Not Mislead the Reader
Avoid “Chartjunk”
Message Trumps Beauty
Get the Right Tool

See also https://github.com/rougier/ten-rules

Exercises

Exercise 1: Too much ink...

Consider the following figure and, using matplotlib, try to remove as much ink as you can while keeping the most relevant information.

You can start from the following Python script:

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(123)

def gaussian(x, a, x0, sigma):
    return a*np.exp(-(x-x0)**2/(2*sigma**2))

# Clean data
X = np.linspace(15, 21, 100)
Y = gaussian(X, 0.65, 17.6, 1.)

# Noisy data
Xn = np.random.uniform(16, 20, 25)
Yn = gaussian(Xn, 0.65, 17.6, 1.) + 0.01 * np.random.normal(size=len(Xn))
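
As a hint, a few matplotlib calls that are commonly used to remove non-data ink are sketched below on the same data. This is not the author's solution (see exercise-1-sol.py); the colors, sizes and tick positions are assumptions chosen for the example.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend, so the script also runs without a display
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(123)

def gaussian(x, a, x0, sigma):
    return a * np.exp(-(x - x0) ** 2 / (2 * sigma ** 2))

# Same data as in the exercise script
X = np.linspace(15, 21, 100)
Y = gaussian(X, 0.65, 17.6, 1.0)
Xn = np.random.uniform(16, 20, 25)
Yn = gaussian(Xn, 0.65, 17.6, 1.0) + 0.01 * np.random.normal(size=len(Xn))

fig, ax = plt.subplots(figsize=(5, 3))
ax.plot(X, Y, color="0.6", linewidth=1, zorder=1)   # clean curve as a thin grey line
ax.scatter(Xn, Yn, s=12, color="black", zorder=2)   # noisy samples as small dots

# Remove the frame lines that carry no information
for side in ("top", "right"):
    ax.spines[side].set_visible(False)

# Keep only a few ticks: the sampled range, the peak position and the peak height
ax.set_xticks([16, 17.6, 20])
ax.set_yticks([0, 0.65])

fig.canvas.draw()  # render off-screen; use fig.savefig("name.png") to write a file
```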

Exercise 2: Using the right tool

You have a nice image and you would like to show labeled, detailed sub-images alongside the main image (see below). What would be the easiest way to do that? Be careful with the labels: they must remain visible regardless of the image's colors and contrast.
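
One possible way to keep labels readable on any background, if you choose matplotlib for this exercise, is to outline the text with a contrasting stroke. The sketch below uses a synthetic gradient as a stand-in image and arbitrary label positions; it is not necessarily the approach taken in the solution files.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend, so the script also runs without a display
import matplotlib.patheffects as path_effects
import matplotlib.pyplot as plt
import numpy as np

# Stand-in image: a left-to-right gradient from black to white
image = np.tile(np.linspace(0, 1, 256), (64, 1))

fig, ax = plt.subplots(figsize=(6, 2))
ax.imshow(image, cmap="gray", vmin=0, vmax=1)
ax.set_axis_off()

# White text with a black outline: readable on dark and bright regions alike
outline = [path_effects.withStroke(linewidth=3, foreground="black")]
for x, label in [(20, "A"), (128, "B"), (236, "C")]:
    text = ax.text(x, 32, label, color="white", fontsize=20, weight="bold",
                   ha="center", va="center")
    text.set_path_effects(outline)

fig.canvas.draw()  # render off-screen; use fig.savefig("name.png") to write a file
```

Label "A" sits on a black region and label "C" on a white one, yet both stay legible because the fill and the outline contrast with each other rather than with the image.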

Exercise 3: Misleading the reader

What's wrong with this graphic? How would you correct it?

Exercise 4: Editor request

Your article has just been accepted, but the editor requests that figure 2 be at least 300 dpi. What does that mean? What is the minimum size (in pixels) of your figure? Is this relevant if your figure has been saved in a vector format?
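
The arithmetic behind such a request can be sketched as follows: dpi stands for dots per inch, so the minimum pixel size is the printed size in inches multiplied by the resolution. The figure dimensions used here are hypothetical; use the column width required by your journal. For a pure vector figure the question does not arise in the same way, since shapes are rendered at whatever resolution the printer uses (only embedded raster images have a pixel size).

```python
import math

def min_pixel_size(width_in, height_in, dpi=300):
    """Return the minimum (width, height) in pixels for a given print size.

    dpi means dots (pixels) per inch, so pixels = inches * dpi.
    """
    return math.ceil(width_in * dpi), math.ceil(height_in * dpi)

# Hypothetical single-column figure, 3.5 in wide and 2.5 in high
print(min_pixel_size(3.5, 2.5))  # (1050, 750)

# The same kind of computation from centimetres (1 in = 2.54 cm)
width_cm, height_cm = 8.5, 6.0
print(min_pixel_size(width_cm / 2.54, height_cm / 2.54))  # (1004, 709)
```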

Exercise 5: Replication

Look at Drawing a brain with Bokeh and try to replicate the final figure using matplotlib.

or

Pick one of your favorite graphics from the literature and try to replicate it using matplotlib (and fake data).

References

There are many online resources about scientific visualization and a lot of excellent books as well. Since you probably do not have time to read everything, I collected a small set of resources that can be read relatively quickly.

Courses/Tutorials/Guides

Matplotlib tutorial, N.P. Rougier, 2016.
Ten simple rules for better figures, N.P. Rougier, M. Droettboom, P.E. Bourne, 2014.
Scientific Visualization course, Paul Rosen, 2015.
Information Visualization, T. Munzner, 2015.
The Quartz guide to bad data, C. Groskopf, 2015.
Quantitative vs. Categorical Data: A Difference Worth Knowing, S. Few, 2005.
How to make beautiful data visualizations in Python with matplotlib, Randy Olson, 2014.

(Some) Tools

Matplotlib, J. Hunter and M. Droettboom, 2010.
10 Useful Python Data Visualization Libraries for Any Discipline, M. Bierly, 2016.
Datavisualization.ch, 2015.
Data visualization catalogue, S. Ribecca, 2016.
Fred's ImageMagick script, F. Weinhaus, 2016.
TikZ and PGF, Stefan Kottwitz

Books

Visualization Analysis and Design, T. Munzner, 2014.
Trees, maps, and theorems, J.-L. Doumont, 2009.
The Visual Display of Quantitative Information, E.R. Tufte, 1983.

Good examples

A Tour through the Visualization Zoo, J. Heer, M. Bostock, and V. Ogievetsky, 2010.
The most misleading charts of 2015, fixed, K. Collins, 2015.
Data is beautiful / reddit.

Bad examples (don't do that at home)

Junk charts, K. Fung, 2005-2016.
WTF Visualizations, community supported.
How to Display Data Badly, H. Wainer, 1984.

Solutions to the exercises

exercise-1-sol.py / exercise-1-sol.png
(adapted from "Trees, maps, and theorems")
exercise-2-sol.sh or exercise-2-sol.py
exercise-3-sol.py / exercise-3-sol.png
(adapted from "The most misleading charts of 2015, fixed")
exercise-4-sol.md or exercise-4-sol.py
exercise-5-sol.py / exercise-5-sol.png
