Material for the Advanced Scientific Python Programming course, University of Reading, 2016

Scientific visualization

A 2h30 crash course on scientific visualization...
Nicolas P. Rougier, G-Node summer school, University of Reading, 2016.

"Visualisation is a method of computing. It transforms the symbolic into the geometric, enabling researchers to observe their simulations and computations. Visualisation offers a method for seeing the unseen. It enriches the process of scientific discovery and fosters profound and unexpected insights."

Visualisation in Scientific Computing, NSF report, 1987.

Introduction

Scientific visualization is classically defined as the process of graphically displaying scientific data. However, this process is far from direct or automatic. There are many different ways to represent the same data: scatter plots, line plots, bar plots, and pie charts, to name just a few. Furthermore, the same data, shown with the same type of plot, may be perceived very differently depending on who is looking at the figure. A more accurate definition of scientific visualization would be a graphical interface between people and data. But remember, there are two people in the loop: the one who produces the visualization and the one who looks at it. What you intend to show might be quite different from what is actually perceived...

The goal of this crash course is to introduce a few concepts in order for you to achieve better visualization (hopefully). If you want to go further, you'll have to look at the miscellaneous references given at the end of this document.

Visualization pipeline

The visualization pipeline describes the process of creating visual representations of data, from the raw data up to the final rendering. There is no unique definition of such a pipeline, but most of the time you'll find at least three steps (filter, map, render) that take the data through the following stages:

  1. Raw data (whatever...)
  2. Filtered data (missing, noise, analytics, statistics, ...)
  3. Mapped data (geometry, attributes, colors, ...)
  4. Rendered data (static image, interactive display, ...)
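
The four stages above can be sketched in a few lines of numpy and matplotlib. This is only an illustration: the sample values, the choice of a scatter plot, and the color scaling are assumptions made for the example, not part of the course material.

```python
import io

import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt

# 1. Raw data: hypothetical measurements, two of them missing (NaN)
raw = np.array([1.0, 2.5, np.nan, 4.0, 3.2, np.nan, 5.1])

# 2. Filter: drop missing values, keeping the index of each surviving sample
index = np.arange(len(raw))
valid = ~np.isnan(raw)
x, y = index[valid], raw[valid]

# 3. Map: values become positions (x, y) and a color attribute scaled to [0, 1]
colors = (y - y.min()) / (y.max() - y.min())

# 4. Render: produce a static image (written to memory here, not to disk)
fig, ax = plt.subplots(figsize=(4, 3))
ax.scatter(x, y, c=colors, cmap="viridis")
buffer = io.BytesIO()
fig.savefig(buffer, format="png", dpi=100)
```

Keeping the steps separate makes it easy to change one of them (for example, a different mapping from values to colors) without touching the others.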

Data type

The nature of the data has a great influence on the kind of visualization you can use. Traditionally, they are split as:

Quantitative (values or observations that can be measured)

  • Continuous
  • Discrete

Categorical (values or observations that can be sorted into groups or categories)

  • Nominal
  • Ordinal
  • Interval

but you can also find finer-grained descriptions in the literature.

Graphical elements

In the end, a scientific figure can be fully described by a set of graphic primitives with different attributes:

  • Points, markers, lines, areas, ...
  • Position, color, shape, size, orientation, curvature, ...
  • Helpers, text, axis, ticks, ...
  • Interaction, animation

But describing a figure in terms of such graphic primitives would be a very tedious and complex task. This is exactly where visualization libraries or software are useful, because they automate most of the work, more (e.g. seaborn) or less (e.g. matplotlib) depending on the library. In the ideal case, you only specify your data and let the library decide almost everything (e.g. vega-lite).
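
To make the difference concrete, here is a small sketch that draws the same curve twice with matplotlib: once by adding each primitive by hand, and once with a single higher-level call. The sine data, the marker on the maximum, and the variable names are assumptions chosen for the illustration.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D
from matplotlib.patches import Circle

x = np.linspace(0, 2 * np.pi, 50)
y = np.sin(x)

fig, (low, high) = plt.subplots(1, 2, figsize=(8, 3))

# Low level: add each primitive by hand and set the view limits ourselves
low.add_line(Line2D(x, y, color="black", linewidth=1))             # a line
low.add_patch(Circle((np.pi / 2, 1.0), radius=0.15, color="red"))  # an area
low.text(np.pi / 2, 1.25, "maximum", ha="center")                  # a text helper
low.set_xlim(0, 2 * np.pi)
low.set_ylim(-1.5, 1.5)

# Higher level: one call creates the line and picks the limits for us
high.plot(x, y, color="black", linewidth=1)
```

In the low-level panel nothing is decided for you, not even the axis limits; in the other panel the library chooses them from the data.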

Visualization type

From the Data visualization catalogue by Severino Ribecca.

Less is more

Perfection is achieved not when there is nothing more to add, but when there is nothing left to take away – Antoine de Saint-Exupéry

Ten simple rules

From Ten simple rules for better figures, N.P. Rougier, M. Droettboom, P.E. Bourne, 2014.

  1. Know Your Audience
  2. Identify Your Message
  3. Adapt the Figure to the Support Medium
  4. Captions Are Not Optional
  5. Do Not Trust the Defaults
  6. Use Color Effectively
  7. Do Not Mislead the Reader
  8. Avoid “Chartjunk”
  9. Message Trumps Beauty
  10. Get the Right Tool

See also https://github.com/rougier/ten-rules

Exercises

Exercise 1: Too much ink...

Consider the following figure and, using matplotlib, try to remove as much ink as you can while keeping the most relevant information.

You can start from the following Python script:

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(123)

def gaussian(x, a, x0, sigma):
    return a*np.exp(-(x-x0)**2/(2*sigma**2))

# Clean data
X = np.linspace(15, 21, 100)
Y = gaussian(X, 0.65, 17.6, 1.)

# Noisy data
Xn = np.random.uniform(16, 20, 25)
Yn = gaussian(Xn, 0.65, 17.6, 1.) + 0.01 * np.random.normal(size=len(Xn))
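
The script above only builds the data. As a possible next step (a sketch, not the course solution), the following plots it and removes a first layer of ink. Which spines and ticks to keep is a judgment call; the choices below are assumptions made for the illustration.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt

np.random.seed(123)

def gaussian(x, a, x0, sigma):
    return a * np.exp(-(x - x0)**2 / (2 * sigma**2))

# Same data as in the starting script
X = np.linspace(15, 21, 100)
Y = gaussian(X, 0.65, 17.6, 1.)
Xn = np.random.uniform(16, 20, 25)
Yn = gaussian(Xn, 0.65, 17.6, 1.) + 0.01 * np.random.normal(size=len(Xn))

fig, ax = plt.subplots(figsize=(6, 3))
ax.plot(X, Y, color="0.5", linewidth=1)   # clean curve, light gray
ax.scatter(Xn, Yn, s=10, color="black")   # noisy samples

# Remove ink that carries no information
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)

# Keep only ticks that say something: the x range, the peak position, the peak height
ax.set_xticks([15, 17.6, 21])
ax.set_yticks([0, 0.65])
```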

Exercise 2: Using the right tool

You have a nice image and you would like to show labeled, detailed sub-images alongside the main image (see below). What could be the easiest way to do that? Be careful with the labels: they must be visible independently of the image's color/contrast.
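
One way to keep a label readable on any background, if you stay within matplotlib, is to draw white text with a dark outline using path effects. The sketch below uses a synthetic half-dark, half-bright image as a stand-in for a real one; the image, label names, and positions are assumptions, and this is not necessarily the approach taken in the course solution.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.patheffects as path_effects

# Hypothetical image: left half dark, right half bright
image = np.zeros((100, 200))
image[:, 100:] = 1.0

fig, ax = plt.subplots(figsize=(4, 2))
ax.imshow(image, cmap="gray", vmin=0, vmax=1)
ax.set_axis_off()

# White glyphs with a black stroke stay legible on both dark and bright areas
outline = [path_effects.withStroke(linewidth=3, foreground="black")]
for label, x in [("A", 50), ("B", 150)]:
    ax.text(x, 50, label, color="white", fontsize=16, weight="bold",
            ha="center", va="center", path_effects=outline)
```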

Exercise 3: Misleading the reader

What's wrong with this graphic? How would you correct it?

Exercise 4: Editor request

Your article has just been accepted, but the editor requests that figure 2 be at least 300 dpi. What does that mean? What is the minimum size (in pixels) of your figure? Is it relevant if your figure has been saved in vector format?
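
As a hint for the arithmetic: dpi stands for dots per inch, so the pixel size depends on the printed size of the figure, which the exercise does not give. The sketch below assumes a hypothetical printed size of 3.5 × 2.5 inches; the function name `min_pixels` is made up for the example.

```python
import math

def min_pixels(width_in, height_in, dpi=300):
    """Smallest pixel size that gives `dpi` dots per inch at the printed size."""
    return math.ceil(width_in * dpi), math.ceil(height_in * dpi)

# Hypothetical printed size: 3.5 x 2.5 inches
print(min_pixels(3.5, 2.5))  # (1050, 750)
```

With matplotlib, the same product is what you get when a figure created with `figsize=(3.5, 2.5)` is saved with `savefig(..., dpi=300)`.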

Exercise 5: Replication

Look at Drawing a brain with Bokeh and try to replicate the final figure using matplotlib.

or

Pick one of your favorite graphics from the literature and try to replicate it using matplotlib (and fake data).

References

There are many online resources about scientific visualization and a lot of excellent books as well. Since you probably do not have time to read everything, I collected a small set of resources that can be read relatively quickly.

Courses/Tutorials/Guides

(Some) Tools

Books

Good examples

Bad examples (don't do that at home)


Solutions to the exercises

  1. exercise-1-sol.py / exercise-1-sol.png
    (adapted from "Trees, maps, and theorems")
  2. exercise-2-sol.sh or exercise-2-sol.py
  3. exercise-3-sol.py / exercise-3-sol.png
    (adapted from "The most misleading charts of 2015, fixed")
  4. exercise-4-sol.md or exercise-4-sol.py
  5. exercise-5-sol.py / exercise-5-sol.png
