tavareshugo/2019-07-01-bioinformatics_for_biologists

Introduction to data analysis with R

1, 2 & 4 July 2019, Cambridge University Bioinformatics Training

Instructors: Hugo Tavares & Sandra Cortijo

Helper: Martin van Rongen


This is a general introduction to R for exploratory data analysis.

Our practicals will be very hands-on, focusing on the syntax you need for exploratory data analysis in R, from data manipulation to visualisation. We will focus on tabular data, which is general enough for you to apply these skills to a wide range of problems. On the third day we will work through a more complex example using transcriptomic data.

Below, we provide links to detailed materials for your reference, many of which were developed by the Data Carpentry organisation.

If you have any questions please post a new issue on our GitHub repository.


Setup

All necessary software and data will be available on the training machines at the Bioinformatics Training Room (Craik-Marshall Building).

However, you are welcome to use your own laptop, in which case you need to:

  • Download and install R (here)
  • Download and install RStudio (here)
  • Install the CRAN R packages tidyverse, corrplot, cowplot and ggfortify (open RStudio and go to Tools > Install Packages)
  • Install the Bioconductor R package ComplexHeatmap (instructions here)
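As an alternative to the RStudio menus, the packages listed above can also be installed from the R console. This is a minimal sketch that assumes a working internet connection and a reasonably recent version of R. The use of the BiocManager helper package is an assumption about how the linked Bioconductor instructions work, so follow those instructions if they differ.

```r
# Install the CRAN packages used in the course
install.packages(c("tidyverse", "corrplot", "cowplot", "ggfortify"))

# Install the BiocManager helper if it is not already available
# (assumption: your R version supports BiocManager)
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}

# Install the Bioconductor package used on the third day
BiocManager::install("ComplexHeatmap")
```

Once installation finishes, running `library(tidyverse)` without an error is a quick way to check that the setup worked.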

Materials

Introduction to R (Mon)

This lesson will cover the basics of using R with RStudio and how to produce a wide range of graphs for data visualisation.

exercises

Data manipulation and visualisation in R (Tues)

This lesson will cover functions for manipulating and summarising tabular data effectively, and will extend what we learned about data visualisation.
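To give a flavour of what manipulating and summarising tabular data looks like, here is a small sketch using the dplyr package (part of tidyverse) and the `mtcars` dataset that ships with R. The dataset and the column choices are only illustrative assumptions and are not taken from the course materials.

```r
library(dplyr)

# Keep cars with more than 100 horsepower, then summarise
# fuel efficiency for each number of cylinders
cyl_summary <- mtcars %>%
  filter(hp > 100) %>%
  group_by(cyl) %>%
  summarise(
    mean_mpg = mean(mpg),  # average miles per gallon in each group
    n_cars   = n()         # number of cars in each group
  )

print(cyl_summary)
```

The result is a table with one row per value of `cyl`, which is the general pattern of filtering rows, grouping them and then summarising each group.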

Data Organisation in Spreadsheets (Tues evening)

Digital data recording often starts with spreadsheet software (e.g. Excel). For effective data analysis, it's crucial to start with a well-structured and well-formatted dataset. We will therefore have a brief discussion of common issues to consider when recording data.

  • Download data for this lesson here
  • Find detailed materials here

Further reading:

Exploratory analysis of multivariate data (Thu)

In this session we will apply the concepts learned so far to a worked example of an exploratory data analysis of transcriptomic data.

exercises

Further reading:


Further resources

Reference books:

  • Holmes S & Huber W, Modern Statistics for Modern Biology - covers many aspects of data analysis relevant to biology/bioinformatics, from statistical modelling to image analysis.
  • Peng R, Exploratory Data Analysis with R - a more general introduction to exploratory data analysis techniques.
  • Grolemund G & Wickham H, R for Data Science - a good follow-up to this course if you want to learn more about tidyverse packages.
  • McElreath R, Statistical Rethinking - an introduction to statistical modelling and inference using R (a more advanced topic, but written in a way that is accessible to non-statisticians).
    • Also see the lecture materials, which include access to the draft of the book's second edition.
  • James G, Witten D, Hastie T & Tibshirani R, Introduction to Statistical Learning - an introductory book about machine learning using R (also an advanced topic).

Other courses at Cambridge: