layout | title | subtitle | minutes |
---|---|---|---|
page | Programming with R | Best practices for using R and designing programs | 30 |
Define some best practices when working with R
- Start your code with a description of what it is:
#This is code to replicate the analyses and figures from my 2014 Science paper.
#Code developed by Sarah Supp, Tracy Teal, and Jon Borelli
- Run all of your import statements (library):
library(ggplot2)
library(reshape)
library(vegan)
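One optional way to make this step fail early is to check that every package is installed before any analysis runs. The following is a minimal sketch, not part of the original lesson: the helper name missing_packages is hypothetical, and the package list uses base packages as placeholders so the snippet runs anywhere.

```r
# Hypothetical helper: return the names of packages that are not installed
missing_packages <- function(pkgs) {
  # requireNamespace() returns TRUE/FALSE instead of throwing an error
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Placeholder list; substitute the packages your script actually loads
required <- c("stats", "utils")
not_installed <- missing_packages(required)

if (length(not_installed) > 0) {
  stop("Please install: ", paste(not_installed, collapse = ", "))
}
```

Stopping at the top of the script means a collaborator sees one clear message about missing packages rather than a failure halfway through the analysis.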
- Set your working directory before source()ing a script, or start R inside your project folder.
One should exercise caution when using setwd(). Changing directories in your script can limit reproducibility:
  - setwd() will throw an error if the directory you're trying to change to doesn't exist, or the user doesn't have the correct permissions to access it. This becomes a problem when sharing scripts between users who have organized their directories differently.
  - If/when your script terminates with an error, you might leave the user in a different directory from where they started, and if they call the script again this will cause further problems. If you must use setwd(), it is best to put it at the top of the script to avoid this problem.
The following error message indicates that R has failed to set the working directory you specified:
Error in setwd("~/path/to/working/directory") : cannot change working directory
Consider using the convention that the user running the script should begin in the relevant directory on their machine and then use relative file paths (see below).
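If changing directory really is unavoidable, one way to limit the side effects described above is to restore the original directory automatically, even when an error occurs. This is a minimal sketch under that assumption; the helper name run_in_dir is hypothetical and not part of the lesson.

```r
# Hypothetical helper: evaluate `expr` inside `dir`, then always restore
# the directory the user started in, even if `expr` fails.
run_in_dir <- function(dir, expr) {
  if (!dir.exists(dir)) {
    stop("Directory does not exist: ", dir)
  }
  old_wd <- setwd(dir)                 # setwd() returns the previous directory
  on.exit(setwd(old_wd), add = TRUE)   # runs on normal exit and on error
  force(expr)                          # `expr` is evaluated lazily, i.e. here
}

# Usage: the temporary directory stands in for a project folder
start_dir <- getwd()
where_it_ran <- run_in_dir(tempdir(), getwd())
```

After the call, getwd() is back to start_dir, so a failed run does not leave the user stranded in another directory.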
- Use # or #- to set off sections of your code so you can easily scroll through it and find things.
- If you have only one or a few functions, put them at the top of your code, so they are among the first things run. If you have written many functions, put them all in their own .R file and source them. Sourcing the file defines all of these functions so that you can use them as you need them. For the reasons listed above, try to avoid using setwd() (or other functions that have side effects in the user's workspace) in scripts you source.
source("my_genius_fxns.R")
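Put together, a script that follows the two points above (section comments, and functions sourced from their own file) might be laid out as in the sketch below. The file name my_fxns_demo.R, the function celsius_to_kelvin, and the data are all made up for illustration; the functions file is written to a temporary directory only so that the example is self-contained.

```r
# ---- Setup: create a hypothetical functions file (normally written by hand) ----
fxns_file <- file.path(tempdir(), "my_fxns_demo.R")
writeLines(c(
  "# Functions for the demo analysis",
  "celsius_to_kelvin <- function(temp_c) {",
  "  temp_c + 273.15",
  "}"
), fxns_file)

# ---- Functions: define everything the script needs up front ----
source(fxns_file)

# ---- Data ----
temps_c <- c(0, 25, 100)

# ---- Analysis ----
temps_k <- celsius_to_kelvin(temps_c)
print(temps_k)
```

The section markers make it easy to jump to the part of the script you need, and sourcing first guarantees every function exists before it is called.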
- Use consistent style within your code.
- Keep your code modular. If a single function or loop gets too long, consider breaking it into smaller pieces.
- Don't repeat yourself. Automate! If you are repeating the same piece of code on multiple objects or files, use a loop or a function to do the same thing. The more you repeat yourself, the more likely you are to make a mistake.
- Manage all of your source files for a project in the same directory. Then use relative paths as necessary. For example, use
dat <- read.csv(file = "files/dataset-2013-01.csv", header = TRUE)
rather than:
dat <- read.csv(file = "/Users/Karthik/Documents/sannic-project/files/dataset-2013-01.csv", header = TRUE)
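The "don't repeat yourself" advice above can be sketched with one function applied to every file in a folder, instead of one copy-pasted block per file. Everything here is illustrative: a temporary directory stands in for the project's files folder, and the dataset names, the column name value, and the function mean_value are assumptions.

```r
# A temporary directory stands in for the project's "files" folder (assumption)
data_dir <- file.path(tempdir(), "dry-demo-files")
dir.create(data_dir, showWarnings = FALSE)

# Create three small illustrative datasets (made-up numbers)
for (i in 1:3) {
  write.csv(data.frame(value = (1:5) * i),
            file.path(data_dir, sprintf("dataset-%02d.csv", i)),
            row.names = FALSE)
}

# One function holds the logic that would otherwise be copy-pasted per file
mean_value <- function(path) {
  dat <- read.csv(file = path, header = TRUE)
  mean(dat$value)
}

# Apply the same function to every CSV file in the folder
filenames <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)
file_means <- vapply(filenames, mean_value, numeric(1))
print(unname(file_means))
```

Adding a fourth dataset then requires no new code, and a fix to mean_value applies to every file at once.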
- R can run into memory issues. It is common to run out of memory after running R scripts for a long time. To inspect the objects in your R session, you can list them, search the currently attached packages, and remove objects that are no longer in use. A good practice when running long stretches of computationally intensive sequential code is to remove temporary objects once they have served their purpose. Sometimes R will not release unused memory for a while after you delete objects. You can prompt R to tidy up its memory by calling gc().
interim_object <- data.frame(rep(1:100,10),rep(101:200,10),rep(201:300,10)) # Sample dataset of 1000 rows
object.size(interim_object) # Reports the memory size allocated to the object
rm(interim_object) # Removes only the particular object
gc() # Force R to release memory it is no longer using
ls() # Lists all the objects in your current workspace
rm(list = ls()) # If you want to delete all the objects in the workspace and start with a new slate
- Don't save a session history (the default option in R, when it asks if you want an RData file). Instead, start in a clean environment so that older objects don't contaminate your current environment. Leftover objects can lead to unexpected results, especially if the code is later run on someone else's machine.
- Where possible, keep track of sessionInfo() somewhere in your project folder. Session information is invaluable since it captures the packages, and their versions, loaded in the current session. If a newer version of a package changes the way a function behaves, you can always go back and reinstall the version that worked (note: at least on CRAN, all older versions of packages are permanently archived).
- Collaborate. Grab a buddy and practice "code review". We do it for methods and papers, why not code? Our code is a major scientific product and the result of a lot of hard work!
- Develop your code using version control and frequent updates!
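One simple way to follow the sessionInfo() advice above is to write its output to a text file at the end of an analysis. This is a minimal sketch: the file name sessionInfo.txt is an assumption, and a temporary directory is used here in place of your project folder.

```r
# Hypothetical location; in a real project this would sit in the project folder
session_file <- file.path(tempdir(), "sessionInfo.txt")

# capture.output() turns the printed session information into lines of text
writeLines(capture.output(sessionInfo()), session_file)

# Read it back to confirm what was recorded
session_lines <- readLines(session_file)
```

Committing this file alongside your results records which R and package versions produced them.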
- What other suggestions do you have?
- How could we restructure the code we worked on today to make it easier to read? Discuss with your neighbor.
- Make two new R scripts called inflammation.R and inflammation_fxns.R
- Copy and paste the code so that inflammation.R "does stuff" and inflammation_fxns.R holds all of your functions. Hint: you will need to add a source() call to one of the files.