Notes on Anaconda
I've been wary of Anaconda, but recently I've used it to manage an environment shared between my OS X development machine and bonjovi, our Coop Lab server. Overall, I think it's good technology: it greatly reduces the time spent installing software, which gets annoying as a scientist. Below are some of my notes on how to build a project's environment up from scratch.
Here are some general things to keep in mind -- these are things I learned the hard way.
- Don't install Anaconda on OS X for all users -- permissions will be wacky and cause issues.
- Don't use conda env export -n your-env to create environment.yml files for creating environments for a project across different operating systems. Conda may install OS-specific dependencies, which hinders portability. Instead, handcraft a minimal environment YAML file with only top-level project dependencies (more on this below). Then use conda env export -n your-env > project_depends.yml only to save the versions of all dependencies for reproducibility, not to reconstruct your environment on a different machine (thanks Joshua Shapiro and Jaime Ashander for this advice).
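As a rough sketch of the second point, the manifest can be named after the operating system it was exported on, so that manifests from different machines don't overwrite each other. The manifest_name and export_manifest helpers and the file naming scheme are assumptions made for illustration, not part of conda; your-env is the placeholder environment name used above.

```shell
# Hypothetical helper: build a per-OS manifest file name,
# e.g. project_depends_darwin.yml on OS X or project_depends_linux.yml on Linux.
manifest_name() {
    printf 'project_depends_%s.yml\n' "$(uname -s | tr '[:upper:]' '[:lower:]')"
}

# Hypothetical helper: record the exact versions installed in environment $1.
# The result is a record for reproducibility, not a file to rebuild from.
export_manifest() {
    conda env export -n "$1" > "$(manifest_name)"
}

# Intended use (not run here):
#   export_manifest your-env
```

Keeping one manifest per operating system next to the hand-written environment file makes it easy to see later which versions were actually installed on each machine.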
Channels are like Homebrew's taps. You can add a new channel with:
$ conda config --add channels r
$ conda config --add channels bioconda
$ conda config --add channels conda-forge
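These are the essential channels as far as I know. They must be added in this order because conda config --add prepends each channel, so the last one added gets the highest priority. The ~/.condarc file then looks like: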
$ cat ~/.condarc
channels:
- conda-forge
- bioconda
- r
- defaults
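To see why the order matters, here is a small pure-bash illustration of the prepend behaviour (no conda required). It assumes a fresh configuration that starts with only defaults; the channels array and the add_channel function are hypothetical stand-ins for what conda config --add channels does to ~/.condarc.

```shell
# Stand-in for the channel list in ~/.condarc (assumed to start with defaults).
channels=(defaults)

# Hypothetical stand-in for `conda config --add channels NAME`:
# the new channel goes to the front of the list.
add_channel() {
    channels=("$1" "${channels[@]}")
}

add_channel r
add_channel bioconda
add_channel conda-forge

# Print the list top to bottom, highest priority first.
printf -- '- %s\n' "${channels[@]}"
# prints:
# - conda-forge
# - bioconda
# - r
# - defaults
```

On a real installation, conda config --get channels should report the configured order.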
Each project needs its own environment (or a general environment, e.g. one for all projects requiring SciPy, IPython, NumPy, and R). Below I quickly cover how I build up an environment.
First, let's create an example new environment, with only Python 3 and R:
$ conda create -n rpy-base python=3.6.2 r=3.4.1
Now, we see this new environment:
$ conda env list
# conda environments:
#
default-fwdpy11 /Users/vinceb/anaconda/envs/default-fwdpy11
rpy-base /Users/vinceb/anaconda/envs/rpy-base
root * /Users/vinceb/anaconda
The asterisk indicates we're currently using the root environment. This means all programs executed will use your default $PATH, which looks in the usual places (e.g. /usr/local/bin).
Now, we switch to our new environment:
$ source activate rpy-base
(rpy-base)
Note the $PATH now:
$ echo $PATH
/Users/vinceb/anaconda/envs/rpy-base/bin:/usr/local/bin [...]
(rpy-base)
Our Anaconda environment is now first in our search $PATH.
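A quick sketch for checking this from a script. first_path_entry is a hypothetical helper; CONDA_DEFAULT_ENV is the variable conda sets on activation, and it is assumed here to be unset when no environment is active.

```shell
# Hypothetical helper: print the first directory in a colon-separated path list.
first_path_entry() {
    printf '%s\n' "${1%%:*}"
}

# After activating rpy-base this should print the environment's bin directory.
first_path_entry "$PATH"

# Name of the active conda environment, or a fallback if none is active.
echo "active environment: ${CONDA_DEFAULT_ENV:-none}"
```

If the first entry is not the environment's bin directory, programs such as python or R will be picked up from somewhere else on the path.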
To add new packages to a specific environment (i.e. not the root environment), we use:
$ conda install -n rpy-base r-tidyverse
This will ask you to proceed. After installation is complete, we see that this R package (and its dependencies) is now in the environment:
$ conda list
# packages in environment at /Users/vinceb/anaconda/envs/rpy-base:
#
[...]
r-tidyverse 1.1.1 r3.4.1_0 r
Now, we can export the environment to a YAML file which can be used to mirror the environment elsewhere. However, this is not recommended for maintaining project environments (source). The reason is that conda env export returns all installed packages and their dependencies, some of which may be OS X-specific and not portable to Linux servers. A better approach is to maintain a minimal list of packages used by your project, and let Conda find the appropriate dependencies on whatever machine it's being run on. Then, the following can be used to make a manifest of the versions per system (for reproducibility, not for mirroring an environment):
$ conda env export -n rpy-base > project_depends.yml
Ideally, then hand-edit a copy of project_depends.yml so that it includes only the minimal dependencies. We'll call this rpy-base.yml; here is a minimal example:
name: rpy-base
channels:
- conda-forge
- bioconda
- r
- defaults
dependencies:
- python=3.6.2=0
- r=3.4.1=r3.4.1_0
- r-tidyverse=1.1.1=r3.4.1_0
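The build strings in the example above (the part after the second =, such as r3.4.1_0) can be specific to a platform. A looser sketch, assuming version pins alone are enough for your project, drops them. The file name rpy-base-portable.yml is hypothetical.

```shell
# Write a minimal environment file that pins versions but not build strings.
# (Hypothetical file name; the packages and versions are the ones used above.)
cat > rpy-base-portable.yml <<'EOF'
name: rpy-base
channels:
  - conda-forge
  - bioconda
  - r
  - defaults
dependencies:
  - python=3.6.2
  - r=3.4.1
  - r-tidyverse=1.1.1
EOF
```

Such a file would be used the same way as any other environment file, for example with conda env create -f rpy-base-portable.yml.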
The following creates a new environment, rpy-base2, from rpy-base.yml:
$ conda env create -n rpy-base2 -f rpy-base.yml
We can see the new environment with:
$ conda env list
# conda environments:
#
default-fwdpy11 /Users/vinceb/anaconda/envs/default-fwdpy11
rpy-base * /Users/vinceb/anaconda/envs/rpy-base
rpy-base2 /Users/vinceb/anaconda/envs/rpy-base2
root /Users/vinceb/anaconda
We can now switch to the new environment with source activate rpy-base2.
Here's how to delete an environment, such as the rpy-base2 environment we just created:
$ conda env remove -n rpy-base2
and now it's gone:
$ conda env list
# conda environments:
#
default-fwdpy11 /Users/vinceb/anaconda/envs/default-fwdpy11
rpy-base * /Users/vinceb/anaconda/envs/rpy-base
root /Users/vinceb/anaconda
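In a script, it can be handy to check whether an environment exists before removing it. A minimal sketch, assuming the conda env list output format shown above (name in the first column, comment lines starting with #); env_exists is a hypothetical helper.

```shell
# Hypothetical helper: succeed if the environment named $1 appears in the
# `conda env list` output supplied on standard input.
env_exists() {
    awk -v name="$1" '!/^#/ && $1 == name { found = 1 } END { exit !found }'
}

# Intended use (not run here):
#   conda env list | env_exists rpy-base2 && conda env remove -n rpy-base2
```

Matching on the whole first column rather than a substring keeps rpy-base from being mistaken for rpy-base2.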
Here's a simple example script I use to start a Jupyter notebook kernel on a server that can be accessed locally through SSH forwarding.
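#!/bin/bash
# Usage:
#   on the server: bash launch_notebook.sh notebook.ipynb server
#   on the client: bash launch_notebook.sh client
#   locally:       bash launch_notebook.sh notebook.ipynb
SERVER=bonjovi # yes, bonjovi is the name of my server

if [[ "$1" == "client" ]]; then
    # Forward local port 8888 to port 8890 on the server, in the background.
    echo "setting up ssh forwarding..."
    if ! ssh -N -f -L localhost:8888:localhost:8890 "$SERVER"; then
        echo "error: ssh forwarding failed." >&2
        exit 1
    fi
    exit 0
fi

# A notebook file is required; the "server" argument is optional.
if [[ $# -lt 1 ]]; then
    echo "usage: bash launch_notebook.sh notebook.ipynb [server]"
    exit 1
fi

source activate default-fwdpy11

if [[ "$2" == "server" ]]; then
    # Headless mode: no browser, on the port the client forwards to.
    jupyter notebook "$1" --no-browser --port=8890
else
    jupyter notebook "$1"
fi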
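The two port numbers are hard-coded in the script above. As a rough sketch (not part of the original script), they could be read from environment variables instead. LOCAL_PORT, REMOTE_PORT and forward_spec are hypothetical names chosen here for illustration.

```shell
#!/bin/bash
# Hypothetical variables: fall back to the ports used in the script above.
LOCAL_PORT="${LOCAL_PORT:-8888}"
REMOTE_PORT="${REMOTE_PORT:-8890}"

# Build the argument for ssh -L: a port on this machine mapped to a port
# on the server.
forward_spec() {
    printf 'localhost:%s:localhost:%s\n' "$1" "$2"
}

forward_spec "$LOCAL_PORT" "$REMOTE_PORT"
```

The result could then be passed to ssh as -L "$(forward_spec "$LOCAL_PORT" "$REMOTE_PORT")", with the server side started using --port="$REMOTE_PORT" so both ends agree.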