Installing R and adding packages without Root permissions.
The R programming language is preferred by many for data analytics and statistical computing. Packages are available for many kinds of bioinformatics and statistical analysis, which spares users from developing scripts and analysis methods from scratch. These packages are available from three different sources:
- CRAN webpage
- Bioconductor
- Github
Installing some packages often requires root or administrator privileges, which may not be available on cluster systems. The outline below explains how to install R and its packages without root/administrator permissions using miniconda. The tutorial has four steps:
- Installation of miniconda and creating an environment
- Installation of R
- R-package installation
- Running R
Let's get started.
Step1: Installation of miniconda and creating an environment
Start an interactive session
srun --pty -p general -q general --mem=5G bash
- Download the miniconda installer for the appropriate python version from https://docs.conda.io/en/latest/miniconda.html . Right-click the appropriate version and select “Copy Link Address”.
Use the command below to download the file from the copied link
wget <paste the copied link here>
- Execute the downloaded script as
sh Miniconda3-latest-Linux-x86_64.sh
Accept the license, confirm the install location, and agree to running conda init when prompted.
- Run the command
conda
in the terminal. If it displays the conda help text, miniconda is installed properly. If the command is not found, open a new shell or run source ~/.bashrc so that the changes made by conda init take effect.
- Next we create an environment that holds the appropriate version of python and the required python packages. Let's say we want an environment named python3 (any name can be given) with python 3.8. Run the command
conda create -n python3 python=3.8
Any new environment can be created by replacing "python3" with the new environment name. The names are arbitrary, but choose ones you will recognise later.
- List all the environments available to you with the command below. You should see your environment(s) listed in the output.
conda env list
- Once the environment is created you can install the packages you require, e.g. ipython. Search the web for “ipython conda installation” and, among the results, select the one from the anaconda website.
- On the anaconda website, find the channel (-c) and the installation instructions for the package.
- Install the package with the command below, specifying the environment name.
conda install -n python3 -c anaconda ipython
This installs the package from the given channel into the given environment. If you later want to add another package to the same environment, find its channel and installation instructions on the anaconda website (as we did for ipython) and run the installation command as
conda install -n python3 -c <channel> <PackageName>
This is the only line you have to execute in an interactive session for a package to be added to the environment.
This ends the instructions for creating a conda environment and installing packages with conda. Read on for the installation of R and R packages.
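The two checks from this step (is conda on the PATH, and does the new environment exist) can also be scripted. The sketch below is one possible way to do it. The function names have_tool and env_in_list are made up for this example and are not part of conda. env_in_list reads the text printed by conda env list on standard input and assumes that the environment name is the first column of every non-comment line.

```shell
#!/bin/sh
# Hypothetical helpers (not part of conda) for checking a conda setup.

# have_tool NAME: succeeds if NAME is an executable on the PATH.
have_tool() {
  command -v "$1" >/dev/null 2>&1
}

# env_in_list NAME: reads `conda env list` output on stdin and succeeds
# if an environment called NAME appears in the first column.
env_in_list() {
  awk -v name="$1" '!/^#/ && $1 == name { found = 1 } END { exit !found }'
}

# Example: only query conda if it is actually installed.
if have_tool conda; then
  if conda env list | env_in_list python3; then
    echo "environment python3 exists"
  else
    echo "environment python3 not found"
  fi
else
  echo "conda is not on the PATH; open a new shell or re-run the installer"
fi
```

Replace python3 with whatever name you gave your environment.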
Step2: Installation of R
To install R, we first create a separate environment for it so that it is isolated from other environments; this is optional.
- Create an environment for R. We name it R411 because we will install R version 4.1.1.
conda create -n R411
and switch to the new environment with
conda activate R411
- In this environment we install version 4.1.1 of R
conda install -n R411 -c conda-forge r=4.1.1
Similarly, an older version of R can be installed by specifying that version in the above command.
- Now let's check whether the installation was successful
conda deactivate
conda activate R411 # Load the environment
which R # This should display the path to R as some/path/R411/bin/R
or
R # Here the loaded R should be R version 4.1.1
Run conda deactivate to close the environment.
Now, any time you want to use R, load the environment with conda activate R411, run your analysis, and when done close the environment with conda deactivate.
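The version check above can be automated as well. The following is a minimal sketch, assuming that the first line printed by R --version has the form "R version X.Y.Z ...", as it does for released versions of R. The helper name r_version_of is hypothetical and chosen only for this example.

```shell
#!/bin/sh
# r_version_of TEXT: print the X.Y.Z that follows "R version" in TEXT
# (hypothetical helper; prints nothing if TEXT has no such line).
r_version_of() {
  printf '%s\n' "$1" | sed -n 's/^R version \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

expected="4.1.1"
if command -v R >/dev/null 2>&1; then
  found=$(r_version_of "$(R --version 2>/dev/null)")
  if [ "$found" = "$expected" ]; then
    echo "R $found is active"
  else
    echo "expected R $expected but found R ${found:-unknown}"
  fi
else
  echo "R is not on the PATH; run: conda activate R411"
fi
```

Run it after conda activate R411; if a different R is reported, which R shows which installation is being picked up.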
Step3: R-package installation
R-package installation can be done as usual.
- First invoke R after loading the R environment
conda activate R411
R
- Package installation inside R
# For CRAN packages
install.packages("ggplot2")  # Install ggplot2
install.packages("devtools") # Install devtools
# For Bioconductor packages
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("DiffBind")
# For GitHub packages (using devtools; install it first if you do not have it)
library(devtools)
# Syntax: install_github("DeveloperName/PackageName")
install_github("hadley/dplyr")
- Use conda for package installation
As an example, I will install DiffBind using conda. For this we only have to load the R411 environment with conda activate R411; there is no need to invoke R.
Now let's find the channel required for the conda installation of DiffBind. As before, search the web for conda diffbind and pick the result from the anaconda website.
Open the link, note the channel, and run
conda install -n R411 -c bioconda bioconductor-diffbind
This will install DiffBind in the R411 environment.
Sometimes the installation of a package in R complains about a missing library. For example, a user on our system came across the following error while installing DiffBind.
-------------------------- [ERROR MESSAGE] ---------------------------
<stdin>:1:26: fatal error: librsvg/rsvg.h: No such file or directory
compilation terminated.
Package librsvg-2.0 was not found in the pkg-config search path.
Perhaps you should add the directory containing `librsvg-2.0.pc'
to the PKG_CONFIG_PATH environment variable
No package 'librsvg-2.0' found
Using PKG_CFLAGS=
Using PKG_LIBS=-lrsvg
.
.
.
It appears that librsvg-2.0 is missing. We now look for a way to install the missing library using conda, so we go back to the browser and search for
conda librsvg-2.0
Go to the link, pick up the channel, and install the missing package as instructed by executing the command
conda install -n R411 -c conda-forge librsvg
This will install the missing package. Sometimes more dependencies are missing; follow the same procedure for each of them.
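When a build log is long, it can help to pull out just the names that pkg-config reported as missing. The sketch below does this for lines of the form "No package 'NAME' found", as seen in the error message above. The function name missing_pkgconfig_names is hypothetical, and the sketch assumes the log uses exactly that wording.

```shell
#!/bin/sh
# missing_pkgconfig_names: read an install log on stdin and print each
# name reported as "No package 'NAME' found", one per line, without repeats.
missing_pkgconfig_names() {
  sed -n "s/.*No package '\([^']*\)' found.*/\1/p" | sort -u
}

# Example with two lines taken from the error message above.
log="Package librsvg-2.0 was not found in the pkg-config search path.
No package 'librsvg-2.0' found"
printf '%s\n' "$log" | missing_pkgconfig_names   # prints: librsvg-2.0
```

Note that the pkg-config name (librsvg-2.0) is not necessarily the conda package name (librsvg in this case), so the search on the anaconda website is still needed to find the right package and channel.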
Now we can try again to install DiffBind using the R package installation methods of step3.
Load the conda environment
conda activate R411
Invoke R
R
Step4: Running R
To run an R script called myscript.R on the cluster, use the template below, written for the Xanadu cluster:
#!/bin/bash
#SBATCH --job-name=JOBNAME
#SBATCH -n 1
#SBATCH -N 1
#SBATCH -c 1
#SBATCH --mem=10G
#SBATCH --partition=general
#SBATCH --qos=general
#SBATCH --mail-type=ALL
#SBATCH --mail-user=<your_email_address>
#SBATCH -o %x_%j.out
#SBATCH -e %x_%j.err
source ~/.bashrc
conda activate R411
R CMD BATCH myscript.R
# or other option is
# Rscript myscript.R
conda deactivate
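If you submit many R scripts, the template can be generated instead of copied by hand. The sketch below is one way to do that. The function make_r_job is hypothetical (it is not a SLURM or conda command), it uses Rscript rather than R CMD BATCH, and it leaves out the mail options; the partition, qos and memory values are simply copied from the template above and may need adjusting for your cluster.

```shell
#!/bin/sh
# make_r_job ENV SCRIPT: print a SLURM batch script that runs SCRIPT
# with Rscript inside the conda environment ENV (hypothetical helper).
make_r_job() {
  cat <<EOF
#!/bin/bash
#SBATCH --job-name=$(basename "$2" .R)
#SBATCH -n 1
#SBATCH -N 1
#SBATCH -c 1
#SBATCH --mem=10G
#SBATCH --partition=general
#SBATCH --qos=general
#SBATCH -o %x_%j.out
#SBATCH -e %x_%j.err

source ~/.bashrc
conda activate $1
Rscript $2
conda deactivate
EOF
}

# Print the job script for myscript.R in the R411 environment.
make_r_job R411 myscript.R
```

To use it, redirect the output to a file, for example make_r_job R411 myscript.R > myscript.sh, and submit that file with sbatch myscript.sh.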
The main differences between R CMD BATCH and Rscript are listed below [source](https://thecoatlessprofessor.com/programming/r/working-with-r-on-a-cluster/):
R CMD BATCH
- Requires an input file (e.g. helloworld.R)
- Saves to an output file (e.g. running the script helloworld.R produces helloworld.Rout)
- By default, echoes both input and output statement inline (e.g. as if you were actually typing them into console).
- Is not able to write output to stdout.
Rscript
- Similar to bash scripts
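- When run directly as an executable, requires a shebang (e.g. #!/usr/bin/env Rscript, which picks up the Rscript of the active conda environment)
- Requires execute permission before it can be run directly (chmod +x script.r)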
- Output from print() and cat() are directly sent to STDOUT.
- No additional file is made.
- Able to run one-line commands (e.g. Rscript -e "print('hi!')")
BINGO!!! You are ready to go !!!!