title | number-sections |
---|---|
Data & Setup | false |
::: {.callout-tip level=2}
If you are attending one of our workshops, we will provide a training environment with all of the required software and data.
If you want to set up your own computer to run the analysis demonstrated in this course, you can follow the instructions below.
:::
The data used in these materials is provided as a zip file. Download and unzip the folder to your Desktop to follow along with the materials.
To run the analysis covered in this workshop, you will broadly need two things:
- R/RStudio for all the downstream analysis (i.e. after peak calling using the `nf-core/chipseq` workflow). These analyses can typically be run on your local computer and on any OS (macOS, Windows, Linux).
- A Linux environment to run the pre-processing steps and peak calling (i.e. running the `nf-core/chipseq` workflow). We highly recommend using a dedicated server (typically a HPC) for this step. Technically, you can also run this workflow on Windows via WSL2 (we provide instructions below), but we do not recommend it for production runs.
::: {.panel-tabset group="os"}
Windows and macOS: download and install R and RStudio using the default options.

Linux:

- Go to the R installation folder and look at the instructions for your distribution.
- Download the RStudio installer for your distribution and install it using your package manager.
:::
Open RStudio and run the following:
# install BiocManager if not installed already
if (!require("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}
# Install all packages used
BiocManager::install(c("GenomicRanges", "rtracklayer", "plyranges", "ChIPseeker", "profileplyr", "ggplot2", "DiffBind"))
For the command-line tools covered in the course you will need a Linux machine (or WSL2, if you are on Windows - see @sec-wsl).
If you are an experienced Linux user, you can install/compile each tool individually using your preferred method. Otherwise, we recommend doing it via the Mamba package manager. If you already use Conda/Mamba you can skip this step.
To make a fresh install of Mamba, you can run:
# Miniforge3 ships with mamba (the separate Mambaforge installer has been retired)
wget "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
bash Miniforge3-$(uname)-$(uname -m).sh
Then follow the instructions in the terminal, accepting the defaults. Make sure to restart your terminal after the installation completes.
These instructions also work if you're using a HPC server.
We recommend having a dedicated environment for Nextflow, which you can reuse for any pipelines you run in the future. Assuming you've already installed Conda/Mamba, open your terminal and run:
mamba create --name nextflow nextflow
Whenever you want to use Nextflow, you need to activate your environment with `conda activate nextflow`.
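In scripts it is easy to forget the activation step. The following is a minimal sketch of a guard, assuming you are happy to rely on the `CONDA_DEFAULT_ENV` variable that `conda activate` sets to the name of the active environment. The function name `require_env` is hypothetical and not part of Conda or Nextflow.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the named conda environment is active.
# Relies on CONDA_DEFAULT_ENV, which `conda activate` sets to the environment name.
require_env() {
  local wanted="$1"
  if [ "${CONDA_DEFAULT_ENV:-}" = "$wanted" ]; then
    return 0
  fi
  echo "Environment '$wanted' is not active; run: conda activate $wanted" >&2
  return 1
}

# Example: report whether the nextflow environment is active before launching a pipeline
if require_env nextflow; then
  echo "nextflow environment is active"
fi
```

Calling `require_env nextflow` at the top of a launch script turns a confusing "command not found" error into a clear reminder.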
For other command-line tools that we covered in the workshop, you can install them in their own conda environment:
mamba create --name chipseq
mamba install --name chipseq idr deeptools meme homer
When you want to use any of them, make sure to activate your environment first with `conda activate chipseq`.
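After activating the environment, a quick check that the tools are visible on your `PATH` can save time later. This is only a sketch: the function name `check_tools` is hypothetical, and the executable names used in the example (`bamCoverage` for deeptools, `findMotifsGenome.pl` for homer) are assumptions about which commands you intend to use from each package.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the name of every command that is NOT found on PATH;
# return 1 if at least one is missing, 0 otherwise.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      missing=1
    fi
  done
  return "$missing"
}

# Assumed executable names for the packages installed above.
if check_tools idr bamCoverage meme findMotifsGenome.pl; then
  echo "All tools found"
else
  echo "The tools listed above were not found; is the chipseq environment active?"
fi
```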
:::{.callout-warning}
We highly recommend running the raw data processing pipeline on a dedicated Linux server (typically a HPC), not directly on Windows via WSL2. Although you can technically run the entire pipeline on WSL2, it may perform poorly on real data.
:::
The Windows Subsystem for Linux (WSL2) runs a compiled version of Ubuntu natively on Windows. Detailed installation instructions are available in the Microsoft documentation. Briefly:
- Press the Windows key and search for Windows PowerShell, open it and run the command: `wsl --install`.
- Restart your computer.
- Press the Windows key and search for Ubuntu, which should open a new terminal.
- Follow the instructions to create a username and password (you can use the same username and password that you have on Windows, or a different one - it's your choice).
- You should now have access to an Ubuntu Linux terminal.
This (mostly) behaves like a regular Ubuntu terminal, and you can install apps using the `sudo apt install` command as usual.
After WSL is installed, it is useful to create shortcuts to your files on Windows.
Your `C:\` drive is located in `/mnt/c/` (equally, other drives will be available based on their letter).
For example, your desktop will be located in: `/mnt/c/Users/<WINDOWS USERNAME>/Desktop/`.
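The drive-letter mapping described above can be sketched as a small shell function. The name `win_to_wsl` and the example username `alice` are hypothetical and used only for illustration. WSL also ships a `wslpath` command that performs this conversion for you.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a Windows path such as C:\Users\alice\Desktop
# to its WSL location under /mnt/<lowercase drive letter>/.
win_to_wsl() {
  local win_path="$1"
  local drive rest
  # Everything before the first colon is the drive letter; lowercase it
  drive=$(printf '%s' "${win_path%%:*}" | tr '[:upper:]' '[:lower:]')
  # Everything after the first colon is the rest of the path
  rest="${win_path#*:}"
  # Turn backslashes into forward slashes
  rest=$(printf '%s' "$rest" | tr '\\' '/')
  printf '/mnt/%s%s\n' "$drive" "$rest"
}

win_to_wsl 'C:\Users\alice\Desktop'   # prints /mnt/c/Users/alice/Desktop
```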
It may be convenient to set shortcuts to commonly-used directories, which you can do using symbolic links, for example:
- Documents: `ln -s /mnt/c/Users/<WINDOWS USERNAME>/Documents/ ~/Documents`
  - If you use OneDrive to save your documents, use: `ln -s /mnt/c/Users/<WINDOWS USERNAME>/OneDrive/Documents/ ~/Documents`
- Desktop: `ln -s /mnt/c/Users/<WINDOWS USERNAME>/Desktop/ ~/Desktop`
- Downloads: `ln -s /mnt/c/Users/<WINDOWS USERNAME>/Downloads/ ~/Downloads`
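If you prefer to create these links in one go, the commands above can be wrapped in a loop. This is a minimal sketch rather than a recommended tool: the function name `link_windows_dirs` is hypothetical, and it skips any folder that does not exist or whose link name is already taken, so it should be safe to re-run.

```shell
#!/usr/bin/env bash
# Hypothetical helper: link each named sub-directory of a Windows user folder
# into a target directory, skipping missing sources and names already in use.
link_windows_dirs() {
  local win_home="$1" target="$2" name
  shift 2
  for name in "$@"; do
    if [ ! -d "$win_home/$name" ]; then
      echo "skip $name: $win_home/$name not found" >&2
    elif [ -e "$target/$name" ] || [ -L "$target/$name" ]; then
      echo "skip $name: $target/$name already exists" >&2
    else
      ln -s "$win_home/$name" "$target/$name"
    fi
  done
}

# Example (replace the placeholder with your Windows username):
# link_windows_dirs "/mnt/c/Users/<WINDOWS USERNAME>" "$HOME" Documents Desktop Downloads
```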
We've experienced issues in the past when running Nextflow pipelines from WSL2 with `-profile singularity`.
As an alternative, you can instead use Docker, which is another software containerisation solution.
To set this up, you can follow the instructions given on the Microsoft Documentation: Get started with Docker remote containers on WSL 2.
Once you have Docker installed and set up, you can then use `-profile docker` when running your Nextflow command.
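If you use the same launch script on both WSL2 and a Linux server, the choice of profile can be made automatically. The sketch below assumes that WSL2 kernels report a release string containing "microsoft" (as in `uname -r` output such as `5.15.90.1-microsoft-standard-WSL2`). The function name `choose_profile` is hypothetical, and the final line only prints the start of a command, with the pipeline's own options left out.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pick a Nextflow profile from a kernel release string.
# Assumption: WSL2 kernels include "microsoft" in their release string.
choose_profile() {
  case "$1" in
    *[Mm]icrosoft*) echo "docker" ;;
    *)              echo "singularity" ;;
  esac
}

profile=$(choose_profile "$(uname -r)")
# Print (not run) the start of the command; add your pipeline options after it
echo "nextflow run nf-core/chipseq -profile $profile ..."
```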
Singularity is software for running containers (self-contained virtual operating systems that run locally), and it is widely used for complex bioinformatic pipelines. Nextflow supports Singularity for managing its software, and we recommend using it on HPC servers. Singularity is typically installed by your HPC admins; if it is not available, ask them to install it.
However, if you want to run the analysis locally on your computer (again, we do not recommend doing so), you can install Singularity following the instructions below.
::: {.panel-tabset group="os"}
You can use Singularity from the Windows Subsystem for Linux (see @sec-wsl). Once you have set up WSL, you can follow the instructions for Linux.
Singularity is not available for macOS.
These instructions are for Ubuntu or Debian-based distributions[^1].
sudo apt update && sudo apt upgrade && sudo apt install runc
# Release codename of your distribution (e.g. jammy)
codename=$(lsb_release -cs)
# The release tag and the file name must refer to the same version
version=3.11.4
wget -O singularity.deb "https://github.com/sylabs/singularity/releases/download/v${version}/singularity-ce_${version}-${codename}_amd64.deb"
sudo dpkg -i singularity.deb
rm singularity.deb
:::
[^1]: See the Singularity documentation page for other distributions.