Development notes #1

tavareshugo · 2023-03-30T08:40:14Z

Titles:

Managing bioinformatic software and pipelines
How to do bioinformatics like a professional bioinformatician
Reproducible and scalable bioinformatics for researchers

Prerequisite:

Unix
HPC
Knowledge of some bioinformatic application (RNA-seq, ChIP-seq, Bacterial, Viral, variant calling)

Outline:

(30m) General intro - where we give a high-level view of all these concepts:
- Software environments
- Containers
- Pipelines
(1h30) Software (mamba)
- Similar to the content of the current HPC course
- How to create an environment
- How to install software in that environment
- How to activate and run software
- Concept of channels
- Exercise: install FastQC and run your samples through
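
The environment steps above could be sketched roughly as follows. This is only an illustration, not course material: the environment name `qc` and the input file `reads.fastq.gz` are hypothetical, and it assumes FastQC is installed from the `bioconda` channel alongside `conda-forge`. The function only builds the command lines; it does not run them.

```python
import shlex


def mamba_commands(env_name, packages, channels, tool_args):
    """Return two command lines: create an environment, then run a tool inside it."""
    channel_flags = []
    for channel in channels:
        # Each channel is passed with its own -c flag, in priority order.
        channel_flags += ["-c", channel]
    create = ["mamba", "create", "-y", "-n", env_name, *channel_flags, *packages]
    # `mamba run` executes a command inside the environment without activating it first.
    run = ["mamba", "run", "-n", env_name, *tool_args]
    return [shlex.join(create), shlex.join(run)]


if __name__ == "__main__":
    # Hypothetical environment name and input file, for illustration only.
    for line in mamba_commands(
        "qc", ["fastqc"], ["conda-forge", "bioconda"], ["fastqc", "reads.fastq.gz"]
    ):
        print(line)
```

In an interactive session the second step would more usually be `mamba activate qc` followed by the `fastqc` call; `mamba run` is used here because it also works in non-interactive scripts.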
(30m) Containers (singularity) - similar to HPC course
- Introduce the concept of running things in a container
- singularity pull and singularity run
- Exercise: same as the conda exercise but using singularity
(4.5h) Pipelines (Nextflow, nf-core focused)
- What is a pipeline and what is it for? Why would you use one, and why might you avoid bash-based pipelines, for example?
- Conceptually separate Nextflow and nf-core
- nf-core project
- nf-core documentation
- nextflow config - for running locally or on a server, use conda/singularity, etc.
  - set up the config locally
- introduce a simple pipeline (e.g. rnaqc, bacQC)
- for exercises we can have a collection of datasets from different applications for people to pick-and-choose (RNA-seq, ChIP-seq, bacterial, viral)
- reference genome/annotation management - we should mention something about this. Advise downloading the reference and annotation and passing them with the --fasta and --gff options. This ensures reproducibility and lets people use the latest version (or whichever version they wish to use). Discourage using iGenomes (and explain why we don't recommend it). Use the reference that matches your downstream needs, e.g. if you will be using dbSNP or a similar resource.
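
To illustrate the reference advice above, here is a minimal sketch of how a run command with explicit reference files might be assembled. The pipeline name, revision, profile and all file paths are assumptions chosen for illustration; `--input`, `--outdir`, `--fasta` and `--gff` are pipeline parameters as used by nf-core/rnaseq, and other pipelines may name them differently. The function only builds the command string; it does not launch Nextflow.

```python
import shlex


def nfcore_command(pipeline, revision, profile, params):
    """Build a `nextflow run` command line with pipeline parameters passed explicitly."""
    # -r pins the pipeline release, which matters as much for reproducibility
    # as pinning the reference files.
    cmd = ["nextflow", "run", pipeline, "-r", revision, "-profile", profile]
    for name, value in params.items():
        # Pipeline parameters use a double dash; Nextflow's own options use a single dash.
        cmd += [f"--{name}", str(value)]
    return shlex.join(cmd)


if __name__ == "__main__":
    # All values below are hypothetical examples.
    print(
        nfcore_command(
            "nf-core/rnaseq",
            "3.14.0",
            "singularity",
            {
                "input": "samplesheet.csv",
                "outdir": "results",
                "fasta": "genome.fa",
                "gff": "genome.gff",
            },
        )
    )
```

Passing local files this way records exactly which reference and annotation were used, rather than relying on a named genome resolved at run time.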

Useful resources:

nf-core training

tavareshugo · 2024-07-25T15:31:06Z

Initial draft of the timetable (names indicate who is doing which slides):

09:45 - 10:15 Introduction to reproducible and scalable bioinformatics analysis - Adam
10:15 - 11:00 Using package managers (slides + practical) - Adam
11:00 - 11:15 Break
11:15 - 12:00 Singularity (slides + practical) - Hugo
12:00 - 12:30 Nextflow (slides) - Raquel
12:30 - 13:30 Lunch
13:30 - 14:30 nf-core demo and practical - Raquel
14:30 - 15:30 Configuring Nextflow on HPC - Hugo
16:00 - 17:00 Q&A

tavareshugo · 2024-09-24T09:09:16Z

On the live workshop:

09:30 - 09:45 - Welcome
09:45 - 10:10 - Introduction
10:10 - 10:50 - Mamba
10:50 - 11:30 - Mamba Exercise
11:30 - 12:00 - Singularity slides
12:00 - 12:30 - Singularity exercise and wrap-up
12:30 - 13:30 - Lunch
13:30 - 14:15 - Nextflow slides
14:15 - 15:30 - Nextflow exercises (this could be reduced, see #27)
15:30 - 16:00 - Nextflow exercise solution (took a bit too long - don't need to go through the samplesheet python script in such detail)
16:00 - 16:30 - Nextflow on an HPC slides and tmux demo
16:30 - 17:30 - HPC exercise and wrap-up
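
For reference when trimming the samplesheet walkthrough, the core of such a script can be quite small. The sketch below is not the workshop's script: it assumes a hypothetical `<sample>_R1.fastq.gz` / `<sample>_R2.fastq.gz` naming convention and the `sample,fastq_1,fastq_2` columns used by several nf-core pipelines (some, such as nf-core/rnaseq, need additional columns).

```python
import csv
import io
import re
from collections import defaultdict

# Assumed (hypothetical) naming convention: <sample>_R1.fastq.gz / <sample>_R2.fastq.gz
FASTQ_PATTERN = re.compile(r"^(?P<sample>.+)_R(?P<read>[12])\.fastq\.gz$")


def build_samplesheet(fastq_paths):
    """Return CSV text with one row per sample and columns sample,fastq_1,fastq_2."""
    reads = defaultdict(dict)
    for path in sorted(fastq_paths):
        filename = path.rsplit("/", 1)[-1]
        match = FASTQ_PATTERN.match(filename)
        if match is None:
            continue  # skip files that do not follow the assumed convention
        reads[match["sample"]][match["read"]] = path

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["sample", "fastq_1", "fastq_2"])
    for sample in sorted(reads):
        pair = reads[sample]
        # Single-end samples leave the fastq_2 column empty.
        writer.writerow([sample, pair.get("1", ""), pair.get("2", "")])
    return out.getvalue()


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(build_samplesheet(["data/a_R1.fastq.gz", "data/a_R2.fastq.gz", "data/b_R1.fastq.gz"]))
```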

RaqManzano mentioned this issue May 24, 2024

Review materials #4

Closed

tavareshugo mentioned this issue Oct 1, 2024

nf-core exercise: clarify no need to wait until workflow finishes #27

Open
