Development notes #1

Open
tavareshugo opened this issue Mar 30, 2023 · 2 comments

@tavareshugo

tavareshugo commented Mar 30, 2023

Titles:

  • Managing bioinformatic software and pipelines
  • How to do bioinformatics like a professional bioinformatician
  • Reproducible and scalable bioinformatics for researchers

Prerequisite:

  • Unix
  • HPC
  • Knowledge of some bioinformatic application (RNA-seq, ChIP-seq, Bacterial, Viral, variant calling)

Outline:

  • (30m) General intro - where we give a high-level view of all these concepts:
    • Software environments
    • Containers
    • Pipelines
  • (1h30) Software (mamba)
    • Similar to the content on the current HPC course
    • How to create an environment
    • How to install software in that environment
    • How to activate and run software
    • Concept of channels
    • Exercise: install FastQC and run your samples through
  • (30m) Containers (singularity) - similar to HPC course
    • Introduce the concept of running things in a container
    • singularity pull and singularity run
    • Exercise: same as the conda exercise but using singularity
  • (4.5h) Pipelines (nextflow nf-core focused)
    • What is a pipeline and what is it for? Why would you use one? Why might you not want to use bash-based pipelines, for example?
    • Conceptually separate Nextflow and nf-core
    • nf-core project
    • nf-core documentation
    • nextflow config - for running locally or on a server, use conda/singularity, etc.
      • set up the config locally
    • introduce a simple pipeline (e.g. rnaqc, bacQC)
    • for exercises we can have a collection of datasets from different applications for people to pick-and-choose (RNA-seq, ChIP-seq, bacterial, viral)
    • reference genome/annotation management - we should mention this:
      • advise downloading the reference and annotation yourself and passing them with the --fasta and --gff options; this ensures reproducibility and lets you use the latest version (or whichever version you wish to use)
      • discourage using igenomes, and explain why we don't recommend it
      • use the reference that matches your downstream needs, e.g. if you plan to use dbSNP or something like that
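
To make the reference-management point concrete, here is a minimal Python sketch of how a run could be launched with locally downloaded reference files and a pinned pipeline revision. The helper name `build_nextflow_command` and its defaults are hypothetical and not part of the course materials; the flags are the standard Nextflow `-r` and `-profile` options, the `--input` and `--outdir` parameters common to nf-core pipelines, and the `--fasta` and `--gff` options mentioned in the outline. Whether a given pipeline accepts `--gff` should be checked in its own documentation.

```python
from pathlib import Path


def build_nextflow_command(pipeline, revision, samplesheet, fasta, gff,
                           outdir="results", profile="singularity"):
    """Assemble a `nextflow run` command (hypothetical helper).

    The pipeline revision is pinned with -r and the reference files are
    passed explicitly, so the same inputs can be used again later.
    """
    # Fail early if a reference file is missing, rather than part-way through a run.
    for ref in (fasta, gff):
        if not Path(ref).is_file():
            raise FileNotFoundError(f"reference file not found: {ref}")

    return [
        "nextflow", "run", pipeline,
        "-r", revision,               # pin the pipeline version
        "-profile", profile,          # e.g. singularity or conda
        "--input", str(samplesheet),
        "--outdir", str(outdir),
        "--fasta", str(fasta),        # locally downloaded genome
        "--gff", str(gff),            # locally downloaded annotation
    ]
```

The returned list can be passed to `subprocess.run`, or joined with spaces and pasted into a job script. Recording the command next to the results keeps a note of which reference files and pipeline revision were used.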

Useful resources:

@tavareshugo

tavareshugo commented Jul 25, 2024

Initial draft of the timetable (names indicate who is doing which slides):

09:45 - 10:15 Introduction to reproducible and scalable bioinformatics analysis - Adam
10:15 - 11:00 Using package managers (slides + practical) - Adam
11:00 - 11:15 Break
11:15 - 12:00 Singularity (slides + practical) - Hugo
12:00 - 12:30 Nextflow (slides) - Raquel
12:30 - 13:30 Lunch
13:30 - 14:30 Nf-core demo and practical - Raquel
14:30 - 15:30 Configuring nextflow on HPC - Hugo
16:00 - 17:00 Q&A

@tavareshugo

tavareshugo commented Sep 24, 2024

On the live workshop:

09:30 - 09:45 - Welcome
09:45 - 10:10 - Introduction
10:10 - 10:50 - Mamba
10:50 - 11:30 - Mamba Exercise
11:30 - 12:00 - Singularity slides
12:00 - 12:30 - Singularity exercise and wrap-up
12:30 - 13:30 - Lunch
13:30 - 14:15 - Nextflow slides
14:15 - 15:30 - Nextflow exercises (this could be reduced, see #27)
15:30 - 16:00 - Nextflow exercise solution (took a bit too long - don't need to go through the samplesheet python script in such detail)
16:00 - 16:30 - Nextflow on an HPC slides and tmux demo
16:30 - 17:30 - HPC exercise and wrap-up
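
For reference when trimming the samplesheet walkthrough, a shorter script along these lines might be enough to show in the session. This is a sketch and not the script used in the workshop. It assumes paired files named `<sample>_R1.fastq.gz` and `<sample>_R2.fastq.gz`, and writes the `sample,fastq_1,fastq_2` column layout used by several nf-core pipelines; some pipelines expect extra columns, so check the documentation of the pipeline in question. The function names are hypothetical.

```python
import csv
import re
from pathlib import Path

# Assumed naming convention: <sample>_R1.fastq.gz / <sample>_R2.fastq.gz
READ_PATTERN = re.compile(r"^(?P<sample>.+)_R(?P<read>[12])\.fastq\.gz$")


def collect_samples(fastq_dir):
    """Map each sample name to the paths of its read 1 / read 2 files."""
    samples = {}
    for path in sorted(Path(fastq_dir).iterdir()):
        match = READ_PATTERN.match(path.name)
        if match is None:
            continue  # skip files that do not follow the assumed naming
        reads = samples.setdefault(match["sample"], {})
        reads[match["read"]] = str(path)
    return samples


def write_samplesheet(fastq_dir, out_csv):
    """Write one CSV row per sample and return the number of samples.

    fastq_2 is left empty for samples that only have a read 1 file.
    """
    samples = collect_samples(fastq_dir)
    with open(out_csv, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["sample", "fastq_1", "fastq_2"])
        for name in sorted(samples):
            reads = samples[name]
            if "1" not in reads:
                raise ValueError(f"sample {name} has no read 1 file")
            writer.writerow([name, reads["1"], reads.get("2", "")])
    return len(samples)
```

Calling `write_samplesheet("data/reads", "samplesheet.csv")` (both paths are placeholders) scans the directory and writes the CSV; files that do not match the naming pattern are ignored.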
