The course schedule consists of 1 afternoon (Tuesday) and 3 "full" days (Wednesday - Friday) in each of the first two weeks, with 1.5 days in the last week.
The "full" days run from 09:00 to 17:00, with a break from 12:00 to 13:30.
Conceptually, mornings are mostly devoted to introductions, presentations and discussion of the previous days, and afternoons are reserved for independent work on the examples and tasks.
Introduction, File Formats & Genome Browsers (Michael Baudis)
- general introduction to the topic (slides)
- schedule adjustment
- guidance about course room and computer use (Tina Siegenthaler)
- reading:
- 1000 Genomes paper
- The sequence of sequencers paper
- tasks:
- Genome Storage Space & Cost, e.g. required for 1000 Genomes
- WES & WGS
- Different file formats
- SAM
- BAM
- VCF
- FASTA
- Associated costs
- Cost factors
- Raw Storage costs
- reference genome resources
- reading:
- "Genomics made easier"
- UCSC genome browser tutorial
- exercise: UCSC tutorial
- Answer for UCSC tutorial, genelist, genelist_canonical
- genome editions and coordinates
- reading:
- segment_liftover article
- exercise: genome liftover
- Annotating genome variants
- reading:
- HGVS Recommendations (not for details, though)
- dbVar "Overview of Structural Variation" (link)
- info slides from the morning session
- Literature review and discussion
- Associating variants with phenotypes and diseases (focus on cancer…)
- info slides from the morning session
- Hands-on analysis of genome data
Tools & Programmatic Solutions (Izaskun Mallona)
- How are UCSC Genome Browser data stored? Why?
- Genomics data management: automation
- Computer basics: plain text files, Unix terminal
- Reproducibility
- Systems set up (data download and software installs)
- Unix for bioinformatics
- Chapter 1: What is UNIX
- Chapter 2: The UNIX filesystem
- Chapter 3: UNIX shell - first steps
- Chapter 4: UNIX shell - filesystem commands
- Chapter 5: UNIX shell - working with files
- Overview of the standard genomics data formats (I)
- FASTA
- FASTQ
- SAM
- BED
- Basic file processing for bioinformatics
- awk, cut
- Overview of the standard genomics data formats (II)
- GFF/GTF
- bedGraph
- Wiggle files
- VCFs
- Indexed genomic data formats
- Exercises
Genome Variants to Modified Proteins (Elif Ozkirimli Olmez)
- Protein Structure Slides
- Protein Data Bank PDB
- Literature
- Bhattacharya R, Rose PW, Burley SK, Prlić A (2017) Impact of genetic variation on three dimensional structure and function of proteins. PLOS ONE 12(3): e0171355.
- Zehir, A., Benayed, R., Shah, R. H., Syed, A., Middha, S., Kim, H. R., et al. (2017). Mutational landscape of metastatic cancer revealed from prospective clinical sequencing of 10,000 patients. Nature Medicine, 23(6), 703–713.
- Studer, R. A., Dessailly, B. H., & Orengo, C. A. (2013). Residue mutations and their impact on protein structure and function: detecting beneficial and pathogenic changes. Biochemical Journal, 449(3), 581–594.
- Protein Structure Analysis Task
- Go over the protein structure analysis task from Tuesday. All of you did a great job!
- UniProt slides
- UniProt introduction video, UniProt Feature viewer video
- UniProt activity
- Alignment slides
- Afternoon activity
- We start at 9:30
- Oct 4 slides
- Oct 4 activity
- Morning: Presentations on your protein
- Biological relevance of your protein
- Experimental details/methods
- 2 key findings
- Position of mutations on protein structure (structure figure)
- Discussion
- Afternoon: BLAST task
- 2018-10-09 (Tue), 13:00 - 17:00 (slides)
- Ontologies for metadata annotations (very brief introduction)
- Privacy, security, society - implications of availability & possible re-identification of genome data
- long-range familial identification
- principles of Beacon-style re-identification attacks
- "ease" of field sequencing (MinION etc.)
- 2018-10-10 (Wed), 09:00 - 14:30
- preparation/recap time in the morning
- Written exam (13:00 - 14:30)
- multiple-choice and free-form questions