Proposed Improvements
- Each module should have at least two exercises where the students are not copying/pasting anything. One could be at 1/2 way point, and the other at the end of the module. The one at the end could be a group exercise.
- Each day should end with at least one hour of integrated assignment.
The STAR aligner currently throws errors when run on an Ubuntu system because /bin/sh is linked to /bin/dash and not to /bin/bash. The following commands check the current link, back it up, and repoint /bin/sh to bash:
ls -l /bin/sh
sudo mv /bin/sh /bin/sh.orig
sudo ln -s /bin/bash /bin/sh
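Before relinking, it may be worth confirming what /bin/sh actually resolves to, since the change is only needed when it points at dash. The following is a minimal sketch, not part of the original setup notes: the helper names `resolve_shell` and `points_to_bash` are made up for illustration, and it assumes a `readlink` that supports `-f` (as GNU readlink on Ubuntu does).

```shell
# Print the final target of a path, following every symlink along the way.
resolve_shell() {
    readlink -f "$1"
}

# Succeed only if the path ultimately resolves to a binary named "bash".
points_to_bash() {
    [ "$(basename "$(resolve_shell "$1")")" = "bash" ]
}

# Report on the system shell without changing anything.
if points_to_bash /bin/sh; then
    echo "/bin/sh resolves to bash; no relinking needed"
else
    echo "/bin/sh resolves to $(resolve_shell /bin/sh); relinking may be needed for STAR"
fi
```

The check is read-only, so it can be run safely on any instance before deciding whether to apply the `mv` and `ln -s` commands.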
Pre-install the tree command in the Amazon AMI so that it is ready for students to use.
For htseq-count to read BAM files directly, it needs pysam. pysam can be installed with pip, but pip is not available by default.
On the AMI, install pip and pysam, then test htseq-count with BAM files:
sudo apt-get install python-pip
sudo pip install pysam
An alternative install procedure, which has been tested and works, is as follows. The pip procedure above is preferred and should be tried first.
cd ~/bin/
wget https://pysam.googlecode.com/files/pysam-0.7.5.tar.gz
tar -zxvf pysam-0.7.5.tar.gz
cd pysam-0.7.5/
python setup.py build
sudo python setup.py install
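Whichever install route is used, a quick import check can confirm that pysam is visible to Python before htseq-count is tested. This is a minimal sketch: the helper name `has_python_module` is made up for illustration, and it assumes that `python` is the interpreter htseq-count runs under.

```shell
# Succeed if the given Python interpreter can import the given module.
# Usage: has_python_module <interpreter> <module>
has_python_module() {
    "$1" -c "import $2" >/dev/null 2>&1
}

# Assumption: "python" is the interpreter that htseq-count uses on the AMI.
if has_python_module python pysam; then
    echo "pysam is importable; htseq-count should be able to read BAM input"
else
    echo "pysam is not importable; BAM input to htseq-count will likely fail"
fi
```

Once the check passes, BAM input can be requested with `htseq-count -f bam`, for example `htseq-count -f bam sample.bam genes.gtf > sample_counts.tsv`, where the file names are placeholders.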
For R and other applications it would be nice if X11 worked. Note the install instructions for R would need to change as well.
Create a wiki section and exercise that summarizes read trimming concepts. Start with some raw data, including aligned reads. Align these reads without any trimming and assess alignment statistics using Picard, FastQC, etc. Now take these same reads and perform both adaptor trimming and quality trimming. Re-align the trimmed reads and assess the effect of trimming on alignment metrics.
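One lightweight way to compare the untrimmed and trimmed alignments is to pull the overall mapped percentage out of `samtools flagstat` output. Note that flagstat is swapped in here as a quick check and is not one of the tools named above (Picard, FastQC). The helper name `mapped_pct` and the file names in the comments are hypothetical.

```shell
# Extract the overall mapped percentage from `samtools flagstat` text on stdin.
# Relies on the summary line of the form: "<n> + <m> mapped (<pct>% : N/A)".
# Lines such as "primary mapped" are skipped because their 4th field differs.
mapped_pct() {
    awk '$4 == "mapped" { gsub(/[(%]/, "", $5); print $5; exit }'
}

# Hypothetical usage (placeholder file names, not files from the course):
#   samtools flagstat untrimmed.bam | mapped_pct
#   samtools flagstat trimmed.bam   | mapped_pct
```

Comparing the two numbers gives students a single headline metric before they dig into the fuller Picard and FastQC reports.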
Add installation and running of RSeQC. This could possibly replace Samstat, which has never worked very well.
We should add a section about batch effects, covering both how to detect them and how to correct for them during analysis.
For convenience the cloud instances have been set up with very permissive security. Some better practices should be documented.
We previously had a fusion detection module, but it was difficult to complete in time frames appropriate for a workshop. Further optimization is required. Another challenge is the lack of well-engineered fusion detection software. The publication "State-of-the-art fusion-finder algorithms sensitivity and specificity" does a decent job of summarizing the options currently available. Another caveat is that this topic is mostly of interest to cancer researchers, so it might only be included where there are sufficient students with this interest.
In particular we should add use of Picard CollectRnaSeqMetrics (https://broadinstitute.github.io/picard/command-line-overview.html) and RNA-SeQC (http://www.broadinstitute.org/cancer/cga/rna-seqc). It would also be good to include use of splicing metrics calculated from the TopHat junctions files. A standalone version of the TGI tool that does this would need to be created for this purpose.
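As a rough idea of what the Picard step could look like, the sketch below assembles a CollectRnaSeqMetrics command. It is an illustration only: the `run` and `collect_rnaseq_metrics` helpers, the `PICARD_JAR` and `DRY_RUN` variables, and all file names are assumptions, and `STRAND_SPECIFICITY=NONE` assumes an unstranded library. By default the command is printed rather than executed, so the sketch runs without Java, Picard, or input files.

```shell
# Print the command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: collect_rnaseq_metrics <input.bam> <output.txt> <refFlat.txt>
# REF_FLAT is a gene annotation in refFlat format; all paths are placeholders.
collect_rnaseq_metrics() {
    run java -jar "${PICARD_JAR:-picard.jar}" CollectRnaSeqMetrics \
        I="$1" \
        O="$2" \
        REF_FLAT="$3" \
        STRAND_SPECIFICITY=NONE
}

collect_rnaseq_metrics sample.bam sample.rna_metrics.txt refFlat.txt
```

Setting `DRY_RUN=0` and pointing `PICARD_JAR` at a real jar would execute the command; the strand setting would need to match the library preparation actually used in the course data.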
There are some nice slides/concepts that we could borrow from the BaseSpace Demo slides (see Obi's ~/Dropbox Teaching/CSHL/2015/Workshop-CSHL-RNA-Seq-Metagenomics.pdf).
Can expand on the current Kallisto exercise.
For example, pathway analysis of RNA-seq data, clustering, etc.
http://www.ncbi.nlm.nih.gov/gds/?term=rna-seq+splicing
Gray lab breast cancer cell line dataset:
- http://www.ncbi.nlm.nih.gov/pubmed/24176112
- https://www.biostars.org/p/111040/ (biostars tutorial on downloading data)
- https://github.com/genome/gms/wiki/Guide-to-Importing-and-Analyzing-External-Data (another guide on downloading and reformatting this data)
Update the tutorial to take into account recent developments in RNA-seq analysis methods, best practices, and new tools
Perhaps HISAT2 can be used instead of TopHat/STAR. Using only a single aligner would save time for exploring more concepts downstream. Not sure of the value of using multiple alignments anyway. Time to shift focus a bit more towards the downstream stuff as alignment and QC become more routine.
Also, the new "Tuxedo" protocol paper is out demonstrating an equivalent HISAT2/StringTie/Ballgown workflow: Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown
After use of TopHat2 or RNA-star (or hopefully HISAT2), QoRTs (written in Scala) performs QC but also processes RNA-seq data to produce count files needed for splicing analysis by DEXSeq and JunctionSeq.
##Note: The current version of this tutorial is now at www.rnaseq.wiki