---
output:
  html_document: default
  pdf_document: default
---
# Abundance Estimation
## Format check
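Review the layout of a .gff file: https://en.wikipedia.org/wiki/General_feature_format

Exercise: Which genome/annotation pair in /home/data/de-2403/exercise_genomes/ has mismatched headers, that is, sequence names in the FASTA file that do not match the sequence names in the first column of the GFF file?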
<!-- ASM674v1 has .1 in the fasta but not in the gff -->
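
One way to approach a header check like this is to list the sequence names on each side and compare the two lists. The following is a minimal sketch on toy files, not the exercise data: the file names, sequence names, and the `workdir` variable are all made up for illustration.

```bash
# Toy files standing in for a genome/annotation pair (hypothetical names and contents)
workdir=$(mktemp -d)
printf '>1 toy sequence\nACGT\n>2 toy sequence\nGGCC\n' > "$workdir/toy.fasta"
printf '##gff-version 3\nchr1\ttoy\tgene\t1\t4\t.\t+\t.\tID=gene1\nchr2\ttoy\tgene\t1\t4\t.\t-\t.\tID=gene2\n' > "$workdir/toy.gff"

# Sequence names in the FASTA: header lines without ">" and without the description
grep '^>' "$workdir/toy.fasta" | sed 's/^>//' | cut -d ' ' -f 1 | sort -u > "$workdir/fasta_ids.txt"

# Sequence names in the GFF: column 1 of every non-comment line
grep -v '^#' "$workdir/toy.gff" | cut -f 1 | sort -u > "$workdir/gff_ids.txt"

# Names used by the annotation that are absent from the FASTA
missing=$(comm -13 "$workdir/fasta_ids.txt" "$workdir/gff_ids.txt")
echo "$missing"
```

If the two files agree, `missing` is empty. Any name printed here is a sequence the annotation refers to that the genome file does not contain under that name.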
## FeatureCounts
### Generate counts from the alignments directory {-}
Make sure the annotation is unzipped first.
```{bash, eval=FALSE}
source activate featurecounts
featureCounts -a ~/DGE_workshop/genome/GCF_000001635.27_GRCm39_genomic.gff \
  -o /home/$USER/DGE_workshop/count_matrix.tsv \
  --largestOverlap -M --primary -T 10 -g gene *.bam
```
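featureCounts writes an extra comment line at the top of the output (recording the program and the command that was run) and adds several annotation columns (Chr, Start, End, Strand, Length). We need to remove both before doing differential expression analysis.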
```{bash, eval=FALSE}
less /home/$USER/DGE_workshop/count_matrix.tsv
```
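
Before choosing which columns to keep, it can help to print the header with a number next to each column name. This is a minimal sketch on a toy table that imitates the featureCounts layout. The sample names, counts, and the `toy` variable are invented for illustration.

```bash
# Toy table imitating the layout of featureCounts output (hypothetical samples and counts)
toy=$(mktemp)
printf '# toy comment line\n' > "$toy"
printf 'Geneid\tChr\tStart\tEnd\tStrand\tLength\tsampleA.bam\tsampleB.bam\n' >> "$toy"
printf 'gene1\tchr1\t1\t100\t+\t100\t12\t7\n' >> "$toy"

# Print each column name of the header (line 2) with its column number
header_cols=$(awk -F'\t' 'NR == 2 { for (i = 1; i <= NF; i++) print i, $i }' "$toy")
echo "$header_cols"
```

In this layout the count columns start at column 7, after Geneid and the five annotation columns.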
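Use tail -n +2 to keep every row from the second onward, which drops the comment line, then use cut to extract only the columns of interest: the gene ID (column 1) and the per-sample counts (columns 7-12, assuming six BAM files).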
```{bash, eval=FALSE}
tail -n +2 /home/$USER/DGE_workshop/count_matrix.tsv | cut -f 1,7-12 > DESEQ2_matrix.tsv
```
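
To see what the trimming step does, here is a minimal sketch on a toy table with two made-up samples. It uses the open-ended range `7-` rather than a fixed last column, a variant that keeps every count column however many BAM files were counted. The toy contents and the `toy` and `trimmed` variables are assumptions for illustration.

```bash
# Toy table imitating the layout of featureCounts output (hypothetical samples and counts)
toy=$(mktemp)
printf '# toy comment line\n' > "$toy"
printf 'Geneid\tChr\tStart\tEnd\tStrand\tLength\tsampleA.bam\tsampleB.bam\n' >> "$toy"
printf 'gene1\tchr1\t1\t100\t+\t100\t12\t7\n' >> "$toy"
printf 'gene2\tchr1\t200\t300\t-\t101\t0\t3\n' >> "$toy"

# Drop the comment line, then keep the gene ID and every column from the 7th onward
trimmed=$(tail -n +2 "$toy" | cut -f 1,7-)
echo "$trimmed"
```

The result keeps the header row, so each count column is still labelled with the BAM file it came from.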
```{bash, eval=FALSE}
# -p avoids an error if the directory already exists
mkdir -p /home/$USER/DGE_workshop/DESeq2
mv DESEQ2_matrix.tsv /home/$USER/DGE_workshop/DESeq2/
```