05-Abundancy_Estimation.Rmd

---
output:
  html_document: default
  pdf_document: default
---
# Abundance Estimation

## Format check

Look at the format for a .gff file: https://en.wikipedia.org/wiki/General_feature_format

Exercise: Which genome/annotation pair in /home/data/de-2403/exercise_genomes/ has mismatched headers?

<!-- ASM674v1 has .1 in the fasta but not in the gff -->

## FeatureCounts

### Generate counts from the alignments directory {-}

Make sure the annotation is unzipped first.

```{bash, eval=FALSE}

source activate featurecounts

featureCounts -a ~/DGE_workshop/genome/GCF_000001635.27_GRCm39_genomic.gff -o /home/$USER/DGE_workshop/count_matrix.tsv --largestOverlap -M --primary -T 10 -g gene *.bam
```

FeatureCounts put in an extra row and some columns we want to get rid of before doing differential expression analysis.

```{bash, eval=FALSE}
less count_matrix.tsv
```

Use tail to take every row starting with the second, then extract only the columns of interest.

```{bash, eval=FALSE}
tail -n +2 count_matrix.tsv | cut -f 1,7-12 > DESEQ2_matrix.tsv
```

```{bash, eval=FALSE}
mkdir /home/$USER/DGE_workshop/DESeq2
mv DESEQ2_matrix.tsv /home/$USER/DGE_workshop/DESeq2/
```