RNASeq DE Analysis

Differential Expression analysis of RNAseq data in Loblolly Pine

Abstract
Background of samples
Location of data
Methods

Abstract

The NCSU TIP has been established for over 60 years with the objective of increasing genetic value of family-level selections for production deployment. Given the size and resources required to conduct tree breeding, experiments and selections of families used for deployment occur across thousands of acres in the Southeastern United States.
This study examines the number and magnitude of differentially expressed transcripts among a set of 50+ families that span a wide geographic range. Additionally, for those families with phenotypes, we assess transcripts that may be differentially expressed between the extreme ends of the phenotypic distribution.

Background of samples

A total of 144 biological replicates were grown from a group of families subset from the lower gulf elite population (LGEP), and 80 biological replicates were grown from the east-west diallel (EW) experiment.
LGEP
- 144 Biological Replicates x 3 technical replicates = 432 technical replicates
EW
- 80 Biological Replicates x 3 technical replicates = 240 technical replicates
Note that some EW samples had to be dropped after sequencing due to contamination, so the total number of biological replicates analyzed is less than 80 + 144 = 224.
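The replicate arithmetic above can be checked with a short sketch. The variable and function names here are illustrative and are not taken from the project scripts; the totals are for sequenced replicates, before any EW samples were dropped.

```python
# Technical replicates per batch, using the counts stated above.
TECH_REPS_PER_BIO_REP = 3

# Biological replicates grown per batch (before any samples were dropped).
bio_reps = {"LGEP": 144, "EW": 80}


def technical_replicates(n_bio_reps, per_bio_rep=TECH_REPS_PER_BIO_REP):
    """Return the number of technical replicates for one batch."""
    return n_bio_reps * per_bio_rep


tech_reps = {batch: technical_replicates(n) for batch, n in bio_reps.items()}
total_sequenced = sum(tech_reps.values())
```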

Location of data

All raw data are stored in the same locations as noted in the breeding value prediction project but are provided again here for consistency.

Data Subject Type	Data File Type	Path	Notes
raw read files		`/media/disk6/ARF/RNASEQ/shared/rawreads/86kSalmon`	Raw files returned from GSL
	raw tar	`./EWtarfiles or ./LGEPtarfiles`
	raw fasta	`./EWfasta or ./LGEPfasta`
trimmed and filtered read files		`/media/disk6/ARF/RNASEQ/shared/trimmedfiltreads/86k`	Files after trimming and adapter removal
	EW	`./EW/lane01 ... ./lane12`
	LGEP	`./LGEP/lane01 ... ./lane18`
salmon count files		`/media/disk6/ARF/RNASEQ/shared/counts/86kSalmon`	Directories containing quant.sf files
	EW bio reps	`./bio_EW/Sample_<animal_id>/`
	LGEP bio reps	`./bio_LGEP/Sample_<animal_id>/`
experimental data resources		`/media/disk6/ARF/RNASEQ/shared/resources`	Experiment information
	sequencing	`./exptdesign/sequencing`
	pedigree	`./pedigree`
	phenotypes	`./phenos`
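As a minimal sketch of how the Salmon count layout in the table could be addressed from a script, the helper below builds the expected path to one replicate's `quant.sf` file. The function name `quant_path` and the sample ID used in the usage note are hypothetical, and the sketch assumes that `quant.sf` sits directly inside each `Sample_<animal_id>` directory.

```python
from pathlib import PurePosixPath

# Root of the Salmon count directories, as listed in the table above.
COUNTS_ROOT = PurePosixPath("/media/disk6/ARF/RNASEQ/shared/counts/86kSalmon")


def quant_path(batch, sample_id):
    """Build the expected path to one biological replicate's quant.sf file.

    batch is "EW" or "LGEP"; sample_id stands in for <animal_id>.
    Assumes quant.sf sits directly inside the Sample_<animal_id> directory.
    """
    if batch not in ("EW", "LGEP"):
        raise ValueError(f"unknown batch: {batch!r}")
    return COUNTS_ROOT / f"bio_{batch}" / f"Sample_{sample_id}" / "quant.sf"
```

For example, `quant_path("EW", "1234")` points into `bio_EW/Sample_1234/`, where `1234` is a made-up ID.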

All scripts and markdown files relating to this project are stored within this repository.

Methods

Step 1 - Data Prep

Data prep includes everything from unpacking the original tar files received from the GSL up to estimating transcript abundance with Salmon. Additionally, this step identifies the indices used within both batches and creates an experimental info matrix containing all metadata from both batches.

See the raw reads README for step by step processing of files.

Step 2 - Load Count Data

Once counts have been estimated, the next step involves reading in the aligned biological replicate counts using the tximport package.

Additionally, the phenotype and other sample metadata are constructed for normalization.

To see this process for biological replicates, see the load counts bio rep HTML file, which contains the complete markdown and output.
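The import in this step is done in R with tximport. As a rough illustration only of what each `quant.sf` file holds, the Python sketch below reads the estimated read counts from a table in Salmon's `quant.sf` column layout (`Name`, `Length`, `EffectiveLength`, `TPM`, `NumReads`). The function name and the example rows are made up, and this is not a substitute for tximport.

```python
import csv
import io


def read_num_reads(handle):
    """Map transcript name -> estimated read count from a quant.sf-style table."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["Name"]: float(row["NumReads"]) for row in reader}


# Tiny made-up example in the quant.sf column layout (not real data).
example = io.StringIO(
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "tx1\t1000\t850.0\t12.5\t40.0\n"
    "tx2\t500\t350.0\t3.1\t4.0\n"
)
counts = read_num_reads(example)
```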

Step 3 - Conduct pairwise DE tests among families

An initial filter was applied so that only families with 3 or more biological replicates were kept for further analysis. The 57 families that passed this threshold were then used with the R package DESeq2 to test for differential expression among all pairs of families.
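The replicate filter can be sketched as follows. This is a hypothetical Python illustration with made-up sample and family names; the project applies the filter in its own R markdown files.

```python
from collections import Counter

MIN_BIO_REPS = 3  # threshold stated in the text


def families_passing_filter(sample_to_family, min_reps=MIN_BIO_REPS):
    """Return the sorted families with at least min_reps biological replicates."""
    reps_per_family = Counter(sample_to_family.values())
    return sorted(fam for fam, n in reps_per_family.items() if n >= min_reps)


# Made-up sample -> family assignments, for illustration only.
samples = {
    "s1": "famA", "s2": "famA", "s3": "famA",
    "s4": "famB", "s5": "famB",
    "s6": "famC", "s7": "famC", "s8": "famC", "s9": "famC",
}
kept = families_passing_filter(samples)
```

Here `famB` is dropped because it has only two biological replicates.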

The result of this step is a list of 1596 comparisons. Each comparison contains the results from DESeq2 between family X vs. family Y.
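The number of comparisons follows from taking every unordered pair of the 57 families: 57 × 56 / 2 = 1596. A minimal sketch, using placeholder family labels rather than the real family IDs:

```python
from itertools import combinations
from math import comb

N_FAMILIES = 57  # families that passed the replicate filter

# Placeholder labels; the real family identifiers are not reproduced here.
families = [f"family_{i:02d}" for i in range(1, N_FAMILIES + 1)]

# Every unordered pair (family X, family Y) is one DE comparison.
pairs = list(combinations(families, 2))
n_pairs = len(pairs)
```

`n_pairs` equals `comb(57, 2)`, which matches the 1596 comparisons reported above.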

Generate pairwise DE tests: markdown file

Step 4 - Conduct DE tests among State of origin

Similar to the pairwise tests among all families, the same filter was applied so that only families with 3 or more biological replicates were kept for further analysis. However, an additional filter was applied here to remove all families that were not open-pollinated. The 41 families that passed both filters were then used with the R package DESeq2 to test for differential expression among the following states: NC, SC, GA, FL, and TX.

The result of this step is a list of 840 comparisons. Each comparison contains the results from DESeq2 between family X vs. family Y.

Generate State origin DE tests: markdown file

Step 5 - Conduct DE contrast between high and low volume

Families that had 3 or more biological replicates and a volume breeding value were kept for this analysis. The bottom and top quartiles of the volume phenotypic distribution were used with DESeq2 to generate a contrast between high and low volume breeding value families.

The result of this step is a single comparison between 10 low volume families and 11 high volume families.
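The quartile split can be sketched as below. The breeding values and family names are made up, and the exact quartile definition and tie handling used in the project are not stated here, so this sketch assumes the default method of Python's `statistics.quantiles`.

```python
from statistics import quantiles


def split_extremes(breeding_values):
    """Return (low, high) family lists from the bottom and top quartiles.

    breeding_values maps family -> volume breeding value.
    Assumes the default ("exclusive") quartile method of statistics.quantiles.
    """
    q1, _median, q3 = quantiles(breeding_values.values(), n=4)
    low = sorted(f for f, v in breeding_values.items() if v <= q1)
    high = sorted(f for f, v in breeding_values.items() if v >= q3)
    return low, high


# Made-up breeding values, for illustration only.
example_bv = {f"fam{i}": float(i) for i in range(1, 9)}
low_families, high_families = split_extremes(example_bv)
```

The two resulting groups would then serve as the low and high levels of the DESeq2 contrast.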

Generate High vs Low vol DE test: markdown file
