Quality Control software for Diagnosing RNA-seq data

hamsamilton/QCDR

Quality Control software for Diagnosing RNA-seq (QC DR)

QCDR

This is a software package designed to facilitate RNA-seq quality control (QC) analysis by creating visualizations of key QC metrics and estimating Warn and Fail cutoffs for them.

This software is currently in alpha, made available early to align with ISCB 2023. Expect updates as we clean up the interface and fix any potential bugs. If you run into any bugs using this software, please leave a report for us. Additional information on methods will be published along with the paper in the future.

QCDR is written in Python and designed to be run from the command line. For ease of use, QCDR comes with a containerized environment, which can be activated using

source activate QCDRenv

Once the environment is activated, the only file that needs to be filled in is the input file. In the simplest case, the User_Input.csv file in the data/User_Template folder should be filled with information about the user's data. If a user does not wish to supply data for a given metric, that column can be left blank. Once the file is filled in, the main command can be run by navigating to the software folder and entering

python3 pyRetroPlotter_main.py -ip ../data/User_Template/User_Input.csv -out /path/to/outlocation.pdf -bgd ../data/User_Template/User_Input.csv
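
Before running the command, it can help to confirm which metric columns in the input file were left blank. The following is a minimal sketch and not part of QCDR: the function name blank_columns and the sample column names are hypothetical, and it assumes the input is a plain CSV file with a header row.

```python
import csv
import io


def blank_columns(csv_file):
    """Return the header names whose cells are empty in every data row."""
    reader = csv.DictReader(csv_file)
    headers = reader.fieldnames or []
    blank = set(headers)
    for row in reader:
        for name in headers:
            value = row.get(name)
            # A column stops being "blank" as soon as one cell has content.
            if value is not None and value.strip():
                blank.discard(name)
    # Preserve the header order of the file.
    return [name for name in headers if name in blank]


# Hypothetical example data; real column names come from User_Input.csv.
example = io.StringIO(
    "Sample,MetricA,MetricB\n"
    "s1,0.91,\n"
    "s2,0.88,\n"
)
print(blank_columns(example))
```

To check a real file, open it with open(path, newline="") and pass the file object to blank_columns.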

If the user would like a different dataset to be used as the background distribution, it can be specified with the -bgd option. For example:

python3 pyRetroPlotter_main.py -ip ../data/User_Template/User_Input.csv -out /path/to/outlocation.pdf -bgd ../data/SCRIPT/SCRIPT_stats_allbatches.csv

Because the QC metric distributions used in QCDR are generally non-normal, the Warn and Fail cutoffs are estimated by bootstrap sampling the distribution before estimating a confidence interval. By default, the Fail and Warn cutoffs are set at the .05 and .1 significance levels of that confidence interval, respectively. These can be changed with the -f and -w options. For example, the following command shifts the Warn cutoff to .01 and the Fail cutoff to .001:

python3 pyRetroPlotter_main.py -ip ../data/User_Template/User_Input.csv -out /path/to/outlocation.pdf -bgd ../data/User_Template/User_Input.csv -w .01 -f .001
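
The idea of bootstrapping a cutoff from a non-normal background distribution can be sketched as follows. This is an illustrative approximation only, not QCDR's actual implementation: the function name bootstrap_cutoff, the use of a lower-tail quantile, the averaging of resampled quantiles, and the number of resamples are all assumptions, and the background values are made up.

```python
import random
import statistics


def quantile(values, q):
    """Linear-interpolation quantile of a non-empty list, 0 <= q <= 1."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def bootstrap_cutoff(background, alpha, n_boot=1000, seed=0):
    """Estimate a lower-tail cutoff at significance level alpha.

    Resamples the background with replacement, takes the alpha quantile
    of each resample, and returns the mean of those quantiles.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        resample = rng.choices(background, k=len(background))
        estimates.append(quantile(resample, alpha))
    return statistics.mean(estimates)


# Hypothetical background values for a metric where low values are bad.
background = [0.95, 0.93, 0.97, 0.90, 0.60, 0.92, 0.96, 0.94, 0.91, 0.89]
warn = bootstrap_cutoff(background, alpha=0.1)
fail = bootstrap_cutoff(background, alpha=0.05)
print(f"warn below {warn:.3f}, fail below {fail:.3f}")
```

Because both calls use the same seed, they draw identical resamples, so the Fail cutoff never exceeds the Warn cutoff in this sketch.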

QCDR can create two additional subfigures when supplied with additional tables: the Gene Distribution figure and the Gene Body Coverage figure.

To generate the Gene Distribution figure, the user may use the included mk_hist_frm_cnt_tbl.R script to generate the required file from a count matrix. An example of the count matrix required by this script is included at /data/SCRIPT/SCRIPT_CountTable.xlsx. Files like this can be transformed into the required input format with

Rscript mk_hist_frm_cnt_tbl.R -f ../data/SCRIPT/SCRIPT_CountTable.xlsx -o /path/to/output

Once created, the file can be passed to the main command with the -hist option:

python3 pyRetroPlotter_main.py -ip ../data/User_Template/User_Input.csv -out /path/to/outlocation.pdf -bgd ../data/User_Template/User_Input.csv -hist /histdataloc.csv

A separate table is also required to generate the Gene Body Coverage subplot. We have built a separate but interoperable utility, ezGBC, to generate these tables. Download and usage instructions can be found at https://github.com/hamsamilton/ezGBC. The CSV files created by ezGBC can be used directly as input. An example file can be found at data/SCRIPT/SCRIPT_B11_GC_info.csv. Once generated, the table can be added to the QCDR output like this:

python3 pyRetroPlotter_main.py -ip ../data/User_Template/User_Input.csv -out /path/to/outlocation.pdf -bgd ../data/User_Template/User_Input.csv -gc gcdata.csv
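
When several optional inputs are used, assembling the command programmatically can reduce typos. The sketch below is a hypothetical helper (build_qcdr_command is not part of QCDR) that uses only the flags described above. It assumes the optional flags can be combined in a single call and that the command is launched from the software folder.

```python
def build_qcdr_command(ip, out, bgd, warn=None, fail=None, hist=None, gc=None):
    """Return the argument list for pyRetroPlotter_main.py.

    Optional arguments left as None are omitted from the command, so
    QCDR falls back to its own defaults for them.
    """
    cmd = ["python3", "pyRetroPlotter_main.py",
           "-ip", ip, "-out", out, "-bgd", bgd]
    optional = [("-w", warn), ("-f", fail), ("-hist", hist), ("-gc", gc)]
    for flag, value in optional:
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


cmd = build_qcdr_command(
    ip="../data/User_Template/User_Input.csv",
    out="/path/to/outlocation.pdf",
    bgd="../data/SCRIPT/SCRIPT_stats_allbatches.csv",
    hist="/histdataloc.csv",
    gc="gcdata.csv",
)
print(" ".join(cmd))
# To launch it: import subprocess; subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems when paths contain spaces.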

Lastly, cutoffs can be set to specific values by filling in the data/User_Template/user_cutoff_table.xlsx file. For all metrics except the hist and gc cutoffs, these should be raw values; for the hist and gc cutoffs, they must be significance levels. If the user does not want to set a cutoff for a metric, its cells can be left blank, in which case QCDR uses the default cutoffs or those set by the -w and -f options. The table should be passed with the -ctf flag, like so:

python3 pyRetroPlotter_main.py -ip ../data/User_Template/User_Input.csv -out /path/to/outlocation.pdf -bgd ../data/User_Template/User_Input.csv -ctf USER_cutoff_table.xlsx
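
The fallback order for a blank cell can be pictured as a simple per-metric lookup. This is a sketch of the logic only, not QCDR's code: the function name resolve_cutoff, the (source, value) return format, and the metric names are hypothetical. The .05 default is the Fail level stated earlier in this README.

```python
def resolve_cutoff(table_value, flag_level, default_level):
    """Pick the cutoff source for one metric and one threshold (Warn or Fail).

    Returns a (source, value) pair: a filled table cell wins, then a
    level given on the command line, then the built-in default level.
    """
    if table_value is not None and str(table_value).strip() != "":
        return ("table", float(table_value))
    if flag_level is not None:
        return ("flag", flag_level)
    return ("default", default_level)


# Hypothetical metrics; None or "" stands for a blank cell in the table.
fail_cells = {"MetricA": 0.75, "MetricB": "", "MetricC": None}
for metric, cell in fail_cells.items():
    print(metric, resolve_cutoff(cell, flag_level=0.001, default_level=0.05))
```

Here MetricA takes its value from the table, while MetricB and MetricC fall back to the level given with -f.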

This covers the capabilities of QCDR. If you are having difficulties, encounter bugs, or have other feedback about the software, please email me at [email protected]
