Yaobo Xu edited this page Jun 15, 2020 · 20 revisions

cgpNgsQc

A collection of code for checking NGS sequencing results.

Executables

  • compareBamGenotypes.pl

    For usage: compareBamGenotypes.pl -h.

    Compares the genotypes of a set of BAM files from the same donor and reports the fraction of matched genotypes. It also checks whether the inferred genders match.

  • verifyBamHomChk.pl

    For usage: verifyBamHomChk.pl -h.

    Runs verifyBamID.

  • validate_sample_meta.pl

    For usage: validate_sample_meta.pl -h.

    Validates sample metadata and the corresponding BAM files. On successful validation, UUIDs are assigned and an md5sum is produced for each BAM file.

    If the BAM header also satisfies the pan-prostate BAM header requirements (the SQ and CL lines are checked for this), the file is labelled 'pp-remapped'; otherwise it is labelled 'raw'.

    A successful validation requires the following fields in the input file:

    • Donor ID
    • Tissue ID
    • Sample ID (should be unique as well)
    • is_normal (Y/N or Yes/No; indicates whether the sample is a normal sample or not, i.e. a tumour sample)
    • is_normal_for_donor (Y/Yes or blank; indicates whether the sample is used as the matched normal for all other samples of the donor)
    • relative_file_path (an existing BAM file with at least a million reads that satisfies the requirements below)
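The field rules above can be sketched as a small check over metadata rows. This is only an illustration in Python, not the logic of validate_sample_meta.pl: the function name `validate_rows`, the representation of rows as dictionaries, and the case-insensitive matching of Y/N values are all assumptions. The read-count and BAM header checks are not covered here.

```python
import os

# Field names come from the list above. Everything else (function name,
# dict-based rows, case-insensitive value matching) is an assumption.
REQUIRED_FIELDS = [
    "Donor ID", "Tissue ID", "Sample ID",
    "is_normal", "is_normal_for_donor", "relative_file_path",
]
IS_NORMAL_VALUES = {"y", "n", "yes", "no"}
NORMAL_FOR_DONOR_VALUES = {"y", "yes", ""}


def validate_rows(rows, base_dir=".", check_files=True):
    """Return a list of human-readable problems found in the metadata rows."""
    problems = []
    seen_sample_ids = set()
    for number, row in enumerate(rows, start=1):
        missing = [f for f in REQUIRED_FIELDS if f not in row]
        if missing:
            problems.append(f"row {number}: missing fields {missing}")
            continue
        # These fields must not be blank.
        for field in ("Donor ID", "Tissue ID", "Sample ID", "relative_file_path"):
            if not row[field].strip():
                problems.append(f"row {number}: {field} is empty")
        # Sample IDs must be unique across the input.
        sample_id = row["Sample ID"].strip()
        if sample_id:
            if sample_id in seen_sample_ids:
                problems.append(f"row {number}: duplicate Sample ID {sample_id!r}")
            seen_sample_ids.add(sample_id)
        if row["is_normal"].strip().lower() not in IS_NORMAL_VALUES:
            problems.append(f"row {number}: is_normal must be Y/N or Yes/No")
        if row["is_normal_for_donor"].strip().lower() not in NORMAL_FOR_DONOR_VALUES:
            problems.append(f"row {number}: is_normal_for_donor must be Y/Yes or blank")
        # Only checks that the file exists, not its read count or header.
        path = row["relative_file_path"].strip()
        if check_files and path and not os.path.isfile(os.path.join(base_dir, path)):
            problems.append(f"row {number}: file not found: {path}")
    return problems
```

An empty return value means no problems were found in the fields this sketch covers.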

    Bam requirements:

    • The BAM must have Read Group (RG) line(s). Each RG line must have an ID tag, a library (LB) tag, a platform (PL) tag and a sample (SM) tag
    • Each ID tag value must be unique, and reads assigned to that ID must exist in the BAM
    • The PL tag value must be 'ILLUMINA', as we don't support other platforms yet
    • The SM tag value must be the corresponding Sample ID
    • All reads in the BAM must have an RG ID assigned, and the assigned ID must be declared in one of the RG lines in the header
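As a rough illustration of the read group rules above, the following Python sketch checks the @RG lines of a SAM header supplied as plain text (for example, the output of `samtools view -H`). It is not the implementation used by validate_sample_meta.pl. The function names are hypothetical, the RG IDs seen on reads are assumed to be collected separately and passed in, and exact case-sensitive matching of 'ILLUMINA' is an assumption.

```python
REQUIRED_RG_TAGS = ("ID", "LB", "PL", "SM")


def parse_rg_lines(header_text):
    """Return one dict of tag -> value per @RG line in a SAM header."""
    read_groups = []
    for line in header_text.splitlines():
        if not line.startswith("@RG"):
            continue
        tags = {}
        for field in line.split("\t")[1:]:
            key, sep, value = field.partition(":")
            if sep:
                tags[key] = value
        read_groups.append(tags)
    return read_groups


def check_read_groups(header_text, sample_id, read_rg_ids):
    """Check RG rules; read_rg_ids holds the RG ID of each read (None if absent)."""
    read_groups = parse_rg_lines(header_text)
    if not read_groups:
        return ["no @RG lines in header"]
    problems = []
    declared = []
    for index, tags in enumerate(read_groups, start=1):
        missing = [t for t in REQUIRED_RG_TAGS if not tags.get(t)]
        if missing:
            problems.append(f"@RG line {index}: missing tags {missing}")
        if tags.get("ID"):
            declared.append(tags["ID"])
        # Assumption: the platform value is compared exactly, case-sensitively.
        if tags.get("PL") and tags["PL"] != "ILLUMINA":
            problems.append(f"@RG line {index}: PL must be 'ILLUMINA'")
        if tags.get("SM") and tags["SM"] != sample_id:
            problems.append(f"@RG line {index}: SM does not match Sample ID")
    for rg_id in sorted({i for i in declared if declared.count(i) > 1}):
        problems.append(f"RG ID {rg_id!r} is declared more than once")
    seen = set(read_rg_ids)
    if None in seen:
        problems.append("some reads have no RG ID")
        seen.discard(None)
    for rg_id in sorted(seen - set(declared)):
        problems.append(f"reads use undeclared RG ID {rg_id!r}")
    for rg_id in sorted(set(declared) - seen):
        problems.append(f"no reads found for declared RG ID {rg_id!r}")
    return problems
```

Passing the header and the read-level IDs as plain values keeps the sketch independent of any particular BAM library; how those values are extracted from the file is left open.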