
Pisces 5.2.7 Design Document

tamsen edited this page Sep 24, 2018 · 3 revisions

Overview

Pisces is a rapid, robust, versatile and accurate variant caller. Pisces runs on Linux or Windows. It runs on germline or tumor-only somatic samples and searches for SNVs, MNVs, and small indels. It takes in .bam files and generates .vcf or .gvcf files. It is included with the Illumina MiSeqReporter pipeline and various BaseSpace workflows. The caller can also be run as a standalone program.

In accordance with best software practices, both for development and evaluation, Pisces is deliberately limited in scope to variant calling. It does not do any BAM pre-processing or any VCF post-processing. However, best clinical practices are very different from best software development practices. For the best analytic results, the variant calling pipeline should include considerably more than the caller alone. This is discussed in https://github.com/Illumina/Pisces/wiki/Suggested-Pipeline-Configuration-5.2.7 .

Variants are detected by examining the CIGAR strings in the BAM file and comparing read sequences to the reference sequence. Pisces does not perform any re-alignment. Because of this approach, Pisces is dependent on the preferences of the upstream aligner in representing complex variants.

At a high level, Pisces identifies candidate alleles in aligned reads, calculates metrics for each candidate, and then calls an allele, either reference or variant. After calling, variants are annotated with additional information, and then outputted in the VCF or gVCF file.

Each allele genotype detected is processed and output as a separate call. In other words, a multi-allelic variant such as A->T/C is represented as two separate variants, A->T and A->C, at the same position. These variants are considered independently because somatic samples frequently have multiple cell line populations. In germline mode, Pisces will pare down the calls to conform to a diploid model, and report all conforming co-located variants on a single VCF line.

Glossary

Pisces Glossary

Configuration

Pisces supports configuration of parameters so that its behavior can be fine tuned depending on the application context.

Format: dotnet Pisces.dll [-options]

Example: dotnet Pisces.dll -bam C:\my\path\to\TestData\example_S1.bam -g C:\my\path\to\WholeGenomeFasta

SDS ID Specification
SDS-1 Pisces shall accept command line arguments as a whitespace-separated list of name and value pairs.
SDS-2 Pisces shall accept the command line arguments listed here: Pisces 5.2.5 Command Line Arguments. If an argument is not a valid input, Pisces shall exit with an error message describing the failed argument and the reason for failure.
SDS-3 If an unknown command line argument is encountered, Pisces shall exit with an error message describing the unknown argument.
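
The behavior in SDS-1 and SDS-3 can be sketched as follows. This is a hypothetical illustration in Python, not the Pisces argument parser: the function name `parse_args` is invented here, and the set of known options is limited to the two that appear in the example above (`-bam` and `-g`).

```python
# Hypothetical sketch of SDS-1 and SDS-3; not the actual Pisces parser.
# Only the two options shown in the example command line are treated as known.
KNOWN_OPTIONS = {"-bam", "-g"}


def parse_args(argv):
    """Parse a whitespace-separated list of name/value pairs into a dict."""
    if len(argv) % 2 != 0:
        raise ValueError("arguments must be given as name/value pairs")
    options = {}
    for name, value in zip(argv[0::2], argv[1::2]):
        if name not in KNOWN_OPTIONS:
            # SDS-3: the error message names the unknown argument.
            raise ValueError(f"unknown argument: {name}")
        options[name] = value
    return options
```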

Input

Pisces requires one or more BAM files, or a parent BAM folder, as input. These BAM files are assumed to be sorted, i.e. alignment reads should be sorted by mapped reference position. This assumption allows Pisces to process BAMs in one pass without a large memory footprint. Most standard aligners produce sorted BAM files. Positions in BAM files are expected to use a 0-based coordinate system. For performance, Pisces requires each BAM file to have an accompanying BAI index file. The index file allows Pisces to jump efficiently to the chromosomes it is configured to call. Pisces also requires a reference genome for each BAM file. Optionally, the user can specify genomic regions of interest for which Pisces should produce allele calls. This is done by providing Picard-style interval files; Picard is a common, industry-standard tool. The format lists intervals, each defined by a chromosome plus start and end positions, inclusive. Positions in interval files are expected to use a 1-based coordinate system.

SDS ID Specification
SDS-4 Pisces shall require at least one and at most 96 sorted BAM file(s) as input or a BAM folder location.
SDS-56 If neither BAM files nor BAM folder are specified, Pisces shall exit with an error message.
SDS-5 For each BAM input file, Pisces shall require an accompanying BAI file with the same file name in the same directory.
SDS-6 If a BAM input file is missing an accompanying BAI file, Pisces shall exit with an error message describing which BAI file is missing.
SDS-7 Pisces shall require one or more valid reference genomes, specified as directory paths, as input. A valid reference genome directory must contain: (1) An xml file named “GenomeSize.xml”, containing the names of all fasta files for the genome. (2) One or more fasta files containing the reference sequences. Fasta files are expected to have a “fa” extension, but this is not enforced. (3) One or more fasta index files. There must be one index file per fasta file, and the index file shall be named the same as the fasta file, but with a “fai” extension.
SDS-8 If an invalid reference genome is specified, Pisces shall exit with an error message describing which genome is invalid.
SDS-9 If only one reference genome directory is specified, Pisces shall apply that reference genome to all bam files.
SDS-10 If more than one reference genome directory is specified, Pisces shall apply the reference genomes to the bam files according to their position in the list, i.e. 1st genome is applied to 1st bam file, 2nd genome is applied to 2nd bam file, etc.
SDS-11 Pisces shall accept Picard-style interval files as input. If only one interval file is provided, Pisces shall apply that interval file to all bam files.
SDS-12 When Picard-style interval files are provided, Pisces shall apply interval files to the bam files according to their position in the list, i.e. 1st interval file is applied to 1st bam file, 2nd interval file is applied to 2nd bam file, etc.
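
The pairing rule shared by SDS-9 through SDS-12 (one entry applies to all BAMs, otherwise entries are matched by list position) can be sketched as below. The function name `assign_per_bam` is hypothetical, and treating a count mismatch as an error is an assumption of this sketch; the specification above does not state what Pisces does in that case.

```python
def assign_per_bam(bams, resources, label="genome"):
    """Pair each BAM with a genome directory (or interval file)."""
    if len(resources) == 1:
        # A single entry is applied to every BAM file (SDS-9, SDS-11).
        return [(bam, resources[0]) for bam in bams]
    if len(resources) != len(bams):
        # Assumption of this sketch: a count mismatch is an error.
        raise ValueError(
            f"expected 1 or {len(bams)} {label} entries, got {len(resources)}"
        )
    # Otherwise entries are matched by position (SDS-10, SDS-12).
    return list(zip(bams, resources))
```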

An example of a “GenomeSize.xml” file is below:

<sequenceSizes genomeName="chr19FASTA">

       <chromosome fileName="chr19.fa" contigName="chr19" totalBases="3119000" isCircular="false" md5="1aacd71f30db8e561810913e0b72636d" ploidy="2" knownBases="55808983" />

</sequenceSizes>

Output

Pisces outputs allele calls in the form of VCF or gVCF files. VCF is an industry standard format for representing variant calls. gVCF files conform to VCF format, but contain both variant and reference calls.

One output file is produced per input BAM file. The output file is named the same as the input BAM, but with a different extension, so users can easily associate input and output.

Allele position is reported using a 1-based coordinate system. SNVs and MNVs are reported at the first position of the mutation. Insertions and deletions are reported at the position prior to the mutation. See section “Candidate Allele Identification” for more details.

Pisces output format is either standard VCF or somatic VCF. In the standard VCF format, all alleles detected for a given locus are written to a single line (“crushed”). In the somatic VCF output, each allele detected is written to its own VCF line. The crushed format is industry-standard for germline results. Pisces somatic output is more readable for somatic results, where many somatic variants may be detected.

More details on the output vcf format are found at the link below: Pisces VCF Specifications

Optionally, Pisces can be configured to output bias files. These are files containing intermediate data used to calculate strand bias. They are meant to be used for troubleshooting or debugging purposes, and are not considered supported output.

SDS ID Specification
SDS-13 By default, Pisces shall produce output files in the same directory as input BAM files.
SDS-14 If an output folder is configured, Pisces shall produce output files in that folder. If the output folder does not exist, Pisces shall create it.
SDS-15 By default, Pisces shall output a VCF file containing variant calls. The VCF file shall be named the same as the input BAM file, but with a “.vcf” file extension.
SDS-16 Pisces shall follow the Pisces VCF Specifications.
SDS-24 If configured to output gVCFs, Pisces shall output a gVCF file containing both variant and reference calls. The gVCF file shall be named the same as the input BAM file, but with a “genome.vcf” file extension.
SDS-25 Pisces shall output a gVCF file with the same format and header section as a VCF file.
SDS-26 If unable to output a VCF or gVCF file, Pisces shall exit with an error message describing which file could not be written and the reason for the error.
SDS-27 If configured to output bias files, Pisces shall output a file containing intermediate data used during strand bias calculations.

Design

Alignment Processing

Alignments are read from the BAM file and processed to extract candidate alleles. Read processing happens in memory and the BAM file is never modified.

SDS ID Specification
SDS-28 Pisces shall ignore alignment reads for which any of the following conditions are true. These conditions are all specified in the BAM file.
Unaligned or unmapped
Not a primary alignment
No mate and Pisces is configured to only use proper pairs
PCR duplicate
Alignment score, a.k.a map quality, is < minimum alignment score configured
Cigar data is missing
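
The conditions in SDS-28 map onto fields of a BAM record. The sketch below is a hypothetical illustration: the `Read` class and `should_skip` function are invented for this example, the flag constants are the standard SAM flag bits, and using the "proper pair" bit as the test for the proper-pairs condition is an assumption.

```python
from dataclasses import dataclass

# Standard SAM flag bits.
FLAG_PROPER_PAIR = 0x2
FLAG_UNMAPPED = 0x4
FLAG_SECONDARY = 0x100
FLAG_DUPLICATE = 0x400


@dataclass
class Read:
    flag: int   # SAM FLAG field
    mapq: int   # alignment score, a.k.a. map quality
    cigar: str  # "" or "*" when CIGAR data is missing


def should_skip(read, min_map_quality, only_proper_pairs=False):
    """Return True if the read is ignored under the SDS-28 conditions."""
    if read.flag & FLAG_UNMAPPED:
        return True
    if read.flag & FLAG_SECONDARY:  # not a primary alignment
        return True
    if only_proper_pairs and not read.flag & FLAG_PROPER_PAIR:
        return True
    if read.flag & FLAG_DUPLICATE:  # PCR duplicate
        return True
    if read.mapq < min_map_quality:
        return True
    if read.cigar in ("", "*"):
        return True
    return False
```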

Candidate Allele Identification

Pisces identifies candidate alleles in an alignment (either stitched or unstitched) according to the CIGAR string. Each CIGAR operation is examined and could produce one or more candidates. As previously mentioned, candidates are identified for each allele genotype found.

When a candidate allele is found in an alignment, it is either added to a collection of candidates if it has not yet been seen, or it increases the support of an existing candidate, if already encountered.

Insertions and deletions are reported at the position prior to the actual variation. This is an industry standard and gives these variant types an anchor to the reference genome when reporting reference/alternate alleles. Note that BAM files are expected to adhere to the standard practice of soft clipping insertion or deletion operations at the ends of the read.

Optionally, if interval files are specified, only candidates at positions of interest are identified. Pisces will also identify a candidate reference at positions with no coverage. This is to ensure that every position of interest has a corresponding call in the output. Positions with no coverage will have coverage and qscore set to 0, and subsequently will have the appropriate filters applied.

SDS ID Specification
SDS-33 Pisces shall identify a candidate SNV if all of the following apply:
The CIGAR operation is “M”
The observed allele does not match the reference allele at that position
The observed allele is not “N”
The observed allele does not qualify as part of potential MNV
The observed allele base call quality is equal to or greater than the configured minimum base call quality
Pisces shall report a candidate SNV as follows:
The reported position is the position of the observed allele
The reported reference allele is the reference allele at the reported position
The reported alternate allele is the observed allele at the reported position
SDS ID Specification
SDS-34 Pisces shall identify a candidate MNV if all of the following apply:
The CIGAR operation is “M”
The observed allele sequence length ≤ maximum MNV length configured
The observed allele sequence does not contain an “N”
The observed allele sequence does not contain any bases where the base call quality is below the configured minimum base call quality
The observed allele sequence may contain one or more non-variant gaps where:
(A)The length of the gap ≤ the maximum gap length configured
(B)The gap does not start at the first base of the allele sequence
(C)The gap does not end at the last base of the allele sequence
SDS-58 Pisces shall report a candidate MNV as follows:
The reported position is the position of the first base of the observed sequence
The reported reference allele is the reference sequence along the length of the MNV
The reported alternate allele is the observed sequence along the length of the MNV
SDS ID Specification
SDS-35 Pisces shall identify a candidate insertion if all the following apply:
The CIGAR operation is “I”
The first base of the observed insertion has base call quality equal to or greater than the configured minimum base call quality.
SDS-59 Pisces shall report a candidate insertion as follows:
The reported position is the position preceding the inserted sequence
The reported reference allele is the reference allele at the reported position
The reported alternate allele is the reported reference allele, plus the inserted sequence, where the inserted sequence length matches the length of the CIGAR operation.
SDS ID Specification
SDS-36 Pisces shall identify a candidate deletion if the following apply:
The CIGAR operation is “D”
The preceding and trailing bases must have a base call quality equal to or greater than the configured minimum base call quality.
SDS-60 Pisces shall report a candidate deletion as follows:
The reported position is the position preceding the deleted sequence
The reported reference allele is the reference allele at the reported position, plus the deleted reference sequence, where the deleted sequence length matches the length of the CIGAR operation
The reported alternate allele is the reference allele at the reported position
SDS ID Specification
SDS-37 If configured to output gVCFs, Pisces shall identify a candidate reference if all of the following apply:
The CIGAR operation is “M”
The observed allele matches the reference allele at that position
SDS-61 Pisces shall report a candidate reference as follows:
The reported position is the position of the observed allele
The reported reference allele is the reference allele at the reported position
The reported alternate allele is empty.
SDS ID Specification
SDS-38 If configured to output gVCFs and interval files are specified, Pisces shall identify a candidate allele call for only positions within the defined intervals. Each position in a given interval shall have a call regardless of coverage. If there is no coverage for a given position, Pisces shall identify a candidate reference at that position with zero coverage.
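
The reporting rules for SNVs, insertions, and deletions (SDS-33, SDS-35/59, SDS-36/60) can be illustrated with a simplified CIGAR walk. This is a sketch under stated assumptions, not the Pisces implementation: the function name and signature are hypothetical, MNV detection (SDS-34) and reference candidates (SDS-37) are omitted so every mismatch is emitted as an SNV, and the `=` and `X` operations are treated like `M`.

```python
import re


def find_candidates(read_seq, read_quals, cigar, ref_seq, ref_start, min_base_quality):
    """Return (position, ref, alt) tuples with 1-based positions.

    ref_seq[0] is the reference base at 1-based position ref_start, which is
    the position of the first aligned (not soft clipped) base of the read.
    """
    candidates = []
    read_i = 0  # index into the read
    ref_i = 0   # index into ref_seq
    for count, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        length = int(count)
        if op in ("M", "=", "X"):
            for k in range(length):
                obs, ref = read_seq[read_i + k], ref_seq[ref_i + k]
                if obs != ref and obs != "N" and read_quals[read_i + k] >= min_base_quality:
                    candidates.append((ref_start + ref_i + k, ref, obs))
            read_i += length
            ref_i += length
        elif op == "I":
            # Reported at the position preceding the inserted sequence.
            if ref_i > 0 and read_quals[read_i] >= min_base_quality:
                anchor = ref_seq[ref_i - 1]
                inserted = read_seq[read_i:read_i + length]
                candidates.append((ref_start + ref_i - 1, anchor, anchor + inserted))
            read_i += length
        elif op == "D":
            # Preceding and trailing read bases must both pass the quality check.
            if (0 < read_i < len(read_seq) and ref_i > 0
                    and read_quals[read_i - 1] >= min_base_quality
                    and read_quals[read_i] >= min_base_quality):
                anchor = ref_seq[ref_i - 1]
                deleted = ref_seq[ref_i:ref_i + length]
                candidates.append((ref_start + ref_i - 1, anchor + deleted, anchor))
            ref_i += length
        elif op == "S":
            read_i += length  # soft clip consumes read bases only
        elif op == "N":
            ref_i += length   # skipped region consumes reference only
        # H and P consume neither the read nor the reference.
    return candidates
```

For example, a read `ATGATATA` aligned as `3M1I2M2D2M` against reference `ACGTACGTA` starting at position 100 yields an SNV at 101 (C>T), an insertion at 102 (G>GA), and a deletion at 104 (ACG>A).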

Collapsing

If configured to collapse variants, Pisces shall perform collapsing after candidate identification. Collapsing two variants together requires an exact match of the variants in the overlapping region. It also requires the open ended-ness of the variants to support the hypothesis that the variants are the same.

Collapsing is a greedy algorithm in which larger variants are prioritized as target hypotheses. The rationale is that aligners typically do not prefer large variants, so if the upstream aligner suggests a large variant, Pisces assumes there is good evidence for it.

SDS ID Specification
SDS-62 If configured to collapse variants, Pisces shall track the open endedness of a candidate variant, according to the following rules:
If the candidate variant is at the start of the read, the variant shall be tagged as open on the left. Candidate variants next to a soft clipped region are not considered open.
If the candidate variant is at the end of the read, the variant shall be tagged as open on the right. Candidate variants next to a soft clipped region are not considered open.
If the candidate variant is an SNV or MNV and the base before the candidate variant is a no-call or does not meet the basecall quality threshold, the variant shall be tagged as open on the left.
If the candidate variant is an SNV or MNV and the base after the candidate variant is a no-call or does not meet the basecall quality threshold, the variant shall be tagged as open on the right.
SDS ID Specification
SDS-63 For each variant that is not fully anchored (open ended on one side or both), Pisces shall attempt to collapse by doing the following:
1) Find all potential matches. A potential match must meet the following criteria:
* Fully overlap the variant to collapse
* Be a compatible variant type. If the variant to collapse is an insertion, deletion, or MNV, the potential match should match exactly. If the variant to collapse is an SNV, the potential match can be an SNV or MNV.
* Have bases that exactly match the variant to collapse in the overlap region
* If the variant to collapse is not open-ended on the left, the potential match must also not be open-ended on the left and the left end must align by position.
* If the variant to collapse is not open-ended on the right, the potential match must also not be open-ended on the right and the right end must align by position.
2)If a potential match is fully anchored, take that as the winner. Otherwise, rank potential matches according to the following preference and pick the most likely.
* Known variants
* Fully anchored variants
* Larger variant
* More frequent variant
* Left most variant
* Alphabetically ranked by alternate allele (for deterministic behavior)
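
Step 2 of SDS-63 amounts to a lexicographic ranking, which can be expressed as a sort key. The sketch below is hypothetical: the `Candidate` class and its field names are invented for illustration, read support stands in for "more frequent", and when several fully anchored matches exist the same ranking is applied among them (the specification above does not say how that tie is resolved).

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    position: int
    ref: str
    alt: str
    support: int
    is_known: bool = False
    open_left: bool = False
    open_right: bool = False

    @property
    def fully_anchored(self):
        return not (self.open_left or self.open_right)

    @property
    def size(self):
        return max(len(self.ref), len(self.alt))


def rank_key(c):
    # Smaller tuples rank first: known, anchored, larger, more support,
    # left-most, then alphabetical alternate allele for determinism.
    return (not c.is_known, not c.fully_anchored, -c.size, -c.support, c.position, c.alt)


def pick_collapse_target(matches):
    """Pick the winning potential match, or None if there are no matches."""
    if not matches:
        return None
    anchored = [m for m in matches if m.fully_anchored]
    return min(anchored or matches, key=rank_key)
```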

Calculations

The following calculations are applied to each candidate allele identified.

Coverage

Total coverage is the count of observations for a given allele position, either variant or wildtype. Observations of allele A, C, G, or T count towards total coverage. Deletions that span a given position also count towards coverage, because a spanning deletion is a concrete observation. No calls do not count.

Coverage for variants that span multiple bases is calculated by taking the minimum coverage between two data points. For insertions, this is the position preceding and trailing the variation. This provides a consistent anchor to the reference genome across variant and wildtype reads. For deletions and MNVs, this is the position of the first and last bases in the variation. These variant types are already anchored to the reference genome. The minimum coverage is used to filter out coverage contribution from reads that only partially span a variant.

SDS ID Specification
SDS-39 Pisces shall calculate total coverage for candidate alleles as follows:
Allele type Rule
SNV & reference calls Total number of bases called at that position. This includes A,C,G,T and deletions.
Insertion (if unstitched) sum of the minimums by direction of, or (if stitched) the average of:
A)Total coverage at position preceding inserted sequence
B)Total coverage at position trailing inserted sequence
Deletion The average of:
A)Total coverage at first deleted position
B)Total coverage at last deleted position
MNV The average of:
A)Total coverage at first position in MNV
B)Total coverage at last position in MNV
SDS ID Specification
SDS-40 For candidate variants, Pisces shall calculate the following:
VariantSupport = ∑ observed processed reads containing the candidate allele

For point mutations: ReferenceSupport = the total support for the reference call. 
For extended mutations: ReferenceSupport = the total coverage - the variant support. 

VariantFrequency = VariantSupport / TotalCoverage
ReferenceFrequency = ReferenceSupport / TotalCoverage
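
A minimal sketch of the SDS-40 frequency calculations, together with the two-point average used in the deletion and MNV rows of SDS-39. The function names are hypothetical, and returning zero frequencies at zero coverage is an assumption made to avoid division by zero.

```python
def spanning_coverage(coverage_first, coverage_last):
    """Average coverage of the two anchor positions (deletion and MNV rows)."""
    return (coverage_first + coverage_last) / 2


def allele_frequencies(variant_support, total_coverage, reference_support=None):
    """Return (VariantFrequency, ReferenceFrequency).

    For extended mutations, leave reference_support as None so that it is
    derived as total coverage minus variant support.
    """
    if total_coverage == 0:
        return 0.0, 0.0  # assumption: no coverage means zero frequencies
    if reference_support is None:
        reference_support = total_coverage - variant_support
    return variant_support / total_coverage, reference_support / total_coverage
```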

Variant Quality Scoring

Variant quality score is a Phred score based on the probability that an allele call is not a sequencing error. Phred scores are an industry accepted representation of sequencing quality.

Variant quality scoring assumes a Poisson distribution, which corresponds to the Poisson distribution of sequencing errors. The Poisson model relies on a fixed sequencing error rate which is derived from the estimated basecall noise level configured.

By default, the estimated basecall noise level is set to the minimum basecall quality configured. This is reasonable as anything below the minimum basecall quality is considered noise. Both parameters are configurable. The noise level applied may be decoupled from the minimum basecall quality and set to a specific value with the "-NL [X]" argument/setting, where X is the desired level, in Q space. Another option is to change the noise level to reflect the noise local to a variant. This may be done with the "-NoiseModel Window" argument/setting, which will calculate the average noise level in a window around the variant, and use that. (The window size is currently hard coded to 1). In practice, we have found keeping the applied noise level coupled to the minimum basecall quality to be the most effective.

SDS ID Specification
SDS-41 Pisces shall calculate a qscore for each candidate allele, according to the following equations:
ErrorRate = 10^( -1 * EstimatedNoiseLevel / 10)
ProbabilityNotNoise = 1 – CDFPoisson ( VariantSupport – 1, TotalCoverage * ErrorRate )
Qscore = -10 * Log10 (ProbabilityNotNoise)
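
The SDS-41 equations can be evaluated directly with the standard library. The sketch below is an illustration rather than the Pisces code: function names are hypothetical, and the `max_qscore` cap (needed because the Poisson tail underflows to zero for strong support) and its value of 100 are assumptions of this example.

```python
import math


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam); 0 when k < 0."""
    k = math.floor(k)
    if k < 0:
        return 0.0
    if lam <= 0:
        return 1.0
    log_lam = math.log(lam)
    # Each term is computed in log space so large lam does not underflow.
    total = sum(math.exp(i * log_lam - lam - math.lgamma(i + 1)) for i in range(k + 1))
    return min(total, 1.0)


def variant_qscore(variant_support, total_coverage, noise_level, max_qscore=100.0):
    """Phred-scaled score from the SDS-41 equations."""
    error_rate = 10 ** (-noise_level / 10)
    # This is the quantity inside Log10 in the Qscore equation.
    tail = 1.0 - poisson_cdf(variant_support - 1, total_coverage * error_rate)
    if tail <= 0.0:
        return max_qscore  # assumed cap; the tail underflowed to zero
    return max(0.0, min(max_qscore, -10 * math.log10(tail)))
```

With a noise level of Q20 and 100x coverage, the expected number of error observations is 1, so 10 supporting reads score roughly Q69 under these equations, while a position with no support scores 0.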

Strand Bias

Strand bias determines whether a variant is preferentially observed in one direction over the other. This is indicative of sequence-specific sequencing error (SSE). To avoid SSEs showing up as false positives, a strand bias score is computed and a filter is later applied.

Strand bias is the maximum of forward strand bias and reverse strand bias. These are considered directional bias. Directional bias is the conditional probability that support in that direction is not due to sequencing error, given the probability of a false positive in the opposite direction, relative to the total support across both directions.

Similar to quality scoring, the probability that support is not due to sequencing error is based on a Poisson distribution. The probability of a false positive is the inverse, since false positive is mistaking a sequencing error for a true variant.

If stitching is configured and a candidate is found in the stitched region, the read will contribute half to forward direction and half to reverse direction. In other words, there was support found in both directions so contribution is distributed evenly.

SDS ID Specification
SDS-42 Pisces shall calculate a strand bias score for a candidate allele, according to the following equations:
ForwardBias = ProbabilityForwardNotNoise * ProbabilityReverseFalsePos / ProbabilityTotalNotNoise
ReverseBias = ProbabilityReverseNotNoise * ProbabilityForwardFalsePos / ProbabilityTotalNotNoise
ProbabilityDirectionNotNoise = CDFPoisson ( SupportDirection  – 1, CoverageDirection * ErrorRate )
ProbabilityDirectionFalsePos = 1 – ProbabilityDirectionNotNoise
StrandBias = 10 * Log10 Max( ForwardBias, ReverseBias )
SDS ID Specification
SDS-43 If stitching is configured and a candidate is found in the stitched region, Pisces will count half the read support to forward direction and half to reverse direction.
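
The SDS-42 equations can be sketched in the same way. Everything named here is hypothetical. Support values may be fractional because, per SDS-43, a stitched read contributes half to each direction; flooring the Poisson count in that case, and returning negative infinity when the score is undefined, are assumptions of this sketch.

```python
import math


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam); 0 when k < 0."""
    k = math.floor(k)  # assumption: fractional support is floored
    if k < 0:
        return 0.0
    if lam <= 0:
        return 1.0
    log_lam = math.log(lam)
    total = sum(math.exp(i * log_lam - lam - math.lgamma(i + 1)) for i in range(k + 1))
    return min(total, 1.0)


def strand_bias(fwd_support, fwd_coverage, rev_support, rev_coverage, noise_level):
    """Strand bias score following the SDS-42 equations."""
    error_rate = 10 ** (-noise_level / 10)

    def not_noise(support, coverage):
        # ProbabilityDirectionNotNoise
        return poisson_cdf(support - 1, coverage * error_rate)

    p_fwd = not_noise(fwd_support, fwd_coverage)
    p_rev = not_noise(rev_support, rev_coverage)
    p_total = not_noise(fwd_support + rev_support, fwd_coverage + rev_coverage)
    if p_total == 0.0:
        return float("-inf")  # assumption: no support, so no evidence of bias
    forward_bias = p_fwd * (1.0 - p_rev) / p_total
    reverse_bias = p_rev * (1.0 - p_fwd) / p_total
    worst = max(forward_bias, reverse_bias)
    return 10 * math.log10(worst) if worst > 0 else float("-inf")
```

Under these equations a variant seen only on the forward strand scores near 0, while one with equal support on both strands scores strongly negative, so a larger score indicates more bias.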

Allele Calling

Once metrics for candidate alleles have been calculated, Pisces will reject candidates that do not meet the basic requirements for calling. By default, candidates must meet coverage and qscore thresholds. Candidate variants must also meet frequency thresholds.

For MNVs that are rejected, Pisces will attempt to reallocate them to smaller existing candidate MNVs, if possible. This is a rescue mechanism to ensure we do not lose supporting evidence for the smaller candidates. Because of this reallocation, Pisces will call MNVs in order of allele length, descending. That allows for one pass through the MNV list. Bases that cannot be reallocated and do not match the reference will be converted to SNVs, either existing or newly created.

If configured to output gVCFs, candidate variants that do not meet coverage thresholds are still called but filtered. Outputting a reference call instead could be misread as a confident negative result, i.e. a false negative, when the variant might be real even though there is not enough evidence to be confident.

Pisces may call multiple variants at a given position, but will never call both a variant and reference at a given position. References are only called at positions with no variants called.

Lastly, if interval files were specified, Pisces will reject candidates with a position outside of the interval set. Furthermore, if there are positions within the interval set for which no candidate allele was identified, i.e. no coverage, Pisces will output a reference call with total coverage 0 and qscore 0. This will be filtered downstream.

SDS ID Specification
SDS-44 Pisces shall call candidate variants if all of the following apply:
Total coverage ≥ minimum coverage configured, or Pisces is configured to output gVCFs.
Variant frequency ≥ minimum frequency configured.
Qscore ≥ minimum qscore configured.
SDS ID Specification
SDS-45 If configured to output gVCFs, Pisces shall call a candidate reference only if no variants were called at the same position.
SDS-46 If configured to use interval files, Pisces shall call candidate alleles only if their position exists within a valid interval, inclusive of specified start and end positions.
SDS-47 If configured to use interval files, Pisces shall output a reference call with total coverage = 0 and qscore = 0 for any interval position with no coverage.
SDS-53 During calling, Pisces shall reallocate rejected MNVs to smaller called MNVs, if possible. Reallocation requires an exact match in allele sequence.
SDS-54 For non-reference bases of an MNV that cannot be reallocated, Pisces shall either add support to an existing SNV, if one exists, or create a new SNV.
SDS-57 If configured to output gVCFs, Pisces shall either add support to an existing reference allele, if one exists, or create a new reference for reference bases of an MNV that cannot be reallocated.
SDS-55 If an MNV can be reallocated, Pisces shall increment support for the new target(s) by adding the support for the original MNV.
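
The basic calling test in SDS-44 reduces to three threshold comparisons. The function below is a hypothetical sketch; it leaves out MNV reallocation (SDS-53 to SDS-57) and interval handling, and the thresholds are passed in rather than assumed.

```python
def should_call_variant(total_coverage, variant_frequency, qscore,
                        min_coverage, min_frequency, min_qscore,
                        output_gvcf=False):
    """Return True if a candidate variant meets the SDS-44 conditions."""
    # In gVCF mode, low-coverage candidates are still called (and filtered later).
    meets_coverage = total_coverage >= min_coverage or output_gvcf
    return (meets_coverage
            and variant_frequency >= min_frequency
            and qscore >= min_qscore)
```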

Post Processing

Once an allele has been called, filters and additional annotations are applied.

Pisces supports seven types of filters: indel repeats, low genome quality, low variant frequency, coverage, variant qscore, genotype qscore, and strand bias. Thresholds are configurable. The calling application can apply additional filters if desired downstream.

SDS ID Specification
SDS-48 Pisces shall optionally apply the following filters to each allele call:
Indel Repeat Filter:
Triggers if indel repeat length >= maximum repeat configured
Indel repeat length is calculated for insertions and deletions by scanning up to 50 base pairs of the chromosome reference on either side of the allele coordinate.
This filter flags indels that are in sections of the genome with repeats of a unit of length 1 to M, repeated >= N times. By default, N = 9. M is the length of the smallest possible repeated unit in the variant bases, e.g. A>ATCTG gives M = 4 (TCTG) and A>ATCTC gives M = 2 (TC).
This filter:
* Only considers repeated sequences that are consecutive and adjacent
* Only consider as triggers homopolymers/repeat sections in the reference, and not the variant
* Only filters on the inserted/deleted bases themselves, i.e. if the homopolymer is AAAAAAAAAAAA and we have an inserted "T", that should not be filtered.
This implementation was taken as-is from Isas, with the caveat that we changed ">N" to ">=N" , because it felt more intuitive.
RMxN Filter:
This filter filters indels that are in sections of the genome with repeats of length [1 to M], repeated >= N times. By default, M=5, and N=9.
This filter is stronger than the Indel Repeat Filter because:
It filters MNVs and SNPs as well as indels (where the SNP might be one added repeat minus one subtracted repeat).
It filters when there is repeat content at the beginning or the end of an indel; the repeat does not have to span the entire indel to trigger the filter.
Low Variant Quality:
Variant quality score < minimum variant quality configured
Low Genotype Quality
Genotype quality score < minimum genotype quality configured
Low Variant Frequency
Allele frequency < minimum frequency configured
Low Depth
Total coverage < minimum coverage configured.
StrandBias
Either of the following is true:
Strand bias score > strand bias threshold configured
Only present on one strand and this filter option is configured
Multi Allelic Site
Pisces is running in "diploid" mode, but the variants discovered at the given locus do not conform to a diploid model.
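
The repeat-unit idea behind the Indel Repeat Filter described above can be illustrated with two small helpers. These are hypothetical sketches, not the Pisces or Isas code; in particular, the caller is assumed to pass in the reference window (up to 50 bases on either side of the allele) and the configured threshold.

```python
def smallest_repeat_unit(bases):
    """Shortest unit that, repeated, reproduces the inserted/deleted bases."""
    n = len(bases)
    for size in range(1, n + 1):
        if n % size == 0 and bases[:size] * (n // size) == bases:
            return bases[:size]
    return bases


def max_consecutive_repeats(reference, unit):
    """Longest run of consecutive, adjacent copies of unit in the reference."""
    if not unit:
        return 0
    best = 0
    for start in range(len(reference)):
        count, pos = 0, start
        while reference.startswith(unit, pos):
            count += 1
            pos += len(unit)
        best = max(best, count)
    return best


def fails_indel_repeat_filter(reference_window, indel_bases, max_repeats):
    """True if the reference repeats the indel's unit >= max_repeats times."""
    unit = smallest_repeat_unit(indel_bases)
    return max_consecutive_repeats(reference_window, unit) >= max_repeats
```

Because only the indel's own unit is searched for, an inserted "T" next to a long run of "A" is not flagged, in line with the last bullet of the filter description.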

The filter tags in the output vcf are as follows:

Filter Tag
RMxN R{threshold_M}x{threshold_n}
Indel Repeat R{threshold}
Low Variant Quality q{threshold}
Low Genotype Quality LowGQ
Low Variant Frequency LowVariantFreq
Low Depth LowDP
StrandBias SB
Multi Allelic Site MultiAllelicSite

Additionally, Pisces calculates the fraction of "no calls", if configured. Here, a "no call" is a base in a read that Pisces cannot count towards coverage as an A, C, G, or T. These include bases where the basecaller (upstream of variant calling) could not determine the base and output an "N", and any base with a very low base call qscore. Fraction no call is a valuable troubleshooting metric. If the fraction no call is high, the calling application or user might consider the allele with less confidence.

SDS ID Specification
SDS-49 If configured to report no calls, Pisces shall calculate the no call fraction for each allele call using the following equation at that position:
fraction_no_calls = ∑ No calls / ( TotalCoveragePassingFilters (ie, DP) + ∑ No calls )
SDS ID Specification
SDS-50 Pisces shall set the genotype for an allele call. The Genotyping rules depend on the genotype model configured, as shown below:

If the Somatic Ploidy model is selected: (default, or -ploidy somatic)

Genotype Description Rule
./. No Call Reference LowDP filter and allele call is reference.
./. No Call Alternate LowDP filter and allele call is variant.
0/. An indeterminate reference call The reference call passes the omit filters, and a secondary allele may not be ruled out. For example, there may be a 50% deletion or MNV upstream, that extends through this position, and the remaining calls constitute a strong reference call.
1/. An indeterminate alternate call The alternate call passes the omit filters, and a secondary allele may not be ruled out.
0/0 Homozygous Reference Allele Variant frequency is below the minimum threshold. Default is 1%
0/1 Heterozygous Alternate Allele Variant frequency is >= minimum threshold and reference frequency is above the minimum threshold. Default is 1%
1/1 Homozygous Alternate Allele Reference frequency is below the minimum threshold. Default is 1%
1/2 Heterozygous, with two Alternate Alleles Not available when ploidy=Somatic.

If the Diploid Ploidy model is selected: (-ploidy diploid)

Genotype Description Rule
./. No Call Reference LowDP filter and allele call is reference.
./. No Call Alternate LowDP filter and allele call is variant
./. No Call Alternate Depth is acceptable, but multiple alleles are present and do not conform to a diploid model.
0/. An indeterminate reference call The reference call passes the omit filters, and a secondary allele may not be ruled out. For example, there may be a 50% deletion or MNV upstream, that extends through this position, and the remaining calls constitute a strong reference call.
1/. An indeterminate alternate call The alternate call passes the omit filters, and a secondary allele may not be ruled out.
0/0 Homozygous Reference Allele Variant frequency is below the minimum threshold "A". Default is 20%
0/1 Heterozygous Alternate Allele Variant frequency is >= minimum threshold "A" and reference frequency is <= the minimum threshold "B". Default values for A and B are 20 and 70, respectively.
1/1 Homozygous Alternate Allele Variant frequency is >= minimum threshold "B"
1/2 Heterozygous, with two Alternate Alleles The top two variant frequencies combined are >= the minimum threshold "C", and are each >= minimum threshold "A" alone. By default, C is 80%.
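
The frequency-based rows of the two genotype tables can be sketched as follows. This is a simplified, hypothetical illustration for a single alternate allele: the no-call (./.), indeterminate (0/. and 1/.), and 1/2 rows are omitted, frequencies are fractions rather than percentages, and reading the diploid heterozygous row as "variant frequency between A and B" is an interpretation made for this sketch.

```python
def somatic_genotype(variant_frequency, reference_frequency, min_frequency=0.01):
    """Somatic model: 0/0, 0/1 or 1/1 from the 1% default threshold."""
    if variant_frequency < min_frequency:
        return "0/0"
    if reference_frequency < min_frequency:
        return "1/1"
    return "0/1"


def diploid_genotype(variant_frequency, threshold_a=0.20, threshold_b=0.70):
    """Diploid model for one alternate allele, thresholds A and B."""
    if variant_frequency < threshold_a:
        return "0/0"
    if variant_frequency >= threshold_b:
        return "1/1"
    return "0/1"  # interpretation: frequency falls between A and B
```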

Parallelization

By default, BAMs are processed in parallel for efficiency, up to the maximum number of threads configured. If desired, Pisces can be configured to parallelize by chromosome as well. In this case, chromosomes within a given BAM will be processed in parallel. This is to support whole genome sequencing applications where there is one very large BAM containing full coverage of the genome.

SDS ID Specification
SDS-51 By default, Pisces shall process BAM files in parallel, up to the maximum number of threads configured.
SDS-52 If configured to parallelize by chromosome, Pisces shall process chromosomes within a given BAM in parallel, up to the maximum number of threads configured.
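
The default behavior in SDS-51 follows a standard worker-pool pattern, sketched here in Python for illustration only. The function `process_bams` and the `call_variants` callback are hypothetical; Pisces itself is a .NET application and its threading code is not shown in this document.

```python
from concurrent.futures import ThreadPoolExecutor


def process_bams(bam_paths, call_variants, max_threads):
    """Run call_variants on each BAM, using at most max_threads workers."""
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        # pool.map preserves input order, so results line up with bam_paths.
        return dict(zip(bam_paths, pool.map(call_variants, bam_paths)))
```

Parallelizing by chromosome (SDS-52) would follow the same shape, with (BAM, chromosome) pairs submitted as the units of work instead of whole BAM files.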

Forced Genotype

Pisces accepts an input VCF file listing a set of alleles that it is forced to report. For each of these alleles, Pisces outputs the basic information (coverage and read support), even when the allele would not normally be detected as a variant (called).

When this VCF file is added to the command line, crushed VCF output must be turned off with -crushvcf false. The genotype field for the forced allele depends on the genotype of the other (called) allele at the position.

Genotype Description
0/0 If the 'true' call is a reference site
./. If the 'true' call is a no call (ex, low depth) or spanning upstream variant (such as an upstream homozygous deletion)
* / * If the 'true' call is a variant other than the forced-report query.

All forced alleles will have the filter "ForcedReport" added, unless the allele would be called successfully (PASS) without being forced.

Limitations

In germline mode, there is currently no special treatment for mitochondrial chromosomes or for allosomes such as chrX and chrY.
