This repository has been archived by the owner on May 15, 2020. It is now read-only.

Demo evaluation needs correction #118

Open
logust79 opened this issue Apr 23, 2019 · 5 comments

Comments

@logust79

In the demo's Evaluation section, the command:

zcat /tmp/gHapMixDemo/TempCNV_child1/CNV.vcf.gz | grep -v ":REF:" > /tmp/gHapMixDemo/TempCNV_child1/CNV.vcf (remove REF calls)
/CanvasDIR/Tools/EvaluateCNV/EvaluateCNV.dll /ihart/BaseSpace/Projects/CanvasSPW/AppResults/simdata/Files/child1_truth.bed /tmp/gHapMixDemo/TempCNV_child1/CNV.vcf /CanvasDIR/Tools/EvaluateCNV/generic.cnaqc.excluded_regions.bed inheritedCNVs.txt 

would not run because the path to generic.cnaqc.excluded_regions.bed is wrong. Also, for consistency, CanvasSPW should be renamed to canvas, and the (remove REF calls) note should be commented out. The corrected commands would look something like this:

zcat /tmp/gHapMixDemo/TempCNV_child1/CNV.vcf.gz | grep -v ":REF:" > /tmp/gHapMixDemo/TempCNV_child1/CNV.vcf #(remove REF calls)
/CanvasDIR/Tools/EvaluateCNV/EvaluateCNV.dll /tmp/BaseSpace/Projects/canvas/AppResults/simdata/Files/child1_truth.bed /tmp/gHapMixDemo/TempCNV_child1/CNV.vcf /tmp/BaseSpace/Projects/canvas/AppResults/simdata/Files/generic.cnaqc.excluded_regions.bed inheritedCNVs.txt 

Even with these fixes, it still crashes, saying that I need to provide a reference ploidy:

...
2019-04-23T09:47:57+01:00,ERROR: Exception caught in WorkDoerFactory. Cancelling all jobs. Exception:
	Error: Truth variant chr6:105256020-105271607 with no overlapping Canvas calls. Reference ploidy cannot be determined! Please provide reference ploidy via command line options
...
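The REF-filtering step in the commands above can be tried in isolation on a tiny synthetic file. This is only a sketch: the records and the `Canvas:REF:...` style IDs below are made up for illustration and are not taken from the demo data, and `gzip -dc` is used in place of `zcat` for portability.

```shell
#!/bin/sh
# Illustrative only: the VCF records below are invented, not demo output.
workdir=$(mktemp -d)

# Build a tiny gzipped VCF with one REF call and two variant calls.
{
  printf '##fileformat=VCFv4.1\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf 'chr1\t1000\tCanvas:REF:chr1:1001-2000\tN\t.\t30\tPASS\tEND=2000\n'
  printf 'chr1\t2000\tCanvas:GAIN:chr1:2001-3000\tN\t<CNV>\t30\tPASS\tEND=3000\n'
  printf 'chr1\t3000\tCanvas:LOSS:chr1:3001-4000\tN\t<CNV>\t30\tPASS\tEND=4000\n'
} | gzip -c > "$workdir/CNV.vcf.gz"

# Same filter as in the thread: drop every line containing ":REF:".
gzip -dc "$workdir/CNV.vcf.gz" | grep -v ":REF:" > "$workdir/CNV.vcf"

# Count the non-header records that survived the filter.
kept=$(grep -vc '^#' "$workdir/CNV.vcf")
echo "variant records kept: $kept"
```

The header line survives because its REF column is delimited by tabs, not colons, so the `":REF:"` pattern only matches the call IDs.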
@eroller
Member

eroller commented Apr 23, 2019

Yes, the demo documentation is outdated. Sorry about that. I will keep this issue open so others can see the workaround. For the reference ploidy VCF input, see this post: #89 (comment)

@logust79
Author

logust79 commented Apr 24, 2019

Thank you for your reply. After some research and trial and error, it still fails. This is the code I ran:

zcat output/demo/TempCNV_child1/CNV.vcf.gz | grep -v ":REF:" > output/demo/TempCNV_child1/CNV.vcf #(remove REF calls)
dotnet /canvasdir/Tools/EvaluateCNV/EvaluateCNV.dll \
    /tmp/BaseSpace/Projects/canvas/AppResults/simdata/Files/child1_truth.bed \
    output/demo/TempCNV_child1/CNV.vcf \
    /tmp/BaseSpace/Projects/canvas/AppResults/simdata/Files/generic.cnaqc.excluded_regions.bed \
    inheritedCNVs.txt \
    --ploidy=1 1 data/Files/par.bed

where par.bed contains:

chrX	60001	2699520
chrX	154931044	155260560
chrY	10001	2649520
chrY	59034050	59363566

The error is:

2019-04-24T12:20:45+01:00,ERROR: Exception caught in WorkDoerFactory. Cancelling all jobs. Exception:
        Value cannot be null.
Parameter name: fileName
System.ArgumentNullException: Value cannot be null.
Parameter name: fileName
   at System.IO.FileInfo..ctor(String originalPath, String fullPath, String fileName, Boolean isNormalized)
   at EvaluateCNV.CNVChecker.ComputeCallability(ILogger logger, Dictionary`2 callsByContig, EvaluateCnvOptions options, IDirectoryLocation output) in D:\TeamCity\buildAgent\work\a29a190a11771d97\Tools\EvaluateCNV\CNVChecker.cs:line 543
   at EvaluateCNV.CNVChecker.<>c__DisplayClass24_0.<Evaluate>b__4(IWorkDoer workDoer) in D:\TeamCity\buildAgent\work\a29a190a11771d97\Tools\EvaluateCNV\CNVChecker.cs:line 536
   at Isas.Framework.WorkManagement.JobLaunching.JobLauncherFactory.RunWithJobLauncher(ILogger logger, ISettings settings, IDirectoryLocation loggingDir, Action`1 logCommand, CancellationToken cancellationToken, Action`1 function)
   at Isas.Framework.WorkManagement.JobLaunching.JobLauncherFactory.RunWithJobLauncher(ILogger logger, ISettings settings, IDirectoryLocation analysisFolder, CancellationToken cancellationToken, Action`1 function)
   at Isas.Framework.WorkManagement.ResourceManagement.WorkResourceManagerFactory.RunWithResourceManager(ILogger logger, ISettings settings, CancellationToken cancellationToken, Action`1 function)
   at Isas.Framework.WorkManagement.WorkDoerFactory.RunWithWorkDoer(ILogger logger, ISettings settings, IDirectoryLocation analysisFolder, CancellationTokenSource cancellationTokenSource, Action`1 function)
   at EvaluateCNV.CNVChecker.Evaluate(String truthSetPath, String cnvCallsPath, String excludedBed, String outputPath, EvaluateCnvOptions options) in D:\TeamCity\buildAgent\work\a29a190a11771d97\Tools\EvaluateCNV\CNVChecker.cs:line 538
   at EvaluateCNV.Program.MainHelper(String[] args) in D:\TeamCity\buildAgent\work\a29a190a11771d97\Tools\EvaluateCNV\Program.cs:line 49
   at EvaluateCNV.Program.Main(String[] args) in D:\TeamCity\buildAgent\work\a29a190a11771d97\Tools\EvaluateCNV\Program.cs:line 16

Any idea?
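Independently of the exception above, the par.bed shown earlier is a plain three-column BED file, and BED parsers generally expect tab-separated fields. A quick format check might look like the sketch below. It is not part of Canvas; the temporary file and the `bad` variable are just for illustration.

```shell
#!/bin/sh
# Recreate the par.bed from the thread with explicit tab separators.
bed=$(mktemp)
printf 'chrX\t60001\t2699520\n'        >  "$bed"
printf 'chrX\t154931044\t155260560\n'  >> "$bed"
printf 'chrY\t10001\t2649520\n'        >> "$bed"
printf 'chrY\t59034050\t59363566\n'    >> "$bed"

# Count lines that do not have exactly three tab-separated fields,
# or whose start coordinate is not smaller than the end coordinate.
bad=$(awk -F '\t' 'NF != 3 || $2 + 0 >= $3 + 0 { n++ } END { print n + 0 }' "$bed")
echo "malformed lines: $bad"
```

A non-zero count would usually point to spaces pasted in place of tabs or to swapped coordinates.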

@logust79
Author

logust79 commented Apr 24, 2019

I figured out that I needed to provide kmer.fa. And since the tool infers the (wrong) location of GenomeSize.xml, I also needed to soft-link some of the files, such as kmer.fa and filter13.bed.
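The soft-linking step is not spelled out in the thread. A minimal sketch of the idea is shown here, assuming a purely hypothetical layout: `actual_dir` and `expected_dir` are placeholder directories created only for this example, and the empty files stand in for the real reference files. The directory the tool actually searches depends on your installation.

```shell
#!/bin/sh
# Hypothetical layout: both directories are placeholders for illustration.
actual_dir=$(mktemp -d)     # where the reference files really live
expected_dir=$(mktemp -d)   # where the tool ends up looking for them

# Empty stand-ins for the real reference files.
: > "$actual_dir/kmer.fa"
: > "$actual_dir/filter13.bed"

# Link each file into the directory the tool searches.
for f in kmer.fa filter13.bed; do
    ln -s "$actual_dir/$f" "$expected_dir/$f"
done

ls -l "$expected_dir"
```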

zcat output/demo/TempCNV_child1/CNV.vcf.gz | grep -v ":REF:" > output/demo/TempCNV_child1/CNV.vcf #(remove REF calls)
dotnet /canvasdir/Tools/EvaluateCNV/EvaluateCNV.dll \
    /tmp/BaseSpace/Projects/canvas/AppResults/simdata/Files/child1_truth.bed \
    output/demo/TempCNV_child1/CNV.vcf \
    /tmp/BaseSpace/Projects/canvas/AppResults/simdata/Files/generic.cnaqc.excluded_regions.bed \
    inheritedCNVs.txt \
    --ploidy=1 1 data/Files/ploidy.bed \
    -k=data/canvasdata/Files/kmer.fa

This command works with no errors, and outputs the following as part of the result:

Ploidy  1.86
Results for PASSing variants
Accuracy        39.7608
DirectionAccuracy       40.1665
F-score 0.8575
Recall  77.7004
DirectionRecall 78.4933
Precision       95.6493
DirectionPrecision      96.6254
GainRecall      70.6110
GainDirectionRecall     71.4076
GainPrecision   91.2464
GainDirectionPrecision  92.2757
LossRecall      80.0021
LossDirectionRecall     80.0021
LossPrecision   96.9904
LossDirectionPrecision  97.9502
MeanEventAccuracy       68.7341
MedianEventAccuracy     94.5666
VariantEventsCalled     2133
VariantBasesCalled      219903552
...

The recall is noticeably different from the figure in the documentation. There are warnings in stderr that might be related: it failed to locate PARv5.bed, and one of the chrY calls has GT as 1/1:... instead of 1:....
Any ideas?
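As a sanity check on the numbers above, the reported F-score is consistent with the usual harmonic mean of precision and recall, F = 2PR / (P + R). That EvaluateCNV computes it this way is an assumption based only on the figures matching, not something confirmed in this thread.

```shell
#!/bin/sh
# Precision and recall (in percent) as reported in the output above.
precision=95.6493
recall=77.7004

# Harmonic mean of precision and recall, on a 0-1 scale.
fscore=$(awk -v p="$precision" -v r="$recall" \
    'BEGIN { p /= 100; r /= 100; printf "%.4f", 2 * p * r / (p + r) }')
echo "F-score: $fscore"
```

This prints `F-score: 0.8575`, matching the reported value.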

@eroller
Member

eroller commented Apr 24, 2019

There are no truth events on chrX for that sample so the PAR calls will not affect recall. The lower recall number you are seeing is probably just a limitation in the truth set for that simulated dataset. ~80% recall is typical for a germline sample.

PARv5.bed files attached
PARv5.bed.hg19.txt
PARv5.bed.grch38.txt
PARv5.bed.grch37.txt

@logust79
Author

Thank you, @eroller! I guess the demo run can be deemed a success.
