fix: create tnscope mnvs #1524
Codecov Report
All modified and coverable lines are covered by tests ✅
Coverage Diff (develop vs. #1524):
            develop    #1524    +/-
Coverage     99.48%   99.50%   +0.02%
Files            40       40
Lines          1932     2036     +104
Hits           1922     2026     +104
Misses           10       10
Here is the summary of my comments as we discussed.
To keep the VCF format compatible, the original INFO tags should be preserved. A new tag may be added that holds comma-separated information in the same order as the merged variants, with one entry per merged variant, so that it can be parsed as a list.
👍 🥇 This is nice work. As I mentioned in the discussion, we should keep the merged file non-redundant i.e. all or most of the information from the variants to be merged should be retained in the merged variant and remove the additional lines representing the original variants that are merged. This will also eliminate the need for keeping the intermediate files and help with the interpretation as well.
Thanks for the review @khurrammaqbool ! I think I have addressed all your comments and feedback : ) Regarding the failing docker containers, I will test them all after the PR has been approved, in case I need to make further changes to the code, which would force me to restart the docker builds.
👍 some additional minor comments.
Only the somalier container failed, I will open a PR to fix it!
Description
This PR adds a post-processing step to TNscope (for the standard TGA workflow) that merges SNVs and InDels sharing a PID into MNVs, so that the TNscope calls can be merged more accurately with the VarDict results.
See issue: #1525
The script was taken from Sentieon here: https://github.com/Sentieon/sentieon-scripts/blob/master/merge_mnp/merge_mnp.py
The script has been slightly modified to also merge variants that carry filters other than PASS. This allows merging of variants flagged as triallelic_site, as well as any soft filters we might add in the future. Some refactoring of the original script was also done to improve readability (though it is still very messy).
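To illustrate the idea of merging by PID, here is a minimal sketch of the grouping step only. It is not the Sentieon script: the `Variant` tuple, the `group_by_pid` helper and the `max_distance` parameter are hypothetical names chosen for illustration, and the adjacency rule is an assumption.

```python
from typing import List, NamedTuple, Optional


class Variant(NamedTuple):
    """Hypothetical, simplified view of one VCF record."""
    chrom: str
    pos: int
    ref: str
    alt: str
    pid: Optional[str]  # phasing ID, None if the caller did not set one


def group_by_pid(variants: List[Variant], max_distance: int = 5) -> List[List[Variant]]:
    """Group consecutive variants that share a PID and lie close together.

    Assumes `variants` is sorted by chromosome and position.
    """
    groups: List[List[Variant]] = []
    current: List[Variant] = []
    for var in variants:
        if current:
            prev = current[-1]
            prev_end = prev.pos + len(prev.ref) - 1  # last reference base covered
            same_phase = var.pid is not None and var.pid == prev.pid
            close = var.chrom == prev.chrom and var.pos - prev_end <= max_distance
            if same_phase and close:
                current.append(var)
                continue
            groups.append(current)
        current = [var]
    if current:
        groups.append(current)
    return groups


example = [
    Variant("2", 158637135, "GAAA", "G", "pid_1"),
    Variant("2", 158637139, "A", "G", "pid_1"),
    Variant("2", 158637200, "C", "T", "pid_2"),
]
groups = group_by_pid(example)
```

Groups with more than one member would be candidates for merging into a single MNV record; singletons are left as they are.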
The TNscope VCF is quality filtered before merging SNVs
To avoid merging low-quality variants into MNVs, the VCF is quality filtered before merging.
Merging SNVs with different filters set
There is an issue with how to consolidate the filters when merging SNVs that have different filters set, such as germline_risk, in_normal, triallelic_site, and PASS. This was solved with the logic exemplified in the table below:
In summary:
On top of this, a few new INFO fields are added to preserve some information from the constituent variants. Originally only the FILTER was added, but this has been extended with:
The values are joined as a comma-separated list covering all constituent variants.
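As a rough sketch of how such fields could be assembled, the snippet below joins per-variant values with commas and uses `|` inside a single variant's entry (several filters, or ref|alt depths), which matches the layout of the TNSCOPE_MNV_* tags in this PR. The `build_mnv_info` helper and the dictionary keys of the input are hypothetical, not the code added in this PR.

```python
def build_mnv_info(constituents):
    """Build TNSCOPE_MNV_* INFO values from the variants merged into one MNV.

    Each constituent is a dict with hypothetical keys: chrom, pos, ref, alt,
    filters (list of str), tumor_ad / normal_ad (ref, alt depths) and
    tumor_af / normal_af.
    """
    def per_variant(render):
        # One entry per constituent, in merge order, comma-separated.
        return ",".join(render(c) for c in constituents)

    return {
        "TNSCOPE_MNV_FILTERS": per_variant(lambda c: "|".join(c["filters"])),
        "TNSCOPE_MNV_NORMAL_ADs": per_variant(lambda c: "|".join(str(d) for d in c["normal_ad"])),
        "TNSCOPE_MNV_NORMAL_AFs": per_variant(lambda c: str(c["normal_af"])),
        "TNSCOPE_MNV_TUMOR_ADs": per_variant(lambda c: "|".join(str(d) for d in c["tumor_ad"])),
        "TNSCOPE_MNV_TUMOR_AFs": per_variant(lambda c: str(c["tumor_af"])),
        "TNSCOPE_MNV_VARS": per_variant(lambda c: f"{c['chrom']}_{c['pos']}_{c['ref']}_{c['alt']}"),
    }


constituents = [
    {"chrom": "2", "pos": 158637135, "ref": "GAAA", "alt": "G",
     "filters": ["triallelic_site", "in_normal"],
     "normal_ad": (494, 10), "normal_af": 0.0142857,
     "tumor_ad": (590, 10), "tumor_af": 0.0115875},
    {"chrom": "2", "pos": 158637139, "ref": "A", "alt": "G",
     "filters": ["in_normal"],
     "normal_ad": (726, 10), "normal_af": 0.013587,
     "tumor_ad": (900, 10), "tumor_af": 0.010989},
]
mnv_info = build_mnv_info(constituents)
```

Because `,` separates variants and `;` separates INFO tags in a VCF, a third separator such as `|` is needed whenever one variant contributes more than one value.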
Regarding benchmarking the speed of the merge script:
Added
Changed
Documentation
Tests
Feature Tests
Here's an example of the merged INFO field from the sheet above:
DB=.;ECNT=11.0;FS=0.0;HCNT=6.0;MAX_ED=80.0;MIN_ED=0.5;NLOD=175.255;NLODF=39.465;PV=0.40159999999999996;PV2=0.35085;RPA=.;RU=.;SOR=0.911;STR=.;TLOD=7.615;TNSCOPE_MNV_FILTERS=triallelic_site|in_normal,in_normal;TNSCOPE_MNV_NORMAL_ADs=494|10,726|10;TNSCOPE_MNV_NORMAL_AFs=0.0142857,0.013587;TNSCOPE_MNV_TUMOR_ADs=590|10,900|10;TNSCOPE_MNV_TUMOR_AFs=0.0115875,0.010989;TNSCOPE_MNV_VARS=2_158637135_GAAA_G,2_158637139_A_G
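For downstream interpretation, the merged INFO string can be split back into one record per constituent variant. The following is a small sketch using only the standard library; `parse_mnv_info` is a hypothetical helper name, and the input is a shortened copy of the example above.

```python
def parse_mnv_info(info):
    """Split the TNSCOPE_MNV_* tags of an INFO string into per-variant dicts."""
    prefix = "TNSCOPE_MNV_"
    # INFO tags are ';'-separated KEY=VALUE pairs; flag tags without '=' are skipped.
    fields = dict(item.split("=", 1) for item in info.split(";") if "=" in item)
    columns = {
        key[len(prefix):]: value.split(",")
        for key, value in fields.items()
        if key.startswith(prefix)
    }
    if not columns:
        return []
    sizes = {len(values) for values in columns.values()}
    if len(sizes) != 1:
        raise ValueError("TNSCOPE_MNV_* tags have differing numbers of entries")
    n_variants = sizes.pop()
    return [
        {name: values[i] for name, values in columns.items()}
        for i in range(n_variants)
    ]


info = (
    "DB=.;ECNT=11.0;TLOD=7.615;"
    "TNSCOPE_MNV_FILTERS=triallelic_site|in_normal,in_normal;"
    "TNSCOPE_MNV_NORMAL_ADs=494|10,726|10;"
    "TNSCOPE_MNV_NORMAL_AFs=0.0142857,0.013587;"
    "TNSCOPE_MNV_TUMOR_ADs=590|10,900|10;"
    "TNSCOPE_MNV_TUMOR_AFs=0.0115875,0.010989;"
    "TNSCOPE_MNV_VARS=2_158637135_GAAA_G,2_158637139_A_G"
)
records = parse_mnv_info(info)
first_filters = records[0]["FILTERS"].split("|")
```

The length check reflects the requirement that every tag holds exactly one entry per merged variant, in the same order.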
The fields added here are:
Pipeline Integrity Tests
.hk file)
Clinical Genomics Stockholm
Documentation
Panel of Normal specific criteria
User Changes
Infrastructure Changes
Validation criteria
Validation criteria to be added to validation report PR: https://github.com/Clinical-Genomics/validations/pull/285
Version specific criteria
In VCF of any TGA case: SNV.somatic.[case-id].tnscope.research.normalised.vcf.gz the following criteria are met:
Important
One of the validation checkboxes below needs to be checked.
Checklist
Important
Ensure that all checkboxes below are ticked before merging.
For Developers
For Reviewers
conditions where applicable, with satisfactory results.