fix: create tnscope mnvs #1524

Merged 124 commits on Feb 7, 2025
Commits
0862377
Merge branch 'develop' of github.com:Clinical-Genomics/BALSAMIC into …
mathiasbio Nov 29, 2024
ea4b5ce
rework bcftools filter
mathiasbio Nov 29, 2024
f60adac
black
mathiasbio Nov 29, 2024
3874a90
changelog
mathiasbio Nov 29, 2024
6b3fe6e
fix
mathiasbio Nov 29, 2024
e96a980
fix
mathiasbio Nov 29, 2024
68aeae1
new filter
mathiasbio Dec 10, 2024
24fd9a1
convert to list
mathiasbio Dec 10, 2024
13fab22
major refactor of filter logic
mathiasbio Dec 16, 2024
c66a66e
bug fixes
mathiasbio Dec 16, 2024
72cfbe1
fix
mathiasbio Dec 16, 2024
50fbc0a
fix
mathiasbio Dec 16, 2024
61aa1ab
test
mathiasbio Dec 16, 2024
28de8d4
fix
mathiasbio Dec 16, 2024
f3f5cb2
fix bug
mathiasbio Dec 16, 2024
3095fd9
fix
mathiasbio Dec 17, 2024
b2c62c6
black
mathiasbio Dec 17, 2024
d435039
update comments
mathiasbio Dec 17, 2024
eb46385
fix clinical and research filters
mathiasbio Dec 17, 2024
7c3a484
fix vardict tn hardfilters
mathiasbio Dec 18, 2024
0456b05
update vardict hardfilters
mathiasbio Dec 18, 2024
f29b271
remove pass and triallelic site prefilters
mathiasbio Dec 18, 2024
84dd929
fix bug
mathiasbio Dec 18, 2024
f63a41a
add sentieon script
mathiasbio Jan 17, 2025
7108a33
add new rule
mathiasbio Jan 20, 2025
82857a4
add license
mathiasbio Jan 20, 2025
9d30b8f
format black sentieon script
mathiasbio Jan 20, 2025
7efbf41
test
mathiasbio Jan 20, 2025
7698e73
update
mathiasbio Jan 21, 2025
8469b0d
change script
mathiasbio Jan 21, 2025
3b8aede
fix
mathiasbio Jan 21, 2025
4d436d9
fix
mathiasbio Jan 21, 2025
b056a11
return MERGED filter
mathiasbio Jan 21, 2025
55edd9b
test
mathiasbio Jan 21, 2025
bdd3710
fix
mathiasbio Jan 21, 2025
14c134d
test
mathiasbio Jan 21, 2025
4707d16
fix
mathiasbio Jan 21, 2025
654cb98
fix
mathiasbio Jan 21, 2025
3a65a29
test
mathiasbio Jan 21, 2025
54dd105
test
mathiasbio Jan 21, 2025
1e7d7ce
fix
mathiasbio Jan 21, 2025
2372e2d
set merged
mathiasbio Jan 21, 2025
3c9dfc2
fix
mathiasbio Jan 21, 2025
f51638a
refactor
mathiasbio Jan 21, 2025
5d76ac0
test
mathiasbio Jan 21, 2025
d2ee89a
fix
mathiasbio Jan 21, 2025
cdba200
fix typehints
mathiasbio Jan 21, 2025
984f791
black etc
mathiasbio Jan 21, 2025
4c0ecab
refactor
mathiasbio Jan 21, 2025
d407e63
black
mathiasbio Jan 21, 2025
5dc899b
change script
mathiasbio Jan 21, 2025
654e996
change raw delivery vcf file for UMI and standard tga
mathiasbio Jan 21, 2025
805ff7e
changelog
mathiasbio Jan 21, 2025
8056694
refactor
mathiasbio Jan 21, 2025
ab8cc0e
start refactor
mathiasbio Jan 23, 2025
98fcee4
refactor continues
mathiasbio Jan 24, 2025
72ab2dc
propogate refactor to rules
mathiasbio Jan 24, 2025
a5b0247
fix model
mathiasbio Jan 24, 2025
25e97c7
bug fix
mathiasbio Jan 24, 2025
b30b667
clean up old code
mathiasbio Jan 24, 2025
41025e1
refactor constant
mathiasbio Jan 24, 2025
15a9d1a
add BioinfoTools constant to balsamic smk
mathiasbio Jan 24, 2025
229e763
remove umi param from general tga
mathiasbio Jan 24, 2025
2c64e95
removing unussed analysis workflow param
mathiasbio Jan 24, 2025
7edc4d6
refactor
mathiasbio Jan 24, 2025
8975592
Merge branch 'develop' of github.com:Clinical-Genomics/BALSAMIC into …
mathiasbio Jan 24, 2025
930d0ad
merge develop
mathiasbio Jan 24, 2025
b6632a9
fix
mathiasbio Jan 24, 2025
8636569
fix
mathiasbio Jan 24, 2025
39fcb05
fix clinical hard filters
mathiasbio Jan 27, 2025
c778a16
fix soft filter
mathiasbio Jan 27, 2025
8164e8e
refactor
mathiasbio Jan 27, 2025
aff2b8c
refactor names
mathiasbio Jan 27, 2025
6b5c1ce
fix pytest
mathiasbio Jan 27, 2025
2c4cc23
add new filter
mathiasbio Jan 27, 2025
c7b301e
add pytests
mathiasbio Jan 27, 2025
a598574
update test names
mathiasbio Jan 27, 2025
ff86e7d
remove unused function
mathiasbio Jan 27, 2025
4cc420e
black
mathiasbio Jan 27, 2025
c47fbd2
add new vardict filter
mathiasbio Jan 28, 2025
dff2a39
corrected vardict strandbias to be general
mathiasbio Jan 28, 2025
d66e4a2
add filter and mark it will be ignored
mathiasbio Jan 28, 2025
2d66075
remove filters that will be removed
mathiasbio Jan 28, 2025
85b2a20
updated docs
mathiasbio Jan 28, 2025
8490657
fix pytest
mathiasbio Jan 28, 2025
7e49b70
change logic of mnv filters
mathiasbio Jan 29, 2025
01fae23
Merge branch 'disable_normal_hardfilter' of github.com:Clinical-Genom…
mathiasbio Jan 29, 2025
a2390ae
merge with softfilter branch
mathiasbio Jan 29, 2025
44b11e9
fix filters
mathiasbio Jan 29, 2025
064a7b1
unpack set
mathiasbio Jan 29, 2025
b17c384
rename filter
mathiasbio Jan 29, 2025
8f0d7a2
change order of mnv processing
mathiasbio Jan 29, 2025
8dcc3c5
fix bug
mathiasbio Jan 29, 2025
ac6fe6f
fix bug
mathiasbio Jan 29, 2025
d2de191
add hard filter of MERGED variants
mathiasbio Jan 30, 2025
d9b2ff7
black
mathiasbio Jan 30, 2025
c15c8c9
add to docs
mathiasbio Jan 30, 2025
b743d4d
fix filter
mathiasbio Jan 30, 2025
1282707
convert filter to infofield
mathiasbio Jan 31, 2025
6dbc9a3
bug fix
mathiasbio Jan 31, 2025
32844de
clean up
mathiasbio Jan 31, 2025
04bf774
try to fix
mathiasbio Jan 31, 2025
40895fd
testing
mathiasbio Jan 31, 2025
c05036b
change filter
mathiasbio Jan 31, 2025
d5de252
Merge branch 'develop' of github.com:Clinical-Genomics/BALSAMIC into …
mathiasbio Jan 31, 2025
ed2ea13
merge develop
mathiasbio Jan 31, 2025
e5c6e2e
fix
mathiasbio Feb 3, 2025
6a72ac2
fix
mathiasbio Feb 4, 2025
3c85260
refactor
mathiasbio Feb 5, 2025
8f628cb
fix
mathiasbio Feb 5, 2025
5f9a87f
fix
mathiasbio Feb 5, 2025
1776d41
fix
mathiasbio Feb 5, 2025
ac85443
fix tumor and normal af
mathiasbio Feb 5, 2025
520b08a
fix
mathiasbio Feb 5, 2025
8d9f6b3
fix
mathiasbio Feb 5, 2025
e3f5fa8
update docs
mathiasbio Feb 5, 2025
9919a55
remove file
mathiasbio Feb 5, 2025
2b71f38
remove file
mathiasbio Feb 5, 2025
76fcfbb
black
mathiasbio Feb 5, 2025
04d51fd
Merge branch 'develop' of github.com:Clinical-Genomics/BALSAMIC into …
mathiasbio Feb 5, 2025
b73d372
merge develop
mathiasbio Feb 5, 2025
ca88e5e
refactor names
mathiasbio Feb 5, 2025
02c1a28
fix
mathiasbio Feb 5, 2025
9d7d71e
round
mathiasbio Feb 6, 2025
Binary file modified .DS_Store
Binary file not shown.
537 changes: 537 additions & 0 deletions BALSAMIC/assets/scripts/merge_mnp.py

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion BALSAMIC/constants/cluster_analysis.json
@@ -388,7 +388,7 @@
"time": "01:00:00",
"n": 4
},
"modify_tnscope_infofield": {
"post_process_tnscope": {
"time": "01:00:00",
"n": 4
},
5 changes: 3 additions & 2 deletions BALSAMIC/constants/rules.py
@@ -200,8 +200,9 @@
"vep_annotate_germlineVAR_tumor",
"vep_annotate_germlineVAR_normal",
# SNVs
"modify_tnscope_infofield",
"modify_tnscope_infofield_umi",
"bcftools_split_tnscope_variants",
"sentieon_tnscope_umi",
"sentieon_tnscope_umi_tn",
"gatk_update_vcf_sequence_dictionary",
"bcftools_filter_tnscope_clinical_tumor_only",
"bcftools_filter_tnscope_clinical_tumor_normal",
3 changes: 3 additions & 0 deletions BALSAMIC/constants/variant_filters.py
@@ -329,6 +329,9 @@ class WgsSNVFilters(BaseSNVFilters):

class TgaSNVFilters(BaseSNVFilters):
research = [
VCFFilter(
filter_name="MERGED", Description="SNV Merged with neighboring variants"
),
VCFFilter(tag_value=0.01, filter_name="SWEGENAF", field="INFO"),
VCFFilter(tag_value=0.005, filter_name="balsamic_high_pop_freq", field="INFO"),
]
2 changes: 1 addition & 1 deletion BALSAMIC/constants/workflow_params.py
@@ -115,7 +115,7 @@
},
}

SLEEP_BEFORE_START = 600
SLEEP_BEFORE_START = 800

WORKFLOW_PARAMS = {
"bam_post_processing": {
1 change: 1 addition & 0 deletions BALSAMIC/containers/varcall_py3/varcall_py3.yaml
@@ -104,6 +104,7 @@ dependencies:
- pthread-stubs=0.4
- pycosat=0.6.4
- pycparser=2.20
- pyfaidx=0.8.1.3
- pyopenssl=20.0.1
- pysam=0.19.1
- pysocks=1.7.1
@@ -1,15 +1,14 @@

rule modify_tnscope_infofield_umi:
input:
vcf_tnscope_umi = vcf_dir + "SNV.somatic." + config["analysis"]["case_id"] + ".tnscope_umi.preprocess.vcf.gz",
vcf_tnscope_umi = vcf_dir + "SNV.somatic." + config["analysis"]["case_id"] + ".tnscope_umi.vcf.gz",
output:
vcf_tnscope_umi = vcf_dir + "SNV.somatic." + config["analysis"]["case_id"] + ".tnscope_umi.vcf.gz",
vcf_tnscope_umi = vcf_dir + "sentieon_tnscope/SNV.somatic." + config["analysis"]["case_id"] + ".tnscope_umi.post_process.vcf.gz",
benchmark:
Path(benchmark_dir,'modify_tnscope_infofield_umi_' + config[ "analysis" ][ "case_id" ] + ".tsv").as_posix()
singularity:
Path(singularity_image, config["bioinfo_tools"].get("bcftools") + ".sif").as_posix()
params:
housekeeper_id = {"id": config["analysis"]["case_id"], "tags": "research"},
modify_tnscope_infofield = get_script_path("modify_tnscope_infofield.py"),
tmpdir = tempfile.mkdtemp(prefix=tmp_dir),
case_name = config["analysis"]["case_id"]
@@ -12,12 +12,13 @@ rule sentieon_tnscope_umi:
bed = config["panel"]["capture_kit"],
dbsnp = config["reference"]["dbsnp"]
output:
vcf_tnscope_umi = vcf_dir + "SNV.somatic." + config["analysis"]["case_id"] + ".tnscope_umi.preprocess.vcf.gz",
vcf_tnscope_umi = vcf_dir + "SNV.somatic." + config["analysis"]["case_id"] + ".tnscope_umi.vcf.gz",
namemap = vcf_dir + "SNV.somatic." + config["analysis"]["case_id"] + ".tnscope_umi.sample_name_map"
benchmark:
Path(benchmark_dir, "sentieon_tnscope_umi_" + config["analysis"]["case_id"] + ".tsv").as_posix()
params:
tmpdir = tempfile.mkdtemp(prefix=tmp_dir),
housekeeper_id = {"id": config["analysis"]["case_id"],"tags": "research"},
sentieon_exec = config_model.sentieon.sentieon_exec,
sentieon_lic = config_model.sentieon.sentieon_license,
tumor_af = params.tnscope_umi.filter_tumor_af,
@@ -12,12 +12,13 @@ rule sentieon_tnscope_umi_tn:
bed = config["panel"]["capture_kit"],
dbsnp = config["reference"]["dbsnp"]
output:
vcf_tnscope_umi = vcf_dir + "SNV.somatic." + config["analysis"]["case_id"] + ".tnscope_umi.preprocess.vcf.gz",
vcf_tnscope_umi = vcf_dir + "SNV.somatic." + config["analysis"]["case_id"] + ".tnscope_umi.vcf.gz",
namemap = vcf_dir + "SNV.somatic." + config["analysis"]["case_id"] + ".tnscope_umi.sample_name_map"
benchmark:
Path(benchmark_dir, "sentieon_tnscope_umi_" + config["analysis"]["case_id"] + ".tsv").as_posix()
params:
tmpdir = tempfile.mkdtemp(prefix=tmp_dir),
housekeeper_id = {"id": config["analysis"]["case_id"],"tags": "research"},
sentieon_exec = config_model.sentieon.sentieon_exec,
sentieon_lic = config_model.sentieon.sentieon_license,
tumor_af = params.tnscope_umi.filter_tumor_af,
@@ -86,7 +86,7 @@ if config["analysis"]["sequencing_type"] == 'targeted' and config["analysis"]["a
input:
vcf = vcf_dir + "SNV.somatic." + config["analysis"]["case_id"] + ".tnscope.vcf.gz",
output:
vcf_filtered = vcf_dir + "SNV.somatic." + config["analysis"]["case_id"] + ".tnscope.research.vcf.gz",
vcf_filtered = vcf_dir + "SNV.somatic." + config["analysis"]["case_id"] + ".tnscope.research.pre_process.vcf.gz",
benchmark:
Path(benchmark_dir,'bcftools_quality_filter_vardict_tumor_only_' + config["analysis"]["case_id"] + ".tsv").as_posix()
singularity:
@@ -155,7 +155,7 @@ elif config["analysis"]["sequencing_type"] == 'targeted' and config["analysis"][
input:
vcf = vcf_dir + "SNV.somatic." + config["analysis"]["case_id"] + ".tnscope.vcf.gz",
output:
vcf_filtered = vcf_dir + "SNV.somatic." + config["analysis"]["case_id"] + ".tnscope.research.vcf.gz",
vcf_filtered = vcf_dir + "SNV.somatic." + config["analysis"]["case_id"] + ".tnscope.research.pre_process.vcf.gz",
benchmark:
Path(benchmark_dir,'bcftools_quality_filter_vardict_tumor_normal_' + config["analysis"]["case_id"] + ".tsv").as_posix()
singularity:
@@ -223,7 +223,7 @@ elif config["analysis"]["sequencing_type"] == 'targeted' and config["analysis"][
if config_model.analysis.analysis_workflow == AnalysisWorkflow.BALSAMIC_UMI and config["analysis"]["analysis_type"] == 'paired':
rule bcftools_quality_filter_TNscope_umi_tumor_normal:
input:
vcf = vcf_dir + "SNV.somatic."+ config["analysis"]["case_id"] + ".tnscope_umi.vcf.gz",
vcf = vcf_dir + "sentieon_tnscope/SNV.somatic." + config["analysis"]["case_id"] + ".tnscope_umi.post_process.vcf.gz",
output:
vcf_filtered = vcf_dir + "SNV.somatic."+ config["analysis"]["case_id"] + ".tnscope_umi.research.vcf.gz",
benchmark:
@@ -251,9 +251,9 @@ if config_model.analysis.analysis_workflow == AnalysisWorkflow.BALSAMIC_UMI and
elif config_model.analysis.analysis_workflow == AnalysisWorkflow.BALSAMIC_UMI and config["analysis"]["analysis_type"] == 'single':
rule bcftools_quality_filter_TNscope_umi_tumor_only:
input:
vcf=vcf_dir + "SNV.somatic." + config["analysis"]["case_id"] + ".tnscope_umi.vcf.gz",
vcf = vcf_dir + "sentieon_tnscope/SNV.somatic." + config["analysis"]["case_id"] + ".tnscope_umi.post_process.vcf.gz",
output:
vcf_filtered=vcf_dir + "SNV.somatic." + config["analysis"]["case_id"] + ".tnscope_umi.research.vcf.gz"
vcf_filtered = vcf_dir + "SNV.somatic."+ config["analysis"]["case_id"] + ".tnscope_umi.research.vcf.gz"
benchmark:
Path(benchmark_dir,'bcftools_quality_filter_TNscope_umi_tumor_only' + config["analysis"][
"case_id"] + ".tsv").as_posix()
@@ -5,16 +5,16 @@

rule bcftools_split_tnscope_variants:
input:
ref = config["reference"]["reference_genome"],
vcf = vcf_dir + "sentieon_tnscope/ALL.somatic." + config["analysis"]["case_id"] + ".tnscope.vcf.gz",
output:
vcf_tnscope = vcf_dir + "sentieon_tnscope/SNV.somatic." + config["analysis"]["case_id"] + ".tnscope.preprocess.vcf",
vcf_tnscope = vcf_dir + "SNV.somatic." + config["analysis"]["case_id"] + ".tnscope.vcf.gz",
vcf_tnscope_sv = vcf_dir + "SV.somatic." + config["analysis"]["case_id"] + ".tnscope.research.vcf.gz",
benchmark:
Path(benchmark_dir,'bcftools_split_tnscope_variants_' + config[ "analysis" ][ "case_id" ] + ".tsv").as_posix()
singularity:
Path(singularity_image, config["bioinfo_tools"].get("bcftools") + ".sif").as_posix()
params:
housekeeper_id = {"id": config["analysis"]["case_id"], "tags": "research"},
tmpdir = tempfile.mkdtemp(prefix=tmp_dir),
case_name = config["analysis"]["case_id"]
threads:
@@ -26,40 +26,52 @@ rule bcftools_split_tnscope_variants:
export TMPDIR={params.tmpdir};
mkdir -p {params.tmpdir};

bcftools view --include 'INFO/SVTYPE=="."' -o {output.vcf_tnscope} {input.vcf} ;
bcftools view --include 'INFO/SVTYPE=="."' -O z -o {output.vcf_tnscope} {input.vcf} ;
bcftools view --include 'INFO/SVTYPE!="."' -O z -o {output.vcf_tnscope_sv} {input.vcf};
tabix -p vcf -f {output.vcf_tnscope_sv};
tabix -p vcf -f {output.vcf_tnscope};
"""

rule modify_tnscope_infofield:
rule post_process_tnscope:
input:
vcf_tnscope = vcf_dir + "sentieon_tnscope/SNV.somatic." + config["analysis"]["case_id"] + ".tnscope.preprocess.vcf",
vcf_tnscope = vcf_dir + "SNV.somatic." + config["analysis"]["case_id"] + ".tnscope.research.pre_process.vcf.gz",
ref = config["reference"]["reference_genome"],
output:
vcf_tnscope = vcf_dir + "SNV.somatic." + config["analysis"]["case_id"] + ".tnscope.vcf.gz",
vcf_tnscope = vcf_dir + "SNV.somatic." + config["analysis"]["case_id"] + ".tnscope.research.vcf.gz",
benchmark:
Path(benchmark_dir,'modify_tnscope_infofield_' + config[ "analysis" ][ "case_id" ] + ".tsv").as_posix()
Path(benchmark_dir,'post_process_tnscope_' + config[ "analysis" ][ "case_id" ] + ".tsv").as_posix()
singularity:
Path(singularity_image, config["bioinfo_tools"].get("bcftools") + ".sif").as_posix()
params:
housekeeper_id = {"id": config["analysis"]["case_id"], "tags": "research"},
merge_mnvs = get_script_path("merge_mnp.py"),
modify_tnscope_infofield = get_script_path("modify_tnscope_infofield.py"),
edit_vcf_script= get_script_path("edit_vcf_info.py"),
tmpdir = tempfile.mkdtemp(prefix=tmp_dir),
case_name = config["analysis"]["case_id"],
edit_vcf_script = get_script_path("edit_vcf_info.py"),
sentieon_exec = config_model.sentieon.sentieon_exec,
sentieon_lic = config_model.sentieon.sentieon_license,
matched_normal_filternames = ",".join(BaseSNVFilters.MATCHED_NORMAL_FILTER_NAMES),
variant_caller= "tnscope"
threads:
get_threads(cluster_config, 'modify_tnscope_infofield')
get_threads(cluster_config, 'post_process_tnscope')
message:
"Add DP and AF tumor sample info and FOUND_IN to INFO field for case: {params.case_name}"
"Merge TNscope SNVs with same phaseID to MNVs."
"Add DP and AF tumor sample info and FOUND_IN to INFO field: {params.case_name}"
shell:
"""
export TMPDIR={params.tmpdir};
mkdir -p {params.tmpdir};
export SENTIEON_TMPDIR={params.tmpdir};
export SENTIEON_LICENSE={params.sentieon_lic};

{params.sentieon_exec} pyexec {params.merge_mnvs} --preserve_filters {params.matched_normal_filternames} --max_distance 5 {input.vcf_tnscope} {input.ref} > {params.tmpdir}/tnscope.research.mnv.vcf ;

python {params.modify_tnscope_infofield} {params.tmpdir}/tnscope.research.mnv.vcf {params.tmpdir}/tnscope.research.mnv.add_info_fields.vcf ;

python {params.modify_tnscope_infofield} {input.vcf_tnscope} {params.tmpdir}/vcf_tnscope_snvs_modified.vcf ;
python {params.edit_vcf_script} -i {params.tmpdir}/vcf_tnscope_snvs_modified.vcf -o {params.tmpdir}/vcf_tnscope_snvs_modified_found_in_added.vcf -c {params.variant_caller};
bgzip {params.tmpdir}/vcf_tnscope_snvs_modified_found_in_added.vcf ;
mv {params.tmpdir}/vcf_tnscope_snvs_modified_found_in_added.vcf.gz {output.vcf_tnscope} ;
python {params.edit_vcf_script} -i {params.tmpdir}/tnscope.research.mnv.add_info_fields.vcf -o {params.tmpdir}/tnscope.research.mnv.add_info_fields.added_found_in.vcf -c {params.variant_caller};

bgzip {params.tmpdir}/tnscope.research.mnv.add_info_fields.added_found_in.vcf ;

mv {params.tmpdir}/tnscope.research.mnv.add_info_fields.added_found_in.vcf.gz {output.vcf_tnscope} ;
tabix -p vcf -f {output.vcf_tnscope} ;
"""


1 change: 1 addition & 0 deletions BALSAMIC/workflows/balsamic.smk
@@ -20,6 +20,7 @@ from BALSAMIC.constants.analysis import (
from BALSAMIC.constants.paths import BALSAMIC_DIR
from BALSAMIC.constants.rules import SNAKEMAKE_RULES
from BALSAMIC.constants.variant_filters import (
BaseSNVFilters,
SVDB_FILTER_SETTINGS,
MANTA_FILTER_SETTINGS,
WgsSNVFilters,
18 changes: 11 additions & 7 deletions CHANGELOG.rst
@@ -1,28 +1,32 @@
[X.X.X]
-------
--------

Added:
^^^^^^
* Added option to disable hard filter of variants in matched normal https://github.com/Clinical-Genomics/BALSAMIC/pull/1509
* Added check to verify sample sex for all workflows https://github.com/Clinical-Genomics/BALSAMIC/pull/1516


Changed:
^^^^^^^^
* Reworked bcftools filters https://github.com/Clinical-Genomics/BALSAMIC/pull/1509
* Renamed high_normal_tumor_af_frac to in_normal https://github.com/Clinical-Genomics/BALSAMIC/pull/1509
* check to verify sample sex for all workflows https://github.com/Clinical-Genomics/BALSAMIC/pull/1516
* Merging SNVs into MNVs in TNscope TGA https://github.com/Clinical-Genomics/BALSAMIC/pull/1524
* Change raw delivery SNV file for TGA to before any post-processing https://github.com/Clinical-Genomics/BALSAMIC/pull/1524


Removed:
^^^^^^^^
* Remove WGS-level GC-bias metric from TGA workflow https://github.com/Clinical-Genomics/BALSAMIC/pull/1521


Changed:
^^^^^^^^
* Reworked bcftools filters https://github.com/Clinical-Genomics/BALSAMIC/pull/1509
* Renamed high_normal_tumor_af_frac to in_normal https://github.com/Clinical-Genomics/BALSAMIC/pull/1509
* check to verify sample sex for all workflows https://github.com/Clinical-Genomics/BALSAMIC/pull/1516
Fixed:
^^^^^^


[16.0.0]
-------
--------

Added:
^^^^^^
Binary file removed docs/.DS_Store
Binary file not shown.
28 changes: 28 additions & 0 deletions docs/balsamic_filters.rst
@@ -245,6 +245,7 @@ The `TNscope <https://www.biorxiv.org/content/10.1101/250647v1.abstract>`_ algor
*min_base_qual*: Minimal base quality to consider in calling

::

min_base_qual = 15

*min_tumor_allele_frac*: Set the minimum tumor AF to be considered as potential variant site.
@@ -313,6 +314,33 @@ The `TNscope <https://www.biorxiv.org/content/10.1101/250647v1.abstract>`_ algor
marks variant with soft-filter `in_normal` variant if: AF(normal) / AF(tumor) > 0.3
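
The `in_normal` rule above is a simple ratio test. A minimal Python sketch of it follows, for illustration only: the function name `is_in_normal` and its parameters are hypothetical, and the handling of a tumor AF of zero is an assumption, since the documentation does not describe that case. In BALSAMIC the filter is applied through bcftools, not through code like this.

```python
def is_in_normal(tumor_af: float, normal_af: float, max_ratio: float = 0.3) -> bool:
    """Return True when AF(normal) / AF(tumor) > max_ratio.

    The comparison is written without a division so that a tumor AF of 0
    does not raise. Assumption: with no tumor support, any normal support
    counts as exceeding the ratio.
    """
    return normal_af > max_ratio * tumor_af


# A variant at 10% in the tumor and 4% in the normal has a ratio of 0.4.
flagged = is_in_normal(tumor_af=0.10, normal_af=0.04)
# A variant at 50% in the tumor and 10% in the normal has a ratio of 0.2.
kept = is_in_normal(tumor_af=0.50, normal_af=0.10)
```

Here `flagged` is `True` and `kept` is `False`.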


**Post-processing of TNscope variants**

After the TNscope variants have been quality-filtered, and before they are merged with the VarDict variants, the phased SNVs and InDels from TNscope are merged into MNVs. This is done with a slightly modified version of the ``merge_mnp.py`` script from `Sentieon-scripts <https://github.com/Sentieon/sentieon-scripts/blob/master/merge_mnp/merge_mnp.py>`_, which can be found in ``BALSAMIC/assets/scripts/merge_mnp.py``.

This avoids multiple representations of the same variant, since VarDict already reports these types of variants as MNVs, and because VEP is not designed to handle phased SNVs when interpreting the protein effect.
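
The diff of `merge_mnp.py` is not rendered on this page, so the following is only a rough sketch of what "merging SNVs with the same phaseID" under `--max_distance 5` could look like as a grouping step. Everything in it is an assumption made for illustration: the `Variant` class, the `group_phased` function, and the choice to measure distance between start positions are hypothetical and are not taken from the Sentieon script.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Variant:
    """Hypothetical, minimal stand-in for one VCF record."""
    chrom: str
    pos: int
    ref: str
    alt: str
    phase_id: Optional[str]  # None when the caller reported no phase ID


def group_phased(variants: List[Variant], max_distance: int = 5) -> List[List[Variant]]:
    """Group neighbouring variants that share a phase ID.

    A variant joins the previous group when it is on the same chromosome,
    has the same (non-missing) phase ID as the previous variant, and starts
    at most max_distance bases after it. Groups of length > 1 are MNV
    candidates; building the merged REF/ALT alleles is not shown.
    """
    groups: List[List[Variant]] = []
    for variant in sorted(variants, key=lambda v: (v.chrom, v.pos)):
        previous = groups[-1][-1] if groups else None
        same_phase_block = (
            previous is not None
            and variant.phase_id is not None
            and variant.phase_id == previous.phase_id
            and variant.chrom == previous.chrom
            and variant.pos - previous.pos <= max_distance
        )
        if same_phase_block:
            groups[-1].append(variant)
        else:
            groups.append([variant])
    return groups


calls = [
    Variant("chr1", 100, "A", "T", "p1"),
    Variant("chr1", 102, "C", "G", "p1"),  # same phase ID, 2 bp away: grouped
    Variant("chr1", 120, "G", "A", "p1"),  # same phase ID, too far away
    Variant("chr1", 121, "T", "C", None),  # unphased: never grouped
]
groups = group_phased(calls)
```

In this sketch an unphased variant lying between two phased ones splits the group, which may or may not match the behaviour of the real script.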

When phased SNVs are merged into an MNV, the information from the constituent variants has to be consolidated into single values, most importantly for the FILTER column.

An example is an MNV created by merging a phased germline SNV with a somatic SNV. This is handled as follows:

- `MNV_CONFLICTING_FILTERS`: a filter given to MNVs whose constituent variants have different filters (such as `in_normal` and `PASS`).

.. note::

However, several filters can mean similar things, such as germline_risk and in_normal, so MNVs whose constituent variants carry only these filters are not really "conflicting".

The logic for setting `MNV_CONFLICTING_FILTERS` is therefore slightly more involved. In summary, there are three possible outcomes for the FILTER column when SNVs/InDels are merged into MNVs:

1. A single filter, such as PASS, when all constituent variants have the same filter and no other.
2. Multiple filters, such as in_normal,germline_risk, when every constituent variant has at least one of the matched normal filters.
3. `MNV_CONFLICTING_FILTERS`, when the constituent variants have conflicting filters and do not all contain a matched normal filter.

.. note::

In addition, a few fields are added to the INFO column of each created MNV, containing comma-separated lists of the AD, AF, and FILTER values of its constituent variants.
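
The three FILTER outcomes listed above can be sketched in Python as follows. This is an illustration under stated assumptions, not the logic of `merge_mnp.py`: the function name `consolidate_filters` is hypothetical, the set of matched normal filters is assumed to be `in_normal` and `germline_risk` (the two names mentioned in the text), and returning the union of all filters in the second outcome is a guess at how the filters are combined.

```python
from typing import FrozenSet, Iterable, List

# Assumption: the two matched normal filter names mentioned in the docs.
MATCHED_NORMAL_FILTERS: FrozenSet[str] = frozenset({"in_normal", "germline_risk"})


def consolidate_filters(
    constituent_filters: Iterable[Iterable[str]],
    matched_normal: FrozenSet[str] = MATCHED_NORMAL_FILTERS,
) -> List[str]:
    """Pick the FILTER value(s) of an MNV from its constituent variants."""
    filter_sets = [frozenset(filters) for filters in constituent_filters]
    if not filter_sets:
        raise ValueError("an MNV needs at least one constituent variant")

    # Outcome 1: every constituent carries exactly the same filter(s).
    if all(filters == filter_sets[0] for filters in filter_sets):
        return sorted(filter_sets[0])

    # Outcome 2: every constituent has at least one matched normal filter.
    if all(filters & matched_normal for filters in filter_sets):
        return sorted(set().union(*filter_sets))

    # Outcome 3: anything else is treated as conflicting.
    return ["MNV_CONFLICTING_FILTERS"]


all_pass = consolidate_filters([["PASS"], ["PASS"]])
both_normal = consolidate_filters([["in_normal"], ["germline_risk"]])
conflicting = consolidate_filters([["in_normal"], ["PASS"]])
```

With these inputs, `all_pass` is `["PASS"]`, `both_normal` is `["germline_risk", "in_normal"]`, and `conflicting` is `["MNV_CONFLICTING_FILTERS"]`.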


**Post-call Observation database Filters**


2 changes: 1 addition & 1 deletion docs/balsamic_pon.rst
@@ -8,7 +8,7 @@ Currently two PON-methods are implemented in BALSAMIC to correct for biases and
- To produce normalised CN-profiles for WGS cases visualised in ``GENS``.

Sharing PON for publications
======================
============================

If a PON has been used for the analysis of samples in a research project and a publication requires that the PON is uploaded to some database, a request can be made to Clinical Genomics, and depending on the status of the consent of the individuals from which the samples used in the construction of the PON has been derived it may or may not be possible to share the PON.

2 changes: 1 addition & 1 deletion docs/balsamic_sv_cnv.rst
@@ -58,7 +58,7 @@ It is mandatory to provide the gender of the sample from BALSAMIC version >= 10.
Further details about a specific caller can be found in the links for the repositories containing the documentation for SV and CNV callers along with the links for the articles are listed in `bioinfo softwares <https://balsamic.readthedocs.io/en/latest/bioinfo_softwares.html>`_.

**Difficult to detect clinically relevant SVs**
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

**IGH::DUX4 rearrangements**

2 changes: 1 addition & 1 deletion docs/user_guide.rst
@@ -5,7 +5,7 @@ Short tutorial
Here a short tutorial is provided for BALSAMIC (**version** = 16.0.0).

Regarding fastq-inputs
---------------------
---------------------------

Previous versions of BALSAMIC only accepted one fastq-pair per sample, which required concatenation of fastq-pairs if multiple existed.
