Combine all VCFs across different variant callers, such as mutect2 and strelka2 #738
Comments
So do I |
What is the right tool for this? https://github.com/nf-core/modules/tree/master/modules/nf-core/bcftools/concat |
@Githubguanxudong could you elaborate a bit more? We are trying to tackle this during the hackathon. Which tools did you have in mind? Do you have any papers or examples showing at which level the results should be merged? |
As far as I could tell, we need to run https://samtools.github.io/bcftools/bcftools.html If one doesn't use When running without
but the tool still produces an output file (which is seemingly missing some variants from the input files!?). It is not clear to me what the option |
@FriederikeHanssen and @maxulysse : The vcf-file resulting from the concatenation may contain the same variant several times:
Such duplicates could perhaps be removed by the option Also, the vcf-file is not sorted. Should it perhaps be sorted? |
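The duplicate and sorting concerns raised above can be illustrated with a minimal Python sketch that does not depend on bcftools. It assumes that a record is identified by CHROM, POS, REF and ALT, that keeping only the first record per key is acceptable, and that plain string order for contig names is good enough (a real tool would follow the contig order declared in the header). The function name `sort_and_dedup` is hypothetical.

```python
def sort_and_dedup(vcf_lines):
    """Return header lines followed by sorted, de-duplicated records.

    Assumption: lines are already stripped of trailing newlines and
    records are tab-separated as in a standard VCF file.
    """
    header = [ln for ln in vcf_lines if ln.startswith("#")]
    records = [ln for ln in vcf_lines if ln and not ln.startswith("#")]

    def key(line):
        fields = line.split("\t")
        # (CHROM, POS, REF, ALT) identifies a variant in this sketch.
        return (fields[0], int(fields[1]), fields[3], fields[4])

    seen = set()
    unique = []
    # sorted() is stable, so the first input record per key is the one kept.
    for line in sorted(records, key=key):
        k = key(line)
        if k not in seen:
            seen.add(k)
            unique.append(line)
    return header + unique


lines = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t200\t.\tA\tT\t.\tPASS\t.",
    "chr1\t100\t.\tG\tC\t.\tPASS\t.",
    "chr1\t200\t.\tA\tT\t.\tPASS\t.",
]
result = sort_and_dedup(lines)
```

Note that dropping duplicates this way discards the INFO and FORMAT values of every caller except the first one seen for a given variant.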
Hi @Githubguanxudong ! How do you usually deal with the above? :) |
@FriederikeHanssen |
We've decided that - for now - we just do concatenation of germline vcf-files. |
@maxulysse @FriederikeHanssen @amasplund : I'll run |
So here I ran deepvariant, mpileup, strelka and freebayes, and I get the same variant called three times:
I wonder if that is what the users would want? 🤔 |
So this is just the germline indels? |
yeah, is the meta info also the same across all? |
Not sure I understand what you mean 🤔 The different variant callers produce vcf-files with different INFO- and FORMAT-fields. |
yeah maybe this info is interesting to keep? Not sure if we want to just throw it out to only keep one line per variant or if there is a way to merge these fields |
Here is the concatenated (and sorted) vcf-file resulting from the following cmd:
It is basically the result of running |
yes sounds good. can always add more later |
Just for the record: CNVkit doesn't produce any vcf-files, so the concatenated vcf-file doesn't contain any variants from CNVkit. |
Some comments:
|
@amasplund : Thanks for your feedback. It is much appreciated. The vcf-file
I could imagine that it would be useful to have an INFO-field for each variant stating which variant-caller (or perhaps more realistically which vcf-file) found the given variant. Something like, for instance, |
@asp8200 Can you add "VC=strelka" manually before the concat? Or is that too hacky?? :) |
@asp8200 Maybe the best is to send the example to some "real" interpreters and see what they think :). With the risk of getting divergent replies :). |
@asp8200 you could also ping the sarek community on slack to get some input. As @amasplund mentioned, it is quite possible we get some divergent opinions there, but that is always the case 😆 I like the idea of keeping all this INFO. Better a little too much meta info than too little, and it would be quite useful for seeing how the callers compare. |
A suggestion for CNVkit (our users like it a lot), though it would take more work to implement, is to have a CNVkit option where you add a reference .cnn file. |
@amasplund a bit off topic here, but with the newest sarek release you can submit your own reference.cnn. In addition, Maxime has just proposed a new workflow to create all these cohort-of-normals-like references, such as a pon for mutect2 and references for msisensorpro and cnvkit. |
Coming back to the point about adding an INFO-field to the vcf-files before concatenating them: @amasplund came up with a nice little bash+awk script for adding a "constant" INFO-field to a vcf-file. How would Sarek run that script before doing the concatenation? As far as I can tell, there are two options:
Neither of those solutions seems ideal. Does anybody else have any suggestions on how to do this? |
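For illustration only, the idea of adding a constant INFO-field to every record before concatenation could look roughly like the Python sketch below. This is not the bash+awk script mentioned above, which is not shown in this thread. The key `VC` follows the "VC=strelka" suggestion made earlier; the function name `tag_caller` and the header description text are assumptions.

```python
def tag_caller(vcf_lines, caller, key="VC"):
    """Append KEY=caller to the INFO column of every record.

    Assumption: lines are stripped of trailing newlines and INFO is the
    eighth tab-separated column, as in a standard VCF file.
    """
    header_line = (
        f"##INFO=<ID={key},Number=1,Type=String,"
        'Description="Variant caller that reported this record">'
    )
    out = []
    for line in vcf_lines:
        if line.startswith("#CHROM"):
            # Declare the new INFO key just before the column header line.
            out.append(header_line)
            out.append(line)
        elif line.startswith("#"):
            out.append(line)
        else:
            fields = line.split("\t")
            tag = f"{key}={caller}"
            # An empty INFO column is written as "." in VCF.
            fields[7] = tag if fields[7] in (".", "") else f"{fields[7]};{tag}"
            out.append("\t".join(fields))
    return out


lines = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tG\tC\t.\tPASS\tDP=10",
    "chr1\t200\t.\tA\tT\t.\tPASS\t.",
]
tagged = tag_caller(lines, "strelka")
```

How such a step would be wired into the pipeline is exactly the open question in this comment; the sketch only shows the transformation itself.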
Closed by #792 |
Description of feature
We usually combine the results of multiple tools in tumor variant calling, so I think it is necessary to combine them with appropriate methods, such as union and consistency. I am looking forward to this feature.
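As a rough sketch of what "union" and "consistency" could mean here, the Python example below computes the set of variants reported by any caller and the set reported by all callers. Reading "consistency" as the intersection of the call sets is an assumption, as is identifying a variant by CHROM, POS, REF and ALT. The function names `variant_keys` and `combine` are hypothetical, and the records are made-up toy data.

```python
def variant_keys(vcf_lines):
    """Collect (CHROM, POS, REF, ALT) for every record in a VCF."""
    keys = set()
    for line in vcf_lines:
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        chrom, pos, _id, ref, alt = line.split("\t")[:5]
        keys.add((chrom, int(pos), ref, alt))
    return keys


def combine(calls_by_caller):
    """Return (union, consensus) over the call sets of all callers."""
    sets = [variant_keys(lines) for lines in calls_by_caller.values()]
    union = set().union(*sets)
    # Assumption: "consistency" means a variant was found by every caller.
    consensus = set.intersection(*sets) if sets else set()
    return union, consensus


columns = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO"
calls = {
    "mutect2": [
        columns,
        "chr1\t100\t.\tG\tC\t.\tPASS\t.",
        "chr1\t200\t.\tA\tT\t.\tPASS\t.",
    ],
    "strelka2": [
        columns,
        "chr1\t200\t.\tA\tT\t.\tPASS\t.",
        "chr2\t50\t.\tC\tG\t.\tPASS\t.",
    ],
}
union, consensus = combine(calls)
```

With this toy input, `union` holds three variants and `consensus` holds only the one at chr1:200 that both callers report.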