Q: stratify produces all-zero columns #221

Closed
ptrebert opened this issue Aug 7, 2024 · 2 comments

@ptrebert

ptrebert commented Aug 7, 2024

Version :
v4.3.0 installed via bioconda

Describe the bug :

This is likely a user error rather than a bug, but I don't see where I am going wrong. I am preparing data for a training session and have a limited callset (in essence, only chr11). truvari bench produces output as expected (excerpt from summary.json) ...

{
    "TP-base": 789,
    "TP-comp": 789,
    "FP": 202,
    "FN": 110496,
    "precision": 0.7961654894046418,
   [ ... and so on ...]

but the call

truvari stratify --header --output OUTBED REGIONS OUT-BENCH-DIR

with regions specified as individual chromosomes (i.e., chrNN<TAB>0<TAB>END) results in all-zero columns even for chromosomes that have calls in the bench output vcfs. Here is the head of the stratify output table:

#chrom  start   end     tpbase  tp      fn      fp
chr1    0       248956422       0       0       0       0
chr10   0       133797422       0       0       0       0
chr11   0       135086622       0       0       0       0

and so on.
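For reference, a regions file of the kind described above (one whole-chromosome interval per line, tab-separated) could be generated along these lines. This is only a sketch: the function name `whole_chrom_bed` and the output file name `regions.bed` are made up for illustration, and the lengths are the three shown in the table above.

```python
# Chromosome lengths copied from the stratify table above; extend as needed.
CHROM_LENGTHS = {
    "chr1": 248956422,
    "chr10": 133797422,
    "chr11": 135086622,
}


def whole_chrom_bed(lengths):
    """Return BED text with one chrom<TAB>0<TAB>end line per chromosome."""
    return "".join(f"{chrom}\t0\t{end}\n" for chrom, end in lengths.items())


if __name__ == "__main__":
    # "regions.bed" is a placeholder file name, not one used in this issue.
    with open("regions.bed", "w") as handle:
        handle.write(whole_chrom_bed(CHROM_LENGTHS))
```

The resulting file is what gets passed as the REGIONS argument in the call above.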

To Reproduce :
likely not needed

Expected behavior :
stratify should break down the bench output by reference chromosome

Example Data :
n/a

Thanks for any hint on what I need to change to make this work.

+Peter

@ACEnglish
Owner

ACEnglish commented Aug 7, 2024

Hello,

This is a UI bug. You can get it to work by adding --within. I'll dig into it more, but I'm pretty sure the current default is to count variants that are not within the region boundaries, which should be zero when the regions span the whole genome.

$ truvari stratify --header -o example.bed chr1.bed bench
#chrom	start	end	tpbase	tp	fn	fp
chr1	0	248956422	0	0	0	0

$ truvari stratify --within --header -o example.bed chr1.bed bench
#chrom	start	end	tpbase	tp	fn	fp
chr1	0	248956422	1763	1759	38	31

I'll flip that flag's logic so that --within becomes --outside.

Edit: I'm going to rename it to -v / --complement to mimic grep -v and bedtools intersect -v
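
To illustrate why whole-chromosome regions give zeros when counting outside the boundaries, here is a rough sketch in plain Python. It is not Truvari's implementation: `count_variants`, the tuple layout, and the choice to restrict "outside" to variants on the same chromosome as the region are all assumptions made for this example.

```python
def count_variants(regions, variants, complement=False):
    """Count variants per region.

    regions:  list of (chrom, start, end), 0-based half-open like BED
    variants: list of (chrom, pos), 0-based positions
    complement=False counts variants inside each region;
    complement=True counts variants on the same chromosome but outside it
    (an assumption about what "outside" means, for illustration only).
    """
    counts = []
    for chrom, start, end in regions:
        on_chrom = [pos for c, pos in variants if c == chrom]
        inside = sum(1 for pos in on_chrom if start <= pos < end)
        counts.append(len(on_chrom) - inside if complement else inside)
    return counts


# Toy data: a region covering all of chr11 and two made-up variant positions.
regions = [("chr11", 0, 135086622)]
variants = [("chr11", 100), ("chr11", 5000)]

within = count_variants(regions, variants)                    # [2]
outside = count_variants(regions, variants, complement=True)  # [0]
```

With a region that covers the entire chromosome, nothing can fall outside it, so the complement count is always zero, which matches the all-zero table reported above.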

ACEnglish added a commit that referenced this issue Aug 7, 2024
`stratify -w` is now default. `-w` is removed. `-v` is the old default.
Or.. stratify by default counts variants within regions and `-v` counts
those outside regions
@ptrebert
Author

ptrebert commented Aug 7, 2024

Ha, ok, glad that it wasn't me ;-)
Thanks for reacting so quickly and fixing it.
