Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect rounding of methylation fractions #7

Closed
chrisamiller opened this issue Oct 14, 2020 · 6 comments
Closed

Incorrect rounding of methylation fractions #7

chrisamiller opened this issue Oct 14, 2020 · 6 comments
Assignees

Comments

@chrisamiller
Copy link

chrisamiller commented Oct 14, 2020

When running pileup to identify methylated CpGs, the data seems to be incorrectly rounded.

Output produced by pileup:

1   3001007 .   C   .   18  PASS    NS=1;CX=CG;N5=TTCGG GT:GL1:GQ:DP:SP:CV:BT   0/0:-1,-5,-30:18:6:C6:3:1.00
1   3001008 .   G   .   18  PASS    NS=1;CX=CG;N5=ACCGA GT:GL1:GQ:DP:SP:CV:BT   0/0:-1,-5,-30:18:6:G5R1:3:0.67

Output produced by vcf2bed

1   3001006 3001008 0.835   6   C:1.000:3,G:0.670:3

There are 6 reads here, 3 on the C (all methylated) and 3 on the G (with 2 methylated). So the first error comes with the incorrect rounding on each base: The G should have a beta value of 0.667, not 0.670. Looks like the code is rounding to two places, but outputting three. The second error comes when that fraction is propagated: There are 5 methylated reads out of 6, which should give a value of 0.833, but in this case, ends up with a value of 0.835.

The corrected line should look like.

1   3001006 3001008 0.833   6   C:1.000:3,G:0.667:3

Two fixes need to be made for this error:

  1. do the rounding correctly for each base
  2. recalculate the overall beta score from the counts, instead of from the rounded values to reduce error propagation.

This was observed using biscuit version 0.3.8, running via docker container mgibio/biscuit:0.3.8, which just adds a couple of tools on top of the zhouwanding/biscuit_v0.3.8 image.

Pileup was produced with:
biscuit pileup -q 4 -w pileup_stats.txt $reference_fasta $bam

Bed was produced with:
/usr/bin/biscuit vcf2bed -t cg -k 1 -e $vcf | /usr/bin/biscuit mergecg $reference /dev/stdin -k 2

Thanks!

@jamorrison
Copy link

Hi @chrisamiller,

Thanks for your issue submission. Can you try using the most recent version of BISCUIT (0.3.16)?

If you continue to see the incorrect rounding, let me know and I will fix it. Also, if you can include a small test case (a BAM of the reads in question will suffice), that will help in getting a fix out more quickly.

Thanks,
Jacob

@abelhj
Copy link

abelhj commented Oct 16, 2020

Hi @jamorrison,

We tested with the newest version (0.3.16) and obtained the same rounding error.

version=biscuit_0_3_16_linux_amd64
biscuit=../bin/$version
$biscuit pileup -w pileup_stats.txt $REF $BAM | bgzip -c > test.$version.vcf.gz
$biscuit vcf2bed -t cg -k 1 -e test.vcf.gz | $biscuit mergecg $REF /dev/stdin -k 2 | gzip > test.$version.cpgs.bed.gz

A small test bam can be found here: https://xfer.genome.wustl.edu/gxfer1/project/cancer-genomics/bstest.bam

Thanks for looking in to this.

@jamorrison
Copy link

Hi @abelhj,

Thanks for checking the newest version. I will fix the rounding error.

Can you point me to the reference used in pileup and mergecg? As no index is needed, the website where the FASTA file was retrieved from is enough - I can get it from there.

Thanks!
Jacob

@chrisamiller
Copy link
Author

@jamorrison
Copy link

Hi @chrisamiller and @abelhj,

I have fixed the rounding error that you were seeing. The next release will break backwards compatibility (see notes on Version 1.0.0 Alpha), but the fixes will be included in that release when it arrives.

In the meantime, here is a zip file containing the updated source files for Version 0.3.16: issue_7_fixes.zip. After unzipping the file, replace the current files (src/pileup.c and src/mergecg.c) with the new ones and recompile. You should be good once that's been done.

Cheers,
Jacob

@chrisamiller
Copy link
Author

Appreciate the quick response, and looking forward to the new 1.0 version!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants