-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Using gc_extrap on mitochondrial target enrichment data #73
Comments
@eugeniosss I'm not entirely sure if using |
Thank you very much for your answer, it was helpful! Actually when running some tests, I had some problems with overwhelming preseq, so even had to do subsets. I would like to make a second question then, what about exome enriched data? I forgot to mention that I work with ancient DNA. And also most of the times, for the exome enriched samples, we do captures, sequence at low depth, do an evaluation of the reactions and, if the reactions were successful, we do a high depth sequencing. I was thinking preseq could be a cool thing to integrate in our pipeline. This way we could do an evaluation of the low depth captures and see how much more we could sequenced these. Any thoughts? |
The issue with overwhelming preseq emerges when there is saturation and high coverage. I know preseq has been used successfully with exam data using test runs at lower coverage. This has been done with heavy multiplexing so that among a large number of libraries, the majority that "look good" are sequenced deeply, and the rest are re-made. So just like your situation. But the best tool for this job might be the functionality that's only currently in preseqR. I can't remember right now if all the functions from preseqR have been released in the regular preseq. I had done some coding on that, and I just can't remember if I had finished it. Take a look and I'll try to respond in more detail later when I have more time. |
Hello!
Im working with gc_extrap with data has been enriched for mitochondrial genome (MT) and Im unsure what is the best way to proceed, or if is even adequate to use preseq with enriched data. I have bam files that have been generated by mapping to the full genome (i.e. nuclear + mitochondrial). So firstly, what Im doing:
Secondly, if I can do this or if there is any other way possible to run gc_extrap with enriched data, can I transform the output to be mean coverage across the mitochondria and number of raw sequenced reads, instead of total number of unique base pairs and total number of sequenced base pairs. I was trying to divide the number of unique base pairs by the size of the mitochondrial genome and the total number of sequenced base pairs by the reads length. However, for one sample its coverage is suppose to saturate at ~2 coverage. However, I have ~110 coverage, empirically calculated with GATK...
Am I doing something wrong, or am I unable to gc_extrap on my data?
The text was updated successfully, but these errors were encountered: