Ratio is the log-ratio of the likelihoods of the most likely copy number and the second most likely copy number. I'm still trying to optimize t1k-copynumber.py, so please interpret its results with caution. #12
Comments
Could you further explain how the ratio in t1k-copynumber is calculated? I can see that the value would be very helpful for discriminating between false positives and true positives in KIR2DL5.
By the way, is it possible to extract the KIR2DL5 A/B reference reads that match my sequenced reads?
Thank you. The copy-number script applies a square-root transform to the abundance values (FPK), then fits a normal distribution to model the abundance of single-copy alleles. Because the sum of independent normal variables is also normal, we can use the single-copy parameters to derive the distributions for two copies, three copies, and so on up to ten copies. For a given allele's abundance, this gives 10 likelihood values, one per copy-number distribution. The log-likelihood ratio is computed from the best and the second-best of these likelihood values.
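As a rough illustration of the calculation described above, here is a minimal sketch. It is not the code in t1k-copynumber.py: the function names, the use of the natural logarithm, and the assumption that a k-copy allele has mean k*mu and variance k*sigma^2 on the square-root scale are all hypothetical choices made for the example.

```python
import math

def normal_logpdf(x, mean, sd):
    """Log density of a normal distribution N(mean, sd^2) at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)

def copy_number_log_ratio(fpk, mu1, sd1, max_copies=10):
    """Return (best_copy_number, log-likelihood ratio of best vs. second best).

    mu1 and sd1 are the (hypothetical) mean and standard deviation fitted to
    square-root-transformed abundances of single-copy alleles.
    """
    x = math.sqrt(fpk)  # square-root transform of the abundance
    loglik = {}
    for k in range(1, max_copies + 1):
        # Assumption: the k-copy distribution is the sum of k independent
        # single-copy normals, i.e. N(k * mu1, k * sd1^2).
        loglik[k] = normal_logpdf(x, k * mu1, math.sqrt(k) * sd1)
    ranked = sorted(loglik, key=loglik.get, reverse=True)
    best, second = ranked[0], ranked[1]
    return best, loglik[best] - loglik[second]

# Example with made-up parameters: sqrt(400) = 20 sits on the two-copy mean.
best, ratio = copy_number_log_ratio(400.0, mu1=10.0, sd1=1.0)
print(best, ratio)
```

Under this sketch, a larger ratio means the best copy number is better separated from the runner-up, and a ratio near zero means the two top candidates are nearly equally likely.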
Do you mean you want to know which reads are assigned to 2DL5?
Thank you very much for resolving the first question! Regarding the second one: yes, I would like to know which reference sequences my reads align to for KIR2DL5, and which reads these are. I believe the false positives I'm seeing for KIR2DL5 are caused by nonspecific reads generated in my sequencing. I therefore want to compare the regions of the reference sequences that reads align to in truly positive versus truly negative KIR2DL5 samples, and modify the reference accordingly.
I just added the option "--outputReadAssignment" to the GitHub repo; it writes the allele assignments to the {prefix}_assign.tsv file. Each row is one assignment, in the format: read_id allele_id allele_start allele_end. Will this help?
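For picking out the KIR2DL5 assignments from that file, a minimal sketch could look like the following. It assumes the four columns are tab-separated in the order given above and that allele IDs begin with the gene name (for example "KIR2DL5A*..."); the function name and the sample rows are hypothetical.

```python
import csv
from collections import defaultdict

def reads_by_allele(lines, gene_prefix="KIR2DL5"):
    """Group read assignments by allele for alleles whose ID starts with gene_prefix.

    `lines` is any iterable of text lines, e.g. an open {prefix}_assign.tsv file.
    Returns {allele_id: [(read_id, allele_start, allele_end), ...]}.
    """
    assignments = defaultdict(list)
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 4:
            continue  # skip blank or malformed rows
        read_id, allele_id, start, end = row[:4]
        if not allele_id.startswith(gene_prefix):
            continue  # keep only the gene of interest
        assignments[allele_id].append((read_id, int(start), int(end)))
    return dict(assignments)

# Made-up rows, used only to show the expected layout.
example = [
    "read1\tKIR2DL5A*001\t10\t110\n",
    "read2\tKIR2DL1*003\t5\t105\n",
    "read3\tKIR2DL5B*002\t200\t300\n",
]
for allele, reads in reads_by_allele(example).items():
    print(allele, reads)
```

With a real file, the call would be along the lines of `with open("sample_assign.tsv") as fh: reads_by_allele(fh)`, where the file name is a placeholder for your own prefix. The start and end coordinates per allele can then be compared between positive and negative samples.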
Thank you very much, I will try this option now! I will keep you informed.
We don't remove duplicated reads. Duplicated reads contribute to the allele abundance estimation (or to other types of allele score in other HLA genotypers), so it is expected that deduplication will affect the genotyping results. Hope this helps.
Originally posted by @mourisl in #11 (comment)