LiftOver for summary statistics #34
The functions are included in `mungesumstats.jl`. There are two user-facing functions: `readchain`, which reads a chain file for liftover, and `liftover_sumstats!`, which lifts over a munged summary statistics DataFrame. Remaining to-do items: handling the edge case where the target in the chain file is on the negative strand, and adding examples to the documentation.
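As a rough illustration of the negative-strand edge case mentioned above: in the UCSC chain format, coordinates on a `-` strand are given relative to the reverse-complemented sequence, so they need to be flipped using the sequence size before they can be compared with forward-strand positions. The helper names below (`flipinterval`, `flipposition`) are hypothetical and not part of this PR; the sketch assumes 0-based, half-open coordinates.

```julia
# Hypothetical helpers (not from this PR): convert reverse-strand
# coordinates to forward-strand coordinates.
# Assumes 0-based, half-open intervals [start, stop) as in chain files.

# Flip a half-open interval on a sequence of length `seqsize`.
function flipinterval(start::Integer, stop::Integer, seqsize::Integer)
    0 <= start <= stop <= seqsize ||
        throw(ArgumentError("interval [$start, $stop) does not fit in a sequence of size $seqsize"))
    # The end of the reverse-strand interval becomes the start on the forward strand.
    return (seqsize - stop, seqsize - start)
end

# Flip a single 0-based position.
function flipposition(pos::Integer, seqsize::Integer)
    0 <= pos < seqsize ||
        throw(ArgumentError("position $pos is outside a sequence of size $seqsize"))
    return seqsize - 1 - pos
end
```

Flipping twice returns the original coordinates, which makes for a convenient sanity check in tests.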
`liftover_gwas!` is now `liftover_sumstats!` to be in line with `mungesumstats!`. Additionally, `liftover_sumstats!` can now be applied to an `AbstractVector{<:AbstractDataFrame}`, in line with `mungesumstats!` behavior. `parsechain` drops the "chr" prefix to be in line with `mungesumstats!` behavior. The function parameter `echain` is renamed to `chain` (it was originally named `echain` for "expanded chain", but that was needlessly verbose since we never use the output from `parsechain`). Documentation for the liftover functions has been added to the end of the summary statistics tutorial and is mentioned in the GENCODE GTF parsing tutorial. Tests are updated to cover liftover over a vector.
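For readers unfamiliar with the chain format, here is a minimal sketch of parsing a chain header line while dropping the "chr" prefix, as described above. It is not the PR's `parsechain`; the `ChainHeader` type and the `stripchr` and `parseheader` functions are hypothetical names used only for illustration. The field order follows the UCSC chain format (`chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id`).

```julia
# Hypothetical sketch (not the PR's parsechain): parse one chain header line.
struct ChainHeader
    score::Int
    tname::String
    tsize::Int
    tstrand::Char
    tstart::Int
    tend::Int
    qname::String
    qsize::Int
    qstrand::Char
    qstart::Int
    qend::Int
    id::String
end

# Drop a leading "chr" so that "chr5" and "5" compare equal.
stripchr(name::AbstractString) = replace(name, r"^chr" => "")

function parseheader(line::AbstractString)
    f = split(line)  # whitespace-delimited fields
    (length(f) == 13 && f[1] == "chain") ||
        throw(ArgumentError("not a chain header line: $line"))
    return ChainHeader(
        parse(Int, f[2]),
        stripchr(f[3]), parse(Int, f[4]), f[5][1], parse(Int, f[6]), parse(Int, f[7]),
        stripchr(f[8]), parse(Int, f[9]), f[10][1], parse(Int, f[11]), parse(Int, f[12]),
        String(f[13]),
    )
end

h = parseheader("chain 4900 chrY 58368225 + 25985403 25985638 chr5 151006098 - 43257292 43257528 1")
```

In this sketch `h.tname` and `h.qname` come back without the "chr" prefix, so they can be compared directly against chromosome names in munged summary statistics.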
All the other functions omit underscores; rename to match style.
Codecov Report

| | master | #34 | +/- |
|---|---|---|---|
| Coverage | 60.38% | 64.19% | +3.81% |
| Files | 12 | 12 | |
| Lines | 982 | 1472 | +490 |
| Hits | 593 | 945 | +352 |
| Misses | 389 | 527 | +138 |
Rewrote `findnewcoords` to sort the summary statistics and then iterate through them in order, so that liftover is more efficient. `readchain` now also sorts the chain file for the same purpose. This replaces the slower old code, which used `DataFrames.subset` to find matching regions between builds. Also fixed a bug in `readchain` where the end coordinate was treated as inclusive: the chain format uses half-open intervals.
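The sort-then-iterate idea can be sketched as a single sweep over sorted positions and sorted chain blocks. This is not the PR's `findnewcoords`; `Block` and `liftpositions` are hypothetical names, and the sketch assumes a single chromosome, non-overlapping blocks, forward strands only, and 0-based positions with half-open blocks `[srcstart, srcend)`.

```julia
# Hypothetical sketch (not the PR's findnewcoords): lift positions with one
# sweep over sorted positions and sorted, non-overlapping blocks.

# One ungapped aligned block: source [srcstart, srcend) maps to dststart onward.
struct Block
    srcstart::Int
    srcend::Int   # exclusive, because chain intervals are half-open
    dststart::Int
end

function liftpositions(positions::Vector{Int}, blocks::Vector{Block})
    order = sortperm(positions)                    # visit positions in ascending order
    sorted = sort(blocks; by = b -> b.srcstart)    # blocks in ascending order
    lifted = Vector{Union{Int, Nothing}}(nothing, length(positions))
    j = 1
    for i in order
        p = positions[i]
        # Skip blocks that end at or before p; they cannot contain any later position.
        while j <= length(sorted) && sorted[j].srcend <= p
            j += 1
        end
        j > length(sorted) && break                # no blocks left; the rest stay unmapped
        b = sorted[j]
        if b.srcstart <= p                         # p < b.srcend already holds here
            lifted[i] = b.dststart + (p - b.srcstart)
        end
    end
    return lifted
end

blocks = [Block(300, 400, 5000), Block(100, 200, 1000)]
result = liftpositions([350, 200, 100, 199, 50], blocks)
```

Because the block index `j` only moves forward, the sweep itself is linear in the number of positions plus blocks after sorting, rather than scanning the chain table once per variant. Position 200 is left unmapped in this example because the block `[100, 200)` excludes its end coordinate.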
Rewrote the liftover code to replace the calls to `DataFrames.subset` when finding the matching region in the chain file. It now processes 100,000+ variants per second.
Moved liftover code from mungesumstats.jl to liftover.jl. Added some more error catching code (mainly for when lifted over indels no longer match the reference).
Added code for normalizing GWAS summary statistics to the reference build and renaming the SNPs with rsIDs from a VCF file. Also extended the liftover functionality to lift over indels as well as SNPs.
Edits still needed:
Performs liftover on munged summary statistics. Added documentation and tests for the functions.
New functions: