Process Plink genotyping files in Python
iainrb/plinktools
Plinktools: Python code to process Plink genotyping files

Author: Iain Bancarz, [email protected]

1. INTRODUCTION

Plinktools is a Python package for processing Plink format genotyping data. It was developed for the following applications, which the standard Plink executable does not support.

1.1 Equivalence test

Plink does not directly support an equivalence test on two datasets. The --merge option does allow a diff on SNP calls, but this does not check whether the SNP and sample sets match. In addition, certain Plink operations (such as merge) may swap the major and minor alleles of a SNP, and the Plink diff regards such a swap as a mismatch.

Plinktools allows an equivalence test on pairs of binary or non-binary Plink files, with major/minor allele swaps regarded as matching.

1.2 Fast binary merge

The standard Plink merge function performs extensive cross-checking and recoding of data. This is slow, and scales poorly as the number of inputs increases. To address this issue, Plinktools implements a fast binary merge for two important special cases: congruent SNP sets with disjoint samples, and congruent samples with disjoint SNPs. In both cases, input and output are in the default SNP-major format.

In the case of congruent samples, the fast merge strips off headers and concatenates the input .bed files with no need for recoding. For congruent SNPs, a fast merge can be done if the number of samples in each input is divisible by 4, because the .bed format packs four genotypes into each byte and pads each SNP's record to a whole byte. Otherwise recoding is necessary and the merge is slower, although still somewhat faster than Plink.

1.3 Heterozygosity calculation by high/low MAF

As part of the Wellcome Trust Oxford SOP for exome chip QC, heterozygosity for each sample is calculated separately for SNPs with minor allele frequency (MAF) above and below 1%. This test has been implemented in Plinktools.

2. USAGE

Plinktools includes three front-end scripts:

  compare.py     compare two Plink datasets (binary or non-binary)
  merge_bed.py   merge two or more Plink binary datasets
  het_by_maf.py  compute heterozygosity for high/low MAF

Run any script with --help for more information.

3. SEE ALSO

The Plink data format was created by Shaun Purcell et al. See:
http://pngu.mgh.harvard.edu/~purcell/plink

Plinktools was created to support the WTSI genotyping pipeline:
https://github.com/wtsi-npg/genotyping

Plinktools is a prerequisite for the WTSI extension of the zCall genotype caller:
https://github.com/wtsi-npg/zCall
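As a rough illustration of the idea behind the equivalence test in section 1.1, the Python sketch below compares two binary datasets allele by allele rather than by raw .bed code, so a swap of the two alleles between the .bim files is not reported as a mismatch. It is not the Plinktools implementation: the Dataset class and the decode and equivalent functions are hypothetical, and both datasets are assumed to be already loaded into memory. Genotype values follow the Plink 1 .bed convention (0 = homozygous for the first .bim allele, 1 = missing, 2 = heterozygous, 3 = homozygous for the second allele).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Dataset:
    """Hypothetical in-memory view of one binary Plink fileset."""
    samples: List[str]                   # sample IDs in .fam order
    alleles: Dict[str, Tuple[str, str]]  # SNP ID -> (allele1, allele2) from .bim
    codes: Dict[str, List[int]]          # SNP ID -> one 2-bit .bed value per sample


def decode(code: int, a1: str, a2: str) -> Optional[str]:
    """Convert a 2-bit .bed value to an allele pair in sorted order."""
    if code == 1:
        return None  # missing call
    if code == 0:
        return a1 + a1
    if code == 2:
        return "".join(sorted(a1 + a2))
    if code == 3:
        return a2 + a2
    raise ValueError(f"invalid genotype code: {code}")


def equivalent(x: Dataset, y: Dataset) -> bool:
    """True if both datasets have the same samples and SNPs and agree on
    every call when compared as alleles, so allele1/allele2 swaps match."""
    if set(x.samples) != set(y.samples) or x.alleles.keys() != y.alleles.keys():
        return False
    # Samples may be listed in a different order in the second dataset
    y_pos = {sample: i for i, sample in enumerate(y.samples)}
    for snp, (xa1, xa2) in x.alleles.items():
        ya1, ya2 = y.alleles[snp]
        for i, sample in enumerate(x.samples):
            gx = decode(x.codes[snp][i], xa1, xa2)
            gy = decode(y.codes[snp][y_pos[sample]], ya1, ya2)
            if gx != gy:
                return False
    return True


# The same calls, with the alleles listed in opposite order in the second .bim:
a = Dataset(["s1", "s2"], {"rs1": ("A", "G")}, {"rs1": [0, 2]})
b = Dataset(["s1", "s2"], {"rs1": ("G", "A")}, {"rs1": [3, 2]})
print(equivalent(a, b))  # True
```

Comparing the raw codes ([0, 2] against [3, 2]) would flag sample s1 as a mismatch; decoding to alleles first is what makes the swap harmless.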
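The two fast-merge cases in section 1.2 can be sketched at the byte level as follows. This is a hypothetical illustration, not the code behind merge_bed.py: it works on whole .bed files held in memory as bytes, assumes the standard three-byte SNP-major header, and leaves out the .bim/.fam handling and the recoding needed when a sample count is not divisible by 4 (in that case it simply raises an error).

```python
from typing import List

# Plink 1 .bed magic number followed by the SNP-major mode byte
HEADER = bytes([0x6C, 0x1B, 0x01])


def _body(bed: bytes) -> bytes:
    """Check the header and return the genotype bytes that follow it."""
    if bed[:3] != HEADER:
        raise ValueError("not a SNP-major .bed file")
    return bed[3:]


def merge_disjoint_snps(beds: List[bytes]) -> bytes:
    """Congruent samples, disjoint SNPs: per-SNP records are simply
    concatenated after stripping each input's header."""
    return HEADER + b"".join(_body(bed) for bed in beds)


def merge_disjoint_samples(beds: List[bytes], n_samples: List[int], n_snps: int) -> bytes:
    """Congruent SNPs, disjoint samples: for each SNP, splice together that
    SNP's record from every input. Only valid without padding bits, i.e.
    when every input's sample count is a multiple of 4."""
    if len(beds) != len(n_samples):
        raise ValueError("need one sample count per input")
    if any(n % 4 for n in n_samples):
        raise ValueError("sample counts not divisible by 4; recoding needed")
    bodies = [_body(bed) for bed in beds]
    widths = [n // 4 for n in n_samples]  # bytes per SNP record in each input
    for body, width in zip(bodies, widths):
        if len(body) != n_snps * width:
            raise ValueError("file size does not match SNP and sample counts")
    out = bytearray(HEADER)
    for j in range(n_snps):
        # Output record j = record j of input 1, then input 2, ...
        for body, width in zip(bodies, widths):
            out += body[j * width:(j + 1) * width]
    return bytes(out)
```

Merging congruent samples never touches individual genotypes, which is why it needs no recoding; the disjoint-samples case stays a pure byte copy only while no input leaves padding bits at the end of a SNP record.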
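A minimal sketch of the heterozygosity calculation in section 1.3, assuming genotypes have already been decoded to counts of the second allele (0, 1 or 2, with None for a missing call) and are stored one list per SNP. The function names are hypothetical, and placing SNPs whose MAF equals the threshold in the high group is an assumption; het_by_maf.py may draw the boundary differently.

```python
from typing import List, Optional, Tuple

Genotype = Optional[int]  # copies of the second allele (0, 1, 2); None = missing


def minor_allele_frequency(calls: List[Genotype]) -> float:
    """MAF of one SNP over its non-missing calls (0.0 if none are called)."""
    called = [g for g in calls if g is not None]
    if not called:
        return 0.0
    p = sum(called) / (2 * len(called))
    return min(p, 1.0 - p)


def het_by_maf(
    matrix: List[List[Genotype]], threshold: float = 0.01
) -> List[Tuple[Optional[float], Optional[float]]]:
    """matrix[snp][sample] -> per-sample (high-MAF het rate, low-MAF het rate).
    A rate is None if the sample has no calls in that MAF group."""
    n_samples = len(matrix[0]) if matrix else 0
    # Per sample: [hets_high, calls_high, hets_low, calls_low]
    counts = [[0, 0, 0, 0] for _ in range(n_samples)]
    for snp_calls in matrix:
        base = 0 if minor_allele_frequency(snp_calls) >= threshold else 2
        for i, g in enumerate(snp_calls):
            if g is None:
                continue  # missing calls count in neither numerator nor denominator
            counts[i][base + 1] += 1
            if g == 1:
                counts[i][base] += 1

    def rate(hets: int, calls: int) -> Optional[float]:
        return hets / calls if calls else None

    return [(rate(c[0], c[1]), rate(c[2], c[3])) for c in counts]


calls = [
    [0, 1, 1, 2],  # MAF 0.5 -> high group
    [0, 0, 0, 0],  # monomorphic, MAF 0 -> low group
]
print(het_by_maf(calls))  # [(0.0, 0.0), (1.0, 0.0), (1.0, 0.0), (0.0, 0.0)]
```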