
gtc to plink ped format - 0|G instead of A|G #30

Open
1teaspoon opened this issue Mar 10, 2022 · 1 comment

Comments

@1teaspoon

Hi folks,

I am having an issue when using the IlluminaBeadArrayFiles library to convert a GTC file to PED format. Many SNPs in the converted PLINK file show 0|G where they should be A|G. Below is the part of my script related to this conversion.

```python
import sys
import os
from IlluminaBeadArrayFiles import GenotypeCalls, BeadPoolManifest, code2genotype

def outputPlink(gtc_file, manifest_file, sample_name, plink_out_dir, genoThresh = 0.15):
    manifest = BeadPoolManifest(manifest_file)
    gtc = GenotypeCalls(gtc_file)
    GenoScores = gtc.get_genotype_scores()
    # Per-locus base calls as two-character strings, e.g. "AG"; "--" for no call
    top_strand_genotypes = gtc.get_base_calls()
    outBase = plink_out_dir + '/' + sample_name
    allGenotypes = []
    with open(outBase + '.ped', 'w') as pedOut, open(outBase +'.map','w') as mapOut:
        for (name, chrom, map_info, source_strand_genotype, genoScore) in zip(manifest.names, manifest.chroms, manifest.map_infos, top_strand_genotypes, GenoScores):
            # .map columns: chromosome, marker name, genetic distance (cM), position
            mapOut.write(' '.join([chrom, name, '0', str(map_info)]) + '\n')
            if source_strand_genotype == '--':
                # No call: write PLINK's missing-allele code for both alleles
                geno = ['0', '0']
            else:
                geno = [source_strand_genotype[0], source_strand_genotype[1]]
            allGenotypes += geno
        # Single .ped line: FID, IID, father, mother, sex, phenotype, then all allele pairs
        pedOut.write(' '.join([sample_name, sample_name, '0', '0', '0', '-9'] + allGenotypes) + '\n')
```
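One way to narrow this down is to pull the per-locus conversion into a small function that refuses anything other than a two-nucleotide base call, so an unexpected value raises instead of silently reaching the .ped file. The sketch below also shows one possible use of the otherwise unused `genoThresh` argument, treating calls whose GenCall score falls below it as missing. The helper name `to_ped_alleles` and the thresholding behavior are assumptions for illustration; they are not part of IlluminaBeadArrayFiles.

```python
VALID_BASES = {"A", "C", "G", "T"}


def to_ped_alleles(base_call, geno_score, geno_thresh=0.15):
    """Convert one base call (e.g. "AG" or "--") into two PLINK .ped alleles.

    Hypothetical helper: "--" and calls scoring below geno_thresh (an assumed
    meaning for the script's genoThresh) become missing ("0", "0"). Anything
    else must be exactly two of A/C/G/T.
    """
    if base_call == "--" or geno_score < geno_thresh:
        return ["0", "0"]
    if len(base_call) != 2 or any(b not in VALID_BASES for b in base_call):
        # Surface unexpected values instead of writing them to the .ped file
        raise ValueError(f"unexpected base call {base_call!r}")
    return [base_call[0], base_call[1]]
```

Inside the loop in the script, `geno = to_ped_alleles(source_strand_genotype, genoScore, genoThresh)` would replace the if/else, and any locus that raises is a candidate to inspect.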

@jjzieve
Contributor

jjzieve commented Mar 10, 2022

@1teaspoon Can you post the entire script? Or better, open a branch? And which product are you running? Just looking at the code, I can't really tell what might be going wrong. I'd assume `top_strand_genotypes` would be populated with actual string base calls, based on `get_base_calls` (`def get_base_calls(self):`), so I'm not sure where a "0" came from. But it's hard to say without reproducing what you're running and stepping through the code.
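To give something concrete to reproduce, a small check like the following could list which loci end up half-missing in the written files. It is a sketch that assumes the single-sample layout produced by the script in this issue: six leading .ped columns, then two allele columns per marker, in the same order as the .map lines. `find_half_missing` is a hypothetical helper name.

```python
def find_half_missing(ped_line, map_lines):
    """Return (snp_name, allele1, allele2) for loci where exactly one allele is "0".

    Assumes one sample per .ped line: FID, IID, father, mother, sex, phenotype,
    followed by two allele columns per marker, ordered like the .map lines
    (chromosome, name, cM, position).
    """
    alleles = ped_line.split()[6:]
    names = [line.split()[1] for line in map_lines if line.strip()]
    if len(alleles) != 2 * len(names):
        raise ValueError(
            f"{len(alleles)} allele columns but {len(names)} markers in .map"
        )
    hits = []
    for i, name in enumerate(names):
        a1, a2 = alleles[2 * i], alleles[2 * i + 1]
        if (a1 == "0") != (a2 == "0"):  # exactly one of the two alleles is missing
            hits.append((name, a1, a2))
    return hits
```

Calling it with the first line of the written `.ped` and all lines of the matching `.map` lists the affected markers. If it returns an empty list, the half-missing alleles are not in the .ped file itself and were introduced at a later step.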
