nan values in the result #184
Comments
Hi @voichek, Thanks for reporting. I'll take a look. It may take up to a week because I am on the road. |
When I run with debug and check switches I get:
|
A debug stack trace shows a division by zero where the number of test individuals exactly matches the number of NA individuals. I'll add a fix for that. The behaviour of gemma 0.96 is rather unspecified here, and 0.98 should have taken the right action (i.e., drop the SNP or halt). |
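To illustrate the edge case, here is a minimal Python sketch, not GEMMA's actual code. It assumes a per-SNP statistic is averaged over the non-missing genotypes of the tested individuals; when every tested individual is NA, the denominator is zero, and in IEEE floating point 0/0 yields nan. The function name `snp_mean_or_none` is hypothetical.

```python
import math

def snp_mean_or_none(genotypes):
    """Mean genotype over non-missing values, or None if all are missing.

    genotypes -- list of floats for the tested individuals, with
                 float('nan') marking a missing (NA) genotype.
    Hypothetical helper for illustration; not part of GEMMA.
    """
    observed = [g for g in genotypes if not math.isnan(g)]
    if not observed:
        # Number of NA individuals equals number of tested individuals:
        # signal "drop this SNP" instead of dividing by zero.
        return None
    return sum(observed) / len(observed)
```

A caller could then skip any SNP for which the function returns `None`, or halt with an error, which are the two actions mentioned above.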
… It was supposed to work like this! See genetics-statistics#184
Dear @pjotrp, I am not sure I understand. I do have individuals with phenotypic data (no nan) in this dataset. Thanks, |
The problem is in the genotypes. Gemma selects a subset of individuals and for one SNP all were missing. |
Btw you also have missing values in your .fam file. That is why gemma only selects 50 individuals. |
I know, it is supposed to be this way. Thank you for the help! |
I am looking into it. If there is a fix I can make it available. |
I spent some time on this and it is actually quite tricky. When a SNP contains only NAs, it is still included in the computation for Plink. I am working on a rewrite of GEMMA which should make fixing such problems much easier. For now, all I can suggest is to use 0.96 for this particular edge case. If you want to make sure the answer is correct, you can also convert the plink files to BIMBAM and drop all SNPs whose genotypes are all missing for the individuals you are testing. |
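The BIMBAM filtering step could look roughly like the Python sketch below. It assumes the BIMBAM mean genotype layout (SNP id, two alleles, then one value per individual, separated by commas or whitespace, with `NA` for missing) and a list of booleans saying which individuals are tested. The function name `drop_all_missing_snps` is hypothetical, and this is only a sketch under those assumptions.

```python
import re

def drop_all_missing_snps(lines, tested):
    """Yield BIMBAM lines with at least one non-missing tested genotype.

    lines  -- iterable of BIMBAM mean genotype text lines
    tested -- list of booleans, one per individual (True = in the test)
    """
    for line in lines:
        fields = [f for f in re.split(r"[,\s]+", line.strip()) if f]
        if len(fields) < 4:
            continue  # blank or malformed line: nothing to keep
        genotypes = fields[3:]  # columns after SNP id and the two alleles
        if len(genotypes) != len(tested):
            raise ValueError("genotype count does not match individuals")
        # Keep the SNP only if some tested individual has a real value.
        if any(use and g.upper() != "NA" for g, use in zip(genotypes, tested)):
            yield line
```

Writing the surviving lines to a new genotype file gives an input in which no SNP is entirely missing for the tested subset.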
Thank you! |
Are you creating a pipeline? Is that public information? In the coming year we want to create some pipelines using Galaxy/CWL. |
I am writing a pipeline specific to my project. I don't think there are any competing interests. |
That is not what I am worried about ;). I am merely interested in sharing ideas. |
Soon I hope :) |
Regarding the pipeline I was building using GEMMA, it is now public here. I am calculating associations of k-mer presence/absence, computed directly from sequencing reads without looking at the genome. I have my own code for the initial approximation of associations, as there are a lot of k-mers (hundreds of millions or more), and then I use GEMMA to calculate the exact score on a subset of filtered variants. (Just remembered that you asked me previously) |
Hi,
When using GEMMA version 0.98 but not 0.96 I get only nan values in the result file:
I have uploaded this example, if you want to take a look.
Hope you can help me,
Yoav