nan values in the result #184

voichek · 2018-11-06T10:44:30Z

Hi,

With GEMMA version 0.98, but not with 0.96, I get only nan values in the result file:

$ gemma -bfile snps.plink -lmm 2 -k K -o results.0_96
Reading Files ...
## number of total individuals = 1135
## number of analyzed individuals = 50
## number of covariates = 1
## number of phenotypes = 1
## number of total SNPs = 11769920
## number of analyzed SNPs = 1414417
Start Eigen-Decomposition...
pve estimate =0.724026
se(pve) =0.816976
Reading SNPs  ==================================================100.00%
$ head output/results.0_96.assoc.txt -n3
chr     rs      ps      n_miss  allele1 allele0 af      l_mle   p_lrt
1       .       540     1       A       G       0.020   1.000000e+05    1.073966e-01
1       .       603     2       A       G       0.083   1.000000e+05    1.380079e-01
$ cat output/results.0_96.assoc.txt | awk '$9 == "-nan" ' | wc -l
0


$ gemma-0.98-linux-static -bfile snps.plink -lmm 2 -k K -o results.0_98
GEMMA 0.98 (2018-09-28) by Xiang Zhou and team (C) 2012-2018
Reading Files ...
## number of total individuals = 1135
## number of analyzed individuals = 50
## number of covariates = 1
## number of phenotypes = 1
## number of total SNPs/var        = 11769920
## number of analyzed SNPs         =  1414417
Start Eigen-Decomposition...
pve estimate =0.724026
se(pve) =0.816976
================================================== 100%
$ head output/results.0_98.assoc.txt -n3
chr     rs      ps      n_miss  allele1 allele0 af      logl_H1 l_mle   p_lrt
1       .       540     1       A       G       0.020   -nan    1.000000e+05    -nan
1       .       603     2       A       G       0.083   -nan    1.000000e+05    -nan
$ cat output/results.0_98.assoc.txt | awk '$10 == "-nan" ' | wc -l
1414417
$ cat output/results.0_98.assoc.txt | awk '$10 != "-nan" ' | wc -l
1

I have uploaded this example, if you want to take a look.

Hope you can help me,
Yoav


pjotrp · 2018-11-06T18:56:06Z

Hi @voichek,

Thanks for reporting. I'll take a look. It may take up to a week because I am on the road.

pjotrp · 2018-11-21T08:27:37Z

When I run with debug and check switches I get:

~/gemma-0.98-linux-static -bfile snps.plink -lmm 2 -k K -o results.0_98 -debug -check
GEMMA 0.98 (2018-09-28) by Xiang Zhou and team (C) 2012-2018
Reading Files ... 
**** DEBUG: entered in src/gemma_io.cpp at line 511 in ReadFile_bim
**** DEBUG: entered in src/gemma_io.cpp at line 558 in ReadFile_fam
**** DEBUG: entered in src/gemma_io.cpp at line 857 in ReadFile_bed

FATAL ERROR: GEMMA caused a floating point error which suggests machine boundaries were reached.

You can disable floating point tests with the -no-check switch (use at your own risk!)

Floating point exception

pjotrp · 2018-11-21T08:37:41Z

A debug stack trace shows

#0  0x0000000000432405 in ReadFile_bed (file_bed=..., setSnps=..., W=W@entry=0x1a5a540, indicator_idv=..., indicator_snp=..., snpInfo=..., 
    maf_level=@0x7fffffffdec0: 0.01, miss_level=@0x7fffffffdeb8: 0.050000000000000003, hwe_level=@0x7fffffffdec8: 0, 
    r2_level=@0x7fffffffded0: 0.99990000000000001, ns_test=@0x7fffffffe2f8: 0) at src/gemma_io.cpp:973
#1  0x00000000004187d4 in PARAM::ReadFiles (this=this@entry=0x7fffffffd9d0) at src/param.cpp:277
#2  0x0000000000454112 in GEMMA::BatchRun (this=this@entry=0x7fffffffe790, cPar=...) at src/gemma.cpp:1645
#3  0x000000000048d6ad in main (argc=11, argv=0x7fffffffe928) at src/main.cpp:86

The trace points to a division by zero in this line:

maf /= 2.0 * (double)(ni_test - n_miss);

Here the number of test individuals (ni_test) exactly matches the number of individuals with a missing genotype (n_miss) for that SNP, so the denominator is zero. I'll add a fix for that. The behaviour of gemma 0.96 is rather unspecified here, and 0.98 should have taken the right action (i.e., drop the SNP or halt).
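As an illustration only, the guard described here could look like the sketch below. It is not GEMMA's actual code: the function name `allele_freq`, the constant `kMissing`, and the choice to represent a missing call as NaN are all assumptions made for this example. The point is simply to test the count of called genotypes before dividing, so the caller can drop the SNP or halt.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Assumption for this sketch: a missing genotype call is stored as NaN.
const double kMissing = std::numeric_limits<double>::quiet_NaN();

// Hypothetical helper, not part of GEMMA. Computes the allele frequency of
// one SNP from genotypes coded 0, 1, 2 for the tested individuals only.
// Returns false (leaving freq untouched) when every tested individual has
// a missing call, i.e. the case where ni_test == n_miss.
bool allele_freq(const std::vector<double>& geno, double& freq) {
    assert(!geno.empty() && "expects at least one tested individual");
    double sum = 0.0;
    std::size_t n_called = 0;
    for (double g : geno) {
        if (std::isnan(g)) continue;  // skip missing calls
        sum += g;
        ++n_called;
    }
    if (n_called == 0) return false;  // would otherwise divide by zero
    freq = sum / (2.0 * static_cast<double>(n_called));
    return true;
}
```

A caller would check the return value and skip the SNP (or stop with an error) when it is false, rather than letting a NaN propagate into the likelihood computation.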


voichek · 2018-11-21T09:08:04Z

Dear @pjotrp,

I am not sure I understand. I do have individuals with phenotypic data (no nan) in this dataset.
How come all of them were thrown out?

Thanks,
Yoav

pjotrp · 2018-11-21T09:24:31Z

The problem is in the genotypes. Gemma selects a subset of individuals and for one SNP all were missing.

pjotrp · 2018-11-21T09:28:13Z

Btw you also have missing values in your .fam file. That is why gemma only selects 50 individuals.

voichek · 2018-11-21T09:31:19Z

I know, it is supposed to be this way.

Thank you for the help!
Will I have to wait for the next release for this to be fixed?

pjotrp · 2018-11-21T09:41:38Z

I am looking into it. If there is a fix I can make it available.

pjotrp · 2018-11-21T15:20:48Z

I spent some time on this and it is actually quite tricky.

When a SNP contains only NAs, it is still included in the computation for PLINK input. I am working on a rewrite of GEMMA which should make such problems much easier to fix. For now, all I can suggest is to use 0.96 for this particular edge case.

If you want to make sure the answer is correct you can also convert the plink files to BIMBAM and drop all SNPs with genotypes that have only missing values for the individuals you are testing.
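A minimal sketch of that filtering step, under stated assumptions: the BIMBAM mean genotype file is comma-separated with the columns SNP id, allele1, allele0, followed by one genotype per individual, and missing genotypes are written as "NA". The helper name `has_called_genotype` and the `tested` mask are hypothetical and not part of GEMMA.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper, not part of GEMMA. Returns true if at least one
// tested individual has a non-missing genotype on this BIMBAM line.
// Assumed layout: id, allele1, allele0, g1, g2, ... with "NA" for missing.
// tested[i] says whether individual i is included in the analysis.
bool has_called_genotype(const std::string& line,
                         const std::vector<bool>& tested) {
    assert(!tested.empty() && "mask of tested individuals must not be empty");
    std::istringstream in(line);
    std::string field;
    std::size_t col = 0;
    while (std::getline(in, field, ',')) {
        // Trim spaces and tabs around the field.
        const std::size_t b = field.find_first_not_of(" \t");
        const std::size_t e = field.find_last_not_of(" \t");
        field = (b == std::string::npos) ? "" : field.substr(b, e - b + 1);
        if (col >= 3) {  // genotype columns start after id and two alleles
            const std::size_t idv = col - 3;
            if (idv < tested.size() && tested[idv] &&
                !field.empty() && field != "NA") {
                return true;
            }
        }
        ++col;
    }
    return false;
}
```

Lines for which this returns false would be left out of the filtered genotype file, so no SNP with only missing values among the tested individuals reaches GEMMA.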

voichek · 2018-11-21T18:55:45Z

Thank you!
I guess condensing the plink file to only the individuals I use will also work. Though, due to performance considerations, I would prefer not to add more steps.

pjotrp · 2018-11-22T08:32:15Z

Are you creating a pipeline? Is that public information? In the coming year we want to create some pipelines using Galaxy/CWL.

voichek · 2018-11-22T08:51:11Z

I am writing a pipeline specific to my project. I don't think there are any competing interests.

pjotrp · 2018-11-22T08:54:31Z

That is not what I am worried about ;). I am merely interested in sharing ideas.

voichek · 2018-11-22T09:54:37Z

Soon I hope :)

voichek · 2020-06-18T15:27:06Z

Regarding the pipeline I was building using GEMMA, it is now public here. I am calculating associations of k-mer presence/absence, calculated directly from sequencing reads, without looking at the genome. I have my own code for the initial approximation of associations, as there are a lot of k-mers (hundreds of millions or more), and then I use GEMMA to calculate the exact score on a subset of filtered variants.

(Just remembered that you asked me previously)

pjotrp self-assigned this Nov 6, 2018

pjotrp added a commit to genenetwork/GEMMA that referenced this issue Nov 21, 2018

Always check floating point unless the --no-fpe-check switch is used.…

aa07e8a

… It was supposed to work like this! See genetics-statistics#184

pjotrp added bug Later lmm labels Nov 21, 2018

pjotrp added this to the Later milestone Nov 21, 2018

pjotrp mentioned this issue Nov 25, 2018

When genotype values are not in 0.0-2.0 range no error is reported and MAF filter is wrong #187

Closed

peterdfields mentioned this issue Aug 21, 2019

GSL ERROR: function value is not finite in brent.c at line 202 errno 9 #210

Closed
