-
Notifications
You must be signed in to change notification settings - Fork 48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LRT for comparing non-mixed to mixed models #508
Conversation
!!! note | ||
For comparisons between mixed and non-mixed models, the deviance for the non-mixed | ||
model is taken to be -2 log likelihood, i.e. omitting the additive constant for the | ||
fully saturated model. This is in line with the computation of the deviance for mixed | ||
models. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dmbates I can feel you looking over my shoulder and shaking your head when I write something like "the deviance isn't really the deviance". 🙂 Do you have a better way of handling / labelling this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The important point is that the difference in -2 log likelihood is the same as the difference in deviances would be if they could be calculated. Of course, now that I have written that I wonder if it makes sense when comparing a LinearModel and a LinearMixedModel. We know what the saturated model for a LinearModel is but we don't for a LinearMixedModel. My statement seems to indicate that the deviance for the saturated model in a LinearMixedModel would then have to be the same as that of a LinearModel but I am not sure that makes sense. I am starting to confuse myself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm glad I'm not the only one who struggles with some of this stuff. 😄
I think the relevant bit here is actually that the test statistic for LRT is -2 log(L_1/L2) which then is just the difference of -2 log likelihood. Using the true deviance was always just a convenience because the additive constants cancel out, when both models have the same constant.
All that said, my question is: should we change the label of that column and/or the internal fields? I would prefer to keep the internal fieldnames, but changing the show
method's output is trivial.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. For a linear mixed model it would be more accurate to label the column as -2 loglik
. For a GLMM I think it should be deviance
. I believe that for a Binomial
GLMM that is not also Bernoulli
(e.g. cbpp
) the quantity in that column will not be -2 loglik
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yes, that is true. And indeed the loglikelihood
and deviance
for GLMM are defined independently.
Let me see what it would take to add a conditional to the show
method --I think we'll need to add something to the LikelihoodRatioTest
struct because right now there is no notion of which model it came from.
Codecov Report
@@ Coverage Diff @@
## main #508 +/- ##
==========================================
+ Coverage 93.99% 94.14% +0.14%
==========================================
Files 26 26
Lines 2164 2236 +72
==========================================
+ Hits 2034 2105 +71
- Misses 130 131 +1
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I understand why true nesting isn't checked (it's hard to do in general) but it feels like a potential footgun...I suggested a few places where you could be clearer in the docstrings and/or add some additional heuristics for non-formula-bearing GLMs.
There is a way of checking nesting - approximately checking due to floating point, of course. The columns of the model matrix for the linear model should be in the column space of |
I like this idea (and that's what the tolerance kwarg is for, I suppose), but I think that should be a different PR because we can add that to the nesting checks for two |
The CI failures on Julia v1.4 are because the newly added code calls Should we continue to test on 1.4? Is it because that is currently the LTS release? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't look closely at the matrix checking stuff because it's a bit over my head but otherwise looks good.
Co-authored-by: Dave Kleinschmidt <[email protected]>
Co-authored-by: Douglas Bates <[email protected]>
@dmbates The current LTS is 1.0! I'm happy to drop 1.4 and 1.5 with MixedModels 4.0, if for no other reason than CI speed. If/when 1.6 is declared LTS, my preference would be to have to LTS to change out tiered support architecture to "LTS" and "current release". |
* LRT for comparing non-mixed to mixed models * nesting for LM/GLMM and GLM/LMM * NEWS entry * use _iscomparable instead of isnested * add missing dep * inner has no more preds than outer, thx @kleinschmidt * column span check * oops * julia 1.4 compat * condition LRT show method on G/LMM * use deviance for GLM and -2 loglikelihood for LMM * contains -> occursin (Julia 1.4 compat) * Apply suggestions from code review Co-authored-by: Dave Kleinschmidt <[email protected]> * copy-paste error * reenable a shortcut * Apply suggestions from code review Co-authored-by: Douglas Bates <[email protected]> Co-authored-by: Dave Kleinschmidt <[email protected]> Co-authored-by: Douglas Bates <[email protected]> (cherry picked from commit 365e5f7)
* LRT for comparing non-mixed to mixed models (#508) * LRT for comparing non-mixed to mixed models * nesting for LM/GLMM and GLM/LMM * NEWS entry * use _iscomparable instead of isnested * add missing dep * inner has no more preds than outer, thx @kleinschmidt * column span check * oops * julia 1.4 compat * condition LRT show method on G/LMM * use deviance for GLM and -2 loglikelihood for LMM * contains -> occursin (Julia 1.4 compat) * Apply suggestions from code review Co-authored-by: Dave Kleinschmidt <[email protected]> * copy-paste error * reenable a shortcut * Apply suggestions from code review Co-authored-by: Douglas Bates <[email protected]> Co-authored-by: Dave Kleinschmidt <[email protected]> Co-authored-by: Douglas Bates <[email protected]> (cherry picked from commit 365e5f7) * patch bump * add Distribution and Link methods
Closes #491.
This only allows one non-mixed model and it has to be the first argument. That's fairly restrictive, but generally speaking, I want to avoid testing a massive series of increasingly complicated fixed-effects models and then adding random-effects onto that series.