.resid column doesn't exist with log output in augment.lm #937

Closed
simonpcouch opened this issue Sep 18, 2020 · 5 comments

@simonpcouch
Collaborator

simonpcouch commented Sep 18, 2020

Thanks to @rmtrane for noting the issue and supplying a reprex!

library(broom)

packageVersion("broom")
#> [1] '0.7.0'

augment(lm(data = mtcars, mpg ~ hp))
#> # A tibble: 32 x 9
#>    .rownames           mpg    hp .fitted .resid .std.resid   .hat .sigma .cooksd
#>    <chr>             <dbl> <dbl>   <dbl>  <dbl>      <dbl>  <dbl>  <dbl>   <dbl>
#>  1 Mazda RX4          21     110    22.6 -1.59      -0.421 0.0405   3.92 3.74e-3
#>  2 Mazda RX4 Wag      21     110    22.6 -1.59      -0.421 0.0405   3.92 3.74e-3
#>  3 Datsun 710         22.8    93    23.8 -0.954     -0.253 0.0510   3.92 1.73e-3
#>  4 Hornet 4 Drive     21.4   110    22.6 -1.19      -0.315 0.0405   3.92 2.10e-3
#>  5 Hornet Sportabout  18.7   175    18.2  0.541      0.143 0.0368   3.93 3.89e-4
#>  6 Valiant            18.1   105    22.9 -4.83      -1.28  0.0432   3.82 3.69e-2
#>  7 Duster 360         14.3   245    13.4  0.917      0.250 0.0976   3.92 3.38e-3
#>  8 Merc 240D          24.4    62    25.9 -1.47      -0.396 0.0805   3.92 6.88e-3
#>  9 Merc 230           22.8    95    23.6 -0.817     -0.217 0.0496   3.93 1.23e-3
#> 10 Merc 280           19.2   123    21.7 -2.51      -0.661 0.0351   3.90 7.94e-3
#> # … with 22 more rows

augment(lm(data = mtcars, log(mpg) ~ hp))
#> # A tibble: 32 x 8
#>    .rownames         `log(mpg)`    hp .fitted .std.resid   .hat .sigma   .cooksd
#>    <chr>                  <dbl> <dbl>   <dbl>      <dbl>  <dbl>  <dbl>     <dbl>
#>  1 Mazda RX4               3.04   110    3.08    -0.213  0.0405  0.189 0.000958 
#>  2 Mazda RX4 Wag           3.04   110    3.08    -0.213  0.0405  0.189 0.000958 
#>  3 Datsun 710              3.13    93    3.14    -0.0820 0.0510  0.189 0.000181 
#>  4 Hornet 4 Drive          3.06   110    3.08    -0.109  0.0405  0.189 0.000253 
#>  5 Hornet Sportabout       2.93   175    2.86     0.373  0.0368  0.189 0.00266  
#>  6 Valiant                 2.90   105    3.10    -1.13   0.0432  0.185 0.0286   
#>  7 Duster 360              2.66   245    2.62     0.226  0.0976  0.189 0.00275  
#>  8 Merc 240D               3.19    62    3.25    -0.299  0.0805  0.189 0.00392  
#>  9 Merc 230                3.13    95    3.13    -0.0440 0.0496  0.189 0.0000506
#> 10 Merc 280                2.95   123    3.04    -0.459  0.0351  0.188 0.00384  
#> # … with 22 more rows

Created on 2020-09-18 by the reprex package (v0.3.0.9001)

@simonpcouch
Collaborator Author

@alexpghayes, I read in an old issue (can't find it now) that you wrote we need to document more clearly when .resid should and shouldn't exist.

What are those criteria? I can just write up what follows from the logic in augment_newdata() if that's what you meant.

@vincentarelbundock
Contributor

Diagnostic:

augment_newdata adds a new column called .resid only if this condition is met:

  resp <- safe_response(x, df)
  if (!is.null(resp) && is.numeric(resp)) {
    df$.resid <- (resp - df$.fitted) %>% unname() 
  }

In turn, the response() function (called inside safe_response()) is defined as:

response <- function(object, newdata = NULL) {
  model.response(model.frame(terms(object), data = newdata, na.action = na.pass))
}

But this function fails to return the response values when the response is transformed, because newdata only includes the original variables.
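The failure can be reproduced with base R alone. The sketch below is a minimal illustration, not broom's code: `try_response()` is a hypothetical helper that copies the body of response() and turns an error into NULL, which is an assumption about how safe_response() behaves on failure. It also assumes that the data reaching response() is the stored model frame, which has a column named `log(mpg)` but no `mpg` column for the formula to evaluate.

```r
# Fit the model from the reprex above.
fit <- lm(log(mpg) ~ hp, data = mtcars)

# The stored model frame holds the already-transformed column `log(mpg)`,
# not the raw `mpg` column.
mf <- model.frame(fit)
names(mf)

# Hypothetical helper: same expression as the body of response(), wrapped so
# that an error becomes NULL (assumed to mirror safe_response()).
try_response <- function(object, data) {
  tryCatch(
    model.response(model.frame(terms(object), data = data, na.action = na.pass)),
    error = function(e) NULL
  )
}

from_original <- try_response(fit, mtcars)  # `mpg` can be found in the data
from_frame    <- try_response(fit, mf)      # `mpg` cannot be found

is.numeric(from_original)
is.null(from_frame)
```

When the lookup returns NULL, the `!is.null(resp)` check in augment_newdata() is not met, so no .resid column is added.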

@alexpghayes
Collaborator

Yeah this is a bug, not intentional behavior. We do want residuals here. I'm surprised that safe_response() does this. I don't have the brain time to look at this now but am happy to review PRs that attempt to fix this (and I would like to review them because I recall there being some nuances here potentially).
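One possible direction for such a PR, sketched under stated assumptions and not taken from broom: try to evaluate the response in the supplied data, and fall back to the response stored in the model's own frame when that fails. `response_with_fallback()` is a hypothetical name. The fallback is only valid when the rows of the data match the rows used to fit the model, which is one of the nuances a real fix would have to handle (for example, rows dropped because of missing values).

```r
# Hypothetical helper, not part of broom.
response_with_fallback <- function(object, data = NULL) {
  # First attempt: evaluate the response expression in `data`.
  from_data <- tryCatch(
    model.response(model.frame(terms(object), data = data, na.action = na.pass)),
    error = function(e) NULL
  )
  if (!is.null(from_data)) {
    return(from_data)
  }
  # Fallback: use the response stored with the fitted model. This assumes
  # `data` has the same rows, in the same order, as the training data.
  model.response(model.frame(object))
}

fit <- lm(log(mpg) ~ hp, data = mtcars)

# Works even when the data frame only has the transformed column.
resp <- response_with_fallback(fit, model.frame(fit))
manual_resid <- unname(resp - fitted(fit))

# Compare against the residuals that lm() itself stores.
all.equal(manual_resid, unname(residuals(fit)))
```

Until a fix lands, `residuals(fit)` gives the same values on the scale of the transformed response.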

@github-actions

github-actions bot commented Jan 5, 2022

This issue has been automatically closed due to inactivity.

@github-actions

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Jan 20, 2022