-
-
Notifications
You must be signed in to change notification settings - Fork 404
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
mlr::arsq gives 1.0 always and greater than mlr::rsq #2711
Comments
Thanks for reporting. Can you please add a reproducible example, preferably using the reprex package? Its great that you found out how to make a custom measure. Also you state: "gives always 1.0". Can you also give a reprex for this point? |
# Setup ####
Install_And_Load <- function(packages) {
k <- packages[!(packages %in% installed.packages()[,"Package"])];
if(length(k))
{install.packages(k, repos='https://cran.rstudio.com/');}
for(package_name in packages)
{suppressMessages(library(package_name,character.only=TRUE, quietly = TRUE));}
}
Install_And_Load(c("rapportools",
"plyr" , "dplyr", "tidyr",
"plotly", "gganimate", "gifski", "png", "ggmosaic",
"tictoc",
"pracma", "tis",
"purrr", "purrrlyr",
"cluster", "Rtsne", "mlr",
"kknn", "glmnet", "h2o", "caret",
"irace", "parallelMap",
"parallel" , "mice",
# wordcloud
"wordcloud", 'wordcloud2', 'RColorBrewer',
'datarium'
))
#> Warning: package 'dplyr' was built under R version 3.5.3
#> Warning: package 'gganimate' was built under R version 3.5.3
#> Warning: package 'gifski' was built under R version 3.5.3
#> Warning: package 'ggmosaic' was built under R version 3.5.3
#> Warning: package 'pracma' was built under R version 3.5.3
#> Warning: package 'tis' was built under R version 3.5.3
#> Warning: package 'purrrlyr' was built under R version 3.5.3
#> Warning: package 'Rtsne' was built under R version 3.5.3
#> Warning: package 'mlr' was built under R version 3.5.3
#> Warning: package 'ParamHelpers' was built under R version 3.5.3
#> Warning: package 'kknn' was built under R version 3.5.3
#> Warning: package 'glmnet' was built under R version 3.5.3
#> Warning: package 'h2o' was built under R version 3.5.3
#> Warning: package 'irace' was built under R version 3.5.3
#> Warning: package 'parallelMap' was built under R version 3.5.3
#> Warning: package 'mice' was built under R version 3.5.3
#> Warning: package 'wordcloud' was built under R version 3.5.3
#> Warning: package 'wordcloud2' was built under R version 3.5.3
#> Warning: package 'datarium' was built under R version 3.5.3
# Load sample data ####
data("marketing", package = "datarium")
# Split data
set.seed(12345)
index <- sample(1:200, size = trunc(.7 * 200))
trainData <- marketing %>% filter(row_number() %in% index)
testData <- marketing %>% filter(!row_number() %in% index)
# Generate task ####
trainTask <- trainData %>%
as.data.frame() %>%
makeRegrTask(data = ., target = "sales", id = "Train data")
testTask <- testData %>%
as.data.frame() %>%
makeRegrTask(data = ., target = "sales", id = "Test data")
# Generate the learner ####
regr.lm.learner = makeLearner("regr.lm")
# Train the learner on trainData ####
model <- mlr::train(learner = regr.lm.learner,
task = trainTask)
# Predict on testData ####
testPred <- predict(model, task = testTask)
# Measure performance ####
# With custom arsq.v2
arsq.v2 = makeMeasure(id = "arsq.v2", minimize = FALSE, best = 1, worst = 0,
properties = c("regr", "req.pred", "req.truth"),
name = "Adjusted coefficient of determination",
note = "Defined as: 1 - (1 - rsq) * ((n - 1) / (n - p - 1L)). Adjusted R-squared is only defined for normal linear regression.",
fun = function(task, model, pred, feats, extra.args) {
n = length(pred$data$truth)
p = length(model$features)
if (n == p + 1) {
warning("Adjusted R-squared is undefined if the number observations is equal to the number of independent variables plus one.")
return(NA_real_)
}
1 - (1 - measureRSQ(pred$data$truth, pred$data$response)) * ((n - 1) / (n - p - 1L))
})
meas = mlr::performance(testPred, measures = list(mlr::rmse, mlr::mae, mlr::rsq, arsq, arsq.v2)); meas
#> rmse mae rsq arsq arsq.v2
#> 2.1012827 1.5839530 0.9021156 1.0000000 0.9021156
# finding the bug ####
# setting n and p
n = length(testPred$data$truth); n # no of observations
#> [1] 60
p = length(model$features); p # no of features
#> [1] 3
# calculate arsq without any mlr
preds = testPred$data$response
actual = testPred$data$truth
rss <- sum((preds - actual) ^ 2) # residual sum of squares
tss <- sum((actual - mean(actual)) ^ 2) # total sum of squares
r.squared <- 1 - rss/tss; r.squared
#> [1] 0.9021156
adj.r.squared = 1 - (1 - r.squared) * ((n - 1)/(n-p-1)); adj.r.squared
#> [1] 0.8968717
# calculate arsq using R.Square from mlr results
R.squared = meas[[3]]; R.squared
#> [1] 0.9021156
adj.r.squared = 1 - (1 - R.squared) * ((n - 1)/(n-p-1)); adj.r.squared
#> [1] 0.8968717
# using mlr::ARSQ
measures::ARSQ(truth = testPred$data$truth,
response = testPred$data$response,
n,
p)
#> [1] 0.9947562
# reprex::reprex() Created on 2020-01-09 by the reprex package (v0.3.0) |
Basically two things
mlr::arsq gives value close to 1 and greater than mlr::rsq, where as adjusted r square should be always less than r square. This allerted me. I checked its code and found it to be different from what I know. Ideally, arsq should be Let me know if I am wrong about this.
I created arsq.v2 (using makeMeasure) with update in the formula part of mlr::arsq. But it gives same value as rsq. Where as same formula outside of mlr gives a value less than rsq. |
Ok, thanks for the research! I'll take a closer look in the next days and see what we can do on our side. For backward comp we can't just change the implementation but would need to solve this via a new arg or similar (if we take action). |
sure we can chnage this, if this is a bug, and we should change this. |
If there is only one implementation - it looks like there are maybe multiple ones floating around? I haven't yet taken a look how we did it in mlr3measures. If its clearly wrong then sure, we should just replace it. |
looks like a weird mix up between two definitions + mix up wrt the definition of The two, equivalent definitions in German Wikipedia are
In the german Wikipedia
Are cases caught were the model is defined like Also, in
where |
thats actually not really possible in mlr. well at least not directly, mlr does not have a formula interface. we do support specific learners for poly-regression though. should we maybe simple remove the measure? it really does not make sense from a broad ML perspective? |
I wouldn't mind. It only really makes sense when comparing models with different number of parameters in sample, i.e. on training data (no need to adjust R^2 when evaluated on test data) and according to help page it is only available for "normal" linear models, i.e. OLS. So how often does one compare performance of different linear models on training data using mlr? |
exactly, i agree 100%. we can still advice @ManuKulamkombil how to produce a "local" measure for his purposes, as he seems to be the one user that needs the adj-r^2 with mlr :) |
all agree? |
Ok, I'll remove it. |
Agreed. But I was not successful in creating Arsq correctly. I created arsq.v2 but it gave a value equal to rsq and not less than that. |
Please suggest corrections in arsq.v2 Thanks in advance |
@ManuKulamkombil Are you sure you need Adjusted R^2? You evaluate the performance on test data mlr::performance(testPred, measures = list(mlr::rmse, mlr::mae, mlr::rsq, arsq, arsq.v2)) Adjusted R^2 is only useful when comparing different models on training data. |
I am using it on training data for comparing models in my work, but I did not show that in my example. I wanted to make a brief example, so I skipped that part. |
You need to provide the trainPred <- predict(model, task = trainTask)
mlr::performance(trainPred, model = model, measures = list(mlr::rmse, mlr::mae, mlr::rsq, mlr::arsq, arsq.v2)) Otherwise |
Prework
Description
After train and predict, arsq (adjusted r squared) gave result as 1 always. Also greater than rsq value. I guess the formula used for arsq is different from what I know of. I tried with custom r code and got the right answer. Tried to extend to arsq.v2 in my local system with required changes and got an answer equal to rsq and not arsq. I am currently using mlr version 2.14.0
Reproducible example
Learned that the formula given in the link
is not same as I expected it to be. My understanding of ARSQ is
This is the inbuilt arsq:
mlr/R/measures.R
Line 268 in e124753
I tried to change that as follows:
Compared both with this
Got the following results.
Then tried arsq with custom rsq
Expected output
The text was updated successfully, but these errors were encountered: