[ML] Adds support for regression.mean_squared_error to eval API #44140
Conversation
Pinging @elastic/ml-core
I know it's a draft so feel free to disregard comments that are irrelevant/you intended to apply anyway.
Looking really good, Ben.
I have one major observation: we're building in inefficiency by using separate searches for different metrics; should we therefore be using a single scripted aggregation to gather all of the basic statistics we need at once?
Related, although not necessarily required in the first instance, is whether we should include a "normalised" metric such as R^2. This feels like a small step given this PR, and could just be rolled in from the start.
```java
public static final ParseField NAME = new ParseField("mean_squared_error");

private static final String PAINLESS_TEMPLATE = "def diff = doc[''{0}''].value - doc[''{1}''].value;return diff * diff;";
```
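For context, a minimal sketch of how a template like this could be expanded and attached to a search as a scripted `avg` aggregation (the field names and the aggregation name here are illustrative, not taken from the PR):

```java
import java.text.MessageFormat;
import org.elasticsearch.script.Script;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.builder.SearchSourceBuilder;

// MessageFormat collapses the doubled quotes ('') to single quotes and
// substitutes the actual/predicted field names into the Painless source.
String painless = MessageFormat.format(
    "def diff = doc[''{0}''].value - doc[''{1}''].value;return diff * diff;",
    "actual_value", "predicted_value");
// -> def diff = doc['actual_value'].value - doc['predicted_value'].value;return diff * diff;

// Averaging the per-document squared difference over all matching documents
// is exactly the mean squared error.
SearchSourceBuilder source = new SearchSourceBuilder()
    .size(0)
    .aggregation(AggregationBuilders.avg("regression_mean_squared_error")
        .script(new Script(painless)));
```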
Nice use of scripted aggs!
I think it would be worth gathering the extra stats we need for other metrics as part of the same agg, to avoid visiting the same documents multiple times. This raises the question of whether this class is too specific, or whether some other class should manage the gathering of the raw statistics.
Some that I think would be particularly useful:
- R^2 ( = 1 - sum_squared(y_act - y_pred) / sum_squared(y_act - mean(y_act)) ), for which we need (y_act - mean(y_act))^2; this requires the mean of y_act to be injected into the script.
- Mean absolute error.

Note we could also provide explained variance, which is closely related to R^2; this additionally needs mean(y_act - y_pred) injected. From an evaluation perspective it is useful to have "normalised" measures, so R^2 and/or explained variance would be useful.
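A minimal sketch of the "inject the mean" idea (hypothetical; not code from this PR). The mean of the actual field would have to be computed first and then substituted into a second script, which is what makes the metric two-phase:

```java
import java.text.MessageFormat;

// Hypothetical two-phase sketch: meanOfActuals would come from a prior avg
// aggregation over the actual field; "actual_value" is an invented field name.
double meanOfActuals = 450_000.0; // result of the first-phase search
String varianceScript = MessageFormat.format(
    "def diff = doc[''{0}''].value - {1};return diff * diff;",
    "actual_value", String.valueOf(meanOfActuals));
// Averaging this script's value gives mean((y_act - mean(y_act))^2), which
// together with the MSE is enough to compute R^2.
```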
@tveasey if we add them as "metrics" under the "regression" evaluation, R-squared and MAE would be part of the same query. With how queries + aggs are phased, they are applied at the "same time". We may "hit the same doc twice", but it would already be loaded on the shard. The resource utilization difference would be minuscule relative to the added complexity.
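A hedged sketch of the pattern being described, assuming the `RegressionMetric` interface exposes an `aggs(actualField, predictedField)` method returning its aggregation builders (the exact signature in the PR may differ):

```java
import java.util.List;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.xpack.core.ml.dataframe.evaluation.regression.RegressionMetric;

// Every metric contributes its aggregations to one shared search request,
// so adding R^2 or MAE means adding aggs, not issuing another query.
static SearchSourceBuilder buildEvaluationSearch(List<RegressionMetric> metrics,
                                                 String actualField,
                                                 String predictedField) {
    SearchSourceBuilder searchSource = new SearchSourceBuilder().size(0);
    for (RegressionMetric metric : metrics) {
        metric.aggs(actualField, predictedField).forEach(searchSource::aggregation);
    }
    return searchSource;
}
```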
Let me clarify: a `Regression` evaluation can have numerous metrics (characterized by unique aggs), but all are done in a single query. `BinarySoftClassification` handles numerous metrics in the same manner.
Ok, cool. I'd missed this detail: I was thinking each metric was responsible for actually performing its own search. In that case, my main comment is: is it worth getting R^2 at the same time? It is an interesting sort of metric because it can essentially be obtained from the mean squared error together with the variance of the actuals. It is a useful metric in its own right, but incorporating it from the start will also show how it fits in without code duplication.
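Spelling out the relationship: since MSE = mean((y_act - y_pred)^2) and Var(y_act) = mean((y_act - mean(y_act))^2), dividing the numerator and denominator of the R^2 definition above by the document count gives R^2 = 1 - MSE / Var(y_act). The variance of the actuals is available from a standard extended_stats aggregation, so no mean needs to be injected into a script.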
@tveasey sure, I can add it :).
@tveasey ok, looking at what is possible with aggs, R^2 would be a two-phase thing. We don't have the infrastructure in place for the evaluation API to do two-phase metrics. This is something we can add in the future, but it would definitely blow up the line count of this PR.
I can add MAE instead if you would like.
Ben and I discussed this a bit further offline. Computing R^2 is in fact possible without two phases, but we feel it is probably worth moving it to a separate PR since this one is already quite large. We also discussed a separate thought: should we have a layer which is responsible for gathering simple statistics that are fed into evaluation metrics? MSE and R^2 are examples of metrics which could reuse the same simple statistics. We'll discuss this with @dimitris-athanasiou when he's back.
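A purely hypothetical sketch of what such a shared-statistics layer might look like (nothing like this exists in the PR; all names are invented):

```java
// Invented illustration: metrics would be evaluated from basic statistics
// gathered once by a single search, instead of each metric running its own aggs.
interface RegressionStatisticsProvider {
    long documentCount();
    double sumSquaredError();     // feeds MSE
    double sumAbsoluteError();    // feeds MAE
    double varianceOfActuals();   // feeds R^2, together with MSE
}
```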
@tveasey and I talked offline. He taught me that R^2 can be calculated using the variance, so direct access to the mean is not strictly necessary :). I think adding new metrics should be booted to another PR to keep the size down.
As far as I'm concerned this is LGTM, but I'm not super familiar with this code, so it might be worth having someone else give it a final check.
LGTM
…tic#44140)
* [ML] Adds support for regression.mean_squared_error to eval API
* addressing PR comments
* fixing tests

…) (#44218)
* [ML] Adds support for regression.mean_squared_error to eval API
* addressing PR comments
* fixing tests
This adds a new evaluation type of `Regression` (inside a new sub-package of the same name). Additionally, it adds a new metric of `MeanSquaredError`. I was debating making MSE more generic and usable in other parts of the evaluation API, but it seems to me that MSE is only really helpful with `Regression` type results.

I modeled `Regression` after `BinarySoftClassification`. MSE is not the only evaluation metric for `Regression` type problems, and we may want to support more in the future.

As for `MeanSquaredError`, it currently accepts no parameters, but it should allow parameters in the future if necessary. Additionally, this format of `mean_squared_error: {}` adheres to the current API design.
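To make the resulting API shape concrete, here is a hedged sketch of an evaluate request body using the new metric, embedded as a Java string for illustration (the index and field names are invented, and the `actual_field`/`predicted_field` keys are assumptions, not taken from this PR):

```java
// Illustrative body for the evaluate data frame API; "house_prices", "price",
// and "ml.price_prediction" are invented names, and "actual_field" /
// "predicted_field" are assumed parameter names for the regression evaluation.
String evaluateRequestBody = """
    {
      "index": "house_prices",
      "evaluation": {
        "regression": {
          "actual_field": "price",
          "predicted_field": "ml.price_prediction",
          "metrics": {
            "mean_squared_error": {}
          }
        }
      }
    }
    """;
```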