
[ML] Adds support for regression.mean_squared_error to eval API #44140

Merged

Conversation

benwtrent (Member)

This adds a new evaluation type of Regression (inside a new sub-package of the same name). Additionally, it adds a new metric of MeanSquaredError.

I was debating making MSE more generic and usable in other parts of the evaluation API, but it seems to me that MSE is only really helpful with Regression-type results.

I modeled Regression after BinarySoftClassification. MSE is not the only evaluation metric for Regression-type problems, and we may want to support more in the future.

As for MeanSquaredError, it currently accepts no parameters, but it can accept parameters in the future if necessary. Additionally, the mean_squared_error: {} format adheres to the current API design.
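For context, a request using this evaluation might look roughly like the following sketch (the endpoint path, index name, and field names here are illustrative assumptions, not taken from this PR):

```json
POST _ml/data_frame/_evaluate
{
  "index": "house_price_predictions",
  "evaluation": {
    "regression": {
      "actual_field": "price",
      "predicted_field": "ml.price_prediction",
      "metrics": {
        "mean_squared_error": {}
      }
    }
  }
}
```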

elasticmachine (Collaborator)

Pinging @elastic/ml-core

przemekwitek (Contributor) left a comment

I know it's a draft, so feel free to disregard comments that are irrelevant or that you already intended to address.

@benwtrent benwtrent marked this pull request as ready for review July 10, 2019 13:44
tveasey (Contributor) left a comment

Looking really good, Ben.

I have one major observation: we're building in inefficiency by using separate searches for different metrics. Should we therefore use a single scripted aggregation to gather all of the basic statistics we need at once?

Related, although not necessarily required in the first instance, is whether we should include a "normalised" metric such as R^2. This feels like a small step given this PR and could just be rolled in from the start.


```java
public static final ParseField NAME = new ParseField("mean_squared_error");

private static final String PAINLESS_TEMPLATE = "def diff = doc[''{0}''].value - doc[''{1}''].value;return diff * diff;";
```
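As a minimal sketch of how such a template can be expanded and wrapped into an aggregation (the variable and aggregation names here are illustrative, not the PR's exact code):

```java
import java.text.MessageFormat;
import org.elasticsearch.script.Script;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.metrics.AvgAggregationBuilder;

String actualField = "actual";        // hypothetical field names for illustration
String predictedField = "predicted";

// The doubled single quotes in the template are MessageFormat escapes for
// literal quotes; {0} and {1} become the actual and predicted field names.
String painless = MessageFormat.format(PAINLESS_TEMPLATE, actualField, predictedField);

// MSE is the average of the per-document squared differences, so an avg
// aggregation over the script's per-document value yields the metric.
AvgAggregationBuilder mseAgg = AggregationBuilders.avg("regression_mse")
    .script(new Script(painless));
```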
tveasey (Contributor) commented Jul 10, 2019

Nice use of scripted aggs!

I think it would be worth gathering the extra stats we need for other metrics as part of the same agg, to avoid visiting the same documents multiple times. This raises the question of whether this class is too specific, or whether some other class should manage the gathering of the raw statistics.

Some I think would be particularly useful:

  1. R^2 (= 1 - sum_squares(y_act - y_pred) / sum_squares(y_act - mean(y_act))), for which we need (y_act - mean(y_act))^2; this requires the mean of y_act injected into the script.
  2. mean absolute error

Note we could also provide explained variance, which is closely related to R^2. This additionally needs mean(y_act - y_pred) injected. From an evaluation perspective it is useful to have "normalised" measures, so R^2 and/or explained variance would be useful.
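In standard notation, with y the actuals and ŷ the predictions, the metrics discussed above are:

```latex
R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2},
\qquad
\text{explained variance} = 1 - \frac{\operatorname{Var}(y - \hat{y})}{\operatorname{Var}(y)},
\qquad
\text{MAE} = \frac{1}{n} \sum_i \lvert y_i - \hat{y}_i \rvert
```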

benwtrent (Member, Author) commented

@tveasey if we add them as "metrics" under the "regression" evaluation, R-squared and MAE would be part of the same query. Because queries and aggs are executed in phases, they are applied at the "same time". We may "hit the same doc twice", but it would already be loaded on the shard, so the resource-utilization difference would be minuscule compared to the added complexity.

benwtrent (Member, Author) commented Jul 10, 2019

Let me clarify: a Regression evaluation can have numerous metrics (each characterized by its own unique aggs), but all of them are computed in a single query.

See: https://github.com/elastic/elasticsearch/pull/44140/files#diff-391a4ea319550ee94db861139fc86e9aR108

BinarySoftClassification handles numerous metrics in the same manner.
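A rough sketch of that pattern (the class and method names here are hypothetical; see the linked diff for the actual code):

```java
import org.elasticsearch.search.aggregations.AggregationBuilder;
import org.elasticsearch.search.builder.SearchSourceBuilder;

// Each metric contributes its own aggregations, but they are all attached
// to a single search, so every metric is evaluated in one query pass.
SearchSourceBuilder source = new SearchSourceBuilder().size(0);
for (RegressionMetric metric : metrics) {   // RegressionMetric is a hypothetical interface here
    for (AggregationBuilder agg : metric.aggs(actualField, predictedField)) {
        source.aggregation(agg);
    }
}
```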

tveasey (Contributor) commented Jul 10, 2019

Ok, cool. I'd missed this detail: I was thinking each metric was responsible for actually performing its own search. In that case, the main question becomes: is it worth getting R^2 at the same time? It's an interesting sort of metric because it can essentially be derived from the mean squared error together with the variance of the actuals. It's a useful metric in its own right, and incorporating it from the start will also show how it fits in without code duplication.
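Concretely, the relationship being described is (restating the comment in standard notation; Var here is the population variance over the evaluated documents):

```latex
R^2 = 1 - \frac{\frac{1}{n}\sum_i (y_i - \hat{y}_i)^2}{\frac{1}{n}\sum_i (y_i - \bar{y})^2}
    = 1 - \frac{\mathrm{MSE}}{\operatorname{Var}(y_{\text{act}})}
```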

benwtrent (Member, Author) commented

@tveasey sure, I can add it :).

benwtrent (Member, Author) commented

@tveasey ok, looking at what is possible with aggs, R^2 would be a two-phase thing. We don't have the infrastructure in place for the evaluation API to do two-phase metrics. This is something we can add in the future, but it would definitely blow up the line count of this PR.

I can add MAE instead if you would like.

tveasey (Contributor) commented Jul 10, 2019

Ben and I discussed this a bit further offline. Computing R^2 is in fact possible without two phases, but we feel it is probably worth moving this to a separate PR since this one is already quite large. We also discussed a separate thought: should we have a layer responsible for gathering simple statistics that are fed into evaluation metrics? MSE and R^2 are examples of metrics that can reuse the same simple statistics. We'll discuss this with @dimitris-athanasiou when he's back.

benwtrent (Member, Author) commented

@tveasey and I talked offline. He taught me that R^2 can be calculated using the variance, so direct access to the mean is not strictly necessary :). I think adding new metrics should be booted to another PR to keep this one's size down.

tveasey (Contributor) left a comment

As far as I'm concerned this is LGTM, but I'm not super familiar with this code, so it might be worth having someone else give it a final check.

@benwtrent benwtrent requested a review from przemekwitek July 10, 2019 17:52
przemekwitek (Contributor) left a comment

LGTM

@benwtrent benwtrent merged commit 873e9f9 into elastic:master Jul 11, 2019
@benwtrent benwtrent deleted the feature/ml-add-regression-mse-evaluation branch July 11, 2019 12:13
benwtrent added a commit to benwtrent/elasticsearch that referenced this pull request Jul 11, 2019
…tic#44140)

* [ML] Adds support for regression.mean_squared_error to eval API

* addressing PR comments

* fixing tests
benwtrent added a commit that referenced this pull request Jul 11, 2019
…) (#44218)

* [ML] Adds support for regression.mean_squared_error to eval API

* addressing PR comments

* fixing tests