-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement evaluation API for multiclass classification problem #47126
Implement evaluation API for multiclass classification problem #47126
Conversation
Pinging @elastic/ml-core |
7990436
to
e3bae20
Compare
e3bae20
to
7c5a7e9
Compare
fe1898e
to
60358f9
Compare
09d6ce9
to
c2bd38e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Left a few comments.
.field(actualField) | ||
.order(List.of(BucketOrder.count(false), BucketOrder.key(true))) | ||
.size(size)); | ||
} else if (result == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We generally prefer that an else if
clause is operating on the same data as the if
clause to keep reasoning simple. In this case I think we don't really need the else
at all as we are returning from each branch. So we could have:
if (topActualClassNames == null) {
// This is step 1
...
return {};
}
if (result == null) {
// This is step 2
...
return {};
}
return List.of();
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if it would also make it more readable to extract the building of the aggregations of each step into its own method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case I think we don't really need the else at all as we are returning from each branch.
Done.
I wonder if it would also make it more readable to extract the building of the aggregations of each step into its own method.
I'm not sure this would help much. The method is not (yet) very long so I'd keep it as it is if you don't have a strong opinion about refactor.
public static class Result implements EvaluationMetricResult { | ||
|
||
private static final ParseField CONFUSION_MATRIX = new ParseField("confusion_matrix"); | ||
private static final ParseField OTHER_CLASSES_COUNT = new ParseField("other_classes_count"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this should also be called just _other_
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done on the API level.
LMK if you think I should also rename the variables. In the code I would slightly prefer more descriptive "other_classes_count".
...lti-node-tests/src/test/java/org/elasticsearch/xpack/ml/integration/EvaluateDataFrameIT.java
Outdated
Show resolved
Hide resolved
...java/org/elasticsearch/xpack/core/ml/dataframe/evaluation/classification/Classification.java
Show resolved
Hide resolved
run elasticsearch-ci/default-distro |
4c44e2e
to
58983da
Compare
run elasticsearch-ci/bwc |
run elasticsearch-ci/bwc |
58983da
to
d8472f0
Compare
run elasticsearch-ci/1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This PR adds a new evaluation type ("classification") and a new metric ("MulticlassConfusionMatrix").
Relates #46735