Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement evaluation API for multiclass classification problem #47126

Merged

Conversation

przemekwitek
Copy link
Contributor

@przemekwitek przemekwitek commented Sep 25, 2019

This PR adds a new evaluation type ("classification") and a new metric ("MulticlassConfusionMatrix").

Relates #46735

@elasticmachine
Copy link
Collaborator

Pinging @elastic/ml-core

@przemekwitek przemekwitek force-pushed the classification_evaluation_2 branch 11 times, most recently from 7990436 to e3bae20 Compare September 26, 2019 07:30
@przemekwitek przemekwitek force-pushed the classification_evaluation_2 branch from e3bae20 to 7c5a7e9 Compare September 26, 2019 07:33
@przemekwitek przemekwitek removed the WIP label Sep 26, 2019
@przemekwitek przemekwitek marked this pull request as ready for review September 26, 2019 07:36
@przemekwitek przemekwitek force-pushed the classification_evaluation_2 branch 2 times, most recently from fe1898e to 60358f9 Compare September 26, 2019 08:56
@przemekwitek przemekwitek force-pushed the classification_evaluation_2 branch 2 times, most recently from 09d6ce9 to c2bd38e Compare September 27, 2019 08:08
Copy link
Contributor

@dimitris-athanasiou dimitris-athanasiou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Left a few comments.

.field(actualField)
.order(List.of(BucketOrder.count(false), BucketOrder.key(true)))
.size(size));
} else if (result == null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We generally prefer that an else if clause is operating on the same data as the if clause to keep reasoning simple. In this case I think we don't really need the else at all as we are returning from each branch. So we could have:

if (topActualClassNames == null) {
    // This is step 1
    ...
    return {};
}
if (result == null) {
    // This is step 2
    ...
    return {};
}
return List.of();

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it would also make it more readable to extract the building of the aggregations of each step into its own method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case I think we don't really need the else at all as we are returning from each branch.

Done.

I wonder if it would also make it more readable to extract the building of the aggregations of each step into its own method.

I'm not sure this would help much. The method is not (yet) very long so I'd keep it as it is if you don't have a strong opinion about refactor.

public static class Result implements EvaluationMetricResult {

private static final ParseField CONFUSION_MATRIX = new ParseField("confusion_matrix");
private static final ParseField OTHER_CLASSES_COUNT = new ParseField("other_classes_count");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this should also be called just _other_.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done on the API level.

LMK if you think I should also rename the variables. In the code I would slightly prefer more descriptive "other_classes_count".

@przemekwitek
Copy link
Contributor Author

run elasticsearch-ci/default-distro

@przemekwitek przemekwitek force-pushed the classification_evaluation_2 branch from 4c44e2e to 58983da Compare September 30, 2019 06:55
@przemekwitek
Copy link
Contributor Author

run elasticsearch-ci/bwc

@przemekwitek
Copy link
Contributor Author

run elasticsearch-ci/bwc

@przemekwitek przemekwitek force-pushed the classification_evaluation_2 branch from 58983da to d8472f0 Compare September 30, 2019 09:45
@przemekwitek
Copy link
Contributor Author

run elasticsearch-ci/1
run elasticsearch-ci/bwc

Copy link
Contributor

@dimitris-athanasiou dimitris-athanasiou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants