Make num_top_classes parameter's default value equal to 2 #48119

Merged

przemekwitek
Contributor

Users are often interested in the probabilities of each class being predicted.
This PR sets the default of num_top_classes parameter (which controls how many classes with probabilities are returned) to 2.
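To illustrate what the parameter controls, here is a minimal sketch in Python, not the actual Elasticsearch implementation. The function name `top_classes` and the dictionary input format are hypothetical; only the parameter name `num_top_classes` and the default of 2 come from this PR.

```python
from typing import Dict, List, Tuple

# The default proposed in this PR.
DEFAULT_NUM_TOP_CLASSES = 2


def top_classes(
    probabilities: Dict[str, float],
    num_top_classes: int = DEFAULT_NUM_TOP_CLASSES,
) -> List[Tuple[str, float]]:
    """Return the num_top_classes most probable classes, highest first.

    Hypothetical helper: it only mirrors the behaviour described in the PR.
    """
    if num_top_classes < 0:
        raise ValueError("num_top_classes must be non-negative")
    # Rank classes by predicted probability, most probable first.
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    # Slicing with 0 yields an empty list, so no probabilities are reported.
    return ranked[:num_top_classes]


binary = top_classes({"cat": 0.7, "dog": 0.3})
multiclass = top_classes({"a": 0.1, "b": 0.6, "c": 0.3})
```

With two classes, the default of 2 reports every class. With three or more classes it reports only the two most probable ones.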

Relates to #46735

@elasticmachine
Collaborator

Pinging @elastic/ml-core (:ml)

@przemekwitek przemekwitek force-pushed the change_default_num_top_classes branch from 549320b to 5d6c1af Compare October 16, 2019 09:59
@przemekwitek przemekwitek force-pushed the change_default_num_top_classes branch from 2e5121b to 98240f7 Compare October 16, 2019 11:46
Member

@benwtrent benwtrent left a comment

This does not really make sense for multi-class as returning the top two will seem arbitrary in that situation.

Maybe just the probability of the one we return (i.e. numTopClasses = 1)?

One could see a parallel between this setting and the computeFeatureInfluence and featureInfluenceThreshold settings for outlier_detection.

It almost seems like we need a "close to" parameter that returns all the top classes that are above a certain probability... @tveasey what say you?

@przemekwitek
Contributor Author

> This does not really make sense for multi-class as returning the top two will seem arbitrary in that situation.

I agree. Please remember though that in 7.5 we don't go multiclass and for binary "2" is a sensible default.

> Maybe just the probability of the one we return (i.e. numTopClasses = 1)?

Let's revisit it in 7.6 when we go multiclass.

> One could see a parallel between this setting and the computeFeatureInfluence and featureInfluenceThreshold settings for outlier_detection.
>
> It almost seems like we need a "close to" parameter that returns all the top classes that are above a certain probability... @tveasey what say you?

Member

@benwtrent benwtrent left a comment

I am OK with this. We just need to document somewhere that the default value is 2 and that supplying 0 is needed to prevent the top classes from being returned.
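For the documentation, the difference between relying on the default and opting out could be shown roughly like this. This is a sketch under assumptions: the helper `classification_analysis` is hypothetical, and the surrounding `classification` / `dependent_variable` structure is assumed rather than taken from this PR. Only `num_top_classes`, its default of 2, and the meaning of 0 come from the discussion above.

```python
import json
from typing import Optional


def classification_analysis(
    dependent_variable: str,
    num_top_classes: Optional[int] = None,
) -> dict:
    """Build an illustrative classification analysis config (hypothetical helper)."""
    params = {"dependent_variable": dependent_variable}
    # Compare against None rather than truthiness: 0 is a meaningful value
    # (it disables the top classes output) and must not be dropped.
    if num_top_classes is not None:
        params["num_top_classes"] = num_top_classes
    return {"classification": params}


# Parameter omitted: the server-side default (2 after this PR) applies.
with_default = classification_analysis("label")

# Explicit 0: no per-class probabilities are returned.
without_top_classes = classification_analysis("label", num_top_classes=0)

print(json.dumps(without_top_classes, indent=2))
```

The explicit `is not None` check matters here, because a plain truthiness test would silently discard the 0 and fall back to the default.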

I do think that a parameter like top_classes_threshold, which lets the user provide a probability threshold and returns the top classes that are above that threshold, would be a great addition. I will open a separate issue for that :)
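A rough sketch of how such a threshold could behave, in Python. Everything here is hypothetical: `top_classes_threshold` is only a proposed name, the function `classes_above_threshold` is invented for illustration, and treating the threshold as inclusive is an assumption.

```python
from typing import Dict, List, Tuple


def classes_above_threshold(
    probabilities: Dict[str, float],
    top_classes_threshold: float,
) -> List[Tuple[str, float]]:
    """Return every class whose probability reaches the threshold, highest first.

    Hypothetical helper for the proposed top_classes_threshold parameter.
    """
    if not 0.0 <= top_classes_threshold <= 1.0:
        raise ValueError("top_classes_threshold must be between 0 and 1")
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    # Assumption: the comparison is inclusive (>=); the proposal does not say.
    return [(name, p) for name, p in ranked if p >= top_classes_threshold]


close_call = classes_above_threshold({"a": 0.5, "b": 0.45, "c": 0.05}, 0.4)
clear_winner = classes_above_threshold({"a": 0.9, "b": 0.06, "c": 0.04}, 0.4)
```

Unlike a fixed count, the number of returned classes adapts to the prediction: two classes for the close call, one for the clear winner.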

@przemekwitek
Contributor Author

run elasticsearch-ci/packaging-sample
run elasticsearch-ci/packaging-sample-matrix

@przemekwitek przemekwitek force-pushed the change_default_num_top_classes branch from 422ab37 to dbefbc9 Compare October 17, 2019 05:38
@przemekwitek przemekwitek force-pushed the change_default_num_top_classes branch from dbefbc9 to 0c04ba7 Compare October 17, 2019 08:55
@przemekwitek
Contributor Author

run elasticsearch-ci/packaging-sample-matrix
