Improve discreteness handling, allow binary outcomes #816

Merged
merged 29 commits into from
Jan 12, 2024

Conversation

fverac
Collaborator

@fverac fverac commented Sep 22, 2023

Adds a binary_outcome keyword argument to most estimators; when it is True, the outcome nuisance model is a classifier.

@fverac fverac marked this pull request as ready for review October 13, 2023 15:44
@fverac fverac requested a review from kbattocchi January 3, 2024 05:09
Collaborator

@kbattocchi kbattocchi left a comment

Mostly looks great. I've suggested a few minor changes.

Related to this PR, I think lines 52-57 of econml/dml/dml.py should be made more robust - what happens if the target is discrete and there is no predict_proba method, or target is not discrete and there is a predict_proba method? I think in either case, we should at least warn the user that they are passing a classifier where a regressor is expected or vice versa. When there is no predict_proba method but one is expected, there doesn't seem to be much harm in falling back to calling predict instead as long as the user is warned; in the opposite scenario it's less clear to me that calling predict_proba instead of just calling predict as usual is a good idea, but at least if we warn the user they can change the discreteness and get that behavior if they want.

Whatever we decide here, if there's a non-trivial amount of logic and we do something similar with other estimators that don't use this first stage wrapper, that should probably happen in a new utility method that this module and others can all use consistently (perhaps something like get_prediction(estimator, expected_discrete)).

Collaborator Author

fverac commented Jan 5, 2024

Added a warning when the first stage target is discrete but the model does not have predict_proba,
and an error is now raised when the first stage target is continuous but the model does have predict_proba.
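
For reference, the behavior described above could be sketched roughly as follows. This is a minimal illustration only, not the code merged in this PR: the helper name get_prediction comes from the review suggestion, while its exact signature (including the X argument), the message wording, and the toy estimator classes are assumptions made for the example. The AttributeError type matches what the commit message mentions.

```python
import warnings


def get_prediction(estimator, X, expected_discrete):
    """Hypothetical helper: predict while checking that the estimator type
    matches the discreteness of the target."""
    has_proba = hasattr(estimator, "predict_proba")
    if expected_discrete:
        if has_proba:
            return estimator.predict_proba(X)
        # Discrete target but a regressor-like model: warn and fall back.
        warnings.warn(
            "Target is discrete but the model has no predict_proba method; "
            "falling back to predict.", UserWarning)
        return estimator.predict(X)
    if has_proba:
        # Continuous target but a classifier-like model: refuse.
        raise AttributeError(
            "Target is continuous but the model has a predict_proba method; "
            "a regressor was expected.")
    return estimator.predict(X)


class ToyRegressor:
    """Stand-in for a regressor (assumption for this example)."""

    def predict(self, X):
        return [0.5 for _ in X]


class ToyClassifier:
    """Stand-in for a classifier (assumption for this example)."""

    def predict(self, X):
        return [1 for _ in X]

    def predict_proba(self, X):
        return [[0.25, 0.75] for _ in X]
```

With these toy classes, get_prediction(ToyRegressor(), X, True) warns and returns the plain predictions, while get_prediction(ToyClassifier(), X, False) raises.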

@fverac fverac requested a review from kbattocchi January 5, 2024 19:15
@fverac fverac force-pushed the fverac/improve_discreteness_handling branch from 87ccacf to 0757d39 Compare January 5, 2024 19:31
Collaborator

@kbattocchi kbattocchi left a comment

This looks good; I've made a couple of minor suggestions, but you can merge as soon as you are comfortable with it.

if len(self.outcome_transformer.classes_) > 2:
    raise AttributeError(
        "More than 2 outcome classes detected. This method currently only supports binary outcomes")
        f"({self.outcome_transformer.classes_} outcome classes detected. \
Collaborator

I think you're including the classes themselves rather than their count here.

Collaborator Author

Good catch
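
A minimal sketch of the point raised here, assuming the intent is to report the number of classes rather than the class labels. The function name check_binary_outcome and the exact message wording are hypothetical and are not taken from the PR.

```python
def check_binary_outcome(classes):
    """Hypothetical check: reject outcomes with more than two classes."""
    n_classes = len(classes)
    if n_classes > 2:
        # Report the count, not the class labels themselves.
        raise AttributeError(
            f"{n_classes} outcome classes detected. "
            "This method currently only supports binary outcomes")
```

Formatting the classes_ array itself into the message would print the labels (for example a list of three values) instead of the count.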

@@ -1482,3 +1482,29 @@ def jacify_featurizer(featurizer):
        a function for calculating the jacobian
    """
    return _TransformerWrapper(featurizer)


def single_strata_from_discrete_arrays(arrs):
Collaborator

[minor] The singular for strata is stratum, so this name seems slightly weird. I think strata_from_discrete_arrays (since this gets the strata for all of the rows at once) is shorter and just as clear.
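
To illustrate what "the strata for all of the rows at once" could mean, here is a rough sketch under the assumption that the helper combines several discrete arrays into a single stratum label per row. The body below is an illustration only and may differ from the function added in this PR; in particular, the integer labeling scheme is an assumption.

```python
def strata_from_discrete_arrays(arrs):
    """Hypothetical sketch: one stratum label per row, built from the
    combination of values that row takes across several discrete arrays."""
    if not arrs:
        return None
    labels = {}
    strata = []
    for row in zip(*arrs):
        # Assign the next unused integer the first time a combination appears.
        strata.append(labels.setdefault(row, len(labels)))
    return strata
```

Rows that share the same combination of values receive the same label, so the result can be used as a single stratification key.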

@fverac fverac merged commit ababb7e into main Jan 12, 2024
77 checks passed
@fverac fverac deleted the fverac/improve_discreteness_handling branch January 12, 2024 20:31
kbattocchi pushed a commit to kbattocchi/EconML that referenced this pull request Jan 23, 2024
Adds a binary_outcome keyword argument to most estimators; when it is True, the outcome nuisance model is a classifier.

Additionally adds constraints to ensure that the user handles nuisance model discreteness appropriately.
If a nuisance model has a continuous target but a classifier is passed, an AttributeError is raised.
Conversely, if a nuisance model has a discrete target but a regressor is passed, a warning is issued.