Add causal analysis solution #447

Merged
merged 3 commits into from
May 9, 2021

Conversation

kbattocchi
Collaborator

No description provided.

@kbattocchi kbattocchi force-pushed the kebatt/causal_analysis branch 7 times, most recently from 7471971 to 9043b97 Compare April 5, 2021 20:44
@kbattocchi kbattocchi force-pushed the kebatt/causal_analysis branch from 9043b97 to 32d321e Compare April 15, 2021 17:30
@kbattocchi kbattocchi marked this pull request as ready for review April 30, 2021 15:35
@kbattocchi kbattocchi force-pushed the kebatt/causal_analysis branch from 32d321e to da0ff45 Compare May 5, 2021 14:17
@kbattocchi kbattocchi force-pushed the kebatt/causal_analysis branch from da0ff45 to 98b09af Compare May 5, 2021 16:40
econml/solutions/causal_analysis/_causal_analysis.py (3 outdated review threads, resolved)

return insights, result

if self.feature_names is None:
Collaborator

Is this logic maybe already in some sklearn utility?

Collaborator

If we are in a warm start, why are we not using the same feature names as the first time?

We could get some weirdness here if the user first passes a numpy array and then a pandas DataFrame with warm start.
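
The warm-start concern above can be illustrated with a small sketch. This is not econml's implementation: `resolve_feature_names` and its `previous_names` argument are hypothetical names introduced only for illustration. The sketch assumes names come from the DataFrame columns when available, otherwise from an explicit list, otherwise from generated `X1`, `X2`, ... names, and that a warm-started fit should refuse names that differ from the first fit.

```python
import numpy as np
import pandas as pd


def resolve_feature_names(X, feature_names=None, previous_names=None):
    """Hypothetical helper (not part of econml) that picks feature names.

    Priority: DataFrame columns, then explicit feature_names, then
    generated names 'X1', 'X2', ...
    If previous_names is given (a warm start), the resolved names must match.
    """
    n_features = np.shape(X)[1]
    if isinstance(X, pd.DataFrame):
        names = [str(c) for c in X.columns]
    elif feature_names is not None:
        names = list(feature_names)
    else:
        names = [f"X{i + 1}" for i in range(n_features)]

    if len(names) != n_features:
        raise ValueError(
            f"Expected {n_features} feature names, got {len(names)}")

    # Warm start: reject a silent switch, e.g. numpy array then DataFrame.
    if previous_names is not None and list(previous_names) != names:
        raise ValueError(
            "Feature names differ from the first fit: "
            f"{list(previous_names)} vs {names}")
    return names


# First fit with a plain numpy array, second fit with a DataFrame.
arr = np.zeros((3, 2))
first_names = resolve_feature_names(arr)
df = pd.DataFrame(arr, columns=["age", "income"])
try:
    resolve_feature_names(df, previous_names=first_names)
except ValueError as err:
    print("warm start rejected:", err)
```

Raising on a mismatch is only one possible policy; reusing the names from the first fit would be another, and the thread does not say which one was adopted.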

econml/solutions/causal_analysis/_causal_analysis.py (outdated review thread, resolved)
@kbattocchi kbattocchi force-pushed the kebatt/causal_analysis branch from 98b09af to 9dc967b Compare May 7, 2021 16:22
categorical: array-like of int, str, or bool
The features which are categorical in nature, expressed as either column indices,
column names, or boolean flags indicating which columns to pick
heterogeneity_inds: array-like of int, str, or bool, or None or list of array-like elements or None, default None
Collaborator

To avoid confusion, I think we should make the default of heterogeneity_inds be 'all', meaning use all variables; None or an empty list should mean use no variables for heterogeneity.

Contributor

If it's an empty list, won't the forest option fail, since it requires X? The local views won't work either.

Collaborator Author

I have made sure that empty h_inds now work with linear final models (even with local effects, although there's no point since there are no features to cause variation so all samples have the same results); I have not changed the spec of the method (e.g. enabling 'all') because that seems too risky at this stage for this release. As Maggie notes, empty h_inds will not work with forest dml and I don't see any easy path to making it work.

Collaborator

Sounds good. Let's just elaborate a bit more in the docstring: to use no features for heterogeneity, pass the empty list.
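
The convention this thread settles on can be sketched as follows. This is a hedged illustration, not the code in the PR: `resolve_heterogeneity_inds` is a hypothetical helper, and the reading that `None` (the default) selects all features is an assumption inferred from the discussion above. It handles only a flat array-like of ints, names, or boolean flags, and ignores the "list of array-like elements" form mentioned in the docstring.

```python
import numpy as np


def resolve_heterogeneity_inds(heterogeneity_inds, feature_names):
    """Hypothetical helper (not part of econml).

    None       -> use every feature for heterogeneity (assumed default)
    empty list -> use no features for heterogeneity
    otherwise  -> column indices, column names, or boolean flags
    """
    n_features = len(feature_names)
    if heterogeneity_inds is None:
        return list(range(n_features))

    inds = np.asarray(heterogeneity_inds)
    if inds.size == 0:
        return []  # explicit request for no heterogeneity features

    if inds.dtype == bool:
        if inds.shape != (n_features,):
            raise ValueError("Boolean flags must have one entry per feature")
        return [int(i) for i in np.flatnonzero(inds)]

    if inds.dtype.kind in "US":  # column names
        return [feature_names.index(name) for name in inds.tolist()]

    if inds.dtype.kind in "iu":  # integer column indices
        return [int(i) for i in inds]

    raise TypeError("Expected ints, strs, or bools")


names = ["age", "income", "region"]
print(resolve_heterogeneity_inds(None, names))
print(resolve_heterogeneity_inds([], names))
print(resolve_heterogeneity_inds(["income"], names))
```

As noted in the thread, an empty selection is only meaningful with linear final models; the forest option still needs features to split on.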

feature_names: list of str, default None
The names for all of the features in the data. Not necessary if the input will be a dataframe.
If None and the input is a plain numpy array, generated feature names will be ['X1', 'X2', ...].
upper_bound_on_cat_expansion: int, default 5
Contributor

This arg doesn't seem to be applied anywhere?

Collaborator Author

Yeah, there's a TODO later noting that it's not yet implemented but should be.

@kbattocchi kbattocchi closed this May 7, 2021
@kbattocchi kbattocchi reopened this May 7, 2021
@vsyrgkanis
Collaborator

@vsyrgkanis Ok

I might even do the same uniformization for local effects and maintain the feature_index inner index with the single feature name that we run the analysis for.

@py-why py-why deleted a comment from vsyrgkanis May 7, 2021
@py-why py-why deleted a comment from kbattocchi May 7, 2021
Collaborator

@vsyrgkanis vsyrgkanis left a comment

LGTM

@kbattocchi kbattocchi force-pushed the kebatt/causal_analysis branch from c6f5f1e to 878b54d Compare May 8, 2021 02:35
@kbattocchi kbattocchi enabled auto-merge (rebase) May 8, 2021 02:36
@kbattocchi kbattocchi force-pushed the kebatt/causal_analysis branch from 878b54d to 6d4cb76 Compare May 8, 2021 03:14
@kbattocchi kbattocchi merged commit 472dd8e into master May 9, 2021
@kbattocchi kbattocchi deleted the kebatt/causal_analysis branch May 9, 2021 01:24
3 participants