
This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →


Implementation of Causal Discovery? #98

Closed
dcompgriff opened this issue Jan 13, 2020 · 14 comments
Labels
discussion Discussion about causal inference and DoWhy's roadmap. enhancement New feature or request

Comments

@dcompgriff

I'm curious if anyone is interested in folding causal discovery algorithms into the dowhy package? I currently use the 'Causal Discovery Toolkit' (cdt) along with my own code for performing causal discovery. I think that for sufficiently complex problem domains, causal discovery is a necessary first half of causal analysis.

@amit-sharma
Member

Thanks for the pointer, @dcompgriff. The Causal Discovery Toolbox (cdt) looks quite cool. I would definitely like to see causal discovery integrated with DoWhy.

However, @emrekiciman and I have been discussing how exactly to integrate it with DoWhy. One option is to use discovery algorithms upfront in the modeling stage. This has the benefit of helping people work with complex datasets, as you say. But many discovery algorithms do not handle unobserved confounders well, so any obtained graph may be susceptible to bias due to unobserved confounding. So we'll need some way of conveying to users the exact assumptions under which the causal model is generated.

Another option is to let the user specify a graph in the model stage, but then use causal discovery algorithms to detect any obvious problems with the user's graph. Of course, to override the user's graph, we would probably use only the edges about which the causal discovery algorithm is most certain. This may need additional work (to identify which edges are more robust in the learnt causal graph), but it may be a nice way to combine the user's domain knowledge with the power of causal discovery algorithms. It may also convey to the user that causal discovery algorithms are better thought of as algorithmic suggestions rather than the true graph. The downside, of course, is that the process (and the API) for doing this will look complicated. More generally, there's an opportunity to frame some of the causal discovery work as a refutation of the user's model.
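A minimal sketch of such a refutation-style check, assuming the user's graph and the discovered graph are both networkx DiGraphs (the function and its conflict criterion are illustrative, not an existing DoWhy API):

```python
import networkx as nx

def conflicting_edges(user_graph: nx.DiGraph, discovered: nx.DiGraph):
    """Return edges in the user's graph that the discovery algorithm
    oriented in the opposite direction."""
    conflicts = []
    for u, v in user_graph.edges():
        # A conflict: discovery found v -> u but not u -> v.
        if discovered.has_edge(v, u) and not discovered.has_edge(u, v):
            conflicts.append((u, v))
    return conflicts

user = nx.DiGraph([("Z", "X"), ("X", "Y")])
learned = nx.DiGraph([("X", "Z"), ("X", "Y")])  # discovery reversed Z -> X
print(conflicting_edges(user, learned))  # [('Z', 'X')]
```

In a fuller version, each flagged edge could carry the discovery algorithm's confidence, so only high-certainty disagreements are surfaced to the user.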

What do you think about these two alternatives?
As a library, it might make sense for DoWhy to provide both options to the user, but it will be good to discuss how we would like the default experience to be.

@amit-sharma amit-sharma added discussion Discussion about causal inference and DoWhy's roadmap. enhancement New feature or request labels Jan 17, 2020
@emrekiciman
Member

emrekiciman commented Jan 20, 2020 via email

@dcompgriff
Author

  1. When to integrate causal discovery?
    Causal discovery can be done entirely before construction of the 'CausalModel' object. For example, today I use causal discovery algorithms to generate a networkx graph, and then feed this graph structure into the CausalModel, because I can convert the networkx graph into GML format. Truthfully, this could probably just be its own sub-module of dowhy, one that doesn't even have to change the existing API, because it doesn't touch anything in the stages of analysis after the graph is defined.
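A sketch of that handoff, with a hand-built networkx DiGraph standing in for an actual discovery run (in practice cdt's graph learners return such a DiGraph); the CausalModel call is shown as a comment since it needs a dataset:

```python
import networkx as nx

# In practice the graph would come from a discovery run, e.g.:
#   from cdt.causality.graph import PC
#   graph = PC().predict(df)   # returns a networkx DiGraph
graph = nx.DiGraph([("Z", "X"), ("X", "Y")])  # stand-in for a discovered graph

# Serialize to GML so it can be handed to the modeling stage.
gml = " ".join(nx.generate_gml(graph))

# model = dowhy.CausalModel(data=df, treatment="X", outcome="Y", graph=gml)
print(gml)
```

Because only a GML string crosses the boundary, the discovery step stays decoupled from identification and estimation, exactly as described above.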

  2. Discovery limitations.
    You bring up a great point about limitations of causal discovery, and these should for sure be outlined as either a warning or just in the documentation.

  3. Checking/refuting provided models?
    I think this may be interesting to include as an optional flag during the modeling stage. When the CausalModel object is first created, there could simply be an optional flag for whether to validate the model's graph using causal discovery algorithms.

  4. Ambiguous edges?
    The way I deal with ambiguous edges is to manually examine the graph output from causal discovery, and then attempt to orient the edges I can using domain knowledge. Not exactly automated, but then again, classical causal inference already has its own assumptions on the entire graph, so I feel this is OK. However, I think I've seen some packages that will output causal estimates for all graphs (even with ambiguous edges) by enumerating the possible orientations of those edges and then applying causal estimation to each resulting graph. I'm less of a fan of this for more than 2 ambiguous edges, however, and don't think it needs to be incorporated regardless, because this can already be done with the current package by having the user do the enumeration.
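That enumeration can be sketched as follows, assuming the ambiguous edges are given as unordered pairs and orientations that create cycles are discarded (all names here are illustrative):

```python
import itertools
import networkx as nx

def enumerate_orientations(directed_edges, ambiguous_edges):
    """Yield every DAG obtained by orienting each ambiguous edge
    one way or the other, skipping orientations that create cycles."""
    for flips in itertools.product([False, True], repeat=len(ambiguous_edges)):
        g = nx.DiGraph(directed_edges)
        g.add_edges_from(
            (v, u) if flip else (u, v)
            for (u, v), flip in zip(ambiguous_edges, flips)
        )
        if nx.is_directed_acyclic_graph(g):
            yield g

# One known edge X -> Y and one ambiguous edge Y - Z.
dags = list(enumerate_orientations([("X", "Y")], [("Y", "Z")]))
print(len(dags))  # 2: both orientations of Y - Z are acyclic here
```

Note the candidate count grows as 2^k in the number of ambiguous edges k, which matches the concern above about more than 2 of them.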

@emrekiciman
Member

emrekiciman commented Jan 24, 2020 via email

@dcompgriff
Author

dcompgriff commented Jan 25, 2020

Speaking to the handling of ambiguous edges:
I think that if the current code for performing identification/estimation/sensitivity analysis requires a DAG, then adding an error check (unless you have it already) for when a graph with bi-directional edges is passed is at least one way to deal with the issue of ambiguous edges: it forces users either to set the direction of these edges themselves, or to compute causal estimates under both directions.
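Such an error check might look like this, assuming an ambiguous edge arrives as a pair of opposing directed edges in a networkx DiGraph (the function is illustrative, not DoWhy's actual validation):

```python
import networkx as nx

def assert_dag(graph: nx.DiGraph):
    """Reject graphs with ambiguous (bi-directed) edges or cycles
    before identification/estimation is attempted."""
    bidirected = [
        (u, v) for u, v in graph.edges()
        if graph.has_edge(v, u) and u < v  # report each pair once
    ]
    if bidirected:
        raise ValueError(f"Ambiguous bi-directed edges, please orient: {bidirected}")
    if not nx.is_directed_acyclic_graph(graph):
        raise ValueError("Graph contains a cycle; a DAG is required.")

g = nx.DiGraph([("X", "Y"), ("Y", "X")])  # ambiguous edge X <-> Y
try:
    assert_dag(g)
except ValueError as e:
    print(e)
```

Raising early with the offending edge list gives the user exactly the "orient it yourself or enumerate both directions" choice described above.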

As for how to force users to evaluate graphs constructed from causal discovery... there's only so much that can be done from an API perspective, I think. Outputting a warning would certainly be good, but some of it may just come down to more documentation. While causal discovery and causal estimation have some nice theoretical foundations, I've found that I've needed to be closely involved in validating the graphs output by causal discovery. From everyone I've talked to using this kind of analysis in industry, I think it's still best practice to sit down and visually validate proposed graphs. Causal discovery is useful, but not perfect. It can provide insight into unknown causal directions in some cases, but in others it's nonsensical. For example, I've had discovery algorithms tell me that 'number of products purchased' was causal of 'total users' for a customer in one of my projects. I think the best thing to do is to output warnings about the potential issues of discovered graphs, and provide good tutorial and API documentation discussing them. But either way, I still feel discovery algorithms are valuable to have. I'm personally wary of fully specifying a causal graph structure myself without at least trying FCI (Fast Causal Inference), GES (Greedy Equivalence Search), and other algorithms.
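One lightweight aid to that manual review is to run several algorithms and flag low-agreement edges for closer inspection; a sketch (the helper and the example graphs are illustrative):

```python
import networkx as nx

def edge_agreement(graphs):
    """For each directed edge, the fraction of learned graphs that
    contain it -- a quick screen before visual validation."""
    counts = {}
    for g in graphs:
        for e in g.edges():
            counts[e] = counts.get(e, 0) + 1
    return {e: c / len(graphs) for e, c in counts.items()}

# Stand-ins for the outputs of two discovery algorithms (e.g. FCI and GES).
g1 = nx.DiGraph([("X", "Y"), ("Z", "X")])
g2 = nx.DiGraph([("X", "Y"), ("X", "Z")])  # disagrees on the Z - X direction
print(edge_agreement([g1, g2]))
```

Edges with agreement well below 1.0 are natural candidates for the domain-knowledge pass described above.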

@emrekiciman
Member

emrekiciman commented Feb 1, 2020 via email

@dcompgriff
Author

Sure, I can work on making this contribution. I've been meaning to implement some of these algorithms anyway, because I have some challenges with the existing ones and the constraints they allow me to specify before performing discovery.

@emrekiciman
Member

emrekiciman commented Feb 5, 2020 via email

@nsalas24

Hey @dcompgriff,

You might find this repo useful as well: https://github.com/quantumblacklabs/causalnex

They implement the NOTEARS algorithm: https://arxiv.org/abs/1803.01422

@amit-sharma
Member

Thanks for sharing the link to causalnex, @nsalas24. That looks like an excellent library for structure learning.

@dcompgriff
Author

Awesome, thanks @nsalas24. I've been waiting for the QuantumBlack folks to come out with their causal inference package. I'll definitely take a look at this.

@yangliu2

I don't have anything to add, but yes, please add the causal discovery part to the package so people can use both parts in a unified framework. This is nice, by the way.

@BoltzmannBrain

FWIW my team has found problems with the aforementioned CausalNex NOTEARS for causal discovery, summarized well by others here: https://arxiv.org/abs/2104.05441

If there's initiative for adding causal discovery to dowhy @amit-sharma, happy to help in some capacity.

@amit-sharma
Member

Yeah, I'd seen that paper too, and realized that NOTEARS-like continuous optimizers are not ready yet for causal discovery.

Thanks for restarting this thread, @BoltzmannBrain. We just added an experimental implementation of causal discovery in DoWhy. It leans on existing implementations of standard algorithms, and simply provides an API wrapper to standardize them and allow multiple methods. Here's a notebook: https://github.com/microsoft/dowhy/blob/master/docs/source/example_notebooks/dowhy_causal_discovery_example.ipynb

As you can see, this is very basic (and the different methods still do not agree). Would you like to try it out and see how we can extend it?
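The wrapper pattern can be sketched as follows; the class and the backend function are hypothetical stand-ins to illustrate the idea of standardizing multiple discovery methods behind one interface, not DoWhy's actual experimental API:

```python
import networkx as nx

class CausalDiscoveryWrapper:
    """Hypothetical wrapper: each discovery backend exposes a different
    interface, so the wrapper standardizes all of them to a single
    fit(data) -> networkx.DiGraph call."""

    def __init__(self, discover_fn):
        # discover_fn: callable mapping a dataset to a networkx DiGraph,
        # e.g. an adapter around a cdt or causal-learn algorithm.
        self._discover_fn = discover_fn

    def fit(self, data) -> nx.DiGraph:
        graph = self._discover_fn(data)
        if not isinstance(graph, nx.DiGraph):
            raise TypeError("discovery backend must return a DiGraph")
        return graph

# A dummy backend standing in for a real discovery algorithm.
dummy_backend = lambda data: nx.DiGraph([("X", "Y")])
wrapper = CausalDiscoveryWrapper(dummy_backend)
print(list(wrapper.fit(None).edges()))  # [('X', 'Y')]
```

Standardizing on one output type is also what makes it easy to compare the graphs that different methods produce, which is where the disagreement mentioned above becomes visible.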

@py-why py-why locked and limited conversation to collaborators Sep 7, 2021

