
This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →


Implementation of Causal Discovery? #98

Closed
dcompgriff opened this issue Jan 13, 2020 · 14 comments
Labels
discussion Discussion about causal inference and DoWhy's roadmap. enhancement New feature or request

Comments

@dcompgriff

I'm curious if anyone is interested in folding causal discovery algorithms into the dowhy package? I currently use the 'Causal Discovery Toolkit' (cdt) along with my own code for performing causal discovery. I think that for sufficiently complex problem domains, causal discovery is a necessary first half of causal analysis.

@amit-sharma
Member

Thanks for the pointer, @dcompgriff. The Causal Discovery Toolbox (cdt) looks quite cool. I would definitely like to see causal discovery integrated with DoWhy.

However, @emrekiciman and I have been discussing how exactly to integrate it with DoWhy. One option is to use discovery algorithms upfront in the modeling stage. This has the benefit of helping people work with complex datasets, as you say. But many discovery algorithms do not handle unobserved confounders well, so any obtained graph may be susceptible to bias due to unobserved confounding. So we'll need some way of conveying to users the exact assumptions under which the causal model is generated.

Another option is to let the user specify a graph in the model stage, but then use causal discovery algorithms to detect any obvious problems with the user's graph. Of course, to override the user's graph, we would probably use only the edges about which the causal discovery algorithm is most certain. This may need additional work (to identify which edges are more robust in the learnt causal graph), but it may be a nice way to combine the user's domain knowledge with the power of causal discovery algorithms. It may also convey to the user that causal discovery algorithms are better thought of as algorithmic suggestions rather than the true graph. The downside, of course, is that the process (and the API) for doing this will look complicated. More generally, there's an opportunity to frame some of the causal discovery work as a refutation of the user's model.
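A minimal sketch of such a refutation-style check, assuming the user's graph and the discovered graph are both networkx DiGraphs (the function and its conflict criterion are illustrative, not an existing DoWhy API):

```python
import networkx as nx

def conflicting_edges(user_graph: nx.DiGraph, discovered: nx.DiGraph):
    """Return edges in the user's graph that the discovery algorithm
    oriented in the opposite direction."""
    conflicts = []
    for u, v in user_graph.edges():
        # A conflict: discovery found v -> u but not u -> v.
        if discovered.has_edge(v, u) and not discovered.has_edge(u, v):
            conflicts.append((u, v))
    return conflicts

user = nx.DiGraph([("Z", "X"), ("X", "Y")])
learned = nx.DiGraph([("X", "Z"), ("X", "Y")])  # discovery reversed Z -> X
print(conflicting_edges(user, learned))  # [('Z', 'X')]
```

In a fuller version, each flagged edge could carry the discovery algorithm's confidence, so only high-certainty disagreements are surfaced to the user.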

What do you think about these two alternatives?
As a library, it might make sense for DoWhy to provide both options to the user, but it will be good to discuss how we would like the default experience to be.

@amit-sharma amit-sharma added discussion Discussion about causal inference and DoWhy's roadmap. enhancement New feature or request labels Jan 17, 2020
@emrekiciman
Member

emrekiciman commented Jan 20, 2020 via email

@dcompgriff
Author

  1. When to integrate causal discovery?
    Causal discovery can be done entirely before construction of the 'CausalModel' object. For example, today I use causal discovery algorithms to generate a networkx graph, and then feed this graph structure into the CausalModel, because I can convert the networkx graph into GML format. Truthfully, this could probably just be its own sub-module of dowhy, one that doesn't even have to change the existing API, because it doesn't touch anything in the stages of analysis after the graph is defined.
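A sketch of that handoff, with a hand-built networkx DiGraph standing in for an actual discovery run (in practice cdt's graph learners return such a DiGraph); the CausalModel call is shown as a comment since it needs a dataset:

```python
import networkx as nx

# In practice the graph would come from a discovery run, e.g.:
#   from cdt.causality.graph import PC
#   graph = PC().predict(df)   # returns a networkx DiGraph
graph = nx.DiGraph([("Z", "X"), ("X", "Y")])  # stand-in for a discovered graph

# Serialize to GML so it can be handed to the modeling stage.
gml = " ".join(nx.generate_gml(graph))

# model = dowhy.CausalModel(data=df, treatment="X", outcome="Y", graph=gml)
print(gml)
```

Because only a GML string crosses the boundary, the discovery step stays decoupled from identification and estimation, exactly as described above.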

  2. Discovery limitations.
    You bring up a great point about limitations of causal discovery, and these should for sure be outlined as either a warning or just in the documentation.

  3. Checking/refuting provided models?
    I think this may be interesting to include as an optional flag during the modeling stage. When the CausalModel object is first created, there could simply be an optional flag for whether to validate the model's graph using causal discovery algorithms.

  4. Ambiguous edges?
    The way I deal with ambiguous edges is to manually examine the graph output from causal discovery, and then attempt to orient the edges I can using domain knowledge. Not exactly automated, but then again, classical causal inference already has its own assumptions on the entire graph, so I feel this is OK. However, I think I've seen some packages that will output causal estimates for all graphs (even with ambiguous edges) by enumerating the possible orientations of those edges and then applying causal estimation to each resulting graph. I'm less of a fan of this for more than 2 ambiguous edges, however, and don't think it needs to be incorporated regardless, because this can already be done with the current package by having the user do the enumeration.
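That enumeration can be sketched as follows, assuming the ambiguous edges are given as unordered pairs and orientations that create cycles are discarded (all names here are illustrative):

```python
import itertools
import networkx as nx

def enumerate_orientations(directed_edges, ambiguous_edges):
    """Yield every DAG obtained by orienting each ambiguous edge
    one way or the other, skipping orientations that create cycles."""
    for flips in itertools.product([False, True], repeat=len(ambiguous_edges)):
        g = nx.DiGraph(directed_edges)
        g.add_edges_from(
            (v, u) if flip else (u, v)
            for (u, v), flip in zip(ambiguous_edges, flips)
        )
        if nx.is_directed_acyclic_graph(g):
            yield g

# One known edge X -> Y and one ambiguous edge Y - Z.
dags = list(enumerate_orientations([("X", "Y")], [("Y", "Z")]))
print(len(dags))  # 2: both orientations of Y - Z are acyclic here
```

Note the candidate count grows as 2^k in the number of ambiguous edges k, which matches the concern above about more than 2 of them.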

@emrekiciman
Member

emrekiciman commented Jan 24, 2020 via email

@dcompgriff
Author

dcompgriff commented Jan 25, 2020

Speaking to the handling of ambiguous edges:
I think that if the current code for performing identification/estimation/sensitivity analysis requires a DAG, then adding an error check (unless you have it already) for when a graph with bi-directional edges is passed is at least one way to deal with the issue of ambiguous edges: it forces users either to set the direction of these edges themselves, or to compute causal estimates under both directions.
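Such an error check might look like this, assuming an ambiguous edge arrives as a pair of opposing directed edges in a networkx DiGraph (the function is illustrative, not DoWhy's actual validation):

```python
import networkx as nx

def assert_dag(graph: nx.DiGraph):
    """Reject graphs with ambiguous (bi-directed) edges or cycles
    before identification/estimation is attempted."""
    bidirected = [
        (u, v) for u, v in graph.edges()
        if graph.has_edge(v, u) and u < v  # report each pair once
    ]
    if bidirected:
        raise ValueError(f"Ambiguous bi-directed edges, please orient: {bidirected}")
    if not nx.is_directed_acyclic_graph(graph):
        raise ValueError("Graph contains a cycle; a DAG is required.")

g = nx.DiGraph([("X", "Y"), ("Y", "X")])  # ambiguous edge X <-> Y
try:
    assert_dag(g)
except ValueError as e:
    print(e)
```

Raising early with the offending edge list gives the user exactly the "orient it yourself or enumerate both directions" choice described above.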

As for how to force users to evaluate graphs constructed from causal discovery... there's only so much that can be done from an API perspective, I think. Outputting a warning would certainly be good, but some of it may just come down to more documentation. While causal discovery and causal estimation have some nice theoretical foundations, I've found that I've needed to be closely involved in validating the graphs output by causal discovery. From everyone I've talked to using this kind of analysis in industry, I think it's still best practice to sit down and visually validate proposed graphs. Causal discovery is useful, but not perfect. It can provide insight into unknown causal directions in some cases, but in others it's nonsensical. For example, I've had discovery algorithms tell me that 'number of products purchased' was causal of 'total users' for a customer in one of my projects. I think the best thing to do is to output warnings about the potential issues of discovered graphs, and provide good tutorial and API documentation discussing them. But either way, I still feel discovery algorithms are valuable to have. I'm personally wary of fully specifying a causal graph structure myself without at least trying FCI (Fast Causal Inference), GES (Greedy Equivalence Search), and other algorithms.
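One lightweight aid to that manual review is to run several algorithms and flag low-agreement edges for closer inspection; a sketch (the helper and the example graphs are illustrative):

```python
import networkx as nx

def edge_agreement(graphs):
    """For each directed edge, the fraction of learned graphs that
    contain it -- a quick screen before visual validation."""
    counts = {}
    for g in graphs:
        for e in g.edges():
            counts[e] = counts.get(e, 0) + 1
    return {e: c / len(graphs) for e, c in counts.items()}

# Stand-ins for the outputs of two discovery algorithms (e.g. FCI and GES).
g1 = nx.DiGraph([("X", "Y"), ("Z", "X")])
g2 = nx.DiGraph([("X", "Y"), ("X", "Z")])  # disagrees on the Z - X direction
print(edge_agreement([g1, g2]))
```

Edges with agreement well below 1.0 are natural candidates for the domain-knowledge pass described above.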

@emrekiciman
Member

emrekiciman commented Feb 1, 2020 via email

@dcompgriff
Author

Sure, I can work on making this contribution. I've been meaning to implement some of these algorithms anyway, because I have some challenges with the existing ones and the constraints they allow me to specify before performing discovery.

@emrekiciman
Member

emrekiciman commented Feb 5, 2020 via email

@nsalas24

Hey @dcompgriff,

You might find this repo useful as well: https://github.com/quantumblacklabs/causalnex

They implement the NOTEARS algorithm: https://arxiv.org/abs/1803.01422

@amit-sharma
Member

Thanks for sharing the link to causalnex, @nsalas24. That looks like an excellent library for structure learning.

@dcompgriff
Author

Awesome, thanks @nsalas24. I've been waiting for the QuantumBlack folks to come out with their causal inference package. I'll definitely take a look at this.

@yangliu2

I don't have anything to add, but yes, please add the causal discovery part to the package so people can use both parts in a unified framework. This is nice, by the way.

@BoltzmannBrain

FWIW my team has found problems with the aforementioned CausalNex NOTEARS for causal discovery, summarized well by others here: https://arxiv.org/abs/2104.05441

If there's initiative for adding causal discovery to dowhy @amit-sharma, happy to help in some capacity.

@amit-sharma
Member

Yeah, I'd seen that paper too, and realized that NOTEARS-like continuous optimizers are not ready yet for causal discovery.

Thanks for restarting this thread, @BoltzmannBrain. We just added an experimental implementation of causal discovery in DoWhy. It leans on existing implementations of standard algorithms, and simply provides an API wrapper to standardize them and allow multiple methods. Here's a notebook: https://github.com/microsoft/dowhy/blob/master/docs/source/example_notebooks/dowhy_causal_discovery_example.ipynb

As you can see, this is very basic (and the different methods still do not agree). Would you like to try it out and see how we can extend it?
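The wrapper pattern can be sketched as follows; the class and the backend function are hypothetical stand-ins to illustrate the idea of standardizing multiple discovery methods behind one interface, not DoWhy's actual experimental API:

```python
import networkx as nx

class CausalDiscoveryWrapper:
    """Hypothetical wrapper: each discovery backend exposes a different
    interface, so the wrapper standardizes all of them to a single
    fit(data) -> networkx.DiGraph call."""

    def __init__(self, discover_fn):
        # discover_fn: callable mapping a dataset to a networkx DiGraph,
        # e.g. an adapter around a cdt or causal-learn algorithm.
        self._discover_fn = discover_fn

    def fit(self, data) -> nx.DiGraph:
        graph = self._discover_fn(data)
        if not isinstance(graph, nx.DiGraph):
            raise TypeError("discovery backend must return a DiGraph")
        return graph

# A dummy backend standing in for a real discovery algorithm.
dummy_backend = lambda data: nx.DiGraph([("X", "Y")])
wrapper = CausalDiscoveryWrapper(dummy_backend)
print(list(wrapper.fit(None).edges()))  # [('X', 'Y')]
```

Standardizing on one output type is also what makes it easy to compare the graphs that different methods produce, which is where the disagreement mentioned above becomes visible.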

@py-why py-why locked and limited conversation to collaborators Sep 7, 2021

