Add ability to generate causal graphs #68

Open
stevenlujpl opened this issue Oct 8, 2021 · 9 comments

@stevenlujpl
Collaborator

No description provided.

@stevenlujpl stevenlujpl self-assigned this Oct 8, 2021
stevenlujpl added a commit that referenced this issue Oct 8, 2021
@stevenlujpl
Collaborator Author

Hi @hannah-rae, @urebbapr , @wkiri , @emhuff ,

I've checked in the initial implementation of causal graphs to the causal-graph branch.

Example outputs of causal graphs

The example outputs of causal graphs generated using sample_data/earth_fieldsamples/points_to_fit.csv (data_to_fit) and sample_data/earth_fieldsamples/kenya_points_to_predict.csv (data_to_score) are shown below. Please note that I filtered out 981 data points that contain missing values from the sample_data/earth_fieldsamples/points_to_fit.csv file.

  1. Cluster 0 causal graph
    causal_graph_cluster_0

  2. Cluster 1 causal graph
    causal_graph_cluster_1

  3. Cluster 2 causal graph
    causal_graph_cluster_2

  4. Cluster 3 causal graph
    causal_graph_cluster_3

  5. Cluster 4 causal graph
    causal_graph_cluster_4

  6. SOM clustering results
    SOM-demud.csv

Implementation summary

Causal graphs are currently implemented together with the kmeans or SOM clustering algorithm in the Results Organization module. This is how causal graphs are generated in the DES codebase, and I decided to do the same for the initial implementation in DORA. I don't think clustering algorithms are strictly necessary to generate causal graphs, though. It seems to me that we could generate causal graphs for individual data points instead of for groups of data points. If generating causal graphs for individual data points is desired, I can add this ability to DORA. Please let me know what you think.

There is one issue that I don't know how to resolve yet. Causal graphs are generated using classes/functions from the fges-py GitHub repository, but this repository isn't installable (the authors don't provide a setup.py script). This isn't a big problem for using causal graphs on UMD/JPL machines: we can manually git clone the repository and do something like sys.path.append("/PATH/TO/fges-py/") to import the classes/functions we need. However, it will become a problem when we publish the DORA codebase to PyPI as a pip-installable package. I will need to think more about how to resolve this. Please let me know if you have any suggestions.
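As a minimal sketch of the sys.path.append workaround described above, the hard-coded path could be replaced by a lookup that each machine configures for itself. The environment variable name FGES_PY_PATH and both helper functions are hypothetical and are not part of DORA or fges-py; the three module names are the ones imported elsewhere in this thread.

```python
import importlib.util
import os
import sys


def add_fges_to_path(env_var="FGES_PY_PATH"):
    """Append a local fges-py clone to sys.path.

    env_var is a hypothetical variable name chosen for this sketch.
    Returns the absolute path that was used, or None if the variable
    is unset or does not point at a directory.
    """
    repo = os.environ.get(env_var)
    if not repo or not os.path.isdir(repo):
        return None
    repo = os.path.abspath(repo)
    if repo not in sys.path:
        sys.path.append(repo)
    return repo


def fges_available():
    """Return True if the fges-py modules used for causal graphs can be located."""
    modules = ("SEMScore", "fges", "knowledge")
    return all(importlib.util.find_spec(name) is not None for name in modules)
```

With something like this, the causal graph option could be skipped with a clear message when fges_available() returns False, rather than failing at import time.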

Use causal graphs

For now, causal graphs must be generated together with the kmeans or SOM clustering algorithm. Please see the following example configs for the Results Organization module:

  1. Use causal graphs with kmeans clustering algorithm
results: {
    kmeans: {
        n_clusters: 5,
        causal_graph: True
    }
}
  2. Use causal graphs with SOM clustering algorithm
results: {
    som: {
        n_clusters: 5,
        causal_graph: True
    }
}

One causal graph will be generated per cluster, and the causal graphs will be saved in the directory defined by the out_dir option in the config file.
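To illustrate what "one causal graph per cluster" could look like on the data side, here is a rough sketch that builds one input table per cluster by appending a 0/1 membership column. This is an assumption about the general approach, not a description of the DORA code. The function names are hypothetical, and the ".png" extension is a guess; only the causal_graph_cluster_N name stem comes from the example outputs above.

```python
import os


def rows_with_cluster_indicator(rows, labels, cluster_id):
    """Append a 0/1 column marking membership in cluster_id to every row."""
    return [
        list(row) + [1 if label == cluster_id else 0]
        for row, label in zip(rows, labels)
    ]


def per_cluster_inputs(rows, labels):
    """Map each cluster id to the table that would feed that cluster's graph."""
    return {
        cid: rows_with_cluster_indicator(rows, labels, cid)
        for cid in sorted(set(labels))
    }


def graph_filename(out_dir, cluster_id):
    """Build an output path; the .png extension is an assumption."""
    return os.path.join(out_dir, f"causal_graph_cluster_{cluster_id}.png")
```

Each table would then be passed to the causal discovery step, and the resulting graph saved under the path returned by graph_filename.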

@stevenlujpl
Collaborator Author

Please note that I am aware of the build failures (code formatting issues, please see the screenshot below) caused by the implementation of the causal graph. I can't fix these code formatting issues because I have to use sys.path.append('/PATH/TO/fges-py') so that I can import the classes/functions needed for causal graphs. I will come up with something to replace sys.path.append('/PATH/TO/fges-py') and fix the code formatting issues.

Screen Shot 2021-10-08 at 3 33 05 PM

@stevenlujpl
Collaborator Author

stevenlujpl commented Oct 8, 2021

Below is a temporary solution to install DORA with causal graphs (for @hannah-rae to install it on UMD machine).

  1. Clone the fges-py GitHub repository (https://github.com/eberharf/fges-py)
git clone https://github.com/eberharf/fges-py.git
  2. Pull the latest updates from the causal-graph branch of the DORA repository
git pull origin causal-graph
  3. Replace the path in sys.path.append() with the path to the fges-py repository on the UMD machine.

import sys

# Point this at the local fges-py clone
sys.path.append("/Users/youlu/Desktop/dora/work/causal_graph/fges-py")
import SEMScore
import fges
import knowledge

  4. Go to the root directory of the DORA repository, and run pip install . (please note the . at the end).

@stevenlujpl
Collaborator Author

I changed the graph layout to be circular. With the circular layout, at least we can see what nodes are connected. Please take a look at the following examples, and let me know what you think. Thanks.
causal_graph_cluster_0
causal_graph_cluster_1
causal_graph_cluster_2
causal_graph_cluster_3
causal_graph_cluster_4

stevenlujpl added a commit that referenced this issue Oct 15, 2021
@wkiri
Collaborator

wkiri commented Oct 18, 2021

@stevenlujpl I think these look great.

If you have time for tiny updates, I suggest
(1) highlighting (e.g. in red) any lines that connect to the "cluster" node (since they are of most immediate interest and I think the others are constant for all clusters),
(2) labeling "cluster" as "cluster X" to show the cluster index, and
(3) using a light color to fill the nodes (instead of dark blue) so that the black text on top is easier to read.
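The three suggestions above, together with the circular layout, can be sketched without any plotting library by computing the styling inputs that a drawing call would consume. This is only an illustration under stated assumptions: the function names, the "lightblue" fill, and the feature names in the usage example are hypothetical, and the actual DORA plotting code may be organized differently.

```python
import math


def circular_positions(nodes, radius=1.0):
    """Place nodes evenly on a circle, starting at angle 0."""
    n = len(nodes)
    return {
        node: (
            radius * math.cos(2 * math.pi * i / n),
            radius * math.sin(2 * math.pi * i / n),
        )
        for i, node in enumerate(nodes)
    }


def style_graph(nodes, edges, cluster_index, cluster_node="cluster"):
    """Compute labels and colors for a causal graph drawing.

    edges is a list of (source, target) tuples. Edges touching the
    cluster node are colored red, and the cluster node is relabeled
    with its index.
    """
    labels = {
        node: f"{cluster_node} {cluster_index}" if node == cluster_node else node
        for node in nodes
    }
    edge_colors = ["red" if cluster_node in edge else "black" for edge in edges]
    return {
        "positions": circular_positions(nodes),
        "labels": labels,
        "edge_colors": edge_colors,
        "node_fill": "lightblue",  # light fill so black text stays readable
    }
```

For example, style_graph(["cluster", "ndvi", "rainfall"], [("ndvi", "cluster"), ("rainfall", "ndvi")], 4) labels the cluster node "cluster 4" and marks only the first edge red.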

@stevenlujpl
Collaborator Author

@wkiri Thanks for the comments. I've incorporated them into the code. I also added the sparsity parameter to the config file and seeded the SOM clustering algorithm. Please see the new graph below (please note that the causal relations differ from the examples shown in the post above because a different seed was used).
causal_graph_cluster_4

@wkiri
Collaborator

wkiri commented Oct 18, 2021

@stevenlujpl The updated visualization looks fantastic!

@hannah-rae
Contributor

@stevenlujpl Is this ready to be closed now?

@stevenlujpl
Collaborator Author

@hannah-rae, not yet. Currently, all the updates for causal graphs are in the causal-graph branch. I am waiting to hear from Eric on whether the Caltech professor who developed fges-py will create a setup.py script to package the repository. Below are the items I need to complete before we can close this issue:

  • Depending on the answer from the Caltech professor:
    • If yes, I will need to update our own setup.py script to install the fges-py repository.
    • If no, I will need to fork the fges-py repository, create a setup.py for the fork, and then update our own setup.py to install the forked fges-py repository.
  • Resolve the flake8 format issues (see the post above)
  • Merge the code to the master branch.
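For the "update our own setup.py" step in the list above, one possible form is a PEP 508 direct reference in install_requires, sketched below. The distribution name "fges-py" is an assumption (the repository has no setup.py yet, so no name is defined), the helper function is hypothetical, and the URL is the upstream repository mentioned earlier in this thread; a fork would substitute its own URL.

```python
def direct_reference(name, repo_url, ref=None):
    """Build a PEP 508 direct reference to a git repository.

    ref is an optional branch, tag, or commit to pin to.
    """
    suffix = f"@{ref}" if ref else ""
    return f"{name} @ git+{repo_url}{suffix}"


# Assumed distribution name; the upstream URL is taken from this thread.
INSTALL_REQUIRES = [
    direct_reference("fges-py", "https://github.com/eberharf/fges-py.git"),
]
```

As far as I know, PyPI rejects uploads whose metadata contains direct URL requirements, so this form would likely only suit installs from source. Publishing DORA on PyPI would probably still require fges-py (or the fork) to be published there as well.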
