This repository provides Python code to reproduce experiments from the article *Gravity-Inspired Graph Autoencoders for Directed Link Prediction*, published in the proceedings of the 28th ACM International Conference on Information and Knowledge Management (CIKM 2019).
We release TensorFlow implementations of the following four directed graph embedding models from the paper:
- Gravity-Inspired Graph Autoencoders
- Gravity-Inspired Graph Variational Autoencoders
- Source-Target Graph Autoencoders
- Source-Target Graph Variational Autoencoders
together with the standard Graph Autoencoder (AE) and Graph Variational Autoencoder (VAE) models from Kipf and Welling (2016).
We evaluate all six models on the three directed link prediction tasks introduced in section 4.1 of our paper:
- General Directed Link Prediction
- Biased Negative Samples Directed Link Prediction
- Bidirectionality Prediction
Our code builds upon Thomas Kipf's original TensorFlow implementation of the standard Graph AE/VAE.
Note (December 2023): Kudos to Claudio Moroni for developing a PyTorch implementation of these models, publicly available here.
Installation:

```
python setup.py install
```

Requirements: `tensorflow` (1.x), `networkx`, `numpy`, `scikit-learn`, `scipy`
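For instance, assuming a pip-based environment (the version bounds below are an illustrative way to pin a TensorFlow 1.x release; the repository itself does not pin versions):

```
pip install "tensorflow>=1.15,<2.0" networkx numpy scikit-learn scipy
```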
To train and evaluate models, run for example:

```
cd gravity_gae
python train.py --model=gcn_vae --dataset=cora --task=task_1
python train.py --model=gravity_gcn_vae --dataset=cora --task=task_1
```
The above commands will train a standard Graph VAE (second command) and a Gravity-Inspired Graph VAE (third command) on the Cora dataset, and will evaluate the resulting node embeddings on Task 1: General Directed Link Prediction, with all parameters set to their default values.
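For intuition on what the gravity-inspired models score at prediction time: the decoder predicts a directed edge from node i to node j with probability sigmoid(m_j - lambda * log ||z_i - z_j||^2), where the last embedding dimension carries the "mass" m_j. Below is a minimal NumPy sketch of this scoring rule; it is an illustrative standalone re-implementation with hypothetical names, not the repository's TensorFlow code:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gravity_decoder(embeddings, lamb=1.0, epsilon=0.01):
    # embeddings has shape (n_nodes, dimension + 1): the last column is
    # the "mass" term, the remaining columns are the node positions z.
    z, mass = embeddings[:, :-1], embeddings[:, -1]
    # Pairwise squared L2 distances; epsilon keeps the log finite,
    # mirroring the epsilon parameter described in the table below.
    diff = z[:, None, :] - z[None, :, :]
    sq_dist = (diff ** 2).sum(axis=-1) + epsilon
    # Entry (i, j): probability of the directed edge i -> j,
    # i.e. sigmoid(mass_j - lamb * log ||z_i - z_j||^2).
    return sigmoid(mass[None, :] - lamb * np.log(sq_dist))
```

The Source-Target models achieve asymmetry differently: each embedding is split into a source half and a target half, and an edge i -> j is scored as sigmoid(source_i · target_j), which is why the `dimension` parameter must be even for these models.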
Parameter | Type | Description | Default Value |
---|---|---|---|
`model` | string | Name of the model, among:<br>- `gcn_ae`: Graph AE from Kipf and Welling (2016), with 2-layer GCN encoder and inner product decoder<br>- `gcn_vae`: Graph VAE from Kipf and Welling (2016), with Gaussian distributions, 2-layer GCN encoders and inner product decoder<br>- `source_target_gcn_ae`: Source-Target Graph AE, as introduced in section 2.6 of the paper, with 2-layer GCN encoder and asymmetric inner product decoder<br>- `source_target_gcn_vae`: Source-Target Graph VAE, as introduced in section 2.6, with Gaussian distributions, 2-layer GCN encoders and asymmetric inner product decoder<br>- `gravity_gcn_ae`: Gravity-Inspired Graph AE, as introduced in section 3.3 of the paper, with 2-layer GCN encoder and gravity-inspired asymmetric decoder<br>- `gravity_gcn_vae`: Gravity-Inspired Graph VAE, as introduced in section 3.4 of the paper, with Gaussian distributions, 2-layer GCN encoders and gravity-inspired decoder | `gcn_ae` |
`dataset` | string | Name of the dataset, among:<br>- `cora`: scientific publications citation network, from LINQS<br>- `citeseer`: scientific publications citation network, from LINQS<br>- `google`: hyperlink network of web pages, from KONECT<br>Note: you can specify any additional graph dataset, in edgelist format, by editing `input_data.py` (see the sketch after this table) | `cora` |
`task` | string | Name of the link prediction evaluation task, among:<br>- `task_1`: General Directed Link Prediction<br>- `task_2`: Biased Negative Samples Directed Link Prediction<br>- `task_3`: Bidirectionality Prediction | `task_1` |
`dropout` | float | Dropout rate | `0.` |
`epochs` | int | Number of epochs in model training | `200` |
`features` | boolean | Whether to include node features in the GCN encoder | `False` |
`lamb` | float | "Lambda" parameter from the Gravity AE/VAE models, as introduced in section 3.5 of the paper, balancing the mass and proximity terms | `1.` |
`learning_rate` | float | Initial learning rate (with Adam optimizer) | `0.1` |
`hidden` | int | Number of units in the GCN encoder hidden layer | `64` |
`dimension` | int | Dimension of the GCN output layer. It is:<br>- equal to the embedding dimension for standard AE/VAE and Source-Target AE/VAE models<br>- equal to (embedding dimension - 1) for gravity-inspired AE/VAE models, as the last dimension captures the "mass" parameter<br>Dimension must be even for Source-Target AE/VAE models | `32` |
`normalize` | boolean | For gravity models: whether to normalize embedding vectors | `False` |
`epsilon` | float | For gravity models: value added to L2 distance computations, for numerical stability | `0.01` |
`nb_run` | integer | Number of model runs (each run includes training and testing) | `1` |
`prop_val` | float | Proportion of edges in the validation set, in % (for Task 1) | `5.` |
`prop_test` | float | Proportion of edges in the test set, in % (for Tasks 1 and 2) | `10.` |
`validation` | boolean | Whether to report validation results at each epoch (for Task 1) | `False` |
`verbose` | boolean | Whether to print detailed training comments | `True` |
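As noted in the `dataset` row, additional graphs can be plugged in by editing `input_data.py`. Here is a minimal sketch of loading a directed edgelist into a sparse adjacency matrix with networkx; the helper name and the exact integration point in `input_data.py` are assumptions, not part of the repository:

```python
import networkx as nx

def load_custom_edgelist(path):
    # Hypothetical helper: read a directed edgelist file,
    # one "source target" pair per line, as a directed graph.
    graph = nx.read_edgelist(path, create_using=nx.DiGraph(), nodetype=int)
    # Return the sparse (scipy) adjacency matrix of the directed graph.
    return nx.adjacency_matrix(graph)
```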
Cora - Task 1

```
python train.py --dataset=cora --model=gcn_vae --task=task_1 --epochs=200 --learning_rate=0.1 --hidden=64 --dimension=32 --nb_run=5
python train.py --dataset=cora --model=gcn_ae --task=task_1 --epochs=200 --learning_rate=0.1 --hidden=64 --dimension=32 --nb_run=5
python train.py --dataset=cora --model=source_target_gcn_vae --task=task_1 --epochs=200 --learning_rate=0.1 --hidden=64 --dimension=32 --nb_run=5
python train.py --dataset=cora --model=source_target_gcn_ae --task=task_1 --epochs=200 --learning_rate=0.1 --hidden=64 --dimension=32 --nb_run=5
python train.py --dataset=cora --model=gravity_gcn_vae --task=task_1 --epochs=200 --learning_rate=0.1 --hidden=64 --dimension=33 --lamb=1.0 --nb_run=5
python train.py --dataset=cora --model=gravity_gcn_ae --task=task_1 --epochs=200 --learning_rate=0.1 --hidden=64 --dimension=33 --lamb=1.0 --nb_run=5
```

Cora - Task 2

```
python train.py --dataset=cora --model=gcn_vae --task=task_2 --epochs=200 --learning_rate=0.1 --hidden=64 --dimension=32 --nb_run=5
python train.py --dataset=cora --model=gcn_ae --task=task_2 --epochs=200 --learning_rate=0.1 --hidden=64 --dimension=32 --nb_run=5
python train.py --dataset=cora --model=source_target_gcn_vae --task=task_2 --epochs=200 --learning_rate=0.1 --hidden=64 --dimension=32 --nb_run=5
python train.py --dataset=cora --model=source_target_gcn_ae --task=task_2 --epochs=200 --learning_rate=0.1 --hidden=64 --dimension=64 --nb_run=5
python train.py --dataset=cora --model=gravity_gcn_vae --task=task_2 --epochs=200 --learning_rate=0.1 --hidden=64 --dimension=33 --lamb=0.05 --nb_run=5
python train.py --dataset=cora --model=gravity_gcn_ae --task=task_2 --epochs=200 --learning_rate=0.1 --hidden=64 --dimension=33 --lamb=0.05 --normalize=True --nb_run=5
```

Cora - Task 3

```
python train.py --dataset=cora --model=gcn_vae --task=task_3 --epochs=200 --learning_rate=0.1 --hidden=64 --dimension=32 --nb_run=5
python train.py --dataset=cora --model=gcn_ae --task=task_3 --epochs=200 --learning_rate=0.1 --hidden=64 --dimension=32 --nb_run=5
python train.py --dataset=cora --model=source_target_gcn_vae --task=task_3 --epochs=200 --learning_rate=0.1 --hidden=64 --dimension=32 --nb_run=5
python train.py --dataset=cora --model=source_target_gcn_ae --task=task_3 --epochs=200 --learning_rate=0.1 --hidden=64 --dimension=32 --nb_run=5
python train.py --dataset=cora --model=gravity_gcn_vae --task=task_3 --epochs=200 --learning_rate=0.1 --hidden=64 --dimension=33 --lamb=1.0 --nb_run=5
python train.py --dataset=cora --model=gravity_gcn_ae --task=task_3 --epochs=200 --learning_rate=0.1 --hidden=64 --dimension=33 --lamb=1.0 --nb_run=5
```

Citeseer - Task 1

```
python train.py --dataset=citeseer --model=gcn_vae --task=task_1 --epochs=200 --learning_rate=0.1 --hidden=64 --dimension=32 --nb_run=5
python train.py --dataset=citeseer --model=gcn_ae --task=task_1 --epochs=200 --learning_rate=0.1 --hidden=64 --dimension=32 --nb_run=5
python train.py --dataset=citeseer --model=source_target_gcn_vae --task=task_1 --epochs=200 --learning_rate=0.1 --hidden=64 --dimension=32 --nb_run=5
python train.py --dataset=citeseer --model=source_target_gcn_ae --task=task_1 --epochs=200 --learning_rate=0.1 --hidden=64 --dimension=32 --nb_run=5
python train.py --dataset=citeseer --model=gravity_gcn_vae --task=task_1 --epochs=200 --learning_rate=0.1 --hidden=64 --dimension=33 --lamb=1.0 --nb_run=5
python train.py --dataset=citeseer --model=gravity_gcn_ae --task=task_1 --epochs=200 --learning_rate=0.1 --hidden=64 --dimension=33 --lamb=1.0 --nb_run=5
```

Citeseer - Task 2

```
python train.py --dataset=citeseer --model=gcn_vae --task=task_2 --epochs=200 --learning_rate=0.1 --hidden=64 --dimension=32 --nb_run=5
python train.py --dataset=citeseer --model=gcn_ae --task=task_2 --epochs=200 --learning_rate=0.1 --hidden=64 --dimension=32 --nb_run=5
python train.py --dataset=citeseer --model=source_target_gcn_vae --task=task_2 --epochs=200 --learning_rate=0.1 --hidden=64 --dimension=32 --nb_run=5
python train.py --dataset=citeseer --model=source_target_gcn_ae --task=task_2 --epochs=200 --learning_rate=0.1 --hidden=64 --dimension=32 --nb_run=5
python train.py --dataset=citeseer --model=gravity_gcn_vae --task=task_2 --epochs=200 --learning_rate=0.1 --hidden=64 --dimension=33 --lamb=0.05 --nb_run=5
python train.py --dataset=citeseer --model=gravity_gcn_ae --task=task_2 --epochs=200 --learning_rate=0.1 --hidden=64 --dimension=33 --lamb=0.05 --normalize=True --nb_run=5
```

Citeseer - Task 3

```
python train.py --dataset=citeseer --model=gcn_vae --task=task_3 --epochs=200 --learning_rate=0.1 --hidden=64 --dimension=32 --nb_run=5
python train.py --dataset=citeseer --model=gcn_ae --task=task_3 --epochs=200 --learning_rate=0.1 --hidden=64 --dimension=32 --nb_run=5
python train.py --dataset=citeseer --model=source_target_gcn_vae --task=task_3 --epochs=200 --learning_rate=0.1 --hidden=64 --dimension=32 --nb_run=5
python train.py --dataset=citeseer --model=source_target_gcn_ae --task=task_3 --epochs=200 --learning_rate=0.1 --hidden=64 --dimension=32 --nb_run=5
python train.py --dataset=citeseer --model=gravity_gcn_vae --task=task_3 --epochs=200 --learning_rate=0.1 --hidden=64 --dimension=33 --lamb=1.0 --nb_run=5
python train.py --dataset=citeseer --model=gravity_gcn_ae --task=task_3 --epochs=200 --learning_rate=0.1 --hidden=64 --dimension=33 --lamb=1.0 --nb_run=5
```

Google - Task 1

```
python train.py --dataset=google --model=gcn_vae --task=task_1 --epochs=200 --learning_rate=0.2 --hidden=64 --dimension=32 --nb_run=5
python train.py --dataset=google --model=gcn_ae --task=task_1 --epochs=200 --learning_rate=0.2 --hidden=64 --dimension=32 --nb_run=5
python train.py --dataset=google --model=source_target_gcn_vae --task=task_1 --epochs=200 --learning_rate=0.2 --hidden=64 --dimension=32 --nb_run=5
python train.py --dataset=google --model=source_target_gcn_ae --task=task_1 --epochs=200 --learning_rate=0.2 --hidden=64 --dimension=32 --nb_run=5
python train.py --dataset=google --model=gravity_gcn_vae --task=task_1 --epochs=200 --learning_rate=0.2 --hidden=64 --dimension=33 --lamb=10.0 --nb_run=5
python train.py --dataset=google --model=gravity_gcn_ae --task=task_1 --epochs=200 --learning_rate=0.2 --hidden=64 --dimension=33 --lamb=10.0 --nb_run=5
```

Google - Task 2

```
python train.py --dataset=google --model=gcn_vae --task=task_2 --epochs=200 --learning_rate=0.2 --hidden=64 --dimension=32 --nb_run=5
python train.py --dataset=google --model=gcn_ae --task=task_2 --epochs=200 --learning_rate=0.2 --hidden=64 --dimension=32 --nb_run=5
python train.py --dataset=google --model=source_target_gcn_vae --task=task_2 --epochs=200 --learning_rate=0.2 --hidden=64 --dimension=32 --nb_run=5
python train.py --dataset=google --model=source_target_gcn_ae --task=task_2 --epochs=200 --learning_rate=0.2 --hidden=64 --dimension=32 --nb_run=5
python train.py --dataset=google --model=gravity_gcn_vae --task=task_2 --epochs=200 --learning_rate=0.2 --hidden=64 --dimension=33 --lamb=0.05 --nb_run=5
python train.py --dataset=google --model=gravity_gcn_ae --task=task_2 --epochs=200 --learning_rate=0.2 --hidden=64 --dimension=33 --lamb=0.05 --normalize=True --epsilon=1.0 --nb_run=5
```

Google - Task 3

```
python train.py --dataset=google --model=gcn_vae --task=task_3 --epochs=200 --learning_rate=0.2 --hidden=64 --dimension=32 --nb_run=5
python train.py --dataset=google --model=gcn_ae --task=task_3 --epochs=200 --learning_rate=0.2 --hidden=64 --dimension=32 --nb_run=5
python train.py --dataset=google --model=source_target_gcn_vae --task=task_3 --epochs=200 --learning_rate=0.2 --hidden=64 --dimension=32 --nb_run=5
python train.py --dataset=google --model=source_target_gcn_ae --task=task_3 --epochs=200 --learning_rate=0.2 --hidden=64 --dimension=32 --nb_run=5
python train.py --dataset=google --model=gravity_gcn_vae --task=task_3 --epochs=200 --learning_rate=0.2 --hidden=64 --dimension=33 --lamb=10.0 --nb_run=5
python train.py --dataset=google --model=gravity_gcn_ae --task=task_3 --epochs=200 --learning_rate=0.2 --hidden=64 --dimension=33 --lamb=10.0 --nb_run=5
```
Notes:
- Set `--nb_run=100` to report mean AUC and AP, along with standard errors, over 100 runs, as in the paper (see the sketch after this list)
- We recommend GPU usage for faster training
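A minimal sketch of the aggregation implied by this note, assuming the per-run test AUC and AP scores have been collected (the values below are placeholders, not paper results):

```python
import numpy as np

# Hypothetical per-run test scores, e.g. collected over --nb_run runs.
auc_scores = np.array([0.92, 0.91, 0.93])  # placeholders
ap_scores = np.array([0.93, 0.92, 0.94])   # placeholders

for name, scores in (("AUC", auc_scores), ("AP", ap_scores)):
    mean = scores.mean()
    # Standard error of the mean: sample standard deviation / sqrt(runs).
    std_err = scores.std(ddof=1) / np.sqrt(len(scores))
    print(f"{name}: {mean:.3f} +/- {std_err:.3f}")
```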
Please cite our paper if you use this code in your own work:
```
@inproceedings{salha2019gravity,
  title={Gravity-Inspired Graph Autoencoders for Directed Link Prediction},
  author={Salha, Guillaume and Limnios, Stratis and Hennequin, Romain and Tran, Viet Anh and Vazirgiannis, Michalis},
  booktitle={ACM International Conference on Information and Knowledge Management (CIKM)},
  year={2019}
}
```