Public code for the paper "A Graph Convolutional Encoder and Decoder Model for Rumor Detection", accepted at DSAA 2020.
- data
  After decompressing data.rar, you get three folders named Twitter15, Twitter16, and Weibo. Each folder contains two types of file: a feature file and a label file.
  The feature file is tab-delimited ('\t') and contains the following fields:
  - eid: root id
  - indexP: index of the parent node
  - indexC: index of the current node
  - max_degree: the total number of parent nodes in the tree
  - maxL: the maximum length of all the texts in the tree
  - Vec: list of index and count pairs
  In the label file, every root id corresponds to one label.
- Process
  - getTwittergraph.py: processes the feature file, records the parent-child relationships between nodes, stores the node feature matrix, and finally saves all the information as '.npy' files.
  - getWeibograph.py: performs the same operation as getTwittergraph.py, for the Weibo dataset.
  - rand5fold.py: processes the label file and generates 5-fold lists of training and validation sets.
  - process.py: defines a custom PyG graph dataset used to load the data in batches.
- tools
  - earlystopping.py: in our experiments, patience is set to 10, meaning that when the score does not improve for 10 iterations, training stops early and the model result is saved.
  - earlystopping2class.py: the same as earlystopping.py, but for the Weibo dataset.
  - evaluate.py: defines evaluation criteria such as accuracy and F1 score.
- model
  - GAE.py: our base model, using GAE as the decoder module
  - VGAE.py: our base model, using VGAE as the decoder module
  - only_gcn.py: comparison experiment
  - MVAE.py: comparison experiment
  - add_root_info.py: a trick to obtain a better representation of the data
  - base_BU.py: reverses the direction of data flow
  - bidirect.py: uses both directions of data flow
  - Model_Twitter.py: main script to run on the Twitter datasets
  - Model_Weibo.py: main script to run on the Weibo dataset
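GAE.py and VGAE.py attach a graph autoencoder as the decoder module. The exact decoder is not described in this README, so the sketch below assumes the standard GAE/VGAE formulation: an inner-product decoder that scores an edge between nodes i and j as sigmoid(z_i · z_j), the VGAE reparameterization z = mu + exp(logstd) * eps, and the usual KL term. It is written in plain Python for readability; all function names are hypothetical, and a real implementation would work on tensors (PyTorch Geometric ships `GAE`, `VGAE`, and `InnerProductDecoder` for this).

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def inner_product_decoder(z: Matrix) -> Matrix:
    """Edge probabilities A_hat[i][j] = sigmoid(z_i . z_j) from node embeddings z."""
    return [[sigmoid(sum(a * b for a, b in zip(zi, zj))) for zj in z] for zi in z]


def reparameterize(mu: Matrix, logstd: Matrix, rng: random.Random) -> Matrix:
    """VGAE sampling: z = mu + exp(logstd) * eps with eps ~ N(0, 1) elementwise."""
    return [
        [m + math.exp(s) * rng.gauss(0.0, 1.0) for m, s in zip(mu_row, ls_row)]
        for mu_row, ls_row in zip(mu, logstd)
    ]


def kl_loss(mu: Matrix, logstd: Matrix) -> float:
    """KL(q(z|x) || N(0, I)), summed over dimensions and averaged over nodes."""
    per_node = [
        -0.5 * sum(1 + 2 * s - m * m - math.exp(s) ** 2 for m, s in zip(mu_row, ls_row))
        for mu_row, ls_row in zip(mu, logstd)
    ]
    return sum(per_node) / len(per_node)


def recon_loss(a_hat: Matrix, adj: List[List[int]], eps: float = 1e-15) -> float:
    """Mean binary cross-entropy over all node pairs (simplified: no negative sampling)."""
    total, count = 0.0, 0
    for hat_row, adj_row in zip(a_hat, adj):
        for p, y in zip(hat_row, adj_row):
            total -= y * math.log(p + eps) + (1 - y) * math.log(1.0 - p + eps)
            count += 1
    return total / count


if __name__ == "__main__":
    rng = random.Random(0)
    mu = [[1.0, 0.0], [0.0, 1.0]]
    logstd = [[-20.0, -20.0], [-20.0, -20.0]]  # tiny variance, so z is almost mu
    z = reparameterize(mu, logstd, rng)
    a_hat = inner_product_decoder(z)
    print(round(a_hat[0][1], 3))  # orthogonal embeddings give roughly 0.5
```

Scoring every node pair keeps the sketch short; on real trees, sampling negative edges instead of using the full matrix is the usual way to keep the reconstruction loss tractable.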
All models are trained with the same set of hyperparameters: a batch size of 128, a hidden dimension of 64, 50 training epochs, and a learning rate of 5e-4. We randomly split each dataset, perform 5-fold cross-validation, and use accuracy and F1 score as evaluation criteria.
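The evaluation protocol above (random split, 5-fold cross-validation, accuracy and F1) can be sketched as follows. This is an illustration under assumptions, not the repository's code: folds are formed by one seeded shuffle of the root ids (rand5fold.py may split differently, for example per class), F1 is macro-averaged over classes because the text does not say which average is used, and the helper names are hypothetical.

```python
import random
from typing import Hashable, List, Sequence, Tuple


def five_fold_splits(ids: Sequence[Hashable], k: int = 5, seed: int = 0) -> List[Tuple[list, list]]:
    """Shuffle ids once, cut them into k folds, and return (train, valid) per fold."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)  # seeded so the folds are reproducible
    folds = [shuffled[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        splits.append((train, folds[i]))
    return splits


def accuracy(y_true: Sequence, y_pred: Sequence) -> float:
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: Sequence, y_pred: Sequence) -> float:
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * precision * recall / (precision + recall) if precision + recall else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    root_ids = [f"eid{i}" for i in range(12)]  # placeholder ids, not real data
    for fold, (train, valid) in enumerate(five_fold_splits(root_ids)):
        print(fold, len(train), len(valid))
    y_true, y_pred = [0, 1, 1, 0], [0, 1, 0, 0]
    print(accuracy(y_true, y_pred), round(macro_f1(y_true, y_pred), 4))  # 0.75 0.7333
```

Each root id lands in exactly one validation fold, so every tree is evaluated once across the five runs.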
After decompressing data.rar, run:

```shell
python getTwittergraph.py
```
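getTwittergraph.py turns the tab-delimited feature file into per-tree graph structure. A minimal sketch of the parsing step, under assumptions: the six fields appear in the order listed in the data description, Vec is written as space-separated `index:count` pairs, and the root row has a non-numeric parent field such as `None`. The helper names and the `dim` vocabulary size are hypothetical; the real script also builds the feature matrix and saves it as `.npy` files, which this pure-Python sketch leaves out.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

Row = Tuple[str, Optional[int], int, int, int, Dict[int, int]]


def parse_feature_line(line: str) -> Row:
    """Split one tab-delimited row into (eid, indexP, indexC, max_degree, maxL, Vec)."""
    eid, index_p, index_c, max_degree, max_l, vec = line.rstrip("\r\n").split("\t")
    # Assumption: the root row carries a non-numeric parent such as 'None'.
    parent = int(index_p) if index_p.isdigit() else None
    # Assumption: Vec is a space-separated list of 'index:count' pairs.
    counts: Dict[int, int] = {}
    for pair in vec.split():
        idx, cnt = pair.split(":")
        counts[int(idx)] = int(cnt)
    return eid, parent, int(index_c), int(max_degree), int(max_l), counts


def build_trees(lines: Iterable[str]) -> Dict[str, dict]:
    """Group rows by root id, collecting parent->child edges and sparse node features."""
    trees: Dict[str, dict] = defaultdict(lambda: {"edges": [], "features": {}})
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        eid, parent, child, _, _, counts = parse_feature_line(line)
        trees[eid]["features"][child] = counts
        if parent is not None:
            trees[eid]["edges"].append((parent, child))
    return dict(trees)


def to_dense(counts: Dict[int, int], dim: int) -> List[float]:
    """Expand sparse index:count pairs into a dense count vector of length `dim`."""
    row = [0.0] * dim
    for idx, cnt in counts.items():
        if 0 <= idx < dim:  # this sketch silently drops out-of-range indices
            row[idx] = float(cnt)
    return row


if __name__ == "__main__":
    sample = [
        "e1\tNone\t1\t2\t10\t3:1 7:2",  # made-up root row
        "e1\t1\t2\t0\t10\t5:1",  # made-up reply to node 1
    ]
    trees = build_trees(sample)
    print(trees["e1"]["edges"])  # [(1, 2)]
    print(to_dense(trees["e1"]["features"][1], 8))  # [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 2.0]
```

Keeping features sparse until the dense matrix is needed avoids materializing a full vocabulary-sized row for every node while the file is being read.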
Then run the main script with two arguments: the first is the dataset name, and the second is the model name (one of 'GCN', 'GAE', or 'VGAE'). For example:

```shell
python Model_Twitter.py Twitter15 VGAE
```
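Training launched this way is cut short by earlystopping.py, which uses a patience of 10. A minimal sketch of that rule, assuming a higher score is better and that a snapshot of the best model state should be kept; the `EarlyStopping` class and its `step` method are hypothetical names, and this version keeps the snapshot in memory, whereas the repository saves the model result.

```python
import copy
from typing import Any, Optional


class EarlyStopping:
    """Stop when the monitored score has not improved for `patience` consecutive checks."""

    def __init__(self, patience: int = 10) -> None:
        self.patience = patience
        self.best_score: Optional[float] = None
        self.best_state: Any = None
        self.counter = 0
        self.early_stop = False

    def step(self, score: float, state: Any) -> bool:
        """Record one evaluation and return True once training should stop."""
        if self.best_score is None or score > self.best_score:
            self.best_score = score
            self.best_state = copy.deepcopy(state)  # snapshot of the best model so far
            self.counter = 0
        else:
            self.counter += 1
            if self.counter >= self.patience:
                self.early_stop = True
        return self.early_stop


if __name__ == "__main__":
    stopper = EarlyStopping(patience=10)
    scores = [0.5, 0.6, 0.65] + [0.6] * 10  # synthetic validation scores
    for epoch, s in enumerate(scores):
        if stopper.step(s, {"epoch": epoch}):
            print(f"stopped at epoch {epoch}, best score {stopper.best_score}")
            break
    # prints: stopped at epoch 12, best score 0.65
```

Resetting the counter on every improvement means patience counts consecutive non-improving checks, not the total number of bad epochs.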
Here we show only part of the experimental results; more details can be found in the paper.
model_name \ acc. | ||
---|---|---|
baseline | 0.737 | 0.908 |
only GCN | 0.840 | 0.935 |
AE-GCN | 0.851 | 0.942 |
VAE-GCN | 0.856 | 0.944 |
Besides the main experiments, we also tried some tricks to improve the model; however, they did not yield better results.
model_name | result |
---|---|
only GCN | 0.8396 |
one-layer GCN | 0.8498 |
two-layers GCN | 0.8367 |
GAT | 0.7879 |
GCN add root | 0.7374 |
bidirect | 0.8294 |
GAE | 0.8498 |
Bottom-up direction GAE | 0.3535 |