This is the PyTorch implementation of the paper: HySPA: Hybrid Span Generation for Scalable Text-to-Graph Extraction. Liliang Ren, Chenkai Sun, Heng Ji, Julia Hockenmaier. ACL 2021 Findings [PDF] [Poster] [Slides]
Our implementation is based on Facebook's fairseq toolkit.
In this work, we propose a Hybrid Span Generator (HySPA) that invertibly maps the information graph to an alternating sequence of nodes and edge types, and directly generates such sequences via a hybrid span decoder that decodes both the spans and the types recurrently with linear time and space complexity. Extensive experiments on the ACE05 dataset show that our approach significantly outperforms the state of the art on the joint entity and relation extraction task.

For reproducibility, CUDA 10.2 is required. The required environment can be installed and activated with the following commands:
conda env create -f environment.yml
conda activate hyspa
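Because the setup depends on CUDA 10.2, it may help to confirm the installed toolkit version before building any extensions. The following is a minimal sketch and not part of this repository: the helper names parse_cuda_release and installed_cuda_release are hypothetical, and the code assumes that nvcc is on the PATH and prints its usual "release X.Y" line.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_cuda_release(nvcc_output: str) -> Optional[str]:
    """Return the 'X.Y' release string found in `nvcc --version` output, or None."""
    match = re.search(r"release\s+(\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def installed_cuda_release() -> Optional[str]:
    """Run `nvcc --version` if nvcc is available; return None when it is not."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(
        ["nvcc", "--version"], capture_output=True, text=True, check=False
    )
    return parse_cuda_release(result.stdout)


if __name__ == "__main__":
    release = installed_cuda_release()
    if release is None:
        print("nvcc not found; cannot determine the CUDA toolkit version")
    elif release != "10.2":
        print(f"found CUDA {release}; this README asks for 10.2")
    else:
        print("CUDA 10.2 detected")
```

The check only reports what it finds; it does not change the environment.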
The fairseq toolkit can then be installed with the following commands:
cd HySPA
pip install --editable ./
# on macOS:
# CFLAGS="-stdlib=libc++" pip install --editable ./
NVIDIA's apex library is also needed:
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" \
--global-option="--deprecated_fused_adam" --global-option="--xentropy" \
--global-option="--fast_multihead_attn" ./
First, follow the ACE05 data preprocessing pipeline of Dygie++, and then copy the JSON files of the preprocessed data from the processed-data directory (NOT the collated-data directory) under the Dygie++ repository to our directory HySPA/data-bin/ace05/. Then preprocess the data:
cd fairseq/data
python ace_dataset.py
cd ../..
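The copy step that precedes the preprocessing commands above can also be scripted. The sketch below is an illustration only and is not shipped with this repository: the function name copy_json_files is hypothetical, and it assumes the preprocessed files sit directly inside the source directory passed in and end in .json.

```python
import shutil
from pathlib import Path
from typing import List


def copy_json_files(src_dir: str, dst_dir: str) -> List[str]:
    """Copy every *.json file directly under src_dir into dst_dir.

    Returns the sorted names of the files that were copied.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"source directory not found: {src}")
    # Create the destination (for example HySPA/data-bin/ace05/) if it is missing.
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(src.glob("*.json")):
        shutil.copy2(path, dst / path.name)
        copied.append(path.name)
    return copied
```

The source argument should point at wherever the JSON files live inside the processed-data directory of the Dygie++ checkout, and the destination at HySPA/data-bin/ace05/. An empty return value suggests that the source path is wrong, for instance that it points at collated-data or at a parent directory.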
bash run_seqie_acehsd.sh
The trained model will be stored in the directory save/HSDtransformer_ace, and the tensorboard log files are stored in the directory tb/.
bash run_gen.sh
In the run_gen.sh script, we first generate the alternating sequences with the fairseq sequence generation pipeline, and then use the evaluate.py script to convert the alternating sequences back to graphs and evaluate the results on the ACE05 test set.
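To make the idea of converting between a graph and an alternating sequence concrete, here is a toy round trip. It is a sketch under stated assumptions and is NOT the encoding used by HySPA or by evaluate.py: the names graph_to_sequence and sequence_to_graph, the [SEP] token, and the segment layout (a head node followed by relation type and tail node pairs) are all made up for illustration.

```python
from typing import List, Set, Tuple, Union

Node = Tuple[int, int, str]      # (start token, end token, entity type)
Edge = Tuple[Node, str, Node]    # (head node, relation type, tail node)
Token = Union[Node, str]
SEP = "[SEP]"                    # hypothetical separator; must not be a relation type


def graph_to_sequence(nodes: List[Node], edges: Set[Edge]) -> List[Token]:
    """Emit one segment per node: head, then (relation, tail) pairs, then SEP."""
    seq: List[Token] = []
    for head in sorted(nodes):
        seq.append(head)
        for h, rel, tail in sorted(edges):
            if h == head:
                seq.extend([rel, tail])
        seq.append(SEP)
    return seq


def sequence_to_graph(seq: List[Token]) -> Tuple[List[Node], Set[Edge]]:
    """Invert graph_to_sequence by reading the sequence segment by segment."""
    nodes: List[Node] = []
    edges: Set[Edge] = set()
    segment: List[Token] = []
    for token in seq:
        if token != SEP:
            segment.append(token)
            continue
        head = segment[0]
        nodes.append(head)
        # After the head, tokens come in (relation type, tail node) pairs.
        for i in range(1, len(segment), 2):
            edges.add((head, segment[i], segment[i + 1]))
        segment = []
    return nodes, edges


# Toy graph: a person span linked to an organization span.
per, org = (0, 1, "PER"), (4, 5, "ORG")
toy_nodes = [per, org]
toy_edges = {(per, "ORG-AFF", org)}
toy_seq = graph_to_sequence(toy_nodes, toy_edges)
assert sequence_to_graph(toy_seq) == (toy_nodes, toy_edges)
```

Within each segment the tokens alternate between nodes and edge types, and decoding recovers the graph exactly as long as the node list is sorted and free of duplicates. The actual sequence format produced by the model is defined by the paper and the code in this repository.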
@inproceedings{ren-etal-2021-hyspa,
title = "{H}y{SPA}: Hybrid Span Generation for Scalable Text-to-Graph Extraction",
author = "Ren, Liliang and
Sun, Chenkai and
Ji, Heng and
Hockenmaier, Julia",
booktitle = "Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021",
month = aug,
year = "2021",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2021.findings-acl.356",
doi = "10.18653/v1/2021.findings-acl.356",
pages = "4066--4078",
}