Improving the Similarity Measure of Determinantal Point Processes for Extractive Multi-Document Summarization

We provide the source code for the paper "Improving the Similarity Measure of Determinantal Point Processes for Extractive Multi-Document Summarization", accepted at ACL'19. If you find the code useful, please cite the following paper.

@inproceedings{cho-lebanoff-foroosh-liu:2019,
 Author = {Sangwoo Cho and Logan Lebanoff and Hassan Foroosh and Fei Liu},
 Title = {Improving the Similarity Measure of Determinantal Point Processes for Extractive Multi-Document Summarization},
 Booktitle = {Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL)},
 Year = {2019}}

This repository contains the code for the similarity measure network, which is built on a capsule network.

Dependencies

The code was developed in the following environment:

Python 2.7
Keras 2.2.4
TensorFlow 1.12.0 backend

Install the required packages with:

$ pip install -r requirements.txt

Train and evaluate on the CNN/DM summary pair dataset

Set up directory for training/testing data

$ git clone https://github.com/sangwoo3/summarization-dpp-capsnet.git && cd summarization-dpp-capsnet
$ mkdir data && cd data

Download the data

Download the CNN/DM summary pair dataset from HERE and extract it under the /data directory.
- The dataset is pre-processed with a vocabulary of the 50k most frequent words in the CNN/DM summary pair dataset. The label is 1 for a positive sentence pair and 0 for a negative pair. A positive pair consists of a summary sentence and the sentence in the source document that is most similar to it, i.e., the one that yields the highest ROUGE scores. A negative pair consists of the same summary sentence and a random sentence from the same document.
Download the GloVe word vectors for the 50k vocabulary from HERE and place them under the /data directory.
- The 300d GloVe word vectors trained on 6B tokens are used (LINK).
If you want the raw CNN/DM summary dataset, download it from HERE.
- This data contains candidate summary sentences for each document. It is pre-processed with preprocess.py to generate the CNN/DM summary pair dataset described above.
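
The pair labelling described above can be illustrated with a minimal sketch. This is not the repository's preprocess.py: the function names `unigram_f1` and `build_pairs` are hypothetical, a simple unigram-overlap F1 stands in for the ROUGE scores, and the choice to exclude the positive sentence when sampling the negative one is an assumption.

```python
import random


def unigram_f1(candidate, reference):
    """Unigram-overlap F1, a rough stand-in for ROUGE-1 (assumption)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    # Count reference tokens so repeated words are matched at most once each.
    ref_counts = {}
    for tok in ref:
        ref_counts[tok] = ref_counts.get(tok, 0) + 1
    overlap = 0
    for tok in cand:
        if ref_counts.get(tok, 0) > 0:
            overlap += 1
            ref_counts[tok] -= 1
    if overlap == 0:
        return 0.0
    precision = overlap / float(len(cand))
    recall = overlap / float(len(ref))
    return 2 * precision * recall / (precision + recall)


def build_pairs(summary_sentence, source_sentences, rng=random):
    """Return (summary, source, label) tuples: one positive, one negative."""
    # Positive pair (label 1): the source sentence with the highest overlap score.
    best = max(source_sentences, key=lambda s: unigram_f1(s, summary_sentence))
    pairs = [(summary_sentence, best, 1)]
    # Negative pair (label 0): a random sentence from the same document.
    # Excluding the positive sentence is an assumption of this sketch.
    others = [s for s in source_sentences if s != best]
    if others:
        pairs.append((summary_sentence, rng.choice(others), 0))
    return pairs


if __name__ == "__main__":
    document = [
        "the cat sat on the mat",
        "stocks fell sharply on monday",
        "rain is expected tomorrow",
    ]
    for pair in build_pairs("a cat sat on a mat", document, random.Random(0)):
        print(pair)
```

Passing a seeded `random.Random` instance makes the negative sampling reproducible, which helps when regenerating a dataset.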

Training

$ python main_Capsnet.py

Testing

$ python main_Capsnet.py --testing

Testing on STS dataset

$ python main_Capsnet.py --testing --test_mode STS

Pre-trained Model

Download the pre-trained model from HERE and place it under the /result/capsnet_sim directory.
- /result/capsnet_sim is the default directory for training results.
Download the model fine-tuned on the STS dataset from HERE.
- This model is trained on the CNN/DM summary pair dataset and then fine-tuned on STS.
- It can be used to evaluate STS prediction accuracy.

System summary

We provide our best system summaries for DUC04 and TAC11 in the system_summary directory; they were generated with DPP. We do not provide the DPP code or the multi-document datasets due to licensing restrictions. Please refer to the DPP code, and obtain the DUC 03/04 and TAC 08/09/10/11 datasets by requesting access and approval.
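
Since the DPP code is not part of this repository, the following is only a generic sketch of how a DPP can pick a summary from sentence quality scores and a pairwise similarity matrix. It is not the authors' implementation. It assumes the standard L-ensemble decomposition L[i][j] = q_i * S_ij * q_j and a simple greedy selection; the function names `determinant` and `greedy_dpp` and the toy inputs are hypothetical. The similarity matrix is where a learned similarity measure, such as the one trained here, would plug in.

```python
def determinant(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [list(map(float, row)) for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        # Pick the row with the largest absolute entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def greedy_dpp(quality, similarity, k):
    """Greedily select up to k items that maximize det(L_Y)."""
    n = len(quality)
    # L-ensemble kernel: quality on both sides, similarity in the middle.
    kernel = [[quality[i] * similarity[i][j] * quality[j] for j in range(n)]
              for i in range(n)]
    selected = []
    while len(selected) < k:
        best, best_det = None, 0.0
        for i in range(n):
            if i in selected:
                continue
            idx = selected + [i]
            sub = [[kernel[r][c] for c in idx] for r in idx]
            d = determinant(sub)
            if d > best_det:
                best, best_det = i, d
        if best is None:  # no remaining item gives a positive determinant
            break
        selected.append(best)
    return selected


if __name__ == "__main__":
    # Toy example: sentences 0 and 1 are near-duplicates, sentence 2 is distinct.
    q = [1.0, 0.9, 0.5]
    s = [[1.0, 0.95, 0.1],
         [0.95, 1.0, 0.1],
         [0.1, 0.1, 1.0]]
    print(greedy_dpp(q, s, 2))
```

In the toy example the second pick is sentence 2 rather than the higher-quality sentence 1, because sentence 1 is nearly redundant with the first pick; this is the quality-versus-diversity trade-off that the determinant encodes.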

License

This project is licensed under the BSD License - see the LICENSE.md file for details.
