Summarization Programs (ICLR 2023)

Summarization Programs: Interpretable Abstractive Summarization with Neural Modular Trees

Swarnadeep Saha, Shiyue Zhang, Peter Hase, and Mohit Bansal

Installation

This repository is tested on Python 3.8.12.
We recommend installing SummarizationPrograms in a virtual environment. All dependencies can be installed as follows:

pip install -r requirements.txt

Dataset

We provide a small sample of the CNN/DM validation set in the data folder. Each line contains the source document, the gold summary, and the unigram overlap percentage of each source sentence with respect to the summary. To run SP-Search on your own summarization dataset, pre-process it into the same format.
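As an illustration of the overlap field, here is a minimal sketch of how such percentages could be computed for your own data. The exact definition used in this repository is not stated above, so the tokenization (lowercased word tokens) and the formula (share of a sentence's unique unigrams that also occur in the summary) are assumptions, and the function names are hypothetical.

```python
import re


def unigrams(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def overlap_percentages(source_sentences, summary):
    """Return one percentage per source sentence.

    Assumption: overlap = share of the sentence's unique unigrams that
    also appear in the summary. The repository may define it differently.
    """
    summary_tokens = unigrams(summary)
    percentages = []
    for sentence in source_sentences:
        tokens = unigrams(sentence)
        if not tokens:
            # An empty sentence has no unigrams to match.
            percentages.append(0.0)
            continue
        shared = tokens & summary_tokens
        percentages.append(100.0 * len(shared) / len(tokens))
    return percentages
```

Check the values against the provided sample in the data folder before relying on this for SP-Search input.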

For CNN/DM and XSum, we release the original samples and the SP-Search outputs (the searched programs) here.

The documents folder contains the samples, pre-processed as discussed above. The SP-Search outputs are represented as follows.

Each line is a tab-separated entry consisting of the following:

Index (according to the original sample ID in documents folder)
Gold Summary (same as the summaries in documents folder)
SP_Search Summary (the searched summary that emulates the gold/human summary)
SP_Search program with intermediate generations (S1, S2, etc. denote document sentences; I1, I2, etc. denote intermediate generations produced by executing a neural module)
SP_Search program without intermediate generations (a more compact representation of the previous field, with each tree enclosed in square brackets)
ROUGE score between gold and SP_Search summary
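
The six tab-separated fields listed above can be read with a few lines of Python. This is a minimal sketch rather than code from this repository: the class and function names are hypothetical, every field is kept as a raw string (the program syntax and the ROUGE formatting are not specified above), and it assumes no field contains a tab character.

```python
from typing import Iterable, List, NamedTuple


class SPSearchEntry(NamedTuple):
    """One line of an SP-Search output file, in the field order listed above."""
    index: str
    gold_summary: str
    sp_search_summary: str
    program_with_intermediates: str
    program_compact: str
    rouge: str


def read_sp_search_lines(lines: Iterable[str]) -> List[SPSearchEntry]:
    """Parse tab-separated SP-Search output lines into entries."""
    entries = []
    for line_number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) != 6:
            raise ValueError(
                f"line {line_number}: expected 6 fields, found {len(fields)}"
            )
        entries.append(SPSearchEntry(*fields))
    return entries
```

To read a file, pass an open file handle, for example `read_sp_search_lines(open(path, encoding="utf-8"))`, where `path` points to one of the released output files.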

RQ1: SP-Search

In order to identify Summarization Programs for human summaries, execute the following steps.

cd sp_search
python main.py

The pre-trained modules are available for download here. For paraphrase, we directly used the model available here. Download the other two modules and place them inside the modules directory.

Running the search produces outputs similar to those in the output folder. The Summarization Programs and their corresponding summaries are saved to sp_search.tsv, and the sp_search folder stores each SP as an individual PDF for visualization.

Compute ROUGE scores for the SP-Search summaries by running:

cd scripts
python compute_spsearch_rouge.py

RQ2: SP Generation Models

Generating summaries via SPs involves three steps.

First, fine-tune a BART model on the SP-Search programs using the script scripts/train_sp_gen.sh. You need to convert the training file to the JSON format expected by the HuggingFace transformers library.
Second, run inference with this model using sp_model/eval_sp.py to generate intermediate SPs.
Third, execute these generated programs via the pre-trained modules to obtain the final summaries. The script is sp_model/execute_sp.py.
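
For the first step, the training file could be prepared along the following lines. This is a hedged sketch, not the repository's pre-processing code: it assumes a JSON Lines layout (one JSON object per line) that pairs each source document with its SP-Search program as the generation target. The function name and the key names `text` and `summary` are assumptions, so adjust them to whatever scripts/train_sp_gen.sh expects.

```python
import json


def write_training_jsonl(pairs, path, text_key="text", summary_key="summary"):
    """Write (source_document, program) pairs as one JSON object per line.

    Assumption: the key names are placeholders and must match the column
    names the training script is configured to read.
    Returns the number of records written.
    """
    count = 0
    with open(path, "w", encoding="utf-8") as handle:
        for document, program in pairs:
            record = {text_key: document, summary_key: program}
            # ensure_ascii=False keeps non-ASCII characters readable in the file.
            handle.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count
```

The pairs can be assembled by matching each document in the documents folder with the program field of the SP-Search output that carries the same index.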

Citation

@inproceedings{saha2023summarization,
  title={Summarization programs: Interpretable abstractive summarization with neural modular trees},
  author={Saha, Swarnadeep and Zhang, Shiyue and Hase, Peter and Bansal, Mohit},
  booktitle={ICLR},
  year={2023}
}
