# SummEBR

Source code for the paper "Improving abstractive summarization with energy-based re-ranking" (D. Pernes, A. Mendes, and A. F. T. Martins). Presented at the 2nd Workshop on Natural Language Generation, Evaluation, and Metrics (GEM 2022).

If you wish to use the code, please read the attached LICENSE.md.

## Training

Training the EBR model involves the following steps:

1. Sampling candidates from an abstractive summarization model (BART or PEGASUS); see the first sketch below.

2. Scoring and ranking the candidates according to the desired metric.

3. Fine-tuning a BERT model on the sampled candidates with the ranking loss; see the second sketch below.

We provide a customizable `train.sh` script that can be used to train the EBR model.
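
For step 1, candidate summaries are typically drawn with diverse beam search (the provided data files are named `diverse-samples-*.jsonl`). Below is a minimal, illustrative sketch using Hugging Face Transformers; the model name and generation settings are assumptions, and the repo's own sampling code may use different values:

```python
from transformers import BartForConditionalGeneration, BartTokenizer

# Illustrative settings only; the paper's sampling configuration may differ.
model_name = "facebook/bart-large-cnn"  # assumed checkpoint for CNN/DailyMail
tokenizer = BartTokenizer.from_pretrained(model_name)
model = BartForConditionalGeneration.from_pretrained(model_name)

article = "Your source document goes here."
inputs = tokenizer(article, return_tensors="pt", truncation=True)

# Diverse beam search: num_beams split into num_beam_groups groups,
# with a diversity penalty applied between groups.
outputs = model.generate(
    **inputs,
    num_beams=16,
    num_beam_groups=16,
    diversity_penalty=1.0,
    num_return_sequences=16,
)
candidates = tokenizer.batch_decode(outputs, skip_special_tokens=True)
```

For step 3, the BERT energy model is trained so that candidates that score better under the chosen metric receive lower energy. The sketch below shows a common pairwise margin ranking loss of the kind used for re-ranker training, assuming `energies` holds one energy per candidate, sorted from best to worst by the target metric; the exact loss used in the paper may differ (see the training code invoked by `train.sh`):

```python
import torch

def ranking_loss(energies: torch.Tensor, margin: float = 0.01) -> torch.Tensor:
    """Pairwise margin ranking loss over one example's candidates.

    `energies` has shape (num_candidates,) and is assumed to be ordered
    from best to worst according to the target metric. Each pair (i, j)
    with i < j contributes a hinge term pushing the better candidate i
    to a lower energy, with a margin that grows with the rank gap.
    """
    loss = energies.new_zeros(())
    n = energies.size(0)
    for i in range(n):
        for j in range(i + 1, n):
            loss = loss + torch.clamp(
                energies[i] - energies[j] + margin * (j - i), min=0.0
            )
    return loss
```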

## Testing

We provide the model checkpoints and the candidate summaries used in the experimental evaluation. To download them, run:

```bash
curl ftp://ftp.priberam.com/SummEBR/ebr_models.tar.gz --user "ftp.priberam.com|anonymous":anonymous -o ./ebr_models.tar.gz
```

To reproduce the experiments:

1. If you haven't trained the model and want to use the data and checkpoints we provide:

    1. Extract the archive and place the data and checkpoints at `./data` and `./checkpoints`, respectively.

    2. Compute ROUGE, QuestEval, and CTC scores for the test data you want to evaluate. E.g.:

        ```bash
        python scorer.py --source=./data/cnndm/bart/diverse-samples-test.jsonl --results_rouge=./data/cnndm/bart/results-rougel-test.jsonl --results_questeval=./data/cnndm/bart/results-questeval-test.jsonl --results_ctc=./data/cnndm/bart/results-ctc-test.jsonl
        ```

2. Use the desired EBR model to rank the test candidates. E.g.:

    ```bash
    python run-ranker.py --do_predict --gpus=1 -d ./data/cnndm/bart --metric=ctc_sum --checkpoint=./checkpoints/cnndm/bart/ebr-ctc_sum.ckpt --predictions_file=./data/cnndm/ebr-ctc_sum-predictions.jsonl
    ```

3. Get the final scores for the re-ranked predictions. E.g.:

    ```bash
    python score-ranked.py --predictions=./data/cnndm/ebr-ctc_sum-predictions.jsonl --scores_rouge=./data/results-rougel-test.jsonl --scores_questeval=./data/results-questeval-test.jsonl --scores_ctc=./data/results-ctc-test.jsonl
    ```
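
After ranking, each line of the predictions file should contain the re-ranked candidates for one test example. Below is a minimal sketch for inspecting the output, assuming a hypothetical schema in which each JSON record stores its candidates in ranked order under a `candidates` key (check the actual output of `run-ranker.py` to confirm):

```python
import json

# Print the top-ranked summary for each test example.
# NOTE: the "candidates" field name is an assumption; inspect one line of
# the predictions file to confirm the actual schema of run-ranker.py.
with open("./data/cnndm/ebr-ctc_sum-predictions.jsonl") as f:
    for line in f:
        record = json.loads(line)
        print(record["candidates"][0])
```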
    

## Citation

```bibtex
@inproceedings{pernes-etal-2022-improving,
    title = "Improving abstractive summarization with energy-based re-ranking",
    author = "Pernes, Diogo  and
      Mendes, Afonso  and
      Martins, Andr{\'e} F. T.",
    booktitle = "Proceedings of the 2nd Workshop on Natural Language Generation, Evaluation, and Metrics (GEM)",
    month = dec,
    year = "2022",
    address = "Abu Dhabi, United Arab Emirates (Hybrid)",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.gem-1.1",
    pages = "1--17",
}
```