Multi-document summarization produces a comprehensive summary of multiple input documents without losing important information. We use it as a long-sequence modeling task. To make the task more challenging, we set the maximum source and target summary lengths to 4,096 and 400 tokens, respectively. We report ROUGE (R-1, R-2, and R-L) as the task metric.
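To make the metric concrete, here is a minimal sketch of ROUGE-N and ROUGE-L computed on pre-tokenized text. It is an illustration only, not the scorer used to produce the tables below: it assumes F1 scores and plain whitespace tokens, and it skips the stemming and summary-level sentence splitting that standard ROUGE packages apply. The function names are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, cand_total, ref_total):
    """Harmonic mean of precision and recall; 0.0 when undefined."""
    if overlap == 0 or cand_total == 0 or ref_total == 0:
        return 0.0
    precision = overlap / cand_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate, reference, n):
    """ROUGE-N F1 (assumed variant): clipped n-gram overlap."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    overlap = sum((cand & ref).values())  # Counter & keeps the per-n-gram minimum
    return f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence via dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """ROUGE-L F1 (sentence-level variant) based on the LCS length."""
    return f1(lcs_length(candidate, reference), len(candidate), len(reference))


candidate = "the cat sat on the mat".split()
reference = "the cat is on the mat".split()
print(round(rouge_n(candidate, reference, 1), 4))  # 0.8333
print(round(rouge_n(candidate, reference, 2), 4))  # 0.6
print(round(rouge_l(candidate, reference), 4))     # 0.8333
```

The scores in the tables are on a 0 to 100 scale, so a value such as 38.50 corresponds to 0.3850 in this sketch.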
Noncausal Self
Model | R-1↑ | R-2↑ | R-L↑ |
---|---|---|---|
local | 38.50 | 10.54 | 35.39 |
Performer | 34.85 | 6.54 | 31.88 |
cosFormer | 34.77 | 6.34 | 31.74 |
FlashAttention | 34.64 | 6.52 | 31.66 |
ProbSparse | 34.62 | 6.36 | 31.64 |
Nyströmformer | 34.45 | 6.30 | 31.56 |
LongShort | 34.35 | 6.41 | 31.55 |
LARA | 34.03 | 6.23 | 31.23 |
ABC | 33.80 | 6.07 | 30.98 |
S4D | - | - | - |
Causal Self
Model | R-1↑ | R-2↑ | R-L↑ |
---|---|---|---|
S4D | 34.90 | 6.65 | 31.98 |
FlashAttention | 34.25 | 6.24 | 31.32 |
LongShort | 33.55 | 6.27 | 30.71 |
local | 33.50 | 6.27 | 30.74 |
ABC | 30.17 | 5.48 | 27.92 |
Causal Cross
Model | R-1↑ | R-2↑ | R-L↑ |
---|---|---|---|
ABC | 32.22 | 5.55 | 29.53 |
Performer | 27.22 | 3.88 | 25.21 |
We use the Multi-News dataset for evaluation. The source and target texts of Multi-News contain ~2,300 and ~280 tokens on average, respectively.
We use a Transformer as the backbone model. To reproduce the results, follow the steps below.
We conduct experiments with ParaGen. For setup instructions, refer to the ParaGen documentation.
Our preprocessed Multi-News dataset can be downloaded from Google Drive. The original dataset is available on Hugging Face. The Multi-News dataset is preprocessed with
paragen-preprocess --config configs/preprocess.yaml
which tokenizes sentences with the `bart-base` tokenizer and transforms the tokens into indices.
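As a rough picture of what this preprocessing step does to one example, the sketch below tokenizes a source/summary pair and maps the tokens to vocabulary indices with the Hugging Face `transformers` tokenizer API. This is not ParaGen's implementation. The Hub id `facebook/bart-base`, the helper names, the `src`/`tgt` keys, and the choice to truncate to the 4,096/400 limits at this stage are all assumptions made for illustration.

```python
def load_bart_tokenizer(name="facebook/bart-base"):
    """Load a BART-base tokenizer (assumed Hub id; downloads files on first use)."""
    from transformers import AutoTokenizer  # lazy import: optional dependency
    return AutoTokenizer.from_pretrained(name)


def encode_pair(tokenizer, source, target, max_src_len=4096, max_tgt_len=400):
    """Tokenize one (source, summary) pair and return vocabulary indices.

    Truncating here to the task's length limits is an assumption; the real
    pipeline may apply the limits elsewhere.
    """
    src_ids = tokenizer(source, truncation=True, max_length=max_src_len)["input_ids"]
    tgt_ids = tokenizer(target, truncation=True, max_length=max_tgt_len)["input_ids"]
    return {"src": src_ids, "tgt": tgt_ids}
```

With `transformers` installed, `encode_pair(load_bart_tokenizer(), document, summary)` returns two lists of integer ids; any tokenizer object with the same call signature can be passed in instead.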
We use four 80GB A100 GPUs to train a Transformer model as follows:
cd examples/summarization
python -m torch.distributed.launch --nproc_per_node 4 paragen-run --config train.yaml --lib efficient_transformers --env.fp16 True
For faster training with `fp16`, please specify `--env.fp16 True`.
To evaluate the trained model, run:
paragen-run --config eval.yaml --lib summ,efficient_transformers --env.fp16 True