This dataset contains expert and Turker annotations for summaries on the CNN/DailyMail dataset as collected in [1]. The setup command saves the summaries and references for all of the systems, along with their corresponding annotations and input documents. See this GitHub repository for more details.
```bash
sacrerouge setup-dataset fabbri2020 <output-dir>
```
The output files are the following:
- `summaries.jsonl`: The model output summaries with their input documents and the ground-truth references
- `summaries-with-crowd.jsonl`: The model output summaries with their input documents and the ground-truth and ten crowdsourced references
- `metrics.jsonl`: The expert and Turker annotations that correspond to `summaries.jsonl` and `summaries-with-crowd.jsonl`
- `all-summaries-preproc-refs.jsonl.gz`: All of the model outputs across the entire CNN/DM test dataset. Each model output keeps the reference it was originally paired with, which is some preprocessed version of the reference that appears in `summaries.jsonl`. That is, the outputs are grouped by `instance_id`, but each `instance_id` may have many different references due to differences in the models' preprocessing.
- `all-summaries-orig-refs.jsonl.gz`: All of the model outputs across the entire CNN/DM test dataset. This version uses the documents and references as extracted by the Huggingface CNN/DM scripts, so the documents and references should be common across the same `instance_id`.
For `all-summaries-preproc-refs.jsonl.gz` and `all-summaries-orig-refs.jsonl.gz`, the aligned system outputs contain duplicate instances. We keep only the first occurrence of any instance and ensure that the summary which was judged is the one selected.
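For reference, here is a minimal sketch of how these files could be read and joined. It assumes only that each line is a JSON object and that the annotations in `metrics.jsonl` share the `instance_id` and `summarizer_id` keys with `summaries.jsonl`; any other field names are illustrative assumptions rather than part of the documented format.

```python
import gzip
import json

def load_jsonl(path):
    """Read a .jsonl or .jsonl.gz file into a list of dicts."""
    open_fn = gzip.open if path.endswith(".gz") else open
    with open_fn(path, "rt", encoding="utf-8") as f:
        return [json.loads(line) for line in f]

summaries = load_jsonl("summaries.jsonl")
annotations = load_jsonl("metrics.jsonl")

# Join the human judgments to the judged summaries on the shared keys.
by_key = {(m["instance_id"], m["summarizer_id"]): m for m in annotations}
for summary in summaries:
    key = (summary["instance_id"], summary["summarizer_id"])
    judgments = by_key.get(key)  # expert/Turker annotations for this summary, if any
```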
Notes:
- The raw data does not identify which reference summary is the original ground-truth reference, but after checking a handful of instances, it appears to always be the first reference in the list of references. That first reference is the one included in `summaries.jsonl`. (Confirmed)
- To make the crowd summaries distinct, each is given a `summarizer_id` of `turker-` followed by a number from 1 to 10. It is not necessarily the case that the summaries identified by `turker-i` were all written by the same person, so they should not be treated as such. (See the sketch below.)
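As an illustration of the naming convention above, the crowdsourced references could be separated from the original one by checking the `summarizer_id` prefix. This assumes each reference entry is a dict that carries its own `summarizer_id`, which is an assumption about the file layout rather than something stated here.

```python
def split_references(references):
    """Separate the original ground-truth reference from the turker-written ones.

    Assumes each reference is a dict with a "summarizer_id" field; the crowd
    references use ids of the form "turker-1" through "turker-10".
    """
    crowd = [r for r in references
             if str(r.get("summarizer_id", "")).startswith("turker-")]
    original = [r for r in references
                if not str(r.get("summarizer_id", "")).startswith("turker-")]
    return original, crowd
```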
Here are the correlations of some of the metrics implemented in this library to the responsiveness scores in this dataset. The columns report Pearson's r, Spearman's ρ, and Kendall's τ.
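As a rough illustration of what the summary-level and system-level numbers mean (not SacreROUGE's exact implementation), the two levels could be computed as sketched below, assuming `metric_scores[system][instance]` and `responsiveness[system][instance]` are dicts of per-summary scores; the `p` and `k` columns would swap `pearsonr` for `spearmanr` and `kendalltau`.

```python
import numpy as np
from scipy.stats import kendalltau, pearsonr, spearmanr

def summary_level(metric_scores, responsiveness, corr=pearsonr):
    """For each input document, correlate metric and human scores across the
    systems, then average the per-document correlations."""
    systems = list(metric_scores)
    instances = metric_scores[systems[0]]
    values = []
    for inst in instances:
        x = [metric_scores[sys][inst] for sys in systems]
        y = [responsiveness[sys][inst] for sys in systems]
        values.append(corr(x, y)[0])
    return float(np.mean(values))

def system_level(metric_scores, responsiveness, corr=pearsonr):
    """Average each system's scores over all documents, then correlate the
    per-system averages across systems."""
    systems = list(metric_scores)
    x = [np.mean(list(metric_scores[sys].values())) for sys in systems]
    y = [np.mean(list(responsiveness[sys].values())) for sys in systems]
    return corr(x, y)[0]
```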
Single-reference, summary-level
| | Fabbri2020 | | |
| --- | --- | --- | --- |
| | r | p | k |
R1-P | 0.13 | 0.12 | 0.09 |
R1-R | 0.31 | 0.28 | 0.23 |
R1-F1 | 0.28 | 0.26 | 0.20 |
R2-P | 0.15 | 0.13 | 0.09 |
R2-R | 0.26 | 0.23 | 0.18 |
R2-F1 | 0.23 | 0.19 | 0.14 |
BERTScore-P | 0.17 | 0.17 | 0.13 |
BERTScore-R | 0.37 | 0.35 | 0.27 |
BERTScore-F1 | 0.29 | 0.28 | 0.22 |
MoverScore | 0.28 | 0.24 | 0.18 |
QAEval-EM | 0.23 | 0.23 | 0.19 |
QAEval-F1 | 0.30 | 0.29 | 0.22 |
Single-reference, system-level
| | Fabbri2020 | | |
| --- | --- | --- | --- |
| | r | p | k |
R1-P | 0.29 | 0.15 | 0.03 |
R1-R | 0.55 | 0.56 | 0.42 |
R1-F1 | 0.61 | 0.62 | 0.50 |
R2-P | 0.49 | 0.41 | 0.25 |
R2-R | 0.65 | 0.78 | 0.57 |
R2-F1 | 0.64 | 0.60 | 0.43 |
BERTScore-P | 0.18 | 0.11 | 0.02 |
BERTScore-R | 0.84 | 0.91 | 0.75 |
BERTScore-F1 | 0.54 | 0.40 | 0.28 |
MoverScore | 0.56 | 0.54 | 0.42 |
QAEval-EM | 0.80 | 0.91 | 0.77 |
QAEval-F1 | 0.82 | 0.91 | 0.77 |
Multi-reference, summary-level
| | Fabbri2020 | | |
| --- | --- | --- | --- |
| | r | p | k |
R1-P | 0.13 | 0.14 | 0.10 |
R1-R | 0.33 | 0.29 | 0.23 |
R1-F1 | 0.36 | 0.33 | 0.25 |
R2-P | 0.20 | 0.21 | 0.16 |
R2-R | 0.34 | 0.31 | 0.24 |
R2-F1 | 0.33 | 0.29 | 0.22 |
BERTScore-P | 0.18 | 0.19 | 0.14 |
BERTScore-R | 0.42 | 0.38 | 0.29 |
BERTScore-F1 | 0.31 | 0.31 | 0.24 |
MoverScore | 0.33 | 0.27 | 0.21 |
QAEval-EM | 0.33 | 0.29 | 0.22 |
QAEval-F1 | 0.40 | 0.35 | 0.27 |
Multi-reference, system-level
| | Fabbri2020 | | |
| --- | --- | --- | --- |
| | r | p | k |
R1-P | 0.03 | 0.08 | 0.02 |
R1-R | 0.38 | 0.30 | 0.23 |
R1-F1 | 0.55 | 0.77 | 0.58 |
R2-P | 0.34 | 0.26 | 0.13 |
R2-R | 0.41 | 0.29 | 0.23 |
R2-F1 | 0.57 | 0.64 | 0.43 |
BERTScore-P | 0.13 | 0.14 | 0.05 |
BERTScore-R | 0.80 | 0.85 | 0.70 |
BERTScore-F1 | 0.41 | 0.48 | 0.38 |
MoverScore | 0.46 | 0.36 | 0.30 |
QAEval-EM | 0.60 | 0.58 | 0.43 |
QAEval-F1 | 0.62 | 0.65 | 0.48 |
[1] Alexander R. Fabbri, Wojciech Kryscinski, Bryan McCann, Caiming Xiong, Richard Socher, and Dragomir Radev. "SummEval: Re-evaluating Summarization Evaluation". 2020.