- The paper appeared in EMNLP 2020 and examines the gap between automatic summarization systems and human-level summarization.
- Its contributions include quantifying sources of error and a set of empirical findings on the models that are tested.
- Findings
- Traditional (non-neural) methods remain strong baselines against neural architectures.
- Extractive models outperform abstractive ones in comparable settings.
- Extractive models suffer mainly from unnecessity, i.e., retaining source content that does not belong in the summary.
- Abstractive models suffer from intrinsic hallucination (content that distorts the source) and omission.
- The copy and coverage mechanisms generally serve their intended purposes, but copy introduces a degree of redundancy while coverage hurts faithful content generation (see the coverage-loss sketch after this list).
- Hybrid extractive-abstractive approaches reflect the relative strengths and weaknesses of the two methods.
- Pretrained models work best, achieving state-of-the-art results in both automatic and human evaluations.
- PolyTope offers more fine-grained information for quality evaluation than single-score metrics.
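As a concrete illustration of the coverage point above, here is a minimal sketch of the coverage loss from pointer-generator-style models (See et al., 2017): a coverage vector accumulates past attention, and the loss penalizes attending again to already-covered source tokens. The use of NumPy, the function name `coverage_loss`, and averaging over steps are my own illustrative choices.

```python
import numpy as np

def coverage_loss(attentions: np.ndarray) -> float:
    """Coverage loss over a decoded sequence.

    attentions: array of shape (T, S), where row t is the attention
    distribution over S source tokens at decoder step t.
    At each step the penalty is sum_i min(a_t[i], c_t[i]), where
    c_t is the sum of attention from all previous steps, so
    re-attending to the same source positions is discouraged.
    """
    T, S = attentions.shape
    coverage = np.zeros(S)          # c_0 = 0: nothing covered yet
    total = 0.0
    for t in range(T):
        total += np.minimum(attentions[t], coverage).sum()
        coverage += attentions[t]   # accumulate attention history
    return total / T                # average over decoding steps


# Example: step 2 re-attends to token 0, so the loss is nonzero.
attn = np.array([[0.9, 0.1, 0.0],
                 [0.9, 0.1, 0.0]])
print(coverage_loss(attn))          # 0.5, flagging the redundancy
```

This also hints at the reviewer's point: the loss only discourages repeated attention, so it can reduce redundancy while doing nothing to guarantee that the generated words are faithful to the attended content.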
- The paper offers important empirical findings not only on which mechanisms work but also on which errors they introduce into the system.
- The study could have been run on another dataset as well: research has shown that CNN/DailyMail has a lead bias, and some of the errors reported in the paper might have been absent or reduced on a different dataset.
- For example, the conclusion that LEAD-3 is statistically a strong baseline holds on CNN/DailyMail largely because news articles tend to put summary-worthy content in the opening sentences, which directly favors lead-based extraction (a minimal LEAD-3 sketch follows).
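To make the point concrete, LEAD-3 simply returns the first three sentences of the article. A minimal sketch, where the crude regex sentence splitter is an assumption for illustration (real implementations use a proper sentence tokenizer):

```python
import re

def lead_3(article: str, k: int = 3) -> str:
    """LEAD-k baseline: return the first k sentences of the article.

    Uses a naive regex splitter for illustration only; a real
    implementation would use e.g. NLTK's sent_tokenize.
    """
    sentences = re.split(r"(?<=[.!?])\s+", article.strip())
    return " ".join(sentences[:k])


article = ("The city council approved the new budget on Monday. "
           "The vote passed 7-2 after a lengthy debate. "
           "Funding for parks will increase by 10 percent. "
           "Critics argued the plan neglects public transit.")
print(lead_3(article))  # first three sentences only
```

Because of the lead bias, this trivial baseline already captures most reference content on CNN/DailyMail, which is exactly why its strength there may not generalize to other datasets.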
- Questionable choice of hybrid model, and no consideration of reinforcement learning models.
- RL models have contributed substantially to the growth of the field and offer alternative ways of handling these issues, e.g., Chen et al. (2018) and Pasunuru et al. (2017); a hedged sketch of the typical RL objective follows.
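For context, RL approaches in summarization typically optimize a summary-level reward (usually ROUGE) with a self-critical policy gradient, rewarding a sampled summary when it beats the model's own greedy decode. This is a generic sketch, not the cited papers' exact setups (Chen et al. use sentence-level rewriting with actor-critic); the toy unigram-F1 reward stands in for ROUGE and all names are illustrative assumptions.

```python
def unigram_f1(candidate: str, reference: str) -> float:
    """Toy stand-in reward: unigram F1 overlap (real work uses ROUGE)."""
    cand, ref = candidate.split(), reference.split()
    overlap = len(set(cand) & set(ref))
    if not cand or not ref or not overlap:
        return 0.0
    p, r = overlap / len(cand), overlap / len(ref)
    return 2 * p * r / (p + r)


def self_critical_loss(log_prob_sample: float,
                       sampled: str, greedy: str, reference: str) -> float:
    """Self-critical policy-gradient loss (Rennie et al., 2017 style):
    -(r(sampled) - r(greedy)) * log p(sampled). Minimizing this raises
    the probability of samples that beat the greedy baseline.
    """
    advantage = (unigram_f1(sampled, reference)
                 - unigram_f1(greedy, reference))
    return -advantage * log_prob_sample


# Example: the sampled summary beats the greedy one (positive
# advantage), so gradient descent on this loss pushes up the
# sample's log-probability.
print(self_critical_loss(-2.3,
                         "council approves budget",
                         "the meeting happened",
                         "city council approves new budget"))
```

Optimizing a sequence-level reward like this sidesteps exposure bias and directly targets the evaluation metric, which is one of the "alternative ways of handling issues" the review alludes to.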