FEQA: A Question Answering Evaluation Framework for Faithfulness Assessment in Abstractive Summarization

  • Neural abstractive summarization models are prone to generating content that is inconsistent with the source document, i.e., unfaithful.
  • This paper tackles the problem of evaluating faithfulness and proposes an automatic QA-based metric.
  • Findings
    • Current models are limited by a trade-off between abstractiveness and faithfulness.
  • Abstractiveness-Faithfulness trade-off
    • Factual errors occur more frequently as models generate more abstractive summary sentences.
    • CNN/DM is more extractive than XSUM
      • Extraction scores of the reference summaries show that half of the sentences in CNN/DM are formed by deleting words from one of the source sentences.
    • Models trained on CNN/DM are near-extractive, while models trained on XSUM are significantly more abstractive.
    • Additional inductive bias is needed to condense multiple sentences by rephrasing.
  • FEQA
    • The metric correlates better with human judgment than other automatic metrics, but it is limited by the quality of the QA model.
  • Findings
    • The entailment metric does not correlate significantly with faithfulness.
    • Models might be good at copying important source content, but tend to concatenate unrelated spans and hallucinate details when generating abstractive sentences.
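
To make the QA-based idea concrete, here is a minimal sketch of how such a score could be computed. These notes do not spell out the pipeline, so the details are assumptions: the sketch presumes that question-answer pairs have already been derived from the summary, that a QA model answers each question using only the source document, and that answers are compared with a SQuAD-style token-overlap F1. The names `token_f1`, `faithfulness_score`, `qa_pairs`, and `answer_fn` are hypothetical and do not come from the paper's code.

```python
import re
from collections import Counter
from typing import Callable, List, Tuple


def tokens(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def token_f1(predicted: str, expected: str) -> float:
    """Token-overlap F1 between two answer strings (SQuAD-style)."""
    pred, gold = tokens(predicted), tokens(expected)
    if not pred or not gold:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred == gold)
    overlap = sum((Counter(pred) & Counter(gold)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


def faithfulness_score(
    qa_pairs: List[Tuple[str, str]],
    source: str,
    answer_fn: Callable[[str, str], str],
) -> float:
    """Average F1 between summary-derived answers and source-derived answers.

    qa_pairs:  (question, answer) pairs assumed to come from the summary.
    answer_fn: hypothetical QA model, called as answer_fn(question, document).
    """
    if not qa_pairs:
        return 0.0
    scores = [token_f1(answer_fn(q, source), a) for q, a in qa_pairs]
    return sum(scores) / len(scores)
```

Because `answer_fn` is a plug-in, the score can only be as reliable as the QA model behind it, which matches the limitation noted above. A summary whose answers the source cannot reproduce receives a low score, and a wrong answer from the QA model is indistinguishable from an unfaithful summary.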