MLSUM-Catalan

A Catalan summarization corpus built on the concepts of MLSUM (https://github.com/recitalAI/MLSUM).

The original content is from Vilaweb and is licensed under Creative Commons Attribution-NonCommercial-NoDerivs, which allows sharing.

Files:

  • Source URLs: urls/train.ca.txt.urls
  • Text and summaries: processed/ca_train.txt (2678 entries)

The text and summaries are in the same format as the MLSUM corpus (tab-separated).
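
As a minimal sketch of reading the tab-separated file, the Python below assumes one entry per line with two fields split on the first tab. The README does not state the column order or whether a header row is present, so the fields are returned as an unlabeled pair; the function names `parse_entries` and `load_corpus` are hypothetical and not part of this repository.

```python
from pathlib import Path
from typing import Iterable, List, Tuple


def parse_entries(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Split each non-empty line on its first tab into two fields.

    Assumption: one entry per line, two tab-separated fields. Which
    field is the text and which is the summary is not stated in the
    README, so the pair is returned in file order.
    """
    entries = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        first, sep, second = line.partition("\t")
        if not sep:
            raise ValueError(f"no tab separator in line: {line[:40]!r}")
        entries.append((first, second))
    return entries


def load_corpus(path: str = "processed/ca_train.txt") -> List[Tuple[str, str]]:
    """Read the corpus file as UTF-8 and parse it with parse_entries."""
    with Path(path).open(encoding="utf-8") as handle:
        return parse_entries(handle)
```

If the file really holds one entry per line, `len(load_corpus())` should match the 2678 entries listed above; a different count would suggest the layout assumption is wrong.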