Add data: AMI DialSum Corpus

gcunhase · Dec 4, 2019 · 4c9feb8
1 parent af79b5f
commit 4c9feb8

Showing 7,832 changed files with 70,424 additions and 7 deletions.
diff --git a/.gitignore b/.gitignore
@@ -1,6 +1,6 @@
 .idea/*
 *.pyc
 .DS_Store
-data/ami_*/*
+data/ami-*/*
 data/*/.DS_Store
 venv*/*
diff --git a/README.md b/README.md
@@ -5,7 +5,7 @@
 * Transforms into CNN-DailyMail News dataset (`.story` files with article and highlight in it)
 
 ### Contents
-[Requirements](#requirements) • [About AMI Meeting Corpus](#ami-corpus) • [How to Use](#how-to-use) • [How to Cite](#acknowledgement) 
+[Requirements](#requirements) • [About AMI Meeting Corpus](#ami-corpus) • [AMI DialSum Corpus](#ami-dialsum-meeting-corpus) • [How to Use](#how-to-use) • [How to Cite](#acknowledgement) 
 
 ## Requirements
 Tested on Python 3.6+, Ubuntu 16.04, Mac OS
@@ -86,16 +86,17 @@ python main_obtain_meeting2summary_data.py --summary_type abstractive
         * Return all the collected words as a paragraph
     * Output: `data/ami-summary/extractive/`
 
+## AMI DialSum Meeting Corpus
+* [DialSum](https://github.com/MiuLab/DialSum): modified version of the AMI Meeting Dataset
+* Use script `ami_dialsum_meeting_story.py`:
+    * This script takes 2 text files (`in` and `sum`) and formats it into a series of `.story` files compatible with the CNN/DM format
+    * Each line in file `in` corresponds to a meeting transcript with summary present in the same line in file `sum`/.
+
 ## Notes
 * XML reader in Python:
     * Minidom vs Element Tree: [Reading XML files in Python](http://stackabuse.com/reading-and-writing-xml-files-in-python/)
     * Minidom: XML parser for Python
 
-* Script `ami_dialsum_meeting_story.py`:
-    * This script takes 2 text files (`in` and `sum`) and formats it into a series of `.story` files compatible with the CNN/DM format
-    * Each line in file `in` corresponds to a meeting transcript with summary present in the same line in file `sum`/.
-    * Implemented to deal with a modified version of the AMI Meeting Dataset called [DialSum](https://github.com/MiuLab/DialSum).
-
 * TODO
     * Overlapping meeting transcript
     * Decision abstract
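
Note on the README hunk above: the new section says `ami_dialsum_meeting_story.py` pairs each line of `in` with the same line of `sum` and writes `.story` files in the CNN/DM format. The script itself is not shown in this diff, so the following is only a minimal sketch of that line-pairing idea, not the author's implementation. The function name `write_story_files`, the `prefix` parameter, the output file naming, and the single `@highlight` block per summary are all assumptions made for illustration.

```python
from pathlib import Path


def write_story_files(in_path, sum_path, out_dir, prefix="meeting"):
    """Pair line i of `in_path` with line i of `sum_path`; write one .story file per pair.

    Hypothetical helper: names and file layout are assumptions, not taken from the repo.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Each line of `in` is one meeting transcript; the same line of `sum` is its summary.
    with open(in_path, encoding="utf-8") as f_in:
        transcripts = [line.strip() for line in f_in]
    with open(sum_path, encoding="utf-8") as f_sum:
        summaries = [line.strip() for line in f_sum]

    if len(transcripts) != len(summaries):
        raise ValueError(
            f"line count mismatch: {len(transcripts)} transcripts vs {len(summaries)} summaries"
        )

    written = []
    for i, (transcript, summary) in enumerate(zip(transcripts, summaries)):
        story_path = out_dir / f"{prefix}_{i:05d}.story"
        # CNN/DM-style layout: article text, then an "@highlight" marker before the summary.
        story_path.write_text(
            f"{transcript}\n\n@highlight\n\n{summary}\n", encoding="utf-8"
        )
        written.append(story_path)
    return written
```

Under these assumptions, a call such as `write_story_files("data/ami_dialsum_corpus/test/in", "data/ami_dialsum_corpus/test/sum", "out/test")` would produce one `.story` file per meeting. The `sum` path and the `out/test` directory are illustrative; only the `in` path appears in this diff. Raising on a line-count mismatch is a deliberate choice here, since silently truncating with `zip` would misalign transcripts and summaries.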

diff --git a/data/ami_dialsum_corpus/test/in b/data/ami_dialsum_corpus/test/in