This is a corpus of elicited controlled speech. The stimuli were a sequence of dialogues with intermittent fillers. This repository covers only the stimuli. The stimuli were designed to elicit intonation patterns for questions and answers in two Armenian dialects: Western Armenian (WA) and Eastern Armenian (EA). The recordings can be used for topics like intonation prosody, forced alignment, or ASR (Automatic Speech Recognition).
The dataset is open-access and contains 8,852 dialogues, consisting of 23,711 utterances (individual sound files), for a total of 2.7 GB and 8.5 hours. Each utterance has a sound file, a Praat TextGrid (with full linguistic annotation), and a text file with orthographic forms for easier ASR use. Pronunciation dictionaries are also provided for ASR or forced alignment. We generated forced alignments for these recordings through cross-language alignment with Interlingual-MFA. See the Alignments folder.
If you use the data in any way, please cite us as:
Chakmakjian, Samuel and Hossep Dolatian. 2022. Speech corpus of Armenian question-answer dialogues.
A dialogue is made up of at least a question (Q) and an answer (A). Some dialogues include an interjection (I) and a negated verb (N). We call all these elements (Q, A, I, N) utterances.
The question and answer were SOV sentences. The dialogues were of three types, each with a different position of focus. Focus was either on the subject, object, or verb. Dialogues also varied in the choice of the object word. The object word could have either final stress, penultimate stress, or initial stress.
The file utterance-metadata (in Excel and TSV versions) has metadata on the conditions for each recorded utterance.
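As a minimal sketch of how the TSV version of the metadata could be inspected, the following reads a tab-separated file and counts rows per value of a column. The file name `utterance-metadata.tsv` and any column names are assumptions; check the header row of the actual file before relying on them.

```python
import csv
from collections import Counter
from pathlib import Path


def load_tsv(path):
    """Read a tab-separated file into a list of dicts keyed by the header row."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def count_by(rows, column):
    """Count how many rows share each value of the given column."""
    return Counter(row[column] for row in rows)


if __name__ == "__main__":
    # Assumed file name; the README only says the file exists in a TSV version.
    metadata_path = Path("utterance-metadata.tsv")
    if metadata_path.exists():
        rows = load_tsv(metadata_path)
        columns = list(rows[0].keys()) if rows else []
        print(len(rows), "rows; columns:", columns)
```

Printing the column names first avoids guessing them; `count_by` can then be called with whichever column holds the condition of interest.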
The following is the template for the dialogues. The actual recordings vary in the TARGET word for the object. Note that our Western Armenian speakers were from Syria and usually did not aspirate.
Type | Subject focus dialogue | | | |
---|---|---|---|---|
Question | IPA (WA) | *ov* | TARGET | əsɑv |
Question | IPA (EA) | *ov* | TARGET | ɑsɑt͡sʰ |
Question | Gloss | who | TARGET | said |
Question | Translation | *Who* said TARGET? | | |
Question | Orthography | Ո՞վ «TARGET» ըսաւ/ասաց։ | | |
Answer | IPA (WA) | *mɑɾjɑmə* | TARGET | əsɑv |
Answer | IPA (EA) | *mɑɾjɑmə* | TARGET | ɑsɑt͡sʰ |
Answer | Gloss | Mariam | TARGET | said |
Answer | Translation | *Mariam* said TARGET. | | |
Answer | Orthography | Մարիամը «TARGET» ըսաւ/ասաց։ | | |

Type | Object focus dialogue | | | |
---|---|---|---|---|
Question | IPA (WA) | mɑɾjɑmə | *int͡ʃ* | əsɑv |
Question | IPA (EA) | mɑɾjɑmə | *int͡ʃʰ* | ɑsɑt͡sʰ |
Question | Gloss | Mariam | what | said |
Question | Translation | *What* did Mariam say? | | |
Question | Orthography | Մարիամը ի՞նչ ըսաւ/ասաց։ | | |
Answer | IPA (WA) | mɑɾjɑmə | *TARGET* | əsɑv |
Answer | IPA (EA) | mɑɾjɑmə | *TARGET* | ɑsɑt͡sʰ |
Answer | Gloss | Mariam | TARGET | said |
Answer | Translation | Mariam said *TARGET*. | | |
Answer | Orthography | Մարիամը «TARGET» ըսաւ/ասաց։ | | |

Type | Verb focus dialogue | | | |
---|---|---|---|---|
Question | IPA (WA) | mɑɾjɑmə | TARGET | *ɡɑɾtɑt͡s* |
Question | IPA (EA) | mɑɾjɑmə | TARGET | *kɑɾtʰɑt͡sʰ* |
Question | Gloss | Mariam | TARGET | read |
Question | Translation | Did Mariam *read* TARGET? | | |
Question | Orthography | Մարիամը «TARGET» կարդա՞ց։ | | |
Interjection | IPA (WA) | vot͡ʃ | | |
Interjection | IPA (EA) | vot͡ʃʰ | | |
Interjection | Gloss | no | | |
Interjection | Translation | No | | |
Interjection | Orthography | Ոչ | | |
Answer | IPA (WA) | mɑɾjɑmə | TARGET | *əsɑv* |
Answer | IPA (EA) | mɑɾjɑmə | TARGET | *ɑsɑt͡sʰ* |
Answer | Gloss | Mariam | TARGET | said |
Answer | Translation | Mariam *said* TARGET. | | |
Answer | Orthography | Մարիամը «TARGET» ըսաւ/ասաց | | |
Negation | IPA (WA) | t͡ʃəɡɑɾtɑt͡s | | |
Negation | IPA (EA) | t͡ʃʰəkɑɾtʰɑt͡sʰ | | |
Negation | Gloss | not.read | | |
Negation | Translation | She didn't read. | | |
Negation | Orthography | չկարդաց։ | | |
In the typical case, each type of question and answer sentence had its own special intonational contour, summarized in the following table.
Focus type | Question (q) | Answer (a) |
---|---|---|
Subject focus (tS) | Pitch-rise on subject; post-focal deaccenting; final rise (WA) or final fall (EA) | Pitch-rise on subject; post-focal deaccenting; final fall |
Object focus (tO) | Pitch-rise on object; post-focal deaccenting; final rise (WA) or final fall (EA) | Pitch-rise on object; post-focal deaccenting; final fall |
Verb focus (tV) | Pitch-rise on verb = final rise; optional pre-focal deaccenting | Optional pitch-rise on verb; final fall |
The TARGET word varies in its stress location. It has one of the following conditions.
Stress type (code) | Subcategory | Example WA | Example EA | Orthography | Translation |
---|---|---|---|---|---|
Final (s3) | | dɑniki | tɑnikʰi | տանիքի | of the roof |
Final (s3a) | adverb | | sutoɾen | սուտորեն | falsely |
Penult (s2) | ends in /-ə/ | kid͡zeɾə | ɡit͡seɾə | գիծերը | the lines |
Penult (s2s) | ends in /-əs/ | bɑdiʒəs | pɑtiʒəs | պատիժս | my punishment |
Penult (s2t) | ends in /-ət/ | mɑdidət | mɑtitət | մատիտդ | your pencil |
Initial (s1o) | ordinal | uteɾoɾt | utʰeɾoɾtʰ | ութերորդ | eighth |
Initial (s1a) | adverb | sudoɾen | | սուտօրէն | falsely |
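The condition codes in the two tables above can be turned into readable labels with a small lookup. This is only an illustrative sketch: the codes and descriptions come from the tables, but the function name is hypothetical, and how the codes are combined in file names or metadata columns is not assumed here.

```python
# Codes and descriptions taken from the intonation and stress tables.
FOCUS_TYPES = {"tS": "subject focus", "tO": "object focus", "tV": "verb focus"}
UTTERANCE_TYPES = {"q": "question", "a": "answer"}
STRESS_TYPES = {
    "s3": "final stress",
    "s3a": "final stress (adverb)",
    "s2": "penultimate stress (ends in /-ə/)",
    "s2s": "penultimate stress (ends in /-əs/)",
    "s2t": "penultimate stress (ends in /-ət/)",
    "s1o": "initial stress (ordinal)",
    "s1a": "initial stress (adverb)",
}


def describe_condition(focus, utterance, stress):
    """Return a readable description for one combination of condition codes.

    Hypothetical helper: raises KeyError if a code is not in the tables.
    """
    return ", ".join(
        [UTTERANCE_TYPES[utterance], FOCUS_TYPES[focus], STRESS_TYPES[stress]]
    )


print(describe_condition("tO", "q", "s2s"))
# question, object focus, penultimate stress (ends in /-əs/)
```

The codes are passed as separate arguments so the sketch does not depend on any particular file-naming scheme.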
Recordings were made with 19 speakers: 10 for Eastern Armenian (5 female, 5 male) and 9 for Western Armenian (5 female, 4 male). In terms of origin, the Eastern Armenian speakers were from Yerevan, Armenia, while the Western Armenian speakers were from Aleppo, Syria. All 19 speakers were living in Yerevan at the time of recording. Speaker metadata is in the file speaker-metadata (in Excel and TSV versions).
The participants were recorded reading the dialogues on a PowerPoint presentation. In our annotation, we broke up each dialogue into its component utterances (Q, A, I, N) using a Praat script. Each utterance is found in the repository in the form of a sound file `.wav`, a Praat TextGrid `.TextGrid`, and a transcript file `.txt`. Data is in the data folder.
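A quick way to check that every utterance has all three files is to group files by base name. The sketch below assumes that the `.wav`, `.TextGrid`, and `.txt` files of one utterance share the same base name and sit in the same folder; the function names are hypothetical.

```python
from collections import defaultdict
from pathlib import Path

EXPECTED_SUFFIXES = {".wav", ".TextGrid", ".txt"}


def group_by_utterance(folder):
    """Map each base path (without extension) to the set of extensions found."""
    groups = defaultdict(set)
    for path in Path(folder).rglob("*"):
        if path.is_file() and path.suffix in EXPECTED_SUFFIXES:
            groups[path.with_suffix("")].add(path.suffix)
    return dict(groups)


def incomplete_utterances(groups):
    """Return base paths that lack at least one expected file, with what is missing."""
    return {
        base: EXPECTED_SUFFIXES - found
        for base, found in groups.items()
        if found != EXPECTED_SUFFIXES
    }


if __name__ == "__main__":
    # "data" is the folder named in the README; adjust the path as needed.
    if Path("data").is_dir():
        missing = incomplete_utterances(group_by_utterance("data"))
        print(len(missing), "utterances with missing files")
```

An empty result from `incomplete_utterances` means every utterance found has a sound file, a TextGrid, and a transcript.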
We annotated the recordings with information on quality. Most recordings had few or no disfluencies and little or no background noise. These are found in the data-few-issues folder.
Some recordings, however, had such problems. Files with a mild or moderate issue were annotated with the symbol `_?` and placed in data-moderate-issues; files with a severe issue were annotated with `_0` and placed in data-severe-issues. We list such problems:
- Mild or moderate issues:
  - focus-unclear: The intonation is ambiguous.
  - laughing: The participant is laughing.
  - noise-mild: There is mild background noise.
  - pause-mild: There is a small felicitous pause in the middle of the sentence.
  - pause-noise-mild: There is both mild background noise and a small pause.
  - unclear-segments: A segment was pronounced unclearly.
- Severe issues:
  - focus-wrong-intonation: The participant used the wrong intonation.
  - noise-extreme: There is extreme background noise.
  - pause-extreme: There is a long infelicitous pause in the middle of the sentence.
  - pause-noise-extreme: There is both extreme noise and a long pause.
  - not-template: The utterance was misread in a way that doesn't fit into our templates, such as omitting the subject.
  - stutter-or-missing-sound: The participant stuttered in speech or omitted a sound.
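The issue labels listed above can be mapped to a severity level and to the folder where such recordings are stored. This is a sketch for filtering purposes only: the labels and folder names come from this README, while the function names and the `"none"` level for recordings without a label are assumptions.

```python
MODERATE_ISSUES = {
    "focus-unclear",
    "laughing",
    "noise-mild",
    "pause-mild",
    "pause-noise-mild",
    "unclear-segments",
}
SEVERE_ISSUES = {
    "focus-wrong-intonation",
    "noise-extreme",
    "pause-extreme",
    "pause-noise-extreme",
    "not-template",
    "stutter-or-missing-sound",
}
FOLDER_BY_SEVERITY = {
    "none": "data-few-issues",
    "moderate": "data-moderate-issues",
    "severe": "data-severe-issues",
}


def severity(issue=None):
    """Classify an issue label; an empty or missing label counts as "none"."""
    if not issue:
        return "none"
    if issue in MODERATE_ISSUES:
        return "moderate"
    if issue in SEVERE_ISSUES:
        return "severe"
    raise ValueError(f"Unknown issue label: {issue!r}")


def folder_for(issue=None):
    """Return the folder name that holds recordings with this issue label."""
    return FOLDER_BY_SEVERITY[severity(issue)]


print(folder_for("pause-mild"))     # data-moderate-issues
print(folder_for("noise-extreme"))  # data-severe-issues
print(folder_for())                 # data-few-issues
```

Raising on unknown labels, rather than defaulting to "none", makes typos in labels visible instead of silently treating a flawed recording as clean.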
We provide forced alignments for the data-few-issues recordings, generated with Interlingual-MFA. See the Alignments folder.
The recordings can be used for different purposes. We plan on using them for work on intonation phonetics and forced alignment. For phonetic studies, recordings with no issues or only moderate issues can be suitable, while recordings with severe issues are not recommended. For forced alignment, however, the recordings with severe issues might still be useful as a way to prevent overfitting or to accommodate noisy data.
The transcript files `.txt` are meant to make forced alignment tasks easier. The pronunciation dictionaries for Western Armenian and Eastern Armenian are likewise provided for forced alignment purposes.
The dataset is made available to the research community licensed under the GNU General Public License v3.0.
Feel free to contact us at [email protected] if you have any questions or concerns.