I generally followed the strategy in the Interlingual-MFA repo.
- I didn’t change any beam settings; I just used speaker adaptation.
- The original dictionary was converted to lowercase.
- I ran the Armenian data against english_mfa, spanish_mfa, french_mfa, and german_mfa. These models were trained on thousands of hours of data.
- I kept a copy of the phone_mapping file that I used. For the non-English models, more accurate mappings are probably possible; for example, Armenian /ʁ, ɾ, r/ are all mapped to French /ʁ/. You can see French’s and the other languages’ phone inventories here: Fr, Gr, Sp, En. (A small example of the mapping format is sketched right after this list.)
- I also ran an Armenian-based aligner (Hy) just for fun, again with no beam changes.
- Below are the snippets that I used.
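First, for reference, here is roughly what a few lines of a phone-mapping file might look like. The two-column, tab-separated format is an assumption of this sketch; the entries reflect the /ʁ, ɾ, r/ → /ʁ/ mapping mentioned above:

ʁ	ʁ
ɾ	ʁ
r	ʁ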
Assume we want to use the French MFA model to align the Eastern Armenian corpus.
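If the french_mfa acoustic model isn’t installed yet, it can be downloaded from the MFA model hub first (MFA 2.x syntax):

mfa model download acoustic french_mfa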
- Convert the dictionary: take a pre-existing pronunciation dictionary and a phone-mapping file as input, and produce an intermediate pronunciation dictionary:
python convertPronDict.py $OriginalPronunciationDictionary.tsv $PhoneMappingFile.txt $IntermediatePronunciationDictionary.txt
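convertPronDict.py itself isn’t reproduced in these notes; the following is a minimal sketch of what it plausibly does, assuming a tab-separated dictionary (word, then space-separated phones) and the two-column mapping-file format shown above. Passing unmapped phones through unchanged is a design choice of this sketch, not necessarily of the real script.

import sys

dict_in, mapping_in, dict_out = sys.argv[1], sys.argv[2], sys.argv[3]

# Load the Armenian -> French phone mapping (tab-separated pairs).
mapping = {}
with open(mapping_in, encoding="utf-8") as f:
    for line in f:
        src, dst = line.rstrip("\n").split("\t")
        mapping[src] = dst

# Rewrite each pronunciation with the mapped phones.
with open(dict_in, encoding="utf-8") as fin, \
        open(dict_out, "w", encoding="utf-8") as fout:
    for line in fin:
        word, pron = line.rstrip("\n").split("\t", 1)
        # The notes say the original dictionary was lowercased;
        # done here as well for safety.
        word = word.lower()
        mapped = " ".join(mapping.get(p, p) for p in pron.split())
        fout.write(f"{word}\t{mapped}\n")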
- Validate the intermediate pronunciation dictionary using the Eastern Armenian corpus:
mfa validate $Corpus $IntermediatePronunciationDictionary.txt french_mfa --ignore_acoustics
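(--ignore_acoustics skips feature extraction, so this pass mainly checks corpus structure and dictionary coverage, e.g. out-of-vocabulary words; drop the flag to run the acoustic checks as well.)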
- Align the Eastern Armenian corpus with the French intermediate pronunciation dictionary. Speaker IDs are the first 4 characters in a recording's filename:
mfa align $Corpus $IntermediatePronunciationDictionary.txt french_mfa /EA_Fr --clean --overwrite --speaker_characters 4
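This writes one TextGrid per recording into /EA_Fr; at this point the phone tier still carries French labels, which the next step maps back to Armenian.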
- Convert the alignments back from French to Armenian:
python convertAlignments.py wordTranscriptions.pkl /EA_Fr
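convertAlignments.py isn’t reproduced here either; below is a minimal sketch of one way such a conversion could work, not the actual script. It assumes wordTranscriptions.pkl maps each word to its original Armenian phone sequence (an assumption about the pickle’s contents) and relabels each word’s French phone intervals positionally, which only works when the phone counts line up. It uses the textgrid package (pip install textgrid).

import pickle
import sys
from pathlib import Path

import textgrid

EPS = 1e-4  # tolerance for comparing interval boundary times

pkl_path, align_dir = sys.argv[1], sys.argv[2]
with open(pkl_path, "rb") as f:
    # Assumed structure: {"word": ["hy_phone1", "hy_phone2", ...], ...}
    word_to_phones = pickle.load(f)

for tg_path in Path(align_dir).rglob("*.TextGrid"):
    tg = textgrid.TextGrid.fromFile(str(tg_path))
    words = tg.getFirst("words")
    phones = tg.getFirst("phones")
    for w in words:
        if not w.mark:  # skip silence/empty word intervals
            continue
        target = word_to_phones.get(w.mark.lower())
        if target is None:
            continue
        # Non-empty phone intervals that fall inside this word's span.
        span = [p for p in phones
                if p.minTime >= w.minTime - EPS
                and p.maxTime <= w.maxTime + EPS
                and p.mark]
        # Relabel positionally, but only when the counts match; the
        # French -> Armenian direction is otherwise ambiguous because
        # the mapping is many-to-one (e.g. /ʁ, ɾ, r/ -> /ʁ/).
        if len(span) == len(target):
            for interval, hy_phone in zip(span, target):
                interval.mark = hy_phone
    tg.write(str(tg_path))  # overwrite the TextGrid in place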
- It would be nice to measure inter-annotator agreement across these aligners’ outputs (see the sketch after this list).
- Perhaps the non-English models can be improved by one of you refining the phone-mapping files, using knowledge of the European languages’ phonetics with respect to English/Armenian.
- The pauses (the silence intervals the aligner inserts) may be problematic.
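As a starting point for the inter-annotator idea above, here is a minimal sketch that scores boundary agreement between two aligners’ outputs. It assumes both aligners produced TextGrids with a "phones" tier under parallel directory structures; the 20 ms tolerance is a common choice in alignment evaluation, and the positional pairing naively assumes both aligners emitted the same phone sequence per file.

import sys
from pathlib import Path

import textgrid

TOLERANCE = 0.02  # seconds

def phone_boundaries(tg_path):
    """End times of all non-empty intervals on the 'phones' tier."""
    tg = textgrid.TextGrid.fromFile(str(tg_path))
    return [iv.maxTime for iv in tg.getFirst("phones") if iv.mark]

dir_a, dir_b = Path(sys.argv[1]), Path(sys.argv[2])
hits = total = 0
for path_a in dir_a.rglob("*.TextGrid"):
    path_b = dir_b / path_a.relative_to(dir_a)
    if not path_b.exists():
        continue
    # Naive positional pairing; differing silence insertion between
    # models will throw this off, so treat the number as a rough score.
    for t_a, t_b in zip(phone_boundaries(path_a), phone_boundaries(path_b)):
        total += 1
        if abs(t_a - t_b) <= TOLERANCE:
            hits += 1

print(f"{hits}/{total} boundaries agree within {TOLERANCE * 1000:.0f} ms")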