This is the repository for the 2022-2023 Software Project (UE905 EC1) at IDMC (Nancy), under the supervision of Ajinkya KULKARNI, Esteban Marquer and Prof. Miguel Couceiro. The main objective of this project is to build an application that helps learners of French improve their pronunciation.
This project involved four students in the second year of the Master's degree in Natural Language Processing:
- Soklong HIM
- Nora LINDVALL
- Maxime MÉLOUX
- Jorge VASQUEZ-MERCADO
In this project, we aimed to create a tool that helps learners of French improve their pronunciation of French vowels. To do so, we created an application that allows users to record vowels. A classifier then determines whether the vowel is pronounced correctly or not. If the pronunciation is incorrect, the user is provided with personalized feedback. In order to find a good classifier, we implemented two approaches: a linguistic one, based on formant extraction, and a deep learning one, based on mel-spectrograms and using a convolutional neural network architecture. After initially testing both models on the All Vowels corpus, consisting of 5,755 vowels, we built a web application and tested it in real-life conditions. The linguistic model proved more robust to real-life recording conditions and achieved good performance in most cases.
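To make the formant-based idea more concrete, here is a minimal sketch of nearest-reference classification in (F1, F2) space. It is not the project's implementation: the function names are hypothetical, and the reference values are rough illustrative placeholders rather than the values used in our notebooks.

```python
import math

# Illustrative (F1, F2) reference values in Hz. These are rough placeholders
# chosen for this sketch, NOT the reference values used in the project.
REFERENCE_FORMANTS = {
    "i": (280.0, 2250.0),
    "a": (750.0, 1350.0),
    "u": (300.0, 750.0),
}


def predict_vowel(f1, f2, references=REFERENCE_FORMANTS):
    """Return the reference vowel closest to (f1, f2) in formant space."""
    return min(
        references,
        key=lambda vowel: math.dist((f1, f2), references[vowel]),
    )


def is_correct(f1, f2, target, references=REFERENCE_FORMANTS):
    """Treat a recording as correct if its nearest reference is the target."""
    return predict_vowel(f1, f2, references) == target
```

For instance, `is_correct(290.0, 2200.0, "i")` compares the measured formants against every reference and checks that the closest one is the target vowel. Plain Euclidean distance in Hz gives F2 more weight than F1 simply because its range is wider, so a real classifier would probably rescale or normalize the formants first.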
- `README.md`: this file contains important information about our project (you are here!).
- `Articles`: this folder contains all the research papers we used for the literature review at the start of the project.
- `Code`: this folder contains the Python code we used for our experiments and web application, in the form of Jupyter notebooks. In order of use:
  - `examples` contains sample recordings.
  The following are auxiliary/working files:
  - `vowel_plot.ipynb` generates plots of vowels in formant space for our informal corpus.
  - `extract_formant.ipynb` extracts formants from a given audio file.
  - `Formant_vowel_prediction.ipynb` predicts a vowel from a set of formants, using reference formant values.
  - `vowel_feedback_function.ipynb` generates feedback based on a perceived vs. desired vowel.
  - `Audio_Spectrogram.ipynb` generates standard and mel-spectrograms from a given input file.
  The following are the full implementations of our models:
  - `linguistic_model_reference_formants.ipynb` implements the first full prediction model using reference formants. This approach was later abandoned.
  - `all_vowel_extract.ipynb` extracts full words or vowels from the input dataset and stores them in individual files. Information about the vowel's quality, position and speaker is encoded in the file name.
  - `generate_mel_spectrograms.ipynb` generates all mel-spectrogram images from a given dataset.
  - `audio_crop.ipynb` contains the implementation of the neural vowel extractor, which takes as input the full recording of a word (with possible leading/trailing silence) and outputs only the cropped vowel.
  - `linguistic_model.ipynb` implements the final formant-based classifiers and compares their results.
  - `neural_network.ipynb` implements the final neural classifier and displays its results.
  - `Demo.ipynb` combines all the elements above to get a vowel prediction from a full recording, based on the chosen model.
  - `results_exploration.ipynb` generates quantitative data from the results of the final experimental setup.
- `Flask_VT`: this folder contains the Python and JavaScript code for our web application. In particular:
  - `models` contains the model files (same as in the top-level `models` folder).
  - `static` contains the front-end part of the application, such as `app.js` (final application) and `app_eval.js` (application module).
  - `templates` contains the HTML pages of the application, such as `index.html` (final interface), `eval.html` (application module) and `privacy.html` (privacy policy).
  - `app.py` and `app_eval.py` contain the Python code for the final application, based on `Code\Demo.ipynb`.
- `models`: this folder contains the binary files for our two main models, the neural network and the linguistic model. The linguistic model is not included due to size limitations, but can be re-trained and saved using `Code\linguistic_model.ipynb`.
- `presentation`: this folder contains all the slides we presented during regular class sessions.
- `report`: this folder contains our final report. If you want to view it on Overleaf, please click here.
- `requirements.txt`: this file lists all the libraries needed to run our code, including the web application.
- `Demo.mp4`: this is a short video demo of our web application, based on our model.
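As an illustration of the kind of feedback generation described for `vowel_feedback_function.ipynb` in the list above, the sketch below compares the articulatory features of the perceived and desired vowels and turns each difference into a tip. This is only an assumption about how such a function could look: the feature table covers just a few French oral vowels, and the function name and message wording are hypothetical, not taken from the notebook.

```python
# Standard articulatory descriptions of a few French oral vowels.
# height: 3 = close, 2 = close-mid (the scale is an assumption of this sketch).
FEATURES = {
    "i": {"height": 3, "front": True, "rounded": False},
    "y": {"height": 3, "front": True, "rounded": True},
    "u": {"height": 3, "front": False, "rounded": True},
    "e": {"height": 2, "front": True, "rounded": False},
    "ø": {"height": 2, "front": True, "rounded": True},
    "o": {"height": 2, "front": False, "rounded": True},
}


def generate_feedback(perceived, desired):
    """Return a list of tips for moving from the perceived to the desired vowel."""
    if perceived == desired:
        return ["Well done, the vowel was pronounced correctly."]
    p, d = FEATURES[perceived], FEATURES[desired]
    tips = []
    if p["rounded"] != d["rounded"]:
        tips.append("Round your lips more." if d["rounded"]
                    else "Spread your lips more.")
    if p["front"] != d["front"]:
        tips.append("Move your tongue further forward." if d["front"]
                    else "Pull your tongue further back.")
    if p["height"] != d["height"]:
        tips.append("Raise your tongue slightly." if d["height"] > p["height"]
                    else "Lower your tongue slightly.")
    return tips
```

A learner who produces /i/ when /y/ is expected would get a single tip about lip rounding, since the two vowels differ only in that feature.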
- Clone our repository:
git clone https://github.com/himsoklong/Vowel_Tuner.git
- In order to run our code, whether the notebooks in Code or our web app, we recommend using a virtual environment. This can be done by following the instructions on the Python website.
- Go into the project folder and install the needed packages with:
pip install -r requirements.txt
- Since the linguistic model is too large to be included in the repository, you can either re-train it or download the pre-trained model from here. After that, move the downloaded file to the models directory for the notebooks, and to the Flask_VT/models directory for the web application.
- To see our development process, you can check the notebooks in the Code directory. You can open them by running the following command in your terminal:
jupyter notebook
- You can also run our web application to try our model by going to Flask_VT and typing the following command in your terminal:
flask run
This project mostly uses the All Vowels dataset, a private dataset recorded at LORIA.
This section lists additional corpora of spoken French, for information purposes. If you want access to them, please contact their owners.
- CFPP2000: Parisian French corpus
- MPF: Multicultural Parisian French corpus
- CFPQ: Québec French corpus
- Rhapsodie: Spoken French, annotated for prosody and syntax
- The Dresden Corpus: 32 German children learning French
- FLLOC: Dutch and English native speakers (teenagers) learning French
- The Newcastle Corpus: British high schoolers learning French
- TCD Corpus: 5 L2 French children from different countries
- The Reading Corpus: 16-year-old Welsh pupils learning French
- PAROLE: 40 L2 French students from different countries