Skip to content

himsoklong/Vowel_Tuner

Repository files navigation

Vowel Tuner

This is the respository for the 2022-2023 Software Project (UE905 EC1) at IDMC (Nancy), under the supervision of Ajinkya KULKARNI, Esteban Marquer and Prof. Miguel Couceiro. The main objective of this project is building an application to help French learners to improve thier pronunciation.

Team members

This project involved four students in the second year of the Master's degree in Natural Language Processing:

  • Soklong HIM
  • Nora LINDVALL
  • Maxime MÉLOUX
  • Jorge VASQUEZ-MERCADO

Abstract

In this project, we aim to create a tool that can help learners of French to improve their pronunciation of French vowels. This was done by creating an application that allows users to record vowels. A classifier then determines whether the vowel is pronounced correctly or not. If the pronunciation is incorrect, the user is provided with personalized feedback. In order to find a good classifier, we implemented two approaches: a linguistic one, based on formant extraction, and a deep learning one, based on mel-spectrograms and using a convolutional neural network architecture. After initially testing both models on the All Vowels corpus, consisting of 5,755 vowels, we built a web application and tested it in real-life conditions. The linguistic model proved more robust to real-life recording conditions and achieved good performance in most cases.

Repository structure

  • README.md: this file contains important information for our project (you are here!).

  • Articles: this folder contain all the research papers we used for the literature review at the start of the project.

  • Code: this folder contains the Python code we used for our experiments and web application, in the form of Jupyter notebooks. In order of use:

    The following are auxillary/working files:

    The following are the full implementation of our models:

    • linguistic_model_reference_formants.ipynb implements the first full prediction model using reference formants. This approach was later abandoned.
    • all_vowel_extract.ipynb extracts full words or vowels from the input dataset and stores them into individual files. Information about the vowel's quality, position and speaker are encoded into the file name.
    • generate_mel_spectrograms.ipynb generates all mel-spectrogram images from a given dataset.
    • audio_crop.ipynb contains the implementation of the neural vowel extractor, which takes as input the full recording of a word (with possible leading/trailing silence) and outputs a cropped vowel only.
    • linguistic_model.ipynb implements the final formant-based classifiers, and compares their results.
    • neural_network.ipynb implements the final neural-based classifier, and displays its results.
    • Demo.ipynb combines all the elements above to get a vowel prediction from a full recording based on the chosen model.
    • results_exploration.ipynb generates quantitative data from the results of the final experimental setup.
  • Flask_VT: this folder is for our web application. It contains Python and JavaScript code for the web application. In particular:

    • models contains the model files (same as in models)
    • static contains the front-end part of the application, such as app.js (final application) and app_eval.js (application module)
    • templates contains the HTML pages of the application, such as index.html (final interface), eval.html (application module) and privacy.html (Privacy policy)
    • app.py and app_eval.py contain the Python code for the final application, based on Code\Demo.ipynb
  • models: This folder contains the binary files for our 2 main models, the neural network and linguistic models. The linguistic model is not included due to size limitations, but can be re-trained and saved using Code\linguistic_model.ipynb

  • presentation: In this folder are all the slides we presented during regular class sessions.

  • report: this folder contains our final report. If you want to check it on Overleaf. please click here

  • requirements.txt: this file contains all the libraries needed in order to run our code, including the web application.

  • Demo.mp4: this is a short video demo about our web application, based on our model.

Install instructions

  1. Clone our repository :
git clone https://github.com/himsoklong/Vowel_Tuner.git
  1. In other to run our code, including in Code or our Web app. we would recommend using a virtual environment. This can be done by following the instructions from the Python website
  2. Go into the project folder and install the needed packages with:
pip install -r requirements.txt
  1. Since the linguistic model is not small, you can either re-train it or download the pre-trained model from here. After that, move the downloaded file to the model directory for notebooks, and to the Flask_VT/models directory for the web application .

Usage

  1. To see our development process, you can check our code in the Code directory. You can run them from the terminal:
jupyter notebook
  1. You also run our web application to try our model by going to Flask_VT and typing the following command in your terminal:
flask run

Dataset information

This project mostly uses the All Vowels dataset, a private dataset recorded at LORIA.

This section contains additional corpora recorded from French speakers, for informative purposes. If you want to get access to them, please contact the owners.

L1:

  1. CFPP2000: Parisian French corpus
  2. MPF: Multicultural Parisian French corpus
  3. CFPQ: Québec French corpus
  4. Rhapsodie: Spoken French, annotated for prosody and syntax

L2:

  1. The Dresden Corpus: 32 German children learning French
  2. FLLOC: Dutch and English native speakers (teenagers) learning French
  3. The Newcastle Corpus: British high schoolers learning French
  4. TCD Corpus: 5 L2 French children from different countries
  5. The Reading Corpus: 16-year-old Welsh pupils learning French
  6. PAROLE: 40 L2 French students from different countries

Other:

  1. librivox

About

This is software project respository.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 4

  •  
  •  
  •  
  •