Mshivam2409/ForcedAlignment

Forced Audio Alignment

Installation

Source

  • Install Kaldi.
  • Set KALDI_ROOT in path.sh.
  • Install SRILM and ensure the SRILM binaries are in PATH.
  • Download and build the pretrained Kaldi model using build_model.sh. This repo currently relies on the ASpIRE Chain model. To use a different model, replace it with one from here in build_model.sh.
  • Install the Python dependencies from requirements.txt.
  • Make the directories for storage.
    mkdir -p s3/text s3/audio s3/faligned
    
  • Run the Python server using python server.py
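
The directory and PATH steps above can also be checked from Python before starting the server. This is only a minimal sketch and is not part of the repository: the function names are hypothetical, and it assumes that finding ngram-count (one of the SRILM command-line tools) on PATH is a reasonable sign that SRILM is installed.

```python
import shutil
from pathlib import Path

# Storage directories from the mkdir step above.
STORAGE_DIRS = ("s3/text", "s3/audio", "s3/faligned")


def make_storage_dirs(base="."):
    """Create the storage directories under `base`, like `mkdir -p`."""
    created = []
    for rel in STORAGE_DIRS:
        path = Path(base) / rel
        path.mkdir(parents=True, exist_ok=True)  # no error if it already exists
        created.append(path)
    return created


def srilm_on_path(which=shutil.which):
    """Return True if the SRILM tool `ngram-count` can be found on PATH.

    Assumption: checking a single SRILM binary is enough for a quick sanity check.
    `which` is injectable so the check can be exercised without SRILM installed.
    """
    return which("ngram-count") is not None
```

Calling make_storage_dirs() from the repository root and then srilm_on_path() gives a quick signal before running python server.py. It does not verify Kaldi itself.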

Docker

  • Build and run the provided Dockerfile. The build can take a long time, and prebuilt images such as the one in packages may not work on a different platform.

Usage

  • The server runs on port 5000 by default. (Remember to publish this port when using Docker.)
  • Navigate to http://localhost:5000 to view the HTML form.
  • Enter the quiz bowl text, select the audio file, and click on upload.
  • Also add a unique id to reference this question.
  • After a successful response, enter the same id in the play audio input box and click on play. The text appears as the audio is spoken, with the current word highlighted.
  • Unknown words appear as [noise].
  • A .vtt file is generated for every audio file. It is a standard WebVTT subtitle file, so it can be reused in other players and tools.
  • You can use the audio and transcript in the example folder for testing.
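
To illustrate how a generated .vtt file can drive the word highlighting described above, here is a minimal sketch that parses WebVTT cues and looks up the cue active at a given playback time. It is not the code used by this repo's frontend. The helper names are hypothetical, and it assumes each cue carries one word (or [noise]), which may not match the actual output.

```python
import re

# Matches a WebVTT cue timing line such as "00:00:01.500 --> 00:00:02.000".
# The hours field is optional in WebVTT.
CUE_TIMING = re.compile(
    r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})\s+-->\s+(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})"
)


def _seconds(hours, minutes, seconds, millis):
    """Convert timestamp fields (strings, hours may be None) to seconds."""
    return int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def parse_vtt(text):
    """Return a list of (start, end, payload) tuples with times in seconds."""
    cues = []
    lines = text.splitlines()
    i = 0
    while i < len(lines):
        match = CUE_TIMING.match(lines[i].strip())
        if match is None:
            i += 1  # header, blank line, or cue identifier: skip it
            continue
        groups = match.groups()
        start, end = _seconds(*groups[:4]), _seconds(*groups[4:])
        i += 1
        payload = []
        # The cue payload runs until the next blank line.
        while i < len(lines) and lines[i].strip():
            payload.append(lines[i].strip())
            i += 1
        cues.append((start, end, " ".join(payload)))
    return cues


def active_cue(cues, t):
    """Return the payload of the first cue with start <= t < end, or None."""
    for start, end, payload in cues:
        if start <= t < end:
            return payload
    return None
```

A player would call active_cue(cues, current_time) on each time update and highlight the returned word.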


Demo

The demo uses the files in the example folder.



Personal Notes

Qanta & QB-Interface

  • I was able to get qanta up and running, and fixed some issues along the way:

    • Qanta Config Mismatch here

    • NLTK Download here

    • StopIteration error for torchtext in Python ^3.7

  • I achieved an accuracy of ~47% on the DAN guesser. The notebook is available here as an example installation of qanta.

  • I tried integrating with the interface, but I ran into several issues (partly due to the absence of a README) while running qanta and qb_interface. Some of them were:

    • Twisted Web now only requires the path of the folder, i.e. web, instead of HTML for these lines.

    • The db.sqlite downloaded using dataset.py has only 7 columns, causing this to fail. I started with an empty SQLite db, which got past this error, but I then ran into more.

  • The work here is modular: it adds 3 routes to a web server, and the frontend is built with ReactJS, so integration only requires importing the additional scripts and adding a <div id=root> tag for React to render into. I believe it could therefore be easily integrated into qb_interface once the above issues are fixed.

  • The qb_interface uses the WebSocket protocol for transferring text data. However, I believe WebSockets are not well suited to transferring audio, owing to higher latency. A protocol designed for media, such as WebRTC or MPEG-DASH, could be used instead.
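
For the db.sqlite column mismatch noted above, a quick way to see what the downloaded file actually contains is to list every table with its columns. This is a minimal sketch using Python's standard sqlite3 module. The function name is hypothetical, and the table names in qanta's database are not assumed, so it simply enumerates whatever is there.

```python
import sqlite3


def table_columns(db_path):
    """Map each table name in the SQLite file to its list of column names."""
    conn = sqlite3.connect(db_path)
    try:
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        columns = {}
        for table in tables:
            # PRAGMA table_info returns one row per column; index 1 is the column name.
            quoted = table.replace('"', '""')
            info = conn.execute(f'PRAGMA table_info("{quoted}")').fetchall()
            columns[table] = [row[1] for row in info]
        return columns
    finally:
        conn.close()
```

Running table_columns("db.sqlite") and comparing the column counts against what the failing code expects should show which table is short.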
