Forced Audio Alignment

Installation

Source

Install Kaldi.
Set KALDI_ROOT in path.sh.
Install SRILM and ensure the SRILM binaries are in PATH.
Download and build the pretrained Kaldi model using build_model.sh. This repo currently relies on the ASpIRE Chain Model. To use a different model, replace it in build_model.sh.
Install the Python dependencies from requirements.txt.
Create the storage directories.
```
mkdir -p s3/text s3/audio s3/faligned
```
Start the server with python server.py.
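
Before starting the server, it can help to verify the prerequisites from the steps above. The following is a minimal sketch, not part of this repo: the `preflight` helper is hypothetical, it assumes KALDI_ROOT has been exported into the environment (for example by sourcing path.sh), and it uses `ngram-count`, one of the SRILM command-line tools, as the binary to look for.

```python
import os
import shutil
from pathlib import Path

# Storage directories taken from the mkdir step above.
STORAGE_DIRS = ("s3/text", "s3/audio", "s3/faligned")


def preflight(base_dir=".", env=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    problems = []

    # Assumption: KALDI_ROOT is visible in the environment, e.g. after sourcing path.sh.
    kaldi_root = env.get("KALDI_ROOT")
    if not kaldi_root or not Path(kaldi_root).is_dir():
        problems.append("KALDI_ROOT is unset or does not point to a directory")

    # ngram-count ships with SRILM; its absence suggests SRILM is not on PATH.
    if which("ngram-count") is None:
        problems.append("SRILM binary 'ngram-count' not found on PATH")

    # Create the storage directories if missing (same effect as mkdir -p).
    for rel in STORAGE_DIRS:
        (Path(base_dir) / rel).mkdir(parents=True, exist_ok=True)

    return problems
```

Calling `preflight()` from the repository root returns an empty list when both checks pass; otherwise each entry describes one missing prerequisite.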

Docker

Build and run the provided Dockerfile. The build can take a long time. Prebuilt images, such as the one under Packages, may not work on a different platform.

Usage

The server runs on port 5000 by default. (Remember to publish this port when using Docker.)
Navigate to http://localhost:5000 to view the HTML form.
Enter the quiz bowl text, select the audio file, and click upload.
Also add a unique id to reference this question.
After a successful response, enter the same id in the play audio input box and click play. The text appears as the audio is spoken, with the current word highlighted.
Unknown words appear as [noise].
A .vtt subtitle file is generated for every audio file, so the alignment can be reused in any player that supports the WebVTT format.
You can use the audio and transcript in the example folder for testing.
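
To show how the generated .vtt files could drive the word highlighting outside the browser, here is a minimal sketch of a WebVTT cue reader. The helper names (`parse_vtt`, `active_cue`) and the sample cues are hypothetical, and the one-word-per-cue layout is an assumption; the files produced by this repo may be structured differently.

```python
import re

# Matches a WebVTT cue timing line; the hours field is optional in the format.
TIMING = re.compile(
    r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})\s+-->\s+(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})"
)


def _seconds(hours, minutes, seconds, millis):
    """Convert the captured timestamp fields to seconds."""
    return int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def parse_vtt(text):
    """Return a list of (start, end, payload) tuples, with times in seconds."""
    cues = []
    lines = text.splitlines()
    i = 0
    while i < len(lines):
        match = TIMING.match(lines[i].strip())
        if match is None:
            i += 1
            continue
        fields = match.groups()
        start, end = _seconds(*fields[:4]), _seconds(*fields[4:])
        # The payload is every line up to the next blank line.
        i += 1
        payload = []
        while i < len(lines) and lines[i].strip():
            payload.append(lines[i].strip())
            i += 1
        cues.append((start, end, " ".join(payload)))
    return cues


def active_cue(cues, t):
    """Return the payload of the cue covering time t (seconds), or None."""
    for start, end, payload in cues:
        if start <= t < end:
            return payload
    return None


# Made-up sample, assuming one word per cue.
SAMPLE = """WEBVTT

00:00:00.000 --> 00:00:00.420
this

00:00:00.420 --> 00:00:00.900
[noise]
"""

cues = parse_vtt(SAMPLE)
print(active_cue(cues, 0.5))  # prints "[noise]"
```

A player would call `active_cue` with the current playback position on every tick and highlight the returned word.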

Demo

The demo uses the files in the example folder.

View

Personal Notes

Qanta & QB-Interface

I was able to get qanta up and running, and fixed some issues along the way:
- Qanta Config Mismatch here
- NLTK Download here
- StopIteration error for torchtext in python ^3.7
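
The notes do not describe the torchtext error in detail. Assuming it is the commonly reported "generator raised StopIteration" failure, the cause is PEP 479: from Python 3.7 on, a StopIteration that escapes a generator body is converted into a RuntimeError. The sketch below reproduces the behaviour with hypothetical functions (it is not torchtext code) and shows the usual fix.

```python
def first_items_broken(iterators):
    """Yield the first item of each iterator (fails on Python 3.7+)."""
    for it in iterators:
        # next() raises StopIteration on an empty iterator; inside a generator,
        # Python 3.7+ turns that into RuntimeError (PEP 479).
        yield next(it)


def first_items_fixed(iterators):
    """Same idea, but end the generator explicitly instead of leaking StopIteration."""
    for it in iterators:
        try:
            yield next(it)
        except StopIteration:
            return


def make_data():
    """Build fresh iterators; the second one is empty on purpose."""
    return [iter([1, 2]), iter([]), iter([3])]


try:
    list(first_items_broken(make_data()))
except RuntimeError as exc:
    print("broken:", exc)

print("fixed:", list(first_items_fixed(make_data())))  # fixed: [1]
```

Catching StopIteration and returning restores the pre-3.7 behaviour, where the generator simply ends at the first exhausted iterator.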
I achieved an accuracy of ~47% on the DAN guesser. The notebook is available here as an example installation of qanta.
I tried integrating with the interface, but I ran into several issues (partly due to the absence of a readme) while running qanta and qb_interface. Some of them were:
- Twisted Web now only requires the path of the folder, i.e. web, instead of the HTML file for these lines.
- The db.sqlite downloaded using dataset.py has only 7 columns, causing this to fail. Starting with an empty sqlite db got me past this error, but I then ran into further ones.
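
A column mismatch like the one in db.sqlite can be confirmed before running the interface by listing the schema of the downloaded file. This is a minimal sketch using Python's built-in sqlite3 module; the `table_columns` helper is hypothetical, and the table names in the real db.sqlite are not known here, so it simply reports every table it finds.

```python
import sqlite3


def table_columns(db_path):
    """Map each table name in the SQLite file to its list of column names."""
    conn = sqlite3.connect(db_path)
    try:
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table'"
            )
        ]
        return {
            # PRAGMA table_info returns one row per column; index 1 is the name.
            table: [col[1] for col in conn.execute(f'PRAGMA table_info("{table}")')]
            for table in tables
        }
    finally:
        conn.close()
```

Running `table_columns("db.sqlite")` and comparing the column counts with what the interface code expects shows which table is short of columns.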
The work here is modular: the backend adds only 3 routes to a web server, and the frontend is built with ReactJS, so integrating it only requires importing the additional scripts and adding a <div id=root> tag for React to render into. I therefore believe it could be easily integrated into qb_interface once the above issues are fixed.
The qb_interface uses the WebSocket protocol to transfer text data. However, I believe WebSockets are not well suited to transferring audio because of higher latency. A protocol designed for media, such as WebRTC or MPEG-DASH, could be used instead.
