- Install Kaldi.
- Set up `KALDI_ROOT` in `path.sh`.
- Install SRILM and ensure the SRILM binaries are in `PATH`.
- Download and build the pretrained Kaldi model using `build_model.sh`. This repo currently relies on the ASpIRE Chain model. You can replace the model from here in `build_model.sh`.
- Install the Python dependencies from `requirements.txt`.
- Make the directories for storage: `mkdir -p s3/text s3/audio s3/faligned`
- Run the Python server using `python server.py`.
- Build and run the provided Dockerfile. (It can take a long time to build. Prebuilt images, like the one in packages, may not work due to platform differences.)
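
For the manual installation, the prerequisites listed above can be sanity-checked before starting the server. The following is a minimal sketch and not part of this repo: `preflight` and `STORAGE_DIRS` are hypothetical names, it assumes `KALDI_ROOT` is exported to the environment (not only set inside `path.sh`), and it assumes SRILM's `ngram` tool is a reasonable binary to look for on `PATH`.

```python
import os
import shutil
from pathlib import Path

# Storage directories from the installation steps.
STORAGE_DIRS = ["s3/text", "s3/audio", "s3/faligned"]


def preflight(base_dir=".", srilm_binary="ngram"):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []

    # Assumption: KALDI_ROOT is exported, not just defined inside path.sh.
    kaldi_root = os.environ.get("KALDI_ROOT")
    if not kaldi_root:
        problems.append("KALDI_ROOT is not set")
    elif not Path(kaldi_root).is_dir():
        problems.append(f"KALDI_ROOT does not point to a directory: {kaldi_root}")

    # Assumption: `ngram` stands in for "the SRILM binaries are on PATH".
    if shutil.which(srilm_binary) is None:
        problems.append(f"SRILM binary '{srilm_binary}' not found on PATH")

    # Equivalent of `mkdir -p s3/text s3/audio s3/faligned`.
    for rel in STORAGE_DIRS:
        Path(base_dir, rel).mkdir(parents=True, exist_ok=True)

    return problems
```

Calling `preflight()` from the repo root creates the storage directories if they are missing and returns any remaining problems as plain strings.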
- The server runs on port `5000` by default. (Remember to bind it when using Docker.)
- Navigate to http://localhost:5000 to view the HTML form.
- Enter the quiz bowl text as well as the audio file and click on upload.
- Also add a unique id to reference this question.
- Upon a successful response, enter the same id in the play audio input box and click on play. The text appears as the audio is spoken, with the current word highlighted.
- Unknown words appear as [noise].
- You can check out the `.vtt` files generated for every audio file; they are in the universal `vtt` subtitle format for easy use.
- You can use the audio and transcript in the `example` folder for testing.
It uses the files given in the `example` folder.
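
Because the generated `.vtt` files are plain WebVTT, they can be read outside the browser as well. The sketch below is illustrative only: `parse_vtt` and `cue_at` are hypothetical helpers, not part of this repo, and the sample assumes one word per cue, which may not match the exact layout the server writes.

```python
import re

# Matches a WebVTT cue timing line; the hours field is optional in the format.
CUE_TIMING = re.compile(
    r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})\s+-->\s+(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})"
)


def _seconds(hours, minutes, seconds, millis):
    """Convert timestamp fields (as strings, hours may be None) to seconds."""
    return int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def parse_vtt(text):
    """Return a list of (start_seconds, end_seconds, cue_text) tuples."""
    cues = []
    # Cues are separated by blank lines; the WEBVTT header block has no timing line.
    for block in re.split(r"\n\s*\n", text.replace("\r\n", "\n").strip()):
        lines = block.split("\n")
        for i, line in enumerate(lines):
            match = CUE_TIMING.match(line.strip())
            if match:
                groups = match.groups()
                start = _seconds(*groups[:4])
                end = _seconds(*groups[4:])
                cues.append((start, end, " ".join(lines[i + 1:]).strip()))
                break
    return cues


def cue_at(cues, t):
    """Return the text of the cue active at time t (in seconds), or None."""
    for start, end, text in cues:
        if start <= t < end:
            return text
    return None


# Hypothetical sample; real files come from the server's alignment output.
sample = """WEBVTT

00:00:00.000 --> 00:00:00.450
This

00:00:00.450 --> 00:00:01.200
[noise]
"""

print(cue_at(parse_vtt(sample), 0.5))  # prints "[noise]"
```

A lookup like `cue_at` mirrors what the player does when it highlights the current word during playback.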
- I was able to get qanta up and running and fixed some issues along the way.
- I achieved an accuracy of ~47% on the DAN guesser. The notebook is available here as an example installation of qanta.
- I tried integrating with the interface, but I ran into several issues (partly due to the absence of a README) while running qanta and qb_interface. Some of them were:
  - Twisted Web now only requires the path of the folder, i.e. `web` instead of HTML, for these lines.
  - The db.sqlite downloaded using dataset.py has only 7 columns, causing this to fail. I started with an empty sqlite db, which got past this error, but I then ran into more.
- The work here is modular: it adds 3 routes to a web server, and the frontend is built with ReactJS, so integration only requires importing the additional scripts and adding a `<div id=root>` tag for React to render into. So I believe it could be easily integrated into qb_interface once the above issues are fixed.
- The qb_interface uses the WebSocket protocol for transferring `text` data. However, I believe WebSockets are not that efficient for transferring audio owing to higher latency. A better-suited protocol like WebRTC/MPEG-DASH could be implemented for the same.