korp to speechdb #5

Open
fbanados opened this issue Sep 20, 2024 · 4 comments

@fbanados
Member

Access the exact speech-db entries from searches in Korp.

@fbanados
Member Author

@aarppe This works for words:
[Screenshot, 2024-09-20, 5:38:09 PM]

It also works for lemmas, by clicking the fish that appears next to the lemma (I will change the formatting next week).

@aarppe

aarppe commented Sep 20, 2024

Great to see that works in principle!

In practice, what I had in mind is linking the passages in Korp to the original recordings those transcriptions are based on (rather than generally finding a recording for an individual word). Of course, that means that we have to get the actual original recordings and then upload them to Korp (which only applies to some of the texts).

@fbanados
Member Author

That can be done. Special fields can also be added per corpus, so we can have links to speech-db only for those corpora, once their data is up.

Would you like me to remove the current speech-db fields from the korp instance?
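
To illustrate the per-corpus idea, here is a minimal sketch in Python. It is not Korp's actual configuration format: the corpus identifiers, the `SPEECH_DB_ENABLED` set, the placeholder base URL, and the `speech_db_link` helper are all hypothetical names chosen for illustration.

```python
from typing import Optional
from urllib.parse import quote

# Hypothetical: identifiers of corpora whose audio is already in speech-db.
SPEECH_DB_ENABLED = {"corpus-with-audio"}

# Hypothetical placeholder, not the real speech-db address.
SPEECH_DB_BASE = "https://speech-db.example/search"


def speech_db_link(corpus_id: str, query: str) -> Optional[str]:
    """Return a speech-db link for `query`, or None if the corpus has no audio yet."""
    if corpus_id not in SPEECH_DB_ENABLED:
        return None
    # Percent-encode the query so non-ASCII word forms survive in the URL.
    return f"{SPEECH_DB_BASE}?q={quote(query)}"
```

With this shape, a corpus only starts showing links once its identifier is added to the set, so nothing changes for corpora without audio.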

@aarppe

aarppe commented Sep 23, 2024

Let us keep that feature there for now, so that it can be used for demonstrations. Eventually, we'd want to have that feature available primarily for the original audio (or a rerecorded substitute).

Actually, currently the only "corpus" for which we have all the (original) audio is the example sentences in the Maskwacîs collection, plus, come to think of it, parts of the (Mason) Bible that we haven't yet made available via Korp. The Bloomfield materials were collected about a hundred years ago, with Bloomfield writing down what he heard, so no original audio exists. For the Ahenakew-Wolfart corpus, original audio exists, but we have received it for only a few texts, and that unofficially from Arden. In both cases, we would need to undertake a respeaking of the contents, which is something we discussed with Arden during the summer, potentially having Dolores Sand read out some of the texts.
