
For deaf users #47

Open
lkrata opened this issue Nov 19, 2024 · 4 comments

Comments

@lkrata

lkrata commented Nov 19, 2024

Hello, I can't hear anything, and I use Mimiuchi to convert any speech into written form (https://www.youtube.com/watch?v=t-OpwT7mZlU&t=8s). For the past week, the recognized text has been jumping around terribly as it gets corrected. That is fine for someone who can hear and works with the text afterwards.
But a deaf person reads the most recently recognized text, fills in any errors from context, and does not go back to the corrected version. It would be great if these automatic corrections could optionally be turned off. I have the interim text set to the same color (#FFFFFF) as the final text. It just jumps around terribly and is impossible to read.
Thanks, Kráťa www.kochlear.cz

@naeruru
Owner

naeruru commented Nov 19, 2024

One quick fix is to turn the "alpha" of the interim text color all the way down to 0. See the picture.

[image]

@lkrata
Author

lkrata commented Nov 20, 2024

I tried that, but then no text is displayed at all.

@naeruru
Owner

naeruru commented Nov 20, 2024

Ah, I think I understand now. This is a limitation of the Web Speech API (which I currently use to transcribe audio). It sends back what it THINKS someone is saying while they are still speaking, so it may correct itself as the speaker continues and it gets more context. I don't control the text that the Web Speech API returns, and there is no way for me to tell whether it is correct, because the API doesn't know either. I can double-check the outputs, but unfortunately I don't think I will get anywhere with it.

One future solution is another speech-to-text library that I plan to include, which might be a lot more accurate (but uses more resources and runs on your GPU).
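For context on the interim/final distinction discussed above: the browser's Web Speech API reports recognized speech as `SpeechRecognitionResult` objects, each with an `isFinal` flag. Non-final (interim) results may be replaced as recognition continues, while final ones are not updated. A minimal sketch, assuming a display that should only ever show text that will no longer change, is to render just the final results (or to set the recognizer's `interimResults` property to `false`). This is illustrative TypeScript, not mimiuchi's actual code; the helper `finalTranscript`, the `*Like` interfaces, and the `render` function in the comments are hypothetical.

```typescript
// Hypothetical structural stand-ins for the Web Speech API result types,
// so this sketch compiles without the DOM lib. In a browser,
// SpeechRecognitionEvent.results has this shape.
interface AlternativeLike {
  readonly transcript: string;
}

interface ResultLike {
  readonly isFinal: boolean;
  readonly length: number;
  readonly [index: number]: AlternativeLike;
}

interface ResultListLike {
  readonly length: number;
  readonly [index: number]: ResultLike;
}

// Join only the segments the recognizer has marked final; interim
// hypotheses (isFinal === false) are skipped because they may still change.
function finalTranscript(results: ResultListLike): string {
  const parts: string[] = [];
  for (let i = 0; i < results.length; i++) {
    const result = results[i];
    // Alternative 0 is the recognizer's most likely transcript.
    const best = result[0];
    if (result.isFinal && best !== undefined) {
      parts.push(best.transcript.trim());
    }
  }
  return parts.join(" ");
}

// Hypothetical browser wiring (assumption, not taken from mimiuchi):
//   const Ctor = window.SpeechRecognition ?? window.webkitSpeechRecognition;
//   const rec = new Ctor();
//   rec.continuous = true;
//   rec.interimResults = false; // ask the API for final results only
//   rec.onresult = (event) => render(finalTranscript(event.results));
```

The trade-off is latency: text appears only once the recognizer finalizes a segment, so captions arrive somewhat later but do not jump around after they are shown.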

@lkrata
Author

lkrata commented Nov 21, 2024

As recently as a month ago, it didn't jump around like this (see https://www.youtube.com/watch?v=t-OpwT7mZlU).

mimiuchi-poskakuje.mp4

Is there any way for the user to turn this correcting off?
