Publishing a PyPI module #6
Hi! Your project looks really interesting, I should try it out. At first I thought that pymorphy2 might be enough as well, but after reading the paper I realized that it apparently does not consider the context of words, so unlike spaCy it can't detect plural/singular for words where this depends solely on context (like лица), which would reduce the accuracy of the stresser a lot. So I probably can't do without spaCy. I really want to publish the module on PyPI, but first I planned to rework the database to make it smaller (right now it is needlessly large), optimize the performance, and probably add some additional data. But it will definitely happen.
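For context, here is a minimal sketch of how that ambiguity shows up in both libraries. It assumes pymorphy2 and spaCy with a Russian model such as ru_core_news_sm are installed; the example sentences are just illustrations.

```python
# Contrast context-free and context-aware analysis of the ambiguous form
# "лица" (genitive singular of "лицо" vs. nominative plural).
import pymorphy2
import spacy

morph = pymorphy2.MorphAnalyzer()

# pymorphy2 analyzes the word in isolation: it returns every possible parse,
# so both singular and plural readings appear and context cannot resolve them.
for parse in morph.parse("лица"):
    print(parse.tag.number, parse.tag.case, parse.score)

# spaCy tags the same form inside a sentence, so the surrounding words
# can disambiguate between the singular and plural readings.
nlp = spacy.load("ru_core_news_sm")  # model name is an example choice
for sentence in ("Выражение лица не изменилось.", "Лица в толпе были незнакомы."):
    doc = nlp(sentence)
    for token in doc:
        if token.text.lower() == "лица":
            print(sentence, "->", token.morph.get("Number"))
```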
What is the database? Does it generate a database beforehand, and then use it to label word stress?
Also, is there a reason why fb2 cannot be supported directly? It is simply an XML file with the text in it, arguably much simpler than epub.
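To illustrate that point, here is a minimal sketch of pulling paragraph text directly out of an FB2 file with the standard library. The namespace URI is the commonly used FictionBook 2.0 one; treat it as an assumption and adjust for the files you actually have.

```python
# Minimal FB2 text extraction: FB2 is plain XML, so ElementTree is enough.
import xml.etree.ElementTree as ET

FB2_NS = "{http://www.gribuser.ru/xml/fictionbook/2.0}"  # assumed namespace

def extract_fb2_paragraphs(path):
    """Yield the text of every <p> element inside the book's <body>."""
    root = ET.parse(path).getroot()
    for body in root.iter(FB2_NS + "body"):
        for p in body.iter(FB2_NS + "p"):
            # itertext() collects text from nested inline tags such as <emphasis>.
            text = "".join(p.itertext()).strip()
            if text:
                yield text

# Example usage (the path is hypothetical):
# for paragraph in extract_fb2_paragraphs("book.fb2"):
#     print(paragraph)
```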
You are right about the FB2 support, it looks reasonable, so I have added it to my TODO list. The database is generated by my other project (you can find the more stable, tested version in the releases here, I think). I also use this database to create a StarDict dictionary (link in this post), which your program supports if I read correctly, so it would probably be really cool if this all worked together. Hosting it on a server is a good idea. Future versions will probably be faster as well.
Interesting. I have also made one such dictionary from the kaikki.org dump, though a simple version with only the definitions (no examples). Also, there seems to be significant overlap in the work we do :-)
Yeah, the Kaikki data is really great. In my version I tried to get all the inflections and link them up properly with the definitions (which is sometimes complicated when there are links you would have to click several times to get to the original definition in Wiktionary, like with some diminutives). I also spent quite a long time trying to add the OpenRussian data, which has some additional words. I don't have examples or parts of speech either, though.
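Since the dumps came up, here is a rough sketch of linking inflected forms in a kaikki.org JSONL dump back to their lemma's glosses. The field names ("word", "forms", "senses", "glosses") follow typical wiktextract/kaikki output; treat them and the file name as assumptions and check against your dump.

```python
# Build an index from each inflected form to (lemma, glosses) pairs.
import json

def index_forms(dump_path):
    """Map every listed form to its lemma and that lemma's glosses."""
    form_index = {}
    with open(dump_path, encoding="utf-8") as f:
        for line in f:  # one JSON object per line in kaikki dumps
            entry = json.loads(line)
            lemma = entry.get("word")
            glosses = [
                gloss
                for sense in entry.get("senses", [])
                for gloss in sense.get("glosses", [])
            ]
            for form in entry.get("forms", []):
                spelling = form.get("form")
                if spelling:
                    form_index.setdefault(spelling, []).append((lemma, glosses))
    return form_index

# index = index_forms("kaikki.org-dictionary-Russian.json")  # path is an example
# print(index.get("лица"))
```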
I didn't get around to updating the dictionaries yet, so it might download old versions of them, but the package has been installable. I thought about it a bit, and in principle it should be pretty easy to host everything on a server; I just have no experience with Docker and getting it onto a VPS and everything, so that would be the largest challenge. And I didn't get around to optimizing it fully yet, but at least I ran some benchmarks, and there really does not appear to be a better option that I am aware of 🙂.
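For anyone who wants to repeat such a measurement, a minimal timing sketch might look like the following. stress_text() is a hypothetical stand-in for whatever entry point the package actually exposes; substitute the real function.

```python
# Rough throughput benchmark for a stress-marking function.
import time

def benchmark(stress_text, sentences, repeats=3):
    """Return sentences per second for the best of several timed runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for sentence in sentences:
            stress_text(sentence)  # hypothetical API, replace with the real one
        best = min(best, time.perf_counter() - start)
    return len(sentences) / best

# sample = ["Мама мыла раму."] * 1000
# print(benchmark(my_stresser.stress_text, sample))  # names are hypothetical
```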
Hello!
I am the maintainer of VocabSieve. It would be great if you could publish this to PyPI for programmatic use. All the existing ones (like russtress) do not consider context and make mistakes quite often.
On another note, is it really necessary to have spacy for this? It is a rather large dependency and in my experience works somewhat slowly. Have you tried pymorphy2? It seems to be able to tag words too.