Repository for Frequency Word List Generator and processed files
In the early days, I hosted the generated files on OneDrive, with my blog https://invokeit.wordpress.com/frequency-word-lists/ linking to them. Going forward, both the code and the generated output are hosted on GitHub.
The data used to generate these lists can be found at http://opus.lingfil.uu.se/OpenSubtitles2016.php
Frequency lists are in the format {word}{space}{number_of_occurrences_in_corpus}, one entry per line. For example, in the file en_50k.txt:
you 22484400
i 19975318
the 17594291
to 13200962
...
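As a minimal sketch of how a file in this format might be read, the Python snippet below parses lines into (word, count) pairs. The function name `parse_frequency_lines` is hypothetical and not part of this repository, and the snippet assumes each line holds a single word, one space, and an integer count, as in the sample above.

```python
from typing import Iterable, Iterator, Tuple


def parse_frequency_lines(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield (word, count) pairs from lines shaped like 'word count'.

    Hypothetical helper: assumes the {word}{space}{count} layout shown above.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the last space so the count is always the final field.
        word, _, count = line.rpartition(" ")
        if not word:
            continue  # no space found, so the line does not match the format
        try:
            yield word, int(count)
        except ValueError:
            continue  # the final field is not an integer


# The first four entries of en_50k.txt, as quoted above.
sample = ["you 22484400", "i 19975318", "the 17594291", "to 13200962"]
frequencies = dict(parse_frequency_lines(sample))
```

To read a whole list, the same function could be given an open file object, for instance `open("en_50k.txt", encoding="utf-8")`. Both the path and the UTF-8 encoding are assumptions here.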
These data are reused by various widely used open-source projects, including Wikipedia, input methods, and autocomplete keyboards.
MIT License for the code.
CC-BY-SA-3.0 for the content.
If you would like to contribute to my project, you can donate using the PayPal button.