An Integrated Corpus Tool With Multilingual Support for the Study of Language, Literature, and Translation
Updated Dec 5, 2024 - Python
My book list
A Curated List of Dataset and Usable Library Resources for NLP in Bahasa Indonesia
Curated list of open-access/open-source/off-the-shelf resources and tools developed with a particular focus on German
A list of Indonesian NLP resources.
A web-based engine for creating and annotating textual corpora
Data resources for Indonesian NLP
A curated list of NLP resources for Hungarian
Crawler for linguistic corpora
🕷️ The pipeline for the OSCAR corpus
Kanji usage frequency data collected from various sources
Data for the quantitative study of (Vedic) Sanskrit
An asynchronous concurrent pipeline for classifying Common Crawl based on fastText's pipeline.
Quran, Hadith, Translations, Tafaseer, Corpus Linguistics. Everything for NLP
An advanced, extensible web front-end for the Manatee-open corpus search engine
Large silver-standard Russian corpus with NER, morphology, and syntax markup
A textual corpus database for the digital humanities.
SpeCT - Speech Corpus Toolkit for Praat. Documentation: https://lennes.github.io/spect/
My solutions to selected exercises from "Natural Language Processing with Python – Analyzing Text with the Natural Language Toolkit" by Steven Bird, Ewan Klein, and Edward Loper.
A set of workflows for corpus building through OCR, post-correction and normalisation