4.0.0
✨ Features
Queries were added and expanded for a wide range of data types across many languages ❤️
Scribe-Data is now a fully functional CLI.
Querying Wikidata lexicographical data can be done via the get command (#159).
Query results can be output as JSON, CSV, TSV or SQLite, and can also be converted between these types (#145, #146).
Output paths can be set for query results (#144).
The CLI can print its version to the command line and can also be used to upgrade itself (#186, #157).
Total Wikidata lexeme counts for languages and data types can be derived with the total command (#147).
Interactive and total commands can be used via an interactive mode with the --interactive argument (#158, #203).
CLI outputs were standardized to ensure a consistent user experience.
The machine translation process has been removed to make way for the Wiktionary-based implementation (#292).
Package metadata files were standardized for languages, data types and Wikidata lexeme forms.
CLI commands have an argument check that can suggest correct languages and data types (#341).
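Converting between the output types listed above can be pictured with the standard library alone. This is a minimal, hypothetical sketch rather than Scribe-Data's implementation: it assumes query results arrive as a JSON list of flat objects, and the helper names json_to_delimited and json_to_sqlite as well as the lexemes table name are made up for illustration.

```python
import csv
import io
import json
import sqlite3


def _columns(rows: list[dict]) -> list[str]:
    """Collect every key that appears in the rows, preserving first-seen order."""
    return list(dict.fromkeys(key for row in rows for key in row))


def json_to_delimited(json_text: str, delimiter: str = ",") -> str:
    """Hypothetical helper: convert a JSON list of flat objects to CSV (",") or TSV ("\t")."""
    rows = json.loads(json_text)
    buffer = io.StringIO()
    # Missing keys are written as empty cells (DictWriter's default restval is "").
    writer = csv.DictWriter(
        buffer, fieldnames=_columns(rows), delimiter=delimiter, lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def json_to_sqlite(json_text: str, conn: sqlite3.Connection, table: str = "lexemes") -> None:
    """Hypothetical helper: load a JSON list of flat objects into a SQLite table of TEXT columns."""
    rows = json.loads(json_text)
    if not rows:
        return  # Nothing to infer columns from.
    columns = _columns(rows)
    column_sql = ", ".join(f'"{c}" TEXT' for c in columns)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({column_sql})')
    placeholders = ", ".join("?" for _ in columns)
    # Missing keys become NULL via dict.get.
    conn.executemany(
        f'INSERT INTO "{table}" VALUES ({placeholders})',
        [tuple(row.get(c) for c in columns) for row in rows],
    )
    conn.commit()
```

Passing "\t" as the delimiter turns the same rows into TSV, which is why one helper can cover both tabular formats in this sketch.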
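The argument check that suggests correct languages and data types can be approximated with difflib from the standard library. This is a hedged sketch, not Scribe-Data's actual code: the check_argument helper and the KNOWN_LANGUAGES and KNOWN_DATA_TYPES lists are hypothetical, and difflib.get_close_matches is an assumed choice of similarity measure.

```python
import difflib

# Hypothetical, illustrative subsets of valid values; a real CLI would read
# these from its package metadata files.
KNOWN_LANGUAGES = ["english", "french", "german", "italian", "portuguese", "russian", "spanish", "swedish"]
KNOWN_DATA_TYPES = ["nouns", "verbs", "prepositions", "adjectives", "adverbs", "emoji_keywords"]


def check_argument(value: str, valid: list[str], kind: str) -> str:
    """Return the normalized value if valid, otherwise raise with close-match suggestions."""
    normalized = value.strip().lower()
    if normalized in valid:
        return normalized
    # Up to three candidates whose similarity ratio is at least 0.6.
    suggestions = difflib.get_close_matches(normalized, valid, n=3, cutoff=0.6)
    hint = f" Did you mean: {', '.join(suggestions)}?" if suggestions else ""
    raise ValueError(f"Invalid {kind} '{value}'.{hint}")
```

With these lists, check_argument("Englsh", KNOWN_LANGUAGES, "language") raises ValueError with the message "Invalid language 'Englsh'. Did you mean: english?".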
🐞 Bug Fixes
Wikidata query process stages no longer trigger the tqdm progress bar when they're unsuccessful (#155).
✅ Tests
Tests have been written for the CLI to ensure that its functionality remains consistent.
Workflows were created to check that the Wikidata queries and project structure are consistent, ensuring package functionality (#339, #357).
Project queries and the project structure have been updated to match the rules developed for these checks.
📝 Documentation
The CLI's functionality has been fully documented (#152, #208).
Documentation was created to show how to write Scribe-Data queries (#395).
♻️ Code Refactoring
word_type has been switched to data_type throughout the codebase (#160).
Case, gender and annotation utility functions were removed as the formatting process that used them has changed.
The SPARQLWrapper access method has been extracted to the Wikidata utils and is imported into the files that need it (#164).
Export data paths are now stored in centrally defined variables to reduce repetition of hard-coded strings.
Many files were renamed, including update_data.py, which is now query_data.py.
Paths within the package have been updated to work on all operating systems via pathlib (#125).
The language formatting scripts have been dramatically simplified now that the export paths are all the same.
The update_files directory was removed in preparation for other means of showing data totals.
The language_data_extraction directory was moved under the Wikidata directory as it's only used for those processes now (#446).
The emoji keyword process was centralized to simplify project maintenance (#359).
PyICU was removed as a dependency, and a process was made to install it and its needed dependencies based on the user's operating system (#196).
The data formatting step was centralized so that a single step handles all languages (#142).
Sub-query processes are no longer hard coded, so the total number of possible sub-queries no longer needs to be maintained within the query_data.py process.