Two csv
files must be provided.
A frequency csv
with columns:
input
: Lomaji inputfreq
: Raw frequency countchhan_id
: Lowest ID of any entry in the Chhan with this input
A conversions csv
with columns:
input
: Lomaji inputoutput
: Any text outputweight
: To order differentoutput
s with the sameinput
category
: An integer (0 = Default, 1 = Fallback, 2 = Extended)annotation
: Hint text to display during candidate selection
An optional plaintext list of toneless syllables may be provided, with
one syllable per line. All syllables from the input
columns of
both frequency and conversion files, and this additional syllables
list (if provided) will be included in the final output.
All data inputs are automatically deduplicated according to the following constraints:
- frequency: UNIQUE(input)
- conversions: UNIQUE(input, output), FOREIGN KEY(input) ON frequency(input)
The khiin
library will automatically build the SQLite database using these
CSVs during first run. The SQLite file will be saved into the user's app data
directory. There is also a simple CLI tool for building the database during
development.
To build the database using all default options:
cargo make build-db
# Or, after the first build:
cargo make rebuild-db
This will output the database into the resources
folder for inspection. For
more options, you can build the CLI tool directly, and run it to see the options:
cargo make build-db-cli
./target/debug/khiin_db_cli -h
The emoji table is taken directly from Unicode's Full Emoji List, v14.0.
- Smileys 🙂
- People & Body 👍
- Animals & Nature 🐱
- Food & Drink 🍌
- Travel & Places 🌍
- Activities ⚾
- Objects 🔔
- Symbols 🚻
- Flags 🏴☠️
Note: Not yet available with the rust dbgen tool. If you need it, revert back to 4e79459