EmojiSpider.py Refactor #15
Comments
Here's a mock-up of what the new SQL schema might look like under 3: New SQL Schema

-- Emojis table holds all the actual emoji data
CREATE TABLE Emojis (
name VARCHAR,
codepoint VARCHAR PRIMARY KEY
);
-- Thumbnails table associates emojis with the paths to their thumbnails, in the
-- various styles that are supported.
CREATE TABLE Thumbnails (
emoji VARCHAR,
apple VARCHAR, -- path to apple-style thumbnail
twemoji VARCHAR, -- path to twemoji-style thumbnail
noto VARCHAR, -- path to noto-style thumbnail
blobmoji VARCHAR, -- path to blobmoji-style thumbnail
FOREIGN KEY (emoji) -- emoji should be a codepoint from the Emojis table
REFERENCES Emojis (codepoint)
);
-- SkinToneVariants table associates emojis that are skin-tone variants of each
-- other.
CREATE TABLE SkinToneVariants (
"default" VARCHAR, -- 👌 default skin tone variant
light VARCHAR, -- 👌🏻 light skin tone variant
medium_light VARCHAR, -- 👌🏼 medium-light skin tone variant
medium VARCHAR, -- 👌🏽 medium skin tone variant
medium_dark VARCHAR, -- 👌🏾 medium-dark skin tone variant
dark VARCHAR, -- 👌🏿 dark skin tone variant
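-- each skin-tone variant should be a codepoint from the Emojis table.
-- One constraint per column: a single composite key would require all six
-- values to match the same Emojis row.
FOREIGN KEY ("default") REFERENCES Emojis (codepoint),
FOREIGN KEY (light) REFERENCES Emojis (codepoint),
FOREIGN KEY (medium_light) REFERENCES Emojis (codepoint),
FOREIGN KEY (medium) REFERENCES Emojis (codepoint),
FOREIGN KEY (medium_dark) REFERENCES Emojis (codepoint),
FOREIGN KEY (dark) REFERENCES Emojis (codepoint)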
);
-- Keywords table holds keywords associated w/ emojis. Can be multiple keywords
-- per emoji.
CREATE TABLE KeyWords (
emoji VARCHAR,
keyword VARCHAR,
FOREIGN KEY (emoji) -- emoji should be a codepoint from the Emojis table
REFERENCES Emojis (codepoint)
);
-- ShortCodes table holds shortcodes associated w/ emojis. Can be multiple
-- shortcodes per emoji.
CREATE TABLE ShortCodes (
emoji VARCHAR,
shortcode VARCHAR,
FOREIGN KEY (emoji) -- emoji should be a codepoint from the Emojis table
REFERENCES Emojis (codepoint)
);
-- Create Indices for common lookups
CREATE INDEX idx_emoji_thumbnail ON Thumbnails (emoji);
CREATE INDEX idx_default_skintone ON SkinToneVariants ("default");
CREATE INDEX idx_emoji_keyword ON KeyWords (emoji);
CREATE INDEX idx_emoji_shortcode ON ShortCodes (emoji);

Advantages:
Caveats:
I like the ideas you outlined here 👍
ATM EmojiSpider.py pulls from 2 sources: a Unicode website and emojipedia.org. The Unicode website appears to be built primarily for human viewing, NOT for parsing. Its codepoints also don't always match the actual emoji representations, since the fe0f sequence (which can be used to control text vs. emoji presentation) is often omitted.

I propose a few changes:
These changes would make the code base more readable, and they would let users more easily hack the SQLite DB to add things they want. They might also make it easier to deliver new features.
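To give a feel for what "hacking the SQLite DB" could look like, here is a minimal sketch using Python's standard `sqlite3` module against a reduced copy of the proposed schema (only the Emojis and KeyWords tables). The helper names `open_db`, `add_keyword`, and `find_by_keyword` are hypothetical, and storing codepoints as lowercase hex strings such as `1f44c` is an assumption, since the proposal does not fix a format. Note that SQLite only enforces foreign keys when `PRAGMA foreign_keys = ON` is set on the connection.

```python
import sqlite3

# Reduced copy of the proposed schema: only Emojis and KeyWords.
SCHEMA = """
CREATE TABLE Emojis (
    name VARCHAR,
    codepoint VARCHAR PRIMARY KEY
);
CREATE TABLE KeyWords (
    emoji VARCHAR,
    keyword VARCHAR,
    FOREIGN KEY (emoji) REFERENCES Emojis (codepoint)
);
CREATE INDEX idx_emoji_keyword ON KeyWords (emoji);
"""


def open_db(path=":memory:"):
    """Open a database and create the (reduced) schema. Hypothetical helper."""
    conn = sqlite3.connect(path)
    # Foreign keys are not enforced unless this pragma is on for the connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def add_keyword(conn, codepoint, keyword):
    """Attach a user-defined keyword to an existing emoji."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO KeyWords (emoji, keyword) VALUES (?, ?)",
            (codepoint, keyword),
        )


def find_by_keyword(conn, keyword):
    """Return (name, codepoint) rows for every emoji tagged with keyword."""
    rows = conn.execute(
        "SELECT e.name, e.codepoint FROM Emojis e "
        "JOIN KeyWords k ON k.emoji = e.codepoint "
        "WHERE k.keyword = ? ORDER BY e.codepoint",
        (keyword,),
    )
    return rows.fetchall()


conn = open_db()
with conn:
    # Assumed codepoint format: lowercase hex, no "U+" prefix.
    conn.execute(
        "INSERT INTO Emojis (name, codepoint) VALUES (?, ?)",
        ("OK hand", "1f44c"),
    )
add_keyword(conn, "1f44c", "okay")
print(find_by_keyword(conn, "okay"))
```

With foreign keys enabled, calling `add_keyword` with a codepoint that is not in Emojis raises `sqlite3.IntegrityError`, which is the behavior the `FOREIGN KEY` comments in the mock-up describe.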
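On the fe0f mismatch mentioned above, a short sketch may help show why codepoint lists that omit it fail to line up. U+FE0F is VARIATION SELECTOR-16, which requests emoji presentation for the preceding character. The helpers `codepoints`, `strip_vs16`, and `same_emoji` are hypothetical names, and comparing with the selector stripped is just one possible way to reconcile the two sources, not what EmojiSpider.py currently does.

```python
VS16 = "\ufe0f"  # VARIATION SELECTOR-16: requests emoji presentation


def codepoints(emoji):
    """Return the string's codepoints as space-separated lowercase hex."""
    return " ".join(f"{ord(ch):x}" for ch in emoji)


def strip_vs16(emoji):
    """Drop every U+FE0F so both spellings compare equal (an assumption)."""
    return emoji.replace(VS16, "")


def same_emoji(a, b):
    """Treat two sequences as the same emoji if they differ only by U+FE0F."""
    return strip_vs16(a) == strip_vs16(b)


heart_text = "\u2764"         # heavy black heart, no presentation selector
heart_emoji = "\u2764\ufe0f"  # same heart with emoji presentation requested

print(codepoints(heart_text))               # 2764
print(codepoints(heart_emoji))              # 2764 fe0f
print(heart_text == heart_emoji)            # False
print(same_emoji(heart_text, heart_emoji))  # True
```

A plain string comparison treats the two hearts as different keys, so a table keyed on one form will miss lookups that use the other unless the codepoints are normalized consistently on the way in.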