How to derive the actual number of words per line for each chapter? #36
Comments
Yes, your understanding is correct, and yes, it builds one line at a time.
Are you sure this kind of data is not already available in some XML/JSON resource? I have done some Node.js-based development indirectly, but I don't recognize the commands in the installation notes, like the following:
Are these expected to be executed inside some CLI, or on some Linux distro?
You don't need to run any of those commands, nor the script itself. Just download the database, import it, and write a script yourself.
By database, do you mean the sql folder in this repo? I have MySQL Workbench, but it's a beast and I never got acquainted with all of its terms (Open Model, ???). Which of these files should I be attempting to open? Would you suggest a better tool than MySQL Workbench 5.2.44 CE? By writing scripts, do you mean writing SQL queries to retrieve the info, perhaps from the glyph_line_page table? Thank you for holding my hand through this.
Will tajweed markings (e.g. small meem, etc.) appear as separate rows in this table, or be lumped with the previous word (as a single glyph)? This feels terrible: I would have to query and group-count on ayah_number, then subtract one for the ayah number marker (the Hindi numeral thingy), to get my word count??? I have a feeling I am going about this the wrong/difficult way.
This would be its data, matching 7 words + 2 tajweed markers + 1 verse number = 10 tokens. How can I detect that glyph_id = 264 is a verse number, which I do not want to count???
Specifically for page 2, this database describes this particular layout.
Matching query:
The raw/net count of tokens per line follows:
I happen to be working with the Tajweed version, where page 2 is a bit different. That's ALRIGHT, I will handle that. Again, my current roadblock is detecting whether a token is a verse number???
The glyph table will tell you what "type" each glyph is, so you can exclude the ayah markers that way.
Awesome, al7amdulillah! Make sure not to include other things like pauses (so just include words).
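For anyone landing here later, the approach discussed above (join the per-line table against the glyph table and keep only rows whose type is a word) could be sketched roughly as follows. This is only an illustration and not the repo's actual schema: the column names, the `glyph_type` values (`'word'`, `'pause'`, `'ayah'`), and the sample rows are all assumptions, and it uses an in-memory SQLite database instead of MySQL so it can run without any setup. Check the real table definitions in the SQL dump before adapting it.

```python
import sqlite3

# Hypothetical, simplified schema. The real table and column names in the
# repo's SQL dump may differ; only the table names come from this thread.
SCHEMA = """
CREATE TABLE glyph (glyph_id INTEGER PRIMARY KEY, glyph_type TEXT);
CREATE TABLE glyph_line_page (
    glyph_id INTEGER, page_number INTEGER, line_number INTEGER,
    sura_number INTEGER, ayah_number INTEGER, position INTEGER
);
"""

# Count only glyphs typed as words, grouped per chapter, page and line.
WORDS_PER_LINE = """
SELECT lp.sura_number, lp.page_number, lp.line_number, COUNT(*) AS word_count
FROM glyph_line_page AS lp
JOIN glyph AS g ON g.glyph_id = lp.glyph_id
WHERE g.glyph_type = 'word'
GROUP BY lp.sura_number, lp.page_number, lp.line_number
ORDER BY lp.sura_number, lp.page_number, lp.line_number
"""


def words_per_line(conn):
    """Return (sura, page, line, word_count) tuples, words only."""
    return conn.execute(WORDS_PER_LINE).fetchall()


def build_demo_db():
    """Build an in-memory database holding one made-up line of 10 tokens."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Made-up glyphs: 7 words, 2 pause marks, 1 ayah (verse number) marker.
    glyphs = [(i, "word") for i in range(1, 8)]
    glyphs += [(8, "pause"), (9, "pause"), (264, "ayah")]
    conn.executemany("INSERT INTO glyph VALUES (?, ?)", glyphs)
    # All 10 tokens sit on the same (invented) page and line.
    order = [1, 2, 8, 3, 4, 5, 9, 6, 7, 264]
    rows = [(gid, 2, 3, 2, 1, pos) for pos, gid in enumerate(order, start=1)]
    conn.executemany(
        "INSERT INTO glyph_line_page VALUES (?, ?, ?, ?, ?, ?)", rows
    )
    return conn


if __name__ == "__main__":
    for row in words_per_line(build_demo_db()):
        print(row)
```

With the made-up sample line (7 words, 2 pause marks, 1 verse-number marker), the query reports 7 for that line instead of 10. Filtering on the type column avoids the "count everything and subtract one" workaround, and it also drops pauses in the same pass.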
If I understand the main page description, these "scripts" render an image of the page from a font, then build the rectangle bounds for each generated word (glyph). Correct?
Does it also build one line bitmap at a time for a 15-lines-per-page Madina mushaf?
It sounds overly complex if all I want is the word count per line for each chapter.
Something like the following (showing the word count for Fatiha, Baqara, etc.):