transliterateBrahmic doesn't recognize multi-byte characters #26

Open
chchch opened this issue Nov 17, 2020 · 2 comments


chchch commented Nov 17, 2020

Hi,

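It looks like the transliterateBrahmic function was written before some scripts (e.g., Grantha) were added to Unicode. Their characters lie outside the Basic Multilingual Plane, so JavaScript stores each one as a surrogate pair (two UTF-16 code units), and a lookup that reads one code unit at a time with charAt never matches them. This is probably also why you're having trouble transliterating from "superscripted" Tamil, where a single letter is written with more than one character. To be able to transliterate from multi-byte or multi-character "Brahmic" scripts, you'll need to change the transliterateBrahmic function to mirror transliterateRoman, which buffers up to maxTokenLength code units and tries the longest match first. I'm using an older version of Sanscript.js, but it should look something like this: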

var transliterateBrahmic = function(data, map, options) {
    var buf = [],
        consonants = map.consonants,
        hadRomanConsonant = false,
        letters = map.letters,
        marks = map.marks,
        dataLength = data.length,
        maxTokenLength = map.maxTokenLength,
        tempLetter,
        tokenBuffer = '',
        toRoman = map.toRoman,
        skippingTrans = false; // declared but never set in this version

    // Keep looping after the input is exhausted until the buffer is drained.
    for (var i = 0, L; (L = data.charAt(i)) || tokenBuffer; i++) {
        // Fill the token buffer, if possible.
        var difference = maxTokenLength - tokenBuffer.length;
        if (difference > 0 && i < dataLength) {
            tokenBuffer += L;
            if (difference > 1) {
                continue;
            }
        }

        // Match all token substrings to our map, longest first.
        for (var j = 0; j < maxTokenLength; j++) {
            var token = tokenBuffer.substr(0, maxTokenLength - j);

            if ((tempLetter = marks[token]) !== undefined && !skippingTrans) {
                // A mark (vowel sign, virama, ...) replaces the inherent vowel.
                buf.push(tempLetter);
                hadRomanConsonant = false;
                tokenBuffer = tokenBuffer.substr(maxTokenLength - j);
                break;
            } else if ((tempLetter = letters[token])) {
                // A new letter: first emit the inherent vowel owed to the
                // preceding consonant, if any.
                if (hadRomanConsonant) {
                    buf.push('a');
                    hadRomanConsonant = false;
                }
                buf.push(tempLetter);
                hadRomanConsonant = toRoman && (token in consonants);
                tokenBuffer = tokenBuffer.substr(maxTokenLength - j);
                break;
            } else if (j === maxTokenLength - 1) {
                // Nothing matched: pass one code unit through unchanged.
                if (hadRomanConsonant) {
                    buf.push('a');
                    hadRomanConsonant = false;
                }
                buf.push(token);
                tokenBuffer = tokenBuffer.substr(1);
            }
        }
    }
    // A trailing consonant still carries its inherent vowel.
    if (hadRomanConsonant) {
        buf.push('a');
    }
    return buf.join('');
};

To make this work with your version of Sanscript.js, you'll need to change buf.push('a') to buf.push(map.toSchemeA). You'll probably also want to add a check for the "#" character to skip transliteration; skippingTrans is declared above, but nothing sets it yet.
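
For reference, the multi-byte behaviour described above can be checked directly in any recent JavaScript engine. This is only a small sketch: the `letters` table here is a one-entry toy made up for illustration, not Sanscript's real scheme map.

```javascript
// GRANTHA LETTER KA (U+11315) lies outside the Basic Multilingual Plane,
// so a JavaScript string stores it as two UTF-16 code units.
const ka = '\u{11315}';

const codeUnits = ka.length;               // counts UTF-16 code units
const codePoints = Array.from(ka).length;  // iterates by code point

// charAt() returns a single code unit: here, a lone high surrogate.
const firstUnit = ka.charAt(0);

// Toy lookup table keyed by whole characters (hypothetical, for illustration).
const letters = { [ka]: 'k' };

const perUnitLookup = letters[firstUnit];  // the half character is not a key
const wholeLookup = letters[ka];           // the full character is

console.log(codeUnits, codePoints, perUnitLookup, wholeLookup);
```

A loop that reads one code unit per iteration only ever sees `firstUnit` and its low-surrogate partner, which is why buffering several code units before the lookup is needed.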

It might also be nice to rename "transliterateBrahmic" to something like "transliterateAbugida", since the real distinction is the inherent vowel rather than the script family, and "transliterateRoman" to something like "transliterateAlphabet" (no inherent vowel).
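
To illustrate the inherent-vowel point, here is a rough, self-contained sketch of an abugida-to-Roman pass that uses the same longest-match idea. The names `toyMap` and `abugidaToRoman` are hypothetical, and the four-entry Grantha table is a toy that only mimics the field names used above (letters, consonants, marks, maxTokenLength); it is not Sanscript's actual map or API.

```javascript
// Toy Grantha-to-Roman tables; a real scheme map would be far larger.
const KA = '\u{11315}';      // GRANTHA LETTER KA
const GA = '\u{11317}';      // GRANTHA LETTER GA
const SIGN_I = '\u{1133F}';  // GRANTHA VOWEL SIGN I
const VIRAMA = '\u{1134D}';  // GRANTHA SIGN VIRAMA

const toyMap = {
  letters: { [KA]: 'k', [GA]: 'g' },
  consonants: { [KA]: true, [GA]: true },
  marks: { [SIGN_I]: 'i', [VIRAMA]: '' },
  maxTokenLength: 2, // every key above is one surrogate pair (2 code units)
};

function abugidaToRoman(data, map) {
  const out = [];
  let pendingVowel = false; // true right after a consonant
  let i = 0;
  while (i < data.length) {
    let matched = false;
    // Greedy: try the longest candidate token first.
    const longest = Math.min(map.maxTokenLength, data.length - i);
    for (let len = longest; len >= 1; len--) {
      const token = data.slice(i, i + len);
      if (map.marks[token] !== undefined) {
        // A vowel sign or virama replaces the inherent vowel.
        out.push(map.marks[token]);
        pendingVowel = false;
      } else if (map.letters[token] !== undefined) {
        if (pendingVowel) out.push('a');
        out.push(map.letters[token]);
        pendingVowel = token in map.consonants;
      } else {
        continue; // no match at this length, try a shorter token
      }
      i += len;
      matched = true;
      break;
    }
    if (!matched) {
      // Unknown code unit: flush the inherent vowel and pass it through.
      if (pendingVowel) out.push('a');
      pendingVowel = false;
      out.push(data[i]);
      i += 1;
    }
  }
  if (pendingVowel) out.push('a'); // trailing consonant keeps its vowel
  return out.join('');
}

const result = abugidaToRoman(KA + GA + SIGN_I + KA + VIRAMA, toyMap);
console.log(result); // kagik
```

An alphabet-to-alphabet pass would need none of the `pendingVowel` bookkeeping, which is the distinction the proposed names are meant to capture.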

vvasuki (Member) commented Nov 18, 2020

Why not send a pull request after ensuring that the tests pass?

chchch (Author) commented Nov 18, 2020

I'm using an older version of the script that I've customized. I can send a pull request eventually, but it might take a while!
