Wrong diacritics for Devanagari -> ISO/IAST/ITRANS #43

bwasty · 2021-11-19T11:21:09Z

I found several issues with transliterating diacritics from Devanagari (Hindi):

कॅ -> kaॅ (iso/iast, fine in itrans: ka.c)
फ़ोन -> pha़ōna (iso/iast)
सड़क -> saḍa़ka (iso/iast)
ज़्यादा -> ja़yAdA (itrans; other way correct: zyaada)

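By the way, great project, wrote 2 small tools with it already:

https://lipyantar.glitch.me/

https://forum.languagelearningwithnetflix.com/t/bookmarklet-for-hindi-transliterations/6279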

vvasuki · 2021-11-19T11:47:47Z

> I found several issues with transliterating diacritics from Devanagari (Hindi):
>
> कॅ -> kaॅ (iso/iast, fine in itrans: ka.c)

What should this be in ISO?

> सड़क -> saḍa़ka (iso/iast)

What should this be in ISO?

> फ़ोन -> pha़ōna (iso/iast)

f is expected I suppose. Contribute a fix?

> ज़्यादा -> ja़yAdA (itrans; other way correct: zyaada)

Contribute a fix?

> By the way, great project, wrote 2 small tools with it already:
>
> https://lipyantar.glitch.me/
>
> https://forum.languagelearningwithnetflix.com/t/bookmarklet-for-hindi-transliterations/6279

bwasty · 2021-11-19T12:47:36Z

> > कॅ -> kaॅ (iso/iast, fine in itrans: ka.c)
>
> What should this be in ISO?

m̐k. Same in IAST (according to this. Here ˜ is shown, though the discussion page suggests m̐ is correct)

> > सड़क -> saḍa़ka (iso/iast)
>
> What should this be in ISO?

saṛaka in ISO. IAST doesn't specify it, so maybe remove the dangling dot, or use the same as ISO? For ITRANS it should be .Da or .Ra.

Related: ढ़ should become ṛha in ISO and .Dha/Rha in ITRANS.

> > फ़ोन -> pha़ōna (iso/iast)
>
> f is expected I suppose. Contribute a fix?

Yes, for ISO and ITRANS. IAST doesn't specify it; maybe do the same anyway?

> > ज़्यादा -> ja़yAdA (itrans; other way correct: zyaada)
>
> Contribute a fix?

I'm not sure I understand Devanagari well enough yet (literally started learning a week ago), but I might try :)

vvasuki · 2021-11-19T14:12:45Z

> > > कॅ -> kaॅ (iso/iast, fine in itrans: ka.c)
> >
> > What should this be in ISO?
>
> m̐k. Same in IAST (according to this. Here ˜ is shown, though the discussion page suggests m̐ is correct)

No - you seem to be confusing कँ with कॅ.

bwasty · 2021-11-19T14:58:01Z

Ah, right, damn. Wikipedia shows ê for ॲ and ऍ.
The Unicode block shows a few more characters with a 'candra', but I guess they have no transliteration?
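For reference, a small sketch that lists the 'candra' characters in the Devanagari block using Python's standard `unicodedata` module. The helper name `candra_characters` is made up for illustration, and the exact list depends on the Unicode version bundled with the Python build. It also makes the कँ / कॅ mix-up visible: the first uses U+0901 (candrabindu), the second U+0945 (vowel sign candra e).

```python
import unicodedata


def candra_characters(start=0x0900, end=0x097F):
    """Return (code point, name) pairs in the Devanagari block whose
    Unicode name contains 'CANDRA' (hypothetical helper, for illustration)."""
    found = []
    for cp in range(start, end + 1):
        # Unassigned code points have no name; fall back to "".
        name = unicodedata.name(chr(cp), "")
        if "CANDRA" in name:
            found.append((cp, name))
    return found


if __name__ == "__main__":
    for cp, name in candra_characters():
        print(f"U+{cp:04X} {chr(cp)} {name}")
```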

vvasuki · 2021-11-20T16:15:04Z

Basically, the problem is that transliterateBrahmic assumes it's OK to transliterate character by character. It does not consider the maximum token length (unlike https://github.com/indic-transliteration/indic_transliteration_py/blob/99fe6b2fd5b220794d1709e3297c919d58c4cfcc/indic_transliteration/sanscript/brahmic_mapper.py ). Porting the Python code might work.
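To illustrate the idea (this is not the code from either library): a minimal longest-match sketch in Python, assuming a tiny hand-written Devanagari to ISO table. The table, `MAX_TOKEN_LEN` and `transliterate` are hypothetical names for this example only. At each position it tries the longest candidate token first, so consonant + nuqta is consumed as a single unit instead of consonant, inherent "a", stray nuqta.

```python
import unicodedata

# Hypothetical mini-table (Devanagari -> ISO), just enough for the examples.
CONSONANTS = {
    "\u092b\u093c": "f",       # pha + nuqta
    "\u0921\u093c": "\u1e5b",  # dda + nuqta -> r with dot below
    "\u091c\u093c": "z",       # ja + nuqta
    "\u092b": "ph",
    "\u0921": "\u1e0d",        # d with dot below
    "\u091c": "j",
    "\u0938": "s",
    "\u0915": "k",
    "\u0928": "n",
    "\u092f": "y",
    "\u0926": "d",
}
VOWEL_SIGNS = {"\u094b": "\u014d", "\u093e": "\u0101"}  # o-matra, aa-matra
VIRAMA = "\u094d"
MAX_TOKEN_LEN = max(len(token) for token in CONSONANTS)


def transliterate(text: str) -> str:
    # Normalization decomposes precomposed nuqta letters (e.g. U+095E)
    # into base + U+093C, so only the decomposed forms need table entries.
    text = unicodedata.normalize("NFC", text)
    out = []
    i = 0
    after_consonant = False
    while i < len(text):
        ch = text[i]
        if after_consonant:
            after_consonant = False
            if ch == VIRAMA:           # virama suppresses the inherent vowel
                i += 1
                continue
            if ch in VOWEL_SIGNS:      # a vowel sign replaces the inherent vowel
                out.append(VOWEL_SIGNS[ch])
                i += 1
                continue
            out.append("a")            # otherwise emit the inherent vowel
        # Greedy longest match: try the longest token length first.
        for length in range(MAX_TOKEN_LEN, 0, -1):
            token = text[i:i + length]
            if token in CONSONANTS:
                out.append(CONSONANTS[token])
                after_consonant = True
                i += len(token)
                break
        else:
            out.append(ch)             # unknown character: pass through unchanged
            i += 1
    if after_consonant:
        out.append("a")
    return "".join(out)


if __name__ == "__main__":
    print(transliterate("\u092b\u093c\u094b\u0928"))              # fōna
    print(transliterate("\u0938\u0921\u093c\u0915"))              # saṛaka
    print(transliterate("\u091c\u093c\u094d\u092f\u093e\u0926\u093e"))  # zyādā
```

A real port would take the tokens and the maximum length from the scheme data rather than from a hard-coded table like this.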

bwasty · 2021-11-20T17:43:18Z

Ok, I'll look into that after having a stab at #42 (since that 'annoys' me more and I found this interesting paper)

bwasty changed the title ~~Wrong diacritics for Devanagari <-> ISO/IAST/ITRANS~~ Wrong diacritics for Devanagari -> ISO/IAST/ITRANS Nov 19, 2021

bwasty mentioned this issue Nov 19, 2021

Nuqta not working even in Romanized Scripts that support it #34

Open
