diff --git a/devanagari/hi.css b/devanagari/hi.css index dfa461c47..71fe77d0b 100755 --- a/devanagari/hi.css +++ b/devanagari/hi.css @@ -29,4 +29,21 @@ -@media print { #freeText { font-size: 18px; } } \ No newline at end of file +@media print { #freeText { font-size: 18px; } } + + + + + + +.useBlockExamples .charExample .ex { + font-size: 3.3rem; + line-height: 1.2; + } +.useBlockExamples .charExample.inline .ex { + font-size: 1.4rem; + } + + + + diff --git a/devanagari/hi.html b/devanagari/hi.html index b6917995a..d5257a1a4 100755 --- a/devanagari/hi.html +++ b/devanagari/hi.html @@ -53,9 +53,9 @@

Contents

Updated - 19 November, 2022 + 13 December, 2022 -

+

@@ -108,9 +108,9 @@

Usage & history

-
+ - + @@ -118,19 +118,32 @@

Usage & history

Basic features

-

Devanagari is an abugida. Consonant letters have an inherent vowel sound. Combining vowel-signs are attached to the consonant to indicate that a different vowel follows the consonant. See the table in the right-hand column for a brief overview of features of the Hindi language.

-

Devanagari text runs left-to-right in horizontal lines.

-

Orthographic syllables (as opposed to phonetic syllables) play a significant role in Devanagari. An orthographic syllable starts at the beginning of any cluster of consonants and incorporates the whole cluster plus any following vowels and diacritics.

-

Phonetically, Hindi, like other Indic languages, has four forms of plosives, illustrated here with the bilabial stop: unvoiced p, voiced b, aspirated pʰ, and murmured bʱ. It also has a set of retroflex consonants. These are all represented separately in the orthography.

-

The 33 consonant letters used for Hindi are supplemented by repertoire extensions for 8 more non-native sounds by applying the nukta diacritic to characters.

-

Consonant clusters at any location are normally indicated using the virama between consonants. This results in a large number of conjunct forms expressed using half-forms, stacked consonants, and ligated glyphs. Occasionally, a visible virama is used.

-

As part of a cluster, RA has special forms. When initial in an orthographic syllable it appears as a hook at the top right of the whole syllable. When non-initial it appears as one of 2 special marks applied to the other consonants.

-

Word-final consonant sounds may be represented by 2 dedicated combining marks (anusvara & visarga), but are generally ordinary consonants that are not marked by a virama. Also, the inherent vowel of a penultimate consonant in a word of 3 syllables that ends in a non-inherent vowel is usually elided, and not marked as such.

-

The Hindi orthography has an inherent vowel, and represents vowels using 9-11 vowel-signs, including 1 prescript and no circumgraphs. All vowel-signs are combining marks, and are stored after the base character.

-

There are 10-12 independent vowels, one for each vowel sound, including the inherent vowel, and these are used to write all standalone vowel sounds.

-

There are no composite vowels.

-

Vowels may be nasalised, using the candrabindu diacritic.

-

Hindi uses native number digits.

+

Devanagari is an abugida. Consonant letters have an inherent vowel sound. Combining vowel signs are attached to the consonant to indicate that a different vowel follows the consonant. See the table in the right-hand column for a brief overview of features of the Hindi language.

+ +

Devanagari text runs left-to-right in horizontal lines.

+ +

Orthographic syllables (as opposed to phonetic syllables) play a significant role in Devanagari. An orthographic syllable starts at the beginning of any cluster of consonants and incorporates the whole cluster plus any following vowels and diacritics.

+ +

Phonetically, Hindi, like other Indic languages, has four forms of plosives, illustrated here with the bilabial stop: unvoiced p, voiced b, aspirated pʰ, and murmured bʱ. It also has a set of retroflex consonants. These are all represented separately in the orthography. ❯ consonants

+ +

The 33 consonant letters used for Hindi are supplemented by repertoire extensions for 8 more non-native sounds by applying the nukta diacritic to characters. ❯ extendedC

+ +

Consonant clusters at any location are normally indicated using the virama between consonants. This results in a large number of conjunct forms expressed using half-forms, stacked consonants, and ligated glyphs. Occasionally, a visible virama is used. ❯ clusters

+ +

As part of a cluster, RA has special forms. When initial in an orthographic syllable it appears as a hook at the top right of the whole syllable. When non-initial it appears as one of 2 special marks applied to the other consonants.

+ +

Word-final consonant sounds may be represented by 2 dedicated combining marks (anusvara & visarga), but are generally ordinary consonants that are not marked by a virama. Also, the inherent vowel of a penultimate consonant in a word of 3 syllables that ends in a non-inherent vowel is usually elided, and not marked as such. ❯ finals

+ +

The Hindi orthography has an inherent vowel, and represents vowels using 9-11 vowel signs, including 1 pre-base and no circumgraphs. All vowel signs are combining marks, and are stored after the base character. ❯ vowels

+ +

There are 10-12 independent vowels, one for each vowel sound, including the inherent vowel, and these are used to write all standalone vowel sounds. ❯ standalone

+ +

There are no composite vowels.

+ +

Vowels may be nasalised, using the candrabindu diacritic. ❯ nasalisation

+ +

Hindi uses native number digits. ❯ numbers

+

The Unicode Devanagari block contains more characters than other Indic scripts, partly because it serves as a pivot script for transliterations of other scripts.

@@ -165,7 +178,7 @@

Consonants

Independent vowels

इ␣ई␣उ␣ऊ␣ए␣अ␣ओ␣ऐ␣औ␣आ
-
ऍ␣ऑ
+
ऍ␣ऑ
@@ -177,7 +190,7 @@

Vocalic

Other

-
ऽ␣ॐ
+
ऽ␣ॐ
@@ -700,6 +713,214 @@

Consonant sounds

Vowels

+

The Hindi orthography has an inherent vowel, and represents vowels using 9-11 vowel signs, including 1 pre-base and no circumgraphs. All vowel signs are combining marks, and are stored after the base character.

+ +

There are 10-12 independent vowels, one for each vowel sound, including the inherent vowel, and these are used to write all standalone vowel sounds.

+ +

There are no composite vowels.

+ +

Vowels may be nasalised, using the candrabindu diacritic.

+ + + + + + + + +
+

Inherent vowel

+

+ +

a following a consonant is not written, but is seen as an inherent part of the consonant letter, so ka is written using just the consonant letter. eg.

+ +

ka [U+0915 DEVANAGARI LETTER KA]

+ +
+ + + + + + + + + + +
+

Vowel signs

+ +

+ +

Non-inherent vowel sounds that follow a consonant are represented using vowel signs, eg.

+ +

की ki [U+0915 DEVANAGARI LETTER KA + U+0940 DEVANAGARI VOWEL SIGN II]

+ +

Devanagari vowel signs are all combining characters. A single Unicode character is used per base consonant, and there are no vowel signs with multiple parts. All vowel signs are typed and stored after the base consonant, and the font puts them in the correct place for display.
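The storage order described above can be inspected directly. The following is a minimal sketch using Python's standard `unicodedata` module; the helper name `code_points` is introduced here only for illustration.

```python
import unicodedata

# की is stored as the consonant KA followed by the combining vowel sign II.
syllable = "\u0915\u0940"

def code_points(text):
    """Return the code points of text in storage order, as U+XXXX labels."""
    return [f"U+{ord(ch):04X}" for ch in text]

for ch in syllable:
    # Print each code point with its Unicode name and general category.
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))
```

The vowel sign is reported with a mark category rather than a letter category, which reflects its status as a combining character.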

+

An orthography that uses vowel signs is different from one that uses simple diacritics or letters for vowels, in that the vowel signs are generally attached to an orthographic syllable, rather than just applied to the letter of the immediately preceding consonant. In other words, pre-base vowel sign components are rendered before a whole consonant cluster if that cluster is rendered as a conjunct (see prebase_vowels for an example).

+

Half the vowel signs are spacing combining characters, meaning that they consume horizontal space when added to a base consonant.
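Whether a vowel sign is spacing or non-spacing is recorded in its Unicode general category (`Mc` for spacing combining marks, `Mn` for non-spacing marks). The sketch below checks the nine native vowel signs listed in this section; the names `VOWEL_SIGNS` and `is_spacing` are assumptions made for the example.

```python
import unicodedata

# The nine vowel signs listed for Hindi in this section, by code point:
# I, II, U, UU, E, O, AI, AU, AA.
VOWEL_SIGNS = "\u093F\u0940\u0941\u0942\u0947\u094B\u0948\u094C\u093E"

def is_spacing(mark):
    """True if the mark is a spacing combining mark (general category Mc)."""
    return unicodedata.category(mark) == "Mc"

spacing = [m for m in VOWEL_SIGNS if is_spacing(m)]
nonspacing = [m for m in VOWEL_SIGNS if unicodedata.category(m) == "Mn"]
print(len(spacing), "spacing;", len(nonspacing), "non-spacing")
```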

+ +

See also vocalics.

+
+ + + + + + +
+

Combining marks used for vowels

+ +

Hindi uses the following dedicated combining marks for vowels.

+ +
ि␣ी␣ु␣ू␣े␣ो␣ै␣ौ␣ा
+ +

It also includes 2 vowel signs used for sounds in foreign (especially English) loan words.

+ +
ॅ␣ॉ
+
+ + + + + + +
+

Pre-base vowel sign

+ +
ि
+ +

One vowel sign appears to the left of the base consonant letter or cluster, eg. +दिन +

+

This is a combining mark that is always typed and stored after the base consonant(s), ie. the codepoints follow the order in which the items are pronounced. The rendering process places the glyph before the base consonant without changing the code points.
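As a small illustration of the point above, the following sketch looks at the stored order of दिन. It assumes nothing beyond Python's standard `unicodedata` module; `stored_order` is a hypothetical helper name.

```python
import unicodedata

# दिन: DA, VOWEL SIGN I, NA. The vowel sign is displayed to the left of DA,
# but it is stored after it.
word = "\u0926\u093F\u0928"

def stored_order(text):
    """Return the Unicode names of the characters in storage order."""
    return [unicodedata.name(ch) for ch in text]

for index, name in enumerate(stored_order(word)):
    print(index, name)
```

The index of the vowel sign in the string is higher than that of the consonant it follows in pronunciation, regardless of where the glyph is drawn.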

+

It is actually placed before the start of an orthographic syllable. In fig_prebase the sequence of glyphs for the orthographic syllable is rendered VCC, whereas the pronunciation is CCV. In conjuncts with 3 consonants, it will still be rendered before the consonants.

+ + + +
+शक्ति +
+
A prebase vowel, pronounced after a consonant cluster, but rendered to the left of the conjunct.
+
detailsशक्ति
+
+
+ + +

However, if the cluster is split by a visible virama, this creates two syllables and the pre-base vowel sign appears after the last consonant with the virama. The sequence of displayed glyphs is now CVC. If the conjunct contains 3 consonants, the displayed order will be CCVC. +

+ + +
+शक्ति +
+
The same word, but without the conjunct. The vowel is now rendered to the left of the last consonant in the cluster.
+
detailsशक्ति
+
+
+
+ + + + + + + +
+

Standalone vowels

+

+ +

Devanagari represents standalone vowels using a set of ‘independent vowel’ letters. The set contains a character to represent the inherent vowel sound.

+

Independent vowels used by Hindi:

+ +
इ␣ई␣उ␣ऊ␣ए␣अ␣ओ␣ऐ␣औ␣आ
+ +

Two more are used for sounds in loan words.

+ +
ऍ␣ऑ
+ +

The following combinations are also regarded as letters of the alphabet.

+ +
अं␣अः␣अँ
+ +

Note the sound difference between the use of a standalone vowel vs. a vowel sign after a consonant: +नई nị̄ nəiː नी niː

+
+ + + + + + + + + + +
+

Nasalisation

+ +
ँ␣ं
+ +

Any vowel in Hindi can be nasalised, except for the vocalics.

+

Nasalisation is usually indicated using [U+0901 DEVANAGARI SIGN CANDRABINDU], eg. +मुँह +

+

When a vowel sign rises above the head line, the glyph for this character may be simplified to just a dot, which can be written using + [U+0902 DEVANAGARI SIGN ANUSVARA] instead of candrabindu, eg. +हैं +

+

The distinction between use of [U+0901 DEVANAGARI SIGN CANDRABINDU] and [U+0902 DEVANAGARI SIGN ANUSVARA] is not always clearly defined. For example, snake can be written in both of the following ways: +साँप +सांप +
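Because the two spellings differ by one code point, a plain string comparison treats them as different words. The sketch below shows the difference and one possible way to compare loosely. The helper `fold_nasalisation` is a hypothetical convenience for search-style matching, not a normalisation defined by Unicode, and it would also affect an anusvara that stands for a nasal consonant.

```python
CANDRABINDU = "\u0901"
ANUSVARA = "\u0902"

with_candrabindu = "\u0938\u093E\u0901\u092A"  # साँप
with_anusvara = "\u0938\u093E\u0902\u092A"     # सांप

def fold_nasalisation(text):
    """Map candrabindu to anusvara so that both spellings compare equal.

    Hypothetical helper for loose matching only; it loses the distinction.
    """
    return text.replace(CANDRABINDU, ANUSVARA)

print(with_candrabindu == with_anusvara)
print(fold_nasalisation(with_candrabindu) == fold_nasalisation(with_anusvara))
```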

+
+ + + + + + + +
+

Vowel lengthening

+ +
+ +

An extra-long, sustained vowel sound can be indicated using [U+093D DEVANAGARI SIGN AVAGRAHA], eg. +आईऽऽऽ! <aiii!> +

+

This was originally used as a vowel elision marker in Sanskrit.

+
+ + + + + + + +
+

Consonants with no following vowel

+

The inherent vowel is not always pronounced. For example in Hindi it is not usually pronounced at the end of a word, +although a ghost echo may appear after a word-final cluster of consonants, eg. +योग्य +राष्ट्र +

+

In addition, Hindi has a general rule that when a word has three or more syllables and ends in a vowel other than the inherent a, the penultimate vowel is not pronounced, eg. compare समझ smjʱ səməɟʱ समझा smjʱā səmɟʱaː and रहन rhn rəhən रहना rhnā rəhnaː

+

(For a number of reasons, +however, this rule does not always hold.)

+ +
+ +

Devanagari uses [U+094D DEVANAGARI SIGN VIRAMA] (called halant in Hindi) to kill the inherent vowel after a consonant. The virama is rarely seen. As just mentioned, no virama is used at the end of a word, or in the penultimate syllable where the above rules apply. The virama is also usually hidden when the consonant is part of a consonant cluster (see clusters). The virama is visible, however, if it isn't followed by a consonant, eg. the following explicitly represents just the sound k,क्

+
+ + + + + + + +

Vowel sounds to characters

@@ -713,56 +934,56 @@

Plain vowels

d
-

[U+0940 DEVANAGARI VOWEL SIGN II], eg. तीन.

+

[U+0940 DEVANAGARI VOWEL SIGN II]

 
i
-

[U+0908 DEVANAGARI LETTER II], eg. ईंट.

+

[U+0908 DEVANAGARI LETTER II]

 
i
-

[U+0907 DEVANAGARI LETTER I], eg. इन्सान.

+

[U+0907 DEVANAGARI LETTER I]

ʊ
d
-

[U+0941 DEVANAGARI VOWEL SIGN U], eg. सुस्त.

+

[U+0941 DEVANAGARI VOWEL SIGN U]

 
i
-

[U+0909 DEVANAGARI LETTER U], eg. उड़ना.

+

[U+0909 DEVANAGARI LETTER U]

 
i
-

[U+090A DEVANAGARI LETTER UU], eg. ऊपर.

+

[U+090A DEVANAGARI LETTER UU]

@@ -773,28 +994,28 @@

Plain vowels

d
-

[U+0947 DEVANAGARI VOWEL SIGN E], eg. बेटा.

+

[U+0947 DEVANAGARI VOWEL SIGN E]

 
i
-

[U+090F DEVANAGARI LETTER E], eg. एक.

+

[U+090F DEVANAGARI LETTER E]

d
-

[U+094B DEVANAGARI VOWEL SIGN O], eg. टोपी.

+

[U+094B DEVANAGARI VOWEL SIGN O]

 
i
-

[U+0913 DEVANAGARI LETTER O], eg. ओस.

+

[U+0913 DEVANAGARI LETTER O]

@@ -886,52 +1107,72 @@

Plain vowels

◌̃
-

[U+0901 DEVANAGARI SIGN CANDRABINDU], eg. दाँत.

-

[U+0902 DEVANAGARI SIGN ANUSVARA]  hiᵑͫdi hiⁿͫdi

+

[U+0901 DEVANAGARI SIGN CANDRABINDU]

+

[U+0902 DEVANAGARI SIGN ANUSVARA]

Sources: Wikipedia, and Google Translate.

+ + +
+

Vocalics

+

+

In Devanagari, vocalics are available both as vowel signs and independent vowels.

+

Hindi generally uses just one vocalic.

+
ृ␣ऋ
+
+

Other vocalics are used for Sanskrit.

+
ॄ␣ॢ␣ॣ␣ॠ␣ऌ␣ॡ
+
+
-
+ +
+

Consonants

+

Phonetically, Hindi, like other Indic languages, has four forms of plosives, illustrated here with the bilabial stop: unvoiced p, voiced b, aspirated pʰ, and murmured bʱ. It also has a set of retroflex consonants. These are all represented separately in the orthography.

-
-

Inherent vowel

-

+

The 33 consonant letters used for Hindi are supplemented by repertoire extensions for 8 more non-native sounds by applying the nukta diacritic to characters.

-

a following a consonant is not written, but is seen as an inherent part of the consonant letter, so ka is written by simply using the consonant letter [U+0915 DEVANAGARI LETTER KA].

-
+

Consonant clusters at any location are normally indicated using the virama between consonants. This results in a large number of conjunct forms expressed using half-forms, stacked consonants, and ligated glyphs. Occasionally, a visible virama is used.

+

As part of a cluster, RA has special forms. When initial in an orthographic syllable it appears as a hook at the top right of the whole syllable. When non-initial it appears as one of 2 special marks applied to the other consonants.

+

Word-final consonant sounds may be represented by 2 dedicated combining marks (anusvara & visarga), but are generally ordinary consonants that are not marked by a virama. Also, the inherent vowel of a penultimate consonant in a word of 3 syllables that ends in a non-inherent vowel is usually elided, and not marked as such.

- +
+

Basic consonants

+ + +

Basic set of consonants, used for Hindi and Sanskrit. (Phonetic information for Hindi.)

+
प␣फ␣ब␣भ␣त␣थ␣द␣ध␣ट␣ठ␣ड␣ढ␣क␣ख␣ग␣घ
-
-

Vowel-signs

-

+
च␣छ␣ज␣झ
-

Non-inherent vowel sounds that follow a consonant are represented using vowel-signs, eg. kiː is written की [U+0915 DEVANAGARI LETTER KA + U+0940 DEVANAGARI VOWEL SIGN II].

-

An orthography that uses vowel-signs is different from one that uses simple diacritics or letters for vowels, in that the vowel-signs are generally attached to the syllable, rather than just applied to the letter of the immediately preceding consonant (see prescript_vowels for an example).

-

Devanagari vowel-signs are all combining characters. A single Unicode character is used per base consonant, and there are no vowel-signs with multiple parts. All vowel-signs are typed and stored after the base consonant, and the font puts them in the correct place for display.

-

Half the vowel-signs are spacing combining characters, meaning that they consume horizontal space when added to a base consonant.

+
व␣स␣श␣ष␣ह
-

See also vocalics.

+
म␣न␣ञ␣ण␣ङ
+ +
व␣र␣ल␣य
+ +

Hindi also counts 3 character combinations as consonantal letters of the alphabet.

+
त्र␣ज्ञ␣क्ष
@@ -939,161 +1180,271 @@

Vowel-signs

-
-

Combining marks used for vowels

+
+

Repertoire extension

-

Hindi uses the following dedicated combining marks for vowels.

-
ि␣ी␣ु␣ू␣े␣ो␣ै␣ौ␣ा
+
-

It also includes 2 vowel-signs used for sounds in foreign (especially English) loan words.

-
ॅ␣ॉ
-
+

[U+093C DEVANAGARI SIGN NUKTA] is used to represent foreign sounds, eg. in the following example the dot changes ख to ख़ x +ख़ारीदारी +

+ +

A list of graphemes used in Hindi that combine nukta with an existing consonant. These are all counted as letters of the Hindi alphabet. The 5th one is very rare.

+
क़␣फ़␣ज़␣झ़␣श़␣ख़␣ग़␣ड़␣ढ़
+

The nukta should always be typed and stored immediately after the consonant it modifies, and before any combining vowels or diacritics.

+

The Unicode block also contains the following precomposed code points for the sequences used in Hindi.

+ +
क़␣फ़␣ज़␣ख़␣ग़␣ड़␣ढ़
+

The Unicode Standard recommends not to use the precomposed code points for Hindi, but instead to use the base+nukta sequences. See also nukta_encoding for more information.
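The behaviour of the precomposed code points under normalisation can be checked with Python's standard `unicodedata.normalize`. The sketch below uses QA (U+0958) as the example; the constant names are chosen here for illustration.

```python
import unicodedata

PRECOMPOSED_QA = "\u0958"     # single precomposed code point
SEQUENCE_QA = "\u0915\u093C"  # KA followed by NUKTA

# NFC decomposes the precomposed letter and does not recompose it,
# so both inputs end up as the base+nukta sequence.
nfc_of_precomposed = unicodedata.normalize("NFC", PRECOMPOSED_QA)
nfc_of_sequence = unicodedata.normalize("NFC", SEQUENCE_QA)

print([f"U+{ord(c):04X}" for c in nfc_of_precomposed])
print(nfc_of_precomposed == nfc_of_sequence)
```

This is one practical reason to prefer the base+nukta sequences: text that starts out with the precomposed code points does not survive normalisation unchanged.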

+
-
-

Pre-base vowel-sign

-
ि
-

One vowel-sign appears to the left of the base consonant letter or cluster, eg. -दिन -

-

This is a combining mark that is always typed and stored after the base consonant(s). The font places the glyph before the base consonant.

-

It is actually placed before the start of the syllable. This means that a word with a consonant cluster at the start displays the pre-base vowel more than one consonant character away from the place where it is pronounced, eg. -शक्ति -

-

Note, however, that if the cluster is split by a visible virama, this creates two syllables and the pre-base vowel-sign appears after the consonant with the virama. If you click on the example below, you'll see that the characters and code point orders are the same as for the previous example (apart from the addition of the ZWNJ to force the virama to appear), but the location of the pre-base vowel-sign is now immediately before the consonant after which it is pronounced. -शक्‌ति -

-
+
+

Final consonants

+

Although traditionally classified as vowels, 2 diacritics represent syllable-final consonant sounds.

+
ं␣ः
-
-

Standalone vowels

-

+

Nasal sounds m n ŋ that are homorganic with a following consonant are commonly written using [U+0902 DEVANAGARI SIGN ANUSVARA]. This mark is positioned over the previous consonant, eg. +हिंदी +

+

Most words that use the anusvara can also be written using the consonant itself, eg. +हिन्दी +

+

In some cases, however, the anusvara form is more common. For example, the first of the two following alternatives is much more common +पंजाब +*पञ्जाब +
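The relationship between the two spellings can be sketched in code. The function below replaces an anusvara with the nasal letter of the same place of articulation as the following stop, plus a virama. It is a simplified illustration under stated assumptions: the table `VARGAS` and the function `expand_anusvara` are hypothetical names, only the five stop series are handled, and an anusvara before any other character (or at the end of a word) is left unchanged.

```python
VIRAMA = "\u094D"
ANUSVARA = "\u0902"

# Each entry: the four stops of one place of articulation, then its nasal.
VARGAS = [
    ("\u0915\u0916\u0917\u0918", "\u0919"),  # velar: KA KHA GA GHA -> NGA
    ("\u091A\u091B\u091C\u091D", "\u091E"),  # palatal: CA CHA JA JHA -> NYA
    ("\u091F\u0920\u0921\u0922", "\u0923"),  # retroflex: TTA TTHA DDA DDHA -> NNA
    ("\u0924\u0925\u0926\u0927", "\u0928"),  # dental: TA THA DA DHA -> NA
    ("\u092A\u092B\u092C\u092D", "\u092E"),  # labial: PA PHA BA BHA -> MA
]
NASAL_FOR = {stop: nasal for stops, nasal in VARGAS for stop in stops}

def expand_anusvara(word):
    """Rewrite anusvara + stop as homorganic nasal + virama + stop."""
    out = []
    for i, ch in enumerate(word):
        following = word[i + 1] if i + 1 < len(word) else ""
        if ch == ANUSVARA and following in NASAL_FOR:
            out.append(NASAL_FOR[following] + VIRAMA)
        else:
            out.append(ch)
    return "".join(out)

print(expand_anusvara("\u0939\u093F\u0902\u0926\u0940"))  # हिंदी
```

Applied to the two words in this section, the sketch yields the spellings with the explicit nasal consonant, including the less common form of the second word.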

+

Some words, mostly Sanskrit loan words, may end with a voiceless h after a vowel which can be written using [U+0903 DEVANAGARI SIGN VISARGA], eg. +पुनः +दुःखी +

-

Devanagari represents standalone vowels using a set of independent vowel letters. The set contains a character to represent the inherent vowel sound.

-

Independent vowels used by Hindi:

+

See also the candrabindu diacritic, which nasalises a vowel.

+
+ + + + + + +
+

Consonant clusters

-
इ␣ई␣उ␣ऊ␣ए␣अ␣ओ␣ऐ␣औ␣आ
+

+

-

Two more are used for sounds in loan words.

+

The absence of a vowel sound between two or more consonants can be visually indicated in one of the following ways.

-
ऍ␣ऑ
+
    +
  1. Create a conjunct. There are a number of possibilities here:
      -

      The following combinations are also counted as letters of the alphabet.

      +
    1. Half-forms : Reduce the shape of all consonants in the cluster except the last to a 'half-form' by removing the vertical stroke.
    2. -
      अं␣अः␣अँ
      +
    3. Stacking : Reduce a non-initial consonant in size and shape and position it below the first.
    4. -

      Note the sound difference between the use of a standalone vowel vs. a vowel-sign after a consonant:नई nị̄ nəiː नी niː

      -
+
  • Special ligation : Create a fusion of the two shapes, where one or other of the components may not be easily recognisable.
  • +
  • The letter ra has its own idiosyncratic way of combining with other consonants, whether it precedes or follows them.
  • + +
  • Show a visible virama below the non-final consonants in the cluster.
  • +
  • No indication, although there are usually generalised pronunciation rules that allow readers to spot these locations. Examples of these rules are given in the section about the inherent vowel.
  • + +

    See also doubling.

    -
    -

    Nasalisation

    +
    +

    Conjunct formation

    -
    ँ␣ं
    +

    See a table of 2-consonant clusters.
    The table allows you to test results for various fonts.

    -

    Any vowel in Hindi can be nasalised, except for the vocalics.

    -

    Nasalisation is usually indicated using [U+0901 DEVANAGARI SIGN CANDRABINDU], eg. -मुँह -

    -

    When a vowel-sign rises above the head line, the glyph for this character may be simplified to just a dot, which can be written using - [U+0902 DEVANAGARI SIGN ANUSVARA] instead of candrabindu, eg. -हैं -

    -

    The distinction between use of [U+0901 DEVANAGARI SIGN CANDRABINDU] and [U+0902 DEVANAGARI SIGN ANUSVARA] is not always clearly defined. For example, snake can be written in both of the following ways: -साँप -सांप +

    + +

    To produce a conjunct, [U+094D DEVANAGARI SIGN VIRAMA] is added between the consonants in the cluster. There are exceptions, but this type of virama is usually not displayed, eg. the sequence + + [U+0915 DEVANAGARI LETTER KA + U+094D DEVANAGARI SIGN VIRAMA + U+0937 DEVANAGARI LETTER SSA] produces +क्ष
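The sequence described above can be built from its code points as follows. This is a minimal sketch; the helper `cluster` is a hypothetical name, and how the result is displayed still depends on the font.

```python
KA = "\u0915"
VIRAMA = "\u094D"
SSA = "\u0937"

def cluster(*consonants):
    """Join consonants with VIRAMA, which requests a conjunct from the font."""
    return VIRAMA.join(consonants)

ksha = cluster(KA, SSA)
print([f"U+{ord(c):04X}" for c in ksha])
```

The stored text always contains three code points, even when the font renders them as a single ligated glyph.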

    + + +

    The font usually determines which visual method is used, although it is possible to influence this (see joiner).

    + +

    Click on the figures below to see which characters are being shown.

    + + +
    +

    Conjoined half-forms

    +

    A half-form is typically created by removing the vertical line in the consonant shape, where there is one. (The vertical line is associated with the inherent vowel, and around two-thirds of Devanagari consonant shapes contain one.) There is often some additional tweaking of glyphs in order to join the components neatly. The last consonant in the cluster retains its full shape.

    +
    +
    +तव→त्व +कक→क्क +तसव→त्स्व +
    +
    Examples of conjuncts formed by using half-forms.
    +
    +

    A small number of half-forms are only minimally different from side-by-side characters.

    -
    -

    Vowel lengthening

    +
    +
    +छग +छ्ग +
    +
    An example of a conjunct with a subtle difference between separate consonants with intervening vowel (left), and a conjunct cluster (right). The difference is highlighted on the left.
    +
    -
    +
    + + + +
    +

    Vertical stacks

    +

    Stacking is more common for Sanskrit. Few modern fonts reorder glyphs in this way, and those that do support only a limited number of combinations.

    -

    An extra-long, sustained vowel sound can be indicated using [U+093D DEVANAGARI SIGN AVAGRAHA], eg. -आईऽऽऽ! <aiii!> -

    -

    This was originally used as a vowel elision marker in Sanskrit.

    +
    +
    +कक→क्क +दब→द्ब +हव→ह्व +
    +
    Conjuncts formed by subjoining non-initial consonants. +
    +
    +
    + + + +
    +

    Ligated conjuncts

    +

    Typically, only a small number of clusters are combined in a way that makes it difficult to spot the component parts. This is, however, the default for two particular clusters: क्ष k͓ʂ kṣ ज्ञ ɟ͓ɲ ɡj

    + +
    +
    +कष→क्ष +जञ→ज्ञ +कत→क्त +
    +
    Conjuncts formed by ligation.
    +
    + + + +
    +

    Conjuncts with ra

    +

    When [U+0930 DEVANAGARI LETTER RA] follows another consonant, it is typically rendered as a small, diagonal line to the left, eg. क्र ग्र भ्र. After 6 consonants, however, it is rendered as an upside-down v shape below, ie. ट्र ठ्र ड्र ढ्र ड़्र छ्र. After [U+0924 DEVANAGARI LETTER TA] it produces त्र.

    +
    +
    +कर→क्र +टर→ट्र +तर→त्र +
    +
    Conjuncts formed by a following ra.
    +
    +

    When ra precedes another consonant, it is rendered as a small hook above the vertical line in the cluster, eg. र्क r͓k र्ल r͓l. Where it precedes a cluster using half-forms, it is aligned with the vertical line of the trailing consonant, eg. र्स्प r͓s͓p. However, if there is a spacing vowel sign with a vertical line to the right of the cluster, it aligns with that, eg. र्का r͓kā र्की r͓kī. (This illustrates how the basic units of the script are orthographic syllables.)

    +
    +
    +र्क +र्ल +र्स्प +र्का +
    +
    The horizontal position of the hook for conjuncts formed by a preceding ra follows the main vertical bar of the syllable.
    +
    +
    + + + +
    +

    Visible virama

    +

    The ability to form conjuncts depends on the richness of the font. Where a font is not able to produce a half-form or ligature, etc., it will leave a visible virama glyph below the initial consonant(s) to indicate the missing vowel sound, as illustrated in fig_virama_visible.

    +
    +
    +ङ्ख +ङ्ख +
    +
    A consonant cluster for which there exists a conjunct form in the Tiro Hindi font (left), but not in the Noto Serif Devanagari font (right). The latter indicates that this is a cluster by showing a visible virama.
    +
    -
    -

    Consonants with no following vowel

    -

    The inherent vowel is not always pronounced. For example in Hindi it is not usually pronounced at the end of a word, -although a ghost echo may appear after a word-final cluster of consonants, eg. -योग्य -राष्ट्र +

    Examples of clusters that the default font used for this page is unable to render as a conjunct form: +स्विट्ज़रलैंड +रीट्वीट

    -

    In addition, Hindi has a general rule that when a word has three or more syllables and ends in a vowel other than the inherent a, the penultimate vowel is not pronounced, eg. compare समझ smjʱ səməɟʱ समझा smjʱā səmɟʱaː and रहन rhn rəhən रहना rhnā rəhnaː

    -

    (For a number of reasons, -however, this rule does not always hold.)

    +

    An important consequence of representing clusters in this way is that the syllable boundaries are different. For example, if we follow the cluster with a left-positioned vowel sign, it will now appear after the virama, rather than before the cluster, eg. compare the position of the pre-base vowel sign in fig_virama_vowel. This change is also reflected in segmentation of the text for line-breaking, inter-character spacing, etc.

    -
    +
    +
    +ङ्खि +ङ्खि +
    +
    Positioning of the pre-base vowel sign in relation to the same consonant cluster where a conjunct forms (left) vs. where a visible virama appears (right).
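The change in syllable boundaries can be approximated in code. The sketch below splits stored text into orthographic syllables under a strong simplifying assumption: a virama followed by a consonant always forms a conjunct, unless a ZWNJ intervenes (standing in here for a visible virama). Real rendering depends on the font, and all names in the block are hypothetical.

```python
import unicodedata

VIRAMA = "\u094D"
ZWNJ = "\u200C"
ZWJ = "\u200D"

def is_consonant(ch):
    """Simplification: treat the range KA..HA as the consonant letters."""
    return "\u0915" <= ch <= "\u0939"

def orthographic_syllables(text):
    """Split text into approximate orthographic syllables."""
    units = []
    cluster_open = False  # True right after a virama that may join onward
    for ch in text:
        if ch in (ZWJ, ZWNJ):
            # Joiners attach to the preceding unit; ZWNJ ends the cluster.
            if units:
                units[-1] += ch
            else:
                units.append(ch)
            if ch == ZWNJ:
                cluster_open = False
            continue
        is_mark = unicodedata.category(ch) in ("Mn", "Mc")
        if units and (is_mark or (cluster_open and is_consonant(ch))):
            units[-1] += ch
        else:
            units.append(ch)
        cluster_open = ch == VIRAMA
    return units

conjunct_word = "\u0936\u0915\u094D\u0924\u093F"     # conjunct forms
split_word = "\u0936\u0915\u094D\u200C\u0924\u093F"  # ZWNJ after the virama
print(len(orthographic_syllables(conjunct_word)))
print(len(orthographic_syllables(split_word)))
```

With the conjunct, the pre-base vowel sign belongs to a unit that starts at the first consonant of the cluster. With the visible virama, it belongs to a unit that starts at the last consonant, which matches the difference in glyph placement described above.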
    +
    -

    Devanagari uses [U+094D DEVANAGARI SIGN VIRAMA] (called halant in Hindi) to kill the inherent vowel after a consonant. The virama is rarely seen. As just mentioned, no virama is used at the end of a word, or in the penultimate syllable where the above rules apply. The virama is also usually hidden when the consonant is part of a consonant cluster (see clusters). The virama is visible, however, if it isn't followed by a consonant, eg. the following explicitly represents just the sound k,क्

    +

    A visible virama may also be used with a single consonant, to indicate that it is to be pronounced without the inherent vowel, eg. क् k

    + + + +
    +

    Consonant lengthening

    +

    Lengthened (geminated) consonants are indicated in the script using the same mechanisms as for clusters.

    +

    Most native consonants may be lengthened, but not bʱ, ɽ, ɽʱ, or ɦ. Geminate consonants are always medial and preceded by one of ə, ɪ, or ʊ. wp,#Consonants

    +
    - -
    -

    Vocalics

    -

    -

    In Devanagari, vocalics are available both as vowel-signs and independent vowels.

    -

    Hindi generally uses just one vocalic.

    -
    ृ␣ऋ
    -
    -

    Other vocalics are used for Sanskrit.

    -
    ॄ␣ॢ␣ॣ␣ॠ␣ऌ␣ॡ
    -
    -
    +
    +

    Using ZWJ & ZWNJ

    +

    ZWNJ It's possible to prevent the formation of conjuncts using U+200C ZERO WIDTH NON-JOINER (ZWNJ). For example:

    + +

    ZWJ To produce a half-form, rather than a ligated form, use U+200D ZERO WIDTH JOINER (ZWJ). For example, +क्‍ष   →   क्ष

    +

    It can also be used to produce standalone half-forms (for educational text) such as +घ्‍
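The sequences involved can be written out as follows. This sketch assumes the usual placement of the joiner immediately after the virama; the variable names and the helper `strip_joiners` are hypothetical, and the visual result depends on the font in use.

```python
KA = "\u0915"
GHA = "\u0918"
SSA = "\u0937"
VIRAMA = "\u094D"
ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200D"   # ZERO WIDTH JOINER

conjunct = KA + VIRAMA + SSA               # default: the font may ligate
half_form = KA + VIRAMA + ZWJ + SSA        # requests a half-form of KA
visible_virama = KA + VIRAMA + ZWNJ + SSA  # prevents the conjunct
standalone_half = GHA + VIRAMA + ZWJ       # half-form with nothing following

def strip_joiners(text):
    """Remove ZWJ and ZWNJ, eg. before comparing strings for equality."""
    return text.replace(ZWJ, "").replace(ZWNJ, "")

print(strip_joiners(half_form) == conjunct)
```

Since the joiners are invisible but stored, two strings that look alike in some fonts may not compare equal; removing the joiners before comparison is one option when only the letters matter.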

    +
    + - -
    -

    Consonants

    -

    The 33 consonant letters used for Hindi are supplemented by repertoire extensions for 8 more non-native sounds by applying the nukta diacritic to characters.

    -

    Consonant clusters at any location are normally indicated using the virama between consonants. This results in a large number of conjunct forms expressed using half-forms, stacked consonants, and ligated glyphs. Occasionally, a visible virama is used.

    -

    As part of a cluster, RA has special forms. When initial in an orthographic syllable it appears as a hook at the top right of the whole syllable. When non-initial it appears as one of 2 special marks applied to the other consonants.

    -

    Word-final consonant sounds may be represented by 2 dedicated combining marks (anusvara & visarga), but are generally ordinary consonants that are not marked by a virama. Also, the inherent vowel of a penultimate consonant in a word of 3 syllables that ends in a non-inherent vowel is usually elided, and not marked as such.

    @@ -1113,67 +1464,67 @@

    Stops

    p
    -

    [U+092A DEVANAGARI LETTER PA], eg. पानी.

    +

    [U+092A DEVANAGARI LETTER PA]

    b
    -

    [U+092C DEVANAGARI LETTER BA], eg. बहुत.

    +

    [U+092C DEVANAGARI LETTER BA]

    -

    [U+0925 DEVANAGARI LETTER THA], eg. थूकना.

    +

    [U+0925 DEVANAGARI LETTER THA]

    ʈ
    ʈʱ
    ɖ
    @@ -1185,31 +1536,31 @@

    Stops

    k
    -

    [U+0915 DEVANAGARI LETTER KA], eg. कुत्ता.

    +

    [U+0915 DEVANAGARI LETTER KA]

    ɡ
    -

    [U+0917 DEVANAGARI LETTER GA], eg. गर्दन.

    +

    [U+0917 DEVANAGARI LETTER GA]

    ɡʱ
    q
    -

    क़ [U+0915 DEVANAGARI LETTER KA + U+093C DEVANAGARI SIGN NUKTA], eg. क़लम.

    +

    क़ [U+0915 DEVANAGARI LETTER KA + U+093C DEVANAGARI SIGN NUKTA]

    [U+0958 DEVANAGARI LETTER QA] (decomposes in NFC and doesn't recompose) 

    @@ -1226,25 +1577,25 @@

    Affricates

    t͡ʃ
    t͡ʃʱ
    d͡ʒ
    -

    [U+091C DEVANAGARI LETTER JA], eg. जानवर.

    +

    [U+091C DEVANAGARI LETTER JA]

    d͡ʒʱ
    @@ -1258,33 +1609,33 @@

    Fricatives

    f
    -

    फ़ [U+092B DEVANAGARI LETTER PHA + U+093C DEVANAGARI SIGN NUKTA], eg. सफ़ेद.

    +

    फ़ [U+092B DEVANAGARI LETTER PHA + U+093C DEVANAGARI SIGN NUKTA]

    [U+095E DEVANAGARI LETTER FA]  (decomposes in NFC and doesn't recompose) 

    v
    -

    [U+0935 DEVANAGARI LETTER VA] as an allophone of ʋ, eg. व्रत.

    +

    [U+0935 DEVANAGARI LETTER VA] as an allophone of ʋ

    s
    -

    [U+0938 DEVANAGARI LETTER SA], eg. सूरज.

    +

    [U+0938 DEVANAGARI LETTER SA]

    z
    -

    ज़ [U+091C DEVANAGARI LETTER JA + U+093C DEVANAGARI SIGN NUKTA], eg. नज़दीक.

    +

    ज़ [U+091C DEVANAGARI LETTER JA + U+093C DEVANAGARI SIGN NUKTA]

    [U+095B DEVANAGARI LETTER ZA]   (decomposes in NFC and doesn't recompose) 

    ʃ
    -

    [U+0936 DEVANAGARI LETTER SHA], eg. बारिश.

    +

    [U+0936 DEVANAGARI LETTER SHA]

    @@ -1302,7 +1653,7 @@

    Fricatives

    x
    -

    ख़ [U+0916 DEVANAGARI LETTER KHA + U+093C DEVANAGARI SIGN NUKTA], eg. ख़ून.

    +

    ख़ [U+0916 DEVANAGARI LETTER KHA + U+093C DEVANAGARI SIGN NUKTA]

    [U+0959 DEVANAGARI LETTER KHHA]   (decomposes in NFC and doesn't recompose) 

    @@ -1322,7 +1673,7 @@

    Fricatives

    ɦ
    -

    [U+0939 DEVANAGARI LETTER HA], eg. हड्डी.

    +

    [U+0939 DEVANAGARI LETTER HA]

    @@ -1338,15 +1689,15 @@

    Nasals

    m
    -

    [U+092E DEVANAGARI LETTER MA], eg. मछली.

    +

    [U+092E DEVANAGARI LETTER MA]

    [U+0902 DEVANAGARI SIGN ANUSVARA] when followed by a labial consonant.

    n
    -

    [U+0928 DEVANAGARI LETTER NA], eg. नाक.

    -

    [U+0902 DEVANAGARI SIGN ANUSVARA] when followed by an alveolar consonant, eg. ठंडा.

    +

    [U+0928 DEVANAGARI LETTER NA]

    +

    [U+0902 DEVANAGARI SIGN ANUSVARA] when followed by an alveolar consonant

    @@ -1365,7 +1716,7 @@

    Nasals

    ŋ

    [U+0919 DEVANAGARI LETTER NGA]

    -

    [U+0902 DEVANAGARI SIGN ANUSVARA] when followed by a velar consonant, eg. टांग

    +

    [U+0902 DEVANAGARI SIGN ANUSVARA] when followed by a velar consonant 

    @@ -1380,13 +1731,13 @@

    Other

    ʋ
    -

    [U+0935 DEVANAGARI LETTER VA], eg. त्वचा, हवा.

    +

    [U+0935 DEVANAGARI LETTER VA]

    w
    -

    [U+0935 DEVANAGARI LETTER VA] as a variant of ʋ commonly occurring between a consonant and vowel, eg. पकवान.

    +

    [U+0935 DEVANAGARI LETTER VA] as a variant of ʋ commonly occurring between a consonant and vowel

    @@ -1398,355 +1749,55 @@

    Other

    ɾ
    -

    [U+0930 DEVANAGARI LETTER RA], eg. रात rāt ɾɑːt̪ night.

    +

    [U+0930 DEVANAGARI LETTER RA]

    ɽ
    -

    ड़ [U+0921 DEVANAGARI LETTER DDA + U+093C DEVANAGARI SIGN NUKTA], eg. बड़ा.

    +

    ड़ [U+0921 DEVANAGARI LETTER DDA + U+093C DEVANAGARI SIGN NUKTA]

    [U+095C DEVANAGARI LETTER DDDHA]     (decomposes in NFC and doesn't recompose) 

    ɽʱ
    -

    ढ़ [U+0922 DEVANAGARI LETTER DDHA + U+093C DEVANAGARI SIGN NUKTA], eg. गाढ़ा.

    +

    ढ़ [U+0922 DEVANAGARI LETTER DDHA + U+093C DEVANAGARI SIGN NUKTA]

    [U+095D DEVANAGARI LETTER RHA]    (decomposes in NFC and doesn't recompose) 

    Sources: Wikipedia, and Google Translate.
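    The note in the table above that the precomposed nukta letters decompose in NFC and don't recompose can be checked with a short script. The following is a minimal sketch using Python's standard `unicodedata` module; the constant and function names are illustrative only and do not come from the page.

```python
import unicodedata

# Precomposed nukta letters mentioned in the table, paired with the
# base + nukta sequences that the table lists alongside them.
PRECOMPOSED_TO_SEQUENCE = {
    "\u0958": "\u0915\u093C",  # QA    -> KA   + NUKTA
    "\u0959": "\u0916\u093C",  # KHHA  -> KHA  + NUKTA
    "\u095B": "\u091C\u093C",  # ZA    -> JA   + NUKTA
    "\u095C": "\u0921\u093C",  # DDDHA -> DDA  + NUKTA
    "\u095D": "\u0922\u093C",  # RHA   -> DDHA + NUKTA
    "\u095E": "\u092B\u093C",  # FA    -> PHA  + NUKTA
}


def nfc(text: str) -> str:
    """Return text in Unicode Normalisation Form C."""
    return unicodedata.normalize("NFC", text)


def nfd(text: str) -> str:
    """Return text in Unicode Normalisation Form D."""
    return unicodedata.normalize("NFD", text)


for precomposed, sequence in PRECOMPOSED_TO_SEQUENCE.items():
    # NFC decomposes the precomposed letter and does not recompose it,
    # so both spellings end up as the base + nukta sequence.
    print(f"U+{ord(precomposed):04X}", nfc(precomposed) == sequence, nfc(sequence) == sequence)
```

    Because both normalisation forms yield the base+nukta sequence, storing that sequence in the first place avoids any change when text is normalised.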

    +
    -
    - - - - - -
    -

    Basic consonants

    - - -

    Basic set of consonants, used for Hindi and Sanskrit. (Phonetic information for Hindi.)

    - -
    प␣फ␣ब␣भ␣त␣थ␣द␣ध␣ट␣ठ␣ड␣ढ␣क␣ख␣ग␣घ
    - -
    च␣छ␣ज␣झ
    -
    व␣स␣श␣ष␣ह
    +
    +

    Encoding choices

    +

    This section looks at alternative strategies for typing and storing vowel signs and independent vowels used by Hindi, taking into consideration the effects of normalising the text using Unicode Normalisation Form D (NFD), and Normalisation Form C (NFC).

    -
    म␣न␣ञ␣ण␣ङ
    -
    व␣र␣ल␣य
    - -

    Hindi also counts 3 character combinations as consonantal letters of the alphabet.

    -
    त्र␣ज्ञ␣क्ष
    -
    - - - - - - -
    -

    Repertoire extension

    - - -
    - - -

    [U+093C DEVANAGARI SIGN NUKTA] is used to represent foreign sounds, eg. in the following example the dot changes to ख़ x -ख़ारीदारी -

    - -

    A list of graphemes used in Hindi that combine nukta with an existing consonant. These are all counted as letters of the Hindi alphabet. The 5th one is very rare.

    - -
    क़␣फ़␣ज़␣झ़␣श़␣ख़␣ग़␣ड़␣ढ़
    - -

    The nukta should always be typed and stored immediately after the consonant it modifies, and before any combining vowels or diacritics.

    -

    The Unicode block also contains the following precomposed code points for the sequences used in Hindi.

    - - -
    क़␣फ़␣ज़␣ख़␣ग़␣ड़␣ढ़
    - -

    The Unicode Standard recommends not to use the precomposed code points for Hindi, but instead to use the base+nukta sequences. See also nukta_encoding for more information.

    -
    - - - - - - - -
    -

    Final consonants

    - -

    Although traditionally classified as vowels, 2 diacritics represent syllable-final consonant sounds.

    - -
    ं␣ः
    - -

    Nasal sounds m n ŋ that are homorganic with a following consonant are commonly written using [U+0902 DEVANAGARI SIGN ANUSVARA]. This mark is positioned over the previous consonant, eg. -हिंदी -

    -

    Most words that use the anusvara can also be written using the consonant itself, eg. -हिन्दी -

    -

    In some cases, however, the anusvara form is more common. For example, the first of the two following alternatives is much more common -पंजाब -*पञ्जाब -

    -

    Some words, mostly Sanskrit loan words, may end with a voiceless h after a vowel which can be written using [U+0903 DEVANAGARI SIGN VISARGA], eg. -पुनः -दुःखी -

    - -

    See also the candrabindu diacritic, which nasalises a vowel.

    -
    - - - - - - -
    -

    Consonant clusters

    - -

    -

    - -

    The absence of a vowel sound between two or more consonants can be visually indicated in one of the following ways.

    - -
      -
    1. Create a conjunct. There are a number of possibilities here:
      1. Half-forms: Reduce the shape of all consonants in the cluster except the last to a 'half-form' by removing the vertical stroke.
      2. Stacking: Reduce a non-initial consonant in size and shape and position it below the first.
      3. Special ligation: Create a fusion of the two shapes, where one or other of the components may not be easily recognisable.
      4. The letter ra has its own idiosyncratic way of combining with other consonants, whether it precedes or follows them.
    2. Show a visible virama below the non-final consonants in the cluster.
    3. No indication, although there are usually generalised pronunciation rules that allow readers to spot these locations. Examples of these rules are given in the section about the inherent vowel.
    - -

    See also doubling.

    - - - - - - -
    -

    Conjunct formation

    - -

    See a table of 2-consonant clusters.
    The table allows you to test results for various fonts.

    - -
    - -

    To produce a conjunct, [U+094D DEVANAGARI SIGN VIRAMA] is added between the consonants in the cluster. There are exceptions, but this type of virama is usually not displayed, eg. the sequence + + [U+0915 DEVANAGARI LETTER KA + U+094D DEVANAGARI SIGN VIRAMA + U+0937 DEVANAGARI LETTER SSA] produces -क्ष -

    - - -

    The font usually determines which visual method is used, although it is possible to influence this (see joiner).

    - -

    Click on the figures below to see which characters are being shown.

    -
    - - - - - - -
    -

    Conjoined half-forms

    -

    A half-form is typically created by removing the vertical line in the consonant shape, where there is one. (The vertical line is associated with the inherent vowel, and around two-thirds of Devanagari consonant shapes contain one.) There is often some additional tweaking of glyphs in order to join the components neatly. The last consonant in the cluster retains its full shape.

    - -
    -
    -तव→त्व -कक→क्क -तसव→त्स्व -
    -
    Examples of conjuncts formed by using half-forms.
    -
    - -

    A small number of half-forms are only minimally different from side-by-side characters.

    - -
    -
    -छग -छ्ग -
    -
    An example of a conjunct with a subtle difference between separate consonants with intervening vowel (left), and a conjunct cluster (right). The difference is highlighted on the left.
    -
    - -
    - - - -
    -

    Vertical stacks

    -

    This is more common for Sanskrit, and few modern fonts reorder glyphs in this way, or do so for a limited number of combinations.

    - -
    -
    -कक→क्क -दब→द्ब -हव→ह्व -
    -
    Conjuncts formed by subjoining non-initial consonants. -
    -
    -
    - - - -
    -

    Ligated conjuncts

    -

    Typically, only a small number of clusters are combined in a way that makes it difficult to spot the component parts. This is, however, the default for two particular clusters: क्ष k͓ʂ kṣ ज्ञ ɟ͓ɲ ɡj

    - -
    -
    -कष→क्ष -जञ→ज्ञ -कत→क्त -
    -
    Conjuncts formed by ligation.
    -
    -
    - - - -
    -

    Conjuncts with ra

    -

    When [U+0930 DEVANAGARI LETTER RA] follows another consonant, it is typically rendered as a small, diagonal line to the left, eg. क्र ग्र भ्र. After 6 consonants, however, it is rendered as an upside-down v shape below, ie. ट्र ठ्र ड्र ढ्र ड़्र छ्र. After [U+0924 DEVANAGARI LETTER TA] it produces त्र.

    - -
    -
    -कर→क्र -टर→ट्र -तर→त्र -
    -
    Conjuncts formed by a following ra.
    -
    - - - -

    When ra precedes another consonant, it is rendered as a small hook above the vertical line in the cluster, eg. र्क r͓k र्ल r͓l. Where it precedes a cluster using half-forms, it is aligned with the vertical line of the trailing consonant, eg. र्स्प r͓s͓p. However, if there is a spacing vowel-sign with a vertical line to the right of the cluster, it aligns with that, eg. र्का r͓kā र्की r͓kī. (This illustrates how the basic units of the script are orthographic syllables.)

    - -
    -
    -र्क -र्ल -र्स्प -र्का -
    -
    The horizontal position of the hook for conjuncts formed by a preceding ra follows the main vertical bar of the syllable.
    -
    -
    - - - -
    -

    Visible virama

    -

    The ability to form conjuncts depends on the richness of the font. Where a font is not able to produce a half-form or ligature, etc., it will leave a visible virama glyph below the initial consonant(s) to indicate the missing vowel sound, as illustrated in fig_virama_visible.

    - -
    -
    -ङ्ख -ङ्ख -
    -
    A consonant cluster for which there exists a conjunct form in the Tiro Hindi font (left), but not in the Noto Serif Devanagari font (right). The latter indicates that this is a cluster by showing a visible virama.
    -
    - -

    Examples of clusters that the default font used for this page is unable to render as a conjunct form: -स्विट्ज़रलैंड -रीट्वीट -

    -

    An important consequence of representing clusters in this way is that the syllable boundaries are different. For example, if we follow the cluster with a left-positioned vowel-sign, it will now appear after the virama, rather than before the cluster, eg. compare the position of the pre-base vowel-sign in fig_virama_vowel. This change is also reflected in segmentation of the text for line-breaking, inter-character spacing, etc.

    - -
    -
    -ङ्खि -ङ्खि -
    -
    Positioning of the pre-base vowel-sign in relation to the same consonant cluster where a conjunct forms (left) vs. where a visible virama appears (right).
    -
    - -

    A visible virama may also be used with a single consonant, to indicate that it is to be pronounced without the inherent vowel, eg. क् k

    -
    -
    - - - - - -
    -

    Consonant lengthening

    -

    Lengthened (geminated) consonants are indicated in the script using the same mechanisms as for clusters.

    -

    Most native consonants may be lengthened, but not , ɽ, ɽʱ, or ɦ. Geminate consonants are always medial and preceded by one of ə, ɪ, or ʊ.wp,#Consonants

    -
    - - - - - - - -
    -

    Using ZWJ & ZWNJ

    -

    ZWNJ It's possible to prevent the formation of conjuncts using U+200C ZERO WIDTH NON-JOINER (ZWNJ). For example:

    - -

    ZWJ To produce a half-form, rather than a ligated form, use U+200D ZERO WIDTH JOINER (ZWJ). For example, -क्‍ष   →   क्ष

    -

    It can also be used to produce standalone half-forms (for educational text) such as -घ्‍
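    The code point sequences behind these joiner effects can be spelled out explicitly. The sketch below is illustrative only: the variable names are assumptions, and the placement of the joiner immediately after the virama follows the usual Unicode convention rather than anything stated on this page. How each sequence is actually rendered still depends on the font.

```python
import unicodedata

KA, SSA, GHA = "\u0915", "\u0937", "\u0918"
VIRAMA = "\u094D"
ZWJ, ZWNJ = "\u200D", "\u200C"

# Plain cluster: the font normally renders this as the ligated conjunct.
conjunct = KA + VIRAMA + SSA
# ZWJ after the virama requests a half-form instead of the ligature.
half_form = KA + VIRAMA + ZWJ + SSA
# ZWNJ after the virama asks for no conjunct to be formed.
no_conjunct = KA + VIRAMA + ZWNJ + SSA
# A standalone half-form, eg. for educational text.
standalone_half_form = GHA + VIRAMA + ZWJ


def describe(text: str) -> list[str]:
    """List each code point in text as 'U+XXXX NAME'."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


for label, sequence in [
    ("conjunct", conjunct),
    ("half-form", half_form),
    ("no conjunct", no_conjunct),
    ("standalone half-form", standalone_half_form),
]:
    print(label, describe(sequence))
```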

    -
    -
    - - - - - - - -
    -

    Encoding choices

    -

    This section looks at alternative strategies for typing and storing vowel-signs and independent vowels used by Hindi, taking into consideration the effects of normalising the text using Unicode Normalisation Form D (NFD), and Normalisation Form C (NFC).

    - - -
    -

    Vowel-signs

    -

    The single code points on the left should be used, and not the sequences on the right, because they are not made the same by normalisation. Therefore the content will be regarded as different, which will affect searching and other operations on the text.

    +
    +

    Vowel signs

    +

    The single code points on the left should be used, and not the sequences on the right, because they are not made the same by normalisation. Therefore the content will be regarded as different, which will affect searching and other operations on the text.

    @@ -1925,10 +1976,12 @@

    Numbers, dates, currency, etc

    The CLDR standard-decimal pattern is #,##,##0.###. The standard-percent pattern is #,##,##0%.cldr

    An interesting feature of large numbers written in India is that, to the left of the rightmost group of three digits, commas separate groups of two digits rather than three (even when using European digits).

    -
    -

    20,00,000

    -
    Two million, written with Indian comma separators.
    -
    + + +
    +

    20,00,000

    +
    Two million, written with Indian comma separators.
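    The grouping expressed by the pattern #,##,##0 can be reproduced by hand. The following is a minimal sketch for integers only; the function name `indian_grouping` is hypothetical, and production code would normally rely on a CLDR-aware formatting library instead.

```python
def indian_grouping(number: int) -> str:
    """Format an integer with one rightmost group of three digits,
    then groups of two digits, separated by commas."""
    sign = "-" if number < 0 else ""
    digits = str(abs(number))
    if len(digits) <= 3:
        return sign + digits

    head, tail = digits[:-3], digits[-3:]
    groups = []
    # Peel two-digit groups off the right-hand end of the remaining digits.
    while len(head) > 2:
        groups.insert(0, head[-2:])
        head = head[:-2]
    groups.insert(0, head)
    return sign + ",".join(groups + [tail])


print(indian_grouping(2000000))  # 20,00,000
```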
    +
    @@ -2011,7 +2064,7 @@

    Context-based shaping

    Multiple combining characters

    -

    Diacritics regularly combine with a vowel-sign attached to the same consonant or consonant cluster. The example below shows two combining characters that are positioned above the base character in a very common form of the verb 'to be'. One is [U+0948 DEVANAGARI VOWEL SIGN AI​], and the other the nasalisation mark [U+0902 DEVANAGARI SIGN ANUSVARA​].

    +

    Diacritics regularly combine with a vowel sign attached to the same consonant or consonant cluster. The example below shows two combining characters that are positioned above the base character in a very common form of the verb 'to be'. One is [U+0948 DEVANAGARI VOWEL SIGN AI​], and the other the nasalisation mark [U+0902 DEVANAGARI SIGN ANUSVARA​].

    हैं

    @@ -2074,10 +2127,10 @@

    Graphemes

    This section is still undergoing research and development.

    -

    Grapheme clusters alone are not sufficient to represent typographic units in Hindi in all circumstances. Conjuncts are common and must not be split apart by edit operations that visually change the text (such as letter-spacing, first-letter highlighting, and in-word line breaking). For those operations one needs to segment the text using orthographic syllables, which string grapheme clusters together with [U+094D DEVANAGARI SIGN VIRAMA​],

    +

    Grapheme clusters alone are not sufficient to represent typographic units in Hindi in all circumstances. Conjuncts are common and must not be split apart by edit operations that visually change the text (such as letter-spacing, first-letter highlighting, and in-word line breaking). For those operations one needs to segment the text using orthographic syllables, which string grapheme clusters together with [U+094D DEVANAGARI SIGN VIRAMA​], which has an Indic Syllabic Category of Virama.

    -

    However, Hindi needs to interpret the virama (halant) in two different ways for segmentation: (1) as a simple vowel-killer, and (2) as a conjunct initiator, depending on whether or not it is rendered visibly.

    +

    However, Hindi needs to interpret the virama (halant) in two different ways for segmentation: (1) as a simple vowel-killer, and (2) as a conjunct initiator, depending on whether or not it is rendered visibly.

    @@ -2094,7 +2147,7 @@

    Grapheme clusters

  • Final consonant marks (see finals and nasalisation)
  • Virama (halant) (see clusters and novowel)
  • -

    Any of the above may occur after a consonant base. Independent vowel bases usually only have final consonant marks. There is usually only one vowel-sign per base consonant. A virama only occurs after a consonant and optional nukta.

    +

    Any of the above may occur after a consonant base. Independent vowel bases usually only have final consonant marks. There is usually only one vowel sign per base consonant. A virama only occurs after a consonant and optional nukta.

    The following examples show a variety of grapheme clusters:

    Click on the text version of these words to see more detail about the composition.

    @@ -2114,13 +2167,13 @@

    Grapheme clusters

    -

    Orthographic syllables

    +

    Larger typographic units

    (Consonant Nukta? Virama)* Grapheme_cluster

    Hindi commonly stacks or conjoins glyphs, to form conjuncts. The conjuncts represent consonant clusters or gemination.

    -

    Grapheme clusters terminate after a sequence of marks that ends with a pangkon, but editorial operations that change the visual appearance of the text, such as letter-spacing, first-letter highlighting, in-word line-breaking, and justification, should never split conjunct forms apart. For this reason, an alternative way of segmenting graphemes is needed. This may not apply, however, for some other operations such as cursor movement or backwards delete.

    +

    Grapheme clusters terminate after a sequence of marks that ends with a virama, but editorial operations that change the visual appearance of the text, such as letter-spacing, first-letter highlighting, in-word line-breaking, and justification, should never split conjunct forms apart. For this reason, an alternative way of segmenting graphemes is needed. This may not apply, however, for some other operations such as cursor movement or backwards delete.

    Where conjuncts appear, a typographic unit contains multiple grapheme clusters. The non-final grapheme clusters all end with [U+094D DEVANAGARI SIGN VIRAMA​], and the final grapheme cluster begins with a consonant.
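    The pattern (Consonant Nukta? Virama)* Grapheme_cluster can be approximated with a regular expression. The sketch below is a simplification under stated assumptions: a grapheme cluster is taken to be any base character plus the Devanagari combining marks that follow it, the character ranges are hand-picked for this example, and ZWJ, ZWNJ and the precomposed nukta letters are ignored. It also cannot know whether a font will actually form the conjunct, which is the complication discussed later in this section.

```python
import re

CONSONANT = "[\u0915-\u0939]"  # KA..HA
NUKTA = "\u093C"
VIRAMA = "\u094D"
# Devanagari combining marks: signs, nukta, vowel signs, virama, stress marks.
MARK = "[\u0900-\u0903\u093A-\u093C\u093E-\u094F\u0951-\u0957\u0962\u0963]"

ORTHOGRAPHIC_SYLLABLE = re.compile(
    # (Consonant Nukta? Virama)* followed by a consonant-based grapheme cluster,
    f"(?:{CONSONANT}{NUKTA}?{VIRAMA})*{CONSONANT}{MARK}*"
    # or any other base character with its trailing marks.
    f"|.{MARK}*",
    re.DOTALL,
)


def orthographic_syllables(text: str) -> list[str]:
    """Split text into approximate orthographic syllables."""
    return ORTHOGRAPHIC_SYLLABLE.findall(text)


# HA + I, then NA + VIRAMA + DA + II: two units, the second containing a conjunct.
print(orthographic_syllables("\u0939\u093F\u0928\u094D\u0926\u0940"))
```

    A trailing virama with no following consonant stays attached to its own consonant, which matches the use of the virama as a simple vowel-killer.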

    The following are examples. Some examples were shown in the previous section: here the conjunct is treated as a single typographic unit.

    @@ -2160,7 +2213,7 @@

    Complicating factors

    What's important to notice here is that it is normally possible to break a line after the virama when the virama is visible. This is currently difficult to manage because the decision as to whether the text is segmented into 2 graphemes or one depends only on the capabilities of the font used (ie. the rendered result); the code point sequence is identical for both cases, and gives no clues to which approach to segmentation is applicable.

    -

    Visible viramas can also affect vowel-sign positioning. For the purposes of illustration, let's take the previous example and replace the vowel-signs with ones that are displayed before the base. Observe the placement of the pre-base vowel in fig_prebase_position. In the conjunct form on the left, the vowel-sign is rendered to the left of the whole conjunct. If the sequence is not rendered as a conjunct, as in the second example, the pre-base glyph precedes the VA, not the TA.

    +

    Visible viramas can also affect vowel sign positioning. For the purposes of illustration, let's take the previous example and replace the vowel signs with ones that are displayed before the base. Observe the placement of the pre-base vowel in fig_prebase_position. In the conjunct form on the left, the vowel sign is rendered to the left of the whole conjunct. If the sequence is not rendered as a conjunct, as in the second example, the pre-base glyph precedes the VA, not the TA.

    रिट्विट

    @@ -2194,7 +2247,7 @@

    Browser behaviour

    Deletion. Forward deletion works in the same way as cursor movement. The backspace key deletes code point by code point, for all browsers.

    Line-break. See this test. The CSS sets the value of the line-break property to anywhere. Change the size of the box to slowly move the line break point.
    -Gecko appears to segment on grapheme cluster boundaries, except for inside the 3rd word, where it wraps first a vowel-sign, then wraps the rest of the conjunct plus the previous grapheme cluster as a single unit. WebKit and Blink both wrap on orthographic syllable boundaries.

    +Gecko appears to segment on grapheme cluster boundaries, except for inside the 3rd word, where it wraps first a vowel sign, then wraps the rest of the conjunct plus the previous grapheme cluster as a single unit. WebKit and Blink both wrap on orthographic syllable boundaries.

    diff --git a/devanagari/images/fig_prebase.svg b/devanagari/images/fig_prebase.svg
    new file mode 100644
    index 000000000..304302a46
    --- /dev/null
    +++ b/devanagari/images/fig_prebase.svg
    @@ -0,0 +1 @@
    +
    \ No newline at end of file
    diff --git a/devanagari/images/fig_prebase_split.svg b/devanagari/images/fig_prebase_split.svg
    new file mode 100644
    index 000000000..71c503404
    --- /dev/null
    +++ b/devanagari/images/fig_prebase_split.svg
    @@ -0,0 +1 @@
    +
    \ No newline at end of file