PR review changes

neo4j · Jan 15, 2024 · 215eb9c
1 parent 5d44934
commit 215eb9c
Showing 1 changed file with 20 additions and 12 deletions.
diff --git a/modules/ROOT/pages/functions/string.adoc b/modules/ROOT/pages/functions/string.adoc
@@ -153,13 +153,19 @@ RETURN ltrim('   hello')
 [[functions-normalize]]
 == normalize()
 
-`normalize()` returns the given `STRING` normalized using the `NFC` normalization form.
+`normalize()` returns the given `STRING` normalized using the `NFC` Unicode normalization form.
+
+[NOTE]
+====
+Unicode normalization is a process that transforms different representations of the same string into a standardized form.
+For more information, see the documentation for link:https://unicode.org/reports/tr15/#Norm_Forms[Unicode normalization forms].
+====
 
 The `normalize()` function is useful for converting `STRING` values into comparable forms.
 When comparing two `STRING` values, it is their Unicode codepoints that are compared.
-In Unicode, a codepoint for a character that looks the same may actually be represented by two, or more, different codepoints.
-For example, the character `<` can be represented as `\uFE64` or `\u003C`. Visually, the character may look the same,
-but if compared, Cypher will return false as `\uFE64` does not equal `\u003C`. Using the `normalize()` function one can
+In Unicode, a character that looks the same may be represented by two or more different codepoints.
+For example, the character `<` can be represented as `\uFE64` (﹤) or `\u003C` (<). Visually, the two characters may look the same,
+but if compared, Cypher will return `false` as `\uFE64` does not equal `\u003C`. Using the `normalize()` function, it is possible to
 normalize the codepoint `\uFE64` to `\u003C`, creating a single codepoint representation, allowing them to be successfully compared.
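
Reviewer sketch, not part of the commit: the codepoint comparison described above can be reproduced outside Cypher with Python's standard `unicodedata` module, used here only as a stand-in for `normalize()`. One point worth checking against Cypher itself: in Python, `\uFE64` folds to `\u003C` only under the compatibility forms (`NFKC`, `NFKD`), not under `NFC`, whereas `\u212B` and `\u00C5` are canonically equivalent and match under `NFC`. The variable names are illustrative.

```python
import unicodedata

small_less_than = "\uFE64"  # SMALL LESS-THAN SIGN
less_than = "\u003C"        # LESS-THAN SIGN

# A raw comparison works codepoint by codepoint, so these two differ.
raw_equal = small_less_than == less_than

# NFC (canonical composition) leaves U+FE64 unchanged in Python.
nfc_equal = unicodedata.normalize("NFC", small_less_than) == less_than

# NFKC (compatibility composition) folds U+FE64 to U+003C.
nfkc_equal = unicodedata.normalize("NFKC", small_less_than) == less_than

# U+212B ANGSTROM SIGN is canonically equivalent to U+00C5, so NFC is enough.
angstrom_equal = unicodedata.normalize("NFC", "\u212B") == "\u00C5"

print(raw_equal, nfc_equal, nfkc_equal, angstrom_equal)
```

If Cypher follows the same Unicode tables, the `<` example may need a compatibility form to compare successfully; that is an assumption to verify, not something this diff states.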
 
 *Syntax:*
@@ -225,16 +231,18 @@ RETURN normalize('\u212B') = '\u00C5' AS result
 `normalize()` returns the given `STRING` normalized using the specified normalization form.
 The normalization form can be of type `NFC`, `NFD`, `NFKC` or `NFKD`.
 
-There are two main types of normalization forms. One is based on the concept of canonical equivalence, and the other is based on compatibility.
+There are two main types of normalization forms:
 
-The two forms `NFC` (default) and `NFD` are forms of canonical equivalence. This means that codepoints which represent the same abstract character will
-be normalized to the same codepoint. The same abstract character means that the character has the same visual appearance and behavior.
-The difference between `NFC` and `NFD` is that `NFC` form will always give the *composed* canonical form, one where combined codes are replaced with the single representation, if possible.
-Whereas, `NFD` gives the *decomposed* form, this is the opposite of composed, converting combined codepoints into the split form if possible.
+* *Canonical equivalence*: `NFC` (default) and `NFD` are forms of canonical equivalence.
+This means that codepoints that represent the same abstract character (one with the same visual appearance and behavior)
+will be normalized to the same codepoint.
+The `NFC` form will always give the *composed* canonical form (in which the combined codes are replaced with a single representation, if possible).
+The `NFD` form gives the *decomposed* form (the opposite of the composed form: combined codepoints are converted into their split form, if possible).
 
-The two forms `NFKC` and `NFKD` are forms of compatibility normalization. All canonically equivalent sequences are compatible, but not all compatible sequences are canonical.
-This means that a character normalized in `NFC` or `NFD` should also be normalized in `NFKC` and `NFKD`, and other characters that may have a slightly different visual appearance,
-but are considered close enough in appearance to be compatibly equivalent.
+* *Compatibility normalization*: `NFKC` and `NFKD` are forms of compatibility normalization.
+All canonically equivalent sequences are compatible, but not all compatible sequences are canonical.
+This means that a character normalized in `NFC` or `NFD` should also be normalized in `NFKC` and `NFKD`.
+Characters that differ only slightly in visual appearance may also be considered compatibly equivalent and normalized to the same form.
 
 For example, the Greek Upsilon with Acute and Hook Symbol `ϓ` can be represented by the Unicode codepoint: `\u03D3`.
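
Reviewer sketch, not part of the commit: to see how the four forms differ on this character, the snippet below normalizes `\u03D3` with Python's standard `unicodedata` module and prints the resulting codepoints. Python is used only as an illustration of the Unicode forms; the helper `codepoints` is a hypothetical name, and Cypher's `normalize()` is assumed, not confirmed here, to follow the same Unicode data.

```python
import unicodedata

upsilon = "\u03D3"  # GREEK UPSILON WITH ACUTE AND HOOK SYMBOL


def codepoints(text):
    """Render a string as space-separated U+XXXX codepoints."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


# Normalize the same input under each of the four Unicode forms.
forms = {
    form: codepoints(unicodedata.normalize(form, upsilon))
    for form in ("NFC", "NFD", "NFKC", "NFKD")
}

for form, result in forms.items():
    print(f"{form}: {result}")
```

In Python this gives `U+03D3` for `NFC`, `U+03D2 U+0301` for `NFD`, `U+038E` for `NFKC`, and `U+03A5 U+0301` for `NFKD`: the canonical forms keep the hook symbol, while the compatibility forms replace it with the plain capital Upsilon.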