Add docs for new normalize function. #784

Merged
merged 7 commits into from
Jan 22, 2024
@@ -16,6 +16,30 @@ New features are added to the language continuously, and occasionally, some feat
This section lists all of the features that have been removed, deprecated, added, or extended in different Cypher versions.
Replacement syntax for deprecated and removed features is also indicated.

[[cypher-deprecations-additions-removals-5.17]]
== Neo4j 5.17

=== New features

[cols="2", options="header"]
|===
| Feature
| Details

a|
label:functionality[]
label:new[]

[source, cypher, role=noheader]
----
RETURN normalize("string", NFC)
----

| Introduction of a xref::functions/string.adoc#functions-normalize[normalize()] function.
This function normalizes a `STRING` according to the specified normalization form, which can be of type `NFC`, `NFD`, `NFKC`, or `NFKD`.

|===

[[cypher-deprecations-additions-removals-5.16]]
== Neo4j 5.16

@@ -146,7 +170,7 @@ label:updated[]
MATCH (n:Label) WHERE $param IS :: STRING NOT NULL AND n.prop = $param
----

| `IS :: STRING NOT NULL` is now an xref:indexes/search-performance-indexes/using-indexes.adoc#text-indexes-type-predicate-expressions[index-compatible predicate].

|===

8 changes: 7 additions & 1 deletion modules/ROOT/pages/functions/index.adoc
@@ -505,6 +505,12 @@ These functions are used to manipulate strings or to create a string representat
| `ltrim(input :: STRING) :: STRING`
| Returns the given `STRING` with leading whitespace removed.

1.2+| xref::functions/string.adoc#functions-normalize[`normalize()`]
| `normalize(input :: STRING) :: STRING`
| Returns the given `STRING` normalized according to the normalization form `NFC`. label:new[Introduced in 5.17]
| `normalize(input :: STRING, normalForm = NFC :: [NFC, NFD, NFKC, NFKD]) :: STRING`
| Returns the given `STRING` normalized according to the specified normalization form. label:new[Introduced in 5.17]

1.1+| xref::functions/string.adoc#functions-replace[`replace()`]
| `replace(original :: STRING, search :: STRING, replace :: STRING) :: STRING`
| Returns a `STRING` in which all occurrences of a specified search `STRING` in the given `STRING` have been replaced by another (specified) replacement `STRING`.
@@ -773,7 +779,7 @@ Graph functions provide information about the constituent graphs in composite da
|===
| Function | Signature | Description
1.1+| xref:functions/graph.adoc#functions-graph-by-elementid[`graph.byElementId()`] | `USE graph.byElementId(elementId :: STRING)` | Resolves the constituent graph to which a given element id belongs.
label:new[Introduced in Neo4j 5.13]
label:new[Introduced in 5.13]
1.1+| xref:functions/graph.adoc#functions-graph-byname[`graph.byName()`] | `USE graph.byName(name :: STRING)` | Resolves a constituent graph by name.
1.1+| xref:functions/graph.adoc#functions-graph-names[`graph.names()`] | `graph.names() :: LIST<STRING>` | Returns a list containing the names of all graphs in the current composite database.
1.1+| xref:functions/graph.adoc#functions-graph-names[`graph.propertiesByName()`] | `graph.propertiesByName(name :: STRING) :: MAP` | Returns a map containing the properties associated with the given graph.
171 changes: 171 additions & 0 deletions modules/ROOT/pages/functions/string.adoc
@@ -148,6 +148,177 @@ RETURN ltrim(' hello')

======



[[functions-normalize]]
== normalize()

_This feature was introduced in Neo4j 5.17._

`normalize()` returns the given `STRING` normalized using the `NFC` Unicode normalization form.

[NOTE]
====
Unicode normalization is a process that transforms different representations of the same string into a standardized form.
For more information, see the documentation for link:https://unicode.org/reports/tr15/#Norm_Forms[Unicode normalization forms].
====

The `normalize()` function is useful for converting `STRING` values into comparable forms.
When comparing two `STRING` values, it is their Unicode codepoints that are compared.
In Unicode, characters that look the same may be represented by two or more different codepoints.
For example, the character `<` can be represented as `\uFE64` (﹤) or `\u003C` (<).
To the human eye, the characters may appear identical.
However, if compared, Cypher will return `false`, as `\uFE64` does not equal `\u003C`.
Using the `normalize()` function with a compatibility normalization form such as `NFKC`, it is possible to normalize the codepoint `\uFE64` to `\u003C`, creating a single codepoint representation and allowing the two values to be compared successfully.
The default `NFC` form leaves `\uFE64` unchanged, because the two characters are compatibility equivalent rather than canonically equivalent.
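
The difference between a raw comparison and a comparison after normalization can be sketched with the following query. It is an illustration only: the column names `rawComparison`, `defaultForm`, and `compatibilityForm` are made up for this example, and the two-argument form of `normalize()` is used because `\uFE64` is a compatibility variant of `\u003C`.

[source, cypher]
----
// Compare the two codepoints directly, then after NFC and NFKC normalization.
RETURN
  '\uFE64' = '\u003C' AS rawComparison,
  normalize('\uFE64') = '\u003C' AS defaultForm,
  normalize('\uFE64', NFKC) = '\u003C' AS compatibilityForm
----

Only `compatibilityForm` is expected to be `true`, since the default `NFC` form does not apply compatibility mappings.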

*Syntax:*

[source, syntax, role="noheader"]
----
normalize(input)
----

*Returns:*

|===

| `STRING`

|===

*Arguments:*

[options="header"]
|===
| Name | Description

| `input`
| An expression that returns a `STRING`.

|===

*Considerations:*

|===

| `normalize(null)` returns `null`.

|===


.+normalize()+
======

.Query
[source, cypher, indent=0]
----
RETURN normalize('\u212B') = '\u00C5' AS result
----

.Result
[role="queryresult",options="header,footer",cols="1*<m"]
|===

| +result+
| +true+
1+d|Rows: 1

|===

======
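
One possible way to use `normalize()` when filtering is to normalize both sides of a comparison, so that differently encoded but canonically equivalent values match. The sketch below assumes a hypothetical `Person` label with a `name` property and a `$name` parameter; none of these come from the examples above.

[source, cypher]
----
// Normalize the stored value and the parameter to NFC before comparing them.
// Person, name, and $name are placeholders for this illustration.
MATCH (p:Person)
WHERE normalize(p.name) = normalize($name)
RETURN p.name AS name
----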


[[functions-normalize-with-normal-form]]
== normalize(), with specified normal form

_This feature was introduced in Neo4j 5.17._

`normalize()` returns the given `STRING` normalized using the specified normalization form.
The normalization form can be of type `NFC`, `NFD`, `NFKC` or `NFKD`.

There are two main types of normalization forms:

* *Canonical equivalence*: `NFC` (the default) and `NFD` are forms of canonical equivalence.
This means that codepoint sequences that represent the same abstract character will
be normalized to the same sequence (and have the same appearance and behavior).
The `NFC` form always gives the *composed* canonical form (in which combined codepoints are replaced with a single codepoint, if possible).
The `NFD` form gives the *decomposed* form (the opposite of the composed form, in which a combined character is split into its component codepoints, if possible).

* *Compatibility normalization*: `NFKC` and `NFKD` are forms of compatibility normalization.
All canonically equivalent sequences are compatible, but not all compatible sequences are canonical.
This means that a character normalized in `NFC` or `NFD` should also be normalized in `NFKC` and `NFKD`.
Other characters with only slight differences in appearance should be compatibly equivalent.

For example, the Greek Upsilon with Acute and Hook Symbol `ϓ` can be represented by the Unicode codepoint: `\u03D3`.

* Normalized in `NFC`: `\u03D3` Greek Upsilon with Acute and Hook Symbol (ϓ)
* Normalized in `NFD`: `\u03D2\u0301` Greek Upsilon with Hook Symbol + Combining Acute Accent (ϓ)
* Normalized in `NFKC`: `\u038E` Greek Capital Letter Upsilon with Tonos (Ύ)
* Normalized in `NFKD`: `\u03A5\u0301` Greek Capital Letter Upsilon + Combining Acute Accent (Ύ)

In the compatibility normalization forms (`NFKC` and `NFKD`) the character is visibly different as it no longer contains the hook symbol.
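
The four forms listed above can be checked in a single query. This is a sketch that simply restates the codepoints from the list as comparisons; the column names are illustrative.

[source, cypher]
----
// Normalize U+03D3 in each form and compare against the codepoints listed above.
RETURN
  normalize('\u03D3', NFC) = '\u03D3' AS nfc,
  normalize('\u03D3', NFD) = '\u03D2\u0301' AS nfd,
  normalize('\u03D3', NFKC) = '\u038E' AS nfkc,
  normalize('\u03D3', NFKD) = '\u03A5\u0301' AS nfkd
----

If the forms behave as described, every column is `true`.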

*Syntax:*

[source, syntax, role="noheader"]
----
normalize(input, normalForm)
----

*Returns:*

|===

| `STRING`

|===

*Arguments:*

[options="header"]
|===
| Name | Description

| `input`
| An expression that returns a `STRING`.


| `normalForm`
| A keyword specifying the normal form. Can be `NFC`, `NFD`, `NFKC`, or `NFKD`.

|===

*Considerations:*

|===

| `normalize(null, NFC)` returns `null`.

|===


.+normalize()+
======

.Query
[source, cypher, indent=0]
----
RETURN normalize('\uFE64', NFKC) = '\u003C' AS result
----

.Result
[role="queryresult",options="header,footer",cols="1*<m"]
|===

| +result+
| +true+
1+d|Rows: 1

|===

======

[[functions-replace]]
== replace()
