Dev normalize predicate (#804)
Add a new Normalization Predicate Expression.

RETURN "string" IS NORMALIZED;

---------

Co-authored-by: Jens Pryce-Åklundh <[email protected]>
gem-neo4j and JPryce-Aklundh authored Jan 22, 2024
1 parent aadc733 commit b55982d
Showing 4 changed files with 147 additions and 1 deletion.
29 changes: 29 additions & 0 deletions modules/ROOT/pages/clauses/where.adoc
@@ -338,6 +338,35 @@ The `name` and `age` for `Peter` are returned because his name contains "ete
|===


[[match-string-is-normalized]]
=== Checking if a `STRING` `IS NORMALIZED`

The `IS NORMALIZED` operator (introduced in Neo4j 5.17) is used to check whether the given `STRING` is in the `NFC` Unicode normalization form:

.Query
[source, cypher]
----
MATCH (n:Person)
WHERE n.name IS NORMALIZED
RETURN n.name AS normalizedNames
----

The given `STRING` values contain only normalized Unicode characters; therefore, all the matched `name` properties are returned.
For more information, see the section about the xref:syntax/operators.adoc#match-string-is-normalized[normalization operator].

.Result
[role="queryresult",options="header,footer",cols="1*<m"]
|===
| normalizedNames
| 'Andy'
| 'Timothy'
| 'Peter'
1+|Rows: 3
|===

Note that the `IS NORMALIZED` operator returns `null` when used on a non-`STRING` value.
For example, `RETURN 1 IS NORMALIZED` returns `null`.

[[match-string-negation]]
=== String matching negation

@@ -38,6 +38,23 @@ RETURN normalize("string", NFC)
| Introduction of a xref::functions/string.adoc#functions-normalize[normalize()] function.
This function normalizes a `STRING` according to the specified normalization form, which can be of type `NFC`, `NFD`, `NFKC`, or `NFKD`.

a|
label:functionality[]
label:new[]

[source, cypher, role=noheader]
----
IS [NOT] [NFC \| NFD \| NFKC \| NFKD] NORMALIZED
----

[source, cypher, role=noheader]
----
RETURN "string" IS NORMALIZED
----

| Introduction of an xref::syntax/operators.adoc#match-string-is-normalized[IS NORMALIZED] operator.
The operator can be used to check if a `STRING` is normalized according to the specified normalization form, which can be of type `NFC`, `NFD`, `NFKC`, or `NFKD`.

|===

[[cypher-deprecations-additions-removals-5.16]]
3 changes: 3 additions & 0 deletions modules/ROOT/pages/functions/string.adoc
@@ -228,6 +228,7 @@ RETURN normalize('\u212B') = '\u00C5' AS result
======

To check if a `STRING` is normalized, use the xref:syntax/operators.adoc#match-string-is-normalized[`IS NORMALIZED`] operator.

[[functions-normalize-with-normal-form]]
== normalize(), with specified normal form
@@ -319,6 +320,8 @@ RETURN normalize('\uFE64', NFKC) = '\u003C' AS result
======

To check if a `STRING` is normalized in a specific Unicode normal form, use the xref:syntax/operators.adoc#match-string-is-normalized-specified-normal-form[`IS NORMALIZED`] operator with a specified normalization form.

[[functions-replace]]
== replace()

99 changes: 98 additions & 1 deletion modules/ROOT/pages/syntax/operators.adoc
@@ -16,7 +16,7 @@ This page contains an overview of the available Cypher operators.
| xref::syntax/operators.adoc#query-operators-comparison[Comparison operators] | `+=+`, `+<>+`, `+<+`, `+>+`, `+<=+`, `+>=+`, `IS NULL`, `IS NOT NULL`
| xref::syntax/operators.adoc#query-operators-comparison[String-specific comparison operators] | `STARTS WITH`, `ENDS WITH`, `CONTAINS`, `=~` (regex matching)
| xref::syntax/operators.adoc#query-operators-boolean[Boolean operators] | `AND`, `OR`, `XOR`, `NOT`
| xref::syntax/operators.adoc#query-operators-string[String operators] | `+` (string concatenation)
| xref::syntax/operators.adoc#query-operators-string[String operators] | `+` (string concatenation), `IS NORMALIZED`
| xref::syntax/operators.adoc#query-operators-temporal[Temporal operators] | `+` and `-` for operations between durations and temporal instants/durations, `*` and `/` for operations between durations and numbers
| xref::syntax/operators.adoc#query-operators-map[Map operators] | `.` for static value access by key, `[]` for dynamic value access by key
| xref::syntax/operators.adoc#query-operators-list[List operators] | `+` (list concatenation), `IN` to check existence of an element in a list, `[]` for accessing element(s) dynamically
@@ -543,6 +543,7 @@ RETURN number
The string operators comprise:

* concatenating strings: `+`
* checking if a string is normalized: `IS NORMALIZED`


[[syntax-concatenating-two-strings]]
@@ -563,6 +564,102 @@ RETURN 'neo' + '4j' AS result
|===


[[match-string-is-normalized]]
=== Checking if a `STRING` `IS NORMALIZED`

_This feature was introduced in Neo4j 5.17._

The `IS NORMALIZED` operator is used to check whether the given `STRING` is in the `NFC` Unicode normalization form:

[NOTE]
====
Unicode normalization is a process that transforms different representations of the same string into a standardized form.
For more information, see the documentation for link:https://unicode.org/reports/tr15/#Norm_Forms[Unicode normalization forms].
====

.Query
[source, cypher]
----
RETURN "the \u212B char" IS NORMALIZED AS normalized
----

.Result
[role="queryresult",options="header,footer",cols="1*<m"]
|===
| normalized
| false
1+|Rows: 1
|===

Because the given `STRING` contains a non-normalized Unicode character (`\u212B`), `false` is returned.

To normalize a `STRING`, use the xref:functions/string.adoc#functions-normalize[normalize()] function.

Note that the `IS NORMALIZED` operator returns `null` when used on a non-`STRING` value.
For example, `RETURN 1 IS NORMALIZED` returns `null`.
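
The behavior described above can be cross-checked outside Neo4j. The following is a minimal sketch in Python, assuming that its standard `unicodedata` module applies the same Unicode normalization rules as Cypher. The helper name `is_nfc_normalized` is hypothetical and is not part of any Neo4j API.

```python
import unicodedata


def is_nfc_normalized(value):
    """Hypothetical helper: report whether a string is in NFC form.

    Returns None for non-string input, loosely mirroring the `null`
    that `IS NORMALIZED` is documented to return for non-STRING values.
    """
    if not isinstance(value, str):
        return None
    return unicodedata.is_normalized("NFC", value)


# U+212B (ANGSTROM SIGN) normalizes to U+00C5 under NFC,
# so a string containing it is not NFC-normalized.
print(is_nfc_normalized("the \u212B char"))
print(is_nfc_normalized("the \u00C5 char"))
print(is_nfc_normalized(1))
```

This only illustrates the Unicode rules involved; it says nothing about how Neo4j implements the operator.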

[[match-string-is-not-normalized]]
=== Checking if a `STRING` `IS NOT NORMALIZED`

_This feature was introduced in Neo4j 5.17._

The `IS NOT NORMALIZED` operator is used to check whether the given `STRING` is not in the `NFC` Unicode normalization form:

.Query
[source, cypher]
----
RETURN "the \u212B char" IS NOT NORMALIZED AS notNormalized
----

.Result
[role="queryresult",options="header,footer",cols="1*<m"]
|===
| notNormalized
| true
1+|Rows: 1
|===

Because the given `STRING` contains a non-normalized Unicode character (`\u212B`), `true` is returned.

To normalize a `STRING`, use the xref:functions/string.adoc#functions-normalize[normalize()] function.

Note that the `IS NOT NORMALIZED` operator returns `null` when used on a non-`STRING` value.
For example, `RETURN 1 IS NOT NORMALIZED` returns `null`.


[[match-string-is-normalized-specified-normal-form]]
==== Using `IS NORMALIZED` with a specified normalization form

It is possible to specify which Unicode normalization form is checked (the default is `NFC`).

The available normalization forms are:

* `NFC`
* `NFD`
* `NFKC`
* `NFKD`

.Query
[source, cypher]
----
WITH "the \u00E4 char" AS myString
RETURN myString IS NFC NORMALIZED AS nfcNormalized,
       myString IS NFD NORMALIZED AS nfdNormalized
----

The given `STRING` contains the Unicode character `\u00E4`, which is normalized in `NFC` form but not in `NFD` form.

.Result
[role="queryresult",options="header,footer",cols="2*<m"]
|===
| nfcNormalized | nfdNormalized
| true | false
2+|Rows: 1
|===

It is also possible to specify the normalization form when using the negated normalization operator.
For example, `RETURN "string" IS NOT NFD NORMALIZED`.
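
To see how the same `STRING` fares under each of the four forms, the check can be repeated outside Neo4j. This is a rough sketch in Python, assuming that the standard `unicodedata` module follows the same Unicode rules as Cypher. The names `FORMS` and `normalization_report` are hypothetical and used only for illustration.

```python
import unicodedata

# The four normalization forms listed above.
FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def normalization_report(value):
    """Hypothetical helper: map each form to whether `value` is already in it."""
    return {form: unicodedata.is_normalized(form, value) for form in FORMS}


# U+00E4 is a precomposed character: it is stable under the composing
# forms (NFC, NFKC) but decomposes under NFD and NFKD.
report = normalization_report("the \u00E4 char")
print(report)
```

The `NFC` and `NFD` entries correspond to the `nfcNormalized` and `nfdNormalized` columns in the query result above.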

[[query-operators-temporal]]
== Temporal operators
