Commit
Add note for ORDER BY and guaranteed row-order (#775)
Necessary because of changes that might be coming as a result of
parallel runtime.
JPryce-Aklundh authored Nov 3, 2023
1 parent 70457cb commit 220a263
Showing 3 changed files with 31 additions and 13 deletions.
5 changes: 5 additions & 0 deletions modules/ROOT/pages/clauses/clause_composition.adoc
@@ -14,6 +14,11 @@ The output of a clause is a new state of the graph and a new table of intermedia
The first clause takes as input the state of the graph before the query and an empty table of intermediate results.
The output of the last clause is the result of the query.

[NOTE]
====
Unless xref:clauses/order-by.adoc[] is used, Neo4j does not guarantee the row order of a query result.
====

.Table of intermediate results between read clauses
======
33 changes: 20 additions & 13 deletions modules/ROOT/pages/clauses/order-by.adoc
@@ -3,23 +3,15 @@
[[query-order]]
= ORDER BY

`ORDER BY` is a sub-clause following `RETURN` or `WITH`, and it specifies that the output should be sorted and how.

`ORDER BY` is a sub-clause following `RETURN` or `WITH`, and it specifies how the output of a clause should be sorted.
`ORDER BY` relies on comparisons to sort the output, see xref::syntax/operators.adoc#cypher-ordering[Ordering and comparison of values].
You can sort on many different values, e.g. node/relationship properties, the node/relationship ids, or on most expressions.
If you do not specify what to sort on, there is a risk that the results are arbitrarily sorted and therefore it is best practice to be specific when using `ORDER BY`.

In terms of scope of variables, `ORDER BY` follows special rules, depending on if the projecting `RETURN` or `WITH` clause is either aggregating or `DISTINCT`.
If it is an aggregating or `DISTINCT` projection, only the variables available in the projection are available.
If the projection does not alter the output cardinality (which aggregation and `DISTINCT` do), variables available from before the projecting clause are also available.
When the projection clause shadows already existing variables, only the new variables are available.

Lastly, it is not allowed to use aggregating expressions in the `ORDER BY` sub-clause if they are not also listed in the projecting clause.
This last rule is to make sure that `ORDER BY` does not change the results, only the order of them.
[NOTE]
====
Unless `ORDER BY` is used, Neo4j does not guarantee the row order of a query result.
====

The performance of Cypher queries using `ORDER BY` on node properties can be influenced by the existence and use of an index for finding the nodes.
If the index can provide the nodes in the order requested in the query, Cypher can avoid the use of an expensive `Sort` operation.
Read more about this capability in xref::appendix/tutorials/advanced-query-tuning.adoc#advanced-query-tuning-example-index-backed-order-by[Index-backed ORDER BY].

The following graph is used for the examples below:

@@ -234,3 +226,18 @@ The list of names built from the `collect` aggregating function contains the nam
1+d|Rows: 1
|===

== Ordering aggregated or DISTINCT results

In terms of scope of variables, `ORDER BY` follows special rules, depending on if the projecting `RETURN` or `WITH` clause is either aggregating or `DISTINCT`.
If it is an aggregating or `DISTINCT` projection, only the variables available in the projection are available.
If the projection does not alter the output cardinality (which aggregation and `DISTINCT` do), variables available from before the projecting clause are also available.
When the projection clause shadows already existing variables, only the new variables are available.

It is also not allowed to use aggregating expressions in the `ORDER BY` sub-clause if they are not also listed in the projecting clause.
This rule is to make sure that `ORDER BY` does not change the results, only the order of them.
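
As a rough illustration of the scoping and aggregation rules above, the following sketch assumes a hypothetical graph of `Person` nodes with `name` and `age` properties. The label and property names are assumptions made for this example and are not taken from this commit. The commented-out queries are ones the rules above describe as not allowed.

```cypher
// Hypothetical data model: (:Person {name, age}); label and properties are assumptions.

// Non-aggregating, non-DISTINCT projection: variables from before the
// projection (here p) remain available to ORDER BY.
MATCH (p:Person)
RETURN p.name AS name
ORDER BY p.age;

// Aggregating projection: the aggregating expression is listed in RETURN,
// so ORDER BY can refer to it through its alias.
MATCH (p:Person)
RETURN p.name AS name, count(*) AS occurrences
ORDER BY occurrences DESC;

// Not allowed by the rules above: after a DISTINCT projection, only the
// projected variables are available, so p.age is out of scope.
// MATCH (p:Person)
// RETURN DISTINCT p.name AS name
// ORDER BY p.age;

// Not allowed by the rules above: count(*) is an aggregating expression
// that is not listed in the projecting clause.
// MATCH (p:Person)
// RETURN p.name AS name
// ORDER BY count(*);
```

In the two allowed queries, `ORDER BY` changes only the order of the rows, not which rows are returned, which is the property the last rule is meant to protect.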

== ORDER BY and indexes

The performance of Cypher queries using `ORDER BY` on node properties can be influenced by the existence and use of an index for finding the nodes.
If the index can provide the nodes in the order requested in the query, Cypher can avoid the use of an expensive `Sort` operation.
Read more about this capability in xref::appendix/tutorials/advanced-query-tuning.adoc#advanced-query-tuning-example-index-backed-order-by[Index-backed ORDER BY].
6 changes: 6 additions & 0 deletions modules/ROOT/pages/clauses/unwind.adoc
@@ -6,6 +6,12 @@
The `UNWIND` clause makes it possible to transform any list back into individual rows.
These lists can be parameters that were passed in, previously `collect`-ed result, or other list expressions.

[NOTE]
====
Neo4j does not guarantee the row order produced by `UNWIND`.
The only clause that guarantees a specific row order is xref:clauses/order-by.adoc[].
====
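
A minimal sketch of what the note implies in practice: if the order of the unwound rows matters, state it explicitly with `ORDER BY` instead of relying on the order of the list. The literal list and the variable name `x` are chosen only for illustration.

```cypher
// Without ORDER BY, the order of the three rows is not guaranteed.
UNWIND [3, 1, 2] AS x
RETURN x;

// With ORDER BY, the rows are returned in ascending order of x.
UNWIND [3, 1, 2] AS x
RETURN x
ORDER BY x;
```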

Common usage of the `UNWIND` clause:

* Create distinct lists.