Add information about aggregations on zero rows (neo4j#750)
JPryce-Aklundh committed Sep 27, 2023
1 parent 2c7700c commit 1099570
Showing 1 changed file with 42 additions and 44 deletions.
86 changes: 42 additions & 44 deletions modules/ROOT/pages/functions/aggregating.adoc
@@ -3,39 +3,13 @@
[[query-functions-aggregating]]
= Aggregating functions

== Introduction
An aggregating function performs a calculation over a set of values, returning a single value.
Aggregation can be computed over all the matching paths, or it can be further divided by introducing xref:functions/aggregating.adoc#grouping-keys[grouping keys].

Aggregating functions take a set of values and calculate an aggregated value over them.
Aggregation can be computed over all the matching paths, or it can be further divided by introducing grouping keys.
Grouping keys are non-aggregating expressions that are used to group the values going into the aggregating functions.

For example, given the following query containing two return expressions, `n` and `+count(*)+`:

[source, cypher, role=test-skip]
----
RETURN n, count(*)
----

The first, `n`, is not an aggregating function, so it will be the grouping key.
The latter, `count(*)`, is an aggregating function.
The matching paths will be divided into different buckets, depending on the grouping key.
The aggregating function will then be run on these buckets, calculating an aggregate value per bucket.

The input expression of an aggregating function can contain any expression, including expressions that are not grouping keys.
However, not all expressions can be composed with aggregating functions.
The example below will throw an error since `n.x`, which is not a grouping key, is combined with the aggregating function `count(*)`.
For more information, see xref:functions/aggregating.adoc#grouping-keys[Grouping keys].

[source, cypher, role=test-skip]
----
RETURN n.x + count(*)
----

To sort the result set using aggregating functions, the aggregation must be included in the `ORDER BY` sub-clause following the `RETURN` clause.

The `DISTINCT` operator works in conjunction with aggregation.
It is used to make all values unique before running them through an aggregating function.
More information about `DISTINCT` can be found in xref::syntax/operators.adoc#query-operators-aggregation[Syntax -> Aggregation operators].
[TIP]
====
To learn more about how Cypher handles aggregations performed on zero rows, refer to link:https://neo4j.com/developer/kb/understanding-aggregations-on-zero-rows//[Neo4j Knowledge Base -> Understanding aggregations on zero rows].
====

== Example graph

@@ -1061,29 +1035,41 @@ The sum of the two supplied Durations is returned:


[[grouping-keys]]
== Grouping keys
== Aggregating expressions and grouping keys

Aggregating expressions are expressions which contain one or more aggregating functions.
*Aggregating expressions* are expressions which contain one or more aggregating functions.
A simple aggregating expression consists of a single aggregating function.
For instance, `sum(x.a)` is an aggregating expression that only consists of the aggregating function `sum( )` with `x.a` as its argument.
Aggregating expressions can also be more complex, where the results of one or more aggregating functions are input arguments to other expressions.
For instance, `0.1 * (sum(x.a) / count(x.b))` is an aggregating expression that contains two aggregating functions, `sum( )` with `x.a` as its argument and `count( )` with `x.b` as its argument.
Both are input arguments to the division expression.
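
As a minimal sketch, the complex expression above can be tried without any graph data by unwinding a list of maps.
The literal values for `a` and `b` are hypothetical and only serve as illustration:

[source, cypher]
----
// Hypothetical inline data standing in for matched rows.
UNWIND [{a: 10, b: 1}, {a: 20, b: 2}] AS x
// sum() and count() are evaluated first; their results feed the division and the multiplication.
RETURN 0.1 * (sum(x.a) / count(x.b)) AS result
----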

*Grouping keys* are non-aggregating expressions that are used to group the values going into the aggregating functions.
For example, given the following query containing two return expressions, `n` and `+count(*)+`:

[source, cypher, role=test-skip]
----
RETURN n, count(*)
----

For aggregating expressions to be correctly computable for the buckets formed by the grouping key(s), they have to fulfill some requirements.
Specifically, each sub-expression in an aggregating expression has to be either:
The first, `n`, is not an aggregating function, so it will be the grouping key.
The latter, `count(*)`, is an aggregating function.
The matching paths will be divided into different buckets, depending on the grouping key.
The aggregating function will then be run on these buckets, calculating an aggregate value per bucket.
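
The bucketing can be illustrated with a self-contained sketch, assuming a hypothetical list of strings in place of matched nodes:

[source, cypher]
----
// Hypothetical inline values; in a real query, n would typically come from a MATCH clause.
UNWIND ['a', 'a', 'b'] AS n
// n is the grouping key, so count(*) is calculated once per distinct value of n.
RETURN n, count(*)
----

This returns one row per bucket: a count of `2` for `'a'` and a count of `1` for `'b'`.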

* an aggregating function, e.g. `sum(x.a)`,
* a constant, e.g. `0.1`,
* a parameter, e.g. `$param`,
* a grouping key, e.g. the `a` in `RETURN a, count(*)`
* a local variable, e.g. the `x` in `count(*) + size([ x IN range(1, 10) | x ])`, or
* a sub-expression, all operands of which have to be allowed in an aggregating expression.
The input expression of an aggregating function can contain any expression, including expressions that are not grouping keys.
However, not all expressions can be composed with aggregating functions.
The example below will throw an error since `n.x`, which is not a grouping key, is combined with the aggregating function `count(*)`.

[source, cypher, role=test-skip]
----
RETURN n.x + count(*)
----
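
One possible way to avoid the error, sketched here with hypothetical inline data, is to project `n.x` as a grouping key in a preceding `WITH` clause and combine it with the aggregate afterwards:

[source, cypher]
----
// Hypothetical inline maps standing in for nodes with a property x.
UNWIND [{x: 1}, {x: 1}, {x: 2}] AS n
// x becomes the grouping key and c the aggregate calculated per bucket.
WITH n.x AS x, count(*) AS c
// Both x and c are now plain variables, so they can be combined freely.
RETURN x + c AS result
----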

To sort the result set using aggregating functions, the aggregation must be included in the `ORDER BY` sub-clause following the `RETURN` clause.
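
A minimal sketch of sorting by an aggregate, assuming hypothetical inline values and an illustrative alias `occurrences`:

[source, cypher]
----
// Hypothetical inline values standing in for matched rows.
UNWIND ['a', 'b', 'b'] AS n
// The aggregation is aliased in RETURN and then referenced in ORDER BY.
RETURN n, count(*) AS occurrences
ORDER BY occurrences DESC
----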

[[grouping-key-examples]]
=== Examples of aggregating expressions
=== Examples

.Simple aggregation without any grouping keys
======
@@ -1222,4 +1208,16 @@ RETURN groupingKey, groupingKey - max(f.age)
| +116+ | +45+
2+d|Rows: 1
|===
======
======

=== Rules for aggregating expressions

For aggregating expressions to be correctly computable for the buckets formed by the grouping key(s), they have to fulfill some requirements.
Specifically, each sub-expression in an aggregating expression has to be either:

* an aggregating function, e.g. `sum(x.a)`.
* a constant, e.g. `0.1`.
* a parameter, e.g. `$param`.
* a grouping key, e.g. the `a` in `RETURN a, count(*)`.
* a local variable, e.g. the `x` in `count(*) + size([ x IN range(1, 10) | x ])`.
* a sub-expression, all operands of which have to be allowed in an aggregating expression.
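
The rules above can be sketched in a single query.
The inline list and the alias `result` are hypothetical and used for illustration only:

[source, cypher]
----
// Hypothetical inline values; a is the grouping key.
UNWIND [1, 2, 3] AS a
// The aggregating expression combines a constant (0.1), an aggregating function (count(*)),
// the grouping key (a), and a local variable (x) inside a list comprehension.
RETURN a, 0.1 * count(*) + a + size([x IN range(1, 10) | x]) AS result
----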
