diff --git a/modules/ROOT/pages/functions/aggregating.adoc b/modules/ROOT/pages/functions/aggregating.adoc
index a6352ad13..9925a3ffd 100644
--- a/modules/ROOT/pages/functions/aggregating.adoc
+++ b/modules/ROOT/pages/functions/aggregating.adoc
@@ -3,39 +3,13 @@
 [[query-functions-aggregating]]
 = Aggregating functions
 
-== Introduction
+An aggregating function performs a calculation over a set of values, returning a single value.
+Aggregation can be computed over all the matching paths, or it can be further divided by introducing xref:functions/aggregating.adoc#grouping-keys[grouping keys].
 
-Aggregating functions take a set of values and calculate an aggregated value over them.
-Aggregation can be computed over all the matching paths, or it can be further divided by introducing grouping keys.
-Grouping keys are non-aggregating expressions that are used to group the values going into the aggregating functions.
-
-For example, given the following query containing two return expressions, `n` and `+count(*)+`:
-
-[source, cypher, role=test-skip]
------
-RETURN n, count(*)
------
-
-The first, `n` is not an aggregating function, so it will be the grouping key.
-The latter, `count(*)` is an aggregating function.
-The matching paths will be divided into different buckets, depending on the grouping key.
-The aggregating function will then be run on these buckets, calculating an aggregate value per bucket.
-
-The input expression of an aggregating function can contain any expression, including expressions that are not grouping keys.
-However, not all expressions can be composed with aggregating functions.
-The example below will throw an error since `n.x`, which is not a grouping key, is combined with the aggregating function `count(*)`.
-For more information, see xref:functions/aggregating.adoc#grouping-keys[Grouping keys].
-
-[source, cypher, role=test-skip]
------
-RETURN n.x + count(*)
------
-
-To sort the result set using aggregating functions, the aggregation must be included in the `ORDER BY` sub-clause following the `RETURN` clause.
-
-The `DISTINCT` operator works in conjunction with aggregation.
-It is used to make all values unique before running them through an aggregating function.
-More information about `DISTINCT` can be found in xref::syntax/operators.adoc#query-operators-aggregation[Syntax -> Aggregation operators].
+[TIP]
+====
+To learn more about how Cypher handles aggregations performed on zero rows, refer to link:https://neo4j.com/developer/kb/understanding-aggregations-on-zero-rows//[Neo4j Knowledge Base -> Understanding aggregations on zero rows].
+====
 
 == Example graph
 
@@ -1061,29 +1035,41 @@ The sum of the two supplied Durations is returned:
 
 
 [[grouping-keys]]
-== Grouping keys
+== Aggregating expressions and grouping keys
 
-Aggregating expressions are expressions which contain one or more aggregating functions.
+*Aggregating expressions* are expressions which contain one or more aggregating functions.
 A simple aggregating expression consists of a single aggregating function.
 For instance, `sum(x.a)` is an aggregating expression that only consists of the aggregating function `sum( )` with `x.a` as its argument.
 Aggregating expressions are also allowed to be more complex, where the result of one or more aggregating functions are input arguments to other expressions.
 For instance, `0.1 * (sum(x.a) / count(x.b))` is an aggregating expression that contains two aggregating functions, `sum( )` with `x.a` as its argument and `count( )` with `x.b` as its argument.
 Both are input arguments to the division expression.
 
+*Grouping keys* are non-aggregating expressions that are used to group the values going into the aggregating functions.
+For example, given the following query containing two return expressions, `n` and `+count(*)+`:
+
+[source, cypher, role=test-skip]
+----
+RETURN n, count(*)
+----
 
-For aggregating expressions to be correctly computable for the buckets formed by the grouping key(s), they have to fulfill some requirements.
-Specifically, each sub-expression in an aggregating expression has to be either:
+The first, `n` is not an aggregating function, so it will be the grouping key.
+The latter, `count(*)` is an aggregating function.
+The matching paths will be divided into different buckets, depending on the grouping key.
+The aggregating function will then be run on these buckets, calculating an aggregate value per bucket.
 
-* an aggregating function, e.g. `sum(x.a)`,
-* a constant, e.g. `0.1`,
-* a parameter, e.g. `$param`,
-* a grouping key, e.g. the `a` in `RETURN a, count(*)`
-* a local variable, e.g. the `x` in `count(*) + size([ x IN range(1, 10) | x ])`, or
-* a sub-expression, all operands of which have to be allowed in an aggregating expression.
+The input expression of an aggregating function can contain any expression, including expressions that are not grouping keys.
+However, not all expressions can be composed with aggregating functions.
+The example below will throw an error since `n.x`, which is not a grouping key, is combined with the aggregating function `count(*)`.
+[source, cypher, role=test-skip]
+----
+RETURN n.x + count(*)
+----
+
+To sort the result set using aggregating functions, the aggregation must be included in the `ORDER BY` sub-clause following the `RETURN` clause.
 
 
 [[grouping-key-examples]]
-=== Examples of aggregating expressions
+=== Examples
 
 .Simple aggregation without any grouping keys
 ======
@@ -1222,4 +1208,16 @@ RETURN groupingKey, groupingKey - max(f.age)
 | +116+ | +45+
 2+d|Rows: 1
 |===
-======
\ No newline at end of file
+======
+
+=== Rules for aggregating expressions
+
+For aggregating expressions to be correctly computable for the buckets formed by the grouping key(s), they have to fulfill some requirements.
+Specifically, each sub-expression in an aggregating expression has to be either:
+
+* an aggregating function, e.g. `sum(x.a)`.
+* a constant, e.g. `0.1`.
+* a parameter, e.g. `$param`.
+* a grouping key, e.g. the `a` in `RETURN a, count(*)`.
+* a local variable, e.g. the `x` in `count(*) + size([ x IN range(1, 10) | x ])`.
+* a sub-expression, all operands of which have to be allowed in an aggregating expression.