Implement Table.add_group_number, with operations Unique and `Equal_Count` (#11818)

* wip

* one test

* wip

* wip

* more tests

* changelog

* docs, more tests

* switch in enso

* counter overflow

* run under db tests

* fmt

* merge

* move group + order into method

* wip

* bucket

* Revert "bucket"

This reverts commit 7ada84c.

* aliases

* review

(cherry picked from commit 739ee3a)
GregoryTravis authored and jdunkerley committed Dec 18, 2024
1 parent e8ff34e commit a855497
Showing 11 changed files with 557 additions and 0 deletions.
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -126,6 +126,7 @@
- [Added `Table.input` allowing creation of typed tables from vectors of data,
including auto parsing text columns.][11562]
- [Enhance Managed_Resource to allow implementation of in-memory caches][11577]
- [Added `add_group_number` to the in-memory database.][11818]
- [The reload button clears the HTTP cache.][11673]

[11235]: https://github.com/enso-org/enso/pull/11235
@@ -135,6 +136,7 @@
[11490]: https://github.com/enso-org/enso/pull/11490
[11562]: https://github.com/enso-org/enso/pull/11562
[11577]: https://github.com/enso-org/enso/pull/11577
[11818]: https://github.com/enso-org/enso/pull/11818
[11673]: https://github.com/enso-org/enso/pull/11673

#### Enso Language & Runtime
94 changes: 94 additions & 0 deletions distribution/lib/Standard/Database/0.0.0-dev/src/DB_Table.enso
@@ -25,6 +25,7 @@ import Standard.Table.Columns_To_Add.Columns_To_Add
import Standard.Table.Columns_To_Keep.Columns_To_Keep
import Standard.Table.Expression.Expression
import Standard.Table.Expression.Expression_Error
import Standard.Table.Grouping_Method.Grouping_Method
import Standard.Table.Internal.Add_Row_Number
import Standard.Table.Internal.Column_Naming_Helper.Column_Naming_Helper
import Standard.Table.Internal.Constant_Column.Constant_Column
@@ -926,6 +927,99 @@ type DB_Table
updated_table = renamed_table.updated_columns (renamed_table.internal_columns + [new_column])
updated_table.as_subquery

## ALIAS add group column, group id, bucket, tile
GROUP Standard.Base.Values
ICON column_add
Adds a new column to the table enumerating groups of rows, assigning each
row to one group number. All rows in each group will get the same number.

Arguments:
- grouping_method: Specifies how to group the rows; see "Grouping
Methods", below.
- name: The name of the new column. Defaults to "Group".
- from: The starting value for the enumeration. Defaults to 0.
- step: The amount to increment the enumeration by. Defaults to 1.

? Grouping Methods

The following grouping methods are supported:
- `Unique`: Group rows by the specified columns.
- `Equal_Count`: Create the specified number of groups with the same
number of rows in each group (except possibly the last one).

? Ordering of rows

Note that the ordering of rows from the original table is preserved in
all cases. The grouping and ordering settings can affect how the group
numbers are assigned, depending on the grouping method. The order of
the rows itself is not changed by this operation.

! Error Conditions

- If the columns specified in `group_by` or `order_by` are not present
in the table, a `Missing_Input_Columns` error is raised.
- If the column with the same name as provided `name` already exists,
a `Duplicate_Output_Column_Names` problem is reported and the
existing column is renamed to avoid the clash.
- If grouping on floating point numbers, a `Floating_Point_Equality`
problem is reported.

> Example
Assign group numbers based on unique values of the first two columns.

## table:
x | y | z
---+---+---
1 | 0 | 2
0 | 1 | 0
1 | 2 | 0
0 | 1 | 1
1 | 0 | 1
1 | 2 | 1
table = table_builder [['x', [1, 0, 1, 0, 1, 1]], ['y', [0, 1, 2, 1, 0, 2]], ['z', [2, 0, 0, 1, 1, 1]]]
table2 = table.add_group_number (..Unique on=['x', 'y']) "g"
table2.at 'g' . to_vector
# => [0, 1, 2, 1, 0, 2]
## table2:
x | y | z | g
---+---+---+---
1 | 0 | 2 | 0
0 | 1 | 0 | 1
1 | 2 | 0 | 2
0 | 1 | 1 | 1
1 | 0 | 1 | 0
1 | 2 | 1 | 2

> Example
Divide rows into three groups.
## table:
x | y
---+---
1 | 5
2 | 4
3 | 3
4 | 2
5 | 1
table = table_builder [['x', [1, 2, 3, 4, 5]], ['y', [5, 4, 3, 2, 1]]]
table2 = table.add_group_number (..Equal_Count 3) "g"
table2.at 'g' . to_vector
# => [0, 0, 1, 1, 2]
## table2:
x | y | g
---+---+---
1 | 5 | 0
2 | 4 | 0
3 | 3 | 1
4 | 2 | 1
5 | 1 | 2
@name (Widget.Text_Input display=..Always)
@from (Widget.Numeric_Input display=..Always)
@group_by (Widget_Helpers.make_column_name_multi_selector display=..When_Modified)
@order_by (Widget_Helpers.make_order_by_selector display=..When_Modified)
add_group_number self (grouping_method:Grouping_Method=..Unique) (name:Text="Group") (from:Integer=0) (step:Integer=1) (on_problems:Problem_Behavior=..Report_Warning) -> Table =
_ = [grouping_method, name, from, step, on_problems]
Error.throw (Unsupported_Database_Operation.Error "add_group_number")


## ALIAS order_by
GROUP Standard.Base.Selections
3 changes: 3 additions & 0 deletions distribution/lib/Standard/Database/0.0.0-dev/src/Feature.enso
@@ -31,6 +31,9 @@ type Feature
## PRIVATE
Catch all for tests that haven't yet been categorized correctly or use multiple features.
Integration_Tests
## PRIVATE
add a group number column to a table.
Add_Group_Number
## PRIVATE
add a row number column to a table.
Add_Row_Number
24 changes: 24 additions & 0 deletions distribution/lib/Standard/Table/0.0.0-dev/src/Grouping_Method.enso
@@ -0,0 +1,24 @@
from Standard.Base import all
import Standard.Base.Errors.Common.Missing_Argument

polyglot java import org.enso.table.operations.AddGroupNumber

## Specifies a method for grouping rows in `add_group_number`.
type Grouping_Method
    ## Group rows by the specified columns.

       Arguments:
       - on: Rows that have the same values for these columns will be grouped
         together. At least one column must be specified.
    Unique (on:(Vector | Text | Integer | Regex)=(Missing_Argument.throw "on"))

    ## Create the specified number of groups with the same number of rows in
       each group (except possibly the last one).

       Arguments:
       - group_count: The number of groups to divide the table into.
       - order_by: (Optional.) Specifies the order in which rows should be
         assigned to groups. Only affects the assignment of group numbers, not
         the ordering of the output rows. Defaults to the order of the rows in
         the table.
    Equal_Count (group_count:Integer=(Missing_Argument.throw "group_count")) (order_by:(Vector | Text)=[])
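
The two grouping methods documented above can be illustrated with a small sketch. This is not the `AddGroupNumber` Java implementation (which is not shown in this excerpt); it is a minimal Python model that reproduces the two docstring examples. The function names are hypothetical, and the bucket size `ceil(row_count / group_count)` is an assumption chosen to match the documented `[0, 0, 1, 1, 2]` result.

```python
from math import ceil
from typing import Hashable, Sequence


def number_groups_unique(keys: Sequence[Hashable], start: int = 0, step: int = 1) -> list[int]:
    """Give every distinct key one number, in order of first appearance."""
    seen: dict[Hashable, int] = {}
    result = []
    for key in keys:
        if key not in seen:
            # The n-th distinct key gets start + step * n.
            seen[key] = start + step * len(seen)
        result.append(seen[key])
    return result


def number_groups_equal_count(row_count: int, group_count: int, start: int = 0, step: int = 1) -> list[int]:
    """Split rows, in table order, into buckets of equal size.

    Assumption: the bucket size is ceil(row_count / group_count), so only
    the last bucket can be shorter than the others.
    """
    if group_count < 1:
        raise ValueError("group_count must be at least 1")
    size = max(1, ceil(row_count / group_count))
    return [start + step * (i // size) for i in range(row_count)]


# The (x, y) pairs from the first docstring example.
keys = list(zip([1, 0, 1, 0, 1, 1], [0, 1, 2, 1, 0, 2]))
print(number_groups_unique(keys))       # [0, 1, 2, 1, 0, 2]
print(number_groups_equal_count(5, 3))  # [0, 0, 1, 1, 2]
```

Under this assumed sizing rule the number of non-empty groups can be smaller than `group_count` for some row counts; the real implementation may distribute rows differently.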
@@ -0,0 +1,50 @@
private

from Standard.Base import all
import Standard.Base.Errors.Common.Unsupported_Argument_Types
import Standard.Base.Errors.Illegal_Argument.Illegal_Argument

import project.Column.Column
import project.Grouping_Method.Grouping_Method
import project.Internal.Java_Problems
import project.Internal.Problem_Builder.Problem_Builder
import project.Internal.Table_Helpers
import project.Set_Mode.Set_Mode
import project.Table.Table
from project.Internal.Add_Row_Number import rename_columns_if_needed

polyglot java import java.lang.ArithmeticException
polyglot java import org.enso.table.operations.AddGroupNumber

add_group_number (table:Table) (grouping_method:Grouping_Method) (name:Text) (from:Integer) (step:Integer) (on_problems:Problem_Behavior=..Report_Warning) -> Table =
    problem_builder = Problem_Builder.new error_on_missing_columns=True

    handle_arithmetic_exception _ =
        Error.throw (Illegal_Argument.Error "The row number has exceeded the 64-bit integer range. BigInteger numbering is currently not supported. Please use a smaller start/step.")

    Panic.catch ArithmeticException handler=handle_arithmetic_exception <| Panic.catch Unsupported_Argument_Types handler=handle_arithmetic_exception <|
        Java_Problems.with_problem_aggregator on_problems java_problem_aggregator->
            new_storage = case grouping_method of
                Grouping_Method.Unique group_by ->
                    _illegal_if group_by.is_empty "..Unique requires a non-empty 'group_by'" <|
                        grouping = _prepare_group_by table problem_builder group_by
                        AddGroupNumber.numberGroupsUnique table.row_count from step grouping java_problem_aggregator
                Grouping_Method.Equal_Count group_count order_by ->
                    _illegal_if (group_count < 1) "group_count must be at least 1" <|
                        ordering = _prepare_ordering table problem_builder order_by
                        AddGroupNumber.numberGroupsEqualCount table.row_count group_count from step (ordering.at 0) (ordering.at 1) java_problem_aggregator
            new_column = Column.from_storage name new_storage
            renamed_table = rename_columns_if_needed table name on_problems Table.new
            problem_builder.attach_problems_before on_problems <|
                renamed_table.set new_column name set_mode=Set_Mode.Add

_prepare_group_by table problem_builder group_by =
    table.columns_helper.select_columns_helper group_by Case_Sensitivity.Default True problem_builder . map c->c.java_column

_prepare_ordering table problem_builder order_by =
    ordering = Table_Helpers.resolve_order_by table.columns order_by problem_builder
    ordering_columns = ordering.map c->c.column.java_column
    directions = ordering.map c->c.associated_selector.direction.to_sign
    [ordering_columns, directions]

_illegal_if b msg ~cont = if b then Error.throw (Illegal_Argument.Error msg) else cont
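
The `Panic.catch ArithmeticException` handler above converts an overflow on the Java side into an `Illegal_Argument` error. The Java `AddGroupNumber` class is not part of this excerpt, so the following Python sketch only illustrates the kind of overflow-checked arithmetic that could raise such an exception when `from + step * group_index` leaves the signed 64-bit range. All names here are hypothetical.

```python
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1


def _check_int64(value: int) -> int:
    """Return value unchanged, or raise if it does not fit in a signed 64-bit integer."""
    if not INT64_MIN <= value <= INT64_MAX:
        raise OverflowError("group number exceeds the 64-bit integer range")
    return value


def checked_group_number(start: int, step: int, group_index: int) -> int:
    """Compute start + step * group_index, checking each intermediate result.

    Python integers never overflow, so the range is checked explicitly after
    the multiplication and again after the addition.
    """
    offset = _check_int64(step * group_index)
    return _check_int64(start + offset)


print(checked_group_number(10, 5, 3))  # 25
```

Checking the intermediate product as well as the final sum means a computation is rejected as soon as any step leaves the 64-bit range, even if a later step would bring the value back in range.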
@@ -168,6 +168,17 @@ make_join_condition_selector table display:Display=..Always cache=Nothing =
item_editor = Single_Choice display=display values=names
Vector_Editor item_editor=item_editor item_default="(..Equals "+table.column_names.first.pretty+")" display=display

## PRIVATE
make_grouping_method_selector table:Table display:Display=..Always -> Widget =
    column_selector = make_column_name_selector table display=Display.Always
    columns_selector = Vector_Editor item_editor=column_selector item_default=table.column_names.first.pretty display=display

    unique = Option "Unique" "..Unique" [["on", columns_selector]]
    equal_count = Option "Equal Count" "..Equal_Count" [["order_by", columns_selector]]
    names = [unique, equal_count]

    Single_Choice display=display values=names

## PRIVATE
Make a column name selector.
make_order_by_selector : Table -> Display -> Boolean -> Widget
2 changes: 2 additions & 0 deletions distribution/lib/Standard/Table/0.0.0-dev/src/Main.enso
@@ -31,6 +31,8 @@ export project.Extensions.Table_Conversions.parse_to_table
export project.Extensions.Table_Conversions.to_table
export project.Extensions.Table_Conversions.write_table

export project.Grouping_Method.Grouping_Method

export project.Headers.Headers

export project.Join_Condition.Join_Condition
95 changes: 95 additions & 0 deletions distribution/lib/Standard/Table/0.0.0-dev/src/Table.enso
@@ -35,6 +35,8 @@ import project.Delimited.Delimited_Format.Delimited_Format
import project.Expression.Expression
import project.Expression.Expression_Error
import project.Extensions.Table_Conversions
import project.Grouping_Method.Grouping_Method
import project.Internal.Add_Group_Number
import project.Internal.Add_Row_Number
import project.Internal.Add_Running
import project.Internal.Aggregate_Column_Helper
@@ -2323,6 +2325,99 @@ type Table
add_row_number self (name:Text="Row") (from:Integer=0) (step:Integer=1) (group_by:(Vector | Text | Integer | Regex)=[]) (order_by:(Vector | Text)=[]) (on_problems:Problem_Behavior=..Report_Warning) =
Incomparable_Values.handle_errors <| Add_Row_Number.add_row_number self name from step group_by order_by on_problems

## ALIAS add group column, group id, bucket, tile
GROUP Standard.Base.Values
ICON column_add
Adds a new column to the table enumerating groups of rows, assigning each
row to one group number. All rows in each group will get the same number.

Arguments:
- grouping_method: Specifies how to group the rows; see "Grouping
Methods", below.
- name: The name of the new column. Defaults to "Group".
- from: The starting value for the enumeration. Defaults to 0.
- step: The amount to increment the enumeration by. Defaults to 1.

? Grouping Methods

The following grouping methods are supported:
- `Unique`: Group rows by the specified columns.
- `Equal_Count`: Create the specified number of groups with the same
number of rows in each group (except possibly the last one).

? Ordering of rows

Note that the ordering of rows from the original table is preserved in
all cases. The grouping and ordering settings can affect how the group
numbers are assigned, depending on the grouping method. The order of
the rows itself is not changed by this operation.

! Error Conditions

- If the columns specified in `group_by` or `order_by` are not present
in the table, a `Missing_Input_Columns` error is raised.
- If the column with the same name as provided `name` already exists,
a `Duplicate_Output_Column_Names` problem is reported and the
existing column is renamed to avoid the clash.
- If grouping on floating point numbers, a `Floating_Point_Equality`
problem is reported.

> Example
Assign group numbers based on unique values of the first two columns.

## table:
x | y | z
---+---+---
1 | 0 | 2
0 | 1 | 0
1 | 2 | 0
0 | 1 | 1
1 | 0 | 1
1 | 2 | 1
table = table_builder [['x', [1, 0, 1, 0, 1, 1]], ['y', [0, 1, 2, 1, 0, 2]], ['z', [2, 0, 0, 1, 1, 1]]]
table2 = table.add_group_number (..Unique on=['x', 'y']) "g"
table2.at 'g' . to_vector
# => [0, 1, 2, 1, 0, 2]
## table2:
x | y | z | g
---+---+---+---
1 | 0 | 2 | 0
0 | 1 | 0 | 1
1 | 2 | 0 | 2
0 | 1 | 1 | 1
1 | 0 | 1 | 0
1 | 2 | 1 | 2

> Example
Divide rows into three groups.
## table:
x | y
---+---
1 | 5
2 | 4
3 | 3
4 | 2
5 | 1
table = table_builder [['x', [1, 2, 3, 4, 5]], ['y', [5, 4, 3, 2, 1]]]
table2 = table.add_group_number (..Equal_Count 3) "g"
table2.at 'g' . to_vector
# => [0, 0, 1, 1, 2]
## table2:
x | y | g
---+---+---
1 | 5 | 0
2 | 4 | 0
3 | 3 | 1
4 | 2 | 1
5 | 1 | 2
@grouping_method (Widget_Helpers.make_grouping_method_selector display=..Always)
@name (Widget.Text_Input display=..Always)
@from (Widget.Numeric_Input display=..Always)
@group_by (Widget_Helpers.make_column_name_multi_selector display=..When_Modified)
@order_by (Widget_Helpers.make_order_by_selector display=..When_Modified)
add_group_number self (grouping_method:Grouping_Method=(Missing_Argument.throw "grouping_method")) (name:Text="Group") (from:Integer=0) (step:Integer=1) (on_problems:Problem_Behavior=..Report_Warning) -> Table =
Incomparable_Values.handle_errors <| Add_Group_Number.add_group_number self grouping_method name from step on_problems
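
The "Ordering of rows" note in the docstring says that `order_by` changes which group number a row receives but never the position of the row. A minimal Python sketch of that idea follows; it is an illustration under assumptions, not the library's implementation. The function name, the single sort key, and the `ceil`-based bucket size are all hypothetical.

```python
from math import ceil
from typing import Sequence


def equal_count_with_order(sort_keys: Sequence, group_count: int, start: int = 0,
                           step: int = 1, descending: bool = False) -> list[int]:
    """Assign equal-count group numbers by rank in sort_keys, keeping row order."""
    if group_count < 1:
        raise ValueError("group_count must be at least 1")
    n = len(sort_keys)
    size = max(1, ceil(n / group_count))
    # Row indices in the order in which rows are handed to groups.
    # sorted() is stable, so ties keep their original relative order.
    order = sorted(range(n), key=lambda i: sort_keys[i], reverse=descending)
    result = [0] * n
    for rank, row in enumerate(order):
        # Write the number back at the row's original position.
        result[row] = start + step * (rank // size)
    return result


# Column y from the second docstring example, used as the ordering column.
print(equal_count_with_order([5, 4, 3, 2, 1], 3))  # [2, 1, 1, 0, 0]
```

The output list lines up with the original rows: the row with the smallest `y` is last in the table but still receives group 0.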

## ALIAS add column, expression, formula, new column, update column
GROUP Standard.Base.Values
ICON column_add