-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable custom aggregate functions (take 2) #529
Conversation
cc @lovasoa |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good, thank you !
A few things:
- Can we avoid calling
init
in create_aggregate itself ? I think this is a very surprising behavior for the programmer.init
should be called every time the aggregate function is called. - Can we make the step function return the new state, like the standard reduce ? This would avoid the awkward
function init() { return {sum: 0} }
- Taking three functions as arguments makes the code using create_aggregate a little bit hard to read. Can we take an object instead ?
db.create_aggregate("json_array", {
init: () => [],
step: (state, value) => [...state, value],
finalize: state => JSON.stringify(state)
})
- Can we make init and finalize optional ? init should default to
() => null
and finalize tostate => state
. In the end we should be able to call
db.create_aggregate("js_sum", { step: (a,b) => a+b })
- Can we add memory leak tests like the ones we have in
test/test_functions_recreate.js
Yes, thank you! Bad mental model striking there. I like all your proposed changes, will do. |
A question: when an instance of That is, if I say: I'm assuming I can? If not, storing the state is tricky, because I need to account for this scenario:
I would try to answer this for myself but the web platform is a really tricky runtime so I'm asking in case you know off hand? |
The way that proper sqlite3 extensions do it is by hanging their state off the execution context like so. Currently, |
I don't think so? We're not the ones calling the step function, that's sqlite doing it
We can make We can't make (Maybe I'm not being clever enough and there is a way for both!) |
Basically it seems that the sqlite extension pattern of 'allocate a struct and stick it in the context pointer' is not going to work for us here. I wonder if using the id of the pointer returned by sqlite3_aggregate_context would be enough? Since no two functions could use the same pointer, per https://www.sqlite.org/c3ref/aggregate_context.html ?
I managed to make This branch is WIP status, and I apologize for the noise on it. I will look into getting it down to a single function argument tomorrow. |
About defaults for init and finalize: Can't we just have create_aggregate_func start like that : function create_aggregate_func<T>(
name: string,
methods: {
init?: () => T,
step: (state: T, value: any) => T,
finalize?: (state: T) => any
}
) {
let init = methods.init || (() => null);
let step = (state: T, value: any) => {
state = methods.step(state, value);
__update_sqlite_internal_state(state);
};
let finalize = methods.finalize || (state => state);
...
} About thread-safety: there is no such thing as multi-threading with concurrent access in javascript. The javascript runtime guarantees that there is a single execution thread that can access mutable javascript objects. That said, even within a single thread, there can be multiple aggregate functions that are at various stages of there execution simultaneously. The following has to work: SELECT js_agg_fn(x), js_agg_fn(y) FROM t; A test case like this one should probably be added to the tests too. |
Having to return values from db.create_aggregate(
"json_agg", {
step: function(state, val) { state = (state || []); state.push(val); return state; },
finalize: function(state) { return JSON.stringify(state); }
}
); And with modifying state (and with state defaulting to db.create_aggregate(
"json_agg", {
step: function(state, val) { state.push(val); },
finalize: function(state) { return JSON.stringify(state); }
}
); I think we might even want to make
(I also wonder if there's efficency trickery we could do to accumulate values in a typed array?) edit: standard deviation is another example of an aggregation that can be completed without accumulating all values |
No, that's ugly, you wouldn't write it like that ! 🙂 This is a very common pattern in javascript, in many state-management libraries. You would write it as
Let's just stick to what developers are used to. Reduce (sometimes called fold) is a very common pattern, and users will appreciate it if you don't try to reinvent the wheel here. |
Can we also add a test with
|
|
You can see from the changes in ac548d4 how much that simplifies the step functions |
OK, I think this is ready for re-review now. Thanks for working through this with me! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks good to me ! Can you just fix the documentation, I'll read it one last time, and we'll merge :)
Co-authored-by: Ophir LOJKINE <[email protected]>
This PR builds on #407, updating it to current HEAD, adding more tests, and documenting it.
This PR closes #204, and is composed mostly of the work that @AnyhowStep did so long ago in that thread.
#407 notes that aggregate window functions do not work, but I think that's a different issue that should be tackled in another PR.
The full list of changes follows:
create_aggregate
parseFunctionArguments
into a function that can be used by bothcreate_aggregate
andcreate_function
setFunctionResults
into a functionsqlite3_aggregate_context
functioncreate_aggregate
functioncreate_aggregate(function_name, initial_value, { step: (state, ...value) =>, finalize: (state) =>})
create_aggregate
callssqlite3_aggregate_context
to allocate 1 byte of memory per instance of the aggregate function, which it uses as a key for its state array.sqlite3_aggregate_context
documentation