Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for creating SQLite aggregate functions #204

Closed
mqudsi opened this issue Jun 26, 2017 · 8 comments · Fixed by #529
Closed

Add support for creating SQLite aggregate functions #204

mqudsi opened this issue Jun 26, 2017 · 8 comments · Fixed by #529

Comments

@mqudsi
Copy link

mqudsi commented Jun 26, 2017

Currently sql.js supports creating custom SQL functions via the exposed create_function function. This allows you to use a locally-defined function from within SQLite queries, such as SELECT custom_func(col) FROM ....

SQLite also supports user-defined aggregate functions. This is supported via the same native C SQLite function that create_function uses, namely sqlite3_create_function; however, it requires the definition of two additional function pointers to be passed in to the function which are called at certain points during the aggregate calculation process.

This is all document on the SQLite page for sqlite3_create_function:

The sixth, seventh and eighth parameters, xFunc, xStep and xFinal, are pointers to C-language functions that implement the SQL function or aggregate. A scalar SQL function requires an implementation of the xFunc callback only; NULL pointers must be passed as the xStep and xFinal parameters. An aggregate SQL function requires an implementation of xStep and xFinal and NULL pointer must be passed for xFunc. To delete an existing SQL function or aggregate, pass NULL pointers for all three function callbacks.

Currently we are only defining the first of these (xFunc) for scalar function support. xStep and xFinal would also have to be defined. When a custom aggregate function is used in SQLite, xFunc is first called then xStep is called for each value being aggregated, and finally xFinal is called to finalize the results and return a result.

The definitions for xFunc and xStep are identical, while xFinal does not take a parameter besides the SQLite3 context (since it is a finalizer):

void (*xFunc)(sqlite3_context*,int,sqlite3_value**),
void (*xStep)(sqlite3_context*,int,sqlite3_value**),
void (*xFinal)(sqlite3_context*)

The create_function method in sql.js can only support custom scalar functions, a different definition would be needed to support creating aggregate functions.

In Python, scalar function support is implemented in the same way as sql.js with a single create_function method. Aggregate function support was implemented via a create_aggregate function that takes a class with methods step and finalize which are called accordingly; for sql.js I would propose a method defined along the lines of:

create_aggregate(name, func, step, final)

Where func, step, and final are function callbacks. It should be straightforward to understand and use for anyone that is remotely familiar with SQLite custom functions in C or any other language.

Thanks for considering!

@lovasoa
Copy link
Member

lovasoa commented Jun 27, 2017

This is a very interesting feature request ! I currently don’t have the time to work on this, but I would be happy to review a pull request for this.

I recommend reading the code for create_function https://github.com/kripken/sql.js/blob/master/coffee/api.coffee#L439-L483

Implementing this shouldn't be hard

@yifanwu
Copy link

yifanwu commented Feb 19, 2018

Hey @lovasoa I'm trying to implement this, and I was a bit confused by what the addFunction does here. It doesn't seem to be defined anywhere in the repo? Thanks!

@AnyhowStep
Copy link

@lovasoa
I know you're a busy person but I figured I'd put this here anyway.
I saw you recently removed CoffeeScript from the repo. Thank you for that! CoffeeScript was a pain to work with =(

If you change create_function to,

    function setFunctionResult (cx, result) {
      switch (typeof result) {
        case "boolean":
            sqlite3_result_int(cx, result ? 1 : 0);
            break;
        case "number":
            sqlite3_result_double(cx, result);
            break;
        case "string":
            sqlite3_result_text(cx, result, -1, -1);
            break;
        case "object":
            if (result === null) {
                sqlite3_result_null(cx);
            } else if (result.length != null) {
                var blobptr = allocate(result, "i8", ALLOC_NORMAL);
                sqlite3_result_blob(cx, blobptr, result.length, -1);
                _free(blobptr);
            } else {
                sqlite3_result_error(cx, (
                    "Wrong API use : tried to return a value "
                    + "of an unknown type (" + result + ")."
                ), -1);
            }
            break;
        default:
            sqlite3_result_null(cx);
      }
    }

    function parseFunctionArguments (argc, argv) {
      function extract_blob(ptr) {
        var size = sqlite3_value_bytes(ptr);
        var blob_ptr = sqlite3_value_blob(ptr);
        var blob_arg = new Uint8Array(size);
        for (var j = 0; j < size; j += 1) blob_arg[j] = HEAP8[blob_ptr + j];
        return blob_arg;
      }
      var args = [];
      for (var i = 0; i < argc; i += 1) {
          var value_ptr = getValue(argv + (4 * i), "i32");
          var value_type = sqlite3_value_type(value_ptr);
          var arg;
          if (value_type === SQLITE_INTEGER || value_type === SQLITE_FLOAT) {
              arg = sqlite3_value_double(value_ptr);
          } else if (value_type === SQLITE_TEXT) {
              arg = sqlite3_value_text(value_ptr);
          } else if (value_type === SQLITE_BLOB) {
              arg = extract_blob(value_ptr);
          } else arg = null;
          args.push(arg);
      }
      return args;
    }

    function invokeWithFunctionArguments (cx, func, args) {
      try {
          return func.apply(null, args);
      } catch (error) {
          if (error instanceof Error) {
            sqlite3_result_error(cx, customErrorMessage, -1);
          } else {
            sqlite3_result_error(cx, (typeof error == "string") ? error : String(error), -1);
            return undefined;
          }
      }
    }

    /** Register a custom function with SQLite
    @example Register a simple function
        db.create_function("addOne", function (x) {return x+1;})
        db.exec("SELECT addOne(1)") // = 2

    @param {string} name the name of the function as referenced in SQL statements.
    @param {function} func the actual function to be executed.
    @return {Database} The database object. Useful for method chaining
     */
    Database.prototype["create_function"] = function create_function(name, func, options) {
        var func_ptr;
        function wrapped_func(cx, argc, argv) {
            var args = parseFunctionArguments(argc, argv);
            var result = invokeWithFunctionArguments(cx, func, args);
            setFunctionResult(cx, result);
        }
        if (Object.prototype.hasOwnProperty.call(this.functions, name)) {
            removeFunction(this.functions[name]);
            delete this.functions[name];
        }
        // The signature of the wrapped function is :
        // void wrapped(sqlite3_context *db, int argc, sqlite3_value **argv)
        func_ptr = addFunction(wrapped_func, "viii");
        this.functions[name] = func_ptr;
        this.handleError(sqlite3_create_function_v2(
            this.db,
            name,
            //nArg
            //If this parameter is -1,
            //then the SQL function or aggregate may take any number of arguments between
            //0 and the limit set by sqlite3_limit(SQLITE_LIMIT_FUNCTION_ARG)
            (
              options.isVarArg === true ?
              -1 :
              func.length
            ),
            SQLITE_UTF8,
            0,
            func_ptr,
            0,
            0,
            0
        ));
        return this;
    };

You get a few benefits,

  1. varArg functions are now supported
  2. Parsing function arguments is abstracted away
  3. Invoking functions is abstracted away
  4. Setting the result of a function is abstracted away

Then, to implement create_aggregate,

    Database.prototype["create_aggregate"] = function create_aggregate(name, init, step, finalize) {
        let state = undefined;
        const wrapped_step = function(cx, argc, argv) {
          if (state === undefined) {
            state = init();
          }
          const args = parseFunctionArguments(argc, argv);
          invokeWithFunctionArguments(cx, step, [state, ...args]);
        };
        const wrapped_finalize = function (cx) {
          const result = invokeWithFunctionArguments(cx, finalize, [state]);
          setFunctionResult(cx, result);
          state = undefined;
        }

        if (Object.prototype.hasOwnProperty.call(this.functions, name)) {
          removeFunction(this.functions[name]);
          delete this.functions[name];
        }
        if (Object.prototype.hasOwnProperty.call(this.functions, name + "__finalize")) {
          removeFunction(this.functions[name + "__finalize"]);
          delete this.functions[name + "__finalize"];
        }
        // The signature of the wrapped function is :
        // void wrapped(sqlite3_context *db, int argc, sqlite3_value **argv)
        const step_ptr = addFunction(wrapped_step, "viii");
        // The signature of the wrapped function is :
        // void wrapped(sqlite3_context *db, int argc, sqlite3_value **argv)
        const finalize_ptr = addFunction(wrapped_finalize, "viii");
        this.functions[name] = step_ptr;
        this.functions[name + "__finalize"] = finalize_ptr;
        this.handleError(sqlite3_create_function_v2(
            this.db,
            name,
            //nArg
            //If this parameter is -1,
            //then the SQL function or aggregate may take any number of arguments between
            //0 and the limit set by sqlite3_limit(SQLITE_LIMIT_FUNCTION_ARG)
            /**
             * @todo Implement vararg aggregate function
             */
            step.length - 1,
            SQLITE_UTF8,
            //pApp
            0,
            //xFunc
            0,
            //xStep
            step_ptr,
            //xFinal
            finalize_ptr,
            //xDestroy
            0
        ));
        return this;
    };

The TypeScript signature for create_aggregate would be,

    create_aggregate<StateT> (
        functionName : string,
        init : () => StateT,
        step : (state : StateT, ...args : unknown[]) => void,
        finalize : (state : StateT) => unknown
    ) : this;

You can use it to implement STDDEV_POP,

            create_aggregate(
                "STDDEV_POP",
                () => {
                    return {
                        values : [] as number[],
                    };
                },
                (state, x) => {
                    if (x === null) {
                        return;
                    }
                    if (typeof x == "number") {
                        state.values.push(x);
                    } else {
                        throw new Error(`STDDEV_POP(${typeof x}) not implmented`);
                    }
                },
                (state) => {
                    if (state == undefined) {
                        return null;
                    }
                    if (state.values.length == 0) {
                        return null;
                    }
                    const sum = state.values.reduce(
                        (sum, value) => sum + value,
                        0
                    );
                    const count = state.values.length;
                    const avg = sum/count;
                    const squaredErrors = state.values.map(value => {
                        return Math.pow(value - avg, 2);
                    });
                    const sumSquaredErrors = squaredErrors.reduce(
                        (sumSquaredErrors, squaredError) => sumSquaredErrors + squaredError,
                        0
                    );
                    return Math.sqrt(
                        sumSquaredErrors / count
                    );
                }
            );

If someone wants to make a PR for it, go for it!
I'm kind of pre-occupied with other stuff at the moment.

@AnyhowStep
Copy link

AnyhowStep commented Mar 8, 2020

Instead of addFunction, it might be nice to have something like,

function addOrReplaceFunction (name, wrapped) {
        if (Object.prototype.hasOwnProperty.call(this.functions, name)) {
          removeFunction(this.functions[name]);
          delete this.functions[name];
        }
        // The signature of the wrapped function is :
        // void wrapped(sqlite3_context *db, int argc, sqlite3_value **argv)
        const func_ptr = addFunction(wrapped, "viii");
        this.functions[name] = func_ptr;
}

So, it can be re-used in create_function and create_aggregate

@lovasoa
Copy link
Member

lovasoa commented Mar 8, 2020

This sounds great @AnyhowStep ! I really encourage you to make a pull request ! You did most of the work anyway. You would just have to add a test and a documentation comment (and maybe mention it in the README). I'm sure this feature would benefit others !

@bokolob
Copy link

bokolob commented Oct 11, 2021

Hi guys! Why #407 is not merged yet? Can I help? :)

@llimllib
Copy link
Contributor

llimllib commented Sep 2, 2022

If I took #407, updated it, and finished the two open tasks on it, would it have any chance of landing?

I'm willing to take a whack at it but I'd like to confirm that it's the lack of tests and docs holding it up

@lovasoa
Copy link
Member

lovasoa commented Sep 2, 2022

Hi ! yes, the feature is still welcome ! I can make no promise as to my responsiveness since I don't have a lot of time on my hands for sql.js, but if the code is good, tested, and documented, it will end up merged :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants