Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multi-expression objects and fixed functional forms #193

Closed
MilesCranmer opened this issue Apr 11, 2023 · 5 comments
Closed

Multi-expression objects and fixed functional forms #193

MilesCranmer opened this issue Apr 11, 2023 · 5 comments
Assignees
Labels
feature (importance: mid) Mid-importance feature

Comments

@MilesCranmer
Copy link
Owner

MilesCranmer commented Apr 11, 2023

Right now you can fix a functional form by writing a custom objective function that splits an expression tree into subexpressions. However, this is not an elegant approach.

I think a better solution is to have a multi-expression object:

struct MultiTree{T<:Number}
    trees::Dict{Symbol,Node{T}}
end
MultiTree(; kwargs...) = MultiTree(Dict([Symbol(k) => v for (k, v) in kwargs]))

Then, for example, if I wish to find the functional form:

SymbolicRegression.@form F = f(x1, x2, x3) + g(x4, x5) / f(x1, x2, x6)

this would result in the following MultiTree:

operators = Options(; binary_operators=[+, *, /, -], unary_operators=[cos, sin], form=F) 
# Note new argument "form"

x1, x2, x3 = [Node(;feature=i) for i=1:3]

example_multi_tree = MultiTree(f=cos(x1) * 3.2, g=x2*x2 - 0.1)

The evaluation would then be:

function eval_multi_tree(mtree::MultiTree, X::Matrix{T}, options) where T
    outputs = Dict{Symbol,Vector{T}}()
    for (k, tree) in mtree.trees
        features = options.form.features[k]
        outputs[k] = eval_tree_array(tree, X[features, :], options)
    end
    return options.form(outputs)
end

For the evolution: at each mutation, perhaps one random tree of the multi-tree would be mutated or crossed-over. There would be an assumption that all individuals have the same multi-tree structure.

@ChrisRackauckas @AlCap23 @johanbluecreek what do you think of this? It could be interesting to use to evolve SINDy bases.

Hey @marcovirgolin, what do you think of something like this with multiple trees in a single individual? This is sort of similar to what Eureqa used to do.

@MilesCranmer
Copy link
Owner Author

This could also be a nice way to get vector expressions:

SymbolicRegression.@form F = [f1(x1, x2, x3), f2(x1, x2, x3), f3(x1, x2, x3)]

@MilesCranmer MilesCranmer added the feature (importance: mid) Mid-importance feature label Apr 11, 2023
@marcovirgolin
Copy link

I have done something like that in the past.
You have a good catch of mutating a single tree at the time: as the search space is larger with multi-trees, of course less likely to get multiple lucky simultaneous mutations.

Makes sense to assume all solutions have same number of trees if you know this a priori, but that needs not always need to be the case. E.g., you might want to do SR for interpretable dimensionality reduction from R^d (d original features) to R^k (k latent features), where you want k to be small to have fewer trees to interpret, but large enough to get good reconstruction accuracy.

@MilesCranmer MilesCranmer self-assigned this Aug 28, 2023
@MilesCranmer
Copy link
Owner Author

MilesCranmer commented Aug 28, 2023

I have been thinking more about how this could work. Programmatically I think the easiest way forward is as follows.

  1. Create a way to "freeze" a set of nodes in a tree. Those nodes cannot be mutated or used in crossovers.
  2. You would use a regular Node type to store all expression information. For two subexpressions f and g, you would have root.l storing f, and root.r storing g. Since root would be labeled as frozen, it would not be reduced.
    • However, you could still crossover the contents of f with the contents of g. Just you couldn't crossover root itself.
  3. Rewrite functions like eval_tree_array, string_tree, node_to_symbolic to work with a custom functional form.

It will still take a bit of work to get this to work. But I think this is the easiest way to embed it in SymbolicRegression.jl without breaking anything for existing functionality.

@eelregit
Copy link

eelregit commented Jan 4, 2024

Found this issue after submitting a quite similar feature request for composite PySR regressor in MilesCranmer/PySR#514

For the evolution: at each mutation, perhaps one random tree of the multi-tree would be mutated or crossed-over. There would be an assumption that all individuals have the same multi-tree structure.

I wonder when different trees are degenerate, would correlated changes (mutating all sub-trees together) actually help to optimize faster, even though with a larger search space?

  1. Create a way to "freeze" a set of nodes in a tree.

Very interesting approach. If I understand correctly, would it be easy for this to enforce, say the final sub-tree is at most a bivariate function (i.e., $k\leq2$ in @marcovirgolin 's comment), with one variable being a constrained expression of the other sub-trees ($f+g/f$ in an above example)?

@MilesCranmer
Copy link
Owner Author

Added in v1.0 🙂

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature (importance: mid) Mid-importance feature
Projects
None yet
Development

No branches or pull requests

3 participants