-
Notifications
You must be signed in to change notification settings - Fork 91
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Feature] Reusing parts of equation #113
Comments
Perhaps a better way of doing this is not to try to reconnect nodes, but to change the definition of Might be related to Kolmogorov Complexity: https://en.wikipedia.org/wiki/Kolmogorov_complexity. Is there any Julia package that lets us compute Kolmogov complexity of a particular sequence? |
This might be useful: https://github.com/simonschoelly/InformationDistances.jl >>> using InformationDistances
>>> compressor = LibDeflateCompressor()
>>> compressed_length(compressor, "abababababab")
6
>>> compressed_length(compressor, "ababababababab")
6
>>> compressed_length(compressor, "abababacbababab")
... # Note the "c"
17 |
#135 will get part way to solving this. |
Evolutionary aspectI think we need a few genetic operators (in a new PR):
Complexity calculations
Other utils
Crossovers
Evaluation
Printing
@Jgmedina95 would you be up for implementing any of these? |
Sorry if I haven't seen your respective version of the following code. But I couldn't find it, so I propose the following for the complexity. It counts reused nodes with the complexity of 1 and does not continue traversing
|
Hi @viktmar, function _count_nodes(tree::Node{T}, nodes_seen::ID)::Int where {T,ID}
!(ID <: Nothing) && haskey(nodes_seen, tree) && return 0
count = if tree.degree == 0
1
elseif tree.degree == 1
1 + _count_nodes(tree.l, nodes_seen)
else
1 + _count_nodes(tree.l, nodes_seen) + _count_nodes(tree.r, nodes_seen)
end
!(ID <: Nothing) && (nodes_seen[tree] = true)
return count
end Where nodes_seen is an |
Actually I take that back, it seems like |
It's working!!! Here's an example (on the using SymbolicRegression
X = rand(Float32, 2, 100)
# We will repeat this part:
y = (X[1, :] .- X[2, :] .+ 1)
# Sum of powers:
y = y .^ 1.5 .+ y .^ 2.1 .+ y .^ 0.8
# Subtract mean of y:
y = y .- mean(y)
options = SymbolicRegression.Options(;
binary_operators=(+, -, ^), # npopulations=20
complexity_of_variables=3,
maxsize=40,
constraints=[(^) => (-1, 1)],
mutationWeights=[0.048, 0.47, 0.79, 5.1, 1.7, 0.0020, 0.00023, 0.05, 0.05, 0.21],
# The third to last and second to last are make random connection, and break random connection
)
hall_of_fame = EquationSearch(X, y; niterations=4000, options=options, multithreading=true) This seems to correctly build Printed expressions use Side note: would be good to refactor the mutation weights so that it's not just a long vector of numbers... Probably best to have it as a separate struct. |
The current search is so slow with this that I think some functions are breaking hierarchies somehow. I think I think one needs to be careful about moving nodes around if those nodes are referenced by something - since they could end up a parent to their own parent! |
@MilesCranmer That is a really interesting idea! My only contact so far was during my thesis, where I built a translation from a proof tree in our proof language into a sequential proof in another language (SMTCoq). I simplified first (simple tree traversal, no caching), and then for the more complicated phases, build a table of shared sub expressions (maximally shared, everything that was used more than once got a variable), swapping out the concrete sub trees with a variable pointing to the correct entry in the table. Those variables where variables like any other. That worked well, but it was a much less dynamic situation (proof in, altered proof out) than genetic programming. My thoughts about this are:
Hope that made sense! By the way, I would be interested in working on simplifications (that would be https://github.com/SymbolicML/DynamicExpressions.jl, right?), if you would be interested :) |
PySR should be able to reuse parts of an equation.
In theory, a PySR equation (
Node
) type could have a single subtree used by multiple parts of the expression, as any two nodes can point to the same node.The
l
andr
attributes point to child nodes. These could in principle point to a node that is also pointed to by another node.However, is that the best way to reuse parts of an expression? Also, how would copying work, with the pointer structure?
Perhaps all that is needed is to add a "reconnection" mutation, and update the copying procedure to pay attention to re-used pointers.
(Any help appreciated, if someone is interested in this)
The text was updated successfully, but these errors were encountered: