diff --git a/.gitignore b/.gitignore
index 2cb9c5d85..ecd0d8ac2 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1,6 +1,8 @@
.dataset*.jl
.hyperparams*.jl
+outputs
*.csv
+*.bak
*.bkup
performance*txt
*.out
@@ -8,7 +10,7 @@ trials*
**/__pycache__
build
dist
-Manifest.toml
+Manifest*.toml
*.cov
.coveralls.yml
**/*tmp*.jl
diff --git a/CHANGELOG.md b/CHANGELOG.md
new file mode 100644
index 000000000..170c6dfc5
--- /dev/null
+++ b/CHANGELOG.md
@@ -0,0 +1,1548 @@
+# Changelog
+
+## SymbolicRegression.jl v1.0.0
+
+Summary of major recent changes, described in more detail below:
+
+- [Changed the core expression type from `Node{T} → Expression{T,Node{T},Metadata{...}}`](#changed-the-core-expression-type-from-nodet--expressiontnodet)
+ - This gives us new features, improves user hackability, and greatly improves ergonomics!
+- [Created "_Template Expressions_", for fitting expressions under a user-specified functional form (`TemplateExpression <: AbstractExpression`)](#created-template-expressions-for-fitting-expressions-under-a-user-specified-functional-form-templateexpression--abstractexpression)
+  - Template expressions are quite flexible: each is a meta-expression that wraps multiple other expressions and combines them using a user-specified function.
+  - This enables **vector expressions** – in other words, you can learn multiple components of a vector simultaneously, with a single expression! Or, more generally, you can learn expressions that map onto any Julia struct.
+ - (Note that this still does not permit learning using non-scalar operators, though we are working on that!)
+ - Template expressions also make use of colored strings to represent each part in the printout, to improve readability.
+- [Created "_Parametric Expressions_", for custom functional forms with per-class parameters: (`ParametricExpression <: AbstractExpression`)](#created-parametric-expressions-for-custom-functional-forms-with-per-class-parameters-parametricexpression--abstractexpression)
+ - This lets you fit expressions that act as _models of multiple systems_, with per-system parameters!
+- [Introduced a variety of new abstractions for user extensibility](#introduced-a-variety-of-new-abstractions-for-user-extensibility) (**and to support new research on symbolic regression!**)
+ - `AbstractExpression`, for increased flexibility in custom expression types.
+ - `mutate!` and `AbstractMutationWeights`, for user-defined mutation operators.
+ - `AbstractSearchState`, for holding custom metadata during searches.
+  - `AbstractOptions` and `AbstractRuntimeOptions`, for customizing pretty much everything else in the library via multiple dispatch. Please make an issue/PR if you would like any particular internal functions to be declared `public`, to ensure stability across versions for your tool.
+ - Many of these were motivated to modularize the implementation of [LaSR](https://github.com/trishullab/LibraryAugmentedSymbolicRegression.jl), an LLM-guided version of SymbolicRegression.jl, so it can sit as a modular layer on top of SymbolicRegression.jl.
+- Fundamental improvements to the underlying evolutionary algorithm
+ - New mutation operators introduced, `swap_operands` and `rotate_tree` – both of which seem to help kick the evolution out of local optima.
+ - New hyperparameter defaults created, based on a Pareto front volume calculation, rather than simply accuracy of the best expression.
+- [Support for Zygote.jl and Enzyme.jl within the constant optimizer, specified using the `autodiff_backend` option](#support-for-zygotejl-and-enzymejl-within-the-constant-optimizer-specified-using-the-autodiff_backend-option)
+- [Changed output file handling](#changed-output-file-handling)
+- Major refactoring of the codebase to improve readability and modularity
+- Identified and fixed a major internal bug involving unexpected aliasing produced by the crossover operator
+ - Segmentation faults caused by this are a likely culprit for some crashes reported during multi-day multi-node searches.
+ - Introduced a new test for aliasing throughout the entire search state to prevent this from happening again.
+- Increased documentation and examples.
+- Improved progress bar.
+- StyledStrings integration.
+- Julia 1.10 is now the minimum supported Julia version.
+- [Other small features](#other-small-features-in-v100)
+- Also see the [Update Guide](#update-guide) below for more details on upgrading.
+
+Note that some of these features were recently introduced in patch releases since they were backwards compatible. I am noting them here for visibility.
+
+### Changed the core expression type from `Node{T} → Expression{T,Node{T},...}`
+
+https://github.com/MilesCranmer/SymbolicRegression.jl/pull/326
+
+This is a breaking change in the format of expressions returned by SymbolicRegression. Now, instead of returning a `Node{T}`, SymbolicRegression will return an `Expression{T,Node{T},...}` (both from `equation_search` and from `report(mach).equations`). This type is much more convenient and high-level than the `Node` type, as it includes metadata relevant to the node, such as the operators and variable names.
+
+This means you can reliably do things like:
+
+```julia
+using SymbolicRegression: Options, Expression, Node
+
+options = Options(binary_operators=[+, -, *, /], unary_operators=[cos, exp, sin])
+operators = options.operators
+variable_names = ["x1", "x2", "x3"]
+x1, x2, x3 = [Expression(Node(Float64; feature=i); operators, variable_names) for i=1:3]
+
+## Use the operators directly!
+tree = cos(x1 - 3.2 * x2) - x1 * x1
+```
+
+You can then do operations with this `tree`, without needing to track `operators`:
+
+```julia
+println(tree) # Looks up the right operators based on internal metadata
+
+X = randn(3, 100)
+
+tree(X) # Call directly!
+tree'(X) # gradients of expression
+```
+
+Each time you use an operator on (or between) `Expression`s that include that operator in their list, it will look up the right enum index, create the correct `Node`, and return a new `Expression`.
+
+You can access the tree with `get_tree` (guaranteed to return a `Node`), or with `get_contents`, which returns the full contents of an `AbstractExpression` – possibly multiple sub-expressions, which get stitched together when calling `get_tree`.
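+
+For instance, a minimal sketch reusing the `tree` defined above:
+
+```julia
+using SymbolicRegression: get_contents, get_tree
+
+# For a plain `Expression`, the contents are just the underlying `Node`:
+inner_node = get_contents(tree)
+
+# `get_tree` always returns a single tree, even for expression types
+# that store multiple sub-expressions:
+full_tree = get_tree(tree)
+```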
+
+### Created "_Template Expressions_", for fitting expressions under a user-specified functional form (`TemplateExpression <: AbstractExpression`)
+
+Template Expressions allow users to define symbolic expressions with a fixed structure, combining multiple sub-expressions under user-specified constraints.
+This is particularly useful for symbolic regression tasks where domain-specific knowledge or constraints must be imposed on the model's structure.
+
+This also lets you fit vector expressions using SymbolicRegression.jl, where vector components can also be shared!
+
+A `TemplateExpression` is constructed by specifying:
+
+- A named tuple of sub-expressions (e.g., `(; f=x1 - x2 * x2, g=1.5 * x3)`).
+- A structure function that defines how these sub-expressions are combined in different contexts.
+
+For example, you can create a `TemplateExpression` that enforces
+the constraint: `sin(f(x1, x2)) + g(x3)^2` - where we evolve `f` and `g` simultaneously.
+
+Let's see some code for this. First, we define some base expressions for each input feature:
+
+```julia
+using SymbolicRegression
+
+options = Options(; binary_operators=(+, *, /, -), unary_operators=(sin, cos))
+operators = options.operators
+variable_names = ["x1", "x2", "x3"]
+
+# Base expressions:
+x1 = Expression(Node{Float64}(; feature=1); operators, variable_names)
+x2 = Expression(Node{Float64}(; feature=2); operators, variable_names)
+x3 = Expression(Node{Float64}(; feature=3); operators, variable_names)
+```
+
+To build a `TemplateExpression`, we specify the structure using
+a `TemplateStructure` object. This type has several fields:
+
+- `combine`: Optional function taking a `NamedTuple` of function keys => expressions,
+ returning a single expression. Fallback method used by `get_tree`
+ on a `TemplateExpression` to generate a single `Expression`.
+- `combine_vectors`: Optional function taking a `NamedTuple` of function keys => vectors,
+ returning a single vector. Used for evaluating the expression tree.
+  You may optionally define a method with a second argument `X` if you wish
+  to include the data matrix `X` (of shape `[num_features, num_rows]`) in the
+  computation.
+- `combine_strings`: Optional function taking a `NamedTuple` of function keys => strings,
+ returning a single string. Used for printing the expression tree.
+- `variable_constraints`: Optional `NamedTuple` that defines which variables each sub-expression is allowed to access.
+ For example, requesting `f(x1, x2)` and `g(x3)` would be equivalent to `(; f=[1, 2], g=[3])`.
+
+Let's see an example:
+
+```julia
+# Combine f and g into a single scalar expression:
+structure = TemplateStructure(;
+ combine_strings=e -> "sin(" * e.f * ") + (" * e.g * ")^2",
+ combine_vectors=e -> map((f, g) -> sin(f) + g * g, e.f, e.g),
+    variable_constraints=(; f=[1, 2], g=[3]), # We constrain it to f(x1, x2) and g(x3)
+)
+```
+
+This defines how the `TemplateExpression` should be evaluated numerically on a given input,
+and also how it should be represented as a string:
+
+```julia
+julia> f_example = x1 - x2 * x2; # Normal `Expression` object
+
+julia> g_example = 1.5 * x3;
+
+julia> # Create TemplateExpression from these sub-expressions:
+ st_expr = TemplateExpression((; f=f_example, g=g_example); structure, operators, variable_names);
+
+julia> st_expr # Prints using `combine_strings`!
+sin(x1 - (x2 * x2)) + (1.5 * x3)^2
+
+julia> st_expr([0.0; 1.0; 2.0;;]) # Combines evaluation of `f` and `g` via `combine_vectors`!
+1-element Vector{Float64}:
+ 8.158529015192103
+```
+
+We can also use this `TemplateExpression` in SymbolicRegression.jl searches!
+
+For example, say that we want to fit *vector expressions*:
+
+```julia
+using SymbolicRegression
+using MLJBase: machine, fit!, report
+```
+
+We first define our structure.
+This also has our variable mapping, which says
+we are fitting `f(x1, x2)`, `g1(x3)`, and `g2(x3)`:
+
+```julia
+structure = TemplateStructure(;
+ combine_strings=e -> "( " * e.f * " + " * e.g1 * ", " * e.f * " + " * e.g2 * " )",
+ combine_vectors=e -> map(i -> (e.f[i] + e.g1[i], e.f[i] + e.g2[i]), eachindex(e.f)),
+    variable_constraints=(; f=[1, 2], g1=[3], g2=[3]),
+)
+```
+
+Now, our dataset is a regular 2D array of inputs for `X`.
+But our `y` is actually a _vector of 2-tuples_!
+
+```julia
+X = rand(100, 3) .* 10
+
+y = [
+ (
+ sin(X[i, 1]) + X[i, 3]^2,
+ sin(X[i, 1]) + X[i, 3]
+ )
+    for i in axes(X, 1)
+]
+```
+
+Now, since this is a vector-valued expression, we need to specify a custom `elementwise_loss` function:
+
+```julia
+elementwise_loss = ((x1, x2), (y1, y2)) -> (y1 - x1)^2 + (y2 - x2)^2
+```
+
+This compares each 2-tuple of `y` against the corresponding predicted 2-tuple returned by the structure function.
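+
+For example, we can check it by hand on one prediction–target pair:
+
+```julia
+# (1.5 - 1.0)^2 + (1.0 - 2.0)^2 == 1.25
+elementwise_loss((1.0, 2.0), (1.5, 1.0))
+```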
+
+Our regressor is then:
+
+```julia
+model = SRRegressor(;
+ binary_operators=(+, *),
+ unary_operators=(sin,),
+ maxsize=15,
+ elementwise_loss=elementwise_loss,
+ expression_type=TemplateExpression,
+ # Note - this is where we pass custom options to the expression type:
+ expression_options=(; structure),
+)
+
+mach = machine(model, X, y)
+fit!(mach)
+```
+
+Let's see the performance of the model:
+
+```julia
+report(mach)
+```
+
+We can also check that the expression is split up correctly:
+
+```julia
+r = report(mach)
+idx = r.best_idx
+best_expr = r.equations[idx]
+best_f = get_contents(best_expr).f
+best_g1 = get_contents(best_expr).g1
+best_g2 = get_contents(best_expr).g2
+```
+
+### Created "_Parametric Expressions_", for custom functional forms with per-class parameters: (`ParametricExpression <: AbstractExpression`)
+
+Parametric Expressions are another example of an `AbstractExpression` with more features than a normal `Expression`.
+This type allows SymbolicRegression.jl to fit a _parametric functional form_, rather than an expression with fixed constants.
+This is particularly useful when modeling multiple systems or categories where each may have unique parameters but share
+a common functional form and certain constants.
+
+A parametric expression is constructed with a tree represented as a `ParametricNode <: AbstractExpressionNode` – an alternative
+to the usual `Node` type that includes two extra fields: `is_parameter::Bool` and `parameter::UInt16`.
+A `ParametricExpression` wraps this type and stores the actual parameter matrix (under `.metadata.parameters`) as well as
+the parameter names (under `.metadata.parameter_names`).
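+
+As a rough sketch of manual construction – where the constructor keywords are assumed to mirror the metadata fields described above:
+
+```julia
+using SymbolicRegression
+
+options = Options(; binary_operators=(+, *), unary_operators=(cos,))
+
+# A `ParametricNode` leaf, playing the role that `Node` usually does:
+tree = ParametricNode{Float64}(; feature=1)
+
+ex = ParametricExpression(
+    tree;
+    operators=options.operators,
+    variable_names=["x1"],
+    parameters=rand(Float64, 2, 5),  # 2 parameters × 5 classes (illustrative)
+    parameter_names=["p1", "p2"],
+)
+
+get_metadata(ex).parameters  # the stored parameter matrix
+```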
+
+Various internal functions have been overloaded for custom behavior when fitting parametric expressions.
+For example, `mutate_constant` will perturb a row of the parameter matrix, rather than a single parameter.
+
+When fitting a `ParametricExpression`, the `expression_options` parameter in `Options/SRRegressor`
+should include a `max_parameters` keyword, which specifies the maximum number of separate parameters
+in the functional form.
+
+Let's see an example of fitting a parametric expression:
+
+```julia
+using SymbolicRegression
+using Random: MersenneTwister
+using Zygote
+using MLJBase: machine, fit!, predict, report
+```
+
+Let's generate data from two classes of the same model and try to recover it:
+
+```julia
+rng = MersenneTwister(0)
+X = NamedTuple{(:x1, :x2, :x3, :x4, :x5)}(ntuple(_ -> randn(rng, Float32, 30), Val(5)))
+X = (; X..., classes=rand(rng, 1:2, 30)) # Add class labels (1 or 2)
+
+# Define per-class parameters
+p1 = [0.0f0, 3.2f0]
+p2 = [1.5f0, 0.5f0]
+
+# Generate target variable y with class-specific parameters
+y = [
+ 2 * cos(X.x4[i] + p1[X.classes[i]]) + X.x1[i]^2 - p2[X.classes[i]]
+ for i in eachindex(X.classes)
+]
+```
+
+When fitting a `ParametricExpression`, it tends to be more important to set up
+an `autodiff_backend` such as `:Zygote` or `:Enzyme`, as the default backend (finite differences)
+can be too slow for high-dimensional parameter spaces.
+
+```julia
+model = SRRegressor(
+ niterations=100,
+ binary_operators=[+, *, /, -],
+ unary_operators=[cos, exp],
+ populations=30,
+ expression_type=ParametricExpression,
+ expression_options=(; max_parameters=2), # Allow up to 2 parameters
+ autodiff_backend=:Zygote, # Use Zygote for automatic differentiation
+ parallelism=:multithreading,
+)
+
+mach = machine(model, X, y)
+
+fit!(mach)
+```
+
+The expressions are returned with the parameters:
+
+```julia
+r = report(mach);
+best_expr = r.equations[r.best_idx]
+@show best_expr
+@show get_metadata(best_expr).parameters
+```
+
+### Introduced a variety of new abstractions for user extensibility
+
+v1 introduces several new abstract types to improve extensibility.
+These allow you to define custom behaviors by leveraging Julia's multiple dispatch system.
+
+**Expression types**: `AbstractExpression`: As explained above, SymbolicRegression now works on `Expression` rather than `Node` by default. In fact, most internal functions in SymbolicRegression.jl are now defined on `AbstractExpression`, which allows their behavior to be overloaded. The expression type used can be modified with the `expression_type` and `node_type` options in `Options`, as sketched after this list.
+
+- `expression_type`: By default, this is `Expression`, which simply stores a binary tree (`Node`) as well as the `variable_names::Vector{String}` and `operators::DynamicExpressions.OperatorEnum`. See the implementation of `TemplateExpression` and `ParametricExpression` for examples of what needs to be overloaded.
+- `node_type`: By default, this will be `DynamicExpressions.default_node_type(expression_type)`, which allows `ParametricExpression` to default to `ParametricNode` as the underlying node type.
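+
+For instance, a sketch combining these options (using the `ParametricExpression` type from the section above):
+
+```julia
+using SymbolicRegression
+
+options = Options(;
+    binary_operators=(+, *),
+    expression_type=ParametricExpression,
+    # Options specific to the expression type are passed via `expression_options`:
+    expression_options=(; max_parameters=2),
+)
+# `node_type` now defaults to `ParametricNode`,
+# via `DynamicExpressions.default_node_type(ParametricExpression)`.
+```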
+
+**Mutation types**: `mutate!(tree::N, member::P, ::Val{S}, mutation_weights::AbstractMutationWeights, options::AbstractOptions; kws...) where {N<:AbstractExpression,P<:PopMember,S}`, where `S` is a symbol representing the type of mutation to perform (where the symbols are taken from the `mutation_weights` fields). This allows you to define new mutation types by subtyping `AbstractMutationWeights` and creating new `mutate!` methods (simply pass the `mutation_weights` instance to `Options` or `SRRegressor`).
+
+**Search states**: `AbstractSearchState`: this is the abstract type for `SearchState` which stores the search process's state (such as the populations and halls of fame). For advanced users, you may wish to overload this to store additional state details. (For example, [LaSR](https://github.com/trishullab/LibraryAugmentedSymbolicRegression.jl) stores some history of the search process to feed the language model.)
+
+**Global options and full customization**: `AbstractOptions` and `AbstractRuntimeOptions`: Many functions throughout SymbolicRegression.jl take `AbstractOptions` as an input. The default assumed implementation is `Options`. However, you can subtype `AbstractOptions` to overload certain behavior.
+
+For example, if we have new options that we want to add to `Options`:
+
+```julia
+Base.@kwdef struct MyNewOptions
+ a::Float64 = 1.0
+ b::Int = 3
+end
+```
+
+we can create a combined options type that forwards properties to each corresponding type:
+
+```julia
+struct MyOptions{O<:SymbolicRegression.Options} <: SymbolicRegression.AbstractOptions
+ new_options::MyNewOptions
+ sr_options::O
+end
+const NEW_OPTIONS_KEYS = fieldnames(MyNewOptions)
+
+# Constructor with both sets of parameters:
+function MyOptions(; kws...)
+ new_options_keys = filter(k -> k in NEW_OPTIONS_KEYS, keys(kws))
+ new_options = MyNewOptions(; NamedTuple(new_options_keys .=> Tuple(kws[k] for k in new_options_keys))...)
+ sr_options_keys = filter(k -> !(k in NEW_OPTIONS_KEYS), keys(kws))
+ sr_options = SymbolicRegression.Options(; NamedTuple(sr_options_keys .=> Tuple(kws[k] for k in sr_options_keys))...)
+ return MyOptions(new_options, sr_options)
+end
+
+# Make all `Options` available while also making `new_options` accessible
+function Base.getproperty(options::MyOptions, k::Symbol)
+ if k in NEW_OPTIONS_KEYS
+ return getproperty(getfield(options, :new_options), k)
+ else
+ return getproperty(getfield(options, :sr_options), k)
+ end
+end
+
+Base.propertynames(options::MyOptions) = (NEW_OPTIONS_KEYS..., fieldnames(SymbolicRegression.Options)...)
+```
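+
+Usage then looks like this (given the definitions above):
+
+```julia
+options = MyOptions(; a=2.0, populations=20)
+
+options.a            # 2.0 – forwarded to `new_options`
+options.populations  # 20 – forwarded to `sr_options`
+```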
+
+These new abstractions provide users with greater flexibility in defining the structure and behavior of expressions, nodes, and the search process itself.
+These are also of course used as the basis for alternate behavior such as `ParametricExpression` and `TemplateExpression`.
+
+### Fundamental improvements to the underlying evolutionary algorithm
+
+As summarized above: two new mutation operators (`swap_operands` and `rotate_tree`) were introduced, both of which seem to help kick the evolution out of local optima, and new hyperparameter defaults were tuned using a Pareto front volume calculation, rather than simply the accuracy of the best expression.
+
+### Support for Zygote.jl and Enzyme.jl within the constant optimizer, specified using the `autodiff_backend` option
+
+Historically, SymbolicRegression has mostly relied on finite differences to estimate derivatives – which actually works well for small numbers of parameters. This is what Optim.jl defaults to unless you provide it with gradients.
+
+However, with the introduction of `ParametricExpression`s, full support for autodiff-within-Optim.jl was needed. v1 includes support for some parts of DifferentiationInterface.jl, allowing you to actually turn on various automatic differentiation backends when optimizing constants. For example, you can use
+
+```julia
+Options(
+ autodiff_backend=:Zygote,
+)
+```
+
+to use Zygote.jl for autodiff during BFGS optimization, or even
+
+```julia
+Options(
+ autodiff_backend=:Enzyme,
+)
+```
+
+for Enzyme.jl (though Enzyme support is highly experimental).
+
+### Changed output file handling
+
+Instead of writing to a single file like `hall_of_fame_{timestamp}.csv`, outputs are now organized in a directory structure.
+Each run gets a unique ID (containing a timestamp and random string, e.g., `20240315_120000_x7k92p`), and outputs are saved to `outputs/{run_id}/`.
+Currently, only `hall_of_fame.csv` (and `hall_of_fame.csv.bak`) is saved there, with plans to add more logs and diagnostics in this folder in future releases.
+
+The output directory can be customized via the `output_directory` option (defaults to `./outputs`).
+A custom run ID can be specified via the new `run_id` parameter passed to `equation_search` (or `SRRegressor`).
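+
+For example, a minimal sketch using these parameters (with illustrative data and operators):
+
+```julia
+using SymbolicRegression
+
+X = randn(2, 100)  # [n_features, n_rows]
+y = @. 2 * cos(X[1, :]) + X[2, :]^2
+
+options = Options(; binary_operators=(+, *), unary_operators=(cos,), output_directory="my_outputs")
+hall_of_fame = equation_search(X, y; options, run_id="my_run")
+# => results written to my_outputs/my_run/hall_of_fame.csv
+```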
+
+### Other Small Features in v1.0.0
+
+- Support for per-variable complexity, via the `complexity_of_variables` option.
+- Option to force dimensionless constants when fitting with dimensional constraints, via the `dimensionless_constants_only` option (both options are sketched after this list).
+- Default `maxsize` increased from 20 to 30.
+- Default `niterations` increased from 10 to 50, as many users seem unaware that the old default was small (and meant for testing), even in publications. I think 50 is still low, but it should be a more reasonable default for those who don't tune.
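+
+For example, a sketch of the first two options (with illustrative values):
+
+```julia
+options = Options(;
+    binary_operators=(+, *),
+    # Make x2 twice as "complex" as x1 and x3:
+    complexity_of_variables=[1, 2, 1],
+    # Only allow dimensionless constants when using dimensional constraints:
+    dimensionless_constants_only=true,
+)
+```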
+
+### Update Guide
+
+Note that most code should work without changes!
+Only if you are interacting with the return types of
+`equation_search` or `report(mach)`,
+or if you have modified any internals,
+should you need to make some changes.
+
+Also note that the "_hall of fame_" CSV file is now stored in
+a directory structure, of the form `outputs/{run_id}/hall_of_fame.csv`.
+This is to accommodate additional log files without polluting the current working directory.
+Multi-output runs are now stored in the format `.../hall_of_fame_output1.csv`, rather than
+the old format `hall_of_fame_{timestamp}.csv.out1`.
+
+The key change, as discussed [above](#changed-the-core-expression-type-from-nodet--expressiontnodet), is the move from `Node` to `Expression` as the default type for representing expressions.
+This includes the hall of fame object returned by `equation_search`, as well as the vector of
+expressions stored in `report(mach).equations` for the MLJ interface.
+If you need to interact with the internal tree structure, you can use `get_contents(expression)` (which returns the tree of an `Expression`, or the named tuple of a `TemplateExpression` – use `get_tree` to map it to a single tree format).
+
+To access other info stored in expressions, such as the operators or variable names, use `get_metadata(expression)`.
+
+This also means that expressions are now basically self-contained.
+Functions like `eval_tree_array` no longer require options as arguments (though you can still pass options to override those stored in the expression).
+This means you can also simply call the expression directly with input data (in `[n_features, n_rows]` format).
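+
+For instance, a sketch where `expression` is any `Expression` returned by a search (e.g., from `report(mach).equations`):
+
+```julia
+X = randn(2, 100)  # [n_features, n_rows]
+
+# Both of these now work without passing `options`:
+output, completed = eval_tree_array(expression, X)
+output2 = expression(X)  # equivalent call syntax
+```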
+
+Before this change, you might have written something like this:
+
+```julia
+using SymbolicRegression
+
+x1 = Node{Float64}(; feature=1)
+options = Options(; binary_operators=(+, *))
+tree = x1 * x1
+```
+
+This worked, but only because of some spooky action-at-a-distance behavior
+involving a global store of the last-used operators!
+(`Node` stores only an index to each operator, to stay lightweight.)
+
+After this change, things are much cleaner:
+
+```julia
+options = Options(; binary_operators=(+, *))
+operators = options.operators
+variable_names = ["x1"]
+x1 = Expression(Node{Float64}(; feature=1); operators, variable_names)
+
+tree = x1 * x1
+```
+
+This is now a safe and explicit construction, since `*` can look up which operators each expression uses, and infer the right indices!
+The `operators::OperatorEnum` stores the operators as tuples of functions, so this does not incur dispatch costs at runtime.
+(The `variable_names` is optional, and gets stripped during the evolution process, but is embedded when returned to the user.)
+
+We can now use this directly:
+
+```julia
+println(tree) # Uses the `variable_names`, if stored
+tree(randn(1, 50)) # Evaluates the expression using the stored operators
+```
+
+Also note that the minimum supported version of Julia is now 1.10.
+This is because Julia 1.9 and earlier have now reached end-of-life status,
+and 1.10 is the new LTS release.
+
+### Additional Notes
+
+- **Custom Loss Functions**: Continue to define these on `AbstractExpressionNode`.
+- **General Usage**: Most existing code should work with minimal changes.
+- **CI Updates**: Tests are now split into parts for faster runs, and use TestItems.jl for better scoping of test variables.
+
+**Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.24.5...v1.0.0
+
+## SymbolicRegression.jl v0.24.5
+
+### SymbolicRegression v0.24.5
+
+[Diff since v0.24.4](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.24.4...v0.24.5)
+
+**Merged pull requests:**
+
+- ci: split up test suite into multiple runners (#311) (@MilesCranmer)
+- chore(deps): bump julia-actions/cache from 1 to 2 (#315) (@dependabot[bot])
+- CompatHelper: bump compat for DynamicQuantities to 0.14, (keep existing compat) (#317) (@github-actions[bot])
+- Use DispatchDoctor.jl to wrap entire package with `@stable` (#321) (@MilesCranmer)
+- CompatHelper: bump compat for MLJModelInterface to 1, (keep existing compat) (#322) (@github-actions[bot])
+- Mark more functions as stable (#323) (@MilesCranmer)
+- Allow per-variable complexity (#324) (@MilesCranmer)
+- Refactor tests to use TestItems.jl (#325) (@MilesCranmer)
+
+## SymbolicRegression.jl v0.24.4
+
+### SymbolicRegression v0.24.4
+
+[Diff since v0.24.3](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.24.3...v0.24.4)
+
+**Merged pull requests:**
+
+- feat: use `?` for wildcard units instead of `⋅` (#307) (@MilesCranmer)
+- refactor: fix some more type instabilities (#308) (@MilesCranmer)
+- refactor: remove unused Tricks dependency (#309) (@MilesCranmer)
+- Add option to force dimensionless constants (#310) (@MilesCranmer)
+
+## SymbolicRegression.jl v0.24.3
+
+### SymbolicRegression v0.24.3
+
+[Diff since v0.24.2](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.24.2...v0.24.3)
+
+**Merged pull requests:**
+
+- 40% speedup (for default settings) via more parallelism inside workers (#304) (@MilesCranmer)
+
+**Closed issues:**
+
+- Silence warnings for Optim.jl (#255)
+
+## SymbolicRegression.jl v0.24.2
+
+### SymbolicRegression v0.24.2
+
+[Diff since v0.24.1](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.24.1...v0.24.2)
+
+**Merged pull requests:**
+
+- Bump julia-actions/setup-julia from 1 to 2 (#300) (@dependabot[bot])
+- [pre-commit.ci] pre-commit autoupdate (#301) (@pre-commit-ci[bot])
+- A small update on examples.md for 1-based indexing (#302) (@liuyxpp)
+- Fixes for Julia 1.11 (#303) (@MilesCranmer)
+
+**Closed issues:**
+
+- API Overhaul (#187)
+- [Feature]: Training on high dimensions X (#299)
+
+## SymbolicRegression.jl v0.24.1
+
+### What's Changed
+
+- CompatHelper: bump compat for MLJModelInterface to 1.9, (keep existing compat) by @github-actions in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/295
+- CompatHelper: bump compat for ProgressBars to 1, (keep existing compat) by @github-actions in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/294
+- Ensure we load ClusterManagers.jl on workers by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/297
+- Move test dependencies to test folder by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/298
+
+**Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.24.0...v0.24.1
+
+## SymbolicRegression.jl v0.24.0
+
+### What's Changed
+
+- Experimental support for program synthesis / graph-like expressions instead of trees (https://github.com/MilesCranmer/SymbolicRegression.jl/pull/271)
+ - **BREAKING**: many types now have a third type parameter, declaring the type of node. For example, `PopMember{T,L}` is now `PopMember{T,L,N}` for `N` the type of expression.
+  - Can now specify a `node_type` when creating `Options`. This `node_type <: AbstractExpressionNode` can be a `GraphNode`, which will result in expressions that can share nodes – and therefore have a lower complexity.
+ - Two new mutations: `form_connection` and `break_connection` – which control the merging and breaking of shared nodes in expressions. These are experimental.
+- **BREAKING**: The `Dataset` struct has had many of its field declared immutable (for memory safety). If you had relied on the mutability of the struct to set parameters after initializing it, you will need to modify your code.
+- **BREAKING**: LoopVectorization.jl moved to a package extension. Need to install it separately (https://github.com/MilesCranmer/SymbolicRegression.jl/pull/287).
+- **DEPRECATED**: Prefer the new keyword-based constructors for nodes:
+
+```julia
+Node{T}(feature=...) # leaf referencing a particular feature column
+Node{T}(val=...) # constant value leaf
+Node{T}(op=1, l=x1) # unary node, using the 1st unary operator
+Node{T}(op=1, l=x1, r=1.5) # binary node, using the 1st binary operator
+```
+
+rather than the previous constructors `Node(op, l, r)` and `Node(T; val=...)` (though those will still work; just with a `depwarn`).
+
+- Bumper.jl support added. Passing `bumper=true` to `Options()` will result in using bump-allocation for evaluation which can get speeds equivalent to LoopVectorization and sometimes even better due to better management of allocations. (https://github.com/MilesCranmer/SymbolicRegression.jl/pull/287)
+- Upgraded Optim.jl to 1.9.
+- Upgraded DynamicQuantities to 0.13.
+- Upgraded DynamicExpressions to 0.16.
+- The main search loop has been greatly refactored for readability and improved type inference. It now looks like this (down from a monolithic ~1000-line function):
+
+```julia
+function _equation_search(
+ datasets::Vector{D}, ropt::RuntimeOptions, options::Options, saved_state
+) where {D<:Dataset}
+ _validate_options(datasets, ropt, options)
+ state = _create_workers(datasets, ropt, options)
+ _initialize_search!(state, datasets, ropt, options, saved_state)
+ _warmup_search!(state, datasets, ropt, options)
+ _main_search_loop!(state, datasets, ropt, options)
+ _tear_down!(state, ropt, options)
+ return _format_output(state, ropt)
+end
+```
+
+**Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.23.3...v0.24.0
+
+## SymbolicRegression.jl v0.23.3
+
+### SymbolicRegression v0.23.3
+
+[Diff since v0.23.2](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.23.2...v0.23.3)
+
+**Merged pull requests:**
+
+- Bump peter-evans/create-or-update-comment from 3 to 4 (#283) (@dependabot[bot])
+- Bump peter-evans/find-comment from 2 to 3 (#284) (@dependabot[bot])
+- Bump peter-evans/create-pull-request from 5 to 6 (#286) (@dependabot[bot])
+
+## SymbolicRegression.jl v0.23.2
+
+### SymbolicRegression v0.23.2
+
+[Diff since v0.23.1](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.23.1...v0.23.2)
+
+**Merged pull requests:**
+
+- Formatting overhaul (#278) (@MilesCranmer)
+- Avoid julia-formatter on pre-commit.ci (#279) (@MilesCranmer)
+- Make it easier to select expression from Pareto front for evaluation (#289) (@MilesCranmer)
+
+**Closed issues:**
+
+- Garbage collection too passive on worker processes (#237)
+- How can I set the maximum number of nests? (#285)
+
+## SymbolicRegression.jl v0.23.1
+
+### What's Changed
+
+- Implement swap operands mutation for binary operators by @foxtran in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/276
+
+### New Contributors
+
+- @foxtran made their first contribution in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/276
+
+**Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.23.0...v0.23.1
+
+## SymbolicRegression.jl v0.23.0
+
+### SymbolicRegression v0.23.0
+
+[Diff since v0.22.5](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.22.5...v0.23.0)
+
+**Merged pull requests:**
+
+- Automatically set heap size hint on workers (#270) (@MilesCranmer)
+
+**Closed issues:**
+
+- How do I set up a basis function consisting of three different inputs x, y, z? (#268)
+
+## SymbolicRegression.jl v0.22.5
+
+### SymbolicRegression v0.22.5
+
+[Diff since v0.22.4](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.22.4...v0.22.5)
+
+**Merged pull requests:**
+
+- CompatHelper: bump compat for DynamicQuantities to 0.7, (keep existing compat) (#259) (@github-actions[bot])
+- Create `cond` operator (#260) (@MilesCranmer)
+- Add `[compat]` entry for Documenter (#261) (@MilesCranmer)
+- CompatHelper: bump compat for DynamicQuantities to 0.10 (#264) (@github-actions[bot])
+
+## SymbolicRegression.jl v0.22.4
+
+### SymbolicRegression v0.22.4
+
+[Diff since v0.22.3](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.22.3...v0.22.4)
+
+**Merged pull requests:**
+
+- Hotfix for breaking change in Optim.jl (#256) (@MilesCranmer)
+- Fix worldage issues by avoiding `static_hasmethod` when not needed (#258) (@MilesCranmer)
+
+## SymbolicRegression.jl v0.22.3
+
+### What's Changed
+
+- CompatHelper: bump compat for DynamicExpressions to 0.13, (keep existing compat) by @github-actions in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/250
+- Fix type stability of deterministic mode by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/251
+- Faster random sampling of nodes by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/252
+- Faster copying of `MutationWeights` by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/253
+- Hotfix for breaking change in Optim.jl by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/256
+
+**Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.22.2...v0.22.3
+
+## SymbolicRegression.jl v0.22.2
+
+### SymbolicRegression v0.22.2
+
+[Diff since v0.22.1](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.22.1...v0.22.2)
+
+**Merged pull requests:**
+
+- Expand aqua test suite (#246) (@MilesCranmer)
+- Return more descriptive errors for poorly defined operators (#247) (@MilesCranmer)
+
+## SymbolicRegression.jl v0.22.1
+
+### SymbolicRegression v0.22.1
+
+[Diff since v0.22.0](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.22.0...v0.22.1)
+
+## SymbolicRegression.jl v0.22.0
+
+### What's Changed
+
+- (**Algorithm modification**) Evaluate on fixed batch when building per-population hall of fame in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/243
+ - This only affects searches that use `batching=true`. It results in improved searches on large datasets, as the "winning expression" is not biased towards an expression that landed on a lucky batch.
+ - Note that this only occurs within an iteration. Evaluation on the entire dataset still happens at the end of an iteration and those loss measurements are used for absolute comparison between expressions.
+- (**Algorithm modification**) Deprecates the `fast_cycle` feature in #243. Use of this parameter will have no effect.
+  - It was removed to ease the maintenance burden and because it no longer has a use. The feature was created early in development as a way to get parallelism within a population; it is no longer useful, as you can parallelize across populations.
+- Add Aqua.jl to test suite in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/245
+- CompatHelper: bump compat for DynamicExpressions to 0.12, (keep existing compat) in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/242
+  - Avoids method invalidations when using operators to construct expressions manually, by modifying a global constant mapping of operator => index rather than `@eval`-ing new operators.
+  - This only matters if you were using operators to build trees, like `x1 + x2`. All internal search code uses `Node()` explicitly to build expressions, so it did not rely on method invalidation at any point.
+- Renames some parameters in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/234
+ - `npop` => `population_size`
+ - `npopulations` => `populations`
+ - This is just to match PySR's API. Also note that the deprecated parameters will still work, and there will not be a warning unless you are running with `--depwarn=yes`.
+- Ensure that `predict` uses units if trained with them in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/244
+ - If you train on a dataset that has physical units, this ensures that `MLJ.predict` will output predictions in the same units. Before this change, `MLJ.predict` would return numerical arrays with no units.
+
+**Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.21.5...v0.22.0
+
+## SymbolicRegression.jl v0.21.5
+
+### What's Changed
+
+- Allow custom display variable names by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/240
+
+**Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.21.4...v0.21.5
+
+## SymbolicRegression.jl v0.21.4
+
+### SymbolicRegression v0.21.4
+
+[Diff since v0.21.3](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.21.3...v0.21.4)
+
+**Closed issues:**
+
+- [Cleanup] Better implementation of batching (#88)
+
+**Merged pull requests:**
+
+- CompatHelper: bump compat for LossFunctions to 0.11, (keep existing compat) (#238) (@github-actions[bot])
+- Enable compatibility with MLJTuning.jl (#239) (@MilesCranmer)
+
+## SymbolicRegression.jl v0.21.3
+
+### What's Changed
+
+- Batching inside optimization loop + batching support for custom objectives by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/235
+
+**Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.21.2...v0.21.3
+
+## SymbolicRegression.jl v0.21.2
+
+### What's Changed
+
+- Allow empty string units (==1) by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/233
+
+**Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.21.1...v0.21.2
+
+## SymbolicRegression.jl v0.21.1
+
+### What's Changed
+
+- Update DynamicExpressions.jl version by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/232
+ - Makes Zygote.jl an extension
+
+**Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.21.0...v0.21.1
+
+## SymbolicRegression.jl v0.21.0
+
+### What's Changed
+
+- https://github.com/MilesCranmer/SymbolicRegression.jl/pull/228 and https://github.com/MilesCranmer/SymbolicRegression.jl/pull/230 and https://github.com/MilesCranmer/SymbolicRegression.jl/pull/231
+ - **Dimensional analysis** (#228)
+ - Allows you to (softly) constrain discovered expressions to those that respect physical dimensions
+ - Pass vectors of DynamicQuantities.jl `Quantity` type to the MLJ interface.
+ - OR, specify `X_units`, `y_units` to low-level `equation_search`.
+ - **Printing improvements** (#228)
+ - By default, only 5 significant digits are now printed, rather than the entire float. You can change this with the `print_precision` option.
+ - In the default printed equations, `x₁` is used rather than `x1`.
+ - `y =` is printed at the start (or `y₁ =` for multi-output). With units this becomes, for example, `y[kg] =`.
+ - **Misc**
+ - Easier to convert from MLJ interface to SymbolicUtils (via `node_to_symbolic(::Node, ::AbstractSRRegressor)`) (#228)
+ - Improved precompilation (#228)
+ - Various performance and type stability improvements (#228)
+ - Inlined the recording option to speedup compilation (#230)
+ - Updated tutorials to use MLJ rather than low-level interface (#228)
+ - Moved JSON3.jl to extension (#231)
+ - Use PackageExtensionsCompat.jl over Requires.jl (#231)
+ - Require LossFunctions.jl to be 0.10 (#231)
+
+**Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.20.0...v0.21.0
+
+## SymbolicRegression.jl v0.20.0
+
+### SymbolicRegression v0.20.0
+
+[Diff since v0.19.1](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.19.1...v0.20.0)
+
+**Closed issues:**
+
+- [Feature]: MLJ integration (#225)
+
+**Merged pull requests:**
+
+- MLJ Integration (#226) (@MilesCranmer, @OkonSamuel)
+
+## SymbolicRegression.jl v0.19.1
+
+### SymbolicRegression v0.19.1
+
+[Diff since v0.19.0](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.19.0...v0.19.1)
+
+**Merged pull requests:**
+
+- CompatHelper: bump compat for StatsBase to 0.34, (keep existing compat) (#202) (@github-actions[bot])
+- (Soft deprecation) change `varMap` to `variable_names` (#219) (@MilesCranmer)
+- (Soft deprecation) rename `EquationSearch` to `equation_search` (#222) (@MilesCranmer)
+- Fix equation splitting for unicode variables (#223) (@MilesCranmer)
+
+## SymbolicRegression.jl v0.19.0
+
+### What's Changed
+
+- Time to load improved by 40% with the following changes in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/215
+ - Moved SymbolicUtils.jl to extension/Requires.jl
+  - Removed StaticArrays.jl as a dependency and implemented a tiny version of `MVector`
+ - Removed `@generated` functions
+
+**Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.18.0...v0.19.0
+
+## SymbolicRegression.jl v0.18.0
+
+### SymbolicRegression v0.18.0
+
+[Diff since v0.17.1](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.17.1...v0.18.0)
+
+**Merged pull requests:**
+
+- Overload ^ if user passes explicitly (#201) (@MilesCranmer)
+- Upgrade DynamicExpressions to 0.8; LossFunctions to 0.10 (#206) (@github-actions[bot])
+- Show expressions evaluated per second (#209) (@MilesCranmer)
+- Cache complexity of expressions whenever possible (#210) (@MilesCranmer)
+
+## SymbolicRegression.jl v0.17.1
+
+### SymbolicRegression v0.17.1
+
+[Diff since v0.17.0](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.17.0...v0.17.1)
+
+**Merged pull requests:**
+
+- Faster custom losses (#197) (@MilesCranmer)
+- Migrate from SnoopPrecompile to PrecompileTools (#198) (@timholy)
+
+## SymbolicRegression.jl v0.17.0
+
+### SymbolicRegression v0.17.0
+
+[Diff since v0.16.3](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.16.3...v0.17.0)
+
+**Closed issues:**
+
+- troubles in pysr.install() (#196)
+
+**Merged pull requests:**
+
+- Multiple refactors: arbitrary data in `Dataset`, separate mutation weight conditioning, fix data races, cleaner API (#190) (@MilesCranmer)
+- CompatHelper: bump compat for DynamicExpressions to 0.6, (keep existing compat) (#194) (@github-actions[bot])
+
+## SymbolicRegression.jl v0.16.3
+
+### SymbolicRegression v0.16.3
+
+[Diff since v0.16.2](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.16.2...v0.16.3)
+
+**Merged pull requests:**
+
+- CompatHelper: bump compat for SymbolicUtils to 1, (keep existing compat) (#168) (@github-actions[bot])
+
+## SymbolicRegression.jl v0.16.2
+
+### What's Changed
+
+- Turn off simplification when constraints given by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/189
+
+**Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.16.1...v0.16.2
+
+## SymbolicRegression.jl v0.16.1
+
+**Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.16.0...v0.16.1
+
+## SymbolicRegression.jl v0.16.0
+
+### SymbolicRegression v0.16.0
+
+[Diff since v0.15.3](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.15.3...v0.16.0)
+
+**Closed issues:**
+
+- Partially fixed trees (#166)
+- Settings of `addprocs` (#180)
+- Equation printout should split into multiple lines (#182)
+
+**Merged pull requests:**
+
+- Force safe closing of threads (#175) (@MilesCranmer)
+- Abstract number support (#183) (@MilesCranmer)
+- Include datetime in default filename (#185) (@MilesCranmer)
+
+## SymbolicRegression.jl v0.15.3
+
+### What's Changed
+
+- Re-compute losses for warm start by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/177
+
+**Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.15.2...v0.15.3
+
+## SymbolicRegression.jl v0.15.2
+
+### What's Changed
+
+- Include depth check in `check_constraints` by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/172
+- Fix data race in state saving by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/173
+
+**Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.15.1...v0.15.2
+
+## SymbolicRegression.jl v0.15.1
+
+### What's Changed
+
+- Fix bug in constraint checking by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/171
+
+**Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.15.0...v0.15.1
+
+## SymbolicRegression.jl v0.15.0
+
+### What's Changed
+
+- Fully-customizable training objectives by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/143
+- Safely catch non-readable stdin stream by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/169
+
+**Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.14.5...v0.15.0
+
+## SymbolicRegression.jl v0.14.5
+
+### SymbolicRegression v0.14.5
+
+[Diff since v0.14.4](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.14.4...v0.14.5)
+
+**Closed issues:**
+
+- Large test output (#159)
+
+**Merged pull requests:**
+
+- Quiet progress bar during CI (#160) (@MilesCranmer)
+- Proper SnoopCompilation (#161) (@MilesCranmer)
+
+## SymbolicRegression.jl v0.14.4
+
+### SymbolicRegression v0.14.4
+
+[Diff since v0.14.3](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.14.3...v0.14.4)
+
+**Merged pull requests:**
+
+- Refactor monitoring of resources (#158) (@MilesCranmer)
+
+## SymbolicRegression.jl v0.14.3
+
+### SymbolicRegression v0.14.3
+
+[Diff since v0.14.2](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.14.2...v0.14.3)
+
+**Merged pull requests:**
+
+- Turn off safe operators for turbo=true (#156) (@MilesCranmer)
+- Use `ProgressBars.jl` instead of copying (#157) (@MilesCranmer)
+
+## SymbolicRegression.jl v0.14.2
+
+### SymbolicRegression v0.14.2
+
+[Diff since v0.14.1](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.14.1...v0.14.2)
+
+## SymbolicRegression.jl v0.14.1
+
+### SymbolicRegression v0.14.1
+
+[Diff since v0.14.0](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.14.0...v0.14.1)
+
+**Merged pull requests:**
+
+- Do optimizations as a low-probability mutation (#154) (@MilesCranmer)
+
+## SymbolicRegression.jl v0.14.0
+
+### SymbolicRegression v0.14.0
+
+[Diff since v0.13.3](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.13.3...v0.14.0)
+
+**Merged pull requests:**
+
+- Add `@extend_operators` from DynamicExpressions.jl v0.4.0 (#153) (@MilesCranmer)
+
+## SymbolicRegression.jl v0.13.3
+
+### SymbolicRegression v0.13.3
+
+[Diff since v0.13.1](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.13.1...v0.13.3)
+
+**Merged pull requests:**
+
+- 30% speed up by using LoopVectorization in DynamicExpressions.jl (#151) (@MilesCranmer)
+
+## SymbolicRegression.jl v0.13.2
+
+- Allow strings to be passed for the `parallelism` argument of EquationSearch (e.g., `"multithreading"` instead of `:multithreading`). This is to allow compatibility with PyJulia calls, which can't pass symbols.
+
+**Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.13.1...v0.13.2
+
+## SymbolicRegression.jl v0.13.1
+
+### SymbolicRegression v0.13.1
+
+[Diff since v0.13.0](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.13.0...v0.13.1)
+
+**Merged pull requests:**
+
+- Refactor mutation probabilities (#140) (@MilesCranmer)
+
+## SymbolicRegression.jl v0.13.0
+
+### SymbolicRegression v0.13.0
+
+[Diff since v0.12.6](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.12.6...v0.13.0)
+
+**Merged pull requests:**
+
+- Split codebase in two: DynamicExpressions.jl and SymbolicRegression.jl (#147) (@MilesCranmer)
+
+## SymbolicRegression.jl v0.12.6
+
+### SymbolicRegression v0.12.6
+
+[Diff since v0.12.5](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.12.5...v0.12.6)
+
+**Closed issues:**
+
+- [Feature] Integration of Existing Knowledge (#139)
+- Search fidelity is much worse after v0.12.3 (#148)
+
+**Merged pull requests:**
+
+- Fix search performance problem #148 (#149) (@MilesCranmer)
+
+## SymbolicRegression.jl v0.12.5
+
+### SymbolicRegression v0.12.5
+
+[Diff since v0.12.4](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.12.4...v0.12.5)
+
+## SymbolicRegression.jl v0.12.4
+
+### SymbolicRegression v0.12.4
+
+[Diff since v0.12.3](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.12.3...v0.12.4)
+
+**Merged pull requests:**
+
+- Create logo (#145) (@MilesCranmer)
+
+## SymbolicRegression.jl v0.12.3
+
+### SymbolicRegression v0.12.3
+
+[Diff since v0.12.2](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.12.2...v0.12.3)
+
+**Merged pull requests:**
+
+- Even faster evaluation (#144) (@MilesCranmer)
+
+## SymbolicRegression.jl v0.12.2
+
+### SymbolicRegression v0.12.2
+
+[Diff since v0.12.1](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.12.1...v0.12.2)
+
+**Closed issues:**
+
+- How to fix a number of variables in predicted equations (#130)
+
+**Merged pull requests:**
+
+- Fast evaluation for constant trees (#129) (@MilesCranmer)
+
+## SymbolicRegression.jl v0.12.1
+
+### SymbolicRegression v0.12.1
+
+[Diff since v0.12.0](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.12.0...v0.12.1)
+
+## SymbolicRegression.jl v0.12.0
+
+### What's Changed
+
+- Use functions returning NaN on branch cuts instead of abs (issue #109) by @johanbluecreek in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/123
+  - By returning NaN, an expression will have infinite loss – this makes the expression search simply avoid expressions that hit out-of-domain errors, rather than using `abs` everywhere, which results in fundamentally different functional forms.
+- Generalize `Node{T}` type to non-floats by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/122
+ - Will eventually enable integer-only expression searches
+
+**Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.11.1...v0.12.0
+
+## SymbolicRegression.jl v0.11.1
+
+### What's Changed
+
+- Generalize expressions to have arbitrary constant types by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/119
+- Optimizer options by @johanbluecreek in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/121
+- Fix recorder when `Inf` appears as loss for expression
+- Fix normalization when dataset has zero variance: https://github.com/MilesCranmer/SymbolicRegression.jl/commit/85f4909e8156ba8ff6cf89122371901a13df5688
+- Set default parsimony to 0.0
+
+### New Contributors
+
+- @johanbluecreek made their first contribution in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/121
+
+**Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.10.2...v0.11.1
+
+## SymbolicRegression.jl v0.10.2
+
+### SymbolicRegression v0.10.2
+
+[Diff since v0.9.7](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.9.7...v0.10.2)
+
+**Merged pull requests:**
+
+- Update losses.md (#114) (@pitmonticone)
+- Set `timeout-minutes` for CI (#116) (@rikhuijzer)
+
+## SymbolicRegression.jl v0.9.7
+
+### SymbolicRegression v0.9.7
+
+[Diff since v0.9.6](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.9.6...v0.9.7)
+
+## SymbolicRegression.jl v0.9.6
+
+### SymbolicRegression v0.9.6
+
+[Diff since v0.9.5](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.9.5...v0.9.6)
+
+## SymbolicRegression.jl v0.9.5
+
+### What's Changed
+
+- Add deterministic option in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/108
+- Fix issue with infinite while loop due to numerical precision
+
+**Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.9.3...v0.9.5
+
+## SymbolicRegression.jl v0.9.3
+
+### SymbolicRegression v0.9.3
+
+[Diff since v0.9.2](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.9.2...v0.9.3)
+
+**Merged pull requests:**
+
+- CompatHelper: bump compat for LossFunctions to 0.8, (keep existing compat) (#106) (@github-actions[bot])
+
+## SymbolicRegression.jl v0.9.2
+
+### SymbolicRegression v0.9.2
+
+[Diff since v0.9.0](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.9.0...v0.9.2)
+
+**Closed issues:**
+
+- Q : recording # of function calls (#74)
+- Mangled name from @FromFile displayed in docs (#78)
+- Consistent snake_case vs CamelCase (#85)
+
+**Merged pull requests:**
+
+- Apply Blue formatting + change all internal methods to snake_case (#100) (@MilesCranmer)
+- Limiting max evaluations (#104) (@MilesCranmer)
+- Custom complexities of operators, variables, and constants (#105) (@MilesCranmer)
+
+## SymbolicRegression.jl v0.9.0
+
+### SymbolicRegression v0.9.0
+
+[Diff since v0.8.7](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.8.7...v0.9.0)
+
+**Closed issues:**
+
+- Update SymbolicUtils (#98)
+
+**Merged pull requests:**
+
+- Bump SymbolicUtils.jl to 0.19 (#84) (@ChrisRackauckas)
+
+## SymbolicRegression.jl v0.8.7
+
+### SymbolicRegression v0.8.7
+
+[Diff since v0.8.6](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.8.6...v0.8.7)
+
+## SymbolicRegression.jl v0.8.6
+
+### SymbolicRegression v0.8.6
+
+[Diff since v0.8.5](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.8.5...v0.8.6)
+
+**Merged pull requests:**
+
+- Switch from FromFile.jl to traditional module system (#95) (@MilesCranmer)
+- Add constraints on the number of times operators can be nested (#96) (@MilesCranmer)
+
+## SymbolicRegression.jl v0.8.5
+
+### SymbolicRegression v0.8.5
+
+[Diff since v0.8.3](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.8.3...v0.8.5)
+
+**Closed issues:**
+
+- [CLEANUP] Default settings (#72)
+- forcing variables to regression (#87)
+
+**Merged pull requests:**
+
+- Autodiff for equations (#39) (@kazewong)
+- fix worker connection timeout error (#91) (@CharFox1)
+- Automatic multi-node compute setup by passing custom `addprocs` (#94) (@MilesCranmer)
+
+## SymbolicRegression.jl v0.8.3
+
+### SymbolicRegression v0.8.3
+
+[Diff since v0.8.2](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.8.2...v0.8.3)
+
+## SymbolicRegression.jl v0.8.2
+
+### SymbolicRegression v0.8.2
+
+[Diff since v0.8.1](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.8.1...v0.8.2)
+
+**Closed issues:**
+
+- Interactive regression / printing epochs (#80)
+
+## SymbolicRegression.jl v0.8.1
+
+### SymbolicRegression v0.8.1
+
+[Diff since v0.7.13](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.7.13...v0.8.1)
+
+**Closed issues:**
+
+- [BUG] Domain errors (#71)
+- [Performance] Single evaluation results (#73)
+
+**Merged pull requests:**
+
+- Refactoring PopMember + adding adaptive parsimony to tournament (#75) (@MilesCranmer)
+- Introduce better default hyperparameters (#76) (@MilesCranmer)
+
+## SymbolicRegression.jl v0.7.13
+
+### SymbolicRegression v0.7.13
+
+[Diff since v0.7.10](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.7.10...v0.7.13)
+
+## SymbolicRegression.jl v0.7.10
+
+### SymbolicRegression v0.7.10
+
+[Diff since v0.7.9](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.7.9...v0.7.10)
+
+## SymbolicRegression.jl v0.7.9
+
+### SymbolicRegression v0.7.9
+
+[Diff since v0.7.8](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.7.8...v0.7.9)
+
+## SymbolicRegression.jl v0.7.8
+
+### SymbolicRegression v0.7.8
+
+[Diff since v0.7.7](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.7.7...v0.7.8)
+
+**Closed issues:**
+
+- Tournament selection p (#68)
+
+**Merged pull requests:**
+
+- Fix tournament samples (#70) (@MilesCranmer)
+
+## SymbolicRegression.jl v0.7.7
+
+### SymbolicRegression v0.7.7
+
+[Diff since v0.7.6](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.7.6...v0.7.7)
+
+## SymbolicRegression.jl v0.7.6
+
+### SymbolicRegression v0.7.6
+
+[Diff since v0.7.5](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.7.5...v0.7.6)
+
+**Closed issues:**
+
+- Parsimony interference in pareto frontier (#66)
+- DomainError when computing pareto curve (#67)
+
+## SymbolicRegression.jl v0.7.5
+
+### SymbolicRegression v0.7.5
+
+[Diff since v0.7.4](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.7.4...v0.7.5)
+
+## SymbolicRegression.jl v0.7.4
+
+### SymbolicRegression v0.7.4
+
+[Diff since v0.7.3](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.7.3...v0.7.4)
+
+**Closed issues:**
+
+- Base.print (#64)
+
+## SymbolicRegression.jl v0.7.3
+
+### SymbolicRegression v0.7.3
+
+[Diff since v0.7.2](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.7.2...v0.7.3)
+
+## SymbolicRegression.jl v0.7.2
+
+### SymbolicRegression v0.7.2
+
+[Diff since v0.7.1](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.7.1...v0.7.2)
+
+## SymbolicRegression.jl v0.7.1
+
+### SymbolicRegression v0.7.1
+
+[Diff since v0.7.0](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.7.0...v0.7.1)
+
+**Merged pull requests:**
+
+- CompatHelper: bump compat for SpecialFunctions to 2, (keep existing compat) (#56) (@github-actions[bot])
+
+## SymbolicRegression.jl v0.7.0
+
+### SymbolicRegression v0.7.0
+
+[Diff since v0.6.19](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.6.19...v0.7.0)
+
+**Closed issues:**
+
+- Switching from Float to UInt8 ? (#58)
+
+**Merged pull requests:**
+
+- Revert to SymbolicUtils.jl 0.6 (#60) (@MilesCranmer)
+
+## SymbolicRegression.jl v0.6.19
+
+### SymbolicRegression v0.6.19
+
+[Diff since v0.6.18](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.6.18...v0.6.19)
+
+## SymbolicRegression.jl v0.6.18
+
+### SymbolicRegression v0.6.18
+
+[Diff since v0.6.17](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.6.17...v0.6.18)
+
+## SymbolicRegression.jl v0.6.17
+
+### SymbolicRegression v0.6.17
+
+[Diff since v0.6.16](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.6.16...v0.6.17)
+
+**Closed issues:**
+
+- Can't define options as listed in Tutorial, causes Method Error. (#54)
+- Using recorder to only track specific information? (#55)
+
+## SymbolicRegression.jl v0.6.16
+
+### SymbolicRegression v0.6.16
+
+[Diff since v0.6.15](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.6.15...v0.6.16)
+
+**Merged pull requests:**
+
+- Expand compatibility to other SymbolicUtils.jl versions (#53) (@MilesCranmer)
+
+## SymbolicRegression.jl v0.6.15
+
+### SymbolicRegression v0.6.15
+
+[Diff since v0.6.14](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.6.14...v0.6.15)
+
+**Closed issues:**
+
+- Unsatisfiable requirements detected for package SymbolicUtils (#51)
+
+**Merged pull requests:**
+
+- SymbolicUtils v0.18 (#50) (@AlCap23)
+
+## SymbolicRegression.jl v0.6.14
+
+### SymbolicRegression v0.6.14
+
+[Diff since v0.6.13](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.6.13...v0.6.14)
+
+**Closed issues:**
+
+- nested task error (#43)
+- MethodError: Cannot `convert` an object of type SymbolicUtils.Term{Number, Nothing} to an object of type SymbolicUtils.Pow{Number, SymbolicUtils.Term{Number, Nothing}, Float32, Nothing} (#44)
+
+## SymbolicRegression.jl v0.6.13
+
+### SymbolicRegression v0.6.13
+
+[Diff since v0.6.12](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.6.12...v0.6.13)
+
+## SymbolicRegression.jl v0.6.12
+
+### SymbolicRegression v0.6.12
+
+[Diff since v0.6.11](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.6.11...v0.6.12)
+
+**Closed issues:**
+
+- Options.npopulations = nothing, does not detect number of cores (#38)
+
+**Merged pull requests:**
+
+- Fix index functions in SymbolicUtils (#40) (@MilesCranmer)
+
+## SymbolicRegression.jl v0.6.11
+
+### SymbolicRegression v0.6.11
+
+[Diff since v0.6.10](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.6.10...v0.6.11)
+
+**Merged pull requests:**
+
+- Updates for SymbolicUtils 0.13 (#37) (@AlCap23)
+
+## SymbolicRegression.jl v0.6.10
+
+### SymbolicRegression v0.6.10
+
+[Diff since v0.6.9](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.6.9...v0.6.10)
+
+**Closed issues:**
+
+- Saving equations throughout runtime (#33)
+
+**Merged pull requests:**
+
+- Add multithreading as alternative to distributed (#34) (@MilesCranmer)
+- Allow infinities in recorder (#36) (@cobac)
+
+## SymbolicRegression.jl v0.6.9
+
+### SymbolicRegression v0.6.9
+
+[Diff since v0.6.8](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.6.8...v0.6.9)
+
+## SymbolicRegression.jl v0.6.8
+
+### SymbolicRegression v0.6.8
+
+[Diff since v0.6.7](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.6.7...v0.6.8)
+
+## SymbolicRegression.jl v0.6.7
+
+### SymbolicRegression v0.6.7
+
+[Diff since v0.6.6](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.6.6...v0.6.7)
+
+## SymbolicRegression.jl v0.6.6
+
+### SymbolicRegression v0.6.6
+
+[Diff since v0.6.5](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.6.5...v0.6.6)
+
+## SymbolicRegression.jl v0.6.5
+
+### SymbolicRegression v0.6.5
+
+[Diff since v0.6.4](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.6.4...v0.6.5)
+
+## SymbolicRegression.jl v0.6.4
+
+### SymbolicRegression v0.6.4
+
+[Diff since v0.6.3](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.6.3...v0.6.4)
+
+## SymbolicRegression.jl v0.6.3
+
+### SymbolicRegression v0.6.3
+
+[Diff since v0.6.2](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.6.2...v0.6.3)
+
+## SymbolicRegression.jl v0.6.2
+
+### SymbolicRegression v0.6.2
+
+[Diff since v0.6.1](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.6.1...v0.6.2)
+
+**Closed issues:**
+
+- Data recorder (#27)
+- Long-running parallel jobs have small percentage of processes hang (#28)
+
+## SymbolicRegression.jl v0.6.1
+
+### SymbolicRegression v0.6.1
+
+[Diff since v0.6.0](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.6.0...v0.6.1)
+
+**Merged pull requests:**
+
+- Recorder and improved tournament selection (#29) (@MilesCranmer)
diff --git a/Project.toml b/Project.toml
index 9c5400593..ba95ee1f4 100644
--- a/Project.toml
+++ b/Project.toml
@@ -21,11 +21,12 @@ Optim = "429524aa-4258-5aef-a3af-852621145aeb"
Pkg = "44cfe95a-1eb2-52ea-b672-e2afdf69b78f"
PrecompileTools = "aea7be01-6a6a-4083-8856-8a6e6704d82a"
Printf = "de0858da-6303-5e67-8744-51eddeeeb8d7"
-ProgressBars = "49802e3a-d2f1-5c88-81d8-b72133a6f568"
+ProgressMeter = "92933f4c-e287-5a05-a399-4b506db050ca"
Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
Reexport = "189a3867-3050-52da-a836-e630ba90ab69"
SpecialFunctions = "276daf66-3868-5448-9aa4-cd146d93841b"
StatsBase = "2913bbd2-ae8a-5f71-8c99-4fb6c76f3a91"
+StyledStrings = "f489334b-da3d-4c2e-b8f0-e476e12c162b"
TOML = "fa267f1f-6049-4f14-aa54-33bafae1ed76"
[weakdeps]
@@ -40,13 +41,13 @@ SymbolicRegressionSymbolicUtilsExt = "SymbolicUtils"
[compat]
ADTypes = "^1.4.0"
-Compat = "^4.2"
+Compat = "^4.16"
ConstructionBase = "<1.5.7"
Dates = "1"
DifferentiationInterface = "0.5, 0.6"
-DispatchDoctor = "0.4"
+DispatchDoctor = "^0.4.17"
Distributed = "<0.0.1, 1"
-DynamicExpressions = "1"
+DynamicExpressions = "1.4"
DynamicQuantities = "1"
Enzyme = "0.12"
JSON3 = "1"
@@ -58,16 +59,12 @@ Optim = "~1.8, ~1.9"
Pkg = "<0.0.1, 1"
PrecompileTools = "1"
Printf = "<0.0.1, 1"
-ProgressBars = "~1.4, ~1.5"
+ProgressMeter = "1.10"
Random = "<0.0.1, 1"
Reexport = "1"
SpecialFunctions = "0.10.1, 1, 2"
StatsBase = "0.33, 0.34"
+StyledStrings = "1"
SymbolicUtils = "0.19, ^1.0.5, 2, 3"
TOML = "<0.0.1, 1"
julia = "1.10"
-
-[extras]
-Enzyme = "7da242da-08ed-463a-9acd-ee780be4f1d9"
-JSON3 = "0f8b85d8-7281-11e9-16c2-39a750bddbf1"
-SymbolicUtils = "d1185830-fcd6-423d-90d6-eec64667417b"
diff --git a/docs/Project.toml b/docs/Project.toml
index 7a01b4d6a..6399bf082 100644
--- a/docs/Project.toml
+++ b/docs/Project.toml
@@ -2,6 +2,7 @@
Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
DynamicExpressions = "a40a106e-89c9-4ca8-8020-a735e8728b6b"
Gumbo = "708ec375-b3d6-5a57-a7ce-8257bf98657a"
+Literate = "98b081ad-f1c9-55d3-8b20-4c87d4299306"
SymbolicUtils = "d1185830-fcd6-423d-90d6-eec64667417b"
[compat]
diff --git a/docs/make.jl b/docs/make.jl
index b0546821b..336f2fcb0 100644
--- a/docs/make.jl
+++ b/docs/make.jl
@@ -17,12 +17,9 @@ using SymbolicRegression:
@extend_operators
using DynamicExpressions
-DocMeta.setdocmeta!(
- SymbolicRegression, :DocTestSetup, :(using LossFunctions); recursive=true
-)
-DocMeta.setdocmeta!(
- SymbolicRegression, :DocTestSetup, :(using DynamicExpressions); recursive=true
-)
+include("utils.jl")
+process_literate_blocks("test")
+process_literate_blocks("examples")
readme = open(dirname(@__FILE__) * "/../README.md") do io
read(io, String)
@@ -88,6 +85,12 @@ open(dirname(@__FILE__) * "/src/index.md", "w") do io
write(io, index_base)
end
+DocMeta.setdocmeta!(
+ SymbolicRegression,
+ :DocTestSetup,
+ :(using LossFunctions, DynamicExpressions);
+ recursive=true,
+)
makedocs(;
sitename="SymbolicRegression.jl",
authors="Miles Cranmer",
@@ -100,7 +103,10 @@ makedocs(;
pages=[
"Contents" => "index_base.md",
"Home" => "index.md",
- "Examples" => "examples.md",
+ "Examples" => [
+ "Short Examples" => "examples.md",
+ "Template Expressions" => "examples/template_expression.md",
+ ],
"API" => "api.md",
"Losses" => "losses.md",
"Types" => "types.md",
@@ -133,9 +139,11 @@ apply_to_a_href!(html.root) do element
element.attributes["href"] = "#LossFunctions." * element.children[1].text
end
-# Then, we write the new html to the file:
+# Then, we write the new html to the file, only if it has changed:
open("docs/build/losses/index.html", "w") do io
write(io, string(html))
end
-deploydocs(; repo="github.com/MilesCranmer/SymbolicRegression.jl.git")
+if !haskey(ENV, "JL_LIVERELOAD")
+ deploydocs(; repo="github.com/MilesCranmer/SymbolicRegression.jl.git")
+end
diff --git a/docs/src/examples.md b/docs/src/examples.md
index 762970a9a..106d526a3 100644
--- a/docs/src/examples.md
+++ b/docs/src/examples.md
@@ -230,26 +230,39 @@ Note that you can also search for dimensionless units by settings
## 7. Working with Expressions
-Expressions in `SymbolicRegression.jl` are represented using the `Expression` type, which combines the raw `Node` type with an `OperatorEnum`. This allows for more flexible and powerful expression manipulation and evaluation.
-
-Here's an example:
+Expressions in `SymbolicRegression.jl` are represented using the `Expression{T,Node{T},...}` type, which bundles the raw tree together with the operators and variable names needed to evaluate and print it. Here's an example:
```julia
using SymbolicRegression
-# Define options with operators
-options = Options(; binary_operators=[+, -, *], unary_operators=[cos])
+# Define options with the operators to use
+options = Options(;
+    binary_operators=[+, -, *],
+    unary_operators=[cos],
+)
-# Create expression nodes
+# Create leaf expressions for each feature
operators = options.operators
variable_names = ["x1", "x2"]
-x1 = Expression(Node{Float64}(feature=1); operators, variable_names)
-x2 = Expression(Node{Float64}(feature=2); operators, variable_names)
+x1 = Expression(Node{Float64}(; feature=1); operators, variable_names)
+x2 = Expression(Node{Float64}(; feature=2); operators, variable_names)
-# Construct an expression using the operators from options
+# Construct and evaluate the expression
expr = x1 * cos(x2 - 3.2)
-
-# Evaluate the expression directly
X = rand(Float64, 2, 100)
output = expr(X)
```
@@ -330,3 +343,100 @@ to browse the documentation for the Python frontend
[PySR](http://astroautomata.com/PySR), which has additional documentation.
In particular, the [tuning page](http://astroautomata.com/PySR/tuning)
is useful for improving search performance.
+
+## 10. Template Expressions
+
+Template expressions allow you to define structured expressions where different parts can be constrained to use specific variables. In this example, we'll create expressions that output pairs of values.
+
+First, let's set up our basic configuration:
+
+```julia
+using SymbolicRegression
+using Random: rand
+using MLJBase: machine, fit!, report
+
+options = Options(
+ binary_operators=(+, *, /, -),
+ unary_operators=(sin, cos)
+)
+operators = options.operators
+variable_names = ["x1", "x2", "x3"]
+```
+
+Now we'll create base expressions for each variable:
+
+```julia
+x1, x2, x3 = [
+ Expression(
+ Node{Float64}(feature=i);
+ operators=operators,
+ variable_names=variable_names
+ )
+ for i in 1:3
+]
+```
+
+The key part is defining our template structure. This determines how different parts of the expression combine:
+
+```julia
+structure = TemplateStructure{(:f, :g1, :g2)}(;
+ # Define how to combine vectors of evaluated expressions
+ combine_vectors=e -> map(
+ (f, g1, g2) -> (f + g1, f + g2),
+ e.f, e.g1, e.g2
+ ),
+ # Define how to combine strings for printing
+ combine_strings=e -> "( $(e.f) + $(e.g1), $(e.f) + $(e.g2) )",
+ # Constrain which variables can be used in each part
+ variable_constraints=(; f=[1, 2], g1=[3], g2=[3])
+)
+```
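+
+To sanity-check the string combiner, you can apply the same function to plain
+strings by hand (a standalone sketch, not part of the search API):
+
+```julia
+combine = e -> "( $(e.f) + $(e.g1), $(e.f) + $(e.g2) )"
+combine((; f="x1", g1="x3", g2="x3"))  # "( x1 + x3, x1 + x3 )"
+```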
+
+Let's generate some example data:
+
+```julia
+X = rand(100, 3) .* 10
+# Create pairs of target expressions
+y = [
+ (sin(X[i, 1]) + X[i, 3]^2, sin(X[i, 1]) + X[i, 3])
+    for i in axes(X, 1)
+]
+```
+
+Now we can set up and train our model:
+
+```julia
+model = SRRegressor(;
+ binary_operators=(+, *),
+ unary_operators=(sin,),
+ maxsize=25,
+ expression_type=TemplateExpression,
+ # Pass options used to instantiate expressions
+ expression_options=(; structure),
+    # Each element of `y` is a 2-tuple of values
+ elementwise_loss=((x1, x2), (y1, y2)) -> (y1 - x1)^2 + (y2 - x2)^2
+)
+
+mach = machine(model, X, y)
+fit!(mach)
+```
+
+After training, we can examine the best expression:
+
+```julia
+r = report(mach)
+best_expr = r.equations[r.best_idx]
+
+# Access individual parts of the template expression
+f_part = get_contents(best_expr).f # Expression using x1 or x2
+g1_part = get_contents(best_expr).g1 # Expression using x3
+g2_part = get_contents(best_expr).g2 # Expression using x3
+```
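+
+To get predictions on new data, MLJ's `predict` works as usual (a hedged sketch;
+the output here should be the 2-tuples produced by `combine_vectors`):
+
+```julia
+using MLJBase: predict
+ypred = predict(mach, X)  # Vector of 2-tuples
+```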
+
+The above code demonstrates how template expressions can be used to:
+
+- Define structured expressions with multiple components
+- Constrain which variables can be used in each component
+- Create expressions that can output multiple values
+
+You can even output custom structs - see the more detailed Template Expression example!
diff --git a/docs/src/index_base.md b/docs/src/index_base.md
index 57a3c4e72..058f08767 100644
--- a/docs/src/index_base.md
+++ b/docs/src/index_base.md
@@ -1,5 +1,5 @@
# Contents
```@contents
-Pages = ["examples.md", "api.md", "types.md", "losses.md"]
+Pages = ["examples.md", "examples/template_expression.md", "api.md", "types.md", "losses.md"]
```
diff --git a/docs/src/types.md b/docs/src/types.md
index cd62389be..bf954dfac 100644
--- a/docs/src/types.md
+++ b/docs/src/types.md
@@ -62,14 +62,34 @@ These types allow you to define expressions with parameters that can be tuned to
## Template Expressions
-Template expressions are a type of expression that allows you to specify a predefined structure.
-This lets you also fit vector expressions, as the custom evaluation structure can simply return
-a vector of tuples.
+Template expressions allow you to specify predefined structures and constraints for your expressions.
+These use the new `TemplateStructure` type to define how expressions should be combined and evaluated.
```@docs
TemplateExpression
+TemplateStructure
```
+Example usage:
+
+```julia
+# Define a template structure
+structure = TemplateStructure{(:f, :g)}(;
+    combine=e -> e.f + e.g,                     # Create normal `Expression`
+    combine_vectors=e -> (e.f .+ e.g),          # Output vector
+    combine_strings=e -> "($(e.f)) + ($(e.g))", # Output string
+    variable_constraints=(; f=[1, 2], g=[3]),   # Constrain dependencies
+)
+
+# Use in options
+model = SRRegressor(;
+ expression_type=TemplateExpression,
+ expression_options=(; structure=structure)
+)
+```
+
+The `variable_constraints` field allows you to specify which variables can be used in different parts of the expression.
+
## Population
Groups of equations are given as a population, which is
diff --git a/docs/utils.jl b/docs/utils.jl
new file mode 100644
index 000000000..bcb9b3519
--- /dev/null
+++ b/docs/utils.jl
@@ -0,0 +1,94 @@
+using Literate: Literate
+
+# Function to process literate blocks in test files
+function process_literate_blocks(base_path="test")
+ test_dir = joinpath(@__DIR__, "..", base_path)
+ for file in readdir(test_dir)
+ if endswith(file, ".jl")
+ process_file(joinpath(test_dir, file))
+ end
+ end
+end
+
+function process_file(filepath)
+ content = read(filepath, String)
+ blocks = match_literate_blocks(content)
+ for (output_file, block_content) in blocks
+ process_literate_block(output_file, block_content, filepath)
+ end
+end
+
+function match_literate_blocks(content)
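+    # Captures: (1) leading indentation, (2) the target output file, (3) the block body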
+ pattern = r"^(\s*)#literate_begin\s+file=\"(.*?)\"\n(.*?)#literate_end"sm
+ matches = collect(eachmatch(pattern, content))
+ return Dict(
+ m.captures[2] => process_block_content(m.captures[1], m.captures[3]) for
+ m in matches
+ )
+end
+
+function process_block_content(indent, block_content)
+ if isempty(block_content)
+ return ""
+ end
+ indent_length = length(indent)
+ lines = split(block_content, '\n')
+ stripped_lines = [
+ if length(line) > indent_length
+ line[(indent_length + 1):end]
+ else
+ ""
+ end for line in lines
+ ]
+ return strip(join(stripped_lines, '\n'))
+end
+
+function process_literate_block(output_file, content, source_file)
+ # Create a temporary .jl file
+ temp_file = tempname() * ".jl"
+ write(temp_file, content)
+
+ # Process the temporary file with Literate.markdown
+ output_dir = joinpath(@__DIR__, "src", "examples")
+ base_name = first(splitext(basename(output_file))) # Remove any existing extension
+
+ Literate.markdown(temp_file, output_dir; name=base_name, documenter=true)
+
+ # Generate the relative path for EditURL
+ edit_path = relpath(source_file, output_dir)
+
+ # Read the generated markdown file
+ md_file = joinpath(output_dir, base_name * ".md")
+ md_content = read(md_file, String)
+
+ # Replace the existing EditURL with the correct one
+ new_content = replace(md_content, r"EditURL = .*" => "EditURL = \"$edit_path\"")
+
+ # Add a codeblock at the end with the raw julia source
+ new_content = replace(
+ new_content,
+ r"\*This page was generated using \[Literate\.jl\]\(https://github\.com/fredrikekre/Literate\.jl\)\.\*" => """
+
+        ```@raw html
+        <details>
+        <summary>Show raw source code</summary>
+        ```
+
+ ```julia
+ $(replace(content, r"```" => "\\```"))
+ ```
+
+ which uses Literate.jl to generate this page.
+
+        ```@raw html
+        </details>
+        ```
+
+ """,
+ )
+
+ # Write the updated content back to the file
+ write(md_file, new_content)
+
+ @info "Processed literate block to $md_file with EditURL set to $edit_path"
+end
diff --git a/example.jl b/example.jl
index ef70096e5..129c72f40 100644
--- a/example.jl
+++ b/example.jl
@@ -4,12 +4,10 @@ X = randn(Float32, 5, 100)
y = 2 * cos.(X[4, :]) + X[1, :] .^ 2 .- 2
options = SymbolicRegression.Options(;
- binary_operators=[+, *, /, -], unary_operators=[cos, exp], populations=20
+ binary_operators=[+, *, /, -], unary_operators=[cos, exp]
)
-hall_of_fame = equation_search(
- X, y; niterations=40, options=options, parallelism=:multithreading
-)
+hall_of_fame = equation_search(X, y; options=options, parallelism=:multithreading)
dominating = calculate_pareto_frontier(hall_of_fame)
diff --git a/examples/template_expression.jl b/examples/template_expression.jl
index ade5fc5cf..8c2465b1a 100644
--- a/examples/template_expression.jl
+++ b/examples/template_expression.jl
@@ -8,23 +8,14 @@ operators = options.operators
variable_names = (i -> "x$i").(1:3)
x1, x2, x3 = (i -> Expression(Node(Float64; feature=i); operators, variable_names)).(1:3)
-variable_mapping = (; f=[1, 2], g1=[3], g2=[3])
-
-function my_structure(nt::NamedTuple{<:Any,<:Tuple{Vararg{<:AbstractString}}})
- return "( $(nt.f) + $(nt.g1), $(nt.f) + $(nt.g2) )"
-end
-function my_structure(nt::NamedTuple{<:Any,<:Tuple{Vararg{<:AbstractVector}}})
- return map(i -> (nt.f[i] + nt.g1[i], nt.f[i] + nt.g2[i]), eachindex(nt.f))
-end
-
-st_expr = TemplateExpression(
- (; f=x1, g1=x3, g2=x3);
- structure=my_structure,
- operators,
- variable_names,
- variable_mapping,
+structure = TemplateStructure{(:f, :g1, :g2)}(;
+ combine_vectors=e -> map((f, g1, g2) -> (f + g1, f + g2), e.f, e.g1, e.g2),
+ combine_strings=e -> "( $(e.f) + $(e.g1), $(e.f) + $(e.g2) )",
+ variable_constraints=(; f=[1, 2], g1=[3], g2=[3]),
)
+st_expr = TemplateExpression((; f=x1, g1=x3, g2=x3); structure, operators, variable_names)
+
X = rand(100, 3) .* 10
# Our dataset is a vector of 2-tuples
@@ -35,7 +26,7 @@ model = SRRegressor(;
unary_operators=(sin,),
maxsize=15,
expression_type=TemplateExpression,
- expression_options=(; structure=my_structure, variable_mapping),
+ expression_options=(; structure),
# The elementwise needs to operate directly on each row of `y`:
elementwise_loss=((x1, x2), (y1, y2)) -> (y1 - x1)^2 + (y2 - x2)^2,
early_stop_condition=(loss, complexity) -> loss < 1e-5 && complexity <= 7,
diff --git a/examples/template_expression_complex.jl b/examples/template_expression_complex.jl
new file mode 100644
index 000000000..b3794e823
--- /dev/null
+++ b/examples/template_expression_complex.jl
@@ -0,0 +1,289 @@
+#! format: off
+#literate_begin file="src/examples/template_expression.md"
+#=
+# Searching with template expressions
+
+Template expressions are a powerful feature in SymbolicRegression.jl that allow you to impose structure
+on the symbolic regression search. Rather than searching for a completely free-form expression, you can
+specify a template that combines multiple sub-expressions in a prescribed way.
+
+This is particularly useful when:
+- You have domain knowledge about the functional form of your solution
+- You want to learn vector-valued expressions (e.g., force fields, velocity fields)
+- You need to enforce constraints on which variables can appear in different parts of the expression
+- You want to share sub-expressions between multiple components
+
+For example, you might know that your system follows a pattern like:
+`sin(f(x1, x2)) + g(x3)^2`
+where `f` and `g` are unknown functions you want to learn. With template expressions, you can encode
+this structure while still letting the symbolic regression search discover the optimal form of the
+sub-expressions.
+
+In this tutorial, we'll walk through a complete example of using template expressions to learn
+the components of a particle's motion under magnetic and drag forces. We'll see how to:
+
+1. Define the structure of our template
+2. Specify constraints on which variables each sub-expression can access
+3. Set up the symbolic regression search
+4. Interpret and use the results
+
+Let's get started!
+=#
+using SymbolicRegression, Random
+using MLJBase: machine, fit!, predict, report
+
+#=
+
+## The Physical Problem
+
+We'll study a charged particle moving through a magnetic field with temperature-dependent drag.
+The total force on the particle will have two components:
+
+```math
+\mathbf{F} = \mathbf{F}_\text{drag} + \mathbf{F}_\text{magnetic} = -\eta(T)\mathbf{v} + q \mathbf{v} \times \mathbf{B}(t)
+```
+where we will take ``q = 1`` for simplicity.
+
+From physics, we know:
+- The magnetic force comes from a cross product with the field: ``\mathbf{F}_\text{magnetic} = \mathbf{v} \times \mathbf{B}``
+- The drag force opposes motion, and we'll define a simple model for it: ``\mathbf{F}_\text{drag} = -\eta(T)\mathbf{v}``
+
+Now, the parts of this model we don't know:
+- The magnetic field ``\mathbf{B}(t)`` varies in time throughout the experiment, but this pattern repeats for each experiment. We want to learn the components of this field, symbolically!
+- The drag coefficient ``\eta(T)`` depends only on temperature. We also want to figure out what this is!
+
+We'll generate synthetic data from a known model and then try to rediscover these relationships,
+**only knowing the total force** on a particle for a given experiment, as well as the input variables:
+time, velocity, and temperature.
+We will do this with template expressions to encode the physical structure of the problem.
+
+Let's say we run this experiment 1000 times:
+=#
+n = 1000
+rng = Random.MersenneTwister(0);
+
+#=
+Each time we run the experiment, the temperature is a bit different:
+=#
+T = 298.15 .+ 0.5 .* rand(rng, n)
+T[1:3]
+
+#=
+We run the experiment and record the velocity at a random time
+between 0 and 10 seconds.
+=#
+t = 10 .* rand(rng, n)
+t[1:3]
+
+#=
+We introduce a particle with a random velocity, each component between -1 and 1:
+=#
+v = [ntuple(_ -> 2 * rand(rng) - 1, 3) for _ in 1:n]
+v[1:3]
+
+#=
+**Now, let's create the true unknown model.**
+
+Let's assume the magnetic field is sinusoidal with frequency 1 Hz along axes x and y,
+and decays exponentially along the z-axis:
+
+```math
+\mathbf{B}(t) = \begin{pmatrix}
+\sin(\omega t) \\
+\cos(\omega t) \\
+e^{-t/10}
+\end{pmatrix}
+\quad \text{where} \quad \omega = 2\pi
+```
+
+This gives us a rotating magnetic field in the x-y plane that weakens along z:
+=#
+omega = 2π
+B = [(sin(omega * ti), cos(omega * ti), exp(-ti / 10)) for ti in t]
+B[1:3]
+
+#=
+We assume the drag force is linear in the velocity and
+depends on the temperature with a power law:
+
+```math
+\mathbf{F}_\text{drag} = -\alpha T^{1/2} \mathbf{v}
+\quad \text{where} \quad \alpha = 10^{-5}
+```
+
+This creates a temperature-dependent damping effect:
+=#
+F_d = [-1e-5 * Ti^(1//2) .* vi for (Ti, vi) in zip(T, v)]
+F_d[1:3]
+
+#=
+Now, let's compute the true magnetic force, in 3D:
+=#
+cross((a1, a2, a3), (b1, b2, b3)) = (a2 * b3 - a3 * b2, a3 * b1 - a1 * b3, a1 * b2 - a2 * b1)
+F_mag = [cross(vi, Bi) for (vi, Bi) in zip(v, B)]
+F_mag[1:3]
+
+#=
+We then sum these to get the total force:
+=#
+F = [fd .+ fm for (fd, fm) in zip(F_d, F_mag)]
+F[1:3]
+
+#=
+This forms our dataset!
+=#
+data = (; t, v, T, F, B, F_d, F_mag)
+keys(data)
+
+#=
+Now, let's format the input variables for the regressor:
+=#
+X = (;
+ t=data.t,
+ v_x=[vi[1] for vi in data.v],
+ v_y=[vi[2] for vi in data.v],
+ v_z=[vi[3] for vi in data.v],
+ T=data.T,
+)
+keys(X)
+
+#=
+Template expressions allow us to regress directly on a struct,
+so here we can define a `Force` type:
+=#
+struct Force{T}
+ x::T
+ y::T
+ z::T
+end
+y = [Force(F...) for F in data.F]
+y[1:3]
+
+#=
+Our variable names are the keys of the input table:
+=#
+variable_names = ["t", "v_x", "v_y", "v_z", "T"]
+
+#=
+Template expressions require you to define a _structure_, which
+describes how to combine the sub-expressions into a single expression,
+how to evaluate them numerically, and how to print them.
+
+First, let's just make a function that prints the expression:
+=#
+function combine_strings(e)
+ ## e is a named tuple of strings representing each formula
+ return " ╭ 𝐁 = [ " * e.B_x * " , " * e.B_y * " , " * e.B_z * " ]\n ╰ 𝐅 = (" * e.F_d_scale * ") * 𝐯"
+ ## (Note that string interpolation will erase the colors, so use `*` instead)
+end
+
+#=
+So, this will just print the separate B and F_d expressions we've learned.
+
+Then, let's define a function that takes the numerical values
+evaluated in the TemplateExpression, and combines them into the resultant
+force vector. Inside this function, we can do whatever we want.
+=#
+function combine_vectors(e, X)
+ ## This time, e is a named tuple of *vectors*, representing the batched
+ ## evaluation of each formula.
+
+ ## First, extract the 3D velocity vectors from the input matrix:
+    v = [(X[2, i], X[3, i], X[4, i]) for i in axes(X, 2)]
+
+ ## Use this to compute the full drag force:
+ F_d = [F_d_scale_i .* vi for (F_d_scale_i, vi) in zip(e.F_d_scale, v)]
+
+ ## Collect the magnetic field components that we've learned into the vector:
+ B = [(bx, by, bz) for (bx, by, bz) in zip(e.B_x, e.B_y, e.B_z)]
+
+ ## Using this, we compute the magnetic force with a cross product:
+ F_mag = [cross(vi, Bi) for (vi, Bi) in zip(v, B)]
+
+ ## Finally, we combine the drag and magnetic forces into the total force:
+ return [Force((fd .+ fm)...) for (fd, fm) in zip(F_d, F_mag)]
+end
+
+#=
+For the functions we wish to learn, we can explicitly constrain
+which variables each of them depends on. Let's say B depends only on time,
+and the drag force scale depends only on temperature (we explicitly
+multiply the velocity in).
+=#
+variable_constraints = (; B_x=[1], B_y=[1], B_z=[1], F_d_scale=[5])
+
+#=
+Now, we can create our template expression:
+=#
+structure = TemplateStructure{(:B_x, :B_y, :B_z, :F_d_scale)}(;
+ combine_strings=combine_strings,
+ combine_vectors=combine_vectors,
+ variable_constraints=variable_constraints,
+)
+
+#=
+Let's look at an example of how this would be used
+in a TemplateExpression, for some guess at the form of
+the solution:
+=#
+options = Options(; binary_operators=(+, *, /, -), unary_operators=(sin, cos, sqrt, exp))
+## The inner operators are a `DynamicExpressions.OperatorEnum`, which is used by `Expression`:
+operators = options.operators
+t = Expression(Node{Float64}(; feature=1); operators, variable_names)
+T = Expression(Node{Float64}(; feature=5); operators, variable_names)
+B_x = B_y = B_z = 2.1 * cos(t)
+F_d_scale = 1.0 * sqrt(T)
+
+ex = TemplateExpression(
+ (; B_x, B_y, B_z, F_d_scale);
+ structure, operators, variable_names
+)
+
+#=
+So we can see that it prints the expression as we've defined it.
+
+Now, we can create a regressor that builds template expressions
+which follow this structure:
+=#
+model = SRRegressor(;
+ binary_operators=(+, -, *, /),
+ unary_operators=(sin, cos, sqrt, exp),
+ niterations=500,
+ maxsize=35,
+ expression_type=TemplateExpression,
+ expression_options=(; structure=structure),
+    ## The elementwise loss needs to operate directly on each element of `y`:
+ elementwise_loss=(F1, F2) -> (F1.x - F2.x)^2 + (F1.y - F2.y)^2 + (F1.z - F2.z)^2,
+ batching=true,
+ batch_size=30,
+);
+
+#=
+Note how we also have to define a custom `elementwise_loss`
+function. This is because our `combine_vectors` function
+returns a `Force` struct, so we need to compare it against the truth
+component-by-component!
+
+Next, we can set up our machine and fit:
+=#
+
+mach = machine(model, X, y)
+
+#=
+At this point, you would run:
+```julia
+fit!(mach)
+```
+
+which should print using your `combine_strings` function
+during the search. The final result is accessible with:
+```julia
+report(mach)
+```
+which would return a named tuple of the fitted results,
+including `.equations`: a vector of the `TemplateExpression`
+objects that dominate the Pareto front.
+=#
+#literate_end
+#! format: on
+
+fit!(mach)
diff --git a/src/Configure.jl b/src/Configure.jl
index 2b184e5cd..d8f029bfa 100644
--- a/src/Configure.jl
+++ b/src/Configure.jl
@@ -1,6 +1,6 @@
const TEST_TYPE = Float32
-function test_operator(op::F, x::T, y=nothing) where {F,T}
+function test_operator(@nospecialize(op::Function), x::T, y=nothing) where {T}
local output
try
output = y === nothing ? op(x) : op(x, y)
@@ -26,14 +26,18 @@ function test_operator(op::F, x::T, y=nothing) where {F,T}
end
return nothing
end
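+# `@nospecialize` above prevents specialization on the operator, so precompile the generic method for common numeric types: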
+precompile(Tuple{typeof(test_operator),Function,Float64,Float64})
+precompile(Tuple{typeof(test_operator),Function,Float32,Float32})
+precompile(Tuple{typeof(test_operator),Function,Float64})
+precompile(Tuple{typeof(test_operator),Function,Float32})
const TEST_INPUTS = collect(range(-100, 100; length=99))
function assert_operators_well_defined(T, options::AbstractOptions)
test_input = if T <: Complex
- (x -> convert(T, x)).(TEST_INPUTS .+ TEST_INPUTS .* im)
+ Base.Fix1(convert, T).(TEST_INPUTS .+ TEST_INPUTS .* im)
else
- (x -> convert(T, x)).(TEST_INPUTS)
+ Base.Fix1(convert, T).(TEST_INPUTS)
end
for x in test_input, y in test_input, op in options.operators.binops
test_operator(op, x, y)
@@ -54,20 +58,18 @@ function test_option_configuration(
verbosity > 0 &&
@warn "You are using multithreading mode, but only one thread is available. Try starting julia with `--threads=auto`."
end
- if any(d -> d.X_units !== nothing || d.y_units !== nothing, datasets) &&
- options.dimensional_constraint_penalty === nothing
+ if any(has_units, datasets) && options.dimensional_constraint_penalty === nothing
verbosity > 0 &&
@warn "You are using dimensional constraints, but `dimensional_constraint_penalty` was not set. The default penalty of `1000.0` will be used."
end
- for op in (options.operators.binops..., options.operators.unaops...)
- if is_anonymous_function(op)
- throw(
- AssertionError(
- "Anonymous functions can't be used as operators for SymbolicRegression.jl",
- ),
- )
- end
+ if any(is_anonymous_function, options.operators.binops) ||
+ any(is_anonymous_function, options.operators.unaops)
+ throw(
+ AssertionError(
+ "Anonymous functions can't be used as operators for SymbolicRegression.jl"
+ ),
+ )
end
assert_operators_well_defined(T, options)
@@ -80,6 +82,7 @@ function test_option_configuration(
),
)
end
+ return nothing
end
# Check for errors before they happen
@@ -205,9 +208,7 @@ function activate_env_on_workers(
end
end
-function import_module_on_workers(
- procs, filename::String, options::AbstractOptions, verbosity
-)
+function import_module_on_workers(procs, filename::String, verbosity)
loaded_modules_head_worker = [k.name for (k, _) in Base.loaded_modules]
included_as_local = "SymbolicRegression" ∉ loaded_modules_head_worker
@@ -329,7 +330,7 @@ function configure_workers(;
end
if we_created_procs
- import_module_on_workers(procs, file, options, verbosity)
+ import_module_on_workers(procs, file, verbosity)
end
move_functions_to_workers(procs, options, example_dataset, verbosity)
diff --git a/src/Core.jl b/src/Core.jl
index 2d6e73d89..6000412ce 100644
--- a/src/Core.jl
+++ b/src/Core.jl
@@ -12,7 +12,7 @@ include("Options.jl")
using .ProgramConstantsModule:
MAX_DEGREE, BATCH_DIM, FEATURE_DIM, RecordType, DATA_TYPE, LOSS_TYPE
-using .DatasetModule: Dataset, is_weighted
+using .DatasetModule: Dataset, is_weighted, has_units
using .MutationWeightsModule: AbstractMutationWeights, MutationWeights, sample_mutation
using .OptionsStructModule:
AbstractOptions,
diff --git a/src/Dataset.jl b/src/Dataset.jl
index f9e28bcc5..49a452938 100644
--- a/src/Dataset.jl
+++ b/src/Dataset.jl
@@ -131,22 +131,11 @@ function Dataset(
n = size(X, BATCH_DIM)
nfeatures = size(X, FEATURE_DIM)
- variable_names = if variable_names === nothing
- ["x$(i)" for i in 1:nfeatures]
- else
- variable_names
- end
- display_variable_names = if display_variable_names === nothing
- ["x$(subscriptify(i))" for i in 1:nfeatures]
- else
- display_variable_names
- end
-
- y_variable_name = if y_variable_name === nothing
- ("y" ∉ variable_names) ? "y" : "target"
- else
- y_variable_name
- end
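+    # `@something` returns the first non-`nothing` argument, evaluating lazily: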
+ variable_names = @something(variable_names, ["x$(i)" for i in 1:nfeatures])
+ display_variable_names = @something(
+ display_variable_names, ["x$(subscriptify(i))" for i in 1:nfeatures]
+ )
+ y_variable_name = @something(y_variable_name, ("y" ∉ variable_names) ? "y" : "target")
avg_y = if y === nothing || !(eltype(y) isa Number)
nothing
else
diff --git a/src/ExpressionBuilder.jl b/src/ExpressionBuilder.jl
index 709937ecf..d7bc5f5d6 100644
--- a/src/ExpressionBuilder.jl
+++ b/src/ExpressionBuilder.jl
@@ -5,6 +5,7 @@ This module provides functions for creating, initializing, and manipulating
module ExpressionBuilderModule
using DispatchDoctor: @unstable
+using Compat: Fix
using DynamicExpressions:
AbstractExpressionNode,
AbstractExpression,
@@ -133,20 +134,20 @@ end
pop::Population, options::AbstractOptions, dataset::Dataset{T,L}
) where {T,L}
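+    # `Fix{N}` (from Compat) fixes the N-th positional argument, so each member maps to `embed_metadata(member, options, dataset)` without a closure: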
return Population(
- map(member -> embed_metadata(member, options, dataset), pop.members)
+ map(Fix{2}(Fix{3}(embed_metadata, dataset), options), pop.members)
)
end
function embed_metadata(
hof::HallOfFame, options::AbstractOptions, dataset::Dataset{T,L}
) where {T,L}
return HallOfFame(
- map(member -> embed_metadata(member, options, dataset), hof.members), hof.exists
+ map(Fix{2}(Fix{3}(embed_metadata, dataset), options), hof.members), hof.exists
)
end
function embed_metadata(
vec::Vector{H}, options::AbstractOptions, dataset::Dataset{T,L}
) where {T,L,H<:Union{HallOfFame,Population,PopMember}}
- return map(elem -> embed_metadata(elem, options, dataset), vec)
+ return map(Fix{2}(Fix{3}(embed_metadata, dataset), options), vec)
end
end
diff --git a/src/HallOfFame.jl b/src/HallOfFame.jl
index a75b82939..d09990ad7 100644
--- a/src/HallOfFame.jl
+++ b/src/HallOfFame.jl
@@ -1,7 +1,8 @@
module HallOfFameModule
+using StyledStrings: @styled_str
using DynamicExpressions: AbstractExpression, string_tree
-using ..UtilsModule: split_string
+using ..UtilsModule: split_string, AnnotatedIOBuffer, dump_buffer
using ..CoreModule:
MAX_DEGREE, AbstractOptions, Dataset, DATA_TYPE, LOSS_TYPE, relu, create_expression
using ..ComplexityModule: compute_complexity
@@ -119,17 +120,26 @@ function calculate_pareto_frontier(hallOfFame::HallOfFame{T,L,N}) where {T,L,N}
return dominating
end
+const HEADER = let
+ join(
+ (
+ rpad(styled"{bold:{underline:Complexity}}", 10),
+ rpad(styled"{bold:{underline:Loss}}", 9),
+ rpad(styled"{bold:{underline:Score}}", 9),
+ styled"{bold:{underline:Equation}}",
+ ),
+ " ",
+ )
+end
+
function string_dominating_pareto_curve(
hallOfFame, dataset, options; width::Union{Integer,Nothing}=nothing
)
- twidth = (width === nothing) ? 100 : max(100, width::Integer)
- output = ""
- output *= "Hall of Fame:\n"
- # TODO: Get user's terminal width.
- output *= "-"^(twidth - 1) * "\n"
- output *= @sprintf(
- "%-10s %-8s %-8s %-8s\n", "Complexity", "Loss", "Score", "Equation"
- )
+ terminal_width = (width === nothing) ? 100 : max(100, width::Integer)
+ _buffer = IOBuffer()
+ buffer = AnnotatedIOBuffer(_buffer)
+ println(buffer, '─'^(terminal_width - 1))
+ println(buffer, HEADER)
formatted = format_hall_of_fame(hallOfFame, options)
for (tree, score, loss, complexity) in
@@ -148,25 +158,47 @@ function string_dominating_pareto_curve(
if dataset.y_sym_units === nothing && dataset.X_sym_units !== nothing
y_prefix *= WILDCARD_UNIT_STRING
end
- eqn_string = y_prefix * " = " * eqn_string
- base_string_length = length(@sprintf("%-10d %-8.3e %8.3e ", 1, 1.0, 1.0))
-
- dots = "..."
- equation_width = (twidth - 1) - base_string_length - length(dots)
-
- output *= @sprintf("%-10d %-8.3e %-8.3e ", complexity, loss, score)
+ prefix = y_prefix * " = "
+ eqn_string = prefix * eqn_string
+ stats_columns_string = @sprintf("%-10d %-8.3e %-8.3e ", complexity, loss, score)
+ left_cols_width = length(stats_columns_string)
+ print(buffer, stats_columns_string)
+ print(
+ buffer,
+ wrap_equation_string(
+ eqn_string, left_cols_width + length(prefix), terminal_width
+ ),
+ )
+ end
+ print(buffer, '─'^(terminal_width - 1))
+ return dump_buffer(buffer)
+end
- split_eqn = split_string(eqn_string, equation_width)
- print_pad = false
- while length(split_eqn) > 1
- cur_piece = popfirst!(split_eqn)
- output *= " "^(print_pad * base_string_length) * cur_piece * dots * "\n"
+function wrap_equation_string(eqn_string, left_cols_width, terminal_width)
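+    # Wrap to the terminal width: continuation rows are indented past the stats columns, and soft wraps are marked with "..."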
+ dots = "..."
+ equation_width = (terminal_width - 1) - left_cols_width - length(dots)
+ _buffer = IOBuffer()
+ buffer = AnnotatedIOBuffer(_buffer)
+
+ forced_split_eqn = split(eqn_string, '\n')
+ print_pad = false
+ for piece in forced_split_eqn
+ subpieces = split_string(piece, equation_width)
+ for (i, subpiece) in enumerate(subpieces)
+ # We don't need dots on the last subpiece, since it
+ # is either the last row of the entire string, or it has
+ # an explicit newline in it!
+ requires_dots = i < length(subpieces)
+ print(buffer, ' '^(print_pad * left_cols_width))
+ print(buffer, subpiece)
+ if requires_dots
+ print(buffer, dots)
+ end
+ println(buffer)
print_pad = true
end
- output *= " "^(print_pad * base_string_length) * split_eqn[1] * "\n"
end
- output *= "-"^(twidth - 1)
- return output
+ return dump_buffer(buffer)
end
function format_hall_of_fame(hof::HallOfFame{T,L}, options) where {T,L}
diff --git a/src/InterfaceDynamicExpressions.jl b/src/InterfaceDynamicExpressions.jl
index 6c8aa45fd..86f14d3be 100644
--- a/src/InterfaceDynamicExpressions.jl
+++ b/src/InterfaceDynamicExpressions.jl
@@ -1,6 +1,7 @@
module InterfaceDynamicExpressionsModule
using Printf: @sprintf
+using Compat: Fix
using DynamicExpressions:
DynamicExpressions as DE,
OperatorEnum,
@@ -199,16 +200,17 @@ Convert an equation to a string.
)
end
- vprecision = vals[options.print_precision]
if X_sym_units !== nothing || y_sym_units !== nothing
return DE.string_tree(
tree,
DE.get_operators(tree, options);
- f_variable=(feature, vname) -> string_variable(feature, vname, X_sym_units),
+ f_variable=Fix{3}(string_variable, X_sym_units),
f_constant=let
unit_placeholder =
options.dimensionless_constants_only ? "" : WILDCARD_UNIT_STRING
- (val,) -> string_constant(val, vprecision, unit_placeholder)
+ Fix{2}(
+ Fix{3}(string_constant, unit_placeholder), options.v_print_precision
+ )
end,
variable_names=display_variable_names,
kws...,
@@ -218,13 +220,12 @@ Convert an equation to a string.
tree,
DE.get_operators(tree, options);
f_variable=string_variable,
- f_constant=(val,) -> string_constant(val, vprecision, ""),
+ f_constant=Fix{2}(Fix{3}(string_constant, ""), options.v_print_precision),
variable_names=display_variable_names,
kws...,
)
end
end
-const vals = ntuple(Val, 8192)
function string_variable_raw(feature, variable_names)
if variable_names === nothing || feature > length(variable_names)
return "x" * string(feature)
diff --git a/src/LossFunctions.jl b/src/LossFunctions.jl
index 01dcca86b..637bb0fa4 100644
--- a/src/LossFunctions.jl
+++ b/src/LossFunctions.jl
@@ -140,7 +140,7 @@ function eval_loss_batched(
regularization::Bool=true,
idx=nothing,
)::L where {T<:DATA_TYPE,L<:LOSS_TYPE}
- _idx = idx === nothing ? batch_sample(dataset, options) : idx
+ _idx = @something(idx, batch_sample(dataset, options))
return eval_loss(tree, dataset, options; regularization=regularization, idx=_idx)
end
@@ -172,7 +172,7 @@ function loss_to_score(
L(0.01)
end
loss_val = loss / normalization
- size = complexity === nothing ? compute_complexity(member, options) : complexity
+ size = @something(complexity, compute_complexity(member, options))
parsimony_term = size * options.parsimony
loss_val += L(parsimony_term)
@@ -247,11 +247,8 @@ function dimensional_regularization(
) where {T<:DATA_TYPE,L<:LOSS_TYPE}
if !violates_dimensional_constraints(tree, dataset, options)
return zero(L)
- elseif options.dimensional_constraint_penalty === nothing
- return L(1000)
- else
- return L(options.dimensional_constraint_penalty::Float32)
end
+ return convert(L, something(options.dimensional_constraint_penalty, 1000))
end
end
diff --git a/src/MLJInterface.jl b/src/MLJInterface.jl
index 4d7a1d140..395837ef2 100644
--- a/src/MLJInterface.jl
+++ b/src/MLJInterface.jl
@@ -56,6 +56,7 @@ function modelexpr(model_name::Symbol)
addprocs_function::Union{Function,Nothing} = nothing
heap_size_hint_in_bytes::Union{Integer,Nothing} = nothing
runtests::Bool = true
+ run_id::Union{String,Nothing} = nothing
loss_type::L = Nothing
selection_method::Function = choose_best
dimensions_type::Type{D} = SymbolicDimensions{DEFAULT_DIM_BASE_TYPE}
@@ -173,7 +174,7 @@ function _update(m, verbosity, old_fitresult, old_cache, X, y, w, options, class
else
old_fitresult.types
end
- X_t::types.X_t, variable_names, X_units::types.X_units = get_matrix_and_info(
+ X_t::types.X_t, variable_names, display_variable_names, X_units::types.X_units = get_matrix_and_info(
X, m.dimensions_type
)
y_t::types.y_t, y_variable_names, y_units::types.y_units = format_input_for(
@@ -193,6 +194,7 @@ function _update(m, verbosity, old_fitresult, old_cache, X, y, w, options, class
niterations=m.niterations,
weights=w_t,
variable_names=variable_names,
+ display_variable_names=display_variable_names,
options=options,
parallelism=m.parallelism,
numprocs=m.numprocs,
@@ -202,6 +204,7 @@ function _update(m, verbosity, old_fitresult, old_cache, X, y, w, options, class
runtests=m.runtests,
saved_state=(old_fitresult === nothing ? nothing : old_fitresult.state),
return_state=true,
+ run_id=m.run_id,
loss_type=m.loss_type,
X_units=X_units_clean,
y_units=y_units_clean,
@@ -252,14 +255,17 @@ end
function get_matrix_and_info(X, ::Type{D}) where {D}
sch = MMI.istable(X) ? MMI.schema(X) : nothing
Xm_t = MMI.matrix(X; transpose=true)
- colnames = if sch === nothing
- [map(i -> "x$(subscriptify(i))", axes(Xm_t, 1))...]
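+    # Plain names ("x1") are used when saving outputs; subscripted names ("x₁") are only for display: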
+ colnames, display_colnames = if sch === nothing
+ (
+ ["x$(i)" for i in eachindex(axes(Xm_t, 1))],
+ ["x$(subscriptify(i))" for i in eachindex(axes(Xm_t, 1))],
+ )
else
- [string.(sch.names)...]
+ ([string(name) for name in sch.names], [string(name) for name in sch.names])
end
D_promoted = get_dimensions_type(Xm_t, D)
Xm_t_strip, X_units = unwrap_units_single(Xm_t, D_promoted)
- return Xm_t_strip, colnames, X_units
+ return Xm_t_strip, colnames, display_colnames, X_units
end
function format_input_for(::SRRegressor, y, ::Type{D}) where {D}
@@ -278,7 +284,8 @@ function format_input_for(::MultitargetSRRegressor, y, ::Type{D}) where {D}
MMI.istable(y) || (length(size(y)) == 2 && size(y, 2) > 1),
"For single-output regression, please use `SRRegressor`."
)
- return get_matrix_and_info(y, D)
+ out = get_matrix_and_info(y, D)
+ return out[1], out[2], out[4]
end
function validate_variable_names(variable_names, fitresult)
@assert(
@@ -418,7 +425,7 @@ function _predict(m::M, fitresult, Xnew, idx, classes) where {M<:AbstractSRRegre
params = full_report(m, fitresult; v_with_strings=Val(false))
prototype = MMI.istable(Xnew) ? Xnew : nothing
- Xnew_t, variable_names, X_units = get_matrix_and_info(Xnew, m.dimensions_type)
+ Xnew_t, variable_names, _, X_units = get_matrix_and_info(Xnew, m.dimensions_type)
T = promote_type(eltype(Xnew_t), fitresult.types.T)
if isempty(params.equations) || any(isempty, params.equations)
@@ -430,17 +437,17 @@ function _predict(m::M, fitresult, Xnew, idx, classes) where {M<:AbstractSRRegre
validate_variable_names(variable_names, fitresult)
validate_units(X_units_clean, fitresult.X_units)
- idx = idx === nothing ? params.best_idx : idx
+ _idx = something(idx, params.best_idx)
if M <: SRRegressor
return eval_tree_mlj(
- params.equations[idx], Xnew_t, classes, m, T, fitresult, nothing, prototype
+ params.equations[_idx], Xnew_t, classes, m, T, fitresult, nothing, prototype
)
elseif M <: MultitargetSRRegressor
outs = [
eval_tree_mlj(
- params.equations[i][idx[i]], Xnew_t, classes, m, T, fitresult, i, prototype
- ) for i in eachindex(idx, params.equations)
+ params.equations[i][_idx[i]], Xnew_t, classes, m, T, fitresult, i, prototype
+ ) for i in eachindex(_idx, params.equations)
]
out_matrix = reduce(hcat, outs)
if !fitresult.y_is_table
@@ -567,6 +574,9 @@ function tag_with_docstring(model_name::Symbol, description::String, bottom_matt
- `runtests::Bool=true`: Whether to run (quick) tests before starting the
search, to see if there will be any problems during the equation search
related to the host environment.
+ - `run_id::Union{String,Nothing}=nothing`: A unique identifier for the run.
+ This will be used to store outputs from the run in the `outputs` directory.
+ If not specified, a unique ID will be generated.
- `loss_type::Type=Nothing`: If you would like to use a different type
for the loss than for the data you passed, specify the type here.
Note that if you pass complex data `::Complex{L}`, then the loss
diff --git a/src/MutationFunctions.jl b/src/MutationFunctions.jl
index 73e0367b0..5348d2ffb 100644
--- a/src/MutationFunctions.jl
+++ b/src/MutationFunctions.jl
@@ -149,11 +149,11 @@ function append_random_op(
options::AbstractOptions,
nfeatures::Int,
rng::AbstractRNG=default_rng();
- makeNewBinOp::Union{Bool,Nothing}=nothing,
+ make_new_bin_op::Union{Bool,Nothing}=nothing,
) where {T<:DATA_TYPE}
tree, context = get_contents_for_mutation(ex, rng)
ex = with_contents_for_mutation(
- ex, append_random_op(tree, options, nfeatures, rng; makeNewBinOp), context
+ ex, append_random_op(tree, options, nfeatures, rng; make_new_bin_op), context
)
return ex
end
@@ -162,16 +162,15 @@ function append_random_op(
options::AbstractOptions,
nfeatures::Int,
rng::AbstractRNG=default_rng();
- makeNewBinOp::Union{Bool,Nothing}=nothing,
+ make_new_bin_op::Union{Bool,Nothing}=nothing,
) where {T<:DATA_TYPE}
node = rand(rng, NodeSampler(; tree, filter=t -> t.degree == 0))
- if makeNewBinOp === nothing
- choice = rand(rng)
- makeNewBinOp = choice < options.nbin / (options.nuna + options.nbin)
- end
+ _make_new_bin_op = @something(
+ make_new_bin_op, rand(rng) < options.nbin / (options.nuna + options.nbin),
+ )
- if makeNewBinOp
+ if _make_new_bin_op
newnode = constructorof(typeof(tree))(;
op=rand(rng, 1:(options.nbin)),
l=make_random_leaf(nfeatures, T, typeof(tree), rng, options),
@@ -210,10 +209,10 @@ function insert_random_op(
) where {T<:DATA_TYPE}
node = rand(rng, NodeSampler(; tree))
choice = rand(rng)
- makeNewBinOp = choice < options.nbin / (options.nuna + options.nbin)
+ make_new_bin_op = choice < options.nbin / (options.nuna + options.nbin)
left = copy(node)
- if makeNewBinOp
+ if make_new_bin_op
right = make_random_leaf(nfeatures, T, typeof(tree), rng, options)
newnode = constructorof(typeof(tree))(;
op=rand(rng, 1:(options.nbin)), l=left, r=right
@@ -246,10 +245,10 @@ function prepend_random_op(
) where {T<:DATA_TYPE}
node = tree
choice = rand(rng)
- makeNewBinOp = choice < options.nbin / (options.nuna + options.nbin)
+ make_new_bin_op = choice < options.nbin / (options.nuna + options.nbin)
left = copy(tree)
- if makeNewBinOp
+ if make_new_bin_op
right = make_random_leaf(nfeatures, T, typeof(tree), rng, options)
newnode = constructorof(typeof(tree))(;
op=rand(rng, 1:(options.nbin)), l=left, r=right
@@ -399,7 +398,7 @@ function gen_random_tree_fixed_size(
while cur_size < node_count
if cur_size == node_count - 1 # only unary operator allowed.
options.nuna == 0 && break # We will go over the requested amount, so we must break.
- tree = append_random_op(tree, options, nfeatures, rng; makeNewBinOp=false)
+ tree = append_random_op(tree, options, nfeatures, rng; make_new_bin_op=false)
else
tree = append_random_op(tree, options, nfeatures, rng)
end
diff --git a/src/MutationWeights.jl b/src/MutationWeights.jl
index 9de15af7d..5b6253cec 100644
--- a/src/MutationWeights.jl
+++ b/src/MutationWeights.jl
@@ -100,16 +100,16 @@ will be normalized to sum to 1.0 after initialization.
- [`AbstractMutationWeights`](@ref SymbolicRegression.CoreModule.MutationWeightsModule.AbstractMutationWeights): Use to define custom mutation weight types.
"""
Base.@kwdef mutable struct MutationWeights <: AbstractMutationWeights
- mutate_constant::Float64 = 0.048
- mutate_operator::Float64 = 0.47
- swap_operands::Float64 = 0.1
- rotate_tree::Float64 = 0.3
- add_node::Float64 = 0.79
- insert_node::Float64 = 5.1
- delete_node::Float64 = 1.7
- simplify::Float64 = 0.0020
- randomize::Float64 = 0.00023
- do_nothing::Float64 = 0.21
+ mutate_constant::Float64 = 0.0353
+ mutate_operator::Float64 = 3.63
+ swap_operands::Float64 = 0.00608
+ rotate_tree::Float64 = 1.42
+ add_node::Float64 = 0.0771
+ insert_node::Float64 = 2.44
+ delete_node::Float64 = 0.369
+ simplify::Float64 = 0.00148
+ randomize::Float64 = 0.00695
+ do_nothing::Float64 = 0.431
optimize::Float64 = 0.0
form_connection::Float64 = 0.5
break_connection::Float64 = 0.1
diff --git a/src/Operators.jl b/src/Operators.jl
index e7b99ea10..f99cc3bed 100644
--- a/src/Operators.jl
+++ b/src/Operators.jl
@@ -106,6 +106,15 @@ DE.get_op_name(::typeof(safe_log1p)) = "log1p"
DE.get_op_name(::typeof(safe_acosh)) = "acosh"
DE.get_op_name(::typeof(safe_sqrt)) = "sqrt"
+# Expression algebra
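+# Declare each safe operator as an alias of its Base counterpart, so that expressions built by hand with e.g. `^` resolve to `safe_pow`: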
+DE.declare_operator_alias(::typeof(safe_pow), ::Val{2}) = ^
+DE.declare_operator_alias(::typeof(safe_log), ::Val{1}) = log
+DE.declare_operator_alias(::typeof(safe_log2), ::Val{1}) = log2
+DE.declare_operator_alias(::typeof(safe_log10), ::Val{1}) = log10
+DE.declare_operator_alias(::typeof(safe_log1p), ::Val{1}) = log1p
+DE.declare_operator_alias(::typeof(safe_acosh), ::Val{1}) = acosh
+DE.declare_operator_alias(::typeof(safe_sqrt), ::Val{1}) = sqrt
+
# Deprecated operations:
@deprecate pow(x, y) safe_pow(x, y)
@deprecate pow_abs(x, y) safe_pow(x, y)
diff --git a/src/Options.jl b/src/Options.jl
index 4c84eb56e..aa247709e 100644
--- a/src/Options.jl
+++ b/src/Options.jl
@@ -2,9 +2,9 @@ module OptionsModule
using DispatchDoctor: @unstable
using Optim: Optim
-using Dates: Dates
using StatsBase: StatsBase
-using DynamicExpressions: OperatorEnum, Expression, default_node_type
+using DynamicExpressions:
+ OperatorEnum, Expression, default_node_type, AbstractExpression, AbstractExpressionNode
using ADTypes: AbstractADType, ADTypes
using LossFunctions: L2DistLoss, SupervisedLoss
using Optim: Optim
@@ -28,7 +28,7 @@ using ..OperatorsModule:
using ..MutationWeightsModule: AbstractMutationWeights, MutationWeights, mutations
import ..OptionsStructModule: Options
using ..OptionsStructModule: ComplexityMapping, operator_specialization
-using ..UtilsModule: max_ops, @save_kwargs, @ignore
+using ..UtilsModule: @save_kwargs, @ignore
"""Build constraints on operator-level complexity from a user-passed dict."""
@unstable function build_constraints(;
@@ -206,7 +206,6 @@ const deprecated_options_mapping = Base.ImmutableDict(
:mutationWeights => :mutation_weights,
:hofMigration => :hof_migration,
:shouldOptimizeConstants => :should_optimize_constants,
- :hofFile => :output_file,
:perturbationFactor => :perturbation_factor,
:batchSize => :batch_size,
:crossoverProbability => :crossover_probability,
@@ -230,7 +229,10 @@ const deprecated_options_mapping = Base.ImmutableDict(
# For static analysis tools:
@ignore const DEFAULT_OPTIONS = ()
-const OPTION_DESCRIPTIONS = """- `binary_operators`: Vector of binary operators (functions) to use.
+const OPTION_DESCRIPTIONS = """- `defaults`: What set of defaults to use for `Options`. The default,
+ `nothing`, will simply take the default options from the current version of SymbolicRegression.
+ However, you may also select the defaults from an earlier version, such as `v"0.24.5"`.
+- `binary_operators`: Vector of binary operators (functions) to use.
Each operator should be defined for two input scalars,
and one output scalar. All operators
need to be defined over the entire real line (excluding infinity - these
@@ -382,7 +384,6 @@ const OPTION_DESCRIPTIONS = """- `binary_operators`: Vector of binary operators
type, such as `:Zygote` for Zygote, `:Enzyme`, etc. Most backends will not
work, and many will never work due to incompatibilities, though support for some
is gradually being added.
-- `output_file`: What file to store equations to, as a backup.
- `perturbation_factor`: When mutating a constant, either
multiply or divide by (1+perturbation_factor)^(rand()+1).
- `probability_negate_constant`: Probability of negating a constant in the equation
@@ -399,6 +400,9 @@ const OPTION_DESCRIPTIONS = """- `binary_operators`: Vector of binary operators
not.
- `print_precision`: How many digits to print when printing
equations. By default, this is 5.
+- `output_directory`: The base directory to save output files to. Files
+ will be saved in a subdirectory according to the run ID. By default,
+ this is `./outputs`.
- `save_to_file`: Whether to save equations to a file during the search.
- `bin_constraints`: See `constraints`. This is the same, but specified for binary
operators only (for example, if you have an operator that is both a binary
@@ -439,82 +443,161 @@ https://github.com/MilesCranmer/PySR/discussions/115.
$(OPTION_DESCRIPTIONS)
"""
@unstable @save_kwargs DEFAULT_OPTIONS function Options(;
- binary_operators=Function[+, -, /, *],
- unary_operators=Function[],
- constraints=nothing,
- elementwise_loss::Union{Function,SupervisedLoss,Nothing}=nothing,
- loss_function::Union{Function,Nothing}=nothing,
- tournament_selection_n::Integer=12, #1 sampled from every tournament_selection_n per mutation
- tournament_selection_p::Real=0.86,
- topn::Integer=12, #samples to return per population
- complexity_of_operators=nothing,
- complexity_of_constants::Union{Nothing,Real}=nothing,
- complexity_of_variables::Union{Nothing,Real,AbstractVector}=nothing,
- parsimony::Real=0.0032,
- dimensional_constraint_penalty::Union{Nothing,Real}=nothing,
+ # Note: We can only `@nospecialize` on the first 32 arguments, which is why
+ # we have to declare some of these later on.
+ @nospecialize(defaults::Union{VersionNumber,Nothing} = nothing),
+ # Search options:
+ ## 1. Creating the Search Space:
+ @nospecialize(binary_operators = nothing),
+ @nospecialize(unary_operators = nothing),
+ @nospecialize(maxsize::Union{Nothing,Integer} = nothing),
+ @nospecialize(maxdepth::Union{Nothing,Integer} = nothing),
+ @nospecialize(expression_type::Type{<:AbstractExpression} = Expression),
+ @nospecialize(expression_options::NamedTuple = NamedTuple()),
+ @nospecialize(
+ node_type::Type{<:AbstractExpressionNode} = default_node_type(expression_type)
+ ),
+ ## 2. Setting the Search Size:
+ @nospecialize(populations::Union{Nothing,Integer} = nothing),
+ @nospecialize(population_size::Union{Nothing,Integer} = nothing),
+ @nospecialize(ncycles_per_iteration::Union{Nothing,Integer} = nothing),
+ ## 3. The Objective:
+ @nospecialize(elementwise_loss::Union{Function,SupervisedLoss,Nothing} = nothing),
+ @nospecialize(loss_function::Union{Function,Nothing} = nothing),
+ ### [model_selection - only used in MLJ interface]
+ @nospecialize(dimensional_constraint_penalty::Union{Nothing,Real} = nothing),
+ ### dimensionless_constants_only
+ ## 4. Working with Complexities:
+ @nospecialize(parsimony::Union{Nothing,Real} = nothing),
+ @nospecialize(constraints = nothing),
+ @nospecialize(nested_constraints = nothing),
+ @nospecialize(complexity_of_operators = nothing),
+ @nospecialize(complexity_of_constants::Union{Nothing,Real} = nothing),
+ @nospecialize(complexity_of_variables::Union{Nothing,Real,AbstractVector} = nothing),
+ @nospecialize(warmup_maxsize_by::Union{Real,Nothing} = nothing),
+ ### use_frequency
+ ### use_frequency_in_tournament
+ @nospecialize(adaptive_parsimony_scaling::Union{Real,Nothing} = nothing),
+ ### should_simplify
+ ## 5. Mutations:
+ @nospecialize(
+ mutation_weights::Union{AbstractMutationWeights,AbstractVector,NamedTuple,Nothing} =
+ nothing
+ ),
+ @nospecialize(crossover_probability::Union{Real,Nothing} = nothing),
+ @nospecialize(annealing::Union{Bool,Nothing} = nothing),
+ @nospecialize(alpha::Union{Nothing,Real} = nothing),
+ ### perturbation_factor
+ @nospecialize(probability_negate_constant::Union{Real,Nothing} = nothing),
+ ### skip_mutation_failures
+ ## 6. Tournament Selection:
+ @nospecialize(tournament_selection_n::Union{Nothing,Integer} = nothing),
+ @nospecialize(tournament_selection_p::Union{Nothing,Real} = nothing),
+ ## 7. Constant Optimization:
+ ### optimizer_algorithm
+ ### optimizer_nrestarts
+ ### optimizer_probability
+ ### optimizer_iterations
+ ### optimizer_f_calls_limit
+ ### optimizer_options
+ ### should_optimize_constants
+ ## 8. Migration between Populations:
+ ### migration
+ ### hof_migration
+ ### fraction_replaced
+ ### fraction_replaced_hof
+ ### topn
+ ## 9. Data Preprocessing:
+ ### [none]
+ ## 10. Stopping Criteria:
+ ### timeout_in_seconds
+ ### max_evals
+ @nospecialize(early_stop_condition::Union{Function,Real,Nothing} = nothing),
+ ## 11. Performance and Parallelization:
+ ### [others, passed to `equation_search`]
+ @nospecialize(batching::Union{Bool,Nothing} = nothing),
+ @nospecialize(batch_size::Union{Nothing,Integer} = nothing),
+ ### turbo
+ ### bumper
+ ### autodiff_backend
+ ## 12. Determinism:
+ ### [others, passed to `equation_search`]
+ ### deterministic
+ ### seed
+ ## 13. Monitoring:
+ ### verbosity
+ ### print_precision
+ ### progress
+ ## 14. Environment:
+ ### [none]
+ ## 15. Exporting the Results:
+ ### [others, passed to `equation_search`]
+ ### output_directory
+ ### save_to_file
+
+    # Other search options, without specialization (since Julia limits `@nospecialize` to 32 arguments!)
+ ## 1. Search Space:
+ ## 2. Setting the Search Size:
+ ## 3. The Objective:
dimensionless_constants_only::Bool=false,
- alpha::Real=0.100000,
- maxsize::Integer=20,
- maxdepth::Union{Nothing,Integer}=nothing,
- turbo::Bool=false,
- bumper::Bool=false,
- migration::Bool=true,
- hof_migration::Bool=true,
- should_simplify::Union{Nothing,Bool}=nothing,
- should_optimize_constants::Bool=true,
- output_file::Union{Nothing,AbstractString}=nothing,
- expression_type::Type=Expression,
- node_type::Type=default_node_type(expression_type),
- expression_options::NamedTuple=NamedTuple(),
- populations::Integer=15,
- perturbation_factor::Real=0.076,
- annealing::Bool=false,
- batching::Bool=false,
- batch_size::Integer=50,
- mutation_weights::Union{AbstractMutationWeights,AbstractVector,NamedTuple}=MutationWeights(),
- crossover_probability::Real=0.066,
- warmup_maxsize_by::Real=0.0,
+ ## 4. Working with Complexities:
use_frequency::Bool=true,
use_frequency_in_tournament::Bool=true,
- adaptive_parsimony_scaling::Real=20.0,
- population_size::Integer=33,
- ncycles_per_iteration::Integer=550,
- fraction_replaced::Real=0.00036,
- fraction_replaced_hof::Real=0.035,
- verbosity::Union{Integer,Nothing}=nothing,
- print_precision::Integer=5,
- save_to_file::Bool=true,
- probability_negate_constant::Real=0.01,
- seed=nothing,
- bin_constraints=nothing,
- una_constraints=nothing,
- progress::Union{Bool,Nothing}=nothing,
- terminal_width::Union{Nothing,Integer}=nothing,
+ should_simplify::Union{Nothing,Bool}=nothing,
+ ## 5. Mutations:
+ perturbation_factor::Union{Nothing,Real}=nothing,
+ skip_mutation_failures::Bool=true,
+    ## 6. Tournament Selection:
+ ## 7. Constant Optimization:
optimizer_algorithm::Union{AbstractString,Optim.AbstractOptimizer}=Optim.BFGS(;
linesearch=LineSearches.BackTracking()
),
- optimizer_nrestarts::Integer=2,
- optimizer_probability::Real=0.14,
+ optimizer_nrestarts::Int=2,
+ optimizer_probability::AbstractFloat=0.14,
optimizer_iterations::Union{Nothing,Integer}=nothing,
optimizer_f_calls_limit::Union{Nothing,Integer}=nothing,
optimizer_options::Union{Dict,NamedTuple,Optim.Options,Nothing}=nothing,
- autodiff_backend::Union{AbstractADType,Symbol,Nothing}=nothing,
- use_recorder::Bool=false,
- recorder_file::AbstractString="pysr_recorder.json",
- early_stop_condition::Union{Function,Real,Nothing}=nothing,
+ should_optimize_constants::Bool=true,
+ ## 8. Migration between Populations:
+ migration::Bool=true,
+ hof_migration::Bool=true,
+ fraction_replaced::Union{Real,Nothing}=nothing,
+ fraction_replaced_hof::Union{Real,Nothing}=nothing,
+ topn::Union{Nothing,Integer}=nothing,
+ ## 9. Data Preprocessing:
+ ## 10. Stopping Criteria:
timeout_in_seconds::Union{Nothing,Real}=nothing,
max_evals::Union{Nothing,Integer}=nothing,
- skip_mutation_failures::Bool=true,
- nested_constraints=nothing,
+ ## 11. Performance and Parallelization:
+ turbo::Bool=false,
+ bumper::Bool=false,
+ autodiff_backend::Union{AbstractADType,Symbol,Nothing}=nothing,
+ ## 12. Determinism:
deterministic::Bool=false,
- # Not search options; just construction options:
+ seed=nothing,
+ ## 13. Monitoring:
+ verbosity::Union{Integer,Nothing}=nothing,
+ print_precision::Integer=5,
+ progress::Union{Bool,Nothing}=nothing,
+ ## 14. Environment:
+ ## 15. Exporting the Results:
+ output_directory::Union{Nothing,String}=nothing,
+ save_to_file::Bool=true,
+ ## Undocumented features:
+ bin_constraints=nothing,
+ una_constraints=nothing,
+ terminal_width::Union{Nothing,Integer}=nothing,
+ use_recorder::Bool=false,
+ recorder_file::AbstractString="pysr_recorder.json",
+ ### Not search options; just construction options:
define_helper_functions::Bool=true,
- deprecated_return_state=nothing,
#########################################
# Deprecated args: ######################
+ output_file::Union{Nothing,AbstractString}=nothing,
fast_cycle::Bool=false,
npopulations::Union{Nothing,Integer}=nothing,
npop::Union{Nothing,Integer}=nothing,
+ deprecated_return_state::Union{Bool,Nothing}=nothing,
kws...,
#########################################
)
@@ -537,7 +620,6 @@ $(OPTION_DESCRIPTIONS)
#! format: off
k == :hofMigration && (hof_migration = kws[k]; true) && continue
k == :shouldOptimizeConstants && (should_optimize_constants = kws[k]; true) && continue
- k == :hofFile && (output_file = kws[k]; true) && continue
k == :perturbationFactor && (perturbation_factor = kws[k]; true) && continue
k == :batchSize && (batch_size = kws[k]; true) && continue
k == :crossoverProbability && (crossover_probability = kws[k]; true) && continue
@@ -577,7 +659,6 @@ $(OPTION_DESCRIPTIONS)
"Unknown deprecated keyword argument: $k. Please update `Options(;)` to transfer this key.",
)
end
- fast_cycle && Base.depwarn("`fast_cycle` is deprecated and has no effect.", :Options)
if npop !== nothing
Base.depwarn("`npop` is deprecated. Use `population_size` instead.", :Options)
population_size = npop
@@ -597,6 +678,9 @@ $(OPTION_DESCRIPTIONS)
Optim.BFGS(; linesearch=LineSearches.BackTracking())
end
end
+ if output_file !== nothing
+ error("`output_file` is deprecated. Use `output_directory` instead.")
+ end
if elementwise_loss === nothing
elementwise_loss = L2DistLoss()
@@ -606,6 +690,35 @@ $(OPTION_DESCRIPTIONS)
end
end
+ #################################
+ #### Supply defaults ############
+ #! format: off
+ _default_options = default_options(defaults)
+ binary_operators = something(binary_operators, _default_options.binary_operators)
+ unary_operators = something(unary_operators, _default_options.unary_operators)
+ maxsize = something(maxsize, _default_options.maxsize)
+ populations = something(populations, _default_options.populations)
+ population_size = something(population_size, _default_options.population_size)
+ ncycles_per_iteration = something(ncycles_per_iteration, _default_options.ncycles_per_iteration)
+ parsimony = something(parsimony, _default_options.parsimony)
+ warmup_maxsize_by = something(warmup_maxsize_by, _default_options.warmup_maxsize_by)
+ adaptive_parsimony_scaling = something(adaptive_parsimony_scaling, _default_options.adaptive_parsimony_scaling)
+ mutation_weights = something(mutation_weights, _default_options.mutation_weights)
+ crossover_probability = something(crossover_probability, _default_options.crossover_probability)
+ annealing = something(annealing, _default_options.annealing)
+ alpha = something(alpha, _default_options.alpha)
+ perturbation_factor = something(perturbation_factor, _default_options.perturbation_factor)
+ probability_negate_constant = something(probability_negate_constant, _default_options.probability_negate_constant)
+ tournament_selection_n = something(tournament_selection_n, _default_options.tournament_selection_n)
+ tournament_selection_p = something(tournament_selection_p, _default_options.tournament_selection_p)
+ fraction_replaced = something(fraction_replaced, _default_options.fraction_replaced)
+ fraction_replaced_hof = something(fraction_replaced_hof, _default_options.fraction_replaced_hof)
+ topn = something(topn, _default_options.topn)
+ batching = something(batching, _default_options.batching)
+ batch_size = something(batch_size, _default_options.batch_size)
+ #! format: on
+ #################################
+
if should_simplify === nothing
should_simplify = (
loss_function === nothing &&
@@ -616,22 +729,11 @@ $(OPTION_DESCRIPTIONS)
)
end
- is_testing = parse(Bool, get(ENV, "SYMBOLIC_REGRESSION_IS_TESTING", "false"))
-
- if output_file === nothing
- # "%Y-%m-%d_%H%M%S.%f"
- date_time_str = Dates.format(Dates.now(), "yyyy-mm-dd_HHMMSS.sss")
- output_file = "hall_of_fame_" * date_time_str * ".csv"
- if is_testing
- tmpdir = mktempdir()
- output_file = joinpath(tmpdir, output_file)
- end
- end
-
@assert maxsize > 3
@assert warmup_maxsize_by >= 0.0f0
- @assert length(unary_operators) <= max_ops
- @assert length(binary_operators) <= max_ops
+ @assert length(unary_operators) <= 8192
+ @assert length(binary_operators) <= 8192
+ @assert tournament_selection_n < population_size "`tournament_selection_n` must be less than `population_size`"
# Make sure nested_constraints contains functions within our operator set:
_nested_constraints = build_nested_constraints(;
@@ -696,25 +798,21 @@ $(OPTION_DESCRIPTIONS)
early_stop_condition = if typeof(early_stop_condition) <: Real
# Need to make explicit copy here for this to work:
stopping_point = Float64(early_stop_condition)
- (loss, complexity) -> loss < stopping_point
+ Base.Fix2(<, stopping_point) ∘ first ∘ tuple # Equivalent to (l, c) -> l < stopping_point
else
early_stop_condition
end
# Parse optimizer options
if !isa(optimizer_options, Optim.Options)
- optimizer_iterations = isnothing(optimizer_iterations) ? 8 : optimizer_iterations
- optimizer_f_calls_limit = if isnothing(optimizer_f_calls_limit)
- 10_000
- else
- optimizer_f_calls_limit
- end
+ optimizer_iterations = something(optimizer_iterations, 8)
+ optimizer_f_calls_limit = something(optimizer_f_calls_limit, 10_000)
extra_kws = hasfield(Optim.Options, :show_warnings) ? (; show_warnings=false) : ()
optimizer_options = Optim.Options(;
iterations=optimizer_iterations,
f_calls_limit=optimizer_f_calls_limit,
extra_kws...,
- (isnothing(optimizer_options) ? () : optimizer_options)...,
+ something(optimizer_options, ())...,
)
else
@assert optimizer_iterations === nothing && optimizer_f_calls_limit === nothing
@@ -733,6 +831,14 @@ $(OPTION_DESCRIPTIONS)
ADTypes.Auto(autodiff_backend)
end
+ _output_directory =
+ if output_directory === nothing &&
+ get(ENV, "SYMBOLIC_REGRESSION_IS_TESTING", "false") == "true"
+ mktempdir()
+ else
+ output_directory
+ end
+
options = Options{
typeof(complexity_mapping),
operator_specialization(typeof(operators), expression_type),
@@ -742,8 +848,9 @@ $(OPTION_DESCRIPTIONS)
typeof(set_mutation_weights),
turbo,
bumper,
- deprecated_return_state,
+ deprecated_return_state::Union{Bool,Nothing},
typeof(_autodiff_backend),
+ print_precision,
}(
operators,
_bin_constraints,
@@ -763,7 +870,7 @@ $(OPTION_DESCRIPTIONS)
hof_migration,
should_simplify,
should_optimize_constants,
- output_file,
+ _output_directory,
populations,
perturbation_factor,
annealing,
@@ -781,7 +888,7 @@ $(OPTION_DESCRIPTIONS)
fraction_replaced_hof,
topn,
verbosity,
- print_precision,
+ Val(print_precision),
save_to_file,
probability_negate_constant,
length(unary_operators),
@@ -815,4 +922,103 @@ $(OPTION_DESCRIPTIONS)
return options
end
+function default_options(@nospecialize(version::Union{VersionNumber,Nothing} = nothing))
+ if version isa VersionNumber && version < v"1.0.0"
+ return (;
+ # Creating the Search Space
+ binary_operators=[+, -, /, *],
+ unary_operators=Function[],
+ maxsize=20,
+ # Setting the Search Size
+ populations=15,
+ population_size=33,
+ ncycles_per_iteration=550,
+ # Working with Complexities
+ parsimony=0.0032,
+ warmup_maxsize_by=0.0,
+ adaptive_parsimony_scaling=20.0,
+ # Mutations
+ mutation_weights=MutationWeights(;
+ mutate_constant=0.048,
+ mutate_operator=0.47,
+ swap_operands=0.1,
+ rotate_tree=0.0,
+ add_node=0.79,
+ insert_node=5.1,
+ delete_node=1.7,
+ simplify=0.0020,
+ randomize=0.00023,
+ do_nothing=0.21,
+ optimize=0.0,
+ form_connection=0.5,
+ break_connection=0.1,
+ ),
+ crossover_probability=0.066,
+ annealing=false,
+ alpha=0.1,
+ perturbation_factor=0.076,
+ probability_negate_constant=0.01,
+ # Tournament Selection
+ tournament_selection_n=12,
+ tournament_selection_p=0.86,
+ # Migration between Populations
+ fraction_replaced=0.00036,
+ fraction_replaced_hof=0.035,
+ topn=12,
+ # Performance and Parallelization
+ batching=false,
+ batch_size=50,
+ )
+ else
+ return (;
+ # Creating the Search Space
+ binary_operators=Function[+, -, /, *],
+ unary_operators=Function[],
+ maxsize=30,
+ # Setting the Search Size
+ populations=31,
+ population_size=27,
+ ncycles_per_iteration=380,
+ # Working with Complexities
+ parsimony=0.0,
+ warmup_maxsize_by=0.0,
+ adaptive_parsimony_scaling=1040,
+ # Mutations
+ mutation_weights=MutationWeights(;
+ mutate_constant=0.0346,
+ mutate_operator=0.293,
+ swap_operands=0.198,
+ rotate_tree=4.26,
+ add_node=2.47,
+ insert_node=0.0112,
+ delete_node=0.870,
+ simplify=0.00209,
+ randomize=0.000502,
+ do_nothing=0.273,
+ optimize=0.0,
+ form_connection=0.5,
+ break_connection=0.1,
+ ),
+ crossover_probability=0.0259,
+ annealing=true,
+ alpha=3.17,
+ perturbation_factor=0.129,
+ probability_negate_constant=0.00743,
+ # Tournament Selection
+ tournament_selection_n=15,
+ tournament_selection_p=0.982,
+ # Migration between Populations
+ fraction_replaced=0.00036,
+ ## ^Note: the optimal value found was 0.00000425,
+ ## but I thought this was a symptom of doing the sweep on such
+ ## a small problem, so I increased it to the older value of 0.00036
+ fraction_replaced_hof=0.0614,
+ topn=12,
+ # Performance and Parallelization
+ batching=false,
+ batch_size=50,
+ )
+ end
+end
+
end
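
For changelog readers, a quick sketch of how the new `defaults` keyword interacts with `default_options` above (field values taken from this diff; they may shift in future releases):

```julia
using SymbolicRegression

# The tuned v1 defaults (equivalent to `defaults=nothing`):
new_opts = Options()
new_opts.population_size  # 27

# Pin the search hyperparameters to the pre-1.0 defaults:
old_opts = Options(; defaults=v"0.24.5")
old_opts.population_size  # 33

# Explicit keywords still override whichever defaults are selected:
custom_opts = Options(; defaults=v"0.24.5", population_size=100)
```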
diff --git a/src/OptionsStruct.jl b/src/OptionsStruct.jl
index fa8a0035b..b39dbf0b5 100644
--- a/src/OptionsStruct.jl
+++ b/src/OptionsStruct.jl
@@ -188,6 +188,7 @@ struct Options{
_bumper,
_return_state,
AD,
+ print_precision,
} <: AbstractOptions
operators::OP
bin_constraints::Vector{Tuple{Int,Int}}
@@ -207,7 +208,7 @@ struct Options{
hof_migration::Bool
should_simplify::Bool
should_optimize_constants::Bool
- output_file::String
+ output_directory::Union{String,Nothing}
populations::Int
perturbation_factor::Float32
annealing::Bool
@@ -225,7 +226,7 @@ struct Options{
fraction_replaced_hof::Float32
topn::Int
verbosity::Union{Int,Nothing}
- print_precision::Int
+ v_print_precision::Val{print_precision}
save_to_file::Bool
probability_negate_constant::Float32
nuna::Int
@@ -256,7 +257,7 @@ struct Options{
use_recorder::Bool
end
-function Base.print(io::IO, options::Options)
+function Base.print(io::IO, @nospecialize(options::Options))
return print(
io,
"Options(" *
@@ -278,21 +279,22 @@ function Base.print(io::IO, options::Options)
")",
)
end
-Base.show(io::IO, ::MIME"text/plain", options::Options) = Base.print(io, options)
+function Base.show(io::IO, ::MIME"text/plain", @nospecialize(options::Options))
+ return Base.print(io, options)
+end
specialized_options(options::AbstractOptions) = options
@unstable function specialized_options(options::Options)
- return _specialized_options(options)
+ return _specialized_options(options, options.operators)
end
-@generated function _specialized_options(options::O) where {O<:Options}
+@generated function _specialized_options(
+ options::O, operators::OP
+) where {O<:Options,OP<:AbstractOperatorEnum}
# Return an options struct with concrete operators
type_parameters = O.parameters
fields = Any[:(getfield(options, $(QuoteNode(k)))) for k in fieldnames(O)]
quote
- operators = getfield(options, :operators)
- Options{$(type_parameters[1]),typeof(operators),$(type_parameters[3:end]...)}(
- $(fields...)
- )
+ Options{$(type_parameters[1]),$(OP),$(type_parameters[3:end]...)}($(fields...))
end
end
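
The `Val(print_precision)` field above lifts the printing precision into the `Options` type parameters. A minimal sketch of the pattern (helper names here are illustrative, not the library's API):

```julia
using Printf: Printf

# Recover the integer carried in the type domain:
precision_of(::Val{P}) where {P} = P

# Formatting code can now specialize on the precision:
function format_constant(x::Real, vp::Val)
    fmt = Printf.Format("%.$(precision_of(vp))g")
    return Printf.format(fmt, x)
end

format_constant(3.14159265, Val(5))  # "3.1416"
```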
diff --git a/src/Population.jl b/src/Population.jl
index d475da168..6b9173c5c 100644
--- a/src/Population.jl
+++ b/src/Population.jl
@@ -139,7 +139,7 @@ function _best_of_sample(
scores[i] = member.score * exp(adaptive_parsimony_scaling * frequency)
end
else
- map!(member -> member.score, scores, members)
+ map!(_get_score, scores, members)
end
chosen_idx = if p == 1.0
@@ -157,6 +157,7 @@ function _best_of_sample(
end
return members[chosen_idx]
end
+_get_score(member::PopMember) = member.score
const CACHED_WEIGHTS =
let init_k = collect(0:5),
diff --git a/src/ProgressBars.jl b/src/ProgressBars.jl
index 5a1f3fe6e..1b6bc402d 100644
--- a/src/ProgressBars.jl
+++ b/src/ProgressBars.jl
@@ -1,36 +1,52 @@
module ProgressBarsModule
-using ProgressBars: ProgressBar, set_multiline_postfix
+using Compat: Fix
+using ProgressMeter: Progress, next!
+using StyledStrings: @styled_str, annotatedstring
+using ..UtilsModule: AnnotatedString
# Simple wrapper for a progress bar which stores its own state
mutable struct WrappedProgressBar
- bar::ProgressBar
- state::Union{Int,Nothing}
- cycle::Union{Int,Nothing}
-
- function WrappedProgressBar(args...; kwargs...)
- if haskey(ENV, "SYMBOLIC_REGRESSION_TEST") &&
- ENV["SYMBOLIC_REGRESSION_TEST"] == "true"
- output_stream = devnull
- return new(ProgressBar(args...; output_stream, kwargs...), nothing, nothing)
+ bar::Progress
+ postfix::Vector{Tuple{AnnotatedString,AnnotatedString}}
+
+ function WrappedProgressBar(n::Integer, niterations::Integer; kwargs...)
+ init_vector = Tuple{AnnotatedString,AnnotatedString}[]
+ kwargs = (; kwargs..., desc="Evolving for $niterations iterations...")
+ if get(ENV, "SYMBOLIC_REGRESSION_TEST", "false") == "true"
+ # For testing, create a progress bar that writes to devnull
+ output = devnull
+ return new(Progress(n; output, kwargs...), init_vector)
end
- return new(ProgressBar(args...; kwargs...), nothing, nothing)
+ return new(Progress(n; kwargs...), init_vector)
end
end
-"""Iterate a progress bar without needing to store cycle/state externally."""
+function barlen(pbar::WrappedProgressBar)::Int
+ return @something(pbar.bar.barlen, displaysize(stdout)[2])
+end
+
+"""Iterate a progress bar."""
function manually_iterate!(pbar::WrappedProgressBar)
- cur_cycle = pbar.cycle
- if cur_cycle === nothing
- pbar.cycle, pbar.state = iterate(pbar.bar)
+ width = barlen(pbar)
+ postfix = map(Fix{2}(format_for_meter, width), pbar.postfix)
+ next!(pbar.bar; showvalues=postfix, valuecolor=:none)
+ return nothing
+end
+
+function format_for_meter((k, s), width::Integer)
+ new_s = if occursin('\n', s)
+ left_margin = length(" $(string(k)): ")
+ left_padding = ' '^(width - left_margin)
+ annotatedstring(left_padding, newlines_to_spaces(s, width))
else
- pbar.cycle, pbar.state = iterate(pbar.bar, pbar.state)
+ s
end
- return nothing
+ return (k, new_s)
end
-function set_multiline_postfix!(t::WrappedProgressBar, postfix::AbstractString)
- return set_multiline_postfix(t.bar, postfix)
+function newlines_to_spaces(s::AbstractString, width::Integer)
+ return join(rpad(line, width) for line in split(s, '\n'))
end
end
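
A small illustration of the reflow helper above, which lets multi-line postfix values render as fixed-width rows under ProgressMeter.jl (width chosen for illustration):

```julia
# Each line is right-padded to the bar width, then concatenated:
newlines_to_spaces("ab\ncd", 4)  # == "ab  cd  "
```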
diff --git a/src/SearchUtils.jl b/src/SearchUtils.jl
index 23358d9dc..ed433df65 100644
--- a/src/SearchUtils.jl
+++ b/src/SearchUtils.jl
@@ -4,9 +4,12 @@ This includes: process management, stdin reading, checking for early stops."""
module SearchUtilsModule
using Printf: @printf, @sprintf
+using Dates: Dates
using Distributed: Distributed, @spawnat, Future, procs, addprocs
using StatsBase: mean
+using StyledStrings: @styled_str
using DispatchDoctor: @unstable
+using Compat: Fix
using DynamicExpressions: AbstractExpression, string_tree
using ..UtilsModule: subscriptify
@@ -15,7 +18,7 @@ using ..ComplexityModule: compute_complexity
using ..PopulationModule: Population
using ..PopMemberModule: PopMember
using ..HallOfFameModule: HallOfFame, string_dominating_pareto_curve
-using ..ProgressBarsModule: WrappedProgressBar, set_multiline_postfix!, manually_iterate!
+using ..ProgressBarsModule: WrappedProgressBar, manually_iterate!, barlen
using ..AdaptiveParsimonyModule: RunningSearchStatistics
"""
@@ -56,6 +59,7 @@ struct RuntimeOptions{PARALLELISM,DIM_OUT,RETURN_STATE} <: AbstractRuntimeOption
parallelism::Val{PARALLELISM}
dim_out::Val{DIM_OUT}
return_state::Val{RETURN_STATE}
+ run_id::String
end
@unstable @inline function Base.getproperty(
roptions::RuntimeOptions{P,D,R}, name::Symbol
@@ -77,18 +81,22 @@ end
@unstable function RuntimeOptions(;
niterations::Int=10,
nout::Int=1,
- options::AbstractOptions=Options(),
parallelism=:multithreading,
numprocs::Union{Int,Nothing}=nothing,
procs::Union{Vector{Int},Nothing}=nothing,
addprocs_function::Union{Function,Nothing}=nothing,
heap_size_hint_in_bytes::Union{Integer,Nothing}=nothing,
runtests::Bool=true,
- return_state::Union{Bool,Nothing,Val}=nothing,
+ return_state::VRS=nothing,
+ run_id::Union{String,Nothing}=nothing,
verbosity::Union{Int,Nothing}=nothing,
progress::Union{Bool,Nothing}=nothing,
v_dim_out::Val{DIM_OUT}=Val(nothing),
-) where {DIM_OUT}
+ # Defined from options
+ options_return_state::Val{ORS}=Val(nothing),
+ options_verbosity::Union{Integer,Nothing}=nothing,
+ options_progress::Union{Bool,Nothing}=nothing,
+) where {DIM_OUT,ORS,VRS<:Union{Bool,Nothing,Val}}
concurrency = if parallelism in (:multithreading, "multithreading")
:multithreading
elseif parallelism in (:multiprocessing, "multiprocessing")
@@ -102,37 +110,32 @@ end
)
:serial
end
- not_distributed = concurrency in (:multithreading, :serial)
- not_distributed &&
- procs !== nothing &&
- error(
+ if concurrency in (:multithreading, :serial)
+ numprocs !== nothing && error(
+ "`numprocs` should not be set when using `parallelism=$(parallelism)`. Please use `:multiprocessing`.",
+ )
+ procs !== nothing && error(
"`procs` should not be set when using `parallelism=$(parallelism)`. Please use `:multiprocessing`.",
)
- not_distributed &&
- numprocs !== nothing &&
+ end
+ verbosity !== nothing &&
+ options_verbosity !== nothing &&
error(
- "`numprocs` should not be set when using `parallelism=$(parallelism)`. Please use `:multiprocessing`.",
+ "You cannot set `verbosity` in both the search parameters " *
+ "`AbstractOptions` and the call to `equation_search`.",
+ )
+ progress !== nothing &&
+ options_progress !== nothing &&
+ error(
+ "You cannot set `progress` in both the search parameters " *
+ "`AbstractOptions` and the call to `equation_search`.",
+ )
+ ORS !== nothing &&
+ return_state !== nothing &&
+ error(
+ "You cannot set `return_state` in both the `AbstractOptions` and in the passed arguments.",
)
- _return_state = if return_state isa Val
- first(typeof(return_state).parameters)
- else
- if options.return_state === Val(nothing)
- return_state === nothing ? false : return_state
- else
- @assert(
- return_state === nothing,
- "You cannot set `return_state` in both the `AbstractOptions` and in the passed arguments."
- )
- first(typeof(options.return_state).parameters)
- end
- end
-
- dim_out = if DIM_OUT === nothing
- nout > 1 ? 2 : 1
- else
- DIM_OUT
- end
_numprocs::Int = if numprocs === nothing
if procs === nothing
4
@@ -148,42 +151,17 @@ end
end
end
- _verbosity = if verbosity === nothing && options.verbosity === nothing
- 1
- elseif verbosity === nothing && options.verbosity !== nothing
- options.verbosity
- elseif verbosity !== nothing && options.verbosity === nothing
- verbosity
- else
- error(
- "You cannot set `verbosity` in both the search parameters `AbstractOptions` and the call to `equation_search`.",
- )
- 1
- end
- _progress::Bool = if progress === nothing && options.progress === nothing
- (_verbosity > 0) && nout == 1
- elseif progress === nothing && options.progress !== nothing
- options.progress
- elseif progress !== nothing && options.progress === nothing
- progress
- else
- error(
- "You cannot set `progress` in both the search parameters `AbstractOptions` and the call to `equation_search`.",
- )
- false
- end
-
- _addprocs_function = addprocs_function === nothing ? addprocs : addprocs_function
+ _return_state = VRS <: Val ? first(VRS.parameters) : something(ORS, return_state, false)
+ dim_out = something(DIM_OUT, nout > 1 ? 2 : 1)
+ _verbosity = something(verbosity, options_verbosity, 1)
+ _progress = something(progress, options_progress, (_verbosity > 0) && nout == 1)
+ _addprocs_function = something(addprocs_function, addprocs)
+ _run_id = @something(run_id, generate_run_id())
exeflags = if concurrency == :multiprocessing
heap_size_hint_in_megabytes = floor(
- Int, (
- if heap_size_hint_in_bytes === nothing
- (Sys.free_memory() / _numprocs)
- else
- heap_size_hint_in_bytes
- end
- ) / 1024^2
+ Int,
+ (@something(heap_size_hint_in_bytes, (Sys.free_memory() / _numprocs))) / 1024^2,
)
_verbosity > 0 &&
heap_size_hint_in_bytes === nothing &&
@@ -206,9 +184,16 @@ end
Val(concurrency),
Val(dim_out),
Val(_return_state),
+ _run_id,
)
end
+function generate_run_id()
+ date_str = Dates.format(Dates.now(), "yyyymmdd_HHMMSS")
+ h = join(rand(['0':'9'; 'a':'z'; 'A':'Z'], 6))
+ return "$(date_str)_$h"
+end
+
"""A simple dictionary to track worker allocations."""
const WorkerAssignments = Dict{Tuple{Int,Int},Int}
@@ -303,9 +288,9 @@ function init_dummy_pops(
]
end
-struct StdinReader{ST}
+struct StdinReader
can_read_user_input::Bool
- stream::ST
+ stream::IO
end
"""Start watching stream (like stdin) for user input."""
@@ -328,6 +313,7 @@ function watch_stream(stream)
end
return StdinReader(can_read_user_input, stream)
end
+precompile(Tuple{typeof(watch_stream),Base.TTY})
"""Close the stdin reader and stop reading."""
function close_reader!(reader::StdinReader)
@@ -434,23 +420,25 @@ function update_progress_bar!(
head_node_occupation::Float64,
parallelism=:serial,
) where {T,L}
- equation_strings = string_dominating_pareto_curve(
- hall_of_fame, dataset, options; width=progress_bar.bar.width
- )
# TODO - include command about "q" here.
load_string = if length(equation_speed) > 0
average_speed = sum(equation_speed) / length(equation_speed)
@sprintf(
- "Expressions evaluated per second: %-5.2e. ",
+ "Full dataset evaluations per second: %-5.2e. ",
round(average_speed, sigdigits=3)
)
else
- @sprintf("Expressions evaluated per second: [.....]. ")
+ @sprintf("Full dataset evaluations per second: [.....]. ")
end
load_string *= get_load_string(; head_node_occupation, parallelism)
- load_string *= @sprintf("Press 'q' and then <enter> to stop execution early.\n")
- equation_strings = load_string * equation_strings
- set_multiline_postfix!(progress_bar, equation_strings)
+ load_string *= @sprintf("Press 'q' and then <enter> to stop execution early.")
+ equation_strings = string_dominating_pareto_curve(
+ hall_of_fame, dataset, options; width=barlen(progress_bar)
+ )
+ progress_bar.postfix = [
+ (styled"{italic:Info}", styled"{italic:$load_string}"),
+ (styled"{italic:Hall of Fame}", equation_strings),
+ ]
manually_iterate!(progress_bar)
return nothing
end
@@ -569,12 +557,18 @@ Base.@kwdef struct SearchState{T,L,N<:AbstractExpression{T},WorkerOutputType,Cha
end
function save_to_file(
- dominating, nout::Integer, j::Integer, dataset::Dataset{T,L}, options::AbstractOptions
+ dominating,
+ nout::Integer,
+ j::Integer,
+ dataset::Dataset{T,L},
+ options::AbstractOptions,
+ ropt::AbstractRuntimeOptions,
) where {T,L}
- output_file = options.output_file
- if nout > 1
- output_file = output_file * ".out$j"
- end
+ output_directory = joinpath(something(options.output_directory, "outputs"), ropt.run_id)
+ mkpath(output_directory)
+ filename = nout > 1 ? "hall_of_fame_output$(j).csv" : "hall_of_fame.csv"
+ output_file = joinpath(output_directory, filename)
+
dominating_n = length(dominating)
complexities = Vector{Int}(undef, dominating_n)
@@ -602,10 +596,8 @@ function save_to_file(
end
# Write file twice in case exit in middle of filewrite
- for out_file in (output_file, output_file * ".bkup")
- open(out_file, "w") do io
- write(io, s)
- end
+ for out_file in (output_file, output_file * ".bak")
+ open(Base.Fix2(write, s), out_file, "w")
end
return nothing
end
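
Putting the new `run_id` and `output_directory` pieces together, the on-disk layout now looks roughly like this (the data `X`, `y` and the run ID are assumed for illustration):

```julia
options = Options(; output_directory="outputs")
hof = equation_search(X, y; options, run_id="my_run", niterations=5)

# Files land under joinpath(output_directory, run_id):
#
# outputs/
# └── my_run/
#     ├── hall_of_fame.csv       # or hall_of_fame_output$(j).csv when nout > 1
#     └── hall_of_fame.csv.bak   # written alongside, in case of an exit mid-write
```

If `run_id` is omitted, `generate_run_id` above produces a timestamped ID such as `20240101_120000_ab12cd`.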
diff --git a/src/SingleIteration.jl b/src/SingleIteration.jl
index 90edb8ee7..2d36e6c87 100644
--- a/src/SingleIteration.jl
+++ b/src/SingleIteration.jl
@@ -33,7 +33,7 @@ function s_r_cycle(
if !options.annealing
min_temp = max_temp
end
- all_temperatures = LinRange(max_temp, min_temp, ncycles)
+ all_temperatures = ncycles > 1 ? LinRange(max_temp, min_temp, ncycles) : [max_temp]
best_examples_seen = HallOfFame(options, dataset)
num_evals = 0.0
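
The `ncycles > 1` guard above works around a `LinRange` edge case: a length-1 range with differing endpoints throws, so a single annealing cycle needs the explicit `[max_temp]` fallback. A sketch:

```julia
LinRange(1.0, 0.1, 2)  # 2-element LinRange{Float64}: 1.0, 0.1
LinRange(1.0, 0.1, 1)  # throws an ArgumentError, since the endpoints differ
```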
diff --git a/src/SymbolicRegression.jl b/src/SymbolicRegression.jl
index 53afae3af..75bdbfa19 100644
--- a/src/SymbolicRegression.jl
+++ b/src/SymbolicRegression.jl
@@ -13,6 +13,7 @@ export Population,
Expression,
ParametricExpression,
TemplateExpression,
+ TemplateStructure,
NodeSampler,
AbstractExpression,
AbstractExpressionNode,
@@ -156,7 +157,7 @@ using DynamicExpressions: with_type_parameters
LogitDistLoss,
QuantileLoss,
LogCoshLoss
-using Compat: @compat
+using Compat: @compat, Fix
@compat public AbstractOptions,
AbstractRuntimeOptions,
@@ -261,7 +262,8 @@ using .CoreModule:
erf,
erfc,
atanh_clip,
- create_expression
+ create_expression,
+ has_units
using .UtilsModule: is_anonymous_function, recursive_merge, json3_write, @ignore
using .ComplexityModule: compute_complexity
using .CheckConstraintsModule: check_constraints
@@ -314,7 +316,7 @@ using .SearchUtilsModule:
save_to_file,
get_cur_maxsize,
update_hall_of_fame!
-using .TemplateExpressionModule: TemplateExpression
+using .TemplateExpressionModule: TemplateExpression, TemplateStructure
using .ExpressionBuilderModule: embed_metadata, strip_metadata
@stable default_mode = "disable" begin
@@ -338,7 +340,7 @@ which is useful for debugging and profiling.
- `y::Union{AbstractMatrix{T}, AbstractVector{T}}`: The values to predict. The first dimension
is the output feature to predict with each equation, and the
second dimension is rows.
-- `niterations::Int=10`: The number of iterations to perform the search.
+- `niterations::Int=100`: The number of iterations to perform the search.
More iterations will improve the results.
- `weights::Union{AbstractMatrix{T}, AbstractVector{T}, Nothing}=nothing`: Optionally
weight the loss for each `y` by this value (same shape as `y`).
@@ -418,7 +420,7 @@ which is useful for debugging and profiling.
function equation_search(
X::AbstractMatrix{T},
y::AbstractMatrix;
- niterations::Int=10,
+ niterations::Int=100,
weights::Union{AbstractMatrix{T},AbstractVector{T},Nothing}=nothing,
options::AbstractOptions=Options(),
variable_names::Union{AbstractVector{String},Nothing}=nothing,
@@ -432,6 +434,7 @@ function equation_search(
runtests::Bool=true,
saved_state=nothing,
return_state::Union{Bool,Nothing,Val}=nothing,
+ run_id::Union{String,Nothing}=nothing,
loss_type::Type{L}=Nothing,
verbosity::Union{Integer,Nothing}=nothing,
progress::Union{Bool,Nothing}=nothing,
@@ -481,6 +484,7 @@ function equation_search(
runtests=runtests,
saved_state=saved_state,
return_state=return_state,
+ run_id=run_id,
verbosity=verbosity,
progress=progress,
v_dim_out=Val(DIM_OUT),
@@ -504,14 +508,19 @@ function equation_search(
runtime_options::Union{AbstractRuntimeOptions,Nothing}=nothing,
runtime_options_kws...,
) where {T<:DATA_TYPE,L<:LOSS_TYPE,D<:Dataset{T,L}}
- runtime_options = if runtime_options === nothing
- RuntimeOptions(; options, nout=length(datasets), runtime_options_kws...)
- else
- runtime_options
- end
+ _runtime_options = @something(
+ runtime_options,
+ RuntimeOptions(;
+ options_return_state=options.return_state,
+ options_verbosity=options.verbosity,
+ options_progress=options.progress,
+ nout=length(datasets),
+ runtime_options_kws...,
+ )
+ )
# Underscores here mean that we have mutated the variable
- return _equation_search(datasets, runtime_options, options, saved_state)
+ return _equation_search(datasets, _runtime_options, options, saved_state)
end
@noinline function _equation_search(
@@ -775,12 +784,14 @@ function _main_search_loop!(
ropt.verbosity > 0 && @info "Started!"
nout = length(datasets)
start_time = time()
- if ropt.progress
+ progress_bar = if ropt.progress
#TODO: need to iterate this on the max cycles remaining!
sum_cycle_remaining = sum(state.cycles_remaining)
- progress_bar = WrappedProgressBar(
- 1:sum_cycle_remaining; width=options.terminal_width
+ WrappedProgressBar(
+ sum_cycle_remaining, ropt.niterations; barlen=options.terminal_width
)
+ else
+ nothing
end
last_print_time = time()
last_speed_recording_time = time()
@@ -863,7 +874,7 @@ function _main_search_loop!(
dominating = calculate_pareto_frontier(state.halls_of_fame[j])
if options.save_to_file
- save_to_file(dominating, nout, j, dataset, options)
+ save_to_file(dominating, nout, j, dataset, options, ropt)
end
###################################################################
# Migration #######################################################
@@ -928,7 +939,7 @@ function _main_search_loop!(
options, total_cycles, cycles_remaining=state.cycles_remaining[j]
)
move_window!(state.all_running_search_statistics[j])
- if ropt.progress
+ if progress_bar !== nothing
head_node_occupation = estimate_work_fraction(resource_monitor)
update_progress_bar!(
progress_bar,
@@ -1026,13 +1037,10 @@ function _format_output(
out_hof = if ropt.dim_out == 1
embed_metadata(only(state.halls_of_fame), options, only(datasets))
else
- map(j -> embed_metadata(state.halls_of_fame[j], options, datasets[j]), 1:nout)
+ map(Fix{2}(embed_metadata, options), state.halls_of_fame, datasets)
end
if ropt.return_state
- return (
- map(j -> embed_metadata(state.last_pops[j], options, datasets[j]), 1:nout),
- out_hof,
- )
+ return (map(Fix{2}(embed_metadata, options), state.last_pops, datasets), out_hof)
else
return out_hof
end
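
Several changes above use `Fix{N}` from Compat.jl (a backport of `Base.Fix` from newer Julia), which generalizes `Base.Fix1`/`Base.Fix2` to any positional slot:

```julia
using Compat: Fix

add3(a, b, c) = a + b + c

f = Fix{2}(add3, 10)  # fix the second positional argument to 10
f(1, 2)               # == add3(1, 10, 2) == 13
```

So `map(Fix{2}(embed_metadata, options), state.halls_of_fame, datasets)` calls `embed_metadata(hof, options, dataset)` pairwise, replacing the earlier index-based closures.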
diff --git a/src/TemplateExpression.jl b/src/TemplateExpression.jl
index d88c07dcc..586589fab 100644
--- a/src/TemplateExpression.jl
+++ b/src/TemplateExpression.jl
@@ -1,7 +1,9 @@
module TemplateExpressionModule
using Random: AbstractRNG
+using Compat: Fix
using DispatchDoctor: @unstable
+using StyledStrings: @styled_str
using DynamicExpressions:
DynamicExpressions as DE,
AbstractStructuredExpression,
@@ -23,7 +25,8 @@ using DynamicExpressions:
using DynamicExpressions.InterfacesModule:
ExpressionInterface, Interfaces, @implements, all_ei_methods_except, Arguments
-using ..CoreModule: AbstractOptions, Dataset, CoreModule as CM, AbstractMutationWeights
+using ..CoreModule:
+ AbstractOptions, Dataset, CoreModule as CM, AbstractMutationWeights, has_units
using ..ConstantOptimizationModule: ConstantOptimizationModule as CO
using ..InterfaceDynamicExpressionsModule: InterfaceDynamicExpressionsModule as IDE
using ..MutationFunctionsModule: MutationFunctionsModule as MF
@@ -36,7 +39,131 @@ using ..MutateModule: MutateModule as MM
using ..PopMemberModule: PopMember
"""
- TemplateExpression{T,F,N,E,TS,C,D} <: AbstractStructuredExpression{T,F,N,E,D}
+ TemplateStructure{K,S,N,E,C} <: Function
+
+A struct that defines a prescribed structure for a `TemplateExpression`,
+including functions that define the result of combining sub-expressions in different contexts.
+
+The `K` parameter is used to specify the symbols representing the inner expressions.
+If not declared using the constructor `TemplateStructure{K}(...)`, the keys of the
+`variable_constraints` `NamedTuple` will be used to infer this.
+
+# Fields
+- `combine`: Optional function taking a `NamedTuple` of function keys => expressions,
+ returning a single expression. Fallback method used by `get_tree`
+ on a `TemplateExpression` to generate a single `Expression`.
+- `combine_vectors`: Optional function taking a `NamedTuple` of function keys => vectors,
+ returning a single vector. Used for evaluating the expression tree.
+ You may optionally define a method with a second argument `X` for if you wish
+ to include the data matrix `X` (of shape `[num_features, num_rows]`) in the
+ computation.
+- `combine_strings`: Optional function taking a `NamedTuple` of function keys => strings,
+ returning a single string. Used for printing the expression tree.
+- `variable_constraints`: Optional `NamedTuple` that defines which variables each sub-expression is allowed to access.
+ For example, requesting `f(x1, x2)` and `g(x3)` would be equivalent to `(; f=[1, 2], g=[3])`.
+"""
+struct TemplateStructure{
+ K,
+ E<:Union{Nothing,Function},
+ N<:Union{Nothing,Function},
+ S<:Union{Nothing,Function},
+ C<:Union{Nothing,NamedTuple{<:Any,<:Tuple{Vararg{Vector{Int}}}}},
+} <: Function
+ combine::E
+ combine_vectors::N
+ combine_strings::S
+ variable_constraints::C
+end
+
+function TemplateStructure{K}(combine::E; kws...) where {K,E<:Function}
+ return TemplateStructure{K}(; combine, kws...)
+end
+function TemplateStructure{K}(; kws...) where {K}
+ return TemplateStructure(; _function_keys=Val(K), kws...)
+end
+function TemplateStructure(combine::E; kws...) where {E<:Function}
+ return TemplateStructure(; combine, kws...)
+end
+function TemplateStructure(;
+ combine::E=nothing,
+ combine_vectors::N=nothing,
+ combine_strings::S=nothing,
+ variable_constraints::C=nothing,
+ _function_keys::Val{K}=Val(nothing),
+) where {
+ K,
+ E<:Union{Nothing,Function},
+ N<:Union{Nothing,Function},
+ S<:Union{Nothing,Function},
+ C<:Union{Nothing,NamedTuple{<:Any,<:Tuple{Vararg{Vector{Int}}}}},
+}
+ Kout = if K !== nothing && variable_constraints !== nothing
+ K != keys(variable_constraints) &&
+ throw(ArgumentError("`K` must match the keys of `variable_constraints`."))
+ K
+ elseif K !== nothing
+ K
+ elseif variable_constraints !== nothing
+ keys(variable_constraints)
+ else
+ throw(
+ ArgumentError(
+ "If `variable_constraints` is not provided, " *
+ "you must initialize `TemplateStructure` with " *
+                "`TemplateStructure{K}(...)`, for a tuple of symbols `K`.",
+ ),
+ )
+ end
+ return TemplateStructure{Kout,E,N,S,C}(
+ combine, combine_vectors, combine_strings, variable_constraints
+ )
+end
+# TODO: This interface is ugly. Part of this is due to AbstractStructuredExpression,
+# which was not written with this `TemplateStructure` in mind, but just with a
+# single callable function.
+
+function combine(template::TemplateStructure, nt::NamedTuple)
+ return (template.combine::Function)(nt)::AbstractExpression
+end
+function combine_vectors(
+ template::TemplateStructure, nt::NamedTuple, X::Union{AbstractMatrix,Nothing}=nothing
+)
+ combiner = template.combine_vectors::Function
+ if X !== nothing && hasmethod(combiner, typeof((nt, X)))
+ # TODO: Refactor this
+ return combiner(nt, X)::AbstractVector
+ else
+ return combiner(nt)::AbstractVector
+ end
+end
+function combine_strings(template::TemplateStructure, nt::NamedTuple)
+ return (template.combine_strings::Function)(nt)::AbstractString
+end
+
+function (template::TemplateStructure)(
+ nt::NamedTuple{<:Any,<:Tuple{AbstractExpression,Vararg{AbstractExpression}}}
+)
+ return combine(template, nt)
+end
+function (template::TemplateStructure)(
+ nt::NamedTuple{<:Any,<:Tuple{AbstractVector,Vararg{AbstractVector}}},
+ X::Union{AbstractMatrix,Nothing}=nothing,
+)
+ return combine_vectors(template, nt, X)
+end
+function (template::TemplateStructure)(
+ nt::NamedTuple{<:Any,<:Tuple{AbstractString,Vararg{AbstractString}}}
+)
+ return combine_strings(template, nt)
+end
+
+can_combine(template::TemplateStructure) = template.combine !== nothing
+can_combine_vectors(template::TemplateStructure) = template.combine_vectors !== nothing
+can_combine_strings(template::TemplateStructure) = template.combine_strings !== nothing
+get_function_keys(::TemplateStructure{K}) where {K} = K
+
+"""
+ TemplateExpression{T,F,N,E,TS,D} <: AbstractStructuredExpression{T,F,N,E,D}
A symbolic expression that allows the combination of multiple sub-expressions
in a structured way, with constraints on variable usage.
@@ -46,16 +173,12 @@ domain-specific knowledge or constraints must be imposed on the model's structur
# Constructor
-- `TemplateExpression(trees; structure, operators, variable_names, variable_mapping)`
+- `TemplateExpression(trees; structure, operators, variable_names)`
- `trees`: A `NamedTuple` holding the sub-expressions (e.g., `f = Expression(...)`, `g = Expression(...)`).
- - `structure`: A function that defines how the sub-expressions are combined. This should have one method
- that takes `trees` as input and returns a single `Expression` node, and another method which takes
- a `NamedTuple` of `Vector` (representing the numerical results of each sub-expression) and returns
- a single vector after combining them.
+ - `structure`: A `TemplateStructure` which holds functions that define how the sub-expressions are combined
+ in different contexts.
- `operators`: An `OperatorEnum` that defines the allowed operators for the sub-expressions.
- `variable_names`: An optional `Vector` of `String` that defines the names of the variables in the dataset.
- - `variable_mapping`: A `NamedTuple` that defines which variables each sub-expression is allowed to access.
- For example, requesting `f(x1, x2)` and `g(x3)` would be equivalent to `(; f=[1, 2], g=[3])`.
# Example
@@ -63,7 +186,8 @@ Let's create an example `TemplateExpression` that combines two sub-expressions `
```julia
# Define operators and variable names
-operators = OperatorEnum(; binary_operators=(+, *, /, -), unary_operators=(sin, cos))
+options = Options(; binary_operators=(+, *, /, -), unary_operators=(sin, cos))
+operators = options.operators
variable_names = ["x1", "x2", "x3"]
# Create sub-expressions
@@ -71,41 +195,42 @@ x1 = Expression(Node{Float64}(; feature=1); operators, variable_names)
x2 = Expression(Node{Float64}(; feature=2); operators, variable_names)
x3 = Expression(Node{Float64}(; feature=3); operators, variable_names)
-# Define structure function for symbolic and numerical evaluation
-function my_structure(nt::NamedTuple{<:Any,<:Tuple{Vararg{<:Expression}}})
- return sin(nt.f) + nt.g * nt.g
-end
-function my_structure(nt::NamedTuple{<:Any,<:Tuple{Vararg{<:AbstractVector}}})
- return @. sin(nt.f) + nt.g * nt.g
-end
-
-# Define variable constraints (if desired)
-variable_mapping = (; f=[1, 2], g=[3])
-
# Create TemplateExpression
example_expr = (; f=x1, g=x3)
st_expr = TemplateExpression(
example_expr;
- structure=my_structure, operators, variable_names, variable_mapping
+ structure=TemplateStructure{(:f, :g)}(nt -> sin(nt.f) + nt.g * nt.g),
+ operators,
+ variable_names,
+)
+```
+
+We can also define constraints on which variables each sub-expression is allowed to access:
+
+```julia
+variable_constraints = (; f=[1, 2], g=[3])
+st_expr = TemplateExpression(
+ example_expr;
+ structure=TemplateStructure(
+ nt -> sin(nt.f) + nt.g * nt.g; variable_constraints
+ ),
+ operators,
+ variable_names,
)
```
When fitting a model in SymbolicRegression.jl, you would provide the `TemplateExpression`
-as the `expression_type` argument, and then pass `expression_options=(; structure=my_structure, variable_mapping=variable_mapping)`
-as additional options. The `variable_mapping` will constraint `f` to only have access to `x1` and `x2`,
+as the `expression_type` argument, and then pass `expression_options=(; structure=TemplateStructure(...))`
+as additional options. The `variable_constraints` will constrain `f` to only have access to `x1` and `x2`,
and `g` to only have access to `x3`.
"""
struct TemplateExpression{
T,
- F<:Function,
+ F<:TemplateStructure,
N<:AbstractExpressionNode{T},
E<:Expression{T,N}, # TODO: Generalize this
TS<:NamedTuple{<:Any,<:NTuple{<:Any,E}},
- C<:NamedTuple{<:Any,<:NTuple{<:Any,Vector{Int}}}, # The constraints
- # TODO: No need for this to be a parametric type
- D<:@NamedTuple{
- structure::F, operators::O, variable_names::V, variable_mapping::C
- } where {O,V},
+ D<:@NamedTuple{structure::F, operators::O, variable_names::V} where {O,V},
} <: AbstractStructuredExpression{T,F,N,E,D}
trees::TS
metadata::Metadata{D}
@@ -114,15 +239,13 @@ struct TemplateExpression{
trees::TS, metadata::Metadata{D}
) where {
TS,
- F<:Function,
- C<:NamedTuple{<:Any,<:NTuple{<:Any,Vector{Int}}},
- D<:@NamedTuple{
- structure::F, operators::O, variable_names::V, variable_mapping::C
- } where {O,V},
+ F<:TemplateStructure,
+ D<:@NamedTuple{structure::F, operators::O, variable_names::V} where {O,V},
}
+ @assert keys(trees) == get_function_keys(metadata.structure)
E = typeof(first(values(trees)))
N = node_type(E)
- return new{eltype(N),F,N,E,TS,C,D}(trees, metadata)
+ return new{eltype(N),F,N,E,TS,D}(trees, metadata)
end
end
@@ -131,19 +254,11 @@ function TemplateExpression(
structure::F,
operators::Union{AbstractOperatorEnum,Nothing}=nothing,
variable_names::Union{AbstractVector{<:AbstractString},Nothing}=nothing,
- variable_mapping::NamedTuple{<:Any,<:NTuple{<:Any,Vector{Int}}},
-) where {F<:Function}
- @assert length(trees) == length(variable_mapping)
- if variable_names !== nothing
- # TODO: Should this be removed?
- @assert Set(eachindex(variable_names)) ==
- Set(Iterators.flatten(values(variable_mapping)))
- end
- @assert keys(trees) == keys(variable_mapping)
+) where {F<:TemplateStructure}
example_tree = first(values(trees))::AbstractExpression
operators = get_operators(example_tree, operators)
variable_names = get_variable_names(example_tree, variable_names)
- metadata = (; structure, operators, variable_names, variable_mapping)
+ metadata = (; structure, operators, variable_names)
return TemplateExpression(trees, Metadata(metadata))
end
@@ -153,6 +268,29 @@ end
ExpressionInterface{all_ei_methods_except(())}, TemplateExpression, [Arguments()]
)
+function combine(ex::TemplateExpression, nt::NamedTuple)
+ return combine(get_metadata(ex).structure, nt)
+end
+function combine_vectors(
+ ex::TemplateExpression, nt::NamedTuple, X::Union{AbstractMatrix,Nothing}=nothing
+)
+ return combine_vectors(get_metadata(ex).structure, nt, X)
+end
+function combine_strings(ex::TemplateExpression, nt::NamedTuple)
+ return combine_strings(get_metadata(ex).structure, nt)
+end
+
+function can_combine(ex::TemplateExpression)
+ return can_combine(get_metadata(ex).structure)
+end
+function can_combine_vectors(ex::TemplateExpression)
+ return can_combine_vectors(get_metadata(ex).structure)
+end
+function can_combine_strings(ex::TemplateExpression)
+ return can_combine_strings(get_metadata(ex).structure)
+end
+get_function_keys(ex::TemplateExpression) = get_function_keys(get_metadata(ex).structure)
+
function EB.create_expression(
t::AbstractExpressionNode{T},
options::AbstractOptions,
@@ -161,7 +299,7 @@ function EB.create_expression(
::Type{E},
::Val{embed}=Val(false),
) where {T,L,embed,E<:TemplateExpression}
- function_keys = keys(options.expression_options.variable_mapping)
+ function_keys = get_function_keys(options.expression_options.structure)
# NOTE: We need to copy over the operators so we can call the structure function
operators = options.operators
@@ -186,9 +324,7 @@ function EB.extra_init_params(
return (; options.operators, options.expression_options...)
end
function EB.sort_params(params::NamedTuple, ::Type{<:TemplateExpression})
- return (;
- params.structure, params.operators, params.variable_names, params.variable_mapping
- )
+ return (; params.structure, params.operators, params.variable_names)
end
function ComplexityModule.compute_complexity(
@@ -202,16 +338,25 @@ function ComplexityModule.compute_complexity(
)
end
+_color_string(s::AbstractString, c::Symbol) = styled"{$c:$s}"
function DE.string_tree(
tree::TemplateExpression, operators::Union{AbstractOperatorEnum,Nothing}=nothing; kws...
)
raw_contents = get_contents(tree)
- function_keys = keys(raw_contents)
- inner_strings = NamedTuple{function_keys}(
- map(ex -> DE.string_tree(ex, operators; kws...), values(raw_contents))
- )
- # TODO: Make a fallback function in case the structure function is undefined.
- return get_metadata(tree).structure(inner_strings)
+ if can_combine_strings(tree)
+ function_keys = keys(raw_contents)
+ colors = Base.Iterators.cycle((:magenta, :green, :red, :blue, :yellow, :cyan))
+ inner_strings = NamedTuple{function_keys}(
+ map(ex -> DE.string_tree(ex, operators; kws...), values(raw_contents))
+ )
+ colored_strings = NamedTuple{function_keys}(
+ map(_color_string, inner_strings, colors)
+ )
+ return combine_strings(tree, colored_strings)
+ else
+ @assert can_combine(tree)
+ return DE.string_tree(combine(tree, raw_contents), operators; kws...)
+ end
end
function DE.eval_tree_array(
tree::TemplateExpression{T},
@@ -220,30 +365,49 @@ function DE.eval_tree_array(
kws...,
) where {T}
raw_contents = get_contents(tree)
-
- # Raw numerical results of each inner expression:
- outs = map(ex -> DE.eval_tree_array(ex, cX, operators; kws...), values(raw_contents))
-
- # Combine them using the structure function:
- results = NamedTuple{keys(raw_contents)}(map(first, outs))
- return get_metadata(tree).structure(results), all(last, outs)
+ if can_combine_vectors(tree)
+ # Raw numerical results of each inner expression:
+ outs = map(
+ ex -> DE.eval_tree_array(ex, cX, operators; kws...), values(raw_contents)
+ )
+ # Combine them using the structure function:
+ results = NamedTuple{keys(raw_contents)}(map(first, outs))
+ return combine_vectors(tree, results, cX), all(last, outs)
+ else
+ @assert can_combine(tree)
+ return DE.eval_tree_array(combine(tree, raw_contents), cX, operators; kws...)
+ end
end
function (ex::TemplateExpression)(
X, operators::Union{AbstractOperatorEnum,Nothing}=nothing; kws...
)
- # TODO: Why do we need to do this? It should automatically handle this!
- return DE.eval_tree_array(ex, X, operators; kws...)
+ raw_contents = get_contents(ex)
+ if can_combine_vectors(ex)
+ results = NamedTuple{keys(raw_contents)}(
+ map(ex -> ex(X, operators; kws...), values(raw_contents))
+ )
+ return combine_vectors(ex, results, X)
+ else
+ @assert can_combine(ex)
+ callable = combine(ex, raw_contents)
+ return callable(X, operators; kws...)
+ end
end
@unstable IDE.expected_array_type(::AbstractMatrix, ::Type{<:TemplateExpression}) = Any
function DA.violates_dimensional_constraints(
- tree::TemplateExpression, dataset::Dataset, options::AbstractOptions
+ @nospecialize(tree::TemplateExpression),
+ dataset::Dataset,
+ @nospecialize(options::AbstractOptions)
)
- @assert dataset.X_units === nothing && dataset.y_units === nothing
+ @assert !has_units(dataset)
return false
end
function MM.condition_mutation_weights!(
- weights::AbstractMutationWeights, member::P, options::AbstractOptions, curmaxsize::Int
+ @nospecialize(weights::AbstractMutationWeights),
+ @nospecialize(member::P),
+ @nospecialize(options::AbstractOptions),
+ curmaxsize::Int,
) where {T,L,N<:TemplateExpression,P<:PopMember{T,L,N}}
# HACK TODO
return nothing
@@ -330,12 +494,12 @@ function CC.check_constraints(
cursize::Union{Int,Nothing}=nothing,
)::Bool
raw_contents = get_contents(ex)
- variable_mapping = get_metadata(ex).variable_mapping
+ variable_constraints = get_metadata(ex).structure.variable_constraints
# First, we check the variable constraints at the top level:
has_invalid_variables = any(keys(raw_contents)) do key
tree = raw_contents[key]
- allowed_variables = variable_mapping[key]
+ allowed_variables = variable_constraints[key]
contains_other_features_than(tree, allowed_variables)
end
if has_invalid_variables
@@ -347,9 +511,12 @@ function CC.check_constraints(
maxsize && return false
# Then, we check other constraints for inner expressions:
- return all(
- t -> CC.check_constraints(t, options, maxsize, nothing), values(raw_contents)
- )
+ for t in values(raw_contents)
+ if !CC.check_constraints(t, options, maxsize, nothing)
+ return false
+ end
+ end
+ return true
# TODO: The concept of `cursize` doesn't really make sense here.
end
function contains_other_features_than(tree::AbstractExpression, features)
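
A hedged sketch of the optional data-matrix argument to `combine_vectors` described above (keys and formula chosen for illustration). When a two-argument method exists, evaluation also passes the `[num_features, num_rows]` matrix `X`:

```julia
structure = TemplateStructure{(:f, :g)}(;
    combine_strings=nt -> "(" * nt.f * ") + (" * nt.g * ") * x1",
    # Two-argument form: receives the evaluated sub-expressions plus X.
    # (Only this form is defined here; the evaluation paths always supply X.)
    combine_vectors=(nt, X) -> nt.f .+ nt.g .* view(X, 1, :),
)
```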
diff --git a/src/Utils.jl b/src/Utils.jl
index da67bcf4d..06935c4d7 100644
--- a/src/Utils.jl
+++ b/src/Utils.jl
@@ -3,6 +3,7 @@ module UtilsModule
using Printf: @printf
using MacroTools: splitdef
+using StyledStrings: StyledStrings
macro ignore(args...) end
@@ -26,6 +27,7 @@ function is_anonymous_function(op)
op_string[1] == '#' &&
op_string[2] in ('1', '2', '3', '4', '5', '6', '7', '8', '9')
end
+precompile(Tuple{typeof(is_anonymous_function),Function})
recursive_merge(x::AbstractVector...) = cat(x...; dims=1)
recursive_merge(x::AbstractDict...) = merge(recursive_merge, x...)
@@ -40,17 +42,14 @@ function subscriptify(number::Integer)
end
"""
- split_string(s::String, n::Integer)
+ split_string(s::AbstractString, n::Integer)
```jldoctest
-split_string("abcdefgh", 3)
-
-# output
-
+julia> split_string("abcdefgh", 3)
["abc", "def", "gh"]
```
"""
-function split_string(s::String, n::Integer)
+function split_string(s::AbstractString, n::Integer)
length(s) <= n && return [s]
# Due to unicode characters, need to split only at valid indices:
I = eachindex(s) |> collect
@@ -91,12 +90,13 @@ function _to_vec(v::MutableTuple{S,T}) where {S,T}
return x
end
-const max_ops = 8192
-const vals = ntuple(Val, max_ops)
-
"""Return the bottom k elements of x, and their indices."""
-bottomk_fast(x::AbstractVector{T}, k) where {T} =
- _bottomk_dispatch(x, vals[k])::Tuple{Vector{T},Vector{Int}}
+bottomk_fast(x::AbstractVector{T}, k) where {T} = Base.Cartesian.@nif(
+ 32,
+ d -> d == k,
+ d -> _bottomk_dispatch(x, Val(d))::Tuple{Vector{T},Vector{Int}},
+ _ -> _bottomk_dispatch(x, Val(k))::Tuple{Vector{T},Vector{Int}}
+)
function _bottomk_dispatch(x::AbstractVector{T}, ::Val{k}) where {T,k}
if k == 1
@@ -174,7 +174,16 @@ function _save_kwargs(log_variable::Symbol, fdef::Expr)
def = splitdef(fdef)
# Get kwargs:
kwargs = copy(def[:kwargs])
- filter!(kwargs) do k
+ kwargs = map(kwargs) do k
+ # If it's a macrocall for @nospecialize
+ if k.head == :macrocall && string(k.args[1]) == "@nospecialize"
+ # Find the actual argument - it's the last non-LineNumberNode argument
+ inner_arg = last(filter(arg -> !(arg isa LineNumberNode), k.args))
+ return inner_arg
+ end
+ return k
+ end
+ kwargs = filter(kwargs) do k
# Filter ...:
k.head == :... && return false
# Filter other deprecated kwargs:
@@ -259,4 +268,21 @@ function safe_call(f::F, x::T, default::D) where {F,T<:Tuple,D}
return output
end
+@static if VERSION >= v"1.11.0-"
+ @eval begin
+ const AnnotatedIOBuffer = Base.AnnotatedIOBuffer
+ const AnnotatedString = Base.AnnotatedString
+ end
+else
+ @eval begin
+ const AnnotatedIOBuffer = StyledStrings.AnnotatedStrings.AnnotatedIOBuffer
+ const AnnotatedString = StyledStrings.AnnotatedStrings.AnnotatedString
+ end
+end
+
+dump_buffer(buffer::IOBuffer) = String(take!(buffer))
+function dump_buffer(buffer::AnnotatedIOBuffer)
+ return AnnotatedString(dump_buffer(buffer.io), buffer.annotations)
+end
+
end
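
For reference, `Base.Cartesian.@nif` (used by `bottomk_fast` above) unrolls into a chain of branches that dispatches small `k` to compile-time `Val`s. A self-contained sketch with a smaller limit:

```julia
using Base.Cartesian: @nif

static_val(::Val{d}) where {d} = d

# Branches d = 1..3 are checked in turn; the last expression is the fallback:
pick(k) = @nif(4, d -> d == k, d -> static_val(Val(d)), _ -> static_val(Val(k)))

pick(2)  # hits the unrolled `Val(2)` branch
pick(9)  # falls through to the dynamic `Val(k)` dispatch
```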
diff --git a/src/precompile.jl b/src/precompile.jl
index 13aaac06f..ca3c9c4f9 100644
--- a/src/precompile.jl
+++ b/src/precompile.jl
@@ -44,6 +44,7 @@ function do_precompilation(::Val{mode}) where {mode}
unary_operators=[sin, cos, exp, log, sqrt, abs],
populations=3,
population_size=start ? 50 : 12,
+ tournament_selection_n=6,
ncycles_per_iteration=start ? 30 : 1,
mutation_weights=MutationWeights(;
mutate_constant=1.0,
diff --git a/test/test_mlj.jl b/test/test_mlj.jl
index d26773485..a4348fd28 100644
--- a/test/test_mlj.jl
+++ b/test/test_mlj.jl
@@ -39,7 +39,7 @@ end
rep = report(mach)
@test occursin("a", rep.equation_strings[rep.best_idx])
ypred_good = predict(mach, X)
- @test sum(abs2, predict(mach, X) .- y) / length(y) < 1e-5
+ @test sum(abs2, predict(mach, X) .- y) / length(y) < 1e-4
# Check that we can choose the equation
ypred_same = predict(mach, (data=X, idx=rep.best_idx))
@@ -127,10 +127,61 @@ end
rng = MersenneTwister(0)
X = randn(rng, 100, 3)
Y = X
- model = MultitargetSRRegressor(; niterations=10, stop_kws...)
+
+ # Create a temporary directory
+ temp_dir = mktempdir()
+
+ # Set the run_id and output_directory
+ run_id = "test_run"
+ output_directory = temp_dir
+
+ # Instantiate the model with the specified run_id and output_directory
+ model = MultitargetSRRegressor(;
+ niterations=10, run_id=run_id, output_directory=output_directory, stop_kws...
+ )
+
mach = machine(model, X, Y)
fit!(mach)
+
+ # Check predictions
@test sum(abs2, predict(mach, X) .- Y) / length(X) < 1e-6
+
+ # Load and check each of the output CSV files
+ for i in 1:3
+ csv_file = joinpath(output_directory, run_id, "hall_of_fame_output$(i).csv")
+ csv_content = read(csv_file, String)
+
+ # Parse the CSV content by splitting on newlines and commas
+ lines = split(csv_content, '\n')
+ header = split(lines[1], ',')
+ data_lines = lines[2:end]
+
+ @test header[1] == "Complexity"
+ @test header[2] == "Loss"
+ @test header[3] == "Equation"
+
+ complexities = Int[]
+ losses = Float64[]
+ equations = String[]
+
+ for line in data_lines
+ if isempty(line)
+ continue
+ end
+ cols = split(line, ',')
+ push!(complexities, parse(Int, cols[1]))
+ push!(losses, parse(Float64, cols[2]))
+ push!(equations, cols[3])
+ end
+
+ @test !isempty(complexities)
+ @test complexities == report(mach).complexities[i]
+ @test losses == report(mach).losses[i]
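+ # Equation strings are quoted in the CSV, so strip the surrounding quotes: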
+ for (eq, eq_str) in zip(equations, report(mach).equation_strings[i])
+ @test eq[(begin + 1):(end - 1)] == eq_str
+ end
+ end
end
@testitem "Helpful errors" tags = [:part3] begin
diff --git a/test/test_params.jl b/test/test_params.jl
index b74b58013..6c2f35006 100644
--- a/test/test_params.jl
+++ b/test/test_params.jl
@@ -30,7 +30,6 @@ const default_params = (
hof_migration=true,
fraction_replaced_hof=0.1f0,
should_optimize_constants=true,
- output_file=nothing,
perturbation_factor=1.000000f0,
annealing=true,
batching=false,
diff --git a/test/test_pretty_printing.jl b/test/test_pretty_printing.jl
index 56cfa1f6b..dcbc3f59c 100644
--- a/test/test_pretty_printing.jl
+++ b/test/test_pretty_printing.jl
@@ -105,3 +105,36 @@ end
s = sprint((io, ex) -> print_tree(io, ex, options), ex)
@test strip(s) == "sin(x) / (y - y)"
end
+
+@testitem "printing utilities" tags = [:part2] begin
+ using SymbolicRegression.UtilsModule: split_string
+ using SymbolicRegression.HallOfFameModule: wrap_equation_string
+
+ @test split_string("abc\ndefg", 3) == ["abc", "\nde", "fg"]
+
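+ # Here, the second argument is the left indentation and the third the
+ # available width; wrapped lines are marked with "...":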
+ test_equation_string = "cos(x) + 1.5387438743 - y^2"
+ @test wrap_equation_string(test_equation_string, 0, 15) == """cos(x) + 1....
+ 5387438743 ...
+ - y^2\n"""
+
+ # Note how we have special treatment of explicit newlines:
+ test_equation_string = "(\nB = ( -0.012549, 0.0086419, 0.6175 )\nF_d = (-0.051546) * v\n)"
+ @test wrap_equation_string(test_equation_string, 4, 1000) == """(
+ B = ( -0.012549, 0.0086419, 0.6175 )
+ F_d = (-0.051546) * v
+ )
+"""
+
+ @test startswith(wrap_equation_string(test_equation_string, 0, 10), "(\n")
+ @test wrap_equation_string(test_equation_string, 0, 12) == """(
+B = ( -0...
+.012549,...
+ 0.00864...
+19, 0.61...
+75 )
+F_d = (-...
+0.051546...
+) * v
+)
+"""
+end
diff --git a/test/test_search_statistics.jl b/test/test_search_statistics.jl
index cc2f5360a..c22425b00 100644
--- a/test/test_search_statistics.jl
+++ b/test/test_search_statistics.jl
@@ -13,7 +13,7 @@ end
normalize_frequencies!(statistics)
-@test sum(statistics.frequencies) == 1020
+@test sum(statistics.frequencies) == 1030
@test sum(statistics.normalized_frequencies) ≈ 1.0
@test statistics.normalized_frequencies[5] > statistics.normalized_frequencies[15]
diff --git a/test/test_stop_on_clock.jl b/test/test_stop_on_clock.jl
index a7f925a20..238678b47 100644
--- a/test/test_stop_on_clock.jl
+++ b/test/test_stop_on_clock.jl
@@ -10,6 +10,7 @@ y = 2 * cos.(X[4, :])
options = Options(;
default_params...,
population_size=10,
+ tournament_selection_n=9,
ncycles_per_iteration=100,
maxsize=15,
timeout_in_seconds=1,
diff --git a/test/test_template_expression.jl b/test/test_template_expression.jl
index 2187ea334..04836cf15 100644
--- a/test/test_template_expression.jl
+++ b/test/test_template_expression.jl
@@ -6,26 +6,21 @@
options = Options(; binary_operators=(+, *, /, -), unary_operators=(sin, cos))
operators = options.operators
- variable_names = (i -> "x$i").(1:3)
+ variable_names = ["x1", "x2", "x3"]
x1, x2, x3 =
(i -> Expression(Node(Float64; feature=i); operators, variable_names)).(1:3)
# For combining expressions to a single expression:
- my_structure(nt::NamedTuple{<:Any,<:Tuple{Vararg{<:AbstractString}}}) =
- "sin($(nt.f)) + $(nt.g)^2"
- my_structure(nt::NamedTuple{<:Any,<:Tuple{Vararg{<:AbstractVector}}}) =
- @. sin(nt.f) + nt.g^2
- my_structure(nt::NamedTuple{<:Any,<:Tuple{Vararg{<:Expression}}}) =
- sin(nt.f) + nt.g * nt.g
-
- variable_mapping = (; f=[1, 2], g=[3])
- st_expr = TemplateExpression(
- (; f=x1, g=cos(x3));
- structure=my_structure,
- operators,
- variable_names,
- variable_mapping,
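+ # `combine` merges the sub-expressions into a single expression,
+ # `combine_vectors` acts on evaluated arrays, `combine_strings` controls
+ # printing, and `variable_constraints` limits which features each part may use: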
+ structure = TemplateStructure(;
+ combine=e -> sin(e.f) + e.g * e.g,
+ combine_vectors=e -> (@. sin(e.f) + e.g^2),
+ combine_strings=e -> "sin($(e.f)) + $(e.g)^2",
+ variable_constraints=(; f=[1, 2], g=[3]),
)
+
+ @test structure isa TemplateStructure{(:f, :g)}
+
+ st_expr = TemplateExpression((; f=x1, g=cos(x3)); structure, operators, variable_names)
@test string_tree(st_expr) == "sin(x1) + cos(x3)^2"
operators = OperatorEnum(; binary_operators=(+, *, /, -), unary_operators=(cos, sin))
@@ -35,8 +30,7 @@
# We can evaluate with this too:
cX = [1.0 2.0; 3.0 4.0; 5.0 6.0]
- out, completed = st_expr(cX)
- @test completed
+ out = st_expr(cX)
@test out ≈ [sin(1.0) + cos(5.0)^2, sin(2.0) + cos(6.0)^2]
# And also check the contents:
@@ -68,17 +62,13 @@ end
(i -> Expression(Node(Float64; feature=i); operators, variable_names)).(1:3)
# For combining expressions to a single expression:
- my_structure(nt::NamedTuple{<:Any,<:Tuple{Vararg{<:AbstractString}}}) =
- "sin($(nt.f)) + $(nt.g)^2"
- my_structure(nt::NamedTuple{<:Any,<:Tuple{Vararg{<:AbstractVector}}}) =
- @. sin(nt.f) + nt.g^2
- my_structure(nt::NamedTuple{<:Any,<:Tuple{Vararg{<:Expression}}}) =
- sin(nt.f) + nt.g * nt.g
-
- variable_mapping = (; f=[1, 2], g=[3])
- st_expr = TemplateExpression(
- (; f=x1, g=x3); structure=my_structure, operators, variable_names, variable_mapping
+ structure = TemplateStructure{(:f, :g)}(;
+ combine=e -> sin(e.f) + e.g * e.g,
+ combine_strings=e -> "sin($(e.f)) + $(e.g)^2",
+ combine_vectors=e -> (@. sin(e.f) + e.g^2),
+ variable_constraints=(; f=[1, 2], g=[3]),
)
+ st_expr = TemplateExpression((; f=x1, g=x3); structure, operators, variable_names)
@test Interfaces.test(ExpressionInterface, TemplateExpression, [st_expr])
end
@testitem "Utilising TemplateExpression to build vector expressions" tags = [:part3] begin
@@ -86,15 +76,11 @@ end
using Random: rand
# Define the structure function, which returns a tuple:
- function my_structure(nt::NamedTuple{<:Any,<:Tuple{Vararg{<:AbstractString}}})
- return "( $(nt.f) + $(nt.g1), $(nt.f) + $(nt.g2), $(nt.f) + $(nt.g3) )"
- end
- function my_structure(nt::NamedTuple{<:Any,<:Tuple{Vararg{<:AbstractVector}}})
- return map(
- i -> (nt.f[i] + nt.g1[i], nt.f[i] + nt.g2[i], nt.f[i] + nt.g3[i]),
- eachindex(nt.f),
- )
- end
+ structure = TemplateStructure{(:f, :g1, :g2, :g3)}(;
+ combine_strings=e -> "( $(e.f) + $(e.g1), $(e.f) + $(e.g2), $(e.f) + $(e.g3) )",
+ combine_vectors=e ->
+ map((f, g1, g2, g3) -> (f + g1, f + g2, f + g3), e.f, e.g1, e.g2, e.g3),
+ )
# Set up operators and variable names
options = Options(; binary_operators=(+, *, /, -), unary_operators=(sin, cos))
@@ -106,27 +92,21 @@ end
# Test with vector inputs:
nt_vector = NamedTuple{(:f, :g1, :g2, :g3)}((1:3, 4:6, 7:9, 10:12))
- @test my_structure(nt_vector) == [(5, 8, 11), (7, 10, 13), (9, 12, 15)]
+ @test structure(nt_vector) == [(5, 8, 11), (7, 10, 13), (9, 12, 15)]
# And string inputs:
nt_string = NamedTuple{(:f, :g1, :g2, :g3)}(("x1", "x2", "x3", "x2"))
- @test my_structure(nt_string) == "( x1 + x2, x1 + x3, x1 + x2 )"
+ @test structure(nt_string) == "( x1 + x2, x1 + x3, x1 + x2 )"
# Now, using TemplateExpression:
- variable_mapping = (; f=[1, 2], g1=[3], g2=[3], g3=[3])
st_expr = TemplateExpression(
- (; f=x1, g1=x2, g2=x3, g3=x2);
- structure=my_structure,
- options.operators,
- variable_names,
- variable_mapping,
+ (; f=x1, g1=x2, g2=x3, g3=x2); structure, options.operators, variable_names
)
@test string_tree(st_expr) == "( x1 + x2, x1 + x3, x1 + x2 )"
# We can directly call it:
cX = [1.0 2.0; 3.0 4.0; 5.0 6.0]
- out, completed = st_expr(cX)
- @test completed
+ out = st_expr(cX)
@test out == [(1 + 3, 1 + 5, 1 + 3), (2 + 4, 2 + 6, 2 + 4)]
end
@testitem "TemplateExpression getters" tags = [:part3] begin
@@ -139,23 +119,109 @@ end
x1, x2, x3 =
(i -> Expression(Node(Float64; feature=i); operators, variable_names)).(1:3)
- my_structure(nt) = nt.f
-
- variable_mapping = (; f=[1, 2], g1=[3], g2=[3], g3=[3])
+ structure = TemplateStructure(;
+ combine=e -> e.f, variable_constraints=(; f=[1, 2], g1=[3], g2=[3], g3=[3])
+ )
st_expr = TemplateExpression(
- (; f=x1, g1=x3, g2=x3, g3=x3);
- structure=my_structure,
- operators,
- variable_names,
- variable_mapping,
+ (; f=x1, g1=x3, g2=x3, g3=x3); structure, operators, variable_names
)
@test st_expr isa TemplateExpression
@test get_operators(st_expr) == operators
@test get_variable_names(st_expr) == variable_names
- @test get_metadata(st_expr).structure == my_structure
+ @test get_metadata(st_expr).structure == structure
end
@testitem "Integration Test with fit! and Performance Check" tags = [:part3] begin
include("../examples/template_expression.jl")
end
+@testitem "TemplateExpression with only combine function" tags = [:part3] begin
+ using SymbolicRegression
+ using SymbolicRegression.TemplateExpressionModule:
+ can_combine_vectors, can_combine, get_function_keys
+ using SymbolicRegression.InterfaceDynamicExpressionsModule: expected_array_type
+ using DynamicExpressions: constructorof
+
+ # Set up basic operators and variables
+ options = Options(; binary_operators=(+, *, /, -), unary_operators=(sin, cos))
+ operators = options.operators
+ variable_names = ["x1", "x2", "x3"]
+ x1, x2, x3 =
+ (i -> Expression(Node(Float64; feature=i); operators, variable_names)).(1:3)
+
+ # Create a TemplateStructure with only combine (no combine_vectors)
+ structure = TemplateStructure(;
+ combine=e -> sin(e.f) + e.g * e.g, # Only define combine
+ variable_constraints=(; f=[1, 2], g=[3]),
+ )
+
+ # Create the TemplateExpression
+ st_expr = TemplateExpression((; f=x1, g=cos(x3)); structure, operators, variable_names)
+
+ @test constructorof(typeof(st_expr)) === TemplateExpression
+ @test get_function_keys(st_expr) == (:f, :g)
+
+ # Test evaluation
+ cX = [1.0 2.0; 3.0 4.0; 5.0 6.0]
+ out = st_expr(cX)
+ out_2, complete = eval_tree_array(st_expr, cX)
+
+ # The expression should evaluate by first combining to a single expression,
+ # then evaluating that expression
+ expected = sin.(cX[1, :]) .+ cos.(cX[3, :]) .^ 2
+ @test out ≈ expected
+
+ @test complete
+ @test out_2 ≈ expected
+
+ # Verify that can_combine_vectors is false but can_combine is true
+ @test !can_combine_vectors(st_expr)
+ @test can_combine(st_expr)
+
+ @test expected_array_type(cX, typeof(st_expr)) === Any
+
+ @test string_tree(st_expr) == "sin(x1) + (cos(x3) * cos(x3))"
+end
+@testitem "TemplateExpression with data in combine_vectors" tags = [:part3] begin
+ using SymbolicRegression
+
+ options = Options(; binary_operators=(+, *, /, -), unary_operators=(sin, cos, exp))
+ operators = options.operators
+ variable_names = ["x1", "x2", "x3"]
+ x1, x2, x3 =
+ (i -> Expression(Node(Float64; feature=i); operators, variable_names)).(1:3)
+ f = exp(2.5 * x3)
+ g = x1
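+ # `combine_vectors` may also take the data matrix as a second argument: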
+ structure = TemplateStructure(;
+ combine_vectors=(e, X) -> e.f .+ X[2, :], variable_constraints=(; f=[3], g=[1])
+ )
+ st_expr = TemplateExpression((; f, g); structure, operators, variable_names)
+ X = randn(3, 100)
+ @test st_expr(X) ≈ @. exp(2.5 * X[3, :]) + X[2, :]
+end
+@testitem "TemplateStructure constructors" tags = [:part3] begin
+ using SymbolicRegression
+
+ operators = Options(; binary_operators=(+, *, /, -)).operators
+ variable_names = ["x1", "x2"]
+
+ # Create simple expressions with constant values
+ f = Expression(Node(Float64; val=1.0); operators, variable_names)
+ g = Expression(Node(Float64; val=2.0); operators, variable_names)
+
+ # Test TemplateStructure{K}(combine; kws...)
+ st1 = TemplateStructure{(:f, :g)}(e -> e.f + e.g)
+ @test st1.combine((; f, g)) == f + g
+
+ # Test TemplateStructure(combine; kws...)
+ st2 = TemplateStructure(e -> e.f + e.g; variable_constraints=(; f=[1], g=[2]))
+ @test st2.combine((; f, g)) == f + g
+
+ # Test error when no K or variable_constraints provided
+ @test_throws ArgumentError TemplateStructure(e -> e.f + e.g)
+ @test_throws ArgumentError(
+ "If `variable_constraints` is not provided, " *
+ "you must initialize `TemplateStructure` with " *
+ "`TemplateStructure{K}(...)`, for tuple of symbols `K`.",
+ ) TemplateStructure(e -> e.f + e.g)
+end