From b395358904839caeb198893129b3d44b4e53d12e Mon Sep 17 00:00:00 2001 From: MilesCranmer Date: Sat, 19 Oct 2024 23:55:54 +0100 Subject: [PATCH 01/74] docs: create changelog from past releases --- CHANGELOG.md | 1257 ++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 1257 insertions(+) create mode 100644 CHANGELOG.md diff --git a/CHANGELOG.md b/CHANGELOG.md new file mode 100644 index 000000000..ad4a7b831 --- /dev/null +++ b/CHANGELOG.md @@ -0,0 +1,1257 @@ +# SymbolicRegression.jl v1.0.0 + +## Summary of major recent changes + +- Changed the core expression type from `Node{T} → Expression{T,Node{T},...}` + - This gives us new features, improves user hackability, and greatly improves ergonomics! +- Created "*Template Expressions*", for fitting expressions under a user-specified functional form (`TemplateExpression <: AbstractExpression`). + - Template expressions are quite flexible: they are a meta-expression that wraps multiple other expressions, and combines them using a user-specified function. + - This enables **vector expressions** - in other words, you can learn multiple components of a vector, simultaneously, with a single expression! + - (Note that this still does not permit learning using vector operators, though we are working on that!) +- Created "*Parametric Expressions*", for custom functional forms with per-class parameters: (`ParametricExpression <: AbstractExpression`). + - This lets you fit expressions that act as *models of multiple systems*, with per-system parameters! +- Introduced a variety of new abstractions for user extensibility and to **support new research on symbolic regression**. + - `AbstractExpression`, for increased flexibility in custom expression types. + - `mutate!` and `AbstractMutationWeights`, for user-defined mutation operators. + - `AbstractSearchState`, for holding custom metadata during searches. + - `AbstractOptions` and `AbstractRuntimeOptions`, for customizing everything else via multiple dispatch. 
+ - Many of these were motivated by the need to modularize the implementation of [LaSR](https://github.com/trishullab/LibraryAugmentedSymbolicRegression.jl), an LLM-guided version of SymbolicRegression.jl, so that it can sit as a modular layer on top of SymbolicRegression.jl. +- Fundamental improvements to the underlying evolutionary algorithm. + - Introduced new mutation operators, `swap_operands` and `rotate_tree`, which seem to help kick the evolution out of local optima. + - New hyperparameter defaults based on Pareto front volume rather than simply accuracy of the best expression. +- Support for Zygote.jl and Enzyme.jl within the constant optimizer, specified using the `autodiff_backend` option. +- Identified and fixed a major internal bug involving unexpected aliasing produced by the crossover operator. + - Segmentation faults caused by this are a likely culprit for some crashes reported during multi-day multi-node searches. + - Introduced a new test for aliasing throughout the entire search state to prevent this from happening again. +- Major refactoring of the codebase to improve readability and modularity. +- Expanded documentation and examples. +- Julia 1.10 is now the minimum supported Julia version. + +## Major Changes + +### **Breaking**: Changes default expressions from `Node` to the user-friendly `Expression` + +https://github.com/MilesCranmer/SymbolicRegression.jl/pull/326 + +This is a breaking change in the format of expressions returned by SymbolicRegression. Now, instead of returning a `Node{T}`, SymbolicRegression will return an `Expression{T,Node{T},...}` (both from `equation_search` and from `report(mach).equations`). This type is much more convenient and high-level than the `Node` type, as it includes metadata relevant for the node, such as the operators and variable names. 
+ +This means you can reliably do things like: + +```julia +using SymbolicRegression: Options, Expression, Node + +options = Options(binary_operators=[+, -, *, /], unary_operators=[cos, exp, sin]) +operators = options.operators +variable_names = ["x1", "x2", "x3"] +x1, x2, x3 = [Expression(Node(Float64; feature=i); operators, variable_names) for i=1:3] + +# Use the operators directly! +tree = cos(x1 - 3.2 * x2) - x1 * x1 +``` + +You can then do operations with this `tree`, without needing to track `operators`: + +```julia +println(tree) # Looks up the right operators based on internal metadata + +X = randn(3, 100) + +tree(X) # Call directly! +tree'(X) # gradients of expression +``` + +Each time you use an operator on or between two `Expression`s that include the operator in their operator list, it looks up the right enum index, creates the correct `Node`, and returns a new `Expression`. + +You can access the tree with `get_tree` (guaranteed to return a `Node`), or with `get_contents`, which returns the full contents of an `AbstractExpression`; these might comprise multiple expressions (which get stitched together when calling `get_tree`). + +### Customizing behavior + +DynamicExpressions v1.0 has a full `AbstractExpression` interface to customize behavior of pretty much anything. For instance, the included `ParametricExpression` type is built on this interface, with a usage example available in `examples/parametrized_function.jl`. You can use this to find _basis functions_ with per-class parameters. It still needs some tuning, but it works for simple examples. + +This `ParametricExpression` is meant partly as an example of the types of things you can do with the new `AbstractExpression` interface, though it should hopefully be a useful feature by itself. + +### Auto-diff within optimization + +Historically, SymbolicRegression has mostly relied on finite differences to estimate derivatives – which actually works well for small numbers of parameters.
This is what Optim.jl selects unless you can provide it with gradients. + +However, with the introduction of `ParametricExpression`s, full support for autodiff-within-Optim.jl was needed. v1 includes support for some parts of DifferentiationInterface.jl, allowing you to actually turn on various automatic differentiation backends when optimizing constants. For example, you can use +```julia +Options( + autodiff_backend=:Zygote, +) +``` + +to use Zygote.jl for autodiff during BFGS optimization, or even + +```julia +Options( + autodiff_backend=:Enzyme, +) +``` + +for Enzyme.jl (though Enzyme support is highly experimental). + +## Other Changes + +* Implement tree rotation operator by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/348 + * This seems to help search performance overall – the new mutation is available as `rotate_tree` in the weights – which has been set to a default 0.3. +* Avoid `Base.sleep` by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/305 +* CompatHelper: bump compat for MLJModelInterface to 1, (keep existing compat) by @github-actions in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/328 +* fix typos by @spaette in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/331 +* chore(deps): bump peter-evans/create-pull-request from 6 to 7 by @dependabot in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/343 + + +## New Contributors +* @spaette made their first contribution in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/331 +* Thanks to @larsentom for the mutation idea + +**Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.24.5...v1.0.0-beta1 + +# SymbolicRegression.jl v1.0.0-beta1 + +This is a **_beta release_** that is not yet registered. 
To try it out, open a Julia REPL and press `]`, then: + +```julia +pkg> add SymbolicRegression#v1.0.0-beta1 +``` + +Before the final release of v1.0.0, the hyperparameters will be re-tuned to optimize the new mutations: `swap_operands` and `rotate_tree`, which seem to be quite effective. + +## Major Changes + +### **Breaking**: Changes default expressions from `Node` to the user-friendly `Expression` + +https://github.com/MilesCranmer/SymbolicRegression.jl/pull/326 + +This is a breaking change in the format of expressions returned by SymbolicRegression. Now, instead of returning a `Node{T}`, SymbolicRegression will return an `Expression{T,Node{T},...}` (both from `equation_search` and from `report(mach).equations`). This type is much more convenient and high-level than the `Node` type, as it includes metadata relevant for the node, such as the operators and variable names. + +This means you can reliably do things like: + +```julia +using SymbolicRegression: Options, Expression, Node + +options = Options(binary_operators=[+, -, *, /], unary_operators=[cos, exp, sin]) +operators = options.operators +variable_names = ["x1", "x2", "x3"] +x1, x2, x3 = [Expression(Node(Float64; feature=i); operators, variable_names) for i=1:3] + +# Use the operators directly! +tree = cos(x1 - 3.2 * x2) - x1 * x1 +``` + +You can then do operations with this `tree`, without needing to track `operators`: + +```julia +println(tree) # Looks up the right operators based on internal metadata + +X = randn(3, 100) + +tree(X) # Call directly! +tree'(X) # gradients of expression +``` + +Each time you use an operator on or between two `Expression`s that include the operator in their operator list, it looks up the right enum index, creates the correct `Node`, and returns a new `Expression`.
+ +You can access the tree with `get_tree` (guaranteed to return a `Node`), or with `get_contents`, which returns the full contents of an `AbstractExpression`; these might comprise multiple expressions (which get stitched together when calling `get_tree`). + +### Customizing behavior + +DynamicExpressions v1.0 has a full `AbstractExpression` interface to customize behavior of pretty much anything. For instance, the included `ParametricExpression` type is built on this interface, with a usage example available in `examples/parametrized_function.jl`. You can use this to find _basis functions_ with per-class parameters. It still needs some tuning, but it works for simple examples. + +This `ParametricExpression` is meant partly as an example of the types of things you can do with the new `AbstractExpression` interface, though it should hopefully be a useful feature by itself. + +### Auto-diff within optimization + +Historically, SymbolicRegression has mostly relied on finite differences to estimate derivatives – which actually works well for small numbers of parameters. This is what Optim.jl selects unless you can provide it with gradients. + +However, with the introduction of `ParametricExpression`s, full support for autodiff-within-Optim.jl was needed. v1 includes support for some parts of DifferentiationInterface.jl, allowing you to actually turn on various automatic differentiation backends when optimizing constants. For example, you can use +```julia +Options( + autodiff_backend=:Zygote, +) +``` + +to use Zygote.jl for autodiff during BFGS optimization, or even + +```julia +Options( + autodiff_backend=:Enzyme, +) +``` + +for Enzyme.jl (though Enzyme support is highly experimental). + +## Other Changes + +* Implement tree rotation operator by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/348 + * This seems to help search performance overall – the new mutation is available as `rotate_tree` in the weights – which has been set to a default 0.3. 
+* Avoid `Base.sleep` by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/305 +* CompatHelper: bump compat for MLJModelInterface to 1, (keep existing compat) by @github-actions in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/328 +* fix typos by @spaette in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/331 +* chore(deps): bump peter-evans/create-pull-request from 6 to 7 by @dependabot in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/343 + + +## New Contributors +* @spaette made their first contribution in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/331 +* Thanks to @larsentom for the mutation idea + +**Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.24.5...v1.0.0-beta1 + +# SymbolicRegression.jl v0.24.5 + +## SymbolicRegression v0.24.5 + +[Diff since v0.24.4](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.24.4...v0.24.5) + + +**Merged pull requests:** +- ci: split up test suite into multiple runners (#311) (@MilesCranmer) +- chore(deps): bump julia-actions/cache from 1 to 2 (#315) (@dependabot[bot]) +- CompatHelper: bump compat for DynamicQuantities to 0.14, (keep existing compat) (#317) (@github-actions[bot]) +- Use DispatchDoctor.jl to wrap entire package with `@stable` (#321) (@MilesCranmer) +- CompatHelper: bump compat for MLJModelInterface to 1, (keep existing compat) (#322) (@github-actions[bot]) +- Mark more functions as stable (#323) (@MilesCranmer) +- Allow per-variable complexity (#324) (@MilesCranmer) +- Refactor tests to use TestItems.jl (#325) (@MilesCranmer) + +# SymbolicRegression.jl v0.24.4 + +## SymbolicRegression v0.24.4 + +[Diff since v0.24.3](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.24.3...v0.24.4) + + +**Merged pull requests:** +- feat: use `?` for wildcard units instead of `⋅` (#307) (@MilesCranmer) +- refactor: fix some more type instabilities (#308) (@MilesCranmer) +- refactor: remove 
unused Tricks dependency (#309) (@MilesCranmer) +- Add option to force dimensionless constants (#310) (@MilesCranmer) + +# SymbolicRegression.jl v0.24.3 + +## SymbolicRegression v0.24.3 + +[Diff since v0.24.2](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.24.2...v0.24.3) + + +**Merged pull requests:** +- 40% speedup (for default settings) via more parallelism inside workers (#304) (@MilesCranmer) + +**Closed issues:** +- Silence warnings for Optim.jl (#255) + +# SymbolicRegression.jl v0.24.2 + +## SymbolicRegression v0.24.2 + +[Diff since v0.24.1](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.24.1...v0.24.2) + + +**Merged pull requests:** +- Bump julia-actions/setup-julia from 1 to 2 (#300) (@dependabot[bot]) +- [pre-commit.ci] pre-commit autoupdate (#301) (@pre-commit-ci[bot]) +- A small update on examples.md for 1-based indexing (#302) (@liuyxpp) +- Fixes for Julia 1.11 (#303) (@MilesCranmer) + +**Closed issues:** +- API Overhaul (#187) +- [Feature]: Training on high dimensions X (#299) + +# SymbolicRegression.jl v0.24.1 + +## What's Changed +* CompatHelper: bump compat for MLJModelInterface to 1.9, (keep existing compat) by @github-actions in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/295 +* CompatHelper: bump compat for ProgressBars to 1, (keep existing compat) by @github-actions in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/294 +* Ensure we load ClusterManagers.jl on workers by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/297 +* Move test dependencies to test folder by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/298 + + +**Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.24.0...v0.24.1 + +# SymbolicRegression.jl v0.24.0 + +## What's Changed + +* Experimental support for program synthesis / graph-like expressions instead of trees (https://github.com/MilesCranmer/SymbolicRegression.jl/pull/271) + * 
**BREAKING**: many types now have a third type parameter, declaring the type of node. For example, `PopMember{T,L}` is now `PopMember{T,L,N}` for `N` the type of expression. + * Can now specify a `node_type` in creation of `Options`. This `node_type <: AbstractExpressionNode` can be a `GraphNode` which will result in expressions that can share nodes – and therefore have a lower complexity. + * Two new mutations: `form_connection` and `break_connection` – which control the merging and breaking of shared nodes in expressions. These are experimental. +* **BREAKING**: The `Dataset` struct has had many of its fields declared immutable (for memory safety). If you had relied on the mutability of the struct to set parameters after initializing it, you will need to modify your code. +* **BREAKING**: LoopVectorization.jl moved to a package extension. You now need to install it separately (https://github.com/MilesCranmer/SymbolicRegression.jl/pull/287). +* **DEPRECATED**: Now prefer to use new keyword-based constructors for nodes: +```julia +Node{T}(feature=...) # leaf referencing a particular feature column +Node{T}(val=...) # constant value leaf +Node{T}(op=1, l=x1) # unary operator node, using the 1st unary operator +Node{T}(op=1, l=x1, r=1.5) # binary operator node, using the 1st binary operator +``` +rather than the previous constructors `Node(op, l, r)` and `Node(T; val=...)` (though those will still work, just with a `depwarn`). +* Bumper.jl support added. Passing `bumper=true` to `Options()` will result in using bump-allocation for evaluation, which can reach speeds equivalent to LoopVectorization and sometimes even better due to better management of allocations. (https://github.com/MilesCranmer/SymbolicRegression.jl/pull/287) +* Upgraded Optim.jl to 1.9. +* Upgraded DynamicQuantities to 0.13 +* Upgraded DynamicExpressions to 0.16 +* The main search loop has been greatly refactored for readability and improved type inference.
It now looks like this (down from a monolithic ~1000 line function) + +```julia +function _equation_search( + datasets::Vector{D}, ropt::RuntimeOptions, options::Options, saved_state +) where {D<:Dataset} + _validate_options(datasets, ropt, options) + state = _create_workers(datasets, ropt, options) + _initialize_search!(state, datasets, ropt, options, saved_state) + _warmup_search!(state, datasets, ropt, options) + _main_search_loop!(state, datasets, ropt, options) + _tear_down!(state, ropt, options) + return _format_output(state, ropt) +end +``` + + + + +**Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.23.3...v0.24.0 + +# SymbolicRegression.jl v0.23.3 + +## SymbolicRegression v0.23.3 + +[Diff since v0.23.2](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.23.2...v0.23.3) + + +**Merged pull requests:** +- Bump peter-evans/create-or-update-comment from 3 to 4 (#283) (@dependabot[bot]) +- Bump peter-evans/find-comment from 2 to 3 (#284) (@dependabot[bot]) +- Bump peter-evans/create-pull-request from 5 to 6 (#286) (@dependabot[bot]) + +# SymbolicRegression.jl v0.23.2 + +## SymbolicRegression v0.23.2 + +[Diff since v0.23.1](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.23.1...v0.23.2) + + +**Merged pull requests:** +- Formatting overhaul (#278) (@MilesCranmer) +- Avoid julia-formatter on pre-commit.ci (#279) (@MilesCranmer) +- Make it easier to select expression from Pareto front for evaluation (#289) (@MilesCranmer) + +**Closed issues:** +- Garbage collection too passive on worker processes (#237) +- How can I set the maximum number of nests? 
(#285) + +# SymbolicRegression.jl v0.23.1 + +## What's Changed +* Implement swap operands mutation for binary operators by @foxtran in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/276 + +## New Contributors +* @foxtran made their first contribution in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/276 + +**Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.23.0...v0.23.1 + +# SymbolicRegression.jl v0.23.0 + +## SymbolicRegression v0.23.0 + +[Diff since v0.22.5](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.22.5...v0.23.0) + + +**Merged pull requests:** +- Automatically set heap size hint on workers (#270) (@MilesCranmer) + +**Closed issues:** +- How do I set up a basis function consisting of three different inputs x, y, z? (#268) + +# SymbolicRegression.jl v0.22.5 + +## SymbolicRegression v0.22.5 + +[Diff since v0.22.4](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.22.4...v0.22.5) + + +**Merged pull requests:** +- CompatHelper: bump compat for DynamicQuantities to 0.7, (keep existing compat) (#259) (@github-actions[bot]) +- Create `cond` operator (#260) (@MilesCranmer) +- Add `[compat]` entry for Documenter (#261) (@MilesCranmer) +- CompatHelper: bump compat for DynamicQuantities to 0.10 (#264) (@github-actions[bot]) + +# SymbolicRegression.jl v0.22.4 + +## SymbolicRegression v0.22.4 + +[Diff since v0.22.3](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.22.3...v0.22.4) + + + +**Merged pull requests:** +- Hotfix for breaking change in Optim.jl (#256) (@MilesCranmer) +- Fix worldage issues by avoiding `static_hasmethod` when not needed (#258) (@MilesCranmer) + +# SymbolicRegression.jl v0.22.3 + +## What's Changed +* CompatHelper: bump compat for DynamicExpressions to 0.13, (keep existing compat) by @github-actions in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/250 +* Fix type stability of deterministic mode by @MilesCranmer in 
https://github.com/MilesCranmer/SymbolicRegression.jl/pull/251 +* Faster random sampling of nodes by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/252 +* Faster copying of `MutationWeights` by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/253 +* Hotfix for breaking change in Optim.jl by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/256 + + +**Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.22.2...v0.22.3 + +# SymbolicRegression.jl v0.22.2 + +## SymbolicRegression v0.22.2 + +[Diff since v0.22.1](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.22.1...v0.22.2) + + + +**Merged pull requests:** +- Expand aqua test suite (#246) (@MilesCranmer) +- Return more descriptive errors for poorly defined operators (#247) (@MilesCranmer) + +# SymbolicRegression.jl v0.22.1 + +## SymbolicRegression v0.22.1 + +[Diff since v0.22.0](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.22.0...v0.22.1) + +# SymbolicRegression.jl v0.22.0 + +## What's Changed +* (**Algorithm modification**) Evaluate on fixed batch when building per-population hall of fame in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/243 + * This only affects searches that use `batching=true`. It results in improved searches on large datasets, as the "winning expression" is not biased towards an expression that landed on a lucky batch. + * Note that this only occurs within an iteration. Evaluation on the entire dataset still happens at the end of an iteration and those loss measurements are used for absolute comparison between expressions. +* (**Algorithm modification**) Deprecates the `fast_cycle` feature in #243. Use of this parameter will have no effect. + * Was removed to ease maintenance burden and because it doesn't have a use. This feature was created early on in development as a way to get parallelism within a population. 
It is no longer useful, as you can parallelize across populations. +* Add Aqua.jl to test suite in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/245 +* CompatHelper: bump compat for DynamicExpressions to 0.12, (keep existing compat) in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/242 + * Avoids method invalidations when using operators to construct expressions manually by modifying a global constant mapping of operator => index, rather than `@eval`-ing new operators. + * This only matters if you were using operators to build trees, like `x1 + x2`. All internal search code uses `Node()` explicitly to build expressions, so it did not rely on method invalidation at any point. +* Renames some parameters in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/234 + * `npop` => `population_size` + * `npopulations` => `populations` + * This is just to match PySR's API. Also note that the deprecated parameters will still work, and there will not be a warning unless you are running with `--depwarn=yes`. +* Ensure that `predict` uses units if trained with them in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/244 + * If you train on a dataset that has physical units, this ensures that `MLJ.predict` will output predictions in the same units. Before this change, `MLJ.predict` would return numerical arrays with no units. 
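
As a minimal sketch of the parameter rename listed above, an `Options` call using the new keyword names might look like the following. The operator list and the numeric values are arbitrary placeholders chosen for illustration, not recommended settings.

```julia
using SymbolicRegression: Options

# Keyword names after the rename in v0.22.0.
# All values below are illustrative assumptions, not tuned defaults.
options = Options(;
    binary_operators=[+, -, *],
    populations=20,        # formerly `npopulations`
    population_size=50,    # formerly `npop`
)
```

Per the note above, the old names keep working, and a deprecation warning appears only when Julia is started with `--depwarn=yes`.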
+ + +**Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.21.5...v0.22.0 + +# SymbolicRegression.jl v0.21.5 + +## What's Changed +* Allow custom display variable names by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/240 + + +**Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.21.4...v0.21.5 + +# SymbolicRegression.jl v0.21.4 + +## SymbolicRegression v0.21.4 + +[Diff since v0.21.3](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.21.3...v0.21.4) + + +**Closed issues:** +- [Cleanup] Better implementation of batching (#88) + +**Merged pull requests:** +- CompatHelper: bump compat for LossFunctions to 0.11, (keep existing compat) (#238) (@github-actions[bot]) +- Enable compatibility with MLJTuning.jl (#239) (@MilesCranmer) + +# SymbolicRegression.jl v0.21.3 + +## What's Changed +* Batching inside optimization loop + batching support for custom objectives by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/235 + + +**Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.21.2...v0.21.3 + +# SymbolicRegression.jl v0.21.2 + +## What's Changed +* Allow empty string units (==1) by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/233 + + +**Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.21.1...v0.21.2 + +# SymbolicRegression.jl v0.21.1 + +## What's Changed +* Update DynamicExpressions.jl version by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/232 + * Makes Zygote.jl an extension + + +**Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.21.0...v0.21.1 + +# SymbolicRegression.jl v0.21.0 + +## What's Changed +* https://github.com/MilesCranmer/SymbolicRegression.jl/pull/228 and https://github.com/MilesCranmer/SymbolicRegression.jl/pull/230 and 
https://github.com/MilesCranmer/SymbolicRegression.jl/pull/231 + - **Dimensional analysis** (#228) + - Allows you to (softly) constrain discovered expressions to those that respect physical dimensions + - Pass vectors of DynamicQuantities.jl `Quantity` type to the MLJ interface. + - OR, specify `X_units`, `y_units` to low-level `equation_search`. + - **Printing improvements** (#228) + - By default, only 5 significant digits are now printed, rather than the entire float. You can change this with the `print_precision` option. + - In the default printed equations, `x₁` is used rather than `x1`. + - `y = ` is printed at the start (or `y₁ = ` for multi-output). With units this becomes, for example, `y[kg] =`. + - **Misc** + - Easier to convert from MLJ interface to SymbolicUtils (via `node_to_symbolic(::Node, ::AbstractSRRegressor)`) (#228) + - Improved precompilation (#228) + - Various performance and type stability improvements (#228) + - Inlined the recording option to speedup compilation (#230) + - Updated tutorials to use MLJ rather than low-level interface (#228) + - Moved JSON3.jl to extension (#231) + - Use PackageExtensionsCompat.jl over Requires.jl (#231) + - Require LossFunctions.jl to be 0.10 (#231) + +**Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.20.0...v0.21.0 + +# SymbolicRegression.jl v0.20.0 + +## SymbolicRegression v0.20.0 + +[Diff since v0.19.1](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.19.1...v0.20.0) + + +**Closed issues:** +- [Feature]: MLJ integration (#225) + +**Merged pull requests:** +- MLJ Integration (#226) (@MilesCranmer, @OkonSamuel) + +# SymbolicRegression.jl v0.19.1 + +## SymbolicRegression v0.19.1 + +[Diff since v0.19.0](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.19.0...v0.19.1) + + + +**Merged pull requests:** +- CompatHelper: bump compat for StatsBase to 0.34, (keep existing compat) (#202) (@github-actions[bot]) +- (Soft deprecation) change `varMap` to 
`variable_names` (#219) (@MilesCranmer) +- (Soft deprecation) rename `EquationSearch` to `equation_search` (#222) (@MilesCranmer) +- Fix equation splitting for unicode variables (#223) (@MilesCranmer) + +# SymbolicRegression.jl v0.19.0 + +## What's Changed +* Time to load improved by 40% with the following changes in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/215 + * Moved SymbolicUtils.jl to extension/Requires.jl + * Removed StaticArrays.jl as a dependency and implement tiny version of MVector + * Removed `@generated` functions + +**Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.18.0...v0.19.0 + +# SymbolicRegression.jl v0.18.0 + +## SymbolicRegression v0.18.0 + +[Diff since v0.17.1](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.17.1...v0.18.0) + + + +**Merged pull requests:** +- Overload ^ if user passes explicitly (#201) (@MilesCranmer) +- Upgrade DynamicExpressions to 0.8; LossFunctions to 0.10 (#206) (@github-actions[bot]) +- Show expressions evaluated per second (#209) (@MilesCranmer) +- Cache complexity of expressions whenever possible (#210) (@MilesCranmer) + +# SymbolicRegression.jl v0.17.1 + +## SymbolicRegression v0.17.1 + +[Diff since v0.17.0](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.17.0...v0.17.1) + + + +**Merged pull requests:** +- Faster custom losses (#197) (@MilesCranmer) +- Migrate from SnoopPrecompile to PrecompileTools (#198) (@timholy) + +# SymbolicRegression.jl v0.17.0 + +## SymbolicRegression v0.17.0 + +[Diff since v0.16.3](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.16.3...v0.17.0) + + +**Closed issues:** +- troubles in pysr.install() (#196) + +**Merged pull requests:** +- Multiple refactors: arbitrary data in `Dataset`, separate mutation weight conditioning, fix data races, cleaner API (#190) (@MilesCranmer) +- CompatHelper: bump compat for DynamicExpressions to 0.6, (keep existing compat) (#194) (@github-actions[bot]) + +# 
SymbolicRegression.jl v0.16.3 + +## SymbolicRegression v0.16.3 + +[Diff since v0.16.2](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.16.2...v0.16.3) + + + +**Merged pull requests:** +- CompatHelper: bump compat for SymbolicUtils to 1, (keep existing compat) (#168) (@github-actions[bot]) + +# SymbolicRegression.jl v0.16.2 + +## What's Changed +* Turn off simplification when constraints given by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/189 + + +**Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.16.1...v0.16.2 + +# SymbolicRegression.jl v0.16.1 + +**Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.16.0...v0.16.1 + +# SymbolicRegression.jl v0.16.0 + +## SymbolicRegression v0.16.0 + +[Diff since v0.15.3](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.15.3...v0.16.0) + + +**Closed issues:** +- Partially fixed trees (#166) +- Settings of `addprocs` (#180) +- Equation printout should split into multiple lines (#182) + +**Merged pull requests:** +- Force safe closing of threads (#175) (@MilesCranmer) +- Abstract number support (#183) (@MilesCranmer) +- Include datetime in default filename (#185) (@MilesCranmer) + +# SymbolicRegression.jl v0.15.3 + +## What's Changed +* Re-compute losses for warm start by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/177 + + +**Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.15.2...v0.15.3 + +# SymbolicRegression.jl v0.15.2 + +## What's Changed +* Include depth check in `check_constraints` by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/172 +* Fix data race in state saving by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/173 + + +**Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.15.1...v0.15.2 + +# SymbolicRegression.jl v0.15.1 + +## What's 
Changed +* Fix bug in constraint checking by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/171 + + +**Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.15.0...v0.15.1 + +# SymbolicRegression.jl v0.15.0 + +## What's Changed +* Fully-customizable training objectives by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/143 +* Safely catch non-readable stdin stream by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/169 + + +**Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.14.5...v0.15.0 + +# SymbolicRegression.jl v0.14.5 + +## SymbolicRegression v0.14.5 + +[Diff since v0.14.4](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.14.4...v0.14.5) + + +**Closed issues:** +- Large test output (#159) + +**Merged pull requests:** +- Quiet progress bar during CI (#160) (@MilesCranmer) +- Proper SnoopCompilation (#161) (@MilesCranmer) + +# SymbolicRegression.jl v0.14.4 + +## SymbolicRegression v0.14.4 + +[Diff since v0.14.3](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.14.3...v0.14.4) + + + +**Merged pull requests:** +- Refactor monitoring of resources (#158) (@MilesCranmer) + +# SymbolicRegression.jl v0.14.3 + +## SymbolicRegression v0.14.3 + +[Diff since v0.14.2](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.14.2...v0.14.3) + + + +**Merged pull requests:** +- Turn off safe operators for turbo=true (#156) (@MilesCranmer) +- Use `ProgressBars.jl` instead of copying (#157) (@MilesCranmer) + +# SymbolicRegression.jl v0.14.2 + +## SymbolicRegression v0.14.2 + +[Diff since v0.14.1](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.14.1...v0.14.2) + +# SymbolicRegression.jl v0.14.1 + +## SymbolicRegression v0.14.1 + +[Diff since v0.14.0](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.14.0...v0.14.1) + + + +**Merged pull requests:** +- Do 
optimizations as a low-probability mutation (#154) (@MilesCranmer) + +# SymbolicRegression.jl v0.14.0 + +## SymbolicRegression v0.14.0 + +[Diff since v0.13.3](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.13.3...v0.14.0) + + + +**Merged pull requests:** +- Add `@extend_operators` from DynamicExpressions.jl v0.4.0 (#153) (@MilesCranmer) + +# SymbolicRegression.jl v0.13.3 + +## SymbolicRegression v0.13.3 + +[Diff since v0.13.1](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.13.1...v0.13.3) + + + +**Merged pull requests:** +- 30% speed up by using LoopVectorization in DynamicExpressions.jl (#151) (@MilesCranmer) + +# SymbolicRegression.jl v0.13.2 + +- Allow strings to be passed for the `parallelism` argument of EquationSearch (e.g., `"multithreading"` instead of `:multithreading`). This is to allow compatibility with PyJulia calls, which can't pass symbols. + +**Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.13.1...v0.13.2 + +# SymbolicRegression.jl v0.13.1 + +## SymbolicRegression v0.13.1 + +[Diff since v0.13.0](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.13.0...v0.13.1) + + + +**Merged pull requests:** +- Refactor mutation probabilities (#140) (@MilesCranmer) + +# SymbolicRegression.jl v0.13.0 + +## SymbolicRegression v0.13.0 + +[Diff since v0.12.6](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.12.6...v0.13.0) + + + +**Merged pull requests:** +- Split codebase in two: DynamicExpressions.jl and SymbolicRegression.jl (#147) (@MilesCranmer) + +# SymbolicRegression.jl v0.12.6 + +## SymbolicRegression v0.12.6 + +[Diff since v0.12.5](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.12.5...v0.12.6) + + +**Closed issues:** +- [Feature] Integration of Existing Knowledge (#139) +- Search fidelity is much worse after v0.12.3 (#148) + +**Merged pull requests:** +- Fix search performance problem #148 (#149) (@MilesCranmer) + +# SymbolicRegression.jl 
v0.12.5 + +## SymbolicRegression v0.12.5 + +[Diff since v0.12.4](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.12.4...v0.12.5) + +# SymbolicRegression.jl v0.12.4 + +## SymbolicRegression v0.12.4 + +[Diff since v0.12.3](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.12.3...v0.12.4) + + + +**Merged pull requests:** +- Create logo (#145) (@MilesCranmer) + +# SymbolicRegression.jl v0.12.3 + +## SymbolicRegression v0.12.3 + +[Diff since v0.12.2](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.12.2...v0.12.3) + + + +**Merged pull requests:** +- Even faster evaluation (#144) (@MilesCranmer) + +# SymbolicRegression.jl v0.12.2 + +## SymbolicRegression v0.12.2 + +[Diff since v0.12.1](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.12.1...v0.12.2) + + +**Closed issues:** +- How to fix a number of variables in predicted equations (#130) + +**Merged pull requests:** +- Fast evaluation for constant trees (#129) (@MilesCranmer) + +# SymbolicRegression.jl v0.12.1 + +## SymbolicRegression v0.12.1 + +[Diff since v0.12.0](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.12.0...v0.12.1) + +# SymbolicRegression.jl v0.12.0 + +## What's Changed +* Use functions returning NaN on branch cuts instead of abs (issue #109) by @johanbluecreek in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/123 + * By returning NaN, an expression will have infinite loss - this will make the expression search simply avoid expressions that hit out-of-domain errors, rather than using `abs` everywhere which results in fundamentally different functional forms. 
+* Generalize `Node{T}` type to non-floats by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/122 + * Will eventually enable integer-only expression searches + + +**Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.11.1...v0.12.0 + +# SymbolicRegression.jl v0.11.1 + +## What's Changed +* Generalize expressions to have arbitrary constant types by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/119 +* Optimizer options by @johanbluecreek in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/121 +* Fix recorder when `Inf` appears as loss for expression +* Fix normalization when dataset has zero variance: https://github.com/MilesCranmer/SymbolicRegression.jl/commit/85f4909e8156ba8ff6cf89122371901a13df5688 +* Set default parsimony to 0.0 + +## New Contributors +* @johanbluecreek made their first contribution in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/121 + +**Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.10.2...v0.11.1 + +# SymbolicRegression.jl v0.10.2 + +## SymbolicRegression v0.10.2 + +[Diff since v0.9.7](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.9.7...v0.10.2) + + + +**Merged pull requests:** +- Update losses.md (#114) (@pitmonticone) +- Set `timeout-minutes` for CI (#116) (@rikhuijzer) + +# SymbolicRegression.jl v0.9.7 + +## SymbolicRegression v0.9.7 + +[Diff since v0.9.6](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.9.6...v0.9.7) + +# SymbolicRegression.jl v0.9.6 + +## SymbolicRegression v0.9.6 + +[Diff since v0.9.5](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.9.5...v0.9.6) + +# SymbolicRegression.jl v0.9.5 + +## What's Changed +* Add deterministic option in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/108 +* Fix issue with infinite while loop due to numerical precision + +**Full Changelog**: 
https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.9.3...v0.9.5 + +# SymbolicRegression.jl v0.9.3 + +## SymbolicRegression v0.9.3 + +[Diff since v0.9.2](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.9.2...v0.9.3) + + + +**Merged pull requests:** +- CompatHelper: bump compat for LossFunctions to 0.8, (keep existing compat) (#106) (@github-actions[bot]) + +# SymbolicRegression.jl v0.9.2 + +## SymbolicRegression v0.9.2 + +[Diff since v0.9.0](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.9.0...v0.9.2) + + +**Closed issues:** +- Q : recording # of function calls (#74) +- Mangled name from @FromFile displayed in docs (#78) +- Consistent snake_case vs CamelCase (#85) + +**Merged pull requests:** +- Apply Blue formatting + change all internal methods to snake_case (#100) (@MilesCranmer) +- Limiting max evaluations (#104) (@MilesCranmer) +- Custom complexities of operators, variables, and constants (#105) (@MilesCranmer) + +# SymbolicRegression.jl v0.9.0 + +## SymbolicRegression v0.9.0 + +[Diff since v0.8.7](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.8.7...v0.9.0) + + +**Closed issues:** +- Update SymbolicUtils (#98) + +**Merged pull requests:** +- Bump SymbolicUtils.jl to 0.19 (#84) (@ChrisRackauckas) + +# SymbolicRegression.jl v0.8.7 + +## SymbolicRegression v0.8.7 + +[Diff since v0.8.6](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.8.6...v0.8.7) + +# SymbolicRegression.jl v0.8.6 + +## SymbolicRegression v0.8.6 + +[Diff since v0.8.5](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.8.5...v0.8.6) + + + +**Merged pull requests:** +- Switch from FromFile.jl to traditional module system (#95) (@MilesCranmer) +- Add constraints on the number of times operators can be nested (#96) (@MilesCranmer) + +# SymbolicRegression.jl v0.8.5 + +## SymbolicRegression v0.8.5 + +[Diff since v0.8.3](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.8.3...v0.8.5) + + 
+**Closed issues:** +- [CLEANUP] Default settings (#72) +- forcing variables to regression (#87) + +**Merged pull requests:** +- Autodiff for equations (#39) (@kazewong) +- fix worker connection timeout error (#91) (@CharFox1) +- Automatic multi-node compute setup by passing custom `addprocs` (#94) (@MilesCranmer) + +# SymbolicRegression.jl v0.8.3 + +## SymbolicRegression v0.8.3 + +[Diff since v0.8.2](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.8.2...v0.8.3) + +# SymbolicRegression.jl v0.8.2 + +## SymbolicRegression v0.8.2 + +[Diff since v0.8.1](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.8.1...v0.8.2) + + +**Closed issues:** +- Interactive regression / printing epochs (#80) + +# SymbolicRegression.jl v0.8.1 + +## SymbolicRegression v0.8.1 + +[Diff since v0.7.13](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.7.13...v0.8.1) + + +**Closed issues:** +- [BUG] Domain errors (#71) +- [Performance] Single evaluation results (#73) + +**Merged pull requests:** +- Refactoring PopMember + adding adaptive parsimony to tournament (#75) (@MilesCranmer) +- Introduce better default hyperparameters (#76) (@MilesCranmer) + +# SymbolicRegression.jl v0.7.13 + +## SymbolicRegression v0.7.13 + +[Diff since v0.7.10](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.7.10...v0.7.13) + +# SymbolicRegression.jl v0.7.10 + +## SymbolicRegression v0.7.10 + +[Diff since v0.7.9](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.7.9...v0.7.10) + +# SymbolicRegression.jl v0.7.9 + +## SymbolicRegression v0.7.9 + +[Diff since v0.7.8](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.7.8...v0.7.9) + +# SymbolicRegression.jl v0.7.8 + +## SymbolicRegression v0.7.8 + +[Diff since v0.7.7](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.7.7...v0.7.8) + + +**Closed issues:** +- Tournament selection p (#68) + +**Merged pull requests:** +- Fix tournament samples (#70) (@MilesCranmer) + +# 
SymbolicRegression.jl v0.7.7 + +## SymbolicRegression v0.7.7 + +[Diff since v0.7.6](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.7.6...v0.7.7) + +# SymbolicRegression.jl v0.7.6 + +## SymbolicRegression v0.7.6 + +[Diff since v0.7.5](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.7.5...v0.7.6) + + +**Closed issues:** +- Parsimony interference in pareto frontier (#66) +- DomainError when computing pareto curve (#67) + +# SymbolicRegression.jl v0.7.5 + +## SymbolicRegression v0.7.5 + +[Diff since v0.7.4](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.7.4...v0.7.5) + +# SymbolicRegression.jl v0.7.4 + +## SymbolicRegression v0.7.4 + +[Diff since v0.7.3](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.7.3...v0.7.4) + + +**Closed issues:** +- Base.print (#64) + +# SymbolicRegression.jl v0.7.3 + +## SymbolicRegression v0.7.3 + +[Diff since v0.7.2](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.7.2...v0.7.3) + +# SymbolicRegression.jl v0.7.2 + +## SymbolicRegression v0.7.2 + +[Diff since v0.7.1](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.7.1...v0.7.2) + +# SymbolicRegression.jl v0.7.1 + +## SymbolicRegression v0.7.1 + +[Diff since v0.7.0](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.7.0...v0.7.1) + + + +**Merged pull requests:** +- CompatHelper: bump compat for SpecialFunctions to 2, (keep existing compat) (#56) (@github-actions[bot]) + +# SymbolicRegression.jl v0.7.0 + +## SymbolicRegression v0.7.0 + +[Diff since v0.6.19](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.6.19...v0.7.0) + + +**Closed issues:** +- Switching from Float to UInt8 ? 
(#58) + +**Merged pull requests:** +- Revert to SymbolicUtils.jl 0.6 (#60) (@MilesCranmer) + +# SymbolicRegression.jl v0.6.19 + +## SymbolicRegression v0.6.19 + +[Diff since v0.6.18](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.6.18...v0.6.19) + +# SymbolicRegression.jl v0.6.18 + +## SymbolicRegression v0.6.18 + +[Diff since v0.6.17](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.6.17...v0.6.18) + +# SymbolicRegression.jl v0.6.17 + +## SymbolicRegression v0.6.17 + +[Diff since v0.6.16](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.6.16...v0.6.17) + + +**Closed issues:** +- Can't define options as listed in Tutorial, causes Method Error. (#54) +- Using recorder to only track specific information? (#55) + +# SymbolicRegression.jl v0.6.16 + +## SymbolicRegression v0.6.16 + +[Diff since v0.6.15](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.6.15...v0.6.16) + + + +**Merged pull requests:** +- Expand compatibility to other SymbolicUtils.jl versions (#53) (@MilesCranmer) + +# SymbolicRegression.jl v0.6.15 + +## SymbolicRegression v0.6.15 + +[Diff since v0.6.14](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.6.14...v0.6.15) + + +**Closed issues:** +- Unsatisfiable requirements detected for package SymbolicUtils (#51) + +**Merged pull requests:** +- SymbolicUtils v0.18 (#50) (@AlCap23) + +# SymbolicRegression.jl v0.6.14 + +## SymbolicRegression v0.6.14 + +[Diff since v0.6.13](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.6.13...v0.6.14) + + +**Closed issues:** +- nested task error (#43) +- MethodError: Cannot `convert` an object of type SymbolicUtils.Term{Number, Nothing} to an object of type SymbolicUtils.Pow{Number, SymbolicUtils.Term{Number, Nothing}, Float32, Nothing} (#44) + +# SymbolicRegression.jl v0.6.13 + +## SymbolicRegression v0.6.13 + +[Diff since v0.6.12](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.6.12...v0.6.13) + +# 
SymbolicRegression.jl v0.6.12 + +## SymbolicRegression v0.6.12 + +[Diff since v0.6.11](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.6.11...v0.6.12) + + +**Closed issues:** +- Options.npopulations = nothing, does not detect number of cores (#38) + +**Merged pull requests:** +- Fix index functions in SymbolicUtils (#40) (@MilesCranmer) + +# SymbolicRegression.jl v0.6.11 + +## SymbolicRegression v0.6.11 + +[Diff since v0.6.10](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.6.10...v0.6.11) + + + +**Merged pull requests:** +- Updates for SymbolicUtils 0.13 (#37) (@AlCap23) + +# SymbolicRegression.jl v0.6.10 + +## SymbolicRegression v0.6.10 + +[Diff since v0.6.9](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.6.9...v0.6.10) + + +**Closed issues:** +- Saving equations throughout runtime (#33) + +**Merged pull requests:** +- Add multithreading as alternative to distributed (#34) (@MilesCranmer) +- Allow infinities in recorder (#36) (@cobac) + +# SymbolicRegression.jl v0.6.9 + +## SymbolicRegression v0.6.9 + +[Diff since v0.6.8](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.6.8...v0.6.9) + +# SymbolicRegression.jl v0.6.8 + +## SymbolicRegression v0.6.8 + +[Diff since v0.6.7](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.6.7...v0.6.8) + +# SymbolicRegression.jl v0.6.7 + +## SymbolicRegression v0.6.7 + +[Diff since v0.6.6](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.6.6...v0.6.7) + +# SymbolicRegression.jl v0.6.6 + +## SymbolicRegression v0.6.6 + +[Diff since v0.6.5](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.6.5...v0.6.6) + +# SymbolicRegression.jl v0.6.5 + +## SymbolicRegression v0.6.5 + +[Diff since v0.6.4](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.6.4...v0.6.5) + +# SymbolicRegression.jl v0.6.4 + +## SymbolicRegression v0.6.4 + +[Diff since 
v0.6.3](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.6.3...v0.6.4) + +# SymbolicRegression.jl v0.6.3 + +## SymbolicRegression v0.6.3 + +[Diff since v0.6.2](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.6.2...v0.6.3) + +# SymbolicRegression.jl v0.6.2 + +## SymbolicRegression v0.6.2 + +[Diff since v0.6.1](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.6.1...v0.6.2) + + +**Closed issues:** +- Data recorder (#27) +- Long-running parallel jobs have small percentage of processes hang (#28) + +# SymbolicRegression.jl v0.6.1 + +## SymbolicRegression v0.6.1 + +[Diff since v0.6.0](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.6.0...v0.6.1) + + + +**Merged pull requests:** +- Recorder and improved tournament selection (#29) (@MilesCranmer) + From b80a122d9d9614846ee4717f1b1fc5ea004d8221 Mon Sep 17 00:00:00 2001 From: MilesCranmer Date: Sat, 19 Oct 2024 23:56:17 +0100 Subject: [PATCH 02/74] docs: markdown linting --- CHANGELOG.md | 388 ++++++++++++++++++++++++++------------------------- 1 file changed, 197 insertions(+), 191 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index ad4a7b831..8c7170f12 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -4,12 +4,12 @@ - Changed the core expression type from `Node{T} → Expression{T,Node{T},...}` - This gives us new features, improves user hackability, and greatly improves ergonomics! -- Created "*Template Expressions*", for fitting expressions under a user-specified functional form (`TemplateExpression <: AbstractExpression`). +- Created "_Template Expressions_", for fitting expressions under a user-specified functional form (`TemplateExpression <: AbstractExpression`). - Template expressions are quite flexible: they are a meta-expression that wraps multiple other expressions, and combines them using a user-specified function. 
- This enables **vector expressions** - in other words, you can learn multiple components of a vector, simultaneously, with a single expression! - (Note that this still does not permit learning using vector operators, though we are working on that!) -- Created "*Parametric Expressions*", for custom functional forms with per-class parameters: (`ParametricExpression <: AbstractExpression`). - - This lets you fit expressions that act as *models of multiple systems*, with per-system parameters! +- Created "_Parametric Expressions_", for custom functional forms with per-class parameters: (`ParametricExpression <: AbstractExpression`). + - This lets you fit expressions that act as _models of multiple systems_, with per-system parameters! - Introduced a variety of new abstractions for user extensibility and to **support new research on symbolic regression**. - `AbstractExpression`, for increased flexibility in custom expression types. - `mutate!` and `AbstractMutationWeights`, for user-defined mutation operators. @@ -20,7 +20,7 @@ - New mutation operators introduced, `swap_operands` and `rotate_tree`, which seem to help kick the evolution out of local optima. - New hyperparameter defaults based on Pareto front volume rather than simply accuracy of the best expression. - Support for Zygote.jl and Enzyme.jl within the constant optimizer, specified using the `autodiff_backend` option. -- Identified and fixed a major internal bug involving unexpected aliasing produced by the crossover operator. +- Identified and fixed a major internal bug involving unexpected aliasing produced by the crossover operator. - Segmentation faults caused by this are a likely culprit for some crashes reported during multi-day multi-node searches. - Introduced a new test for aliasing throughout the entire search state to prevent this from happening again. - Major refactoring of the codebase to improve readability and modularity. 
@@ -74,7 +74,8 @@ This `ParametricExpression` is meant partly as an example of the types of things Historically, SymbolicRegression has mostly relied on finite differences to estimate derivatives – which actually works well for small numbers of parameters. This is what Optim.jl selects unless you can provide it with gradients. -However, with the introduction of `ParametricExpression`s, full support for autodiff-within-Optim.jl was needed. v1 includes support for some parts of DifferentiationInterface.jl, allowing you to actually turn on various automatic differentiation backends when optimizing constants. For example, you can use +However, with the introduction of `ParametricExpression`s, full support for autodiff-within-Optim.jl was needed. v1 includes support for some parts of DifferentiationInterface.jl, allowing you to actually turn on various automatic differentiation backends when optimizing constants. For example, you can use + ```julia Options( autodiff_backend=:Zygote, @@ -89,21 +90,21 @@ Options( ) ``` -for Enzyme.jl (though Enzyme support is highly experimental). +for Enzyme.jl (though Enzyme support is highly experimental). ## Other Changes -* Implement tree rotation operator by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/348 - * This seems to help search performance overall – the new mutation is available as `rotate_tree` in the weights – which has been set to a default 0.3. 
-* Avoid `Base.sleep` by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/305 -* CompatHelper: bump compat for MLJModelInterface to 1, (keep existing compat) by @github-actions in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/328 -* fix typos by @spaette in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/331 -* chore(deps): bump peter-evans/create-pull-request from 6 to 7 by @dependabot in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/343 - +- Implement tree rotation operator by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/348 + - This seems to help search performance overall – the new mutation is available as `rotate_tree` in the weights – which has been set to a default 0.3. +- Avoid `Base.sleep` by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/305 +- CompatHelper: bump compat for MLJModelInterface to 1, (keep existing compat) by @github-actions in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/328 +- fix typos by @spaette in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/331 +- chore(deps): bump peter-evans/create-pull-request from 6 to 7 by @dependabot in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/343 ## New Contributors -* @spaette made their first contribution in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/331 -* Thanks to @larsentom for the mutation idea + +- @spaette made their first contribution in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/331 +- Thanks to @larsentom for the mutation idea **Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.24.5...v1.0.0-beta1 @@ -119,7 +120,7 @@ Before the final release of v1.0.0, the hyperparameters will be re-tuned to opti ## Major Changes -### **Breaking**: Changes default expressions from `Node` to the user-friendly `Expression` +### **Breaking**: Changes default expressions from `Node` to the 
user-friendly `Expression` https://github.com/MilesCranmer/SymbolicRegression.jl/pull/326 @@ -164,7 +165,8 @@ This `ParametricExpression` is meant partly as an example of the types of things Historically, SymbolicRegression has mostly relied on finite differences to estimate derivatives – which actually works well for small numbers of parameters. This is what Optim.jl selects unless you can provide it with gradients. -However, with the introduction of `ParametricExpression`s, full support for autodiff-within-Optim.jl was needed. v1 includes support for some parts of DifferentiationInterface.jl, allowing you to actually turn on various automatic differentiation backends when optimizing constants. For example, you can use +However, with the introduction of `ParametricExpression`s, full support for autodiff-within-Optim.jl was needed. v1 includes support for some parts of DifferentiationInterface.jl, allowing you to actually turn on various automatic differentiation backends when optimizing constants. For example, you can use + ```julia Options( autodiff_backend=:Zygote, @@ -179,21 +181,21 @@ Options( ) ``` -for Enzyme.jl (though Enzyme support is highly experimental). +for Enzyme.jl (though Enzyme support is highly experimental). ## Other Changes -* Implement tree rotation operator by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/348 - * This seems to help search performance overall – the new mutation is available as `rotate_tree` in the weights – which has been set to a default 0.3. 
-* Avoid `Base.sleep` by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/305 -* CompatHelper: bump compat for MLJModelInterface to 1, (keep existing compat) by @github-actions in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/328 -* fix typos by @spaette in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/331 -* chore(deps): bump peter-evans/create-pull-request from 6 to 7 by @dependabot in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/343 - +- Implement tree rotation operator by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/348 + - This seems to help search performance overall – the new mutation is available as `rotate_tree` in the weights – which has been set to a default 0.3. +- Avoid `Base.sleep` by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/305 +- CompatHelper: bump compat for MLJModelInterface to 1, (keep existing compat) by @github-actions in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/328 +- fix typos by @spaette in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/331 +- chore(deps): bump peter-evans/create-pull-request from 6 to 7 by @dependabot in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/343 ## New Contributors -* @spaette made their first contribution in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/331 -* Thanks to @larsentom for the mutation idea + +- @spaette made their first contribution in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/331 +- Thanks to @larsentom for the mutation idea **Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.24.5...v1.0.0-beta1 @@ -203,8 +205,8 @@ for Enzyme.jl (though Enzyme support is highly experimental). 
[Diff since v0.24.4](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.24.4...v0.24.5) - **Merged pull requests:** + - ci: split up test suite into multiple runners (#311) (@MilesCranmer) - chore(deps): bump julia-actions/cache from 1 to 2 (#315) (@dependabot[bot]) - CompatHelper: bump compat for DynamicQuantities to 0.14, (keep existing compat) (#317) (@github-actions[bot]) @@ -220,8 +222,8 @@ for Enzyme.jl (though Enzyme support is highly experimental). [Diff since v0.24.3](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.24.3...v0.24.4) - **Merged pull requests:** + - feat: use `?` for wildcard units instead of `⋅` (#307) (@MilesCranmer) - refactor: fix some more type instabilities (#308) (@MilesCranmer) - refactor: remove unused Tricks dependency (#309) (@MilesCranmer) @@ -233,11 +235,12 @@ for Enzyme.jl (though Enzyme support is highly experimental). [Diff since v0.24.2](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.24.2...v0.24.3) - **Merged pull requests:** + - 40% speedup (for default settings) via more parallelism inside workers (#304) (@MilesCranmer) **Closed issues:** + - Silence warnings for Optim.jl (#255) # SymbolicRegression.jl v0.24.2 @@ -246,25 +249,26 @@ for Enzyme.jl (though Enzyme support is highly experimental). 
[Diff since v0.24.1](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.24.1...v0.24.2) - **Merged pull requests:** + - Bump julia-actions/setup-julia from 1 to 2 (#300) (@dependabot[bot]) - [pre-commit.ci] pre-commit autoupdate (#301) (@pre-commit-ci[bot]) - A small update on examples.md for 1-based indexing (#302) (@liuyxpp) - Fixes for Julia 1.11 (#303) (@MilesCranmer) **Closed issues:** + - API Overhaul (#187) -- [Feature]: Training on high dimensions X (#299) +- [Feature]: Training on high dimensions X (#299) # SymbolicRegression.jl v0.24.1 ## What's Changed -* CompatHelper: bump compat for MLJModelInterface to 1.9, (keep existing compat) by @github-actions in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/295 -* CompatHelper: bump compat for ProgressBars to 1, (keep existing compat) by @github-actions in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/294 -* Ensure we load ClusterManagers.jl on workers by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/297 -* Move test dependencies to test folder by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/298 +- CompatHelper: bump compat for MLJModelInterface to 1.9, (keep existing compat) by @github-actions in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/295 +- CompatHelper: bump compat for ProgressBars to 1, (keep existing compat) by @github-actions in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/294 +- Ensure we load ClusterManagers.jl on workers by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/297 +- Move test dependencies to test folder by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/298 **Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.24.0...v0.24.1 @@ -272,25 +276,28 @@ for Enzyme.jl (though Enzyme support is highly experimental). 
## What's Changed -* Experimental support for program synthesis / graph-like expressions instead of trees (https://github.com/MilesCranmer/SymbolicRegression.jl/pull/271) - * **BREAKING**: many types now have a third type parameter, declaring the type of node. For example, `PopMember{T,L}` is now `PopMember{T,L,N}` for `N` the type of expression. - * Can now specify a `node_type` in creation of `Options`. This `node_type <: AbstractExpressionNode` can be a `GraphNode` which will result in expressions that care share nodes – and therefore have a lower complexity. - * Two new mutations: `form_connection` and `break_connection` – which control the merging and breaking of shared nodes in expressions. These are experimental. -* **BREAKING**: The `Dataset` struct has had many of its field declared immutable (for memory safety). If you had relied on the mutability of the struct to set parameters after initializing it, you will need to modify your code. -* **BREAKING**: LoopVectorization.jl moved to a package extension. Need to install it separately (https://github.com/MilesCranmer/SymbolicRegression.jl/pull/287). -* **DEPRECATED**: Now prefer to use new keyword-based constructors for nodes: +- Experimental support for program synthesis / graph-like expressions instead of trees (https://github.com/MilesCranmer/SymbolicRegression.jl/pull/271) + - **BREAKING**: many types now have a third type parameter, declaring the type of node. For example, `PopMember{T,L}` is now `PopMember{T,L,N}` for `N` the type of expression. + - Can now specify a `node_type` in creation of `Options`. This `node_type <: AbstractExpressionNode` can be a `GraphNode` which will result in expressions that can share nodes – and therefore have a lower complexity. + - Two new mutations: `form_connection` and `break_connection` – which control the merging and breaking of shared nodes in expressions. These are experimental. 
+- **BREAKING**: The `Dataset` struct has had many of its fields declared immutable (for memory safety). If you had relied on the mutability of the struct to set parameters after initializing it, you will need to modify your code. +- **BREAKING**: LoopVectorization.jl moved to a package extension. Need to install it separately (https://github.com/MilesCranmer/SymbolicRegression.jl/pull/287). +- **DEPRECATED**: Now prefer to use new keyword-based constructors for nodes: + ```julia Node{T}(feature=...) # leaf referencing a particular feature column Node{T}(val=...) # constant value leaf Node{T}(op=1, l=x1) # operator unary node, using the 1st unary operator Node{T}(op=1, l=x1, r=1.5) # binary unary node, using the 1st binary operator ``` + rather than the previous constructors `Node(op, l, r)` and `Node(T; val=...)` (though those will still work; just with a `depwarn`). -* Bumper.jl support added. Passing `bumper=true` to `Options()` will result in using bump-allocation for evaluation which can get speeds equivalent to LoopVectorization and sometimes even better due to better management of allocations. (https://github.com/MilesCranmer/SymbolicRegression.jl/pull/287) -* Upgraded Optim.jl to 1.9. -* Upgraded DynamicQuantities to 0.13 -* Upgraded DynamicExpressions to 0.16 -* The main search loop has been greatly refactored for readability and improved type inference. It now looks like this (down from a monolithic ~1000 line function) + +- Bumper.jl support added. Passing `bumper=true` to `Options()` will result in using bump-allocation for evaluation which can get speeds equivalent to LoopVectorization and sometimes even better due to better management of allocations. (https://github.com/MilesCranmer/SymbolicRegression.jl/pull/287) +- Upgraded Optim.jl to 1.9. +- Upgraded DynamicQuantities to 0.13 +- Upgraded DynamicExpressions to 0.16 +- The main search loop has been greatly refactored for readability and improved type inference.
It now looks like this (down from a monolithic ~1000 line function) ```julia function _equation_search( @@ -306,9 +313,6 @@ function _equation_search( end ``` - - - **Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.23.3...v0.24.0 # SymbolicRegression.jl v0.23.3 @@ -317,8 +321,8 @@ end [Diff since v0.23.2](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.23.2...v0.23.3) - **Merged pull requests:** + - Bump peter-evans/create-or-update-comment from 3 to 4 (#283) (@dependabot[bot]) - Bump peter-evans/find-comment from 2 to 3 (#284) (@dependabot[bot]) - Bump peter-evans/create-pull-request from 5 to 6 (#286) (@dependabot[bot]) @@ -329,23 +333,26 @@ end [Diff since v0.23.1](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.23.1...v0.23.2) - **Merged pull requests:** + - Formatting overhaul (#278) (@MilesCranmer) - Avoid julia-formatter on pre-commit.ci (#279) (@MilesCranmer) - Make it easier to select expression from Pareto front for evaluation (#289) (@MilesCranmer) **Closed issues:** + - Garbage collection too passive on worker processes (#237) - How can I set the maximum number of nests? 
(#285) # SymbolicRegression.jl v0.23.1 ## What's Changed -* Implement swap operands mutation for binary operators by @foxtran in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/276 + +- Implement swap operands mutation for binary operators by @foxtran in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/276 ## New Contributors -* @foxtran made their first contribution in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/276 + +- @foxtran made their first contribution in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/276 **Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.23.0...v0.23.1 @@ -355,11 +362,12 @@ end [Diff since v0.22.5](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.22.5...v0.23.0) - **Merged pull requests:** + - Automatically set heap size hint on workers (#270) (@MilesCranmer) **Closed issues:** + - How do I set up a basis function consisting of three different inputs x, y, z? (#268) # SymbolicRegression.jl v0.22.5 @@ -368,8 +376,8 @@ end [Diff since v0.22.4](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.22.4...v0.22.5) - **Merged pull requests:** + - CompatHelper: bump compat for DynamicQuantities to 0.7, (keep existing compat) (#259) (@github-actions[bot]) - Create `cond` operator (#260) (@MilesCranmer) - Add `[compat]` entry for Documenter (#261) (@MilesCranmer) @@ -381,21 +389,20 @@ end [Diff since v0.22.3](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.22.3...v0.22.4) - - **Merged pull requests:** + - Hotfix for breaking change in Optim.jl (#256) (@MilesCranmer) - Fix worldage issues by avoiding `static_hasmethod` when not needed (#258) (@MilesCranmer) # SymbolicRegression.jl v0.22.3 ## What's Changed -* CompatHelper: bump compat for DynamicExpressions to 0.13, (keep existing compat) by @github-actions in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/250 -* Fix type stability of deterministic mode by 
@MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/251 -* Faster random sampling of nodes by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/252 -* Faster copying of `MutationWeights` by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/253 -* Hotfix for breaking change in Optim.jl by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/256 +- CompatHelper: bump compat for DynamicExpressions to 0.13, (keep existing compat) by @github-actions in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/250 +- Fix type stability of deterministic mode by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/251 +- Faster random sampling of nodes by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/252 +- Faster copying of `MutationWeights` by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/253 +- Hotfix for breaking change in Optim.jl by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/256 **Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.22.2...v0.22.3 @@ -405,9 +412,8 @@ end [Diff since v0.22.1](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.22.1...v0.22.2) - - **Merged pull requests:** + - Expand aqua test suite (#246) (@MilesCranmer) - Return more descriptive errors for poorly defined operators (#247) (@MilesCranmer) @@ -420,30 +426,30 @@ end # SymbolicRegression.jl v0.22.0 ## What's Changed -* (**Algorithm modification**) Evaluate on fixed batch when building per-population hall of fame in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/243 - * This only affects searches that use `batching=true`. It results in improved searches on large datasets, as the "winning expression" is not biased towards an expression that landed on a lucky batch. - * Note that this only occurs within an iteration. 
Evaluation on the entire dataset still happens at the end of an iteration and those loss measurements are used for absolute comparison between expressions. -* (**Algorithm modification**) Deprecates the `fast_cycle` feature in #243. Use of this parameter will have no effect. - * Was removed to ease maintenance burden and because it doesn't have a use. This feature was created early on in development as a way to get parallelism within a population. It is no longer useful as you can parallelize across populations. -* Add Aqua.jl to test suite in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/245 -* CompatHelper: bump compat for DynamicExpressions to 0.12, (keep existing compat) in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/242 - * Is able to avoids method invalidations when using operators to construct expressions manually by modifying a global constant mapping of operator => index, rather than `@eval`-ing new operators. - * This only matters if you were using operators to build trees, like `x1 + x2`. All internal search code uses `Node()` explicitly to build expressions, so did not rely on method invalidation at any point. -* Renames some parameters in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/234 - * `npop` => `population_size` - * `npopulations` => `populations` - * This is just to match PySR's API. Also note that the deprecated parameters will still work, and there will not be a warning unless you are running with `--depwarn=yes`. -* Ensure that `predict` uses units if trained with them in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/244 - * If you train on a dataset that has physical units, this ensures that `MLJ.predict` will output predictions in the same units. Before this change, `MLJ.predict` would return numerical arrays with no units. 
+- (**Algorithm modification**) Evaluate on fixed batch when building per-population hall of fame in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/243 + - This only affects searches that use `batching=true`. It results in improved searches on large datasets, as the "winning expression" is not biased towards an expression that landed on a lucky batch. + - Note that this only occurs within an iteration. Evaluation on the entire dataset still happens at the end of an iteration and those loss measurements are used for absolute comparison between expressions. +- (**Algorithm modification**) Deprecates the `fast_cycle` feature in #243. Use of this parameter will have no effect. + - Was removed to ease maintenance burden and because it doesn't have a use. This feature was created early on in development as a way to get parallelism within a population. It is no longer useful as you can parallelize across populations. +- Add Aqua.jl to test suite in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/245 +- CompatHelper: bump compat for DynamicExpressions to 0.12, (keep existing compat) in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/242 + - Is able to avoid method invalidations when using operators to construct expressions manually by modifying a global constant mapping of operator => index, rather than `@eval`-ing new operators. + - This only matters if you were using operators to build trees, like `x1 + x2`. All internal search code uses `Node()` explicitly to build expressions, so did not rely on method invalidation at any point. +- Renames some parameters in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/234 + - `npop` => `population_size` + - `npopulations` => `populations` + - This is just to match PySR's API. Also note that the deprecated parameters will still work, and there will not be a warning unless you are running with `--depwarn=yes`.
+- Ensure that `predict` uses units if trained with them in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/244 + - If you train on a dataset that has physical units, this ensures that `MLJ.predict` will output predictions in the same units. Before this change, `MLJ.predict` would return numerical arrays with no units. **Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.21.5...v0.22.0 # SymbolicRegression.jl v0.21.5 ## What's Changed -* Allow custom display variable names by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/240 +- Allow custom display variable names by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/240 **Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.21.4...v0.21.5 @@ -453,60 +459,62 @@ end [Diff since v0.21.3](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.21.3...v0.21.4) - **Closed issues:** + - [Cleanup] Better implementation of batching (#88) **Merged pull requests:** + - CompatHelper: bump compat for LossFunctions to 0.11, (keep existing compat) (#238) (@github-actions[bot]) - Enable compatibility with MLJTuning.jl (#239) (@MilesCranmer) # SymbolicRegression.jl v0.21.3 ## What's Changed -* Batching inside optimization loop + batching support for custom objectives by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/235 +- Batching inside optimization loop + batching support for custom objectives by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/235 **Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.21.2...v0.21.3 # SymbolicRegression.jl v0.21.2 ## What's Changed -* Allow empty string units (==1) by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/233 +- Allow empty string units (==1) by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/233 **Full 
Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.21.1...v0.21.2 # SymbolicRegression.jl v0.21.1 ## What's Changed -* Update DynamicExpressions.jl version by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/232 - * Makes Zygote.jl an extension +- Update DynamicExpressions.jl version by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/232 + - Makes Zygote.jl an extension **Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.21.0...v0.21.1 # SymbolicRegression.jl v0.21.0 ## What's Changed -* https://github.com/MilesCranmer/SymbolicRegression.jl/pull/228 and https://github.com/MilesCranmer/SymbolicRegression.jl/pull/230 and https://github.com/MilesCranmer/SymbolicRegression.jl/pull/231 - - **Dimensional analysis** (#228) - - Allows you to (softly) constrain discovered expressions to those that respect physical dimensions - - Pass vectors of DynamicQuantities.jl `Quantity` type to the MLJ interface. - - OR, specify `X_units`, `y_units` to low-level `equation_search`. - - **Printing improvements** (#228) - - By default, only 5 significant digits are now printed, rather than the entire float. You can change this with the `print_precision` option. - - In the default printed equations, `x₁` is used rather than `x1`. - - `y = ` is printed at the start (or `y₁ = ` for multi-output). With units this becomes, for example, `y[kg] =`. 
- - **Misc** - - Easier to convert from MLJ interface to SymbolicUtils (via `node_to_symbolic(::Node, ::AbstractSRRegressor)`) (#228) - - Improved precompilation (#228) - - Various performance and type stability improvements (#228) - - Inlined the recording option to speedup compilation (#230) - - Updated tutorials to use MLJ rather than low-level interface (#228) - - Moved JSON3.jl to extension (#231) - - Use PackageExtensionsCompat.jl over Requires.jl (#231) - - Require LossFunctions.jl to be 0.10 (#231) + +- https://github.com/MilesCranmer/SymbolicRegression.jl/pull/228 and https://github.com/MilesCranmer/SymbolicRegression.jl/pull/230 and https://github.com/MilesCranmer/SymbolicRegression.jl/pull/231 + - **Dimensional analysis** (#228) + - Allows you to (softly) constrain discovered expressions to those that respect physical dimensions + - Pass vectors of DynamicQuantities.jl `Quantity` type to the MLJ interface. + - OR, specify `X_units`, `y_units` to low-level `equation_search`. + - **Printing improvements** (#228) + - By default, only 5 significant digits are now printed, rather than the entire float. You can change this with the `print_precision` option. + - In the default printed equations, `x₁` is used rather than `x1`. + - `y = ` is printed at the start (or `y₁ = ` for multi-output). With units this becomes, for example, `y[kg] =`. 
+ - **Misc** + - Easier to convert from MLJ interface to SymbolicUtils (via `node_to_symbolic(::Node, ::AbstractSRRegressor)`) (#228) + - Improved precompilation (#228) + - Various performance and type stability improvements (#228) + - Inlined the recording option to speedup compilation (#230) + - Updated tutorials to use MLJ rather than low-level interface (#228) + - Moved JSON3.jl to extension (#231) + - Use PackageExtensionsCompat.jl over Requires.jl (#231) + - Require LossFunctions.jl to be 0.10 (#231) **Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.20.0...v0.21.0 @@ -516,11 +524,12 @@ end [Diff since v0.19.1](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.19.1...v0.20.0) - **Closed issues:** + - [Feature]: MLJ integration (#225) **Merged pull requests:** + - MLJ Integration (#226) (@MilesCranmer, @OkonSamuel) # SymbolicRegression.jl v0.19.1 @@ -529,9 +538,8 @@ end [Diff since v0.19.0](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.19.0...v0.19.1) - - **Merged pull requests:** + - CompatHelper: bump compat for StatsBase to 0.34, (keep existing compat) (#202) (@github-actions[bot]) - (Soft deprecation) change `varMap` to `variable_names` (#219) (@MilesCranmer) - (Soft deprecation) rename `EquationSearch` to `equation_search` (#222) (@MilesCranmer) @@ -540,10 +548,11 @@ end # SymbolicRegression.jl v0.19.0 ## What's Changed -* Time to load improved by 40% with the following changes in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/215 - * Moved SymbolicUtils.jl to extension/Requires.jl - * Removed StaticArrays.jl as a dependency and implement tiny version of MVector - * Removed `@generated` functions + +- Time to load improved by 40% with the following changes in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/215 + - Moved SymbolicUtils.jl to extension/Requires.jl + - Removed StaticArrays.jl as a dependency and implement tiny version of MVector + - Removed `@generated` 
functions **Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.18.0...v0.19.0 @@ -553,9 +562,8 @@ end [Diff since v0.17.1](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.17.1...v0.18.0) - - **Merged pull requests:** + - Overload ^ if user passes explicitly (#201) (@MilesCranmer) - Upgrade DynamicExpressions to 0.8; LossFunctions to 0.10 (#206) (@github-actions[bot]) - Show expressions evaluated per second (#209) (@MilesCranmer) @@ -567,9 +575,8 @@ end [Diff since v0.17.0](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.17.0...v0.17.1) - - **Merged pull requests:** + - Faster custom losses (#197) (@MilesCranmer) - Migrate from SnoopPrecompile to PrecompileTools (#198) (@timholy) @@ -579,11 +586,12 @@ end [Diff since v0.16.3](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.16.3...v0.17.0) - **Closed issues:** + - troubles in pysr.install() (#196) **Merged pull requests:** + - Multiple refactors: arbitrary data in `Dataset`, separate mutation weight conditioning, fix data races, cleaner API (#190) (@MilesCranmer) - CompatHelper: bump compat for DynamicExpressions to 0.6, (keep existing compat) (#194) (@github-actions[bot]) @@ -593,16 +601,15 @@ end [Diff since v0.16.2](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.16.2...v0.16.3) - - **Merged pull requests:** + - CompatHelper: bump compat for SymbolicUtils to 1, (keep existing compat) (#168) (@github-actions[bot]) # SymbolicRegression.jl v0.16.2 ## What's Changed -* Turn off simplification when constraints given by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/189 +- Turn off simplification when constraints given by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/189 **Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.16.1...v0.16.2 @@ -616,13 +623,14 @@ end [Diff since 
v0.15.3](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.15.3...v0.16.0) - **Closed issues:** + - Partially fixed trees (#166) - Settings of `addprocs` (#180) - Equation printout should split into multiple lines (#182) **Merged pull requests:** + - Force safe closing of threads (#175) (@MilesCranmer) - Abstract number support (#183) (@MilesCranmer) - Include datetime in default filename (#185) (@MilesCranmer) @@ -630,34 +638,34 @@ end # SymbolicRegression.jl v0.15.3 ## What's Changed -* Re-compute losses for warm start by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/177 +- Re-compute losses for warm start by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/177 **Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.15.2...v0.15.3 # SymbolicRegression.jl v0.15.2 ## What's Changed -* Include depth check in `check_constraints` by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/172 -* Fix data race in state saving by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/173 +- Include depth check in `check_constraints` by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/172 +- Fix data race in state saving by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/173 **Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.15.1...v0.15.2 # SymbolicRegression.jl v0.15.1 ## What's Changed -* Fix bug in constraint checking by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/171 +- Fix bug in constraint checking by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/171 **Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.15.0...v0.15.1 # SymbolicRegression.jl v0.15.0 ## What's Changed -* Fully-customizable training objectives by @MilesCranmer in 
https://github.com/MilesCranmer/SymbolicRegression.jl/pull/143 -* Safely catch non-readable stdin stream by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/169 +- Fully-customizable training objectives by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/143 +- Safely catch non-readable stdin stream by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/169 **Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.14.5...v0.15.0 @@ -667,11 +675,12 @@ end [Diff since v0.14.4](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.14.4...v0.14.5) - **Closed issues:** + - Large test output (#159) **Merged pull requests:** + - Quiet progress bar during CI (#160) (@MilesCranmer) - Proper SnoopCompilation (#161) (@MilesCranmer) @@ -681,9 +690,8 @@ end [Diff since v0.14.3](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.14.3...v0.14.4) - - **Merged pull requests:** + - Refactor monitoring of resources (#158) (@MilesCranmer) # SymbolicRegression.jl v0.14.3 @@ -692,9 +700,8 @@ end [Diff since v0.14.2](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.14.2...v0.14.3) - - **Merged pull requests:** + - Turn off safe operators for turbo=true (#156) (@MilesCranmer) - Use `ProgressBars.jl` instead of copying (#157) (@MilesCranmer) @@ -710,9 +717,8 @@ end [Diff since v0.14.0](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.14.0...v0.14.1) - - **Merged pull requests:** + - Do optimizations as a low-probability mutation (#154) (@MilesCranmer) # SymbolicRegression.jl v0.14.0 @@ -721,9 +727,8 @@ end [Diff since v0.13.3](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.13.3...v0.14.0) - - **Merged pull requests:** + - Add `@extend_operators` from DynamicExpressions.jl v0.4.0 (#153) (@MilesCranmer) # SymbolicRegression.jl v0.13.3 @@ -732,9 +737,8 @@ end [Diff since 
v0.13.1](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.13.1...v0.13.3) - - **Merged pull requests:** + - 30% speed up by using LoopVectorization in DynamicExpressions.jl (#151) (@MilesCranmer) # SymbolicRegression.jl v0.13.2 @@ -749,9 +753,8 @@ end [Diff since v0.13.0](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.13.0...v0.13.1) - - **Merged pull requests:** + - Refactor mutation probabilities (#140) (@MilesCranmer) # SymbolicRegression.jl v0.13.0 @@ -760,9 +763,8 @@ end [Diff since v0.12.6](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.12.6...v0.13.0) - - **Merged pull requests:** + - Split codebase in two: DynamicExpressions.jl and SymbolicRegression.jl (#147) (@MilesCranmer) # SymbolicRegression.jl v0.12.6 @@ -771,12 +773,13 @@ end [Diff since v0.12.5](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.12.5...v0.12.6) - **Closed issues:** + - [Feature] Integration of Existing Knowledge (#139) - Search fidelity is much worse after v0.12.3 (#148) **Merged pull requests:** + - Fix search performance problem #148 (#149) (@MilesCranmer) # SymbolicRegression.jl v0.12.5 @@ -791,9 +794,8 @@ end [Diff since v0.12.3](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.12.3...v0.12.4) - - **Merged pull requests:** + - Create logo (#145) (@MilesCranmer) # SymbolicRegression.jl v0.12.3 @@ -802,9 +804,8 @@ end [Diff since v0.12.2](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.12.2...v0.12.3) - - **Merged pull requests:** + - Even faster evaluation (#144) (@MilesCranmer) # SymbolicRegression.jl v0.12.2 @@ -813,11 +814,12 @@ end [Diff since v0.12.1](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.12.1...v0.12.2) - **Closed issues:** + - How to fix a number of variables in predicted equations (#130) **Merged pull requests:** + - Fast evaluation for constant trees (#129) (@MilesCranmer) # SymbolicRegression.jl v0.12.1 @@ -829,25 +831,27 @@ end # 
SymbolicRegression.jl v0.12.0 ## What's Changed -* Use functions returning NaN on branch cuts instead of abs (issue #109) by @johanbluecreek in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/123 - * By returning NaN, an expression will have infinite loss - this will make the expression search simply avoid expressions that hit out-of-domain errors, rather than using `abs` everywhere which results in fundamentally different functional forms. -* Generalize `Node{T}` type to non-floats by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/122 - * Will eventually enable integer-only expression searches +- Use functions returning NaN on branch cuts instead of abs (issue #109) by @johanbluecreek in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/123 + - By returning NaN, an expression will have infinite loss - this will make the expression search simply avoid expressions that hit out-of-domain errors, rather than using `abs` everywhere which results in fundamentally different functional forms. 
+- Generalize `Node{T}` type to non-floats by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/122 + - Will eventually enable integer-only expression searches **Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.11.1...v0.12.0 # SymbolicRegression.jl v0.11.1 ## What's Changed -* Generalize expressions to have arbitrary constant types by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/119 -* Optimizer options by @johanbluecreek in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/121 -* Fix recorder when `Inf` appears as loss for expression -* Fix normalization when dataset has zero variance: https://github.com/MilesCranmer/SymbolicRegression.jl/commit/85f4909e8156ba8ff6cf89122371901a13df5688 -* Set default parsimony to 0.0 + +- Generalize expressions to have arbitrary constant types by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/119 +- Optimizer options by @johanbluecreek in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/121 +- Fix recorder when `Inf` appears as loss for expression +- Fix normalization when dataset has zero variance: https://github.com/MilesCranmer/SymbolicRegression.jl/commit/85f4909e8156ba8ff6cf89122371901a13df5688 +- Set default parsimony to 0.0 ## New Contributors -* @johanbluecreek made their first contribution in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/121 + +- @johanbluecreek made their first contribution in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/121 **Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.10.2...v0.11.1 @@ -857,9 +861,8 @@ end [Diff since v0.9.7](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.9.7...v0.10.2) - - **Merged pull requests:** + - Update losses.md (#114) (@pitmonticone) - Set `timeout-minutes` for CI (#116) (@rikhuijzer) @@ -878,8 +881,9 @@ end # SymbolicRegression.jl v0.9.5 ## What's Changed 
-* Add deterministic option in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/108 -* Fix issue with infinite while loop due to numerical precision + +- Add deterministic option in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/108 +- Fix issue with infinite while loop due to numerical precision **Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.9.3...v0.9.5 @@ -889,9 +893,8 @@ end [Diff since v0.9.2](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.9.2...v0.9.3) - - **Merged pull requests:** + - CompatHelper: bump compat for LossFunctions to 0.8, (keep existing compat) (#106) (@github-actions[bot]) # SymbolicRegression.jl v0.9.2 @@ -900,13 +903,14 @@ end [Diff since v0.9.0](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.9.0...v0.9.2) - **Closed issues:** + - Q : recording # of function calls (#74) - Mangled name from @FromFile displayed in docs (#78) - Consistent snake_case vs CamelCase (#85) **Merged pull requests:** + - Apply Blue formatting + change all internal methods to snake_case (#100) (@MilesCranmer) - Limiting max evaluations (#104) (@MilesCranmer) - Custom complexities of operators, variables, and constants (#105) (@MilesCranmer) @@ -917,11 +921,12 @@ end [Diff since v0.8.7](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.8.7...v0.9.0) - **Closed issues:** + - Update SymbolicUtils (#98) **Merged pull requests:** + - Bump SymbolicUtils.jl to 0.19 (#84) (@ChrisRackauckas) # SymbolicRegression.jl v0.8.7 @@ -936,9 +941,8 @@ end [Diff since v0.8.5](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.8.5...v0.8.6) - - **Merged pull requests:** + - Switch from FromFile.jl to traditional module system (#95) (@MilesCranmer) - Add constraints on the number of times operators can be nested (#96) (@MilesCranmer) @@ -948,12 +952,13 @@ end [Diff since v0.8.3](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.8.3...v0.8.5) - **Closed 
issues:** + - [CLEANUP] Default settings (#72) - forcing variables to regression (#87) **Merged pull requests:** + - Autodiff for equations (#39) (@kazewong) - fix worker connection timeout error (#91) (@CharFox1) - Automatic multi-node compute setup by passing custom `addprocs` (#94) (@MilesCranmer) @@ -970,8 +975,8 @@ end [Diff since v0.8.1](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.8.1...v0.8.2) - **Closed issues:** + - Interactive regression / printing epochs (#80) # SymbolicRegression.jl v0.8.1 @@ -980,12 +985,13 @@ end [Diff since v0.7.13](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.7.13...v0.8.1) - **Closed issues:** + - [BUG] Domain errors (#71) - [Performance] Single evaluation results (#73) **Merged pull requests:** + - Refactoring PopMember + adding adaptive parsimony to tournament (#75) (@MilesCranmer) - Introduce better default hyperparameters (#76) (@MilesCranmer) @@ -1013,11 +1019,12 @@ end [Diff since v0.7.7](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.7.7...v0.7.8) - **Closed issues:** + - Tournament selection p (#68) **Merged pull requests:** + - Fix tournament samples (#70) (@MilesCranmer) # SymbolicRegression.jl v0.7.7 @@ -1032,8 +1039,8 @@ end [Diff since v0.7.5](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.7.5...v0.7.6) - **Closed issues:** + - Parsimony interference in pareto frontier (#66) - DomainError when computing pareto curve (#67) @@ -1049,8 +1056,8 @@ end [Diff since v0.7.3](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.7.3...v0.7.4) - **Closed issues:** + - Base.print (#64) # SymbolicRegression.jl v0.7.3 @@ -1071,9 +1078,8 @@ end [Diff since v0.7.0](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.7.0...v0.7.1) - - **Merged pull requests:** + - CompatHelper: bump compat for SpecialFunctions to 2, (keep existing compat) (#56) (@github-actions[bot]) # SymbolicRegression.jl v0.7.0 @@ -1082,11 +1088,12 @@ end [Diff 
since v0.6.19](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.6.19...v0.7.0) - **Closed issues:** + - Switching from Float to UInt8 ? (#58) **Merged pull requests:** + - Revert to SymbolicUtils.jl 0.6 (#60) (@MilesCranmer) # SymbolicRegression.jl v0.6.19 @@ -1107,8 +1114,8 @@ end [Diff since v0.6.16](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.6.16...v0.6.17) - **Closed issues:** + - Can't define options as listed in Tutorial, causes Method Error. (#54) - Using recorder to only track specific information? (#55) @@ -1118,9 +1125,8 @@ end [Diff since v0.6.15](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.6.15...v0.6.16) - - **Merged pull requests:** + - Expand compatibility to other SymbolicUtils.jl versions (#53) (@MilesCranmer) # SymbolicRegression.jl v0.6.15 @@ -1129,11 +1135,12 @@ end [Diff since v0.6.14](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.6.14...v0.6.15) - **Closed issues:** + - Unsatisfiable requirements detected for package SymbolicUtils (#51) **Merged pull requests:** + - SymbolicUtils v0.18 (#50) (@AlCap23) # SymbolicRegression.jl v0.6.14 @@ -1142,8 +1149,8 @@ end [Diff since v0.6.13](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.6.13...v0.6.14) - **Closed issues:** + - nested task error (#43) - MethodError: Cannot `convert` an object of type SymbolicUtils.Term{Number, Nothing} to an object of type SymbolicUtils.Pow{Number, SymbolicUtils.Term{Number, Nothing}, Float32, Nothing} (#44) @@ -1159,11 +1166,12 @@ end [Diff since v0.6.11](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.6.11...v0.6.12) - **Closed issues:** + - Options.npopulations = nothing, does not detect number of cores (#38) **Merged pull requests:** + - Fix index functions in SymbolicUtils (#40) (@MilesCranmer) # SymbolicRegression.jl v0.6.11 @@ -1172,9 +1180,8 @@ end [Diff since v0.6.10](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.6.10...v0.6.11) - - 
**Merged pull requests:** + - Updates for SymbolicUtils 0.13 (#37) (@AlCap23) # SymbolicRegression.jl v0.6.10 @@ -1183,11 +1190,12 @@ end [Diff since v0.6.9](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.6.9...v0.6.10) - **Closed issues:** -- Saving equations throughout runtime (#33) + +- Saving equations throughout runtime (#33) **Merged pull requests:** + - Add multithreading as alternative to distributed (#34) (@MilesCranmer) - Allow infinities in recorder (#36) (@cobac) @@ -1239,8 +1247,8 @@ end [Diff since v0.6.1](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.6.1...v0.6.2) - **Closed issues:** + - Data recorder (#27) - Long-running parallel jobs have small percentage of processes hang (#28) @@ -1250,8 +1258,6 @@ end [Diff since v0.6.0](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.6.0...v0.6.1) - - **Merged pull requests:** -- Recorder and improved tournament selection (#29) (@MilesCranmer) +- Recorder and improved tournament selection (#29) (@MilesCranmer) From ba2944f1ad61aefe0b2ba302c0928aca09e420e8 Mon Sep 17 00:00:00 2001 From: MilesCranmer Date: Sun, 20 Oct 2024 00:05:51 +0100 Subject: [PATCH 03/74] docs: make all headings subheadings --- CHANGELOG.md | 428 ++++++++++++++++++++++++++------------------------- 1 file changed, 215 insertions(+), 213 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 8c7170f12..ec25a0abb 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,6 +1,8 @@ -# SymbolicRegression.jl v1.0.0 +# Changelog -## Summary of major recent changes +## SymbolicRegression.jl v1.0.0 + +### Summary of major recent changes - Changed the core expression type from `Node{T} → Expression{T,Node{T},...}` - This gives us new features, improves user hackability, and greatly improves ergonomics! @@ -27,9 +29,9 @@ - Increased documentation and examples. - Julia 1.10 is now the minimum supported Julia version. 
-## Major Changes +### Major Changes -### **Breaking**: Changes default expressions from `Node` to the user-friendly `Expression` +#### **Breaking**: Changes default expressions from `Node` to the user-friendly `Expression` https://github.com/MilesCranmer/SymbolicRegression.jl/pull/326 @@ -45,7 +47,7 @@ operators = options.operators variable_names = ["x1", "x2", "x3"] x1, x2, x3 = [Expression(Node(Float64; feature=i); operators, variable_names) for i=1:3] -# Use the operators directly! +## Use the operators directly! tree = cos(x1 - 3.2 * x2) - x1 * x1 ``` @@ -64,13 +66,13 @@ Each time you use an operator on or between two `Expression`s that include the o You can access the tree with `get_tree` (guaranteed to return a `Node`), or `get_contents` – which returns the full info of an `AbstractExpression`, which might contain multiple expressions (which get stitched together when calling `get_tree`). -### Customizing behavior +#### Customizing behavior DynamicExpressions v1.0 has a full `AbstractExpression` interface to customize behavior of pretty much anything. As an example, there is this included `ParametricExpression` type, with an example available in `examples/parametrized_function.jl`. You can use this to find _basis functions_ with per-class parameters. It still needs some tuning but it works for simple examples. This `ParametricExpression` is meant partly as an example of the types of things you can do with the new `AbstractExpression` interface, though it should hopefully be a useful feature by itself. -### Auto-diff within optimization +#### Auto-diff within optimization Historically, SymbolicRegression has mostly relied on finite differences to estimate derivatives – which actually works well for small numbers of parameters. This is what Optim.jl selects unless you can provide it with gradients. @@ -92,7 +94,7 @@ Options( for Enzyme.jl (though Enzyme support is highly experimental). 
-## Other Changes +### Other Changes - Implement tree rotation operator by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/348 - This seems to help search performance overall – the new mutation is available as `rotate_tree` in the weights – which has been set to a default 0.3. @@ -101,14 +103,14 @@ for Enzyme.jl (though Enzyme support is highly experimental). - fix typos by @spaette in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/331 - chore(deps): bump peter-evans/create-pull-request from 6 to 7 by @dependabot in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/343 -## New Contributors +### New Contributors - @spaette made their first contribution in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/331 - Thanks to @larsentom for the mutation idea **Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.24.5...v1.0.0-beta1 -# SymbolicRegression.jl v1.0.0-beta1 +## SymbolicRegression.jl v1.0.0-beta1 This is a **_beta release_** that is not yet registered. To try it out, open a Julia REPL and hit `]`, then: @@ -118,9 +120,9 @@ pkg> add SymbolicRegression#v1.0.0-beta1 Before the final release of v1.0.0, the hyperparameters will be re-tuned to optimize the new mutations: `swap_operands` and `rotate_tree`, which seem to be quite effective. -## Major Changes +### Major Changes -### **Breaking**: Changes default expressions from `Node` to the user-friendly `Expression` +#### **Breaking**: Changes default expressions from `Node` to the user-friendly `Expression` https://github.com/MilesCranmer/SymbolicRegression.jl/pull/326 @@ -136,7 +138,7 @@ operators = options.operators variable_names = ["x1", "x2", "x3"] x1, x2, x3 = [Expression(Node(Float64; feature=i); operators, variable_names) for i=1:3] -# Use the operators directly! +## Use the operators directly! 
tree = cos(x1 - 3.2 * x2) - x1 * x1 ``` @@ -155,13 +157,13 @@ Each time you use an operator on or between two `Expression`s that include the o You can access the tree with `get_tree` (guaranteed to return a `Node`), or `get_contents` – which returns the full info of an `AbstractExpression`, which might contain multiple expressions (which get stitched together when calling `get_tree`). -### Customizing behavior +#### Customizing behavior DynamicExpressions v1.0 has a full `AbstractExpression` interface to customize behavior of pretty much anything. As an example, there is this included `ParametricExpression` type, with an example available in `examples/parametrized_function.jl`. You can use this to find _basis functions_ with per-class parameters. It still needs some tuning but it works for simple examples. This `ParametricExpression` is meant partly as an example of the types of things you can do with the new `AbstractExpression` interface, though it should hopefully be a useful feature by itself. -### Auto-diff within optimization +#### Auto-diff within optimization Historically, SymbolicRegression has mostly relied on finite differences to estimate derivatives – which actually works well for small numbers of parameters. This is what Optim.jl selects unless you can provide it with gradients. @@ -183,7 +185,7 @@ Options( for Enzyme.jl (though Enzyme support is highly experimental). -## Other Changes +### Other Changes - Implement tree rotation operator by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/348 - This seems to help search performance overall – the new mutation is available as `rotate_tree` in the weights – which has been set to a default 0.3. @@ -192,16 +194,16 @@ for Enzyme.jl (though Enzyme support is highly experimental). 
- fix typos by @spaette in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/331 - chore(deps): bump peter-evans/create-pull-request from 6 to 7 by @dependabot in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/343 -## New Contributors +### New Contributors - @spaette made their first contribution in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/331 - Thanks to @larsentom for the mutation idea **Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.24.5...v1.0.0-beta1 -# SymbolicRegression.jl v0.24.5 +## SymbolicRegression.jl v0.24.5 -## SymbolicRegression v0.24.5 +### SymbolicRegression v0.24.5 [Diff since v0.24.4](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.24.4...v0.24.5) @@ -216,9 +218,9 @@ for Enzyme.jl (though Enzyme support is highly experimental). - Allow per-variable complexity (#324) (@MilesCranmer) - Refactor tests to use TestItems.jl (#325) (@MilesCranmer) -# SymbolicRegression.jl v0.24.4 +## SymbolicRegression.jl v0.24.4 -## SymbolicRegression v0.24.4 +### SymbolicRegression v0.24.4 [Diff since v0.24.3](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.24.3...v0.24.4) @@ -229,9 +231,9 @@ for Enzyme.jl (though Enzyme support is highly experimental). - refactor: remove unused Tricks dependency (#309) (@MilesCranmer) - Add option to force dimensionless constants (#310) (@MilesCranmer) -# SymbolicRegression.jl v0.24.3 +## SymbolicRegression.jl v0.24.3 -## SymbolicRegression v0.24.3 +### SymbolicRegression v0.24.3 [Diff since v0.24.2](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.24.2...v0.24.3) @@ -243,9 +245,9 @@ for Enzyme.jl (though Enzyme support is highly experimental). 
- Silence warnings for Optim.jl (#255) -# SymbolicRegression.jl v0.24.2 +## SymbolicRegression.jl v0.24.2 -## SymbolicRegression v0.24.2 +### SymbolicRegression v0.24.2 [Diff since v0.24.1](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.24.1...v0.24.2) @@ -261,9 +263,9 @@ for Enzyme.jl (though Enzyme support is highly experimental). - API Overhaul (#187) - [Feature]: Training on high dimensions X (#299) -# SymbolicRegression.jl v0.24.1 +## SymbolicRegression.jl v0.24.1 -## What's Changed +### What's Changed - CompatHelper: bump compat for MLJModelInterface to 1.9, (keep existing compat) by @github-actions in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/295 - CompatHelper: bump compat for ProgressBars to 1, (keep existing compat) by @github-actions in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/294 @@ -272,9 +274,9 @@ for Enzyme.jl (though Enzyme support is highly experimental). **Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.24.0...v0.24.1 -# SymbolicRegression.jl v0.24.0 +## SymbolicRegression.jl v0.24.0 -## What's Changed +### What's Changed - Experimental support for program synthesis / graph-like expressions instead of trees (https://github.com/MilesCranmer/SymbolicRegression.jl/pull/271) - **BREAKING**: many types now have a third type parameter, declaring the type of node. For example, `PopMember{T,L}` is now `PopMember{T,L,N}` for `N` the type of expression. 
@@ -315,9 +317,9 @@ end **Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.23.3...v0.24.0 -# SymbolicRegression.jl v0.23.3 +## SymbolicRegression.jl v0.23.3 -## SymbolicRegression v0.23.3 +### SymbolicRegression v0.23.3 [Diff since v0.23.2](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.23.2...v0.23.3) @@ -327,9 +329,9 @@ end - Bump peter-evans/find-comment from 2 to 3 (#284) (@dependabot[bot]) - Bump peter-evans/create-pull-request from 5 to 6 (#286) (@dependabot[bot]) -# SymbolicRegression.jl v0.23.2 +## SymbolicRegression.jl v0.23.2 -## SymbolicRegression v0.23.2 +### SymbolicRegression v0.23.2 [Diff since v0.23.1](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.23.1...v0.23.2) @@ -344,21 +346,21 @@ end - Garbage collection too passive on worker processes (#237) - How can I set the maximum number of nests? (#285) -# SymbolicRegression.jl v0.23.1 +## SymbolicRegression.jl v0.23.1 -## What's Changed +### What's Changed - Implement swap operands mutation for binary operators by @foxtran in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/276 -## New Contributors +### New Contributors - @foxtran made their first contribution in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/276 **Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.23.0...v0.23.1 -# SymbolicRegression.jl v0.23.0 +## SymbolicRegression.jl v0.23.0 -## SymbolicRegression v0.23.0 +### SymbolicRegression v0.23.0 [Diff since v0.22.5](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.22.5...v0.23.0) @@ -370,9 +372,9 @@ end - How do I set up a basis function consisting of three different inputs x, y, z? 
(#268) -# SymbolicRegression.jl v0.22.5 +## SymbolicRegression.jl v0.22.5 -## SymbolicRegression v0.22.5 +### SymbolicRegression v0.22.5 [Diff since v0.22.4](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.22.4...v0.22.5) @@ -383,9 +385,9 @@ end - Add `[compat]` entry for Documenter (#261) (@MilesCranmer) - CompatHelper: bump compat for DynamicQuantities to 0.10 (#264) (@github-actions[bot]) -# SymbolicRegression.jl v0.22.4 +## SymbolicRegression.jl v0.22.4 -## SymbolicRegression v0.22.4 +### SymbolicRegression v0.22.4 [Diff since v0.22.3](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.22.3...v0.22.4) @@ -394,9 +396,9 @@ end - Hotfix for breaking change in Optim.jl (#256) (@MilesCranmer) - Fix worldage issues by avoiding `static_hasmethod` when not needed (#258) (@MilesCranmer) -# SymbolicRegression.jl v0.22.3 +## SymbolicRegression.jl v0.22.3 -## What's Changed +### What's Changed - CompatHelper: bump compat for DynamicExpressions to 0.13, (keep existing compat) by @github-actions in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/250 - Fix type stability of deterministic mode by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/251 @@ -406,9 +408,9 @@ end **Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.22.2...v0.22.3 -# SymbolicRegression.jl v0.22.2 +## SymbolicRegression.jl v0.22.2 -## SymbolicRegression v0.22.2 +### SymbolicRegression v0.22.2 [Diff since v0.22.1](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.22.1...v0.22.2) @@ -417,15 +419,15 @@ end - Expand aqua test suite (#246) (@MilesCranmer) - Return more descriptive errors for poorly defined operators (#247) (@MilesCranmer) -# SymbolicRegression.jl v0.22.1 +## SymbolicRegression.jl v0.22.1 -## SymbolicRegression v0.22.1 +### SymbolicRegression v0.22.1 [Diff since v0.22.0](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.22.0...v0.22.1) -# SymbolicRegression.jl 
v0.22.0 +## SymbolicRegression.jl v0.22.0 -## What's Changed +### What's Changed - (**Algorithm modification**) Evaluate on fixed batch when building per-population hall of fame in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/243 - This only affects searches that use `batching=true`. It results in improved searches on large datasets, as the "winning expression" is not biased towards an expression that landed on a lucky batch. @@ -445,17 +447,17 @@ end **Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.21.5...v0.22.0 -# SymbolicRegression.jl v0.21.5 +## SymbolicRegression.jl v0.21.5 -## What's Changed +### What's Changed - Allow custom display variable names by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/240 **Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.21.4...v0.21.5 -# SymbolicRegression.jl v0.21.4 +## SymbolicRegression.jl v0.21.4 -## SymbolicRegression v0.21.4 +### SymbolicRegression v0.21.4 [Diff since v0.21.3](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.21.3...v0.21.4) @@ -468,34 +470,34 @@ end - CompatHelper: bump compat for LossFunctions to 0.11, (keep existing compat) (#238) (@github-actions[bot]) - Enable compatibility with MLJTuning.jl (#239) (@MilesCranmer) -# SymbolicRegression.jl v0.21.3 +## SymbolicRegression.jl v0.21.3 -## What's Changed +### What's Changed - Batching inside optimization loop + batching support for custom objectives by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/235 **Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.21.2...v0.21.3 -# SymbolicRegression.jl v0.21.2 +## SymbolicRegression.jl v0.21.2 -## What's Changed +### What's Changed - Allow empty string units (==1) by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/233 **Full Changelog**: 
https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.21.1...v0.21.2 -# SymbolicRegression.jl v0.21.1 +## SymbolicRegression.jl v0.21.1 -## What's Changed +### What's Changed - Update DynamicExpressions.jl version by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/232 - Makes Zygote.jl an extension **Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.21.0...v0.21.1 -# SymbolicRegression.jl v0.21.0 +## SymbolicRegression.jl v0.21.0 -## What's Changed +### What's Changed - https://github.com/MilesCranmer/SymbolicRegression.jl/pull/228 and https://github.com/MilesCranmer/SymbolicRegression.jl/pull/230 and https://github.com/MilesCranmer/SymbolicRegression.jl/pull/231 - **Dimensional analysis** (#228) @@ -518,9 +520,9 @@ end **Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.20.0...v0.21.0 -# SymbolicRegression.jl v0.20.0 +## SymbolicRegression.jl v0.20.0 -## SymbolicRegression v0.20.0 +### SymbolicRegression v0.20.0 [Diff since v0.19.1](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.19.1...v0.20.0) @@ -532,9 +534,9 @@ end - MLJ Integration (#226) (@MilesCranmer, @OkonSamuel) -# SymbolicRegression.jl v0.19.1 +## SymbolicRegression.jl v0.19.1 -## SymbolicRegression v0.19.1 +### SymbolicRegression v0.19.1 [Diff since v0.19.0](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.19.0...v0.19.1) @@ -545,9 +547,9 @@ end - (Soft deprecation) rename `EquationSearch` to `equation_search` (#222) (@MilesCranmer) - Fix equation splitting for unicode variables (#223) (@MilesCranmer) -# SymbolicRegression.jl v0.19.0 +## SymbolicRegression.jl v0.19.0 -## What's Changed +### What's Changed - Time to load improved by 40% with the following changes in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/215 - Moved SymbolicUtils.jl to extension/Requires.jl @@ -556,9 +558,9 @@ end **Full Changelog**: 
https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.18.0...v0.19.0 -# SymbolicRegression.jl v0.18.0 +## SymbolicRegression.jl v0.18.0 -## SymbolicRegression v0.18.0 +### SymbolicRegression v0.18.0 [Diff since v0.17.1](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.17.1...v0.18.0) @@ -569,9 +571,9 @@ end - Show expressions evaluated per second (#209) (@MilesCranmer) - Cache complexity of expressions whenever possible (#210) (@MilesCranmer) -# SymbolicRegression.jl v0.17.1 +## SymbolicRegression.jl v0.17.1 -## SymbolicRegression v0.17.1 +### SymbolicRegression v0.17.1 [Diff since v0.17.0](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.17.0...v0.17.1) @@ -580,9 +582,9 @@ end - Faster custom losses (#197) (@MilesCranmer) - Migrate from SnoopPrecompile to PrecompileTools (#198) (@timholy) -# SymbolicRegression.jl v0.17.0 +## SymbolicRegression.jl v0.17.0 -## SymbolicRegression v0.17.0 +### SymbolicRegression v0.17.0 [Diff since v0.16.3](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.16.3...v0.17.0) @@ -595,9 +597,9 @@ end - Multiple refactors: arbitrary data in `Dataset`, separate mutation weight conditioning, fix data races, cleaner API (#190) (@MilesCranmer) - CompatHelper: bump compat for DynamicExpressions to 0.6, (keep existing compat) (#194) (@github-actions[bot]) -# SymbolicRegression.jl v0.16.3 +## SymbolicRegression.jl v0.16.3 -## SymbolicRegression v0.16.3 +### SymbolicRegression v0.16.3 [Diff since v0.16.2](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.16.2...v0.16.3) @@ -605,21 +607,21 @@ end - CompatHelper: bump compat for SymbolicUtils to 1, (keep existing compat) (#168) (@github-actions[bot]) -# SymbolicRegression.jl v0.16.2 +## SymbolicRegression.jl v0.16.2 -## What's Changed +### What's Changed - Turn off simplification when constraints given by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/189 **Full Changelog**: 
https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.16.1...v0.16.2 -# SymbolicRegression.jl v0.16.1 +## SymbolicRegression.jl v0.16.1 **Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.16.0...v0.16.1 -# SymbolicRegression.jl v0.16.0 +## SymbolicRegression.jl v0.16.0 -## SymbolicRegression v0.16.0 +### SymbolicRegression v0.16.0 [Diff since v0.15.3](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.15.3...v0.16.0) @@ -635,43 +637,43 @@ end - Abstract number support (#183) (@MilesCranmer) - Include datetime in default filename (#185) (@MilesCranmer) -# SymbolicRegression.jl v0.15.3 +## SymbolicRegression.jl v0.15.3 -## What's Changed +### What's Changed - Re-compute losses for warm start by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/177 **Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.15.2...v0.15.3 -# SymbolicRegression.jl v0.15.2 +## SymbolicRegression.jl v0.15.2 -## What's Changed +### What's Changed - Include depth check in `check_constraints` by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/172 - Fix data race in state saving by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/173 **Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.15.1...v0.15.2 -# SymbolicRegression.jl v0.15.1 +## SymbolicRegression.jl v0.15.1 -## What's Changed +### What's Changed - Fix bug in constraint checking by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/171 **Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.15.0...v0.15.1 -# SymbolicRegression.jl v0.15.0 +## SymbolicRegression.jl v0.15.0 -## What's Changed +### What's Changed - Fully-customizable training objectives by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/143 - Safely catch non-readable stdin stream by @MilesCranmer 
in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/169 **Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.14.5...v0.15.0 -# SymbolicRegression.jl v0.14.5 +## SymbolicRegression.jl v0.14.5 -## SymbolicRegression v0.14.5 +### SymbolicRegression v0.14.5 [Diff since v0.14.4](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.14.4...v0.14.5) @@ -684,9 +686,9 @@ end - Quiet progress bar during CI (#160) (@MilesCranmer) - Proper SnoopCompilation (#161) (@MilesCranmer) -# SymbolicRegression.jl v0.14.4 +## SymbolicRegression.jl v0.14.4 -## SymbolicRegression v0.14.4 +### SymbolicRegression v0.14.4 [Diff since v0.14.3](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.14.3...v0.14.4) @@ -694,9 +696,9 @@ end - Refactor monitoring of resources (#158) (@MilesCranmer) -# SymbolicRegression.jl v0.14.3 +## SymbolicRegression.jl v0.14.3 -## SymbolicRegression v0.14.3 +### SymbolicRegression v0.14.3 [Diff since v0.14.2](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.14.2...v0.14.3) @@ -705,15 +707,15 @@ end - Turn off safe operators for turbo=true (#156) (@MilesCranmer) - Use `ProgressBars.jl` instead of copying (#157) (@MilesCranmer) -# SymbolicRegression.jl v0.14.2 +## SymbolicRegression.jl v0.14.2 -## SymbolicRegression v0.14.2 +### SymbolicRegression v0.14.2 [Diff since v0.14.1](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.14.1...v0.14.2) -# SymbolicRegression.jl v0.14.1 +## SymbolicRegression.jl v0.14.1 -## SymbolicRegression v0.14.1 +### SymbolicRegression v0.14.1 [Diff since v0.14.0](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.14.0...v0.14.1) @@ -721,9 +723,9 @@ end - Do optimizations as a low-probability mutation (#154) (@MilesCranmer) -# SymbolicRegression.jl v0.14.0 +## SymbolicRegression.jl v0.14.0 -## SymbolicRegression v0.14.0 +### SymbolicRegression v0.14.0 [Diff since 
v0.13.3](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.13.3...v0.14.0) @@ -731,9 +733,9 @@ end - Add `@extend_operators` from DynamicExpressions.jl v0.4.0 (#153) (@MilesCranmer) -# SymbolicRegression.jl v0.13.3 +## SymbolicRegression.jl v0.13.3 -## SymbolicRegression v0.13.3 +### SymbolicRegression v0.13.3 [Diff since v0.13.1](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.13.1...v0.13.3) @@ -741,15 +743,15 @@ end - 30% speed up by using LoopVectorization in DynamicExpressions.jl (#151) (@MilesCranmer) -# SymbolicRegression.jl v0.13.2 +## SymbolicRegression.jl v0.13.2 - Allow strings to be passed for the `parallelism` argument of EquationSearch (e.g., `"multithreading"` instead of `:multithreading`). This is to allow compatibility with PyJulia calls, which can't pass symbols. **Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.13.1...v0.13.2 -# SymbolicRegression.jl v0.13.1 +## SymbolicRegression.jl v0.13.1 -## SymbolicRegression v0.13.1 +### SymbolicRegression v0.13.1 [Diff since v0.13.0](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.13.0...v0.13.1) @@ -757,9 +759,9 @@ end - Refactor mutation probabilities (#140) (@MilesCranmer) -# SymbolicRegression.jl v0.13.0 +## SymbolicRegression.jl v0.13.0 -## SymbolicRegression v0.13.0 +### SymbolicRegression v0.13.0 [Diff since v0.12.6](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.12.6...v0.13.0) @@ -767,9 +769,9 @@ end - Split codebase in two: DynamicExpressions.jl and SymbolicRegression.jl (#147) (@MilesCranmer) -# SymbolicRegression.jl v0.12.6 +## SymbolicRegression.jl v0.12.6 -## SymbolicRegression v0.12.6 +### SymbolicRegression v0.12.6 [Diff since v0.12.5](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.12.5...v0.12.6) @@ -782,15 +784,15 @@ end - Fix search performance problem #148 (#149) (@MilesCranmer) -# SymbolicRegression.jl v0.12.5 +## SymbolicRegression.jl v0.12.5 -## SymbolicRegression 
v0.12.5 +### SymbolicRegression v0.12.5 [Diff since v0.12.4](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.12.4...v0.12.5) -# SymbolicRegression.jl v0.12.4 +## SymbolicRegression.jl v0.12.4 -## SymbolicRegression v0.12.4 +### SymbolicRegression v0.12.4 [Diff since v0.12.3](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.12.3...v0.12.4) @@ -798,9 +800,9 @@ end - Create logo (#145) (@MilesCranmer) -# SymbolicRegression.jl v0.12.3 +## SymbolicRegression.jl v0.12.3 -## SymbolicRegression v0.12.3 +### SymbolicRegression v0.12.3 [Diff since v0.12.2](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.12.2...v0.12.3) @@ -808,9 +810,9 @@ end - Even faster evaluation (#144) (@MilesCranmer) -# SymbolicRegression.jl v0.12.2 +## SymbolicRegression.jl v0.12.2 -## SymbolicRegression v0.12.2 +### SymbolicRegression v0.12.2 [Diff since v0.12.1](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.12.1...v0.12.2) @@ -822,15 +824,15 @@ end - Fast evaluation for constant trees (#129) (@MilesCranmer) -# SymbolicRegression.jl v0.12.1 +## SymbolicRegression.jl v0.12.1 -## SymbolicRegression v0.12.1 +### SymbolicRegression v0.12.1 [Diff since v0.12.0](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.12.0...v0.12.1) -# SymbolicRegression.jl v0.12.0 +## SymbolicRegression.jl v0.12.0 -## What's Changed +### What's Changed - Use functions returning NaN on branch cuts instead of abs (issue #109) by @johanbluecreek in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/123 - By returning NaN, an expression will have infinite loss - this will make the expression search simply avoid expressions that hit out-of-domain errors, rather than using `abs` everywhere which results in fundamentally different functional forms. 
@@ -839,9 +841,9 @@ end

 **Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.11.1...v0.12.0

-# SymbolicRegression.jl v0.11.1
+## SymbolicRegression.jl v0.11.1

-## What's Changed
+### What's Changed

 - Generalize expressions to have arbitrary constant types by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/119
 - Optimizer options by @johanbluecreek in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/121
@@ -849,15 +851,15 @@ end
 - Fix normalization when dataset has zero variance: https://github.com/MilesCranmer/SymbolicRegression.jl/commit/85f4909e8156ba8ff6cf89122371901a13df5688
 - Set default parsimony to 0.0

-## New Contributors
+### New Contributors

 - @johanbluecreek made their first contribution in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/121

 **Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.10.2...v0.11.1

-# SymbolicRegression.jl v0.10.2
+## SymbolicRegression.jl v0.10.2

-## SymbolicRegression v0.10.2
+### SymbolicRegression v0.10.2

 [Diff since v0.9.7](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.9.7...v0.10.2)

@@ -866,30 +868,30 @@ end
 - Update losses.md (#114) (@pitmonticone)
 - Set `timeout-minutes` for CI (#116) (@rikhuijzer)

-# SymbolicRegression.jl v0.9.7
+## SymbolicRegression.jl v0.9.7

-## SymbolicRegression v0.9.7
+### SymbolicRegression v0.9.7

 [Diff since v0.9.6](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.9.6...v0.9.7)

-# SymbolicRegression.jl v0.9.6
+## SymbolicRegression.jl v0.9.6

-## SymbolicRegression v0.9.6
+### SymbolicRegression v0.9.6

 [Diff since v0.9.5](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.9.5...v0.9.6)

-# SymbolicRegression.jl v0.9.5
+## SymbolicRegression.jl v0.9.5

-## What's Changed
+### What's Changed

 - Add deterministic option in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/108
 - Fix issue with infinite while loop due to numerical precision

 **Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.9.3...v0.9.5

-# SymbolicRegression.jl v0.9.3
+## SymbolicRegression.jl v0.9.3

-## SymbolicRegression v0.9.3
+### SymbolicRegression v0.9.3

 [Diff since v0.9.2](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.9.2...v0.9.3)

@@ -897,9 +899,9 @@ end

 - CompatHelper: bump compat for LossFunctions to 0.8, (keep existing compat) (#106) (@github-actions[bot])

-# SymbolicRegression.jl v0.9.2
+## SymbolicRegression.jl v0.9.2

-## SymbolicRegression v0.9.2
+### SymbolicRegression v0.9.2

 [Diff since v0.9.0](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.9.0...v0.9.2)

@@ -915,9 +917,9 @@ end
 - Limiting max evaluations (#104) (@MilesCranmer)
 - Custom complexities of operators, variables, and constants (#105) (@MilesCranmer)

-# SymbolicRegression.jl v0.9.0
+## SymbolicRegression.jl v0.9.0

-## SymbolicRegression v0.9.0
+### SymbolicRegression v0.9.0

 [Diff since v0.8.7](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.8.7...v0.9.0)

@@ -929,15 +931,15 @@ end

 - Bump SymbolicUtils.jl to 0.19 (#84) (@ChrisRackauckas)

-# SymbolicRegression.jl v0.8.7
+## SymbolicRegression.jl v0.8.7

-## SymbolicRegression v0.8.7
+### SymbolicRegression v0.8.7

 [Diff since v0.8.6](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.8.6...v0.8.7)

-# SymbolicRegression.jl v0.8.6
+## SymbolicRegression.jl v0.8.6

-## SymbolicRegression v0.8.6
+### SymbolicRegression v0.8.6

 [Diff since v0.8.5](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.8.5...v0.8.6)

@@ -946,9 +948,9 @@ end
 - Switch from FromFile.jl to traditional module system (#95) (@MilesCranmer)
 - Add constraints on the number of times operators can be nested (#96) (@MilesCranmer)

-# SymbolicRegression.jl v0.8.5
+## SymbolicRegression.jl v0.8.5

-## SymbolicRegression v0.8.5
+### SymbolicRegression v0.8.5

 [Diff since v0.8.3](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.8.3...v0.8.5)

@@ -963,15 +965,15 @@ end
 - fix worker connection timeout error (#91) (@CharFox1)
 - Automatic multi-node compute setup by passing custom `addprocs` (#94) (@MilesCranmer)

-# SymbolicRegression.jl v0.8.3
+## SymbolicRegression.jl v0.8.3

-## SymbolicRegression v0.8.3
+### SymbolicRegression v0.8.3

 [Diff since v0.8.2](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.8.2...v0.8.3)

-# SymbolicRegression.jl v0.8.2
+## SymbolicRegression.jl v0.8.2

-## SymbolicRegression v0.8.2
+### SymbolicRegression v0.8.2

 [Diff since v0.8.1](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.8.1...v0.8.2)

@@ -979,9 +981,9 @@ end

 - Interactive regression / printing epochs (#80)

-# SymbolicRegression.jl v0.8.1
+## SymbolicRegression.jl v0.8.1

-## SymbolicRegression v0.8.1
+### SymbolicRegression v0.8.1

 [Diff since v0.7.13](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.7.13...v0.8.1)

@@ -995,27 +997,27 @@ end
 - Refactoring PopMember + adding adaptive parsimony to tournament (#75) (@MilesCranmer)
 - Introduce better default hyperparameters (#76) (@MilesCranmer)

-# SymbolicRegression.jl v0.7.13
+## SymbolicRegression.jl v0.7.13

-## SymbolicRegression v0.7.13
+### SymbolicRegression v0.7.13

 [Diff since v0.7.10](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.7.10...v0.7.13)

-# SymbolicRegression.jl v0.7.10
+## SymbolicRegression.jl v0.7.10

-## SymbolicRegression v0.7.10
+### SymbolicRegression v0.7.10

 [Diff since v0.7.9](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.7.9...v0.7.10)

-# SymbolicRegression.jl v0.7.9
+## SymbolicRegression.jl v0.7.9

-## SymbolicRegression v0.7.9
+### SymbolicRegression v0.7.9

 [Diff since v0.7.8](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.7.8...v0.7.9)

-# SymbolicRegression.jl v0.7.8
+## SymbolicRegression.jl v0.7.8

-## SymbolicRegression v0.7.8
+### SymbolicRegression v0.7.8

 [Diff since v0.7.7](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.7.7...v0.7.8)

@@ -1027,15 +1029,15 @@ end

 - Fix tournament samples (#70) (@MilesCranmer)

-# SymbolicRegression.jl v0.7.7
+## SymbolicRegression.jl v0.7.7

-## SymbolicRegression v0.7.7
+### SymbolicRegression v0.7.7

 [Diff since v0.7.6](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.7.6...v0.7.7)

-# SymbolicRegression.jl v0.7.6
+## SymbolicRegression.jl v0.7.6

-## SymbolicRegression v0.7.6
+### SymbolicRegression v0.7.6

 [Diff since v0.7.5](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.7.5...v0.7.6)

@@ -1044,15 +1046,15 @@ end
 - Parsimony interference in pareto frontier (#66)
 - DomainError when computing pareto curve (#67)

-# SymbolicRegression.jl v0.7.5
+## SymbolicRegression.jl v0.7.5

-## SymbolicRegression v0.7.5
+### SymbolicRegression v0.7.5

 [Diff since v0.7.4](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.7.4...v0.7.5)

-# SymbolicRegression.jl v0.7.4
+## SymbolicRegression.jl v0.7.4

-## SymbolicRegression v0.7.4
+### SymbolicRegression v0.7.4

 [Diff since v0.7.3](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.7.3...v0.7.4)

@@ -1060,21 +1062,21 @@ end

 - Base.print (#64)

-# SymbolicRegression.jl v0.7.3
+## SymbolicRegression.jl v0.7.3

-## SymbolicRegression v0.7.3
+### SymbolicRegression v0.7.3

 [Diff since v0.7.2](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.7.2...v0.7.3)

-# SymbolicRegression.jl v0.7.2
+## SymbolicRegression.jl v0.7.2

-## SymbolicRegression v0.7.2
+### SymbolicRegression v0.7.2

 [Diff since v0.7.1](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.7.1...v0.7.2)

-# SymbolicRegression.jl v0.7.1
+## SymbolicRegression.jl v0.7.1

-## SymbolicRegression v0.7.1
+### SymbolicRegression v0.7.1

 [Diff since v0.7.0](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.7.0...v0.7.1)

@@ -1082,9 +1084,9 @@ end

 - CompatHelper: bump compat for SpecialFunctions to 2, (keep existing compat) (#56) (@github-actions[bot])

-# SymbolicRegression.jl v0.7.0
+## SymbolicRegression.jl v0.7.0

-## SymbolicRegression v0.7.0
+### SymbolicRegression v0.7.0

 [Diff since v0.6.19](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.6.19...v0.7.0)

@@ -1096,21 +1098,21 @@ end

 - Revert to SymbolicUtils.jl 0.6 (#60) (@MilesCranmer)

-# SymbolicRegression.jl v0.6.19
+## SymbolicRegression.jl v0.6.19

-## SymbolicRegression v0.6.19
+### SymbolicRegression v0.6.19

 [Diff since v0.6.18](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.6.18...v0.6.19)

-# SymbolicRegression.jl v0.6.18
+## SymbolicRegression.jl v0.6.18

-## SymbolicRegression v0.6.18
+### SymbolicRegression v0.6.18

 [Diff since v0.6.17](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.6.17...v0.6.18)

-# SymbolicRegression.jl v0.6.17
+## SymbolicRegression.jl v0.6.17

-## SymbolicRegression v0.6.17
+### SymbolicRegression v0.6.17

 [Diff since v0.6.16](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.6.16...v0.6.17)

@@ -1119,9 +1121,9 @@ end
 - Can't define options as listed in Tutorial, causes Method Error. (#54)
 - Using recorder to only track specific information? (#55)

-# SymbolicRegression.jl v0.6.16
+## SymbolicRegression.jl v0.6.16

-## SymbolicRegression v0.6.16
+### SymbolicRegression v0.6.16

 [Diff since v0.6.15](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.6.15...v0.6.16)

@@ -1129,9 +1131,9 @@ end

 - Expand compatibility to other SymbolicUtils.jl versions (#53) (@MilesCranmer)

-# SymbolicRegression.jl v0.6.15
+## SymbolicRegression.jl v0.6.15

-## SymbolicRegression v0.6.15
+### SymbolicRegression v0.6.15

 [Diff since v0.6.14](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.6.14...v0.6.15)

@@ -1143,9 +1145,9 @@ end

 - SymbolicUtils v0.18 (#50) (@AlCap23)

-# SymbolicRegression.jl v0.6.14
+## SymbolicRegression.jl v0.6.14

-## SymbolicRegression v0.6.14
+### SymbolicRegression v0.6.14

 [Diff since v0.6.13](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.6.13...v0.6.14)

@@ -1154,15 +1156,15 @@ end
 - nested task error (#43)
 - MethodError: Cannot `convert` an object of type SymbolicUtils.Term{Number, Nothing} to an object of type SymbolicUtils.Pow{Number, SymbolicUtils.Term{Number, Nothing}, Float32, Nothing} (#44)

-# SymbolicRegression.jl v0.6.13
+## SymbolicRegression.jl v0.6.13

-## SymbolicRegression v0.6.13
+### SymbolicRegression v0.6.13

 [Diff since v0.6.12](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.6.12...v0.6.13)

-# SymbolicRegression.jl v0.6.12
+## SymbolicRegression.jl v0.6.12

-## SymbolicRegression v0.6.12
+### SymbolicRegression v0.6.12

 [Diff since v0.6.11](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.6.11...v0.6.12)

@@ -1174,9 +1176,9 @@ end

 - Fix index functions in SymbolicUtils (#40) (@MilesCranmer)

-# SymbolicRegression.jl v0.6.11
+## SymbolicRegression.jl v0.6.11

-## SymbolicRegression v0.6.11
+### SymbolicRegression v0.6.11

 [Diff since v0.6.10](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.6.10...v0.6.11)

@@ -1184,9 +1186,9 @@ end

 - Updates for SymbolicUtils 0.13 (#37) (@AlCap23)

-# SymbolicRegression.jl v0.6.10
+## SymbolicRegression.jl v0.6.10

-## SymbolicRegression v0.6.10
+### SymbolicRegression v0.6.10

 [Diff since v0.6.9](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.6.9...v0.6.10)

@@ -1199,51 +1201,51 @@ end
 - Add multithreading as alternative to distributed (#34) (@MilesCranmer)
 - Allow infinities in recorder (#36) (@cobac)

-# SymbolicRegression.jl v0.6.9
+## SymbolicRegression.jl v0.6.9

-## SymbolicRegression v0.6.9
+### SymbolicRegression v0.6.9

 [Diff since v0.6.8](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.6.8...v0.6.9)

-# SymbolicRegression.jl v0.6.8
+## SymbolicRegression.jl v0.6.8

-## SymbolicRegression v0.6.8
+### SymbolicRegression v0.6.8

 [Diff since v0.6.7](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.6.7...v0.6.8)

-# SymbolicRegression.jl v0.6.7
+## SymbolicRegression.jl v0.6.7

-## SymbolicRegression v0.6.7
+### SymbolicRegression v0.6.7

 [Diff since v0.6.6](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.6.6...v0.6.7)

-# SymbolicRegression.jl v0.6.6
+## SymbolicRegression.jl v0.6.6

-## SymbolicRegression v0.6.6
+### SymbolicRegression v0.6.6

 [Diff since v0.6.5](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.6.5...v0.6.6)

-# SymbolicRegression.jl v0.6.5
+## SymbolicRegression.jl v0.6.5

-## SymbolicRegression v0.6.5
+### SymbolicRegression v0.6.5

 [Diff since v0.6.4](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.6.4...v0.6.5)

-# SymbolicRegression.jl v0.6.4
+## SymbolicRegression.jl v0.6.4

-## SymbolicRegression v0.6.4
+### SymbolicRegression v0.6.4

 [Diff since v0.6.3](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.6.3...v0.6.4)

-# SymbolicRegression.jl v0.6.3
+## SymbolicRegression.jl v0.6.3

-## SymbolicRegression v0.6.3
+### SymbolicRegression v0.6.3

 [Diff since v0.6.2](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.6.2...v0.6.3)

-# SymbolicRegression.jl v0.6.2
+## SymbolicRegression.jl v0.6.2

-## SymbolicRegression v0.6.2
+### SymbolicRegression v0.6.2

 [Diff since v0.6.1](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.6.1...v0.6.2)

@@ -1252,9 +1254,9 @@ end
 - Data recorder (#27)
 - Long-running parallel jobs have small percentage of processes hang (#28)

-# SymbolicRegression.jl v0.6.1
+## SymbolicRegression.jl v0.6.1

-## SymbolicRegression v0.6.1
+### SymbolicRegression v0.6.1

 [Diff since v0.6.0](https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.6.0...v0.6.1)

From fb673b0bede3c0842f9616450b4460ac9d19ceac Mon Sep 17 00:00:00 2001
From: MilesCranmer
Date: Sun, 20 Oct 2024 00:06:59 +0100
Subject: [PATCH 04/74] docs: hide duplicate beta release

---
 CHANGELOG.md | 91 ----------------------------------------------------
 1 file changed, 91 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index ec25a0abb..648df85a9 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -110,97 +110,6 @@ for Enzyme.jl (though Enzyme support is highly experimental).

 **Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.24.5...v1.0.0-beta1

-## SymbolicRegression.jl v1.0.0-beta1
-
-This is a **_beta release_** that is not yet registered. To try it out, open a Julia REPL and hit `]`, then:
-
-```julia
-pkg> add SymbolicRegression#v1.0.0-beta1
-```
-
-Before the final release of v1.0.0, the hyperparameters will be re-tuned to optimize the new mutations: `swap_operands` and `rotate_tree`, which seem to be quite effective.
-
-### Major Changes
-
-#### **Breaking**: Changes default expressions from `Node` to the user-friendly `Expression`
-
-https://github.com/MilesCranmer/SymbolicRegression.jl/pull/326
-
-This is a breaking change in the format of expressions returned by SymbolicRegression. Now, instead of returning a `Node{T}`, SymbolicRegression will return a `Expression{T,Node{T},...}` (both from `equation_search` and from `report(mach).equations`). This type is much more convenient and high-level than the `Node` type, as it includes metadata relevant for the node, such as the operators and variable names.
-
-This means you can reliably do things like:
-
-```julia
-using SymbolicRegression: Options, Expression, Node
-
-options = Options(binary_operators=[+, -, *, /], unary_operators=[cos, exp, sin])
-operators = options.operators
-variable_names = ["x1", "x2", "x3"]
-x1, x2, x3 = [Expression(Node(Float64; feature=i); operators, variable_names) for i=1:3]
-
-## Use the operators directly!
-tree = cos(x1 - 3.2 * x2) - x1 * x1
-```
-
-You can then do operations with this `tree`, without needing to track `operators`:
-
-```julia
-println(tree) # Looks up the right operators based on internal metadata
-
-X = randn(3, 100)
-
-tree(X) # Call directly!
-tree'(X) # gradients of expression
-```
-
-Each time you use an operator on or between two `Expression`s that include the operator in its list, it will look up the right enum index, and create the correct `Node`, and then return a new `Expression`.
-
-You can access the tree with `get_tree` (guaranteed to return a `Node`), or `get_contents` – which returns the full info of an `AbstractExpression`, which might contain multiple expressions (which get stitched together when calling `get_tree`).
-
-#### Customizing behavior
-
-DynamicExpressions v1.0 has a full `AbstractExpression` interface to customize behavior of pretty much anything. As an example, there is this included `ParametricExpression` type, with an example available in `examples/parametrized_function.jl`. You can use this to find _basis functions_ with per-class parameters. It still needs some tuning but it works for simple examples.
-
-This `ParametricExpression` is meant partly as an example of the types of things you can do with the new `AbstractExpression` interface, though it should hopefully be a useful feature by itself.
-
-#### Auto-diff within optimization
-
-Historically, SymbolicRegression has mostly relied on finite differences to estimate derivatives – which actually works well for small numbers of parameters. This is what Optim.jl selects unless you can provide it with gradients.
-
-However, with the introduction of `ParametricExpression`s, full support for autodiff-within-Optim.jl was needed. v1 includes support for some parts of DifferentiationInterface.jl, allowing you to actually turn on various automatic differentiation backends when optimizing constants. For example, you can use
-
-```julia
-Options(
-    autodiff_backend=:Zygote,
-)
-```
-
-to use Zygote.jl for autodiff during BFGS optimization, or even
-
-```julia
-Options(
-    autodiff_backend=:Enzyme,
-)
-```
-
-for Enzyme.jl (though Enzyme support is highly experimental).
-
-### Other Changes
-
-- Implement tree rotation operator by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/348
-  - This seems to help search performance overall – the new mutation is available as `rotate_tree` in the weights – which has been set to a default 0.3.
-- Avoid `Base.sleep` by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/305
-- CompatHelper: bump compat for MLJModelInterface to 1, (keep existing compat) by @github-actions in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/328
-- fix typos by @spaette in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/331
-- chore(deps): bump peter-evans/create-pull-request from 6 to 7 by @dependabot in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/343
-
-### New Contributors
-
-- @spaette made their first contribution in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/331
-- Thanks to @larsentom for the mutation idea
-
-**Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.24.5...v1.0.0-beta1
-
 ## SymbolicRegression.jl v0.24.5

 ### SymbolicRegression v0.24.5

From e808fd7566934ba2cd859fcbb20d7bdd7f57a20f Mon Sep 17 00:00:00 2001
From: MilesCranmer
Date: Sun, 20 Oct 2024 00:10:30 +0100
Subject: [PATCH 05/74] docs: fix other linter issues

---
 CHANGELOG.md | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 648df85a9..258d6c13d 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,3 +1,6 @@
+
+
+
 # Changelog

 ## SymbolicRegression.jl v1.0.0
@@ -416,7 +419,7 @@ end
 - **Printing improvements** (#228)
   - By default, only 5 significant digits are now printed, rather than the entire float. You can change this with the `print_precision` option.
   - In the default printed equations, `x₁` is used rather than `x1`.
-  - `y = ` is printed at the start (or `y₁ = ` for multi-output). With units this becomes, for example, `y[kg] =`.
+  - `y =` is printed at the start (or `y₁ =` for multi-output). With units this becomes, for example, `y[kg] =`.
 - **Misc**
   - Easier to convert from MLJ interface to SymbolicUtils (via `node_to_symbolic(::Node, ::AbstractSRRegressor)`) (#228)
   - Improved precompilation (#228)

From 4afabd749ff2fd2bcb1b3c702651fa2762bc1cb8 Mon Sep 17 00:00:00 2001
From: MilesCranmer
Date: Sun, 20 Oct 2024 00:31:25 +0100
Subject: [PATCH 06/74] docs: more headings for 1.0.0 announcement

---
 CHANGELOG.md | 50 ++++++++++++++++++++++++++++----------------------
 1 file changed, 28 insertions(+), 22 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 258d6c13d..ee3cef7b1 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,3 +1,4 @@
+

@@ -7,34 +8,38 @@

 ### Summary of major recent changes

-- Changed the core expression type from `Node{T} → Expression{T,Node{T},...}`
+- [Changed the core expression type from `Node{T} → Expression{T,Node{T},...}`](#changed-the-core-expression-type-from-nodet--expressiontnodet)
   - This gives us new features, improves user hackability, and greatly improves ergonomics!
-- Created "_Template Expressions_", for fitting expressions under a user-specified functional form (`TemplateExpression <: AbstractExpression`).
+- [Created "_Template Expressions_", for fitting expressions under a user-specified functional form (`TemplateExpression <: AbstractExpression`)](#created-template-expressions-for-fitting-expressions-under-a-user-specified-functional-form-templateexpression--abstractexpression)
   - Template expressions are quite flexible: they are a meta-expression that wraps multiple other expressions, and combines them using a user-specified function.
   - This enables **vector expressions** - in other words, you can learn multiple components of a vector, simultaneously, with a single expression!
   - (Note that this still does not permit learning using vector operators, though we are working on that!)
-- Created "_Parametric Expressions_", for custom functional forms with per-class parameters: (`ParametricExpression <: AbstractExpression`).
+- [Created "_Parametric Expressions_", for custom functional forms with per-class parameters: (`ParametricExpression <: AbstractExpression`)](#created-parametric-expressions-for-custom-functional-forms-with-per-class-parameters-parametricexpression--abstractexpression)
   - This lets you fit expressions that act as _models of multiple systems_, with per-system parameters!
-- Introduced a variety of new abstractions for user extensibility and to **support new research on symbolic regression**.
+- [Introduced a variety of new abstractions for user extensibility](#introduced-a-variety-of-new-abstractions-for-user-extensibility) (**and to support new research on symbolic regression!**)
   - `AbstractExpression`, for increased flexibility in custom expression types.
   - `mutate!` and `AbstractMutationWeights`, for user-defined mutation operators.
   - `AbstractSearchState`, for holding custom metadata during searches.
-  - `AbstractOptions` and `AbstractRuntimeOptions`, for customizing everything else via multiple dispatch.
+  - `AbstractOptions` and `AbstractRuntimeOptions`, for customizing pretty much everything else in the library via multiple dispatch. Please make an issue/PR if you would like any particular internal functions be declared `public` to enable stability across versions for your tool.
   - Many of these were motivated to modularize the implementation of [LaSR](https://github.com/trishullab/LibraryAugmentedSymbolicRegression.jl), an LLM-guided version of SymbolicRegression.jl, so it can sit as a modular layer on top of SymbolicRegression.jl.
-- Fundamental improvements to the underlying evolutionary algorithm.
-  - New mutation operators introduced, `swap_operands` and `rotate_tree`, which seem to help kick the evolution out of local optima.
-  - New hyperparameter defaults based on Pareto front volume rather than simply accuracy of the best expression.
-- Support for Zygote.jl and Enzyme.jl within the constant optimizer, specified using the `autodiff_backend` option.
-- Identified and fixed a major internal bug involving unexpected aliasing produced by the crossover operator.
+- [Fundamental improvements to the underlying evolutionary algorithm](#fundamental-improvements-to-the-underlying-evolutionary-algorithm)
+  - New mutation operators introduced, `swap_operands` and `rotate_tree` – both seem to help kick the evolution out of local optima.
+  - New hyperparameter defaults created, based on a Pareto front volume calculation, rather than simply accuracy of the best expression.
+- [Support for Zygote.jl and Enzyme.jl within the constant optimizer, specified using the `autodiff_backend` option](#support-for-zygotejl-and-enzymejl-within-the-constant-optimizer-specified-using-the-autodiff_backend-option)
+- [Identified and fixed a major internal bug involving unexpected aliasing produced by the crossover operator](#identified-and-fixed-a-major-internal-bug-involving-unexpected-aliasing-produced-by-the-crossover-operator)
   - Segmentation faults caused by this are a likely culprit for some crashes reported during multi-day multi-node searches.
   - Introduced a new test for aliasing throughout the entire search state to prevent this from happening again.
-- Major refactoring of the codebase to improve readability and modularity.
+- [Major refactoring of the codebase to improve readability and modularity](#major-refactoring-of-the-codebase-to-improve-readability-and-modularity)
 - Increased documentation and examples.
 - Julia 1.10 is now the minimum supported Julia version.

+### Update Guide
+
+TODO
+
 ### Major Changes

-#### **Breaking**: Changes default expressions from `Node` to the user-friendly `Expression`
+#### Changed the core expression type from `Node{T} → Expression{T,Node{T},...}`

 https://github.com/MilesCranmer/SymbolicRegression.jl/pull/326

@@ -69,13 +74,15 @@ Each time you use an operator on or between two `Expression`s that include the o

 You can access the tree with `get_tree` (guaranteed to return a `Node`), or `get_contents` – which returns the full info of an `AbstractExpression`, which might contain multiple expressions (which get stitched together when calling `get_tree`).

-#### Customizing behavior
+#### Created "_Template Expressions_", for fitting expressions under a user-specified functional form (`TemplateExpression <: AbstractExpression`)

-DynamicExpressions v1.0 has a full `AbstractExpression` interface to customize behavior of pretty much anything. As an example, there is this included `ParametricExpression` type, with an example available in `examples/parametrized_function.jl`. You can use this to find _basis functions_ with per-class parameters. It still needs some tuning but it works for simple examples.
+#### Created "_Parametric Expressions_", for custom functional forms with per-class parameters: (`ParametricExpression <: AbstractExpression`)

-This `ParametricExpression` is meant partly as an example of the types of things you can do with the new `AbstractExpression` interface, though it should hopefully be a useful feature by itself.
+#### Introduced a variety of new abstractions for user extensibility

-#### Auto-diff within optimization
+#### Fundamental improvements to the underlying evolutionary algorithm
+
+#### Support for Zygote.jl and Enzyme.jl within the constant optimizer, specified using the `autodiff_backend` option

 Historically, SymbolicRegression has mostly relied on finite differences to estimate derivatives – which actually works well for small numbers of parameters. This is what Optim.jl selects unless you can provide it with gradients.

@@ -97,6 +104,10 @@ Options(

 for Enzyme.jl (though Enzyme support is highly experimental).

+#### Identified and fixed a major internal bug involving unexpected aliasing produced by the crossover operator
+
+#### Major refactoring of the codebase to improve readability and modularity
+
 ### Other Changes

 - Implement tree rotation operator by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/348
@@ -106,12 +117,7 @@ for Enzyme.jl (though Enzyme support is highly experimental).
 - fix typos by @spaette in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/331
 - chore(deps): bump peter-evans/create-pull-request from 6 to 7 by @dependabot in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/343

-### New Contributors
-
-- @spaette made their first contribution in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/331
-- Thanks to @larsentom for the mutation idea
-
-**Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.24.5...v1.0.0-beta1
+**Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.24.5...v1.0.0

 ## SymbolicRegression.jl v0.24.5

From 5622a44f080ac4e6fc7b34a358d1e9ece28f1ecc Mon Sep 17 00:00:00 2001
From: MilesCranmer
Date: Sun, 20 Oct 2024 00:39:41 +0100
Subject: [PATCH 07/74] docs: tweak formatting of v1.0.0

---
 CHANGELOG.md | 35 +++++++++++++++++------------------
 1 file changed, 17 insertions(+), 18 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index ee3cef7b1..7289db465 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -6,7 +6,7 @@

 ## SymbolicRegression.jl v1.0.0

-### Summary of major recent changes
+Summary of major recent changes, described in more detail below:

 - [Changed the core expression type from `Node{T} → Expression{T,Node{T},...}`](#changed-the-core-expression-type-from-nodet--expressiontnodet)
   - This gives us new features, improves user hackability, and greatly improves ergonomics!
@@ -32,14 +32,15 @@
 - [Major refactoring of the codebase to improve readability and modularity](#major-refactoring-of-the-codebase-to-improve-readability-and-modularity)
 - Increased documentation and examples.
 - Julia 1.10 is now the minimum supported Julia version.
+- [Other various features](#other-various-changes-in-v100)
+
+Note that some of these features were recently introduced in patch releases since they were backwards compatible. I am noting them here for visibility.

 ### Update Guide

 TODO

-### Major Changes
-
-#### Changed the core expression type from `Node{T} → Expression{T,Node{T},...}`
+### Changed the core expression type from `Node{T} → Expression{T,Node{T},...}`

 https://github.com/MilesCranmer/SymbolicRegression.jl/pull/326

@@ -74,15 +75,17 @@ Each time you use an operator on or between two `Expression`s that include the o

 You can access the tree with `get_tree` (guaranteed to return a `Node`), or `get_contents` – which returns the full info of an `AbstractExpression`, which might contain multiple expressions (which get stitched together when calling `get_tree`).

-#### Created "_Template Expressions_", for fitting expressions under a user-specified functional form (`TemplateExpression <: AbstractExpression`)
+### Created "_Template Expressions_", for fitting expressions under a user-specified functional form (`TemplateExpression <: AbstractExpression`)
+
+### Created "_Parametric Expressions_", for custom functional forms with per-class parameters: (`ParametricExpression <: AbstractExpression`)

-#### Created "_Parametric Expressions_", for custom functional forms with per-class parameters: (`ParametricExpression <: AbstractExpression`)
+### Introduced a variety of new abstractions for user extensibility

-#### Introduced a variety of new abstractions for user extensibility
+TODO: Describe `expression_type` and `node_type` options.

-#### Fundamental improvements to the underlying evolutionary algorithm
+### Fundamental improvements to the underlying evolutionary algorithm

-#### Support for Zygote.jl and Enzyme.jl within the constant optimizer, specified using the `autodiff_backend` option
+### Support for Zygote.jl and Enzyme.jl within the constant optimizer, specified using the `autodiff_backend` option

 Historically, SymbolicRegression has mostly relied on finite differences to estimate derivatives – which actually works well for small numbers of parameters. This is what Optim.jl selects unless you can provide it with gradients.

@@ -104,18 +107,14 @@ Options(

 for Enzyme.jl (though Enzyme support is highly experimental).

-#### Identified and fixed a major internal bug involving unexpected aliasing produced by the crossover operator
+### Identified and fixed a major internal bug involving unexpected aliasing produced by the crossover operator

-#### Major refactoring of the codebase to improve readability and modularity
+### Major refactoring of the codebase to improve readability and modularity

-### Other Changes
+### Other Various Changes in v1.0.0

-- Implement tree rotation operator by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/348
-  - This seems to help search performance overall – the new mutation is available as `rotate_tree` in the weights – which has been set to a default 0.3.
-- Avoid `Base.sleep` by @MilesCranmer in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/305
-- CompatHelper: bump compat for MLJModelInterface to 1, (keep existing compat) by @github-actions in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/328
-- fix typos by @spaette in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/331
-- chore(deps): bump peter-evans/create-pull-request from 6 to 7 by @dependabot in https://github.com/MilesCranmer/SymbolicRegression.jl/pull/343
+- Support for per-variable complexity, via the `complexity_of_variables` option.
+- Option to force dimensionless constants when fitting with dimensional constraints, via the `dimensionless_constants_only` option.

 **Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.24.5...v1.0.0

From fcdefdc2d4b5f210bf6449fd9b7659fe184aa1a0 Mon Sep 17 00:00:00 2001
From: MilesCranmer
Date: Sun, 20 Oct 2024 00:42:25 +0100
Subject: [PATCH 08/74] docs: tweak changelog

---
 CHANGELOG.md | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 7289db465..a95b951ed 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -8,7 +8,7 @@

 Summary of major recent changes, described in more detail below:

-- [Changed the core expression type from `Node{T} → Expression{T,Node{T},...}`](#changed-the-core-expression-type-from-nodet--expressiontnodet)
+- [Changed the core expression type from `Node{T} → Expression{T,Node{T},Metadata{...}}`](#changed-the-core-expression-type-from-nodet--expressiontnodet)
   - This gives us new features, improves user hackability, and greatly improves ergonomics!
 - [Created "_Template Expressions_", for fitting expressions under a user-specified functional form (`TemplateExpression <: AbstractExpression`)](#created-template-expressions-for-fitting-expressions-under-a-user-specified-functional-form-templateexpression--abstractexpression)
   - Template expressions are quite flexible: they are a meta-expression that wraps multiple other expressions, and combines them using a user-specified function.
@@ -32,14 +32,11 @@ Summary of major recent changes, described in more detail below:
 - [Major refactoring of the codebase to improve readability and modularity](#major-refactoring-of-the-codebase-to-improve-readability-and-modularity)
 - Increased documentation and examples.
 - Julia 1.10 is now the minimum supported Julia version.
-- [Other various features](#other-various-changes-in-v100) +- [Other small features](#other-small-features-in-v100) +- Also see the [Update Guide](#update-guide) below for more details on upgrading. Note that some of these features were recently introduced in patch releases since they were backwards compatible. I am noting them here for visibility. -### Update Guide - -TODO - ### Changed the core expression type from `Node{T} → Expression{T,Node{T},...}` https://github.com/MilesCranmer/SymbolicRegression.jl/pull/326 @@ -111,11 +108,15 @@ for Enzyme.jl (though Enzyme support is highly experimental). ### Major refactoring of the codebase to improve readability and modularity -### Other Various Changes in v1.0.0 +### Other Small Features in v1.0.0 - Support for per-variable complexity, via the `complexity_of_variables` option. - Option to force dimensionless constants when fitting with dimensional constraints, via the `dimensionless_constants_only` option. +### Update Guide + +TODO: + **Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.24.5...v1.0.0 ## SymbolicRegression.jl v0.24.5 From d21b4941f25a71a9f48a2119d4f67b47947844de Mon Sep 17 00:00:00 2001 From: MilesCranmer Date: Sun, 20 Oct 2024 01:16:02 +0100 Subject: [PATCH 09/74] docs: describe template expression in release notes --- CHANGELOG.md | 149 ++++++++++++++++++++++++++++++++++++++ src/TemplateExpression.jl | 11 ++- 2 files changed, 156 insertions(+), 4 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index a95b951ed..f69428c1a 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -74,6 +74,155 @@ You can access the tree with `get_tree` (guaranteed to return a `Node`), or `get ### Created "_Template Expressions_", for fitting expressions under a user-specified functional form (`TemplateExpression <: AbstractExpression`) +Template Expressions allow users to define symbolic expressions with a fixed structure, combining multiple sub-expressions under user-specified constraints. 
+This is particularly useful for symbolic regression tasks where domain-specific knowledge or constraints must be imposed on the model's structure. + +This also lets you fit vector expressions using SymbolicRegression.jl, where vector components can also be shared! + +A `TemplateExpression` is constructed by specifying: + +- A named tuple of sub-expressions (e.g., `(; f=x1 - x2 * x2, g=1.5 * x3)`). +- A structure function that defines how these sub-expressions are combined both numerically and when printing. +- A `variable_mapping` that defines which variables each sub-expression can access. + +For example, you can create a `TemplateExpression` that enforces +the constraint: `sin(f(x1, x2)) + g(x3)^2` - where we evolve `f` and `g` simultaneously. + +Let's see some code for this. First, we define some base expressions for each input feature: + +```julia +using SymbolicRegression + +options = Options(; binary_operators=(+, *, /, -), unary_operators=(sin, cos)) +operators = options.operators +variable_names = ["x1", "x2", "x3"] + +# Base expressions: +x1 = Expression(Node{Float64}(; feature=1); operators, variable_names) +x2 = Expression(Node{Float64}(; feature=2); operators, variable_names) +x3 = Expression(Node{Float64}(; feature=3); operators, variable_names) +``` + +A `TemplateExpression` is basically a named tuple of expressions, with a structure function that defines how to combine them +in different contexts. +It also has a `variable_mapping` that defines which variables each sub-expression can access. For example: + +```julia +variable_mapping = (; f=[1, 2], g=[3]) # We have functions f(x1, x2) and g(x3) + +# Combine f and g into a single scalar expression: +function my_structure(nt::NamedTuple{<:Any,<:Tuple{Vararg{AbstractVector}}}) + return @.
sin(nt.f) + nt.g * nt.g +end +function my_structure(nt::NamedTuple{<:Any,<:Tuple{Vararg{AbstractString}}}) + return "sin($(nt.f)) + $(nt.g)^2" # Generates a string representation of the expression +end +``` + +This defines how the `TemplateExpression` should be evaluated numerically on a given input, +and also how it should be represented as a string: + +```julia +julia> f_example = x1 - x2 * x2 + +julia> g_example = 1.5 * x3 # Normal `Expression` object + +julia> # Create TemplateExpression from these sub-expressions: + st_expr = TemplateExpression((; f=f_example, g=g_example); structure=my_structure, operators, variable_names, variable_mapping); + +julia> st_expr # Prints using `my_structure`! +sin(x1 - (x2 * x2)) + 1.5 * x3^2 + +julia> st_expr([0.0; 1.0; 2.0;;]) # Combines evaluation of `f` and `g` via `my_structure`! +1-element Vector{Float64}: + 8.158529015192103 +``` + +We can also use this `TemplateExpression` in SymbolicRegression.jl searches! + +
+For example, say that we want to fit *vector expressions*: + +```julia +using SymbolicRegression +using MLJBase: machine, fit!, report + +``` + +We first define our structure: + +```julia +function my_structure2(nt::NamedTuple{<:Any,<:Tuple{Vararg{AbstractString}}}) + return "( $(nt.f) + $(nt.g1), $(nt.f) + $(nt.g2) )" +end +function my_structure2(nt::NamedTuple{<:Any,<:Tuple{Vararg{AbstractVector}}}) + return map(i -> (nt.f[i] + nt.g1[i], nt.f[i] + nt.g2[i]), eachindex(nt.f)) +end +``` + +As well as our variable mapping, which says +we are fitting `f(x1, x2)`, `g1(x3)`, and `g2(x3)`: + +```julia +variable_mapping = (; f=[1, 2], g1=[3], g2=[3]) +``` + +Now, our dataset is a regular 2D array of inputs for `X`. +But our `y` is actually a _vector of 2-tuples_! + +```julia +X = rand(100, 3) .* 10 + +y = [ + ( + sin(X[i, 1]) + X[i, 3]^2, + sin(X[i, 1]) + X[i, 3] + ) + for i in eachindex(axes(X, 1)) +] +``` + +Now, since this is a vector-valued expression, we need to specify a custom `elementwise_loss` function: + +```julia +elementwise_loss = ((x1, x2), (y1, y2)) -> (y1 - x1)^2 + (y2 - x2)^2 +``` + +This reduces each pair of a predicted tuple (as returned by the structure function) and the matching target tuple in `y` to a single scalar loss. + +Our regressor is then: + +```julia +model = SRRegressor(; + binary_operators=(+, *), + unary_operators=(sin,), + maxsize=15, + expression_type=TemplateExpression, + # Note - this is where we pass custom options to the expression type: + expression_options=(; structure=my_structure2, variable_mapping), +) + +mach = machine(model, X, y) +fit!(mach) +``` + +Let's see the performance of the model: + +```julia +r = report(mach) +``` + +We can also check the expression is split up correctly: + +```julia +best_expr = r.equations[r.best_idx] +best_f = get_contents(best_expr).f +best_g1 = get_contents(best_expr).g1 +best_g2 = get_contents(best_expr).g2 +``` + +
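As a minimal sketch of how the vector-valued pieces above fit together, the numeric method of `my_structure2` and the tuple-based `elementwise_loss` can be exercised on their own in plain Julia, without running a search. The sub-expression outputs and targets below are made-up values for illustration only; they are not results from SymbolicRegression.jl.

```julia
# Numeric method of the structure function from the example above:
# combines per-row outputs of f, g1, and g2 into a vector of 2-tuples.
function my_structure2(nt::NamedTuple{<:Any,<:Tuple{Vararg{AbstractVector}}})
    return map(i -> (nt.f[i] + nt.g1[i], nt.f[i] + nt.g2[i]), eachindex(nt.f))
end

# Loss on one (prediction, target) pair of 2-tuples, as in the example above.
elementwise_loss = ((x1, x2), (y1, y2)) -> (y1 - x1)^2 + (y2 - x2)^2

# Hypothetical sub-expression outputs for two rows of data (illustrative values):
outputs = (; f=[1.0, 2.0], g1=[0.5, 0.5], g2=[3.0, 4.0])
predictions = my_structure2(outputs)  # a Vector of 2-tuples, one per row

# Hypothetical targets in the same "vector of 2-tuples" format as `y`:
targets = [(1.5, 4.0), (2.5, 7.0)]

# One scalar loss per row:
losses = map(elementwise_loss, predictions, targets)
```

The point of the sketch is that the structure function decides the shape of each prediction, so the loss has to accept that same shape on both sides.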
+ ### Created "_Parametric Expressions_", for custom functional forms with per-class parameters: (`ParametricExpression <: AbstractExpression`) ### Introduced a variety of new abstractions for user extensibility diff --git a/src/TemplateExpression.jl b/src/TemplateExpression.jl index d88c07dcc..e101571fc 100644 --- a/src/TemplateExpression.jl +++ b/src/TemplateExpression.jl @@ -72,10 +72,10 @@ x2 = Expression(Node{Float64}(; feature=2); operators, variable_names) x3 = Expression(Node{Float64}(; feature=3); operators, variable_names) # Define structure function for symbolic and numerical evaluation -function my_structure(nt::NamedTuple{<:Any,<:Tuple{Vararg{<:Expression}}}) +function my_structure(nt::NamedTuple{<:Any,<:Tuple{Vararg{Expression}}}) return sin(nt.f) + nt.g * nt.g end -function my_structure(nt::NamedTuple{<:Any,<:Tuple{Vararg{<:AbstractVector}}}) +function my_structure(nt::NamedTuple{<:Any,<:Tuple{Vararg{AbstractVector}}}) return @. sin(nt.f) + nt.g * nt.g end @@ -231,8 +231,11 @@ end function (ex::TemplateExpression)( X, operators::Union{AbstractOperatorEnum,Nothing}=nothing; kws... ) - # TODO: Why do we need to do this? It should automatically handle this! - return DE.eval_tree_array(ex, X, operators; kws...) 
+ raw_contents = get_contents(ex) + results = NamedTuple{keys(raw_contents)}( + map(ex -> ex(X, operators; kws...), values(raw_contents)) + ) + return get_metadata(ex).structure(results) end @unstable IDE.expected_array_type(::AbstractMatrix, ::Type{<:TemplateExpression}) = Any From 64b15d9a13e5b0a7b18848610ffc3b8ebeba2ed7 Mon Sep 17 00:00:00 2001 From: MilesCranmer Date: Sun, 20 Oct 2024 05:45:58 +0100 Subject: [PATCH 10/74] docs: describe ParametricExpression feature --- CHANGELOG.md | 77 ++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 77 insertions(+) diff --git a/CHANGELOG.md b/CHANGELOG.md index f69428c1a..aa912932d 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -225,6 +225,83 @@ best_g2 = get_contents(best_expr).g2 ### Created "_Parametric Expressions_", for custom functional forms with per-class parameters: (`ParametricExpression <: AbstractExpression`) +Parametric Expressions are another example of an `AbstractExpression` with more features than a normal `Expression`. +This type allows SymbolicRegression.jl to fit a _parametric functional form_, rather than an expression with fixed constants. +This is particularly useful when modeling multiple systems or categories where each may have unique parameters but share +a common functional form and certain constants. + +A parametric expression is constructed with a tree represented as a `ParametricNode <: AbstractExpressionNode` – this is an alternative +type to the usual `Node` type which includes extra fields: `is_parameter::Bool`, and `parameter::UInt16`. +A `ParametricExpression` wraps this type and stores the actual parameter matrix (under `.metadata.parameters`) as well as +the parameter names (under `.metadata.parameter_names`). + +Various internal functions have been overloaded for custom behavior when fitting parametric expressions. +For example, `mutate_constant` will perturb a row of the parameter matrix, rather than a single parameter.
+ +When fitting a `ParametricExpression`, the `expression_options` parameter in `Options/SRRegressor` +should include a `max_parameters` keyword, which specifies the maximum number of separate parameters +in the functional form. + +
+Let's see an example of fitting a parametric expression: + +```julia +using SymbolicRegression +using Random: MersenneTwister +using Zygote +using MLJBase: machine, fit!, predict, report +``` + +Let's generate data from two classes of the same model and try to recover it: + +```julia +rng = MersenneTwister(0) +X = NamedTuple{(:x1, :x2, :x3, :x4, :x5)}(ntuple(_ -> randn(rng, Float32, 30), Val(5))) +X = (; X..., classes=rand(rng, 1:2, 30)) # Add class labels (1 or 2) + +# Define per-class parameters +p1 = [0.0f0, 3.2f0] +p2 = [1.5f0, 0.5f0] + +# Generate target variable y with class-specific parameters +y = [ + 2 * cos(X.x4[i] + p1[X.classes[i]]) + X.x1[i]^2 - p2[X.classes[i]] + for i in eachindex(X.classes) +] +``` + +When fitting a `ParametricExpression`, it tends to be more important to set up +an `autodiff_backend` such as `:Zygote` or `:Enzyme`, as the default backend (finite differences) +can be too slow for high-dimensional parameter spaces. + +```julia +model = SRRegressor( + niterations=100, + binary_operators=[+, *, /, -], + unary_operators=[cos, exp], + populations=30, + expression_type=ParametricExpression, + expression_options=(; max_parameters=2), # Allow up to 2 parameters + autodiff_backend=:Zygote, # Use Zygote for automatic differentiation + parallelism=:multithreading, +) + +mach = machine(model, X, y) + +fit!(mach) +``` + +The expressions are returned with the parameters: + +```julia +r = report(mach); +best_expr = r.equations[r.best_idx] +@show best_expr +@show get_metadata(best_expr).parameters +``` + +
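To make the idea of "one functional form, per-class parameters" concrete, here is a minimal sketch in plain Julia that evaluates the target form from the example above with a small parameter matrix. The `[n_parameters, n_classes]` layout, the helper name `f`, and the input values are assumptions made for illustration; this is not a description of how `ParametricExpression` stores or evaluates its parameters internally.

```julia
# Hypothetical parameter matrix: one row per parameter, one column per class.
# Row 1 plays the role of `p1` and row 2 the role of `p2` from the example above.
parameters = Float32[0.0 3.2; 1.5 0.5]

# Shared functional form; `p` holds the parameters of a single class.
f(x1, x4, p) = 2 * cos(x4 + p[1]) + x1^2 - p[2]

# Illustrative inputs and class labels (made-up values):
classes = [1, 2, 2, 1]
x1 = Float32[0.5, -1.0, 2.0, 0.0]
x4 = Float32[0.0, 1.0, -0.5, 3.0]

# Each row picks the parameter column that matches its class label.
y = [f(x1[i], x4[i], view(parameters, :, classes[i])) for i in eachindex(classes)]
```

Every row shares the same expression, and only the column of `parameters` selected by the class label changes, which is the structure the search is asked to discover.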
+ ### Introduced a variety of new abstractions for user extensibility TODO: Describe `expression_type` and `node_type` options. From 24d3356857cb76ed1c09bd50999f7dedb858a68d Mon Sep 17 00:00:00 2001 From: MilesCranmer Date: Sun, 20 Oct 2024 06:55:47 +0100 Subject: [PATCH 11/74] docs: improve v1.0.0 release --- CHANGELOG.md | 131 +++++++++++++++++++++++++++++++++++++++++++++++---- 1 file changed, 121 insertions(+), 10 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index aa912932d..778533db9 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -22,14 +22,14 @@ Summary of major recent changes, described in more detail below: - `AbstractSearchState`, for holding custom metadata during searches. - `AbstractOptions` and `AbstractRuntimeOptions`, for customizing pretty much everything else in the library via multiple dispatch. Please make an issue/PR if you would like any particular internal functions be declared `public` to enable stability across versions for your tool. - Many of these were motivated to modularize the implementation of [LaSR](https://github.com/trishullab/LibraryAugmentedSymbolicRegression.jl), an LLM-guided version of SymbolicRegression.jl, so it can sit as a modular layer on top of SymbolicRegression.jl. -- [Fundamental improvements to the underlying evolutionary algorithm](#fundamental-improvements-to-the-underlying-evolutionary-algorithm) - - New mutation operators introduced, `swap_operands` and `rotate_tree` – both seem to help kick the evolution out of local optima. +- Fundamental improvements to the underlying evolutionary algorithm + - New mutation operators introduced, `swap_operands` and `rotate_tree` – both of which seem to help kick the evolution out of local optima. - New hyperparameter defaults created, based on a Pareto front volume calculation, rather than simply accuracy of the best expression. 
- [Support for Zygote.jl and Enzyme.jl within the constant optimizer, specified using the `autodiff_backend` option](#support-for-zygotejl-and-enzymejl-within-the-constant-optimizer-specified-using-the-autodiff_backend-option) -- [Identified and fixed a major internal bug involving unexpected aliasing produced by the crossover operator](#identified-and-fixed-a-major-internal-bug-involving-unexpected-aliasing-produced-by-the-crossover-operator) +- Major refactoring of the codebase to improve readability and modularity +- Identified and fixed a major internal bug involving unexpected aliasing produced by the crossover operator - Segmentation faults caused by this are a likely culprit for some crashes reported during multi-day multi-node searches. - Introduced a new test for aliasing throughout the entire search state to prevent this from happening again. -- [Major refactoring of the codebase to improve readability and modularity](#major-refactoring-of-the-codebase-to-improve-readability-and-modularity) - Increased documentation and examples. - Julia 1.10 is now the minimum supported Julia version. - [Other small features](#other-small-features-in-v100) @@ -304,7 +304,61 @@ best_expr = r.equations[r.best_idx] ### Introduced a variety of new abstractions for user extensibility -TODO: Describe `expression_type` and `node_type` options. +v1 introduces several new abstract types to improve extensibility. +These allow you to define custom behaviors by leveraging Julia's multiple dispatch system. + +**Expression types**: `AbstractExpression`: As explained above, SymbolicRegression now works on `Expression` rather than `Node`, by default. Actually, most internal functions in SymbolicRegression.jl are now defined on `AbstractExpression`, which allows overloading behavior. The expression type used can be modified with the `expression_type` and `node_type` options in `Options`. 
+ +- `expression_type`: By default, this is `Expression`, which simply stores a binary tree (`Node`) as well as the `variable_names::Vector{String}` and `operators::DynamicExpressions.OperatorEnum`. See the implementation of `TemplateExpression` and `ParametricExpression` for examples of what needs to be overloaded. +- `node_type`: By default, this will be `DynamicExpressions.default_node_type(expression_type)`, which allows `ParametricExpression` to default to `ParametricNode` as the underlying node type. + +**Mutation types**: `mutate!(tree::N, member::P, ::Val{S}, mutation_weights::AbstractMutationWeights, options::AbstractOptions; kws...) where {N<:AbstractExpression,P<:PopMember,S}`, where `S` is a symbol representing the type of mutation to perform (where the symbols are taken from the `mutation_weights` fields). This allows you to define new mutation types by subtyping `AbstractMutationWeights` and creating new `mutate!` methods (simply pass the `mutation_weights` instance to `Options` or `SRRegressor`). + +**Search states**: `AbstractSearchState`: this is the abstract type for `SearchState` which stores the search process's state (such as the populations and halls of fame). For advanced users, you may wish to overload this to store additional state details. (For example, [LaSR](https://github.com/trishullab/LibraryAugmentedSymbolicRegression.jl) stores some history of the search process to feed the language model.) + +**Global options and full customization**: `AbstractOptions` and `AbstractRuntimeOptions`: Many functions throughout SymbolicRegression.jl take `AbstractOptions` as an input. The default assumed implementation is `Options`. However, you can subtype `AbstractOptions` to overload certain behavior. 
+ +For example, if we have new options that we want to add to `Options`: + +```julia +Base.@kwdef struct MyNewOptions + a::Float64 = 1.0 + b::Int = 3 +end +``` + +we can create a combined options type that forwards properties to each corresponding type: + +```julia +struct MyOptions{O<:SymbolicRegression.Options} <: SymbolicRegression.AbstractOptions + new_options::MyNewOptions + sr_options::O +end +const NEW_OPTIONS_KEYS = fieldnames(MyNewOptions) + +# Constructor with both sets of parameters: +function MyOptions(; kws...) + new_options_keys = filter(k -> k in NEW_OPTIONS_KEYS, keys(kws)) + new_options = MyNewOptions(; NamedTuple(new_options_keys .=> Tuple(kws[k] for k in new_options_keys))...) + sr_options_keys = filter(k -> !(k in NEW_OPTIONS_KEYS), keys(kws)) + sr_options = SymbolicRegression.Options(; NamedTuple(sr_options_keys .=> Tuple(kws[k] for k in sr_options_keys))...) + return MyOptions(new_options, sr_options) +end + +# Make all `Options` available while also making `new_options` accessible +function Base.getproperty(options::MyOptions, k::Symbol) + if k in NEW_OPTIONS_KEYS + return getproperty(getfield(options, :new_options), k) + else + return getproperty(getfield(options, :sr_options), k) + end +end + +Base.propertynames(options::MyOptions) = (NEW_OPTIONS_KEYS..., fieldnames(SymbolicRegression.Options)...) +``` + +These new abstractions provide users with greater flexibility in defining the structure and behavior of expressions, nodes, and the search process itself. +These are also of course used as the basis for alternate behavior such as `ParametricExpression` and `TemplateExpression`. ### Fundamental improvements to the underlying evolutionary algorithm @@ -330,10 +384,6 @@ Options( for Enzyme.jl (though Enzyme support is highly experimental). 
-### Identified and fixed a major internal bug involving unexpected aliasing produced by the crossover operator - -### Major refactoring of the codebase to improve readability and modularity - ### Other Small Features in v1.0.0 - Support for per-variable complexity, via the `complexity_of_variables` option. @@ -341,7 +391,68 @@ for Enzyme.jl (though Enzyme support is highly experimental). ### Update Guide -TODO: +Note that most code should work without changes! +Only if you are interacting with the return types of +`equation_search` or `report(mach)`, +or if you have modified any internals, +should you need to make some changes. + +So, the key changes are, as discussed [above](#changed-the-core-expression-type-from-nodet--expressiontnodet), the change from `Node` to `Expression` as the default type for representing expressions. +This includes the hall of fame object returned by `equation_search`, as well as the vector of +expressions stored in `report(mach).equations` for the MLJ interface. +If you need to interact with the internal tree structure, you can use `get_contents(expression)` (which returns the tree of an `Expression`, or the named tuple of a `ParametricExpression` - use `get_tree` to map it to a single tree format). + +To access other info stored in expressions, such as the operators or variable names, use `get_metadata(expression)`. + +This also means that expressions are now basically self-contained. +Functions like `eval_tree_array` no longer require options as arguments (though you can pass it to override the expression's stored options). +This means you can also simply call the expression directly with input data (in `[n_features, n_rows]` format). 
+ +Before this change, you might have written something like this: + +```julia +using SymbolicRegression + +x1 = Node{Float64}(; feature=1) +options = Options(; binary_operators=(+, *)) +tree = x1 * x1 +``` + +This worked, but only because of some spooky action-at-a-distance behavior +involving a global store of last-used operators! +(Noting that `Node` simply stores an index to the operator to be lightweight.) + +After this change, things are much cleaner: + +```julia +options = Options(; binary_operators=(+, *)) +operators = options.operators +variable_names = ["x1"] +x1 = Expression(Node{Float64}(; feature=1); operators, variable_names) + +tree = x1 * x1 +``` + +This is now a safe and explicit construction, since `*` can look up what operators each expression uses, and infer the right indices! +This `operators::OperatorEnum` is a tuple of functions, so does not incur dispatch costs at runtime. +(The `variable_names` is optional, and gets stripped during the evolution process, but is embedded when returned to the user.) + +We can now use this directly: + +```julia +println(tree) # Uses the `variable_names`, if stored +tree(randn(1, 50)) # Evaluates the expression using the stored operators +``` + +Also note that the minimum supported version of Julia is now 1.10. +This is because Julia 1.9 and earlier have now reached end-of-life status, +and 1.10 is the new LTS release. + +### Additional Notes + +- **Custom Loss Functions**: Continue to define these on `AbstractExpressionNode`. +- **General Usage**: Most existing code should work with minimal changes. +- **CI Updates**: Tests are now split into parts for faster runs, and use TestItems.jl for better scoping of test variables.
**Full Changelog**: https://github.com/MilesCranmer/SymbolicRegression.jl/compare/v0.24.5...v1.0.0 From 235d44e876cea13b66561ee4e4fd53d4b609c7c5 Mon Sep 17 00:00:00 2001 From: MilesCranmer Date: Sun, 20 Oct 2024 08:30:52 +0100 Subject: [PATCH 12/74] test: fix template expression call --- test/test_template_expression.jl | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/test/test_template_expression.jl b/test/test_template_expression.jl index 2187ea334..1f7f44b89 100644 --- a/test/test_template_expression.jl +++ b/test/test_template_expression.jl @@ -35,8 +35,7 @@ # We can evaluate with this too: cX = [1.0 2.0; 3.0 4.0; 5.0 6.0] - out, completed = st_expr(cX) - @test completed + out = st_expr(cX) @test out ≈ [sin(1.0) + cos(5.0)^2, sin(2.0) + cos(6.0)^2] # And also check the contents: @@ -125,8 +124,7 @@ end # We can directly call it: cX = [1.0 2.0; 3.0 4.0; 5.0 6.0] - out, completed = st_expr(cX) - @test completed + out = st_expr(cX) @test out == [(1 + 3, 1 + 5, 1 + 3), (2 + 4, 2 + 6, 2 + 4)] end @testitem "TemplateExpression getters" tags = [:part3] begin From 88a80ae3c6e20e0b6a530691914a6b172226d56e Mon Sep 17 00:00:00 2001 From: MilesCranmer Date: Mon, 21 Oct 2024 02:42:45 +0100 Subject: [PATCH 13/74] feat: introduce `TemplateStructure` for cleaner `TemplateExpression` --- src/SymbolicRegression.jl | 3 +- src/TemplateExpression.jl | 274 ++++++++++++++++++++++++++++---------- 2 files changed, 208 insertions(+), 69 deletions(-) diff --git a/src/SymbolicRegression.jl b/src/SymbolicRegression.jl index 53afae3af..0ffeac249 100644 --- a/src/SymbolicRegression.jl +++ b/src/SymbolicRegression.jl @@ -13,6 +13,7 @@ export Population, Expression, ParametricExpression, TemplateExpression, + TemplateStructure, NodeSampler, AbstractExpression, AbstractExpressionNode, @@ -314,7 +315,7 @@ using .SearchUtilsModule: save_to_file, get_cur_maxsize, update_hall_of_fame! 
-using .TemplateExpressionModule: TemplateExpression +using .TemplateExpressionModule: TemplateExpression, TemplateStructure using .ExpressionBuilderModule: embed_metadata, strip_metadata @stable default_mode = "disable" begin diff --git a/src/TemplateExpression.jl b/src/TemplateExpression.jl index e101571fc..7a1095079 100644 --- a/src/TemplateExpression.jl +++ b/src/TemplateExpression.jl @@ -36,7 +36,120 @@ using ..MutateModule: MutateModule as MM using ..PopMemberModule: PopMember """ - TemplateExpression{T,F,N,E,TS,C,D} <: AbstractStructuredExpression{T,F,N,E,D} + TemplateStructure{K,S,N,E,C} <: Function + +A struct that defines a prescribed structure for a `TemplateExpression`, +including functions that define the result of combining sub-expressions in different contexts. + +The `K` parameter is used to specify the symbols representing the inner expressions. +If not declared using the constructor `TemplateStructure{K}(...)`, the keys of the +`variable_constraints` `NamedTuple` will be used to infer this. + +# Fields +- `combine`: Optional function taking a `NamedTuple` of function keys => expression + pairs, returning a single expression. Fallback method used by `get_tree` + on a `TemplateExpression` to generate a single `Expression`. +- `combine_vectors`: Optional function taking a `NamedTuple` of function keys => vector pairs, + returning a single vector. Used for evaluating the expression tree. +- `combine_strings`: Optional function taking a `NamedTuple` of function keys => string pairs, + returning a single string. Used for printing the expression tree. +- `variable_constraints`: Optional `NamedTuple` that defines which variables each sub-expression is allowed to access. + For example, requesting `f(x1, x2)` and `g(x3)` would be equivalent to `(; f=[1, 2], g=[3])`. 
+""" +struct TemplateStructure{ + K, + E<:Union{Nothing,Function}, + N<:Union{Nothing,Function}, + S<:Union{Nothing,Function}, + C<:Union{Nothing,NamedTuple{<:Any,<:Tuple{Vararg{Vector{Int}}}}}, +} <: Function + combine::E + combine_vectors::N + combine_strings::S + variable_constraints::C +end + +function TemplateStructure{K}(combine::E; kws...) where {K,E<:Function} + return TemplateStructure{K}(; combine, kws...) +end +function TemplateStructure{K}(; kws...) where {K} + return TemplateStructure(; _function_keys=Val(K), kws...) +end +function TemplateStructure(combine::E; kws...) where {E<:Function} + return TemplateStructure(; combine, kws...) +end +function TemplateStructure(; + combine::E=nothing, + combine_vectors::N=nothing, + combine_strings::S=nothing, + variable_constraints::C=nothing, + _function_keys::Val{K}=Val(nothing), +) where {E,N,S,C,K} + K === nothing && + variable_constraints === nothing && + throw( + ArgumentError( + "If `variable_constraints` is not provided, " * + "you must initialize `TemplateStructure` with " * + "`TemplateStructure{K}(...)`, for tuple of symbols `K`.", + ), + ) + K !== nothing && + variable_constraints !== nothing && + K != keys(variable_constraints) && + throw(ArgumentError("`K` must match the keys of `variable_constraints`.")) + + Kout = K === nothing ? keys(variable_constraints::NamedTuple) : K + return TemplateStructure{Kout,E,N,S,C}( + combine, combine_vectors, combine_strings, variable_constraints + ) +end +# TODO: This interface is ugly. Part of this is due to AbstractStructuredExpression, +# which was not written with this `TemplateStructure` in mind, but just with a +# single callable function. 
+ +function combine(template::TemplateStructure, nt::NamedTuple) + return (template.combine::Function)(nt) +end +function combine_vectors( + template::TemplateStructure, nt::NamedTuple, X::Union{AbstractMatrix,Nothing}=nothing +) + combiner = template.combine_vectors::Function + if X !== nothing && hasmethod(combiner, typeof((nt, X))) + # TODO: Refactor this + return combiner(nt, X) + else + return combiner(nt) + end +end +function combine_strings(template::TemplateStructure, nt::NamedTuple) + return (template.combine_strings::Function)(nt) +end + +function (template::TemplateStructure)( + nt::NamedTuple{<:Any,<:Tuple{Vararg{AbstractExpression}}} +) + return combine(template, nt) +end +function (template::TemplateStructure)( + nt::NamedTuple{<:Any,<:Tuple{Vararg{AbstractVector}}}, + X::Union{AbstractMatrix,Nothing}=nothing, +) + return combine_vectors(template, nt, X) +end +function (template::TemplateStructure)( + nt::NamedTuple{<:Any,<:Tuple{Vararg{AbstractString}}} +) + return combine_strings(template, nt) +end + +can_combine(template::TemplateStructure) = template.combine !== nothing +can_combine_vectors(template::TemplateStructure) = template.combine_vectors !== nothing +can_combine_strings(template::TemplateStructure) = template.combine_strings !== nothing +get_function_keys(::TemplateStructure{K}) where {K} = K + +""" + TemplateExpression{T,F,N,E,TS,D} <: AbstractStructuredExpression{T,F,N,E,D} A symbolic expression that allows the combination of multiple sub-expressions in a structured way, with constraints on variable usage. @@ -46,16 +159,12 @@ domain-specific knowledge or constraints must be imposed on the model's structur # Constructor -- `TemplateExpression(trees; structure, operators, variable_names, variable_mapping)` +- `TemplateExpression(trees; structure, operators, variable_names)` - `trees`: A `NamedTuple` holding the sub-expressions (e.g., `f = Expression(...)`, `g = Expression(...)`). 
- - `structure`: A function that defines how the sub-expressions are combined. This should have one method - that takes `trees` as input and returns a single `Expression` node, and another method which takes - a `NamedTuple` of `Vector` (representing the numerical results of each sub-expression) and returns - a single vector after combining them. + - `structure`: A `TemplateStructure` which holds functions that define how the sub-expressions are combined + in different contexts. - `operators`: An `OperatorEnum` that defines the allowed operators for the sub-expressions. - `variable_names`: An optional `Vector` of `String` that defines the names of the variables in the dataset. - - `variable_mapping`: A `NamedTuple` that defines which variables each sub-expression is allowed to access. - For example, requesting `f(x1, x2)` and `g(x3)` would be equivalent to `(; f=[1, 2], g=[3])`. # Example @@ -63,7 +172,8 @@ Let's create an example `TemplateExpression` that combines two sub-expressions ` ```julia # Define operators and variable names -operators = OperatorEnum(; binary_operators=(+, *, /, -), unary_operators=(sin, cos)) +options = Options(; binary_operators=(+, *, /, -), unary_operators=(sin, cos)) +operators = options.operators variable_names = ["x1", "x2", "x3"] # Create sub-expressions @@ -71,41 +181,42 @@ x1 = Expression(Node{Float64}(; feature=1); operators, variable_names) x2 = Expression(Node{Float64}(; feature=2); operators, variable_names) x3 = Expression(Node{Float64}(; feature=3); operators, variable_names) -# Define structure function for symbolic and numerical evaluation -function my_structure(nt::NamedTuple{<:Any,<:Tuple{Vararg{Expression}}}) - return sin(nt.f) + nt.g * nt.g -end -function my_structure(nt::NamedTuple{<:Any,<:Tuple{Vararg{AbstractVector}}}) - return @. 
sin(nt.f) + nt.g * nt.g -end - -# Define variable constraints (if desired) -variable_mapping = (; f=[1, 2], g=[3]) - # Create TemplateExpression example_expr = (; f=x1, g=x3) st_expr = TemplateExpression( example_expr; - structure=my_structure, operators, variable_names, variable_mapping + structure=TemplateStructure{(:f, :g)}(nt -> sin(nt.f) + nt.g * nt.g), + operators, + variable_names, +) +``` + +We can also define constraints on which variables each sub-expression is allowed to access: + +```julia +variable_constraints = (; f=[1, 2], g=[3]) +st_expr = TemplateExpression( + example_expr; + structure=TemplateStructure( + nt -> sin(nt.f) + nt.g * nt.g; variable_constraints + ), + operators, + variable_names, +) +``` When fitting a model in SymbolicRegression.jl, you would provide the `TemplateExpression` -as the `expression_type` argument, and then pass `expression_options=(; structure=my_structure, variable_mapping=variable_mapping)` -as additional options. The `variable_mapping` will constraint `f` to only have access to `x1` and `x2`, +as the `expression_type` argument, and then pass `expression_options=(; structure=TemplateStructure(...))` +as additional options. The `variable_constraints` will constrain `f` to only have access to `x1` and `x2`, and `g` to only have access to `x3`.
""" struct TemplateExpression{ T, - F<:Function, + F<:TemplateStructure, N<:AbstractExpressionNode{T}, E<:Expression{T,N}, # TODO: Generalize this TS<:NamedTuple{<:Any,<:NTuple{<:Any,E}}, - C<:NamedTuple{<:Any,<:NTuple{<:Any,Vector{Int}}}, # The constraints - # TODO: No need for this to be a parametric type - D<:@NamedTuple{ - structure::F, operators::O, variable_names::V, variable_mapping::C - } where {O,V}, + D<:@NamedTuple{structure::F, operators::O, variable_names::V} where {O,V}, } <: AbstractStructuredExpression{T,F,N,E,D} trees::TS metadata::Metadata{D} @@ -114,15 +225,13 @@ struct TemplateExpression{ trees::TS, metadata::Metadata{D} ) where { TS, - F<:Function, - C<:NamedTuple{<:Any,<:NTuple{<:Any,Vector{Int}}}, - D<:@NamedTuple{ - structure::F, operators::O, variable_names::V, variable_mapping::C - } where {O,V}, + F<:TemplateStructure, + D<:@NamedTuple{structure::F, operators::O, variable_names::V} where {O,V}, } + @assert keys(trees) == get_function_keys(metadata.structure) E = typeof(first(values(trees))) N = node_type(E) - return new{eltype(N),F,N,E,TS,C,D}(trees, metadata) + return new{eltype(N),F,N,E,TS,D}(trees, metadata) end end @@ -131,19 +240,11 @@ function TemplateExpression( structure::F, operators::Union{AbstractOperatorEnum,Nothing}=nothing, variable_names::Union{AbstractVector{<:AbstractString},Nothing}=nothing, - variable_mapping::NamedTuple{<:Any,<:NTuple{<:Any,Vector{Int}}}, -) where {F<:Function} - @assert length(trees) == length(variable_mapping) - if variable_names !== nothing - # TODO: Should this be removed? 
- @assert Set(eachindex(variable_names)) == - Set(Iterators.flatten(values(variable_mapping))) - end - @assert keys(trees) == keys(variable_mapping) +) where {F<:TemplateStructure} example_tree = first(values(trees))::AbstractExpression operators = get_operators(example_tree, operators) variable_names = get_variable_names(example_tree, variable_names) - metadata = (; structure, operators, variable_names, variable_mapping) + metadata = (; structure, operators, variable_names) return TemplateExpression(trees, Metadata(metadata)) end @@ -153,6 +254,26 @@ end ExpressionInterface{all_ei_methods_except(())}, TemplateExpression, [Arguments()] ) +function combine(ex::TemplateExpression, nt::NamedTuple) + return combine(get_metadata(ex).structure, nt) +end +function combine_vectors( + ex::TemplateExpression, nt::NamedTuple, X::Union{AbstractMatrix,Nothing}=nothing +) + return combine_vectors(get_metadata(ex).structure, nt, X) +end +function combine_strings(ex::TemplateExpression, nt::NamedTuple) + return combine_strings(get_metadata(ex).structure, nt) +end + +function can_combine_vectors(ex::TemplateExpression) + return can_combine_vectors(get_metadata(ex).structure) +end +function can_combine_strings(ex::TemplateExpression) + return can_combine_strings(get_metadata(ex).structure) +end +get_function_keys(ex::TemplateExpression) = get_function_keys(get_metadata(ex).structure) + function EB.create_expression( t::AbstractExpressionNode{T}, options::AbstractOptions, @@ -161,7 +282,7 @@ function EB.create_expression( ::Type{E}, ::Val{embed}=Val(false), ) where {T,L,embed,E<:TemplateExpression} - function_keys = keys(options.expression_options.variable_mapping) + function_keys = get_function_keys(options.expression_options.structure) # NOTE: We need to copy over the operators so we can call the structure function operators = options.operators @@ -186,9 +307,7 @@ function EB.extra_init_params( return (; options.operators, options.expression_options...) 
end function EB.sort_params(params::NamedTuple, ::Type{<:TemplateExpression}) - return (; - params.structure, params.operators, params.variable_names, params.variable_mapping - ) + return (; params.structure, params.operators, params.variable_names) end function ComplexityModule.compute_complexity( @@ -206,12 +325,16 @@ function DE.string_tree( tree::TemplateExpression, operators::Union{AbstractOperatorEnum,Nothing}=nothing; kws... ) raw_contents = get_contents(tree) - function_keys = keys(raw_contents) - inner_strings = NamedTuple{function_keys}( - map(ex -> DE.string_tree(ex, operators; kws...), values(raw_contents)) - ) - # TODO: Make a fallback function in case the structure function is undefined. - return get_metadata(tree).structure(inner_strings) + if can_combine_strings(tree) + function_keys = keys(raw_contents) + inner_strings = NamedTuple{function_keys}( + map(ex -> DE.string_tree(ex, operators; kws...), values(raw_contents)) + ) + return combine_strings(tree, inner_strings) + else + @assert can_combine(tree) + return DE.string_tree(combine(tree, raw_contents), operators; kws...) 
+ end end function DE.eval_tree_array( tree::TemplateExpression{T}, @@ -220,22 +343,37 @@ function DE.eval_tree_array( kws..., ) where {T} raw_contents = get_contents(tree) - - # Raw numerical results of each inner expression: - outs = map(ex -> DE.eval_tree_array(ex, cX, operators; kws...), values(raw_contents)) - - # Combine them using the structure function: - results = NamedTuple{keys(raw_contents)}(map(first, outs)) - return get_metadata(tree).structure(results), all(last, outs) + if can_combine_vectors(tree) + # Raw numerical results of each inner expression: + outs = map( + ex -> DE.eval_tree_array(ex, cX, operators; kws...), values(raw_contents) + ) + # Combine them using the structure function: + results = NamedTuple{keys(raw_contents)}(map(first, outs)) + if !all(last, outs) + return first(first(outs)), false + else + return combine_vectors(tree, results, cX), true + end + else + @assert can_combine(tree) + return DE.eval_tree_array(combine(tree, raw_contents), cX, operators; kws...) + end end function (ex::TemplateExpression)( X, operators::Union{AbstractOperatorEnum,Nothing}=nothing; kws... ) raw_contents = get_contents(ex) - results = NamedTuple{keys(raw_contents)}( - map(ex -> ex(X, operators; kws...), values(raw_contents)) - ) - return get_metadata(ex).structure(results) + if can_combine_vectors(ex) + results = NamedTuple{keys(raw_contents)}( + map(ex -> ex(X, operators; kws...), values(raw_contents)) + ) + return combine_vectors(ex, results, X) + else + @assert can_combine(ex) + callable = combine(ex, raw_contents) + return callable(X, operators; kws...) 
+ end end @unstable IDE.expected_array_type(::AbstractMatrix, ::Type{<:TemplateExpression}) = Any @@ -333,12 +471,12 @@ function CC.check_constraints( cursize::Union{Int,Nothing}=nothing, )::Bool raw_contents = get_contents(ex) - variable_mapping = get_metadata(ex).variable_mapping + variable_constraints = get_metadata(ex).structure.variable_constraints # First, we check the variable constraints at the top level: has_invalid_variables = any(keys(raw_contents)) do key tree = raw_contents[key] - allowed_variables = variable_mapping[key] + allowed_variables = variable_constraints[key] contains_other_features_than(tree, allowed_variables) end if has_invalid_variables From 96e495afcaefd4687754f9b45daec4982bab842c Mon Sep 17 00:00:00 2001 From: MilesCranmer Date: Mon, 21 Oct 2024 02:44:14 +0100 Subject: [PATCH 14/74] docs: document ability to define second method for combine_vectors --- src/TemplateExpression.jl | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/src/TemplateExpression.jl b/src/TemplateExpression.jl index 7a1095079..08db22f9b 100644 --- a/src/TemplateExpression.jl +++ b/src/TemplateExpression.jl @@ -46,12 +46,15 @@ If not declared using the constructor `TemplateStructure{K}(...)`, the keys of t `variable_constraints` `NamedTuple` will be used to infer this. # Fields -- `combine`: Optional function taking a `NamedTuple` of function keys => expression - pairs, returning a single expression. Fallback method used by `get_tree` +- `combine`: Optional function taking a `NamedTuple` of function keys => expressions, + returning a single expression. Fallback method used by `get_tree` on a `TemplateExpression` to generate a single `Expression`. -- `combine_vectors`: Optional function taking a `NamedTuple` of function keys => vector pairs, +- `combine_vectors`: Optional function taking a `NamedTuple` of function keys => vectors, returning a single vector. Used for evaluating the expression tree. 
-- `combine_strings`: Optional function taking a `NamedTuple` of function keys => string pairs, + You may optionally define a method with a second argument `X` if you wish + to include the data matrix `X` (of shape `[num_features, num_rows]`) in the + computation. +- `combine_strings`: Optional function taking a `NamedTuple` of function keys => strings, returning a single string. Used for printing the expression tree. - `variable_constraints`: Optional `NamedTuple` that defines which variables each sub-expression is allowed to access. For example, requesting `f(x1, x2)` and `g(x3)` would be equivalent to `(; f=[1, 2], g=[3])`. From 896caadf99dcdb7444a1dc12458d22cad9e468d2 Mon Sep 17 00:00:00 2001 From: MilesCranmer Date: Mon, 21 Oct 2024 02:48:47 +0100 Subject: [PATCH 15/74] test: update template expression test --- examples/template_expression.jl | 23 +++++++---------------- 1 file changed, 7 insertions(+), 16 deletions(-) diff --git a/examples/template_expression.jl b/examples/template_expression.jl index ade5fc5cf..8c2465b1a 100644 --- a/examples/template_expression.jl +++ b/examples/template_expression.jl @@ -8,23 +8,14 @@ operators = options.operators variable_names = (i -> "x$i").(1:3) x1, x2, x3 = (i -> Expression(Node(Float64; feature=i); operators, variable_names)).(1:3) -variable_mapping = (; f=[1, 2], g1=[3], g2=[3]) - -function my_structure(nt::NamedTuple{<:Any,<:Tuple{Vararg{<:AbstractString}}}) - return "( $(nt.f) + $(nt.g1), $(nt.f) + $(nt.g2) )" -end -function my_structure(nt::NamedTuple{<:Any,<:Tuple{Vararg{<:AbstractVector}}}) - return map(i -> (nt.f[i] + nt.g1[i], nt.f[i] + nt.g2[i]), eachindex(nt.f)) -end - -st_expr = TemplateExpression( - (; f=x1, g1=x3, g2=x3); - structure=my_structure, - operators, - variable_names, - variable_mapping, +structure = TemplateStructure{(:f, :g1, :g2)}(; + combine_vectors=e -> map((f, g1, g2) -> (f + g1, f + g2), e.f, e.g1, e.g2), + combine_strings=e -> "( $(e.f) + $(e.g1), $(e.f) + $(e.g2) )", +
variable_constraints=(; f=[1, 2], g1=[3], g2=[3]), ) +st_expr = TemplateExpression((; f=x1, g1=x3, g2=x3); structure, operators, variable_names) + X = rand(100, 3) .* 10 # Our dataset is a vector of 2-tuples @@ -35,7 +26,7 @@ model = SRRegressor(; unary_operators=(sin,), maxsize=15, expression_type=TemplateExpression, - expression_options=(; structure=my_structure, variable_mapping), + expression_options=(; structure), # The elementwise needs to operate directly on each row of `y`: elementwise_loss=((x1, x2), (y1, y2)) -> (y1 - x1)^2 + (y2 - x2)^2, early_stop_condition=(loss, complexity) -> loss < 1e-5 && complexity <= 7, From 103b920e704e6da190c1db6ef6e090f93e508f18 Mon Sep 17 00:00:00 2001 From: MilesCranmer Date: Mon, 21 Oct 2024 02:54:39 +0100 Subject: [PATCH 16/74] test: update more template expression tests --- test/test_template_expression.jl | 80 ++++++++++++-------------------- 1 file changed, 30 insertions(+), 50 deletions(-) diff --git a/test/test_template_expression.jl b/test/test_template_expression.jl index 1f7f44b89..873ff8970 100644 --- a/test/test_template_expression.jl +++ b/test/test_template_expression.jl @@ -6,25 +6,22 @@ options = Options(; binary_operators=(+, *, /, -), unary_operators=(sin, cos)) operators = options.operators - variable_names = (i -> "x$i").(1:3) + variable_names = ["x1", "x2", "x3"] x1, x2, x3 = (i -> Expression(Node(Float64; feature=i); operators, variable_names)).(1:3) # For combining expressions to a single expression: - my_structure(nt::NamedTuple{<:Any,<:Tuple{Vararg{<:AbstractString}}}) = - "sin($(nt.f)) + $(nt.g)^2" - my_structure(nt::NamedTuple{<:Any,<:Tuple{Vararg{<:AbstractVector}}}) = - @. sin(nt.f) + nt.g^2 - my_structure(nt::NamedTuple{<:Any,<:Tuple{Vararg{<:Expression}}}) = - sin(nt.f) + nt.g * nt.g - - variable_mapping = (; f=[1, 2], g=[3]) + structure = TemplateStructure(; + combine=e -> sin(e.f) + e.g * e.g, + combine_vectors=e -> (@. 
sin(e.f) + e.g^2), + combine_strings=e -> "sin($(e.f)) + $(e.g)^2", + variable_constraints=(; f=[1, 2], g=[3]), + ) + + @test structure isa TemplateStructure{(:f, :g)} + st_expr = TemplateExpression( - (; f=x1, g=cos(x3)); - structure=my_structure, - operators, - variable_names, - variable_mapping, + (; f=x1, g=cos(x3)); structure=my_structure, operators, variable_names ) @test string_tree(st_expr) == "sin(x1) + cos(x3)^2" operators = OperatorEnum(; binary_operators=(+, *, /, -), unary_operators=(cos, sin)) @@ -67,17 +64,13 @@ end (i -> Expression(Node(Float64; feature=i); operators, variable_names)).(1:3) # For combining expressions to a single expression: - my_structure(nt::NamedTuple{<:Any,<:Tuple{Vararg{<:AbstractString}}}) = - "sin($(nt.f)) + $(nt.g)^2" - my_structure(nt::NamedTuple{<:Any,<:Tuple{Vararg{<:AbstractVector}}}) = - @. sin(nt.f) + nt.g^2 - my_structure(nt::NamedTuple{<:Any,<:Tuple{Vararg{<:Expression}}}) = - sin(nt.f) + nt.g * nt.g - - variable_mapping = (; f=[1, 2], g=[3]) - st_expr = TemplateExpression( - (; f=x1, g=x3); structure=my_structure, operators, variable_names, variable_mapping + structure = TemplateStructure{(:f, :g)}(; + combine=e -> sin(e.f) + e.g * e.g, + combine_strings=e -> "sin($(e.f)) + $(e.g)^2", + combine_vectors=e -> (@. 
sin(e.f) + e.g^2), + variable_constraints=(; f=[1, 2], g=[3]), ) + st_expr = TemplateExpression((; f=x1, g=x3); structure, operators, variable_names) @test Interfaces.test(ExpressionInterface, TemplateExpression, [st_expr]) end @testitem "Utilising TemplateExpression to build vector expressions" tags = [:part3] begin @@ -85,15 +78,11 @@ end using Random: rand # Define the structure function, which returns a tuple: - function my_structure(nt::NamedTuple{<:Any,<:Tuple{Vararg{<:AbstractString}}}) - return "( $(nt.f) + $(nt.g1), $(nt.f) + $(nt.g2), $(nt.f) + $(nt.g3) )" - end - function my_structure(nt::NamedTuple{<:Any,<:Tuple{Vararg{<:AbstractVector}}}) - return map( - i -> (nt.f[i] + nt.g1[i], nt.f[i] + nt.g2[i], nt.f[i] + nt.g3[i]), - eachindex(nt.f), - ) - end + structure = TemplateStructure{(:f, :g1, :g2, :g3)}(; + combine_strings=e -> "( $(e.f) + $(e.g1), $(e.f) + $(e.g2), $(e.f) + $(e.g3) )", + combine_vectors=e -> + map((f, g1, g2, g3) -> (f + g1, f + g2, f + g3), e.f, e.g1, e.g2, e.g3), + ) # Set up operators and variable names options = Options(; binary_operators=(+, *, /, -), unary_operators=(sin, cos)) @@ -105,20 +94,15 @@ end # Test with vector inputs: nt_vector = NamedTuple{(:f, :g1, :g2, :g3)}((1:3, 4:6, 7:9, 10:12)) - @test my_structure(nt_vector) == [(5, 8, 11), (7, 10, 13), (9, 12, 15)] + @test structure(nt_vector) == [(5, 8, 11), (7, 10, 13), (9, 12, 15)] # And string inputs: nt_string = NamedTuple{(:f, :g1, :g2, :g3)}(("x1", "x2", "x3", "x2")) - @test my_structure(nt_string) == "( x1 + x2, x1 + x3, x1 + x2 )" + @test structure(nt_string) == "( x1 + x2, x1 + x3, x1 + x2 )" # Now, using TemplateExpression: - variable_mapping = (; f=[1, 2], g1=[3], g2=[3], g3=[3]) st_expr = TemplateExpression( - (; f=x1, g1=x2, g2=x3, g3=x2); - structure=my_structure, - options.operators, - variable_names, - variable_mapping, + (; f=x1, g1=x2, g2=x3, g3=x2); structure, options.operators, variable_names ) @test string_tree(st_expr) == "( x1 + x2, x1 + x3, x1 + x2 )" @@ 
-137,22 +121,18 @@ end x1, x2, x3 = (i -> Expression(Node(Float64; feature=i); operators, variable_names)).(1:3) - my_structure(nt) = nt.f - - variable_mapping = (; f=[1, 2], g1=[3], g2=[3], g3=[3]) + structure = TemplateStructure(; + combine=e -> e.f, variable_constraints=(; f=[1, 2], g1=[3], g2=[3], g3=[3]) + ) st_expr = TemplateExpression( - (; f=x1, g1=x3, g2=x3, g3=x3); - structure=my_structure, - operators, - variable_names, - variable_mapping, + (; f=x1, g1=x3, g2=x3, g3=x3); structure, operators, variable_names ) @test st_expr isa TemplateExpression @test get_operators(st_expr) == operators @test get_variable_names(st_expr) == variable_names - @test get_metadata(st_expr).structure == my_structure + @test get_metadata(st_expr).structure == structure end @testitem "Integration Test with fit! and Performance Check" tags = [:part3] begin include("../examples/template_expression.jl") From 1d23cfb263b07033d4a5ad6a4bd500f9e6d0460f Mon Sep 17 00:00:00 2001 From: MilesCranmer Date: Mon, 21 Oct 2024 03:00:08 +0100 Subject: [PATCH 17/74] fix: missing method for `can_combine` --- src/TemplateExpression.jl | 1 + test/test_template_expression.jl | 4 +--- 2 files changed, 2 insertions(+), 3 deletions(-) diff --git a/src/TemplateExpression.jl b/src/TemplateExpression.jl index 08db22f9b..6f4292981 100644 --- a/src/TemplateExpression.jl +++ b/src/TemplateExpression.jl @@ -269,6 +269,7 @@ function combine_strings(ex::TemplateExpression, nt::NamedTuple) return combine_strings(get_metadata(ex).structure, nt) end +can_combine(ex::TemplateExpression) = can_combine(get_metadata(ex).structure) function can_combine_vectors(ex::TemplateExpression) return can_combine_vectors(get_metadata(ex).structure) end diff --git a/test/test_template_expression.jl b/test/test_template_expression.jl index 873ff8970..6809ef8e8 100644 --- a/test/test_template_expression.jl +++ b/test/test_template_expression.jl @@ -20,9 +20,7 @@ @test structure isa TemplateStructure{(:f, :g)} - st_expr = 
TemplateExpression( - (; f=x1, g=cos(x3)); structure=my_structure, operators, variable_names - ) + st_expr = TemplateExpression((; f=x1, g=cos(x3)); structure, operators, variable_names) @test string_tree(st_expr) == "sin(x1) + cos(x3)^2" operators = OperatorEnum(; binary_operators=(+, *, /, -), unary_operators=(cos, sin)) From 6f5221c12be85b86ee8910d855be60abaaf7beab Mon Sep 17 00:00:00 2001 From: MilesCranmer Date: Mon, 21 Oct 2024 03:05:23 +0100 Subject: [PATCH 18/74] style: formatting --- src/TemplateExpression.jl | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/src/TemplateExpression.jl b/src/TemplateExpression.jl index 6f4292981..2191096dd 100644 --- a/src/TemplateExpression.jl +++ b/src/TemplateExpression.jl @@ -269,7 +269,9 @@ function combine_strings(ex::TemplateExpression, nt::NamedTuple) return combine_strings(get_metadata(ex).structure, nt) end -can_combine(ex::TemplateExpression) = can_combine(get_metadata(ex).structure) +function can_combine(ex::TemplateExpression) + return can_combine(get_metadata(ex).structure) +end function can_combine_vectors(ex::TemplateExpression) return can_combine_vectors(get_metadata(ex).structure) end From 127241d0de23264f8e5a4bda12e363554932bca5 Mon Sep 17 00:00:00 2001 From: MilesCranmer Date: Mon, 21 Oct 2024 07:10:34 +0100 Subject: [PATCH 19/74] fix: type instability in eval_tree_array --- src/TemplateExpression.jl | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/src/TemplateExpression.jl b/src/TemplateExpression.jl index 2191096dd..11d44d045 100644 --- a/src/TemplateExpression.jl +++ b/src/TemplateExpression.jl @@ -356,11 +356,7 @@ function DE.eval_tree_array( ) # Combine them using the structure function: results = NamedTuple{keys(raw_contents)}(map(first, outs)) - if !all(last, outs) - return first(first(outs)), false - else - return combine_vectors(tree, results, cX), true - end + return combine_vectors(tree, results, cX), all(last, outs) else @assert can_combine(tree) return 
DE.eval_tree_array(combine(tree, raw_contents), cX, operators; kws...) From 1031d9b51445828574035101776e4c1df02aa373 Mon Sep 17 00:00:00 2001 From: MilesCranmer Date: Mon, 21 Oct 2024 07:44:18 +0100 Subject: [PATCH 20/74] fix: method ambiguity --- src/TemplateExpression.jl | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/src/TemplateExpression.jl b/src/TemplateExpression.jl index 11d44d045..d2f2acc4d 100644 --- a/src/TemplateExpression.jl +++ b/src/TemplateExpression.jl @@ -130,18 +130,18 @@ function combine_strings(template::TemplateStructure, nt::NamedTuple) end function (template::TemplateStructure)( - nt::NamedTuple{<:Any,<:Tuple{Vararg{AbstractExpression}}} + nt::NamedTuple{<:Any,<:Tuple{AbstractExpression,Vararg{AbstractExpression}}} ) return combine(template, nt) end function (template::TemplateStructure)( - nt::NamedTuple{<:Any,<:Tuple{Vararg{AbstractVector}}}, + nt::NamedTuple{<:Any,<:Tuple{AbstractVector,Vararg{AbstractVector}}}, X::Union{AbstractMatrix,Nothing}=nothing, ) return combine_vectors(template, nt, X) end function (template::TemplateStructure)( - nt::NamedTuple{<:Any,<:Tuple{Vararg{AbstractString}}} + nt::NamedTuple{<:Any,<:Tuple{AbstractString,Vararg{AbstractString}}} ) return combine_strings(template, nt) end From 468e937aadf87169f334345c1b87064e5399b6b4 Mon Sep 17 00:00:00 2001 From: MilesCranmer Date: Mon, 21 Oct 2024 08:22:06 +0100 Subject: [PATCH 21/74] test: fix JET error --- src/TemplateExpression.jl | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/src/TemplateExpression.jl b/src/TemplateExpression.jl index d2f2acc4d..9cab5da90 100644 --- a/src/TemplateExpression.jl +++ b/src/TemplateExpression.jl @@ -87,7 +87,13 @@ function TemplateStructure(; combine_strings::S=nothing, variable_constraints::C=nothing, _function_keys::Val{K}=Val(nothing), -) where {E,N,S,C,K} +) where { + K, + E<:Union{Nothing,Function}, + N<:Union{Nothing,Function}, + S<:Union{Nothing,Function}, + 
C<:Union{Nothing,NamedTuple{<:Any,<:Tuple{Vararg{Vector{Int}}}}}, +} K === nothing && variable_constraints === nothing && throw( @@ -377,7 +383,6 @@ function (ex::TemplateExpression)( return callable(X, operators; kws...) end end -@unstable IDE.expected_array_type(::AbstractMatrix, ::Type{<:TemplateExpression}) = Any function DA.violates_dimensional_constraints( tree::TemplateExpression, dataset::Dataset, options::AbstractOptions From f59186b05001598e8ab6b48973b99d3c93cdd0c5 Mon Sep 17 00:00:00 2001 From: MilesCranmer Date: Mon, 21 Oct 2024 09:29:55 +0100 Subject: [PATCH 22/74] feat: accommodate newlines in equation strings --- src/HallOfFame.jl | 36 +++++++++++++++++++++--------------- src/Utils.jl | 20 ++++++++++++++++---- test/test_pretty_printing.jl | 9 +++++++++ 3 files changed, 46 insertions(+), 19 deletions(-) diff --git a/src/HallOfFame.jl b/src/HallOfFame.jl index a75b82939..08ca51158 100644 --- a/src/HallOfFame.jl +++ b/src/HallOfFame.jl @@ -1,7 +1,7 @@ module HallOfFameModule using DynamicExpressions: AbstractExpression, string_tree -using ..UtilsModule: split_string +using ..UtilsModule: split_string, include_splits_on_newlines using ..CoreModule: MAX_DEGREE, AbstractOptions, Dataset, DATA_TYPE, LOSS_TYPE, relu, create_expression using ..ComplexityModule: compute_complexity @@ -149,23 +149,29 @@ function string_dominating_pareto_curve( y_prefix *= WILDCARD_UNIT_STRING end eqn_string = y_prefix * " = " * eqn_string - base_string_length = length(@sprintf("%-10d %-8.3e %8.3e ", 1, 1.0, 1.0)) - - dots = "..." 
- equation_width = (twidth - 1) - base_string_length - length(dots) + stats_columns_string = @sprintf("%-10d %-8.3e %-8.3e ", complexity, loss, score) + left_cols_width = length(stats_columns_string) + output *= stats_columns_string + output *= wrap_equation_string(eqn_string, left_cols_width, twidth) + end + output *= "-"^(twidth - 1) + return output +end - output *= @sprintf("%-10d %-8.3e %-8.3e ", complexity, loss, score) +function wrap_equation_string(eqn_string, left_cols_width, twidth) + dots = "..." + equation_width = (twidth - 1) - left_cols_width - length(dots) + output = "" - split_eqn = split_string(eqn_string, equation_width) - print_pad = false - while length(split_eqn) > 1 - cur_piece = popfirst!(split_eqn) - output *= " "^(print_pad * base_string_length) * cur_piece * dots * "\n" - print_pad = true - end - output *= " "^(print_pad * base_string_length) * split_eqn[1] * "\n" + split_eqn = split_string(eqn_string, equation_width) + split_eqn = include_splits_on_newlines(split_eqn) + print_pad = false + while length(split_eqn) > 1 + cur_piece = popfirst!(split_eqn) + output *= " "^(print_pad * left_cols_width) * cur_piece * dots * "\n" + print_pad = true end - output *= "-"^(twidth - 1) + output *= " "^(print_pad * left_cols_width) * only(split_eqn) * "\n" return output end diff --git a/src/Utils.jl b/src/Utils.jl index da67bcf4d..d889035c2 100644 --- a/src/Utils.jl +++ b/src/Utils.jl @@ -43,10 +43,7 @@ end split_string(s::String, n::Integer) ```jldoctest -split_string("abcdefgh", 3) - -# output - +julia> split_string("abcdefgh", 3) ["abc", "def", "gh"] ``` """ @@ -56,6 +53,21 @@ function split_string(s::String, n::Integer) I = eachindex(s) |> collect return [s[I[i]:I[min(i + n - 1, end)]] for i in 1:n:length(s)] end +""" + include_splits_on_newlines(split_eqn::Vector{String}) + +For output of `split_string`, this adds more splits, based on newlines. +However, it filters newlines that are at the beginning of a string. 
+""" +function include_splits_on_newlines(split_eqn::Vector{String}) + output = sizehint!(String[], length(split_eqn)) + for piece in split_eqn + piece = replace(piece, r"^\n" => "") + subpieces = split(piece, '\n') + append!(output, subpieces) + end + return output +end """ Tiny equivalent to StaticArrays.MVector diff --git a/test/test_pretty_printing.jl b/test/test_pretty_printing.jl index 56cfa1f6b..6b326a9dd 100644 --- a/test/test_pretty_printing.jl +++ b/test/test_pretty_printing.jl @@ -105,3 +105,12 @@ end s = sprint((io, ex) -> print_tree(io, ex, options), ex) @test strip(s) == "sin(x) / (y - y)" end + +@testitem "printing utilities" tags = [:part2] begin + using SymbolicRegression.UtilsModule: split_string + using SymbolicRegression.HallOfFameModule: wrap_equation_string + + @test split_string("abc\ndefg", 3) == ["abc", "\nde", "fg"] + @test wrap_equation_string("abcdefghijklmnop\nqrs\ntuvwxyz", 10, 30) == + "abcdefghijklmnop...\n qrs...\n tuvwxyz\n" +end From be2fccb92c3c3533a72eda32b557db21a5fc6ed2 Mon Sep 17 00:00:00 2001 From: MilesCranmer Date: Mon, 21 Oct 2024 10:07:51 +0100 Subject: [PATCH 23/74] feat: improve printing styling for multi-line output --- src/HallOfFame.jl | 37 ++++++++++++++++++++++++++++-------- src/Utils.jl | 15 --------------- test/test_pretty_printing.jl | 3 --- 3 files changed, 29 insertions(+), 26 deletions(-) diff --git a/src/HallOfFame.jl b/src/HallOfFame.jl index 08ca51158..366e9c704 100644 --- a/src/HallOfFame.jl +++ b/src/HallOfFame.jl @@ -1,7 +1,7 @@ module HallOfFameModule using DynamicExpressions: AbstractExpression, string_tree -using ..UtilsModule: split_string, include_splits_on_newlines +using ..UtilsModule: split_string using ..CoreModule: MAX_DEGREE, AbstractOptions, Dataset, DATA_TYPE, LOSS_TYPE, relu, create_expression using ..ComplexityModule: compute_complexity @@ -148,11 +148,12 @@ function string_dominating_pareto_curve( if dataset.y_sym_units === nothing && dataset.X_sym_units !== nothing y_prefix *= 
WILDCARD_UNIT_STRING end - eqn_string = y_prefix * " = " * eqn_string + prefix = y_prefix * " = " + eqn_string = prefix * eqn_string stats_columns_string = @sprintf("%-10d %-8.3e %-8.3e ", complexity, loss, score) left_cols_width = length(stats_columns_string) output *= stats_columns_string - output *= wrap_equation_string(eqn_string, left_cols_width, twidth) + output *= wrap_equation_string(eqn_string, left_cols_width + length(prefix), twidth) end output *= "-"^(twidth - 1) return output @@ -164,14 +165,34 @@ function wrap_equation_string(eqn_string, left_cols_width, twidth) output = "" split_eqn = split_string(eqn_string, equation_width) - split_eqn = include_splits_on_newlines(split_eqn) + split_eqn_with_metadata = map( + ((i, piece),) -> let is_before_last = i < length(split_eqn) + (piece, is_before_last) + end, enumerate(split_eqn) + ) print_pad = false - while length(split_eqn) > 1 - cur_piece = popfirst!(split_eqn) - output *= " "^(print_pad * left_cols_width) * cur_piece * dots * "\n" + while length(split_eqn_with_metadata) > 0 + (cur_piece, requires_dots) = popfirst!(split_eqn_with_metadata)::Tuple{String,Bool} + if occursin(r"\n", cur_piece) + inner_splits = split(cur_piece, '\n') + cur_piece = popfirst!(inner_splits) + prepend!( + split_eqn_with_metadata, + map( + ((i, piece),) -> let is_last = i == length(inner_splits) + (piece, is_last && requires_dots) + end, + enumerate(inner_splits), + ), + ) + end + output *= " "^(print_pad * left_cols_width) * cur_piece + if requires_dots + output *= dots + end + output *= "\n" print_pad = true end - output *= " "^(print_pad * left_cols_width) * only(split_eqn) * "\n" return output end diff --git a/src/Utils.jl b/src/Utils.jl index d889035c2..50f1a3bee 100644 --- a/src/Utils.jl +++ b/src/Utils.jl @@ -53,21 +53,6 @@ function split_string(s::String, n::Integer) I = eachindex(s) |> collect return [s[I[i]:I[min(i + n - 1, end)]] for i in 1:n:length(s)] end -""" - include_splits_on_newlines(split_eqn::Vector{String}) - 
-For output of `split_string`, this adds more splits, based on newlines. -However, it filters newlines that are at the beginning of a string. -""" -function include_splits_on_newlines(split_eqn::Vector{String}) - output = sizehint!(String[], length(split_eqn)) - for piece in split_eqn - piece = replace(piece, r"^\n" => "") - subpieces = split(piece, '\n') - append!(output, subpieces) - end - return output -end """ Tiny equivalent to StaticArrays.MVector diff --git a/test/test_pretty_printing.jl b/test/test_pretty_printing.jl index 6b326a9dd..5949582c6 100644 --- a/test/test_pretty_printing.jl +++ b/test/test_pretty_printing.jl @@ -108,9 +108,6 @@ end @testitem "printing utilities" tags = [:part2] begin using SymbolicRegression.UtilsModule: split_string - using SymbolicRegression.HallOfFameModule: wrap_equation_string @test split_string("abc\ndefg", 3) == ["abc", "\nde", "fg"] - @test wrap_equation_string("abcdefghijklmnop\nqrs\ntuvwxyz", 10, 30) == - "abcdefghijklmnop...\n qrs...\n tuvwxyz\n" end From 316e13e153f153c6ef41ecfd856a5cd70b24d5f8 Mon Sep 17 00:00:00 2001 From: MilesCranmer Date: Mon, 21 Oct 2024 10:21:52 +0100 Subject: [PATCH 24/74] test: more tests of equation printing --- test/test_pretty_printing.jl | 29 +++++++++++++++++++++++++++++ 1 file changed, 29 insertions(+) diff --git a/test/test_pretty_printing.jl b/test/test_pretty_printing.jl index 5949582c6..4cc51960b 100644 --- a/test/test_pretty_printing.jl +++ b/test/test_pretty_printing.jl @@ -108,6 +108,35 @@ end @testitem "printing utilities" tags = [:part2] begin using SymbolicRegression.UtilsModule: split_string + using SymbolicRegression.HallOfFameModule: wrap_equation_string @test split_string("abc\ndefg", 3) == ["abc", "\nde", "fg"] + + test_equation_string = "cos(x) + 1.5387438743 - y^2" + @test wrap_equation_string(test_equation_string, 0, 15) == """cos(x) + 1.... + 5387438743 ... 
+ - y^2\n""" + + # Note how we have special treatment of explicit newlines: + test_equation_string = "(\nB = ( -0.012549, 0.0086419, 0.6175 )\nF_d = (-0.051546) * v\n)" + @test wrap_equation_string(test_equation_string, 4, 1000) == """( + B = ( -0.012549, 0.0086419, 0.6175 ) + F_d = (-0.051546) * v + ) +""" + + @test startswith(wrap_equation_string(test_equation_string, 0, 10), "(\n") + @test_broken wrap_equation_string(test_equation_string, 0, 12) == """( +B = ... +( -0.0... +12549,... + 0.008... +6419, ... +0.6175... + ) +F_d... + = (-0... +.05154... +6) * v... +)""" end From 10abb4f166c1ccfac7c9741b8d4843d256f8cbef Mon Sep 17 00:00:00 2001 From: MilesCranmer Date: Mon, 21 Oct 2024 10:22:48 +0100 Subject: [PATCH 25/74] feat!: default maxsize now 30 --- src/Options.jl | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/Options.jl b/src/Options.jl index 4c84eb56e..142012275 100644 --- a/src/Options.jl +++ b/src/Options.jl @@ -454,7 +454,7 @@ $(OPTION_DESCRIPTIONS) dimensional_constraint_penalty::Union{Nothing,Real}=nothing, dimensionless_constants_only::Bool=false, alpha::Real=0.100000, - maxsize::Integer=20, + maxsize::Integer=30, maxdepth::Union{Nothing,Integer}=nothing, turbo::Bool=false, bumper::Bool=false, From 2ff1e2b4ce036c092d9644bd808cea83bdaf1346 Mon Sep 17 00:00:00 2001 From: MilesCranmer Date: Mon, 21 Oct 2024 10:23:06 +0100 Subject: [PATCH 26/74] fix: add expected array type for type mismatch --- src/TemplateExpression.jl | 1 + 1 file changed, 1 insertion(+) diff --git a/src/TemplateExpression.jl b/src/TemplateExpression.jl index 9cab5da90..b459bda51 100644 --- a/src/TemplateExpression.jl +++ b/src/TemplateExpression.jl @@ -383,6 +383,7 @@ function (ex::TemplateExpression)( return callable(X, operators; kws...) 
end end +@unstable IDE.expected_array_type(::AbstractMatrix, ::Type{<:TemplateExpression}) = Any function DA.violates_dimensional_constraints( tree::TemplateExpression, dataset::Dataset, options::AbstractOptions From 3f1836bf32548dbad0ef0d951856f1b3001c8aad Mon Sep 17 00:00:00 2001 From: MilesCranmer Date: Mon, 21 Oct 2024 10:26:09 +0100 Subject: [PATCH 27/74] docs: add complex template expression example --- examples/template_expression_complex.jl | 103 ++++++++++++++++++++++++ 1 file changed, 103 insertions(+) create mode 100644 examples/template_expression_complex.jl diff --git a/examples/template_expression_complex.jl b/examples/template_expression_complex.jl new file mode 100644 index 000000000..bfcaa22cf --- /dev/null +++ b/examples/template_expression_complex.jl @@ -0,0 +1,103 @@ +using SymbolicRegression +using Random: AbstractRNG, default_rng, MersenneTwister +using MLJBase: machine, fit!, report +using Test: @test + +function cross((a1, a2, a3), (b1, b2, b3)) + return (a2 * b3 - a3 * b2, a3 * b1 - a1 * b3, a1 * b2 - a2 * b1) +end + +options = Options(; binary_operators=(+, *, /, -), unary_operators=(sin, cos)) +operators = options.operators + +# Inputs: time since experiment start, velocity, room temperature +variable_names = ["t", "v_x", "v_y", "v_z", "T"] + +# Latents: the magnetic field (in 3D), drag force +variable_constraints = (; B_x=[1], B_y=[1], B_z=[1], F_d_scale=[5]) + +# Targets: the total force vector on the particle (in 3D) + +# First, let's generate our example data. +function simulate(rng::AbstractRNG=default_rng()) + # Say that each time we run the experiment, the temperature is a bit different: + T = 298.15 + 0.5 * rand(rng) + + # We run the experiment, and record the velocity at a random time + # between 0 and 10 seconds. 
+ t = 10 * rand(rng) + + # We introduce a particle at a random velocity between -1 and 1 + v = ntuple(_ -> 2 * rand(rng) - 1, 3) + + ### TRUE (unknown) MODEL ### + # The magnetic field is sinusoidal, with frequency 1 Hz, + # along axes x and y, and decays along the z-axis. + ω = 2π + B = (sin(ω * t), cos(ω * t), exp(-t / 10)) + + # We assume the drag force is linear in the velocity and + # depends on the temperature with a power law. + F_d = -1e-5 * T^(3//2) .* v #= The last part is known, though =# + ############################ + + F_mag = cross(v, B) + F = F_d .+ F_mag + + return (; t, v, T, F, B, F_d, F_mag) +end + +struct ForceVector{T} + x::T + y::T + z::T +end + +X, y, other = let + rng = MersenneTwister(0) + n = 1000 + data = [simulate(rng) for _ in 1:n] + X = (; + t=map(d -> d.t, data), + v_x=map(d -> d.v[1], data), + v_y=map(d -> d.v[2], data), + v_z=map(d -> d.v[3], data), + T=map(d -> d.T, data), + ) + y = map(d -> ForceVector(d.F...), data) + # To check our results at the end: + other = (; + B=map(d -> d.B, data), F_d=map(d -> d.F_d, data), F_mag=map(d -> d.F_mag, data) + ) + + X, y, other +end + +structure = TemplateStructure{(:B_x, :B_y, :B_z, :F_d_scale)}(; + combine_strings=e -> + "\nB = ( $(e.B_x), $(e.B_y), $(e.B_z) )\nF_d = ($(e.F_d_scale)) * v", + combine_vectors=(e, X) -> [ + let v = (X[2, i], X[3, i], X[4, i]), + F_d = e.F_d_scale[i] .* v, + B = (e.B_x[i], e.B_y[i], e.B_z[i]), + F_mag = cross(v, B) + + ForceVector((F_d .+ F_mag)...) 
+ end for i in eachindex(axes(X, 2), e.F_d_scale, e.B_x, e.B_y, e.B_z) + ], + variable_constraints, +) + +model = SRRegressor(; + binary_operators=(+, -, *, /), + unary_operators=(sin, cos, sqrt, exp), + niterations=100, + maxsize=30, + expression_type=TemplateExpression, + expression_options=(; structure), + # The elementwise needs to operate directly on each row of `y`: + elementwise_loss=(F1, F2) -> (F1.x - F2.x)^2 + (F1.y - F2.y)^2 + (F1.z - F2.z)^2, +) + +mach = machine(model, X, y) +fit!(mach) From 75fd548fb68eea240cb70d48ec0171bba950a129 Mon Sep 17 00:00:00 2001 From: MilesCranmer Date: Mon, 21 Oct 2024 10:38:44 +0100 Subject: [PATCH 28/74] docs: make complex template example easier to understand --- examples/template_expression_complex.jl | 77 +++++++++++++++---------- 1 file changed, 46 insertions(+), 31 deletions(-) diff --git a/examples/template_expression_complex.jl b/examples/template_expression_complex.jl index bfcaa22cf..7d7f69896 100644 --- a/examples/template_expression_complex.jl +++ b/examples/template_expression_complex.jl @@ -47,45 +47,60 @@ function simulate(rng::AbstractRNG=default_rng()) return (; t, v, T, F, B, F_d, F_mag) end +rng = MersenneTwister(0) +n = 1000 + +data = [simulate(rng) for _ in 1:n] + +X = (; + t=map(d -> d.t, data), + v_x=map(d -> d.v[1], data), + v_y=map(d -> d.v[2], data), + v_z=map(d -> d.v[3], data), + T=map(d -> d.T, data), +) + +# We can regress directly on a struct! struct ForceVector{T} x::T y::T z::T end +y = map(d -> ForceVector(d.F...), data) + +# The trick is to define the right structure function. +# First, let's just make a function that prints the expression: +function combine_strings(e) + return "\nB = ( $(e.B_x), $(e.B_y), $(e.B_z) )\nF_d = ($(e.F_d_scale)) * v" +end + +# So, this will just print the separate B and F_d expressions we've learned. 
+ +# Then, let's define an expression that takes the numerical values +# evaluated in the TemplateExpression, and combines them into the resultant +# force vector. Inside this function, we can do whatever we want. + +function combine_vectors(e, X) + # Extract the 3D velocity vectors from the input matrix: + v = map(x -> (x[2], x[3], x[4]), eachcol(X)) + + # Use this to compute the full drag force: + F_d = map((fd, v) -> fd .* v, e.F_d_scale, v) + + # Collect the magnetic field components that we've learned into the vector: + B = map(tuple, e.B_x, e.B_y, e.B_z) + + # Using this, we compute the magnetic force with a cross product: + F_mag = map(cross, v, B) -X, y, other = let - rng = MersenneTwister(0) - n = 1000 - data = [simulate(rng) for _ in 1:n] - X = (; - t=map(d -> d.t, data), - v_x=map(d -> d.v[1], data), - v_y=map(d -> d.v[2], data), - v_z=map(d -> d.v[3], data), - T=map(d -> d.T, data), - ) - y = map(d -> ForceVector(d.F...), data) - # To check our results at the end: - other = (; - B=map(d -> d.B, data), F_d=map(d -> d.F_d, data), F_mag=map(d -> d.F_mag, data) - ) - - X, y, other + # Finally, we combine the drag and magnetic forces into the total force: + return map((fd, fm) -> ForceVector((fd .+ fm)...), F_d, F_mag) end structure = TemplateStructure{(:B_x, :B_y, :B_z, :F_d_scale)}(; - combine_strings=e -> - "\nB = ( $(e.B_x), $(e.B_y), $(e.B_z) )\nF_d = ($(e.F_d_scale)) * v", - combine_vectors=(e, X) -> [ - let v = (X[2, i], X[3, i], X[4, i]), - F_d = e.F_d_scale[i] .* v, - B = (e.B_x[i], e.B_y[i], e.B_z[i]), - F_mag = cross(v, B) - - ForceVector((F_d .+ F_mag)...) 
- end for i in eachindex(axes(X, 2), e.F_d_scale, e.B_x, e.B_y, e.B_z) - ], - variable_constraints, + combine_strings=combine_strings, + combine_vectors=combine_vectors, + variable_constraints=variable_constraints, ) model = SRRegressor(; @@ -94,7 +109,7 @@ model = SRRegressor(; niterations=100, maxsize=30, expression_type=TemplateExpression, - expression_options=(; structure), + expression_options=(; structure=structure), # The elementwise needs to operate directly on each row of `y`: elementwise_loss=(F1, F2) -> (F1.x - F2.x)^2 + (F1.y - F2.y)^2 + (F1.z - F2.z)^2, ) From 90aa4846ee33ff2f9c429d51d6e8d8f53edfc8fd Mon Sep 17 00:00:00 2001 From: MilesCranmer Date: Mon, 21 Oct 2024 10:51:07 +0100 Subject: [PATCH 29/74] docs: clean up complex template expression example --- examples/template_expression_complex.jl | 84 +++++++++++++------------ 1 file changed, 45 insertions(+), 39 deletions(-) diff --git a/examples/template_expression_complex.jl b/examples/template_expression_complex.jl index 7d7f69896..dcd36746d 100644 --- a/examples/template_expression_complex.jl +++ b/examples/template_expression_complex.jl @@ -10,48 +10,41 @@ end options = Options(; binary_operators=(+, *, /, -), unary_operators=(sin, cos)) operators = options.operators -# Inputs: time since experiment start, velocity, room temperature -variable_names = ["t", "v_x", "v_y", "v_z", "T"] - -# Latents: the magnetic field (in 3D), drag force -variable_constraints = (; B_x=[1], B_y=[1], B_z=[1], F_d_scale=[5]) - -# Targets: the total force vector on the particle (in 3D) - # First, let's generate our example data. -function simulate(rng::AbstractRNG=default_rng()) - # Say that each time we run the experiment, the temperature is a bit different: - T = 298.15 + 0.5 * rand(rng) - - # We run the experiment, and record the velocity at a random time - # between 0 and 10 seconds. 
- t = 10 * rand(rng) +# Let's take 1000 trials: +n = 1000 +rng = MersenneTwister(0) - # We introduce a particle at a random velocity between -1 and 1 - v = ntuple(_ -> 2 * rand(rng) - 1, 3) +# Say that each time we run the experiment, the temperature is a bit different: +T = 298.15 .+ 0.5 .* rand(rng, n) - ### TRUE (unknown) MODEL ### - # The magnetic field is sinusoidal, with frequency 1 Hz, - # along axes x and y, and decays along the z-axis. - ω = 2π - B = (sin(ω * t), cos(ω * t), exp(-t / 10)) +# We run the experiment, and record the velocity at a random time +# between 0 and 10 seconds. +t = 10 .* rand(rng, n) - # We assume the drag force is linear in the velocity and - # depends on the temperature with a power law. - F_d = -1e-5 * T^(3//2) .* v #= The last part is known, though =# - ############################ +# We introduce a particle at a random velocity between -1 and 1 +v = [ntuple(_ -> 2 * rand(rng) - 1, 3) for _ in 1:n] - F_mag = cross(v, B) - F = F_d .+ F_mag +### TRUE (unknown) MODEL ### +# Let's assume magnetic field is sinusoidal, with frequency 1 Hz, +# along axes x and y, and decays over t along the z-axis. +ω = 2π +B = map(ti -> (sin(ω * ti), cos(ω * ti), exp(-ti / 10)), t) - return (; t, v, T, F, B, F_d, F_mag) -end +# We assume the drag force is linear in the velocity and +# depends on the temperature with a power law. +F_d = map((Ti, vi) -> -1e-5 .* Ti^(3//2) .* v, T, v) +############################ -rng = MersenneTwister(0) -n = 1000 +# Now, let's compute the true magnetic force: +F_mag = map(cross, v, B) +# And sum it to get the total force: +F = F_d .+ F_mag -data = [simulate(rng) for _ in 1:n] +# This forms our dataset! 
+data = (; t, v, T, F, B, F_d, F_mag) +# Now, let's format it for input to the regressor: X = (; t=map(d -> d.t, data), v_x=map(d -> d.v[1], data), @@ -68,9 +61,13 @@ struct ForceVector{T} end y = map(d -> ForceVector(d.F...), data) +# Our variable names are the keys of the struct: +variable_names = ["t", "v_x", "v_y", "v_z", "T"] + # The trick is to define the right structure function. # First, let's just make a function that prints the expression: function combine_strings(e) + # e is a named tuple of strings representing each formula return "\nB = ( $(e.B_x), $(e.B_y), $(e.B_z) )\nF_d = ($(e.F_d_scale)) * v" end @@ -81,22 +78,31 @@ end # force vector. Inside this function, we can do whatever we want. function combine_vectors(e, X) - # Extract the 3D velocity vectors from the input matrix: - v = map(x -> (x[2], x[3], x[4]), eachcol(X)) + # This time, e is a named tuple of *vectors*, representing the batched + # evaluation of each formula. + + # First, extract the 3D velocity vectors from the input matrix: + v = [(X[2, i], X[3, i], X[4, i]) for i in eachindex(axes(X, 2))] # Use this to compute the full drag force: - F_d = map((fd, v) -> fd .* v, e.F_d_scale, v) + F_d = [e.F_d_scale[i] .* v[i] for i in eachindex(v)] # Collect the magnetic field components that we've learned into the vector: - B = map(tuple, e.B_x, e.B_y, e.B_z) + B = [(e.B_x[i], e.B_y[i], e.B_z[i]) for i in eachindex(e.B_x)] # Using this, we compute the magnetic force with a cross product: - F_mag = map(cross, v, B) + F_mag = [cross(v[i], B[i]) for i in eachindex(v)] # Finally, we combine the drag and magnetic forces into the total force: - return map((fd, fm) -> ForceVector((fd .+ fm)...), F_d, F_mag) + return [ForceVector((F_d[i] .+ F_mag[i])...) for i in eachindex(F_d)] end +# For the functions we wish to learn, we can constraint what variables +# each of them depends on, explicitly. 
Let's say B only depends on time, +# and the drag force scale only depends on temperature (we explicitly +# multiply the velocity in) +variable_constraints = (; B_x=[1], B_y=[1], B_z=[1], F_d_scale=[5]) + structure = TemplateStructure{(:B_x, :B_y, :B_z, :F_d_scale)}(; combine_strings=combine_strings, combine_vectors=combine_vectors, From 02644663ff0c8cc201afffd1a42bebe3b170c69e Mon Sep 17 00:00:00 2001 From: MilesCranmer Date: Mon, 21 Oct 2024 11:53:03 +0100 Subject: [PATCH 30/74] test: fix JET error --- src/TemplateExpression.jl | 18 ++++++++++-------- 1 file changed, 10 insertions(+), 8 deletions(-) diff --git a/src/TemplateExpression.jl b/src/TemplateExpression.jl index b459bda51..5099d5927 100644 --- a/src/TemplateExpression.jl +++ b/src/TemplateExpression.jl @@ -94,8 +94,15 @@ function TemplateStructure(; S<:Union{Nothing,Function}, C<:Union{Nothing,NamedTuple{<:Any,<:Tuple{Vararg{Vector{Int}}}}}, } - K === nothing && - variable_constraints === nothing && + Kout = if K !== nothing && variable_constraints !== nothing + K != keys(variable_constraints) && + throw(ArgumentError("`K` must match the keys of `variable_constraints`.")) + K + elseif K !== nothing + K + elseif variable_constraints !== nothing + keys(variable_constraints) + else throw( ArgumentError( "If `variable_constraints` is not provided, " * @@ -103,12 +110,7 @@ function TemplateStructure(; "`TemplateStructure{K}(...)`, for tuple of symbols `K`.", ), ) - K !== nothing && - variable_constraints !== nothing && - K != keys(variable_constraints) && - throw(ArgumentError("`K` must match the keys of `variable_constraints`.")) - - Kout = K === nothing ? 
keys(variable_constraints::NamedTuple) : K + end return TemplateStructure{Kout,E,N,S,C}( combine, combine_vectors, combine_strings, variable_constraints ) From 026bb5ab4d7fc5ce84c08e2ae0c94939b270b53d Mon Sep 17 00:00:00 2001 From: MilesCranmer Date: Mon, 21 Oct 2024 13:01:24 +0100 Subject: [PATCH 31/74] test: update test to new maxsize --- test/test_search_statistics.jl | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/test/test_search_statistics.jl b/test/test_search_statistics.jl index cc2f5360a..c22425b00 100644 --- a/test/test_search_statistics.jl +++ b/test/test_search_statistics.jl @@ -13,7 +13,7 @@ end normalize_frequencies!(statistics) -@test sum(statistics.frequencies) == 1020 +@test sum(statistics.frequencies) == 1030 @test sum(statistics.normalized_frequencies) ≈ 1.0 @test statistics.normalized_frequencies[5] > statistics.normalized_frequencies[15] From a40b78169f9237712ad6e3fffe9efeb73b3e3dfc Mon Sep 17 00:00:00 2001 From: MilesCranmer Date: Mon, 21 Oct 2024 13:07:28 +0100 Subject: [PATCH 32/74] test: get second broken string test working --- test/test_pretty_printing.jl | 25 ++++++++++++------------- 1 file changed, 12 insertions(+), 13 deletions(-) diff --git a/test/test_pretty_printing.jl b/test/test_pretty_printing.jl index 4cc51960b..8b300767c 100644 --- a/test/test_pretty_printing.jl +++ b/test/test_pretty_printing.jl @@ -126,17 +126,16 @@ end """ @test startswith(wrap_equation_string(test_equation_string, 0, 10), "(\n") - @test_broken wrap_equation_string(test_equation_string, 0, 12) == """( -B = ... -( -0.0... -12549,... - 0.008... -6419, ... -0.6175... - ) -F_d... - = (-0... -.05154... -6) * v... -)""" + @test wrap_equation_string(test_equation_string, 0, 12) == """( +B = ( ... +-0.01254... +9, 0.008... +6419, 0.... +6175 ) +F... +_d = (-0... +.051546)... 
+ * v +) +""" end From b62598f8b9fa3d8e2925a96173dff777d78e156d Mon Sep 17 00:00:00 2001 From: MilesCranmer Date: Mon, 21 Oct 2024 13:19:32 +0100 Subject: [PATCH 33/74] fix: printing with explicit newline --- src/HallOfFame.jl | 35 +++++++++++++---------------------- src/Utils.jl | 4 ++-- test/test_pretty_printing.jl | 17 ++++++++--------- 3 files changed, 23 insertions(+), 33 deletions(-) diff --git a/src/HallOfFame.jl b/src/HallOfFame.jl index 366e9c704..19d56c501 100644 --- a/src/HallOfFame.jl +++ b/src/HallOfFame.jl @@ -164,29 +164,20 @@ function wrap_equation_string(eqn_string, left_cols_width, twidth) equation_width = (twidth - 1) - left_cols_width - length(dots) output = "" - split_eqn = split_string(eqn_string, equation_width) - split_eqn_with_metadata = map( - ((i, piece),) -> let is_before_last = i < length(split_eqn) - (piece, is_before_last) - end, enumerate(split_eqn) - ) - print_pad = false - while length(split_eqn_with_metadata) > 0 - (cur_piece, requires_dots) = popfirst!(split_eqn_with_metadata)::Tuple{String,Bool} - if occursin(r"\n", cur_piece) - inner_splits = split(cur_piece, '\n') - cur_piece = popfirst!(inner_splits) - prepend!( - split_eqn_with_metadata, - map( - ((i, piece),) -> let is_last = i == length(inner_splits) - (piece, is_last && requires_dots) - end, - enumerate(inner_splits), - ), - ) + forced_split_eqn = split(eqn_string, '\n') + split_eqn = @NamedTuple{piece::String, requires_dots::Bool}[] + for piece in forced_split_eqn + subpieces = split_string(piece, equation_width) + for (i, subpiece) in enumerate(subpieces) + # We don't need dots on the last subpiece, since it + # is either the last row of the entire string, or it has + # an explicit newline in it! 
+ push!(split_eqn, (piece=subpiece, requires_dots=i < length(subpieces))) end - output *= " "^(print_pad * left_cols_width) * cur_piece + end + print_pad = false + for (; piece, requires_dots) in split_eqn + output *= " "^(print_pad * left_cols_width) * piece if requires_dots output *= dots end diff --git a/src/Utils.jl b/src/Utils.jl index 50f1a3bee..2ee29f16c 100644 --- a/src/Utils.jl +++ b/src/Utils.jl @@ -40,14 +40,14 @@ function subscriptify(number::Integer) end """ - split_string(s::String, n::Integer) + split_string(s::AbstractString, n::Integer) ```jldoctest julia> split_string("abcdefgh", 3) ["abc", "def", "gh"] ``` """ -function split_string(s::String, n::Integer) +function split_string(s::AbstractString, n::Integer) length(s) <= n && return [s] # Due to unicode characters, need to split only at valid indices: I = eachindex(s) |> collect diff --git a/test/test_pretty_printing.jl b/test/test_pretty_printing.jl index 8b300767c..dcbc3f59c 100644 --- a/test/test_pretty_printing.jl +++ b/test/test_pretty_printing.jl @@ -127,15 +127,14 @@ end @test startswith(wrap_equation_string(test_equation_string, 0, 10), "(\n") @test wrap_equation_string(test_equation_string, 0, 12) == """( -B = ( ... --0.01254... -9, 0.008... -6419, 0.... -6175 ) -F... -_d = (-0... -.051546)... - * v +B = ( -0... +.012549,... + 0.00864... +19, 0.61... +75 ) +F_d = (-... +0.051546... 
+) * v ) """ end From a810dc5215c3c0c789491da42756c7f2db6166b3 Mon Sep 17 00:00:00 2001 From: MilesCranmer Date: Thu, 24 Oct 2024 19:13:52 +0100 Subject: [PATCH 34/74] refactor: IOBuffer rather than string concat --- src/HallOfFame.jl | 55 +++++++++++++++++++++++++---------------------- 1 file changed, 29 insertions(+), 26 deletions(-) diff --git a/src/HallOfFame.jl b/src/HallOfFame.jl index 19d56c501..b99ac03a2 100644 --- a/src/HallOfFame.jl +++ b/src/HallOfFame.jl @@ -122,13 +122,13 @@ end function string_dominating_pareto_curve( hallOfFame, dataset, options; width::Union{Integer,Nothing}=nothing ) - twidth = (width === nothing) ? 100 : max(100, width::Integer) - output = "" - output *= "Hall of Fame:\n" - # TODO: Get user's terminal width. - output *= "-"^(twidth - 1) * "\n" - output *= @sprintf( - "%-10s %-8s %-8s %-8s\n", "Complexity", "Loss", "Score", "Equation" + terminal_width = (width === nothing) ? 100 : max(100, width::Integer) + buffer = IOBuffer() + println(buffer, "Hall of Fame:") + println(buffer, '-'^(terminal_width - 1)) + print( + buffer, + @sprintf("%-10s %-8s %-8s %-8s\n", "Complexity", "Loss", "Score", "Equation") ) formatted = format_hall_of_fame(hallOfFame, options) @@ -152,39 +152,42 @@ function string_dominating_pareto_curve( eqn_string = prefix * eqn_string stats_columns_string = @sprintf("%-10d %-8.3e %-8.3e ", complexity, loss, score) left_cols_width = length(stats_columns_string) - output *= stats_columns_string - output *= wrap_equation_string(eqn_string, left_cols_width + length(prefix), twidth) + print(buffer, stats_columns_string) + print( + buffer, + wrap_equation_string( + eqn_string, left_cols_width + length(prefix), terminal_width + ), + ) end - output *= "-"^(twidth - 1) - return output + print(buffer, '-'^(terminal_width - 1)) + return String(take!(buffer)) end -function wrap_equation_string(eqn_string, left_cols_width, twidth) +function wrap_equation_string(eqn_string, left_cols_width, terminal_width) dots = "..." 
- equation_width = (twidth - 1) - left_cols_width - length(dots) - output = "" + equation_width = (terminal_width - 1) - left_cols_width - length(dots) + buffer = IOBuffer() forced_split_eqn = split(eqn_string, '\n') - split_eqn = @NamedTuple{piece::String, requires_dots::Bool}[] + print_pad = false for piece in forced_split_eqn subpieces = split_string(piece, equation_width) for (i, subpiece) in enumerate(subpieces) # We don't need dots on the last subpiece, since it # is either the last row of the entire string, or it has # an explicit newline in it! - push!(split_eqn, (piece=subpiece, requires_dots=i < length(subpieces))) - end - end - print_pad = false - for (; piece, requires_dots) in split_eqn - output *= " "^(print_pad * left_cols_width) * piece - if requires_dots - output *= dots + requires_dots = i < length(subpieces) + print(buffer, ' '^(print_pad * left_cols_width)) + print(buffer, subpiece) + if requires_dots + print(buffer, dots) + end + println(buffer) + print_pad = true end - output *= "\n" - print_pad = true end - return output + return String(take!(buffer)) end function format_hall_of_fame(hof::HallOfFame{T,L}, options) where {T,L} From 0cbc02db3ee3dbf55e1aa897adc8e85fe5f75ec4 Mon Sep 17 00:00:00 2001 From: MilesCranmer Date: Thu, 24 Oct 2024 19:33:33 +0100 Subject: [PATCH 35/74] docs: tweak example --- examples/template_expression_complex.jl | 56 +++++++++++++++---------- 1 file changed, 35 insertions(+), 21 deletions(-) diff --git a/examples/template_expression_complex.jl b/examples/template_expression_complex.jl index dcd36746d..e20d6bcb6 100644 --- a/examples/template_expression_complex.jl +++ b/examples/template_expression_complex.jl @@ -29,28 +29,32 @@ v = [ntuple(_ -> 2 * rand(rng) - 1, 3) for _ in 1:n] # Let's assume magnetic field is sinusoidal, with frequency 1 Hz, # along axes x and y, and decays over t along the z-axis. 
ω = 2π -B = map(ti -> (sin(ω * ti), cos(ω * ti), exp(-ti / 10)), t) +B = [(sin(ω * ti), cos(ω * ti), exp(-ti / 10)) for ti in t] # We assume the drag force is linear in the velocity and # depends on the temperature with a power law. -F_d = map((Ti, vi) -> -1e-5 .* Ti^(3//2) .* v, T, v) +F_d = [-1e-5 * Ti^(3//2) .* vi for (Ti, vi) in zip(T, v)] ############################ # Now, let's compute the true magnetic force: -F_mag = map(cross, v, B) +F_mag = [cross(vi, Bi) for (vi, Bi) in zip(v, B)] # And sum it to get the total force: -F = F_d .+ F_mag +F = [fd .+ fm for (fd, fm) in zip(F_d, F_mag)] + +# And some random other expression to spice things up: +E = [sin(ω * ti) * cos(ω * ti) for ti in t] # This forms our dataset! -data = (; t, v, T, F, B, F_d, F_mag) +data = (; t, v, T, F, B, F_d, F_mag, E) # Now, let's format it for input to the regressor: X = (; - t=map(d -> d.t, data), - v_x=map(d -> d.v[1], data), - v_y=map(d -> d.v[2], data), - v_z=map(d -> d.v[3], data), - T=map(d -> d.T, data), + t=data.t, + v_x=[vi[1] for vi in data.v], + v_y=[vi[2] for vi in data.v], + v_z=[vi[3] for vi in data.v], + T=data.T, + E=data.E, ) # We can regress directly on a struct! @@ -58,8 +62,9 @@ struct ForceVector{T} x::T y::T z::T + E::T end -y = map(d -> ForceVector(d.F...), data) +y = [ForceVector(F..., E) for (F, E) in zip(data.F, data.E)] # Our variable names are the keys of the struct: variable_names = ["t", "v_x", "v_y", "v_z", "T"] @@ -68,7 +73,10 @@ variable_names = ["t", "v_x", "v_y", "v_z", "T"] # First, let's just make a function that prints the expression: function combine_strings(e) # e is a named tuple of strings representing each formula - return "\nB = ( $(e.B_x), $(e.B_y), $(e.B_z) )\nF_d = ($(e.F_d_scale)) * v" + B_x_padded = e.B_x + B_y_padded = e.B_y + B_z_padded = e.B_z + return " ╭ 𝐁 = [ $(B_x_padded) , $(B_y_padded) , $(B_z_padded) ]\n │ 𝐅 = ($(e.F_d_scale)) * 𝐯\n ╰ E = $(e.E)" end # So, this will just print the separate B and F_d expressions we've learned. 
@@ -85,25 +93,27 @@ function combine_vectors(e, X) v = [(X[2, i], X[3, i], X[4, i]) for i in eachindex(axes(X, 2))] # Use this to compute the full drag force: - F_d = [e.F_d_scale[i] .* v[i] for i in eachindex(v)] + F_d = [F_d_scale_i .* vi for (F_d_scale_i, vi) in zip(e.F_d_scale, v)] # Collect the magnetic field components that we've learned into the vector: - B = [(e.B_x[i], e.B_y[i], e.B_z[i]) for i in eachindex(e.B_x)] + B = [(bx, by, bz) for (bx, by, bz) in zip(e.B_x, e.B_y, e.B_z)] # Using this, we compute the magnetic force with a cross product: - F_mag = [cross(v[i], B[i]) for i in eachindex(v)] + F_mag = [cross(vi, Bi) for (vi, Bi) in zip(v, B)] + + E = e.E # Finally, we combine the drag and magnetic forces into the total force: - return [ForceVector((F_d[i] .+ F_mag[i])...) for i in eachindex(F_d)] + return [ForceVector((fd .+ fm)..., ei) for (fd, fm, ei) in zip(F_d, F_mag, E)] end # For the functions we wish to learn, we can constraint what variables # each of them depends on, explicitly. 
Let's say B only depends on time, # and the drag force scale only depends on temperature (we explicitly # multiply the velocity in) -variable_constraints = (; B_x=[1], B_y=[1], B_z=[1], F_d_scale=[5]) +variable_constraints = (; B_x=[1], B_y=[1], B_z=[1], F_d_scale=[5], E=[1]) -structure = TemplateStructure{(:B_x, :B_y, :B_z, :F_d_scale)}(; +structure = TemplateStructure{(:B_x, :B_y, :B_z, :F_d_scale, :E)}(; combine_strings=combine_strings, combine_vectors=combine_vectors, variable_constraints=variable_constraints, @@ -112,12 +122,16 @@ structure = TemplateStructure{(:B_x, :B_y, :B_z, :F_d_scale)}(; model = SRRegressor(; binary_operators=(+, -, *, /), unary_operators=(sin, cos, sqrt, exp), - niterations=100, - maxsize=30, + niterations=500, + maxsize=35, expression_type=TemplateExpression, expression_options=(; structure=structure), # The elementwise needs to operate directly on each row of `y`: - elementwise_loss=(F1, F2) -> (F1.x - F2.x)^2 + (F1.y - F2.y)^2 + (F1.z - F2.z)^2, + elementwise_loss=(F1, F2) -> + (F1.x - F2.x)^2 + (F1.y - F2.y)^2 + (F1.z - F2.z)^2 + (F1.E - F2.E)^2, + mutation_weights=MutationWeights(; rotate_tree=0.5), + batching=true, + batch_size=30, ) mach = machine(model, X, y) From dde3d83b9b0f643f44b44dbbe5e7f4e8808a1aa2 Mon Sep 17 00:00:00 2001 From: MilesCranmer Date: Thu, 24 Oct 2024 22:23:19 +0100 Subject: [PATCH 36/74] feat!: store CSV in `outputs` folder --- src/MLJInterface.jl | 4 ++++ src/Options.jl | 33 ++++++++++++++++----------------- src/OptionsStruct.jl | 2 +- src/SearchUtils.jl | 37 +++++++++++++++++++++++++++++++------ src/SymbolicRegression.jl | 4 +++- test/test_params.jl | 1 - 6 files changed, 55 insertions(+), 26 deletions(-) diff --git a/src/MLJInterface.jl b/src/MLJInterface.jl index 4d7a1d140..914c39697 100644 --- a/src/MLJInterface.jl +++ b/src/MLJInterface.jl @@ -56,6 +56,7 @@ function modelexpr(model_name::Symbol) addprocs_function::Union{Function,Nothing} = nothing heap_size_hint_in_bytes::Union{Integer,Nothing} = 
nothing runtests::Bool = true + run_id::Union{String,Nothing} = nothing loss_type::L = Nothing selection_method::Function = choose_best dimensions_type::Type{D} = SymbolicDimensions{DEFAULT_DIM_BASE_TYPE} @@ -202,6 +203,7 @@ function _update(m, verbosity, old_fitresult, old_cache, X, y, w, options, class runtests=m.runtests, saved_state=(old_fitresult === nothing ? nothing : old_fitresult.state), return_state=true, + run_id=m.run_id, loss_type=m.loss_type, X_units=X_units_clean, y_units=y_units_clean, @@ -567,6 +569,8 @@ function tag_with_docstring(model_name::Symbol, description::String, bottom_matt - `runtests::Bool=true`: Whether to run (quick) tests before starting the search, to see if there will be any problems during the equation search related to the host environment. + - `run_id::Union{String,Nothing}=nothing`: A unique identifier for the run. + If not specified, a unique ID will be generated. - `loss_type::Type=Nothing`: If you would like to use a different type for the loss than for the data you passed, specify the type here. Note that if you pass complex data `::Complex{L}`, then the loss diff --git a/src/Options.jl b/src/Options.jl index 142012275..980e25edc 100644 --- a/src/Options.jl +++ b/src/Options.jl @@ -2,7 +2,6 @@ module OptionsModule using DispatchDoctor: @unstable using Optim: Optim -using Dates: Dates using StatsBase: StatsBase using DynamicExpressions: OperatorEnum, Expression, default_node_type using ADTypes: AbstractADType, ADTypes @@ -206,7 +205,6 @@ const deprecated_options_mapping = Base.ImmutableDict( :mutationWeights => :mutation_weights, :hofMigration => :hof_migration, :shouldOptimizeConstants => :should_optimize_constants, - :hofFile => :output_file, :perturbationFactor => :perturbation_factor, :batchSize => :batch_size, :crossoverProbability => :crossover_probability, @@ -382,7 +380,6 @@ const OPTION_DESCRIPTIONS = """- `binary_operators`: Vector of binary operators type, such as `:Zygote` for Zygote, `:Enzyme`, etc. 
Most backends will not work, and many will never work due to incompatibilities, though support for some is gradually being added. -- `output_file`: What file to store equations to, as a backup. - `perturbation_factor`: When mutating a constant, either multiply or divide by (1+perturbation_factor)^(rand()+1). - `probability_negate_constant`: Probability of negating a constant in the equation @@ -399,6 +396,9 @@ const OPTION_DESCRIPTIONS = """- `binary_operators`: Vector of binary operators not. - `print_precision`: How many digits to print when printing equations. By default, this is 5. +- `output_directory`: The base directory to save output files to. Files + will be saved in a subdirectory according to the run ID. By default, + this is `./outputs`. - `save_to_file`: Whether to save equations to a file during the search. - `bin_constraints`: See `constraints`. This is the same, but specified for binary operators only (for example, if you have an operator that is both a binary @@ -463,6 +463,7 @@ $(OPTION_DESCRIPTIONS) should_simplify::Union{Nothing,Bool}=nothing, should_optimize_constants::Bool=true, output_file::Union{Nothing,AbstractString}=nothing, + output_directory::Union{Nothing,String}=nothing, expression_type::Type=Expression, node_type::Type=default_node_type(expression_type), expression_options::NamedTuple=NamedTuple(), @@ -537,7 +538,6 @@ $(OPTION_DESCRIPTIONS) #! 
format: off k == :hofMigration && (hof_migration = kws[k]; true) && continue k == :shouldOptimizeConstants && (should_optimize_constants = kws[k]; true) && continue - k == :hofFile && (output_file = kws[k]; true) && continue k == :perturbationFactor && (perturbation_factor = kws[k]; true) && continue k == :batchSize && (batch_size = kws[k]; true) && continue k == :crossoverProbability && (crossover_probability = kws[k]; true) && continue @@ -597,6 +597,9 @@ $(OPTION_DESCRIPTIONS) Optim.BFGS(; linesearch=LineSearches.BackTracking()) end end + if output_file !== nothing + error("`output_file` is deprecated. Use `output_directory` instead.") + end if elementwise_loss === nothing elementwise_loss = L2DistLoss() @@ -616,18 +619,6 @@ $(OPTION_DESCRIPTIONS) ) end - is_testing = parse(Bool, get(ENV, "SYMBOLIC_REGRESSION_IS_TESTING", "false")) - - if output_file === nothing - # "%Y-%m-%d_%H%M%S.%f" - date_time_str = Dates.format(Dates.now(), "yyyy-mm-dd_HHMMSS.sss") - output_file = "hall_of_fame_" * date_time_str * ".csv" - if is_testing - tmpdir = mktempdir() - output_file = joinpath(tmpdir, output_file) - end - end - @assert maxsize > 3 @assert warmup_maxsize_by >= 0.0f0 @assert length(unary_operators) <= max_ops @@ -733,6 +724,14 @@ $(OPTION_DESCRIPTIONS) ADTypes.Auto(autodiff_backend) end + _output_directory = + if output_directory === nothing && + get(ENV, "SYMBOLIC_REGRESSION_IS_TESTING", "false") == "true" + mktempdir() + else + output_directory + end + options = Options{ typeof(complexity_mapping), operator_specialization(typeof(operators), expression_type), @@ -763,7 +762,7 @@ $(OPTION_DESCRIPTIONS) hof_migration, should_simplify, should_optimize_constants, - output_file, + _output_directory, populations, perturbation_factor, annealing, diff --git a/src/OptionsStruct.jl b/src/OptionsStruct.jl index fa8a0035b..a391b242a 100644 --- a/src/OptionsStruct.jl +++ b/src/OptionsStruct.jl @@ -207,7 +207,7 @@ struct Options{ hof_migration::Bool should_simplify::Bool 
should_optimize_constants::Bool - output_file::String + output_directory::Union{String,Nothing} populations::Int perturbation_factor::Float32 annealing::Bool diff --git a/src/SearchUtils.jl b/src/SearchUtils.jl index 23358d9dc..5f04b5b0b 100644 --- a/src/SearchUtils.jl +++ b/src/SearchUtils.jl @@ -4,6 +4,7 @@ This includes: process management, stdin reading, checking for early stops.""" module SearchUtilsModule using Printf: @printf, @sprintf +using Dates: Dates using Distributed: Distributed, @spawnat, Future, procs, addprocs using StatsBase: mean using DispatchDoctor: @unstable @@ -56,6 +57,7 @@ struct RuntimeOptions{PARALLELISM,DIM_OUT,RETURN_STATE} <: AbstractRuntimeOption parallelism::Val{PARALLELISM} dim_out::Val{DIM_OUT} return_state::Val{RETURN_STATE} + run_id::String end @unstable @inline function Base.getproperty( roptions::RuntimeOptions{P,D,R}, name::Symbol @@ -85,6 +87,7 @@ end heap_size_hint_in_bytes::Union{Integer,Nothing}=nothing, runtests::Bool=true, return_state::Union{Bool,Nothing,Val}=nothing, + run_id::Union{String,Nothing}=nothing, verbosity::Union{Int,Nothing}=nothing, progress::Union{Bool,Nothing}=nothing, v_dim_out::Val{DIM_OUT}=Val(nothing), @@ -194,6 +197,12 @@ end `` end + _run_id = if run_id === nothing + generate_run_id() + else + run_id + end + return RuntimeOptions{concurrency,dim_out,_return_state}( niterations, _numprocs, @@ -206,9 +215,16 @@ end Val(concurrency), Val(dim_out), Val(_return_state), + _run_id, ) end +function generate_run_id() + date_str = Dates.format(Dates.now(), "yyyymmdd_HHMMSS") + h = join(rand(['0':'9'; 'a':'z'; 'A':'Z'], 6)) + return "$(date_str)_$h" +end + """A simple dictionary to track worker allocations.""" const WorkerAssignments = Dict{Tuple{Int,Int},Int} @@ -569,12 +585,21 @@ Base.@kwdef struct SearchState{T,L,N<:AbstractExpression{T},WorkerOutputType,Cha end function save_to_file( - dominating, nout::Integer, j::Integer, dataset::Dataset{T,L}, options::AbstractOptions + dominating, + nout::Integer, + 
j::Integer, + dataset::Dataset{T,L}, + options::AbstractOptions, + ropt::AbstractRuntimeOptions, ) where {T,L} - output_file = options.output_file - if nout > 1 - output_file = output_file * ".out$j" - end + output_directory = joinpath( + options.output_directory === nothing ? "outputs" : options.output_directory, + ropt.run_id, + ) + mkpath(output_directory) + filename = nout > 1 ? "hall_of_fame_output$(j).csv" : "hall_of_fame.csv" + output_file = joinpath(output_directory, filename) + dominating_n = length(dominating) complexities = Vector{Int}(undef, dominating_n) @@ -602,7 +627,7 @@ function save_to_file( end # Write file twice in case exit in middle of filewrite - for out_file in (output_file, output_file * ".bkup") + for out_file in (output_file, output_file * ".bak") open(out_file, "w") do io write(io, s) end diff --git a/src/SymbolicRegression.jl b/src/SymbolicRegression.jl index 0ffeac249..34ea2e961 100644 --- a/src/SymbolicRegression.jl +++ b/src/SymbolicRegression.jl @@ -433,6 +433,7 @@ function equation_search( runtests::Bool=true, saved_state=nothing, return_state::Union{Bool,Nothing,Val}=nothing, + run_id::Union{String,Nothing}=nothing, loss_type::Type{L}=Nothing, verbosity::Union{Integer,Nothing}=nothing, progress::Union{Bool,Nothing}=nothing, @@ -482,6 +483,7 @@ function equation_search( runtests=runtests, saved_state=saved_state, return_state=return_state, + run_id=run_id, verbosity=verbosity, progress=progress, v_dim_out=Val(DIM_OUT), @@ -864,7 +866,7 @@ function _main_search_loop!( dominating = calculate_pareto_frontier(state.halls_of_fame[j]) if options.save_to_file - save_to_file(dominating, nout, j, dataset, options) + save_to_file(dominating, nout, j, dataset, options, ropt) end ################################################################### # Migration ####################################################### diff --git a/test/test_params.jl b/test/test_params.jl index b74b58013..6c2f35006 100644 --- a/test/test_params.jl +++ 
b/test/test_params.jl @@ -30,7 +30,6 @@ const default_params = ( hof_migration=true, fraction_replaced_hof=0.1f0, should_optimize_constants=true, - output_file=nothing, perturbation_factor=1.000000f0, annealing=true, batching=false, From f8ad3542c8ad49e5fd24d8da93cd72a7adf274bf Mon Sep 17 00:00:00 2001 From: MilesCranmer Date: Thu, 24 Oct 2024 22:28:21 +0100 Subject: [PATCH 37/74] fix!: MLJ saving unicode to csv --- src/MLJInterface.jl | 20 +++++++++++------ test/test_mlj.jl | 53 ++++++++++++++++++++++++++++++++++++++++++++- 2 files changed, 65 insertions(+), 8 deletions(-) diff --git a/src/MLJInterface.jl b/src/MLJInterface.jl index 914c39697..b2eb8db4a 100644 --- a/src/MLJInterface.jl +++ b/src/MLJInterface.jl @@ -174,7 +174,7 @@ function _update(m, verbosity, old_fitresult, old_cache, X, y, w, options, class else old_fitresult.types end - X_t::types.X_t, variable_names, X_units::types.X_units = get_matrix_and_info( + X_t::types.X_t, variable_names, display_variable_names, X_units::types.X_units = get_matrix_and_info( X, m.dimensions_type ) y_t::types.y_t, y_variable_names, y_units::types.y_units = format_input_for( @@ -194,6 +194,7 @@ function _update(m, verbosity, old_fitresult, old_cache, X, y, w, options, class niterations=m.niterations, weights=w_t, variable_names=variable_names, + display_variable_names=display_variable_names, options=options, parallelism=m.parallelism, numprocs=m.numprocs, @@ -254,14 +255,17 @@ end function get_matrix_and_info(X, ::Type{D}) where {D} sch = MMI.istable(X) ? MMI.schema(X) : nothing Xm_t = MMI.matrix(X; transpose=true) - colnames = if sch === nothing - [map(i -> "x$(subscriptify(i))", axes(Xm_t, 1))...] + colnames, display_colnames = if sch === nothing + ( + ["x$(i)" for i in eachindex(axes(Xm_t, 1))], + ["x$(subscriptify(i))" for i in eachindex(axes(Xm_t, 1))], + ) else - [string.(sch.names)...] 
+ ([string(name) for name in sch.names], [string(name) for name in sch.names]) end D_promoted = get_dimensions_type(Xm_t, D) Xm_t_strip, X_units = unwrap_units_single(Xm_t, D_promoted) - return Xm_t_strip, colnames, X_units + return Xm_t_strip, colnames, display_colnames, X_units end function format_input_for(::SRRegressor, y, ::Type{D}) where {D} @@ -280,7 +284,8 @@ function format_input_for(::MultitargetSRRegressor, y, ::Type{D}) where {D} MMI.istable(y) || (length(size(y)) == 2 && size(y, 2) > 1), "For single-output regression, please use `SRRegressor`." ) - return get_matrix_and_info(y, D) + out = get_matrix_and_info(y, D) + return out[1], out[2], out[4] end function validate_variable_names(variable_names, fitresult) @assert( @@ -420,7 +425,7 @@ function _predict(m::M, fitresult, Xnew, idx, classes) where {M<:AbstractSRRegre params = full_report(m, fitresult; v_with_strings=Val(false)) prototype = MMI.istable(Xnew) ? Xnew : nothing - Xnew_t, variable_names, X_units = get_matrix_and_info(Xnew, m.dimensions_type) + Xnew_t, variable_names, _, X_units = get_matrix_and_info(Xnew, m.dimensions_type) T = promote_type(eltype(Xnew_t), fitresult.types.T) if isempty(params.equations) || any(isempty, params.equations) @@ -570,6 +575,7 @@ function tag_with_docstring(model_name::Symbol, description::String, bottom_matt search, to see if there will be any problems during the equation search related to the host environment. - `run_id::Union{String,Nothing}=nothing`: A unique identifier for the run. + This will be used to store outputs from the run in the `outputs` directory. If not specified, a unique ID will be generated. - `loss_type::Type=Nothing`: If you would like to use a different type for the loss than for the data you passed, specify the type here. 
diff --git a/test/test_mlj.jl b/test/test_mlj.jl index d26773485..6b5ec9844 100644 --- a/test/test_mlj.jl +++ b/test/test_mlj.jl @@ -127,10 +127,61 @@ end rng = MersenneTwister(0) X = randn(rng, 100, 3) Y = X - model = MultitargetSRRegressor(; niterations=10, stop_kws...) + + # Create a temporary directory + temp_dir = mktempdir() + + # Set the run_id and output_directory + run_id = "test_run" + output_directory = temp_dir + + # Instantiate the model with the specified run_id and output_directory + model = MultitargetSRRegressor(; + niterations=10, run_id=run_id, output_directory=output_directory, stop_kws... + ) + mach = machine(model, X, Y) fit!(mach) + + # Check predictions @test sum(abs2, predict(mach, X) .- Y) / length(X) < 1e-6 + + # Load the output CSV file for each of the three outputs + for i in 1:3 + csv_file = joinpath(output_directory, run_id, "hall_of_fame_output$(i).csv") + csv_content = read(csv_file, String) + + # Parse the CSV content by splitting on newlines and commas + lines = split(csv_content, '\n') + header = split(lines[1], ',') + data_lines = lines[2:end] + + @test header[1] == "Complexity" + @test header[2] == "Loss" + @test header[3] == "Equation" + + complexities = Int[] + losses = Float64[] + equations = String[] + + for line in data_lines + if isempty(line) + continue + end + cols = split(line, ',') + push!(complexities, parse(Int, cols[1])) + push!(losses, parse(Float64, cols[2])) + @show cols + push!(equations, cols[3]) + end + + @test !isempty(complexities) + @test complexities == report(mach).complexities[i] + @test losses == report(mach).losses[i] + for (eq, eq_str) in zip(equations, report(mach).equation_strings[i]) + @test eq[(begin + 1):(end - 1)] == eq_str + end + end end @testitem "Helpful errors" tags = [:part3] begin From b58c256160fce033126f60edd3cb58767d3bc8df Mon Sep 17 00:00:00 2001 From: MilesCranmer Date: Thu, 24 Oct 2024 22:58:59 +0100 Subject: [PATCH 38/74] docs: update TemplateExpression docs --- docs/src/types.md | 26 +++++++++++++++++++++++--- 1 file changed, 23 
insertions(+), 3 deletions(-) diff --git a/docs/src/types.md b/docs/src/types.md index cd62389be..bf954dfac 100644 --- a/docs/src/types.md +++ b/docs/src/types.md @@ -62,14 +62,34 @@ These types allow you to define expressions with parameters that can be tuned to ## Template Expressions -Template expressions are a type of expression that allows you to specify a predefined structure. -This lets you also fit vector expressions, as the custom evaluation structure can simply return -a vector of tuples. +Template expressions allow you to specify predefined structures and constraints for your expressions. +These use the new `TemplateStructure` type to define how expressions should be combined and evaluated. ```@docs TemplateExpression +TemplateStructure ``` +Example usage: + +```julia +# Define a template structure +structure = TemplateStructure( + combine=e -> e.f + e.g, # Create normal `Expression` + combine_vectors=e -> (e.f .+ e.g), # Output vector + combine_strings=e -> "($(e.f)) + ($(e.g))", # Output string + variable_constraints=(; f=[1, 2], g=[3]) # Constrain dependencies +) + +# Use in options +model = SRRegressor(; + expression_type=TemplateExpression, + expression_options=(; structure=structure) +) +``` + +The `variable_constraints` field allows you to specify which variables can be used in different parts of the expression. + ## Population Groups of equations are given as a population, which is From 6e662f6dc9306e93ad9a0a153fdab441d9d440cb Mon Sep 17 00:00:00 2001 From: MilesCranmer Date: Thu, 24 Oct 2024 23:13:58 +0100 Subject: [PATCH 39/74] docs: add TemplateExpression example to docs --- docs/src/examples.md | 132 +++++++++++++++++++++++++++++++++++++++---- 1 file changed, 121 insertions(+), 11 deletions(-) diff --git a/docs/src/examples.md b/docs/src/examples.md index 762970a9a..38b393600 100644 --- a/docs/src/examples.md +++ b/docs/src/examples.md @@ -230,26 +230,39 @@ Note that you can also search for dimensionless units by settings ## 7. 
Working with Expressions -Expressions in `SymbolicRegression.jl` are represented using the `Expression` type, which combines the raw `Node` type with an `OperatorEnum`. This allows for more flexible and powerful expression manipulation and evaluation. - -Here's an example: +Expressions in `SymbolicRegression.jl` are represented using the `Expression{T,Node{T},...}` type, which provides a more robust way to combine structure, operators, and constraints. Here's an example: ```julia using SymbolicRegression -# Define options with operators -options = Options(; binary_operators=[+, -, *], unary_operators=[cos]) +# Define options with operators and structure +options = Options( + binary_operators=[+, -, *], + unary_operators=[cos], + expression_options=( + structure=TemplateStructure(), + variable_constraints=Dict(1 => [1, 2], 2 => [2]) + ) +) -# Create expression nodes +# Create expression nodes with constraints operators = options.operators variable_names = ["x1", "x2"] -x1 = Expression(Node{Float64}(feature=1); operators, variable_names) -x2 = Expression(Node{Float64}(feature=2); operators, variable_names) +x1 = Expression( + Node{Float64}(feature=1), + operators=operators, + variable_names=variable_names, + structure=options.expression_options.structure +) +x2 = Expression( + Node{Float64}(feature=2), + operators=operators, + variable_names=variable_names, + structure=options.expression_options.structure +) -# Construct an expression using the operators from options +# Construct and evaluate expression expr = x1 * cos(x2 - 3.2) - -# Evaluate the expression directly X = rand(Float64, 2, 100) output = expr(X) ``` @@ -330,3 +343,100 @@ to browse the documentation for the Python frontend [PySR](http://astroautomata.com/PySR), which has additional documentation. In particular, the [tuning page](http://astroautomata.com/PySR/tuning) is useful for improving search performance. + +## 10. 
Template Expressions + +Template expressions allow you to define structured expressions where different parts can be constrained to use specific variables. In this example, we'll create expressions that output pairs of values. + +First, let's set up our basic configuration: + +```julia +using SymbolicRegression +using Random: rand +using MLJBase: machine, fit!, report + +options = Options( + binary_operators=(+, *, /, -), + unary_operators=(sin, cos) +) +operators = options.operators +variable_names = ["x1", "x2", "x3"] +``` + +Now we'll create base expressions for each variable: + +```julia +x1, x2, x3 = [ + Expression( + Node{Float64}(feature=i); + operators=operators, + variable_names=variable_names + ) + for i in 1:3 +] +``` + +The key part is defining our template structure. This determines how different parts of the expression combine: + +```julia +structure = TemplateStructure{(:f, :g1, :g2)}(; + # Define how to combine vectors of evaluated expressions + combine_vectors=e -> map( + (f, g1, g2) -> (f + g1, f + g2), + e.f, e.g1, e.g2 + ), + # Define how to combine strings for printing + combine_strings=e -> "( $(e.f) + $(e.g1), $(e.f) + $(e.g2) )", + # Constrain which variables can be used in each part + variable_constraints=(; f=[1, 2], g1=[3], g2=[3]) +) +``` + +Let's generate some example data: + +```julia +X = rand(100, 3) .* 10 +# Create pairs of target expressions +y = [ + (sin(X[i, 1]) + X[i, 3]^2, sin(X[i, 1]) + X[i, 3]) + for i in eachindex(axes(X, 1)) +] +``` + +Now we can set up and train our model: + +```julia +model = SRRegressor(; + binary_operators=(+, *), + unary_operators=(sin,), + maxsize=25, + expression_type=TemplateExpression, + # Pass options used to instantiate expressions + expression_options=(; structure), + # Our `y` is 2-tuple of values + elementwise_loss=((x1, x2), (y1, y2)) -> (y1 - x1)^2 + (y2 - x2)^2 +) + +mach = machine(model, X, y) +fit!(mach) +``` + +After training, we can examine the best expression: + +```julia +r = 
report(mach) +best_expr = r.equations[r.best_idx] + +# Access individual parts of the template expression +f_part = get_contents(best_expr).f # Expression using x1 or x2 +g1_part = get_contents(best_expr).g1 # Expression using x3 +g2_part = get_contents(best_expr).g2 # Expression using x3 +``` + +The above code demonstrates how template expressions can be used to: + +- Define structured expressions with multiple components +- Constrain which variables can be used in each component +- Create expressions that can output multiple values + +You can even output custom structs - see `examples/template_expression_complex.jl` From 0877a9034622cc5aad2e209a4d2b978fe19219b7 Mon Sep 17 00:00:00 2001 From: MilesCranmer Date: Thu, 24 Oct 2024 23:15:50 +0100 Subject: [PATCH 40/74] test: weaken test condition --- test/test_mlj.jl | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/test/test_mlj.jl b/test/test_mlj.jl index 6b5ec9844..a4348fd28 100644 --- a/test/test_mlj.jl +++ b/test/test_mlj.jl @@ -39,7 +39,7 @@ end rep = report(mach) @test occursin("a", rep.equation_strings[rep.best_idx]) ypred_good = predict(mach, X) - @test sum(abs2, predict(mach, X) .- y) / length(y) < 1e-5 + @test sum(abs2, predict(mach, X) .- y) / length(y) < 1e-4 # Check that we can choose the equation ypred_same = predict(mach, (data=X, idx=rep.best_idx)) From e367105f179df89bf35253efc272dc2830c24a2f Mon Sep 17 00:00:00 2001 From: MilesCranmer Date: Thu, 24 Oct 2024 23:26:26 +0100 Subject: [PATCH 41/74] test: add coverage for TemplateExpression --- test/test_template_expression.jl | 90 ++++++++++++++++++++++++++++++++ 1 file changed, 90 insertions(+) diff --git a/test/test_template_expression.jl b/test/test_template_expression.jl index 6809ef8e8..04836cf15 100644 --- a/test/test_template_expression.jl +++ b/test/test_template_expression.jl @@ -135,3 +135,93 @@ end @testitem "Integration Test with fit! 
and Performance Check" tags = [:part3] begin include("../examples/template_expression.jl") end +@testitem "TemplateExpression with only combine function" tags = [:part3] begin + using SymbolicRegression + using SymbolicRegression.TemplateExpressionModule: + can_combine_vectors, can_combine, get_function_keys + using SymbolicRegression.InterfaceDynamicExpressionsModule: expected_array_type + using DynamicExpressions: constructorof + + # Set up basic operators and variables + options = Options(; binary_operators=(+, *, /, -), unary_operators=(sin, cos)) + operators = options.operators + variable_names = ["x1", "x2", "x3"] + x1, x2, x3 = + (i -> Expression(Node(Float64; feature=i); operators, variable_names)).(1:3) + + # Create a TemplateStructure with only combine (no combine_vectors) + structure = TemplateStructure(; + combine=e -> sin(e.f) + e.g * e.g, # Only define combine + variable_constraints=(; f=[1, 2], g=[3]), + ) + + # Create the TemplateExpression + st_expr = TemplateExpression((; f=x1, g=cos(x3)); structure, operators, variable_names) + + @test constructorof(typeof(st_expr)) === TemplateExpression + @test get_function_keys(st_expr) == (:f, :g) + + # Test evaluation + cX = [1.0 2.0; 3.0 4.0; 5.0 6.0] + out = st_expr(cX) + out_2, complete = eval_tree_array(st_expr, cX) + + # The expression should evaluate by first combining to a single expression, + # then evaluating that expression + expected = sin.(cX[1, :]) .+ cos.(cX[3, :]) .^ 2 + @test out ≈ expected + + @test complete + @test out_2 ≈ expected + + # Verify that can_combine_vectors is false but can_combine is true + @test !can_combine_vectors(st_expr) + @test can_combine(st_expr) + + @test expected_array_type(cX, typeof(st_expr)) === Any + + @test string_tree(st_expr) == "sin(x1) + (cos(x3) * cos(x3))" +end +@testitem "TemplateExpression with data in combine_vectors" tags = [:part3] begin + using SymbolicRegression + + options = Options(; binary_operators=(+, *, /, -), unary_operators=(sin, cos, exp)) + 
operators = options.operators + variable_names = ["x1", "x2", "x3"] + x1, x2, x3 = + (i -> Expression(Node(Float64; feature=i); operators, variable_names)).(1:3) + f = exp(2.5 * x3) + g = x1 + structure = TemplateStructure(; + combine_vectors=(e, X) -> e.f .+ X[2, :], variable_constraints=(; f=[3], g=[1]) + ) + st_expr = TemplateExpression((; f, g); structure, operators, variable_names) + X = randn(3, 100) + @test st_expr(X) ≈ @. exp(2.5 * X[3, :]) + X[2, :] +end +@testitem "TemplateStructure constructors" tags = [:part3] begin + using SymbolicRegression + + operators = Options(; binary_operators=(+, *, /, -)).operators + variable_names = ["x1", "x2"] + + # Create simple expressions with constant values + f = Expression(Node(Float64; val=1.0); operators, variable_names) + g = Expression(Node(Float64; val=2.0); operators, variable_names) + + # Test TemplateStructure{K}(combine; kws...) + st1 = TemplateStructure{(:f, :g)}(e -> e.f + e.g) + @test st1.combine((; f, g)) == f + g + + # Test TemplateStructure(combine; kws...) 
+ st2 = TemplateStructure(e -> e.f + e.g; variable_constraints=(; f=[1], g=[2])) + @test st2.combine((; f, g)) == f + g + + # Test error when no K or variable_constraints provided + @test_throws ArgumentError TemplateStructure(e -> e.f + e.g) + @test_throws ArgumentError( + "If `variable_constraints` is not provided, " * + "you must initialize `TemplateStructure` with " * + "`TemplateStructure{K}(...)`, for tuple of symbols `K`.", + ) TemplateStructure(e -> e.f + e.g) +end From 6785db0b73b4b454b99ba2d623a6e561d74481a9 Mon Sep 17 00:00:00 2001 From: MilesCranmer Date: Sat, 26 Oct 2024 12:17:24 +0100 Subject: [PATCH 42/74] docs: describe new output structure in changelog --- CHANGELOG.md | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) diff --git a/CHANGELOG.md b/CHANGELOG.md index 778533db9..6e5a69a68 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -26,6 +26,7 @@ Summary of major recent changes, described in more detail below: - New mutation operators introduced, `swap_operands` and `rotate_tree` – both of which seem to help kick the evolution out of local optima. - New hyperparameter defaults created, based on a Pareto front volume calculation, rather than simply accuracy of the best expression. - [Support for Zygote.jl and Enzyme.jl within the constant optimizer, specified using the `autodiff_backend` option](#support-for-zygotejl-and-enzymejl-within-the-constant-optimizer-specified-using-the-autodiff_backend-option) +- [Changed output file handling](#changed-output-file-handling) - Major refactoring of the codebase to improve readability and modularity - Identified and fixed a major internal bug involving unexpected aliasing produced by the crossover operator - Segmentation faults caused by this are a likely culprit for some crashes reported during multi-day multi-node searches. @@ -384,10 +385,20 @@ Options( for Enzyme.jl (though Enzyme support is highly experimental). 
+### Changed output file handling + +Instead of writing to a single file like `hall_of_fame_.csv`, outputs are now organized in a directory structure. +Each run gets a unique ID (containing a timestamp and random string, e.g., `20240315_120000_x7k92p`), and outputs are saved to `outputs//`. +Currently, only `hall_of_fame.csv` (and `hall_of_fame.csv.bak`) is saved, with plans to add more logs and diagnostics in this folder in future releases. + +The output directory can be customized via the `output_directory` option (defaults to `./outputs`). +A custom run ID can be specified via the new `run_id` parameter passed to `equation_search` (or `SRRegressor`). + ### Other Small Features in v1.0.0 - Support for per-variable complexity, via the `complexity_of_variables` option. - Option to force dimensionless constants when fitting with dimensional constraints, via the `dimensionless_constants_only` option. +- Default `maxsize` increased from 20 to 30. ### Update Guide @@ -397,6 +408,12 @@ Only if you are interacting with the return types of or if you have modified any internals, should you need to make some changes. +Also note that the "_hall of fame_" CSV file is now stored in +a directory structure, of the form `outputs//hall_of_fame.csv`. +This is to accommodate additional log files without polluting the current working directory. +Multi-output runs are now stored in the format `.../hall_of_fame_output1.csv`, rather than +the old format `hall_of_fame_{timestamp}.csv.out1`. + So, the key changes are, as discussed [above](#changed-the-core-expression-type-from-nodet--expressiontnodet), the change from `Node` to `Expression` as the default type for representing expressions. This includes the hall of fame object returned by `equation_search`, as well as the vector of expressions stored in `report(mach).equations` for the MLJ interface. 
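Reviewer note on the changelog entry above: the sketch below is one way to picture the new layout, assuming it follows the `generate_run_id` and `save_to_file` logic added earlier in this series. The helper `hall_of_fame_path` is hypothetical and not part of SymbolicRegression.jl; it only illustrates how the base directory, the run ID subdirectory, and the per-output filename appear to combine.

```julia
using Dates: Dates

# Mirrors `generate_run_id` from the patch: a timestamp plus 6 random alphanumerics.
function generate_run_id()
    date_str = Dates.format(Dates.now(), "yyyymmdd_HHMMSS")
    h = join(rand(['0':'9'; 'a':'z'; 'A':'Z'], 6))
    return "$(date_str)_$h"
end

# Hypothetical helper (an assumption, not a package function):
# the path that output `j` out of `nout` outputs would be written to.
function hall_of_fame_path(output_directory, run_id, nout::Integer, j::Integer)
    # `nothing` falls back to the default base directory, as in `save_to_file`.
    base = output_directory === nothing ? "outputs" : output_directory
    # Multi-output searches get one file per output.
    filename = nout > 1 ? "hall_of_fame_output$(j).csv" : "hall_of_fame.csv"
    return joinpath(base, run_id, filename)
end

run_id = generate_run_id()
single = hall_of_fame_path(nothing, run_id, 1, 1)
multi = hall_of_fame_path("my_runs", "test_run", 3, 2)
```

Passing an explicit `run_id`, as the MLJ test in this series does with `"test_run"`, makes the output location predictable, which is convenient when a script needs to read the CSV back afterwards.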
From c6501bea987c98f9e340bc3f2bdec379282a8698 Mon Sep 17 00:00:00 2001 From: MilesCranmer Date: Sun, 27 Oct 2024 08:15:40 +0000 Subject: [PATCH 43/74] feat: add default options selector --- src/Options.jl | 40 ++++++++++++++++++++++++++++++++++++++++ 1 file changed, 40 insertions(+) diff --git a/src/Options.jl b/src/Options.jl index 980e25edc..caacbf1b7 100644 --- a/src/Options.jl +++ b/src/Options.jl @@ -814,4 +814,44 @@ $(OPTION_DESCRIPTIONS) return options end +function default_options(@nospecialize(version::Union{VersionNumber,Nothing} = nothing)) + if version isa VersionNumber && version < v"1.0.0" + return (;) + else + return (; + adaptive_parsimony_scaling=147.85182336915454, + alpha=0.0236276472097734, + annealing=false, + batching=false, + crossover_probability=0.0611420635066688, + fraction_replaced=0.0001862232335251, + fraction_replaced_hof=0.4873748218612678, + maxsize=30, + mutation_weights=MutationWeights(; + add_node=0.1790074914122459, + delete_node=0.8570333054193809, + do_nothing=1.0, + insert_node=5.662909956653213, + mutate_constant=0.0819490475733105, + mutate_operator=8.432050027550325, + optimize=0.0, + randomize=0.0161291198588, + rotate_tree=3.3054325751117952, + simplify=0.0034318313230220503, + swap_operands=0.0141241179573197, + form_connection=0.5, + break_connection=0.1, + ), + ncycles_per_iteration=364, + parsimony=0.0, + perturbation_factor=0.2188520798183637, + population_size=57, + populations=86, + probability_negate_constant=0.0008335092034286, + tournament_selection_n=49, + tournament_selection_p=0.5093587102628294, + ) + end +end + end From 06ce20da4ae0227a9f093a9c913a63a74c5422e2 Mon Sep 17 00:00:00 2001 From: MilesCranmer Date: Sun, 27 Oct 2024 08:20:29 +0000 Subject: [PATCH 44/74] feat: round best hparams to 3 sig digits --- src/Options.jl | 34 +++++++++++++++++----------------- 1 file changed, 17 insertions(+), 17 deletions(-) diff --git a/src/Options.jl b/src/Options.jl index caacbf1b7..655a46d56 100644 --- 
a/src/Options.jl +++ b/src/Options.jl @@ -819,37 +819,37 @@ function default_options(@nospecialize(version::Union{VersionNumber,Nothing} = n return (;) else return (; - adaptive_parsimony_scaling=147.85182336915454, - alpha=0.0236276472097734, + adaptive_parsimony_scaling=148, + alpha=0.0236, annealing=false, batching=false, - crossover_probability=0.0611420635066688, - fraction_replaced=0.0001862232335251, - fraction_replaced_hof=0.4873748218612678, + crossover_probability=0.0611, + fraction_replaced=0.000186, + fraction_replaced_hof=0.487, maxsize=30, mutation_weights=MutationWeights(; - add_node=0.1790074914122459, - delete_node=0.8570333054193809, + add_node=0.179, + delete_node=0.857, do_nothing=1.0, - insert_node=5.662909956653213, - mutate_constant=0.0819490475733105, - mutate_operator=8.432050027550325, + insert_node=5.66, + mutate_constant=0.0819, + mutate_operator=8.43, optimize=0.0, - randomize=0.0161291198588, - rotate_tree=3.3054325751117952, - simplify=0.0034318313230220503, - swap_operands=0.0141241179573197, + randomize=0.0161, + rotate_tree=3.30, + simplify=0.00343, + swap_operands=0.0141, form_connection=0.5, break_connection=0.1, ), ncycles_per_iteration=364, parsimony=0.0, - perturbation_factor=0.2188520798183637, + perturbation_factor=0.219, population_size=57, populations=86, - probability_negate_constant=0.0008335092034286, + probability_negate_constant=0.000834, tournament_selection_n=49, - tournament_selection_p=0.5093587102628294, + tournament_selection_p=0.509, ) end end From 4617b35c10221120d6ee5bf8cfe69eb54b01d18c Mon Sep 17 00:00:00 2001 From: MilesCranmer Date: Sun, 27 Oct 2024 08:45:53 +0000 Subject: [PATCH 45/74] feat: add other defaults to `default_options` --- src/Options.jl | 126 ++++++++++++++++++++++++++++++++++++++++--------- 1 file changed, 105 insertions(+), 21 deletions(-) diff --git a/src/Options.jl b/src/Options.jl index 655a46d56..4918c0f5a 100644 --- a/src/Options.jl +++ b/src/Options.jl @@ -816,40 +816,124 @@ end 
function default_options(@nospecialize(version::Union{VersionNumber,Nothing} = nothing)) if version isa VersionNumber && version < v"1.0.0" - return (;) - else return (; - adaptive_parsimony_scaling=148, - alpha=0.0236, + # Creating the Search Space + binary_operators=[+, -, /, *], + unary_operators=Function[], + maxsize=20, + # Setting the Search Size + populations=15, + population_size=33, + ncycles_per_iteration=550, + # Working with Complexities + parsimony=0.0032, + warmup_maxsize_by=0.0, + use_frequency=true, + use_frequency_in_tournament=true, + adaptive_parsimony_scaling=20.0, + should_simplify=true, + # Mutations + mutation_weights=MutationWeights(; + mutate_constant=0.048, + mutate_operator=0.47, + swap_operands=0.1, + rotate_tree=0.0, + add_node=0.79, + insert_node=5.1, + delete_node=1.7, + simplify=0.0020, + randomize=0.00023, + do_nothing=0.21, + optimize=0.0, + form_connection=0.5, + break_connection=0.1, + ), + crossover_probability=0.066, annealing=false, + alpha=0.1, + perturbation_factor=0.076, + probability_negate_constant=0.01, + # Tournament Selection + tournament_selection_n=12, + tournament_selection_p=0.86, + # Constant Optimization + should_optimize_constants=true, + optimizer_probability=0.14, + optimizer_nrestarts=2, + optimizer_algorithm=Optim.BFGS(; linesearch=LineSearches.BackTracking()), + # Migration between Populations + migration=true, + hof_migration=true, + fraction_replaced=0.00036, + fraction_replaced_hof=0.035, + topn=12, + # Performance and Parallelization batching=false, - crossover_probability=0.0611, - fraction_replaced=0.000186, - fraction_replaced_hof=0.487, + batch_size=50, + turbo=false, + bumper=false, + # Determinism + deterministic=false, + ) + else + return (; + # Creating the Search Space + binary_operators=[+, -, /, *], + unary_operators=Function[], maxsize=30, + # Setting the Search Size + populations=86, + population_size=57, + ncycles_per_iteration=364, + # Working with Complexities + parsimony=0.0, + 
warmup_maxsize_by=0.0, + use_frequency=true, + use_frequency_in_tournament=true, + adaptive_parsimony_scaling=148, + should_simplify=true, + # Mutations mutation_weights=MutationWeights(; - add_node=0.179, - delete_node=0.857, - do_nothing=1.0, - insert_node=5.66, - mutate_constant=0.0819, - mutate_operator=8.43, + mutate_constant=0.035291911190776126, + mutate_operator=3.6313193324458504, + swap_operands=0.006082646856290204, + rotate_tree=1.4235068782658613, + add_node=0.07709078600032576, + insert_node=2.43877044565746, + delete_node=0.369087185245687, + simplify=0.0014779413533204176, + randomize=0.006946114475984983, + do_nothing=0.43065675850844304, optimize=0.0, - randomize=0.0161, - rotate_tree=3.30, - simplify=0.00343, - swap_operands=0.0141, form_connection=0.5, break_connection=0.1, ), - ncycles_per_iteration=364, - parsimony=0.0, + crossover_probability=0.0611, + annealing=false, + alpha=0.1, perturbation_factor=0.219, - population_size=57, - populations=86, probability_negate_constant=0.000834, + # Tournament Selection tournament_selection_n=49, tournament_selection_p=0.509, + # Constant Optimization + should_optimize_constants=true, + optimizer_probability=0.14, + optimizer_nrestarts=2, + optimizer_algorithm=Optim.BFGS(; linesearch=LineSearches.BackTracking()), + # Migration between Populations + migration=true, + hof_migration=true, + fraction_replaced=0.000186, + fraction_replaced_hof=0.487, + topn=12, + # Performance and Parallelization + batching=false, + batch_size=50, + turbo=false, + bumper=false, + # Determinism + deterministic=false, ) end end From 062a9dccb3a249ffba51da27a8bd4fd4f193a433 Mon Sep 17 00:00:00 2001 From: MilesCranmer Date: Sun, 27 Oct 2024 10:32:29 +0000 Subject: [PATCH 46/74] feat!: impose new default search hyperparameters --- src/MutationWeights.jl | 20 +-- src/Options.jl | 287 ++++++++++++++++++++++++-------------- src/SymbolicRegression.jl | 2 +- src/Utils.jl | 11 +- src/precompile.jl | 1 + 5 files changed, 202 
insertions(+), 119 deletions(-) diff --git a/src/MutationWeights.jl b/src/MutationWeights.jl index 9de15af7d..5b6253cec 100644 --- a/src/MutationWeights.jl +++ b/src/MutationWeights.jl @@ -100,16 +100,16 @@ will be normalized to sum to 1.0 after initialization. - [`AbstractMutationWeights`](@ref SymbolicRegression.CoreModule.MutationWeightsModule.AbstractMutationWeights): Use to define custom mutation weight types. """ Base.@kwdef mutable struct MutationWeights <: AbstractMutationWeights - mutate_constant::Float64 = 0.048 - mutate_operator::Float64 = 0.47 - swap_operands::Float64 = 0.1 - rotate_tree::Float64 = 0.3 - add_node::Float64 = 0.79 - insert_node::Float64 = 5.1 - delete_node::Float64 = 1.7 - simplify::Float64 = 0.0020 - randomize::Float64 = 0.00023 - do_nothing::Float64 = 0.21 + mutate_constant::Float64 = 0.0353 + mutate_operator::Float64 = 3.63 + swap_operands::Float64 = 0.00608 + rotate_tree::Float64 = 1.42 + add_node::Float64 = 0.0771 + insert_node::Float64 = 2.44 + delete_node::Float64 = 0.369 + simplify::Float64 = 0.00148 + randomize::Float64 = 0.00695 + do_nothing::Float64 = 0.431 optimize::Float64 = 0.0 form_connection::Float64 = 0.5 break_connection::Float64 = 0.1 diff --git a/src/Options.jl b/src/Options.jl index 4918c0f5a..afe97e5b5 100644 --- a/src/Options.jl +++ b/src/Options.jl @@ -3,7 +3,7 @@ module OptionsModule using DispatchDoctor: @unstable using Optim: Optim using StatsBase: StatsBase -using DynamicExpressions: OperatorEnum, Expression, default_node_type +using DynamicExpressions: OperatorEnum, Expression, default_node_type, AbstractExpression, AbstractExpressionNode using ADTypes: AbstractADType, ADTypes using LossFunctions: L2DistLoss, SupervisedLoss using Optim: Optim @@ -228,7 +228,10 @@ const deprecated_options_mapping = Base.ImmutableDict( # For static analysis tools: @ignore const DEFAULT_OPTIONS = () -const OPTION_DESCRIPTIONS = """- `binary_operators`: Vector of binary operators (functions) to use. 
+const OPTION_DESCRIPTIONS = """- `defaults`: What set of defaults to use for `Options`. The default, + `nothing`, will simply take the default options from the current version of SymbolicRegression. + However, you may also select the defaults from an earlier version, such as `v"0.24.5"`. +- `binary_operators`: Vector of binary operators (functions) to use. Each operator should be defined for two input scalars, and one output scalar. All operators need to be defined over the entire real line (excluding infinity - these @@ -439,83 +442,156 @@ https://github.com/MilesCranmer/PySR/discussions/115. $(OPTION_DESCRIPTIONS) """ @unstable @save_kwargs DEFAULT_OPTIONS function Options(; - binary_operators=Function[+, -, /, *], - unary_operators=Function[], - constraints=nothing, - elementwise_loss::Union{Function,SupervisedLoss,Nothing}=nothing, - loss_function::Union{Function,Nothing}=nothing, - tournament_selection_n::Integer=12, #1 sampled from every tournament_selection_n per mutation - tournament_selection_p::Real=0.86, - topn::Integer=12, #samples to return per population - complexity_of_operators=nothing, - complexity_of_constants::Union{Nothing,Real}=nothing, - complexity_of_variables::Union{Nothing,Real,AbstractVector}=nothing, - parsimony::Real=0.0032, - dimensional_constraint_penalty::Union{Nothing,Real}=nothing, + # Note: We can only `@nospecialize` on the first 32 arguments, which is why + # we have to declare some of these later on. + @nospecialize(defaults::Union{VersionNumber,Nothing}=nothing), + # Search options: + ## 1. 
Creating the Search Space: + @nospecialize(binary_operators=nothing), + @nospecialize(unary_operators=nothing), + @nospecialize(maxsize::Union{Nothing,Integer}=nothing), + @nospecialize(maxdepth::Union{Nothing,Integer}=nothing), + @nospecialize(expression_type::Type{<:AbstractExpression}=Expression), + @nospecialize(expression_options::NamedTuple=NamedTuple()), + @nospecialize(node_type::Type{<:AbstractExpressionNode}=default_node_type(expression_type)), + ## 2. Setting the Search Size: + @nospecialize(populations::Union{Nothing,Integer}=nothing), + @nospecialize(population_size::Union{Nothing,Integer}=nothing), + @nospecialize(ncycles_per_iteration::Union{Nothing,Integer}=nothing), + ## 3. The Objective: + @nospecialize(elementwise_loss::Union{Function,SupervisedLoss,Nothing}=nothing), + @nospecialize(loss_function::Union{Function,Nothing}=nothing), + ### [model_selection - only used in MLJ interface] + @nospecialize(dimensional_constraint_penalty::Union{Nothing,Real}=nothing), + ### dimensionless_constants_only + ## 4. Working with Complexities: + @nospecialize(parsimony::Union{Nothing,Real}=nothing), + @nospecialize(constraints=nothing), + @nospecialize(nested_constraints=nothing), + @nospecialize(complexity_of_operators=nothing), + @nospecialize(complexity_of_constants::Union{Nothing,Real}=nothing), + @nospecialize(complexity_of_variables::Union{Nothing,Real,AbstractVector}=nothing), + @nospecialize(warmup_maxsize_by::Union{Real,Nothing}=nothing), + ### use_frequency + ### use_frequency_in_tournament + @nospecialize(adaptive_parsimony_scaling::Union{Real,Nothing}=nothing), + ### should_simplify + ## 5. 
Mutations: + @nospecialize(mutation_weights::Union{AbstractMutationWeights,AbstractVector,NamedTuple,Nothing}=nothing), + @nospecialize(crossover_probability::Union{Real,Nothing}=nothing), + @nospecialize(annealing::Union{Bool,Nothing}=nothing), + @nospecialize(alpha::Union{Nothing,Real}=nothing), + ### perturbation_factor + @nospecialize(probability_negate_constant::Union{Real,Nothing}=nothing), + ### skip_mutation_failures + ## 6. Tournament Selection: + @nospecialize(tournament_selection_n::Union{Nothing,Integer}=nothing), + @nospecialize(tournament_selection_p::Union{Nothing,Real}=nothing), + ## 7. Constant Optimization: + ### optimizer_algorithm + ### optimizer_nrestarts + ### optimizer_probability + ### optimizer_iterations + ### optimizer_f_calls_limit + ### optimizer_options + ### should_optimize_constants + ## 8. Migration between Populations: + ### migration + ### hof_migration + ### fraction_replaced + ### fraction_replaced_hof + ### topn + ## 9. Data Preprocessing: + ### [none] + ## 10. Stopping Criteria: + ### timeout_in_seconds + ### max_evals + @nospecialize(early_stop_condition::Union{Function,Real,Nothing}=nothing), + ## 11. Performance and Parallelization: + ### [others, passed to `equation_search`] + @nospecialize(batching::Union{Bool,Nothing}=nothing), + @nospecialize(batch_size::Union{Nothing,Integer}=nothing), + ### turbo + ### bumper + ### autodiff_backend + ## 12. Determinism: + ### [others, passed to `equation_search`] + ### deterministic + ### seed + ## 13. Monitoring: + ### verbosity + ### print_precision + ### progress + ## 14. Environment: + ### [none] + ## 15. Exporting the Results: + ### [others, passed to `equation_search`] + ### output_directory + ### save_to_file + + # Other search, but no specializations (since Julia limits us to 32!) + ## 1. Search Space: + ## 2. Setting the Search Size: + ## 3. 
The Objective: dimensionless_constants_only::Bool=false, - alpha::Real=0.100000, - maxsize::Integer=30, - maxdepth::Union{Nothing,Integer}=nothing, - turbo::Bool=false, - bumper::Bool=false, - migration::Bool=true, - hof_migration::Bool=true, - should_simplify::Union{Nothing,Bool}=nothing, - should_optimize_constants::Bool=true, - output_file::Union{Nothing,AbstractString}=nothing, - output_directory::Union{Nothing,String}=nothing, - expression_type::Type=Expression, - node_type::Type=default_node_type(expression_type), - expression_options::NamedTuple=NamedTuple(), - populations::Integer=15, - perturbation_factor::Real=0.076, - annealing::Bool=false, - batching::Bool=false, - batch_size::Integer=50, - mutation_weights::Union{AbstractMutationWeights,AbstractVector,NamedTuple}=MutationWeights(), - crossover_probability::Real=0.066, - warmup_maxsize_by::Real=0.0, + ## 4. Working with Complexities: use_frequency::Bool=true, use_frequency_in_tournament::Bool=true, - adaptive_parsimony_scaling::Real=20.0, - population_size::Integer=33, - ncycles_per_iteration::Integer=550, - fraction_replaced::Real=0.00036, - fraction_replaced_hof::Real=0.035, - verbosity::Union{Integer,Nothing}=nothing, - print_precision::Integer=5, - save_to_file::Bool=true, - probability_negate_constant::Real=0.01, - seed=nothing, - bin_constraints=nothing, - una_constraints=nothing, - progress::Union{Bool,Nothing}=nothing, - terminal_width::Union{Nothing,Integer}=nothing, + should_simplify::Union{Nothing,Bool}=nothing, + ## 5. Mutations: + perturbation_factor::Union{Nothing,Real}=nothing, + skip_mutation_failures::Bool=true, + ## 6. Tournament Selection + ## 7. 
Constant Optimization: optimizer_algorithm::Union{AbstractString,Optim.AbstractOptimizer}=Optim.BFGS(; linesearch=LineSearches.BackTracking() ), - optimizer_nrestarts::Integer=2, - optimizer_probability::Real=0.14, + optimizer_nrestarts::Int=2, + optimizer_probability::Float64=0.14, optimizer_iterations::Union{Nothing,Integer}=nothing, optimizer_f_calls_limit::Union{Nothing,Integer}=nothing, optimizer_options::Union{Dict,NamedTuple,Optim.Options,Nothing}=nothing, - autodiff_backend::Union{AbstractADType,Symbol,Nothing}=nothing, - use_recorder::Bool=false, - recorder_file::AbstractString="pysr_recorder.json", - early_stop_condition::Union{Function,Real,Nothing}=nothing, + should_optimize_constants::Bool=true, + ## 8. Migration between Populations: + migration::Bool=true, + hof_migration::Bool=true, + fraction_replaced::Union{Real,Nothing}=nothing, + fraction_replaced_hof::Union{Real,Nothing}=nothing, + topn::Union{Nothing,Integer}=nothing, + ## 9. Data Preprocessing: + ## 10. Stopping Criteria: timeout_in_seconds::Union{Nothing,Real}=nothing, max_evals::Union{Nothing,Integer}=nothing, - skip_mutation_failures::Bool=true, - nested_constraints=nothing, + ## 11. Performance and Parallelization: + turbo::Bool=false, + bumper::Bool=false, + autodiff_backend::Union{AbstractADType,Symbol,Nothing}=nothing, + ## 12. Determinism: deterministic::Bool=false, - # Not search options; just construction options: + seed=nothing, + ## 13. Monitoring: + verbosity::Union{Integer,Nothing}=nothing, + print_precision::Integer=5, + progress::Union{Bool,Nothing}=nothing, + ## 14. Environment: + ## 15. 
Exporting the Results: + output_directory::Union{Nothing,String}=nothing, + save_to_file::Bool=true, + ## Undocumented features: + bin_constraints=nothing, + una_constraints=nothing, + terminal_width::Union{Nothing,Integer}=nothing, + use_recorder::Bool=false, + recorder_file::AbstractString="pysr_recorder.json", + ### Not search options; just construction options: define_helper_functions::Bool=true, - deprecated_return_state=nothing, ######################################### # Deprecated args: ###################### + output_file::Union{Nothing,AbstractString}=nothing, fast_cycle::Bool=false, npopulations::Union{Nothing,Integer}=nothing, npop::Union{Nothing,Integer}=nothing, + deprecated_return_state=nothing, kws..., ######################################### ) @@ -577,7 +653,6 @@ $(OPTION_DESCRIPTIONS) "Unknown deprecated keyword argument: $k. Please update `Options(;)` to transfer this key.", ) end - fast_cycle && Base.depwarn("`fast_cycle` is deprecated and has no effect.", :Options) if npop !== nothing Base.depwarn("`npop` is deprecated. Use `population_size` instead.", :Options) population_size = npop @@ -609,6 +684,35 @@ $(OPTION_DESCRIPTIONS) end end + ################################# + #### Supply defaults ############ + #! 
format: off + _default_options = default_options(defaults) + binary_operators = something(binary_operators, _default_options.binary_operators) + unary_operators = something(unary_operators, _default_options.unary_operators) + maxsize = something(maxsize, _default_options.maxsize) + populations = something(populations, _default_options.populations) + population_size = something(population_size, _default_options.population_size) + ncycles_per_iteration = something(ncycles_per_iteration, _default_options.ncycles_per_iteration) + parsimony = something(parsimony, _default_options.parsimony) + warmup_maxsize_by = something(warmup_maxsize_by, _default_options.warmup_maxsize_by) + adaptive_parsimony_scaling = something(adaptive_parsimony_scaling, _default_options.adaptive_parsimony_scaling) + mutation_weights = something(mutation_weights, _default_options.mutation_weights) + crossover_probability = something(crossover_probability, _default_options.crossover_probability) + annealing = something(annealing, _default_options.annealing) + alpha = something(alpha, _default_options.alpha) + perturbation_factor = something(perturbation_factor, _default_options.perturbation_factor) + probability_negate_constant = something(probability_negate_constant, _default_options.probability_negate_constant) + tournament_selection_n = something(tournament_selection_n, _default_options.tournament_selection_n) + tournament_selection_p = something(tournament_selection_p, _default_options.tournament_selection_p) + fraction_replaced = something(fraction_replaced, _default_options.fraction_replaced) + fraction_replaced_hof = something(fraction_replaced_hof, _default_options.fraction_replaced_hof) + topn = something(topn, _default_options.topn) + batching = something(batching, _default_options.batching) + batch_size = something(batch_size, _default_options.batch_size) + #! 
format: on + ################################# + if should_simplify === nothing should_simplify = ( loss_function === nothing && @@ -623,6 +727,7 @@ $(OPTION_DESCRIPTIONS) @assert warmup_maxsize_by >= 0.0f0 @assert length(unary_operators) <= max_ops @assert length(binary_operators) <= max_ops + @assert tournament_selection_n < population_size "`tournament_selection_n` must be less than `population_size`" # Make sure nested_constraints contains functions within our operator set: _nested_constraints = build_nested_constraints(; @@ -694,18 +799,14 @@ $(OPTION_DESCRIPTIONS) # Parse optimizer options if !isa(optimizer_options, Optim.Options) - optimizer_iterations = isnothing(optimizer_iterations) ? 8 : optimizer_iterations - optimizer_f_calls_limit = if isnothing(optimizer_f_calls_limit) - 10_000 - else - optimizer_f_calls_limit - end + optimizer_iterations = something(optimizer_iterations, 8) + optimizer_f_calls_limit = something(optimizer_f_calls_limit, 10_000) extra_kws = hasfield(Optim.Options, :show_warnings) ? (; show_warnings=false) : () optimizer_options = Optim.Options(; iterations=optimizer_iterations, f_calls_limit=optimizer_f_calls_limit, extra_kws..., - (isnothing(optimizer_options) ? 
() : optimizer_options)..., + something(optimizer_options, ())..., ) else @assert optimizer_iterations === nothing && optimizer_f_calls_limit === nothing @@ -828,10 +929,7 @@ function default_options(@nospecialize(version::Union{VersionNumber,Nothing} = n # Working with Complexities parsimony=0.0032, warmup_maxsize_by=0.0, - use_frequency=true, - use_frequency_in_tournament=true, adaptive_parsimony_scaling=20.0, - should_simplify=true, # Mutations mutation_weights=MutationWeights(; mutate_constant=0.048, @@ -856,24 +954,13 @@ function default_options(@nospecialize(version::Union{VersionNumber,Nothing} = n # Tournament Selection tournament_selection_n=12, tournament_selection_p=0.86, - # Constant Optimization - should_optimize_constants=true, - optimizer_probability=0.14, - optimizer_nrestarts=2, - optimizer_algorithm=Optim.BFGS(; linesearch=LineSearches.BackTracking()), # Migration between Populations - migration=true, - hof_migration=true, fraction_replaced=0.00036, fraction_replaced_hof=0.035, topn=12, # Performance and Parallelization batching=false, batch_size=50, - turbo=false, - bumper=false, - # Determinism - deterministic=false, ) else return (; @@ -888,22 +975,19 @@ function default_options(@nospecialize(version::Union{VersionNumber,Nothing} = n # Working with Complexities parsimony=0.0, warmup_maxsize_by=0.0, - use_frequency=true, - use_frequency_in_tournament=true, adaptive_parsimony_scaling=148, - should_simplify=true, # Mutations mutation_weights=MutationWeights(; - mutate_constant=0.035291911190776126, - mutate_operator=3.6313193324458504, - swap_operands=0.006082646856290204, - rotate_tree=1.4235068782658613, - add_node=0.07709078600032576, - insert_node=2.43877044565746, - delete_node=0.369087185245687, - simplify=0.0014779413533204176, - randomize=0.006946114475984983, - do_nothing=0.43065675850844304, + mutate_constant=0.0353, + mutate_operator=3.63, + swap_operands=0.00608, + rotate_tree=1.42, + add_node=0.0771, + insert_node=2.44, + 
delete_node=0.369, + simplify=0.00148, + randomize=0.00695, + do_nothing=0.431, optimize=0.0, form_connection=0.5, break_connection=0.1, @@ -916,24 +1000,13 @@ function default_options(@nospecialize(version::Union{VersionNumber,Nothing} = n # Tournament Selection tournament_selection_n=49, tournament_selection_p=0.509, - # Constant Optimization - should_optimize_constants=true, - optimizer_probability=0.14, - optimizer_nrestarts=2, - optimizer_algorithm=Optim.BFGS(; linesearch=LineSearches.BackTracking()), # Migration between Populations - migration=true, - hof_migration=true, fraction_replaced=0.000186, fraction_replaced_hof=0.487, topn=12, # Performance and Parallelization batching=false, batch_size=50, - turbo=false, - bumper=false, - # Determinism - deterministic=false, ) end end diff --git a/src/SymbolicRegression.jl b/src/SymbolicRegression.jl index 34ea2e961..d263f9403 100644 --- a/src/SymbolicRegression.jl +++ b/src/SymbolicRegression.jl @@ -419,7 +419,7 @@ which is useful for debugging and profiling. function equation_search( X::AbstractMatrix{T}, y::AbstractMatrix; - niterations::Int=10, + niterations::Int=40, weights::Union{AbstractMatrix{T},AbstractVector{T},Nothing}=nothing, options::AbstractOptions=Options(), variable_names::Union{AbstractVector{String},Nothing}=nothing, diff --git a/src/Utils.jl b/src/Utils.jl index 2ee29f16c..e8ae7653d 100644 --- a/src/Utils.jl +++ b/src/Utils.jl @@ -171,7 +171,16 @@ function _save_kwargs(log_variable::Symbol, fdef::Expr) def = splitdef(fdef) # Get kwargs: kwargs = copy(def[:kwargs]) - filter!(kwargs) do k + kwargs = map(kwargs) do k + # If it's a macrocall for @nospecialize + if k.head == :macrocall && string(k.args[1]) == "@nospecialize" + # Find the actual argument - it's the last non-LineNumberNode argument + inner_arg = last(filter(arg -> !(arg isa LineNumberNode), k.args)) + return inner_arg + end + return k + end + kwargs = filter(kwargs) do k # Filter ...: k.head == :... 
&& return false # Filter other deprecated kwargs: diff --git a/src/precompile.jl b/src/precompile.jl index 13aaac06f..ca3c9c4f9 100644 --- a/src/precompile.jl +++ b/src/precompile.jl @@ -44,6 +44,7 @@ function do_precompilation(::Val{mode}) where {mode} unary_operators=[sin, cos, exp, log, sqrt, abs], populations=3, population_size=start ? 50 : 12, + tournament_selection_n=6, ncycles_per_iteration=start ? 30 : 1, mutation_weights=MutationWeights(; mutate_constant=1.0, From 77756e447a5acd66b30a0ea679be7cde6e57e117 Mon Sep 17 00:00:00 2001 From: MilesCranmer Date: Sun, 27 Oct 2024 10:33:42 +0000 Subject: [PATCH 47/74] style: formatting --- src/Options.jl | 72 +++++++++++++++++++++++++++----------------------- 1 file changed, 39 insertions(+), 33 deletions(-) diff --git a/src/Options.jl b/src/Options.jl index afe97e5b5..eac85d243 100644 --- a/src/Options.jl +++ b/src/Options.jl @@ -3,7 +3,8 @@ module OptionsModule using DispatchDoctor: @unstable using Optim: Optim using StatsBase: StatsBase -using DynamicExpressions: OperatorEnum, Expression, default_node_type, AbstractExpression, AbstractExpressionNode +using DynamicExpressions: + OperatorEnum, Expression, default_node_type, AbstractExpression, AbstractExpressionNode using ADTypes: AbstractADType, ADTypes using LossFunctions: L2DistLoss, SupervisedLoss using Optim: Optim @@ -444,49 +445,54 @@ $(OPTION_DESCRIPTIONS) @unstable @save_kwargs DEFAULT_OPTIONS function Options(; # Note: We can only `@nospecialize` on the first 32 arguments, which is why # we have to declare some of these later on. - @nospecialize(defaults::Union{VersionNumber,Nothing}=nothing), + @nospecialize(defaults::Union{VersionNumber,Nothing} = nothing), # Search options: ## 1. 
Creating the Search Space: - @nospecialize(binary_operators=nothing), - @nospecialize(unary_operators=nothing), - @nospecialize(maxsize::Union{Nothing,Integer}=nothing), - @nospecialize(maxdepth::Union{Nothing,Integer}=nothing), - @nospecialize(expression_type::Type{<:AbstractExpression}=Expression), - @nospecialize(expression_options::NamedTuple=NamedTuple()), - @nospecialize(node_type::Type{<:AbstractExpressionNode}=default_node_type(expression_type)), + @nospecialize(binary_operators = nothing), + @nospecialize(unary_operators = nothing), + @nospecialize(maxsize::Union{Nothing,Integer} = nothing), + @nospecialize(maxdepth::Union{Nothing,Integer} = nothing), + @nospecialize(expression_type::Type{<:AbstractExpression} = Expression), + @nospecialize(expression_options::NamedTuple = NamedTuple()), + @nospecialize( + node_type::Type{<:AbstractExpressionNode} = default_node_type(expression_type) + ), ## 2. Setting the Search Size: - @nospecialize(populations::Union{Nothing,Integer}=nothing), - @nospecialize(population_size::Union{Nothing,Integer}=nothing), - @nospecialize(ncycles_per_iteration::Union{Nothing,Integer}=nothing), + @nospecialize(populations::Union{Nothing,Integer} = nothing), + @nospecialize(population_size::Union{Nothing,Integer} = nothing), + @nospecialize(ncycles_per_iteration::Union{Nothing,Integer} = nothing), ## 3. The Objective: - @nospecialize(elementwise_loss::Union{Function,SupervisedLoss,Nothing}=nothing), - @nospecialize(loss_function::Union{Function,Nothing}=nothing), + @nospecialize(elementwise_loss::Union{Function,SupervisedLoss,Nothing} = nothing), + @nospecialize(loss_function::Union{Function,Nothing} = nothing), ### [model_selection - only used in MLJ interface] - @nospecialize(dimensional_constraint_penalty::Union{Nothing,Real}=nothing), + @nospecialize(dimensional_constraint_penalty::Union{Nothing,Real} = nothing), ### dimensionless_constants_only ## 4. 
Working with Complexities: - @nospecialize(parsimony::Union{Nothing,Real}=nothing), - @nospecialize(constraints=nothing), - @nospecialize(nested_constraints=nothing), - @nospecialize(complexity_of_operators=nothing), - @nospecialize(complexity_of_constants::Union{Nothing,Real}=nothing), - @nospecialize(complexity_of_variables::Union{Nothing,Real,AbstractVector}=nothing), - @nospecialize(warmup_maxsize_by::Union{Real,Nothing}=nothing), + @nospecialize(parsimony::Union{Nothing,Real} = nothing), + @nospecialize(constraints = nothing), + @nospecialize(nested_constraints = nothing), + @nospecialize(complexity_of_operators = nothing), + @nospecialize(complexity_of_constants::Union{Nothing,Real} = nothing), + @nospecialize(complexity_of_variables::Union{Nothing,Real,AbstractVector} = nothing), + @nospecialize(warmup_maxsize_by::Union{Real,Nothing} = nothing), ### use_frequency ### use_frequency_in_tournament - @nospecialize(adaptive_parsimony_scaling::Union{Real,Nothing}=nothing), + @nospecialize(adaptive_parsimony_scaling::Union{Real,Nothing} = nothing), ### should_simplify ## 5. Mutations: - @nospecialize(mutation_weights::Union{AbstractMutationWeights,AbstractVector,NamedTuple,Nothing}=nothing), - @nospecialize(crossover_probability::Union{Real,Nothing}=nothing), - @nospecialize(annealing::Union{Bool,Nothing}=nothing), - @nospecialize(alpha::Union{Nothing,Real}=nothing), + @nospecialize( + mutation_weights::Union{AbstractMutationWeights,AbstractVector,NamedTuple,Nothing} = + nothing + ), + @nospecialize(crossover_probability::Union{Real,Nothing} = nothing), + @nospecialize(annealing::Union{Bool,Nothing} = nothing), + @nospecialize(alpha::Union{Nothing,Real} = nothing), ### perturbation_factor - @nospecialize(probability_negate_constant::Union{Real,Nothing}=nothing), + @nospecialize(probability_negate_constant::Union{Real,Nothing} = nothing), ### skip_mutation_failures ## 6. 
Tournament Selection: - @nospecialize(tournament_selection_n::Union{Nothing,Integer}=nothing), - @nospecialize(tournament_selection_p::Union{Nothing,Real}=nothing), + @nospecialize(tournament_selection_n::Union{Nothing,Integer} = nothing), + @nospecialize(tournament_selection_p::Union{Nothing,Real} = nothing), ## 7. Constant Optimization: ### optimizer_algorithm ### optimizer_nrestarts @@ -506,11 +512,11 @@ $(OPTION_DESCRIPTIONS) ## 10. Stopping Criteria: ### timeout_in_seconds ### max_evals - @nospecialize(early_stop_condition::Union{Function,Real,Nothing}=nothing), + @nospecialize(early_stop_condition::Union{Function,Real,Nothing} = nothing), ## 11. Performance and Parallelization: ### [others, passed to `equation_search`] - @nospecialize(batching::Union{Bool,Nothing}=nothing), - @nospecialize(batch_size::Union{Nothing,Integer}=nothing), + @nospecialize(batching::Union{Bool,Nothing} = nothing), + @nospecialize(batch_size::Union{Nothing,Integer} = nothing), ### turbo ### bumper ### autodiff_backend From c9bd7db4873d0501258427f117bf760cf86c8fd4 Mon Sep 17 00:00:00 2001 From: MilesCranmer Date: Sun, 27 Oct 2024 10:41:04 +0000 Subject: [PATCH 48/74] feat!: increase default `niterations` --- CHANGELOG.md | 1 + example.jl | 6 ++---- src/SymbolicRegression.jl | 2 +- 3 files changed, 4 insertions(+), 5 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 6e5a69a68..a57ecaa77 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -399,6 +399,7 @@ A custom run ID can be specified via the new `run_id` parameter passed to `equat - Support for per-variable complexity, via the `complexity_of_variables` option. - Option to force dimensionless constants when fitting with dimensional constraints, via the `dimensionless_constants_only` option. - Default `maxsize` increased from 20 to 30. +- Default `niterations` increased from 10 to 50, as many users seem to be unaware that this is small (and meant for testing), even in publications. 
I think this `50` is still low, but it should be a more accurate default for those who don't tune. ### Update Guide diff --git a/example.jl b/example.jl index ef70096e5..129c72f40 100644 --- a/example.jl +++ b/example.jl @@ -4,12 +4,10 @@ X = randn(Float32, 5, 100) y = 2 * cos.(X[4, :]) + X[1, :] .^ 2 .- 2 options = SymbolicRegression.Options(; - binary_operators=[+, *, /, -], unary_operators=[cos, exp], populations=20 + binary_operators=[+, *, /, -], unary_operators=[cos, exp] ) -hall_of_fame = equation_search( - X, y; niterations=40, options=options, parallelism=:multithreading -) +hall_of_fame = equation_search(X, y; options=options, parallelism=:multithreading) dominating = calculate_pareto_frontier(hall_of_fame) diff --git a/src/SymbolicRegression.jl b/src/SymbolicRegression.jl index d263f9403..7ff71e394 100644 --- a/src/SymbolicRegression.jl +++ b/src/SymbolicRegression.jl @@ -419,7 +419,7 @@ which is useful for debugging and profiling. function equation_search( X::AbstractMatrix{T}, y::AbstractMatrix; - niterations::Int=40, + niterations::Int=50, weights::Union{AbstractMatrix{T},AbstractVector{T},Nothing}=nothing, options::AbstractOptions=Options(), variable_names::Union{AbstractVector{String},Nothing}=nothing, From 72d86abe6368bbcf72281410fc1c5b2f5b1a4510 Mon Sep 17 00:00:00 2001 From: MilesCranmer Date: Sun, 27 Oct 2024 11:00:04 +0000 Subject: [PATCH 49/74] docs: fix changelog --- CHANGELOG.md | 2 +- src/Options.jl | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index a57ecaa77..ca1989c04 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -399,7 +399,7 @@ A custom run ID can be specified via the new `run_id` parameter passed to `equat - Support for per-variable complexity, via the `complexity_of_variables` option. - Option to force dimensionless constants when fitting with dimensional constraints, via the `dimensionless_constants_only` option. - Default `maxsize` increased from 20 to 30. 
-- Default `niterations` increased from 10 to 50, as many users seem to be unaware that this is small (and meant for testing), even in publications. I think this `50` is still low, but it should be a more accurate default for those who don't tune. +- Default `niterations` increased from 10 to 50, as many users seem to be unaware that this is small (and meant for testing), even in publications. I think this 50 is still low, but it should be a more accurate default for those who don't tune. ### Update Guide diff --git a/src/Options.jl b/src/Options.jl index eac85d243..a6b22b940 100644 --- a/src/Options.jl +++ b/src/Options.jl @@ -553,7 +553,7 @@ $(OPTION_DESCRIPTIONS) linesearch=LineSearches.BackTracking() ), optimizer_nrestarts::Int=2, - optimizer_probability::Float64=0.14, + optimizer_probability::AbstractFloat=0.14, optimizer_iterations::Union{Nothing,Integer}=nothing, optimizer_f_calls_limit::Union{Nothing,Integer}=nothing, optimizer_options::Union{Dict,NamedTuple,Optim.Options,Nothing}=nothing, From 7e5e517835ab676d15f776b727b291ba60f78e2d Mon Sep 17 00:00:00 2001 From: MilesCranmer Date: Sun, 27 Oct 2024 13:14:48 +0000 Subject: [PATCH 50/74] feat!: modify default hyperparameters --- src/Options.jl | 51 ++++++++++++++++++++------------------ src/SymbolicRegression.jl | 4 +-- test/test_stop_on_clock.jl | 1 + 3 files changed, 30 insertions(+), 26 deletions(-) diff --git a/src/Options.jl b/src/Options.jl index a6b22b940..0ec5b3418 100644 --- a/src/Options.jl +++ b/src/Options.jl @@ -971,44 +971,47 @@ function default_options(@nospecialize(version::Union{VersionNumber,Nothing} = n else return (; # Creating the Search Space - binary_operators=[+, -, /, *], + binary_operators=Function[+, -, /, *], unary_operators=Function[], maxsize=30, # Setting the Search Size - populations=86, - population_size=57, - ncycles_per_iteration=364, + populations=31, + population_size=27, + ncycles_per_iteration=380, # Working with Complexities parsimony=0.0, warmup_maxsize_by=0.0, - 
adaptive_parsimony_scaling=148, + adaptive_parsimony_scaling=1040, # Mutations mutation_weights=MutationWeights(; - mutate_constant=0.0353, - mutate_operator=3.63, - swap_operands=0.00608, - rotate_tree=1.42, - add_node=0.0771, - insert_node=2.44, - delete_node=0.369, - simplify=0.00148, - randomize=0.00695, - do_nothing=0.431, + mutate_constant=0.0346, + mutate_operator=0.293, + swap_operands=0.198, + rotate_tree=4.26, + add_node=2.47, + insert_node=0.0112, + delete_node=0.870, + simplify=0.00209, + randomize=0.000502, + do_nothing=0.273, optimize=0.0, form_connection=0.5, break_connection=0.1, ), - crossover_probability=0.0611, - annealing=false, - alpha=0.1, - perturbation_factor=0.219, - probability_negate_constant=0.000834, + crossover_probability=0.0259, + annealing=true, + alpha=3.17, + perturbation_factor=0.129, + probability_negate_constant=0.00743, # Tournament Selection - tournament_selection_n=49, - tournament_selection_p=0.509, + tournament_selection_n=15, + tournament_selection_p=0.982, # Migration between Populations - fraction_replaced=0.000186, - fraction_replaced_hof=0.487, + fraction_replaced=0.00036, + ## ^Note: the optimal value found was 0.00000425, + ## but I thought this was a symptom of doing the sweep on such + ## a small problem, so I increased it to the older value of 0.00036 + fraction_replaced_hof=0.0614, topn=12, # Performance and Parallelization batching=false, diff --git a/src/SymbolicRegression.jl b/src/SymbolicRegression.jl index 7ff71e394..3e8c59b5f 100644 --- a/src/SymbolicRegression.jl +++ b/src/SymbolicRegression.jl @@ -339,7 +339,7 @@ which is useful for debugging and profiling. - `y::Union{AbstractMatrix{T}, AbstractVector{T}}`: The values to predict. The first dimension is the output feature to predict with each equation, and the second dimension is rows. -- `niterations::Int=10`: The number of iterations to perform the search. +- `niterations::Int=100`: The number of iterations to perform the search. 
More iterations will improve the results. - `weights::Union{AbstractMatrix{T}, AbstractVector{T}, Nothing}=nothing`: Optionally weight the loss for each `y` by this value (same shape as `y`). @@ -419,7 +419,7 @@ which is useful for debugging and profiling. function equation_search( X::AbstractMatrix{T}, y::AbstractMatrix; - niterations::Int=50, + niterations::Int=100, weights::Union{AbstractMatrix{T},AbstractVector{T},Nothing}=nothing, options::AbstractOptions=Options(), variable_names::Union{AbstractVector{String},Nothing}=nothing, diff --git a/test/test_stop_on_clock.jl b/test/test_stop_on_clock.jl index a7f925a20..238678b47 100644 --- a/test/test_stop_on_clock.jl +++ b/test/test_stop_on_clock.jl @@ -10,6 +10,7 @@ y = 2 * cos.(X[4, :]) options = Options(; default_params..., population_size=10, + tournament_selection_n=9, ncycles_per_iteration=100, maxsize=15, timeout_in_seconds=1, From 019f3f0db1775a85f4106eb126f90c8b7aad47e6 Mon Sep 17 00:00:00 2001 From: MilesCranmer Date: Sun, 27 Oct 2024 14:24:39 +0000 Subject: [PATCH 51/74] chore: update gitignore --- .gitignore | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/.gitignore b/.gitignore index 2cb9c5d85..ecd0d8ac2 100644 --- a/.gitignore +++ b/.gitignore @@ -1,6 +1,8 @@ .dataset*.jl .hyperparams*.jl +outputs *.csv +*.bak *.bkup performance*txt *.out @@ -8,7 +10,7 @@ trials* **/__pycache__ build dist -Manifest.toml +Manifest*.toml *.cov .coveralls.yml **/*tmp*.jl From 93e1238f1ef839e7ec5e71e03ec64ada1d7e8c52 Mon Sep 17 00:00:00 2001 From: MilesCranmer Date: Sun, 27 Oct 2024 16:30:31 +0000 Subject: [PATCH 52/74] fix: bug in option specialization affecting Enzyme --- src/OptionsStruct.jl | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/src/OptionsStruct.jl b/src/OptionsStruct.jl index a391b242a..f249fcf6f 100644 --- a/src/OptionsStruct.jl +++ b/src/OptionsStruct.jl @@ -282,17 +282,16 @@ Base.show(io::IO, ::MIME"text/plain", options::Options) = Base.print(io, options 
specialized_options(options::AbstractOptions) = options @unstable function specialized_options(options::Options) - return _specialized_options(options) + return _specialized_options(options, options.operators) end -@generated function _specialized_options(options::O) where {O<:Options} +@generated function _specialized_options( + options::O, operators::OP +) where {O<:Options,OP<:AbstractOperatorEnum} # Return an options struct with concrete operators type_parameters = O.parameters fields = Any[:(getfield(options, $(QuoteNode(k)))) for k in fieldnames(O)] quote - operators = getfield(options, :operators) - Options{$(type_parameters[1]),typeof(operators),$(type_parameters[3:end]...)}( - $(fields...) - ) + Options{$(type_parameters[1]),$(OP),$(type_parameters[3:end]...)}($(fields...)) end end From 029cbfccd7bd02fe04733c217caf54b74b117f98 Mon Sep 17 00:00:00 2001 From: MilesCranmer Date: Sun, 27 Oct 2024 16:37:04 +0000 Subject: [PATCH 53/74] refactor: reduce compilation throughout library --- Project.toml | 7 +----- src/Configure.jl | 35 +++++++++++++++--------------- src/Core.jl | 2 +- src/ExpressionBuilder.jl | 7 +++--- src/InterfaceDynamicExpressions.jl | 11 +++++----- src/Options.jl | 11 +++++----- src/OptionsStruct.jl | 9 +++++--- src/Population.jl | 3 ++- src/ProgressBars.jl | 2 ++ src/SearchUtils.jl | 35 ++++++++++++++++-------------- src/SymbolicRegression.jl | 29 ++++++++++++++----------- src/TemplateExpression.jl | 24 ++++++++++++++------ src/Utils.jl | 12 +++++----- 13 files changed, 105 insertions(+), 82 deletions(-) diff --git a/Project.toml b/Project.toml index 9c5400593..40b235088 100644 --- a/Project.toml +++ b/Project.toml @@ -40,7 +40,7 @@ SymbolicRegressionSymbolicUtilsExt = "SymbolicUtils" [compat] ADTypes = "^1.4.0" -Compat = "^4.2" +Compat = "^4.16" ConstructionBase = "<1.5.7" Dates = "1" DifferentiationInterface = "0.5, 0.6" @@ -66,8 +66,3 @@ StatsBase = "0.33, 0.34" SymbolicUtils = "0.19, ^1.0.5, 2, 3" TOML = "<0.0.1, 1" julia = "1.10" - 
-[extras] -Enzyme = "7da242da-08ed-463a-9acd-ee780be4f1d9" -JSON3 = "0f8b85d8-7281-11e9-16c2-39a750bddbf1" -SymbolicUtils = "d1185830-fcd6-423d-90d6-eec64667417b" diff --git a/src/Configure.jl b/src/Configure.jl index 2b184e5cd..d8f029bfa 100644 --- a/src/Configure.jl +++ b/src/Configure.jl @@ -1,6 +1,6 @@ const TEST_TYPE = Float32 -function test_operator(op::F, x::T, y=nothing) where {F,T} +function test_operator(@nospecialize(op::Function), x::T, y=nothing) where {T} local output try output = y === nothing ? op(x) : op(x, y) @@ -26,14 +26,18 @@ function test_operator(op::F, x::T, y=nothing) where {F,T} end return nothing end +precompile(Tuple{typeof(test_operator),Function,Float64,Float64}) +precompile(Tuple{typeof(test_operator),Function,Float32,Float32}) +precompile(Tuple{typeof(test_operator),Function,Float64}) +precompile(Tuple{typeof(test_operator),Function,Float32}) const TEST_INPUTS = collect(range(-100, 100; length=99)) function assert_operators_well_defined(T, options::AbstractOptions) test_input = if T <: Complex - (x -> convert(T, x)).(TEST_INPUTS .+ TEST_INPUTS .* im) + Base.Fix1(convert, T).(TEST_INPUTS .+ TEST_INPUTS .* im) else - (x -> convert(T, x)).(TEST_INPUTS) + Base.Fix1(convert, T).(TEST_INPUTS) end for x in test_input, y in test_input, op in options.operators.binops test_operator(op, x, y) @@ -54,20 +58,18 @@ function test_option_configuration( verbosity > 0 && @warn "You are using multithreading mode, but only one thread is available. Try starting julia with `--threads=auto`." end - if any(d -> d.X_units !== nothing || d.y_units !== nothing, datasets) && - options.dimensional_constraint_penalty === nothing + if any(has_units, datasets) && options.dimensional_constraint_penalty === nothing verbosity > 0 && @warn "You are using dimensional constraints, but `dimensional_constraint_penalty` was not set. The default penalty of `1000.0` will be used." end - for op in (options.operators.binops..., options.operators.unaops...) 
- if is_anonymous_function(op) - throw( - AssertionError( - "Anonymous functions can't be used as operators for SymbolicRegression.jl", - ), - ) - end + if any(is_anonymous_function, options.operators.binops) || + any(is_anonymous_function, options.operators.unaops) + throw( + AssertionError( + "Anonymous functions can't be used as operators for SymbolicRegression.jl" + ), + ) end assert_operators_well_defined(T, options) @@ -80,6 +82,7 @@ function test_option_configuration( ), ) end + return nothing end # Check for errors before they happen @@ -205,9 +208,7 @@ function activate_env_on_workers( end end -function import_module_on_workers( - procs, filename::String, options::AbstractOptions, verbosity -) +function import_module_on_workers(procs, filename::String, verbosity) loaded_modules_head_worker = [k.name for (k, _) in Base.loaded_modules] included_as_local = "SymbolicRegression" ∉ loaded_modules_head_worker @@ -329,7 +330,7 @@ function configure_workers(; end if we_created_procs - import_module_on_workers(procs, file, options, verbosity) + import_module_on_workers(procs, file, verbosity) end move_functions_to_workers(procs, options, example_dataset, verbosity) diff --git a/src/Core.jl b/src/Core.jl index 2d6e73d89..6000412ce 100644 --- a/src/Core.jl +++ b/src/Core.jl @@ -12,7 +12,7 @@ include("Options.jl") using .ProgramConstantsModule: MAX_DEGREE, BATCH_DIM, FEATURE_DIM, RecordType, DATA_TYPE, LOSS_TYPE -using .DatasetModule: Dataset, is_weighted +using .DatasetModule: Dataset, is_weighted, has_units using .MutationWeightsModule: AbstractMutationWeights, MutationWeights, sample_mutation using .OptionsStructModule: AbstractOptions, diff --git a/src/ExpressionBuilder.jl b/src/ExpressionBuilder.jl index 709937ecf..d7bc5f5d6 100644 --- a/src/ExpressionBuilder.jl +++ b/src/ExpressionBuilder.jl @@ -5,6 +5,7 @@ This module provides functions for creating, initializing, and manipulating module ExpressionBuilderModule using DispatchDoctor: @unstable +using Compat: Fix 
using DynamicExpressions: AbstractExpressionNode, AbstractExpression, @@ -133,20 +134,20 @@ end pop::Population, options::AbstractOptions, dataset::Dataset{T,L} ) where {T,L} return Population( - map(member -> embed_metadata(member, options, dataset), pop.members) + map(Fix{2}(Fix{3}(embed_metadata, dataset), options), pop.members) ) end function embed_metadata( hof::HallOfFame, options::AbstractOptions, dataset::Dataset{T,L} ) where {T,L} return HallOfFame( - map(member -> embed_metadata(member, options, dataset), hof.members), hof.exists + map(Fix{2}(Fix{3}(embed_metadata, dataset), options), hof.members), hof.exists ) end function embed_metadata( vec::Vector{H}, options::AbstractOptions, dataset::Dataset{T,L} ) where {T,L,H<:Union{HallOfFame,Population,PopMember}} - return map(elem -> embed_metadata(elem, options, dataset), vec) + return map(Fix{2}(Fix{3}(embed_metadata, dataset), options), vec) end end diff --git a/src/InterfaceDynamicExpressions.jl b/src/InterfaceDynamicExpressions.jl index 6c8aa45fd..86f14d3be 100644 --- a/src/InterfaceDynamicExpressions.jl +++ b/src/InterfaceDynamicExpressions.jl @@ -1,6 +1,7 @@ module InterfaceDynamicExpressionsModule using Printf: @sprintf +using Compat: Fix using DynamicExpressions: DynamicExpressions as DE, OperatorEnum, @@ -199,16 +200,17 @@ Convert an equation to a string. ) end - vprecision = vals[options.print_precision] if X_sym_units !== nothing || y_sym_units !== nothing return DE.string_tree( tree, DE.get_operators(tree, options); - f_variable=(feature, vname) -> string_variable(feature, vname, X_sym_units), + f_variable=Fix{3}(string_variable, X_sym_units), f_constant=let unit_placeholder = options.dimensionless_constants_only ? "" : WILDCARD_UNIT_STRING - (val,) -> string_constant(val, vprecision, unit_placeholder) + Fix{2}( + Fix{3}(string_constant, unit_placeholder), options.v_print_precision + ) end, variable_names=display_variable_names, kws..., @@ -218,13 +220,12 @@ Convert an equation to a string. 
tree, DE.get_operators(tree, options); f_variable=string_variable, - f_constant=(val,) -> string_constant(val, vprecision, ""), + f_constant=Fix{2}(Fix{3}(string_constant, ""), options.v_print_precision), variable_names=display_variable_names, kws..., ) end end -const vals = ntuple(Val, 8192) function string_variable_raw(feature, variable_names) if variable_names === nothing || feature > length(variable_names) return "x" * string(feature) diff --git a/src/Options.jl b/src/Options.jl index 0ec5b3418..4b59b0390 100644 --- a/src/Options.jl +++ b/src/Options.jl @@ -28,7 +28,7 @@ using ..OperatorsModule: using ..MutationWeightsModule: AbstractMutationWeights, MutationWeights, mutations import ..OptionsStructModule: Options using ..OptionsStructModule: ComplexityMapping, operator_specialization -using ..UtilsModule: max_ops, @save_kwargs, @ignore +using ..UtilsModule: @save_kwargs, @ignore """Build constraints on operator-level complexity from a user-passed dict.""" @unstable function build_constraints(; @@ -731,8 +731,8 @@ $(OPTION_DESCRIPTIONS) @assert maxsize > 3 @assert warmup_maxsize_by >= 0.0f0 - @assert length(unary_operators) <= max_ops - @assert length(binary_operators) <= max_ops + @assert length(unary_operators) <= 8192 + @assert length(binary_operators) <= 8192 @assert tournament_selection_n < population_size "`tournament_selection_n` must be less than `population_size`" # Make sure nested_constraints contains functions within our operator set: @@ -798,7 +798,7 @@ $(OPTION_DESCRIPTIONS) early_stop_condition = if typeof(early_stop_condition) <: Real # Need to make explicit copy here for this to work: stopping_point = Float64(early_stop_condition) - (loss, complexity) -> loss < stopping_point + Base.Fix2(<, stopping_point) ∘ first ∘ tuple # Equivalent to (l, c) -> l < stopping_point else early_stop_condition end @@ -850,6 +850,7 @@ $(OPTION_DESCRIPTIONS) bumper, deprecated_return_state, typeof(_autodiff_backend), + print_precision, }( operators, 
_bin_constraints, @@ -887,7 +888,7 @@ $(OPTION_DESCRIPTIONS) fraction_replaced_hof, topn, verbosity, - print_precision, + Val(print_precision), save_to_file, probability_negate_constant, length(unary_operators), diff --git a/src/OptionsStruct.jl b/src/OptionsStruct.jl index f249fcf6f..b39dbf0b5 100644 --- a/src/OptionsStruct.jl +++ b/src/OptionsStruct.jl @@ -188,6 +188,7 @@ struct Options{ _bumper, _return_state, AD, + print_precision, } <: AbstractOptions operators::OP bin_constraints::Vector{Tuple{Int,Int}} @@ -225,7 +226,7 @@ struct Options{ fraction_replaced_hof::Float32 topn::Int verbosity::Union{Int,Nothing} - print_precision::Int + v_print_precision::Val{print_precision} save_to_file::Bool probability_negate_constant::Float32 nuna::Int @@ -256,7 +257,7 @@ struct Options{ use_recorder::Bool end -function Base.print(io::IO, options::Options) +function Base.print(io::IO, @nospecialize(options::Options)) return print( io, "Options(" * @@ -278,7 +279,9 @@ function Base.print(io::IO, options::Options) ")", ) end -Base.show(io::IO, ::MIME"text/plain", options::Options) = Base.print(io, options) +function Base.show(io::IO, ::MIME"text/plain", @nospecialize(options::Options)) + return Base.print(io, options) +end specialized_options(options::AbstractOptions) = options @unstable function specialized_options(options::Options) diff --git a/src/Population.jl b/src/Population.jl index d475da168..6b9173c5c 100644 --- a/src/Population.jl +++ b/src/Population.jl @@ -139,7 +139,7 @@ function _best_of_sample( scores[i] = member.score * exp(adaptive_parsimony_scaling * frequency) end else - map!(member -> member.score, scores, members) + map!(_get_score, scores, members) end chosen_idx = if p == 1.0 @@ -157,6 +157,7 @@ function _best_of_sample( end return members[chosen_idx] end +_get_score(member::PopMember) = member.score const CACHED_WEIGHTS = let init_k = collect(0:5), diff --git a/src/ProgressBars.jl b/src/ProgressBars.jl index 5a1f3fe6e..551bac013 100644 --- 
a/src/ProgressBars.jl +++ b/src/ProgressBars.jl @@ -18,6 +18,8 @@ mutable struct WrappedProgressBar end end +precompile(Tuple{typeof(Base.setproperty!),WrappedProgressBar,Symbol,Int64}) + """Iterate a progress bar without needing to store cycle/state externally.""" function manually_iterate!(pbar::WrappedProgressBar) cur_cycle = pbar.cycle diff --git a/src/SearchUtils.jl b/src/SearchUtils.jl index 5f04b5b0b..a1fbac7d8 100644 --- a/src/SearchUtils.jl +++ b/src/SearchUtils.jl @@ -8,6 +8,7 @@ using Dates: Dates using Distributed: Distributed, @spawnat, Future, procs, addprocs using StatsBase: mean using DispatchDoctor: @unstable +using Compat: Fix using DynamicExpressions: AbstractExpression, string_tree using ..UtilsModule: subscriptify @@ -79,7 +80,6 @@ end @unstable function RuntimeOptions(; niterations::Int=10, nout::Int=1, - options::AbstractOptions=Options(), parallelism=:multithreading, numprocs::Union{Int,Nothing}=nothing, procs::Union{Vector{Int},Nothing}=nothing, @@ -91,6 +91,10 @@ end verbosity::Union{Int,Nothing}=nothing, progress::Union{Bool,Nothing}=nothing, v_dim_out::Val{DIM_OUT}=Val(nothing), + # Defined from options + options_return_state, + options_verbosity, + options_progress, ) where {DIM_OUT} concurrency = if parallelism in (:multithreading, "multithreading") :multithreading @@ -120,14 +124,14 @@ end _return_state = if return_state isa Val first(typeof(return_state).parameters) else - if options.return_state === Val(nothing) + if options_return_state === Val(nothing) return_state === nothing ? false : return_state else @assert( return_state === nothing, "You cannot set `return_state` in both the `AbstractOptions` and in the passed arguments." 
) - first(typeof(options.return_state).parameters) + first(typeof(options_return_state).parameters) end end @@ -151,11 +155,11 @@ end end end - _verbosity = if verbosity === nothing && options.verbosity === nothing + _verbosity = if verbosity === nothing && options_verbosity === nothing 1 - elseif verbosity === nothing && options.verbosity !== nothing - options.verbosity - elseif verbosity !== nothing && options.verbosity === nothing + elseif verbosity === nothing && options_verbosity !== nothing + options_verbosity + elseif verbosity !== nothing && options_verbosity === nothing verbosity else error( @@ -163,11 +167,11 @@ end ) 1 end - _progress::Bool = if progress === nothing && options.progress === nothing + _progress::Bool = if progress === nothing && options_progress === nothing (_verbosity > 0) && nout == 1 - elseif progress === nothing && options.progress !== nothing - options.progress - elseif progress !== nothing && options.progress === nothing + elseif progress === nothing && options_progress !== nothing + options_progress + elseif progress !== nothing && options_progress === nothing progress else error( @@ -319,9 +323,9 @@ function init_dummy_pops( ] end -struct StdinReader{ST} +struct StdinReader can_read_user_input::Bool - stream::ST + stream::IO end """Start watching stream (like stdin) for user input.""" @@ -344,6 +348,7 @@ function watch_stream(stream) end return StdinReader(can_read_user_input, stream) end +precompile(Tuple{typeof(watch_stream),Base.TTY}) """Close the stdin reader and stop reading.""" function close_reader!(reader::StdinReader) @@ -628,9 +633,7 @@ function save_to_file( # Write file twice in case exit in middle of filewrite for out_file in (output_file, output_file * ".bak") - open(out_file, "w") do io - write(io, s) - end + open(Base.Fix2(write, s), out_file, "w") end return nothing end diff --git a/src/SymbolicRegression.jl b/src/SymbolicRegression.jl index 3e8c59b5f..c15996129 100644 --- a/src/SymbolicRegression.jl +++ 
b/src/SymbolicRegression.jl @@ -157,7 +157,7 @@ using DynamicExpressions: with_type_parameters LogitDistLoss, QuantileLoss, LogCoshLoss -using Compat: @compat +using Compat: @compat, Fix @compat public AbstractOptions, AbstractRuntimeOptions, @@ -262,7 +262,8 @@ using .CoreModule: erf, erfc, atanh_clip, - create_expression + create_expression, + has_units using .UtilsModule: is_anonymous_function, recursive_merge, json3_write, @ignore using .ComplexityModule: compute_complexity using .CheckConstraintsModule: check_constraints @@ -507,14 +508,19 @@ function equation_search( runtime_options::Union{AbstractRuntimeOptions,Nothing}=nothing, runtime_options_kws..., ) where {T<:DATA_TYPE,L<:LOSS_TYPE,D<:Dataset{T,L}} - runtime_options = if runtime_options === nothing - RuntimeOptions(; options, nout=length(datasets), runtime_options_kws...) - else - runtime_options - end + _runtime_options = @something( + runtime_options, + RuntimeOptions(; + options_return_state=options.return_state, + options_verbosity=options.verbosity, + options_progress=options.progress, + nout=length(datasets), + runtime_options_kws..., + ) + ) # Underscores here mean that we have mutated the variable - return _equation_search(datasets, runtime_options, options, saved_state) + return _equation_search(datasets, _runtime_options, options, saved_state) end @noinline function _equation_search( @@ -1029,13 +1035,10 @@ function _format_output( out_hof = if ropt.dim_out == 1 embed_metadata(only(state.halls_of_fame), options, only(datasets)) else - map(j -> embed_metadata(state.halls_of_fame[j], options, datasets[j]), 1:nout) + map(Fix{2}(embed_metadata, options), state.halls_of_fame, datasets) end if ropt.return_state - return ( - map(j -> embed_metadata(state.last_pops[j], options, datasets[j]), 1:nout), - out_hof, - ) + return (map(Fix{2}(embed_metadata, options), state.last_pops, datasets), out_hof) else return out_hof end diff --git a/src/TemplateExpression.jl b/src/TemplateExpression.jl index 
5099d5927..ed9e60bf3 100644 --- a/src/TemplateExpression.jl +++ b/src/TemplateExpression.jl @@ -1,6 +1,7 @@ module TemplateExpressionModule using Random: AbstractRNG +using Compat: Fix using DispatchDoctor: @unstable using DynamicExpressions: DynamicExpressions as DE, @@ -23,7 +24,8 @@ using DynamicExpressions: using DynamicExpressions.InterfacesModule: ExpressionInterface, Interfaces, @implements, all_ei_methods_except, Arguments -using ..CoreModule: AbstractOptions, Dataset, CoreModule as CM, AbstractMutationWeights +using ..CoreModule: + AbstractOptions, Dataset, CoreModule as CM, AbstractMutationWeights, has_units using ..ConstantOptimizationModule: ConstantOptimizationModule as CO using ..InterfaceDynamicExpressionsModule: InterfaceDynamicExpressionsModule as IDE using ..MutationFunctionsModule: MutationFunctionsModule as MF @@ -388,13 +390,18 @@ end @unstable IDE.expected_array_type(::AbstractMatrix, ::Type{<:TemplateExpression}) = Any function DA.violates_dimensional_constraints( - tree::TemplateExpression, dataset::Dataset, options::AbstractOptions + @nospecialize(tree::TemplateExpression), + dataset::Dataset, + @nospecialize(options::AbstractOptions) ) - @assert dataset.X_units === nothing && dataset.y_units === nothing + @assert !has_units(dataset) return false end function MM.condition_mutation_weights!( - weights::AbstractMutationWeights, member::P, options::AbstractOptions, curmaxsize::Int + @nospecialize(weights::AbstractMutationWeights), + @nospecialize(member::P), + @nospecialize(options::AbstractOptions), + curmaxsize::Int, ) where {T,L,N<:TemplateExpression,P<:PopMember{T,L,N}} # HACK TODO return nothing @@ -498,9 +505,12 @@ function CC.check_constraints( maxsize && return false # Then, we check other constraints for inner expressions: - return all( - t -> CC.check_constraints(t, options, maxsize, nothing), values(raw_contents) - ) + for t in values(raw_contents) + if !CC.check_constraints(t, options, maxsize, nothing) + return false + end + end + 
return true # TODO: The concept of `cursize` doesn't really make sense here. end function contains_other_features_than(tree::AbstractExpression, features) diff --git a/src/Utils.jl b/src/Utils.jl index e8ae7653d..64058fc9d 100644 --- a/src/Utils.jl +++ b/src/Utils.jl @@ -26,6 +26,7 @@ function is_anonymous_function(op) op_string[1] == '#' && op_string[2] in ('1', '2', '3', '4', '5', '6', '7', '8', '9') end +precompile(Tuple{typeof(is_anonymous_function),Function}) recursive_merge(x::AbstractVector...) = cat(x...; dims=1) recursive_merge(x::AbstractDict...) = merge(recursive_merge, x...) @@ -88,12 +89,13 @@ function _to_vec(v::MutableTuple{S,T}) where {S,T} return x end -const max_ops = 8192 -const vals = ntuple(Val, max_ops) - """Return the bottom k elements of x, and their indices.""" -bottomk_fast(x::AbstractVector{T}, k) where {T} = - _bottomk_dispatch(x, vals[k])::Tuple{Vector{T},Vector{Int}} +bottomk_fast(x::AbstractVector{T}, k) where {T} = Base.Cartesian.@nif( + 32, + d -> d == k, + d -> _bottomk_dispatch(x, Val(d))::Tuple{Vector{T},Vector{Int}}, + _ -> _bottomk_dispatch(x, Val(k))::Tuple{Vector{T},Vector{Int}} +) function _bottomk_dispatch(x::AbstractVector{T}, ::Val{k}) where {T,k} if k == 1 From 00af0ac16632e101a6cc9430a85ee0c8c8b74c9e Mon Sep 17 00:00:00 2001 From: MilesCranmer Date: Sun, 27 Oct 2024 16:37:19 +0000 Subject: [PATCH 54/74] fix: annealing issue --- src/SingleIteration.jl | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/SingleIteration.jl b/src/SingleIteration.jl index 90edb8ee7..2d36e6c87 100644 --- a/src/SingleIteration.jl +++ b/src/SingleIteration.jl @@ -33,7 +33,7 @@ function s_r_cycle( if !options.annealing min_temp = max_temp end - all_temperatures = LinRange(max_temp, min_temp, ncycles) + all_temperatures = ncycles > 1 ? 
LinRange(max_temp, min_temp, ncycles) : [max_temp] best_examples_seen = HallOfFame(options, dataset) num_evals = 0.0 From bd2eac48277e1c4171a2f349ea7dbee9f408c8b9 Mon Sep 17 00:00:00 2001 From: MilesCranmer Date: Sun, 27 Oct 2024 17:18:45 +0000 Subject: [PATCH 55/74] refactor: use `something` to clean up code --- src/Dataset.jl | 21 ++------ src/LossFunctions.jl | 9 ++-- src/MLJInterface.jl | 8 +-- src/MutationFunctions.jl | 25 +++++---- src/Options.jl | 4 +- src/SearchUtils.jl | 109 +++++++++++++-------------------------- 6 files changed, 61 insertions(+), 115 deletions(-) diff --git a/src/Dataset.jl b/src/Dataset.jl index f9e28bcc5..49a452938 100644 --- a/src/Dataset.jl +++ b/src/Dataset.jl @@ -131,22 +131,11 @@ function Dataset( n = size(X, BATCH_DIM) nfeatures = size(X, FEATURE_DIM) - variable_names = if variable_names === nothing - ["x$(i)" for i in 1:nfeatures] - else - variable_names - end - display_variable_names = if display_variable_names === nothing - ["x$(subscriptify(i))" for i in 1:nfeatures] - else - display_variable_names - end - - y_variable_name = if y_variable_name === nothing - ("y" ∉ variable_names) ? "y" : "target" - else - y_variable_name - end + variable_names = @something(variable_names, ["x$(i)" for i in 1:nfeatures]) + display_variable_names = @something( + display_variable_names, ["x$(subscriptify(i))" for i in 1:nfeatures] + ) + y_variable_name = @something(y_variable_name, ("y" ∉ variable_names) ? "y" : "target") avg_y = if y === nothing || !(eltype(y) isa Number) nothing else diff --git a/src/LossFunctions.jl b/src/LossFunctions.jl index 01dcca86b..637bb0fa4 100644 --- a/src/LossFunctions.jl +++ b/src/LossFunctions.jl @@ -140,7 +140,7 @@ function eval_loss_batched( regularization::Bool=true, idx=nothing, )::L where {T<:DATA_TYPE,L<:LOSS_TYPE} - _idx = idx === nothing ? 
batch_sample(dataset, options) : idx + _idx = @something(idx, batch_sample(dataset, options)) return eval_loss(tree, dataset, options; regularization=regularization, idx=_idx) end @@ -172,7 +172,7 @@ function loss_to_score( L(0.01) end loss_val = loss / normalization - size = complexity === nothing ? compute_complexity(member, options) : complexity + size = @something(complexity, compute_complexity(member, options)) parsimony_term = size * options.parsimony loss_val += L(parsimony_term) @@ -247,11 +247,8 @@ function dimensional_regularization( ) where {T<:DATA_TYPE,L<:LOSS_TYPE} if !violates_dimensional_constraints(tree, dataset, options) return zero(L) - elseif options.dimensional_constraint_penalty === nothing - return L(1000) - else - return L(options.dimensional_constraint_penalty::Float32) end + return convert(L, something(options.dimensional_constraint_penalty, 1000)) end end diff --git a/src/MLJInterface.jl b/src/MLJInterface.jl index b2eb8db4a..395837ef2 100644 --- a/src/MLJInterface.jl +++ b/src/MLJInterface.jl @@ -437,17 +437,17 @@ function _predict(m::M, fitresult, Xnew, idx, classes) where {M<:AbstractSRRegre validate_variable_names(variable_names, fitresult) validate_units(X_units_clean, fitresult.X_units) - idx = idx === nothing ? 
params.best_idx : idx + _idx = something(idx, params.best_idx) if M <: SRRegressor return eval_tree_mlj( - params.equations[idx], Xnew_t, classes, m, T, fitresult, nothing, prototype + params.equations[_idx], Xnew_t, classes, m, T, fitresult, nothing, prototype ) elseif M <: MultitargetSRRegressor outs = [ eval_tree_mlj( - params.equations[i][idx[i]], Xnew_t, classes, m, T, fitresult, i, prototype - ) for i in eachindex(idx, params.equations) + params.equations[i][_idx[i]], Xnew_t, classes, m, T, fitresult, i, prototype + ) for i in eachindex(_idx, params.equations) ] out_matrix = reduce(hcat, outs) if !fitresult.y_is_table diff --git a/src/MutationFunctions.jl b/src/MutationFunctions.jl index 73e0367b0..5348d2ffb 100644 --- a/src/MutationFunctions.jl +++ b/src/MutationFunctions.jl @@ -149,11 +149,11 @@ function append_random_op( options::AbstractOptions, nfeatures::Int, rng::AbstractRNG=default_rng(); - makeNewBinOp::Union{Bool,Nothing}=nothing, + make_new_bin_op::Union{Bool,Nothing}=nothing, ) where {T<:DATA_TYPE} tree, context = get_contents_for_mutation(ex, rng) ex = with_contents_for_mutation( - ex, append_random_op(tree, options, nfeatures, rng; makeNewBinOp), context + ex, append_random_op(tree, options, nfeatures, rng; make_new_bin_op), context ) return ex end @@ -162,16 +162,15 @@ function append_random_op( options::AbstractOptions, nfeatures::Int, rng::AbstractRNG=default_rng(); - makeNewBinOp::Union{Bool,Nothing}=nothing, + make_new_bin_op::Union{Bool,Nothing}=nothing, ) where {T<:DATA_TYPE} node = rand(rng, NodeSampler(; tree, filter=t -> t.degree == 0)) - if makeNewBinOp === nothing - choice = rand(rng) - makeNewBinOp = choice < options.nbin / (options.nuna + options.nbin) - end + _make_new_bin_op = @something( + make_new_bin_op, rand(rng) < options.nbin / (options.nuna + options.nbin), + ) - if makeNewBinOp + if _make_new_bin_op newnode = constructorof(typeof(tree))(; op=rand(rng, 1:(options.nbin)), l=make_random_leaf(nfeatures, T, typeof(tree), rng, 
options), @@ -210,10 +209,10 @@ function insert_random_op( ) where {T<:DATA_TYPE} node = rand(rng, NodeSampler(; tree)) choice = rand(rng) - makeNewBinOp = choice < options.nbin / (options.nuna + options.nbin) + make_new_bin_op = choice < options.nbin / (options.nuna + options.nbin) left = copy(node) - if makeNewBinOp + if make_new_bin_op right = make_random_leaf(nfeatures, T, typeof(tree), rng, options) newnode = constructorof(typeof(tree))(; op=rand(rng, 1:(options.nbin)), l=left, r=right @@ -246,10 +245,10 @@ function prepend_random_op( ) where {T<:DATA_TYPE} node = tree choice = rand(rng) - makeNewBinOp = choice < options.nbin / (options.nuna + options.nbin) + make_new_bin_op = choice < options.nbin / (options.nuna + options.nbin) left = copy(tree) - if makeNewBinOp + if make_new_bin_op right = make_random_leaf(nfeatures, T, typeof(tree), rng, options) newnode = constructorof(typeof(tree))(; op=rand(rng, 1:(options.nbin)), l=left, r=right @@ -399,7 +398,7 @@ function gen_random_tree_fixed_size( while cur_size < node_count if cur_size == node_count - 1 # only unary operator allowed. options.nuna == 0 && break # We will go over the requested amount, so we must break. 
- tree = append_random_op(tree, options, nfeatures, rng; makeNewBinOp=false) + tree = append_random_op(tree, options, nfeatures, rng; make_new_bin_op=false) else tree = append_random_op(tree, options, nfeatures, rng) end diff --git a/src/Options.jl b/src/Options.jl index 4b59b0390..aa247709e 100644 --- a/src/Options.jl +++ b/src/Options.jl @@ -597,7 +597,7 @@ $(OPTION_DESCRIPTIONS) fast_cycle::Bool=false, npopulations::Union{Nothing,Integer}=nothing, npop::Union{Nothing,Integer}=nothing, - deprecated_return_state=nothing, + deprecated_return_state::Union{Bool,Nothing}=nothing, kws..., ######################################### ) @@ -848,7 +848,7 @@ $(OPTION_DESCRIPTIONS) typeof(set_mutation_weights), turbo, bumper, - deprecated_return_state, + deprecated_return_state::Union{Bool,Nothing}, typeof(_autodiff_backend), print_precision, }( diff --git a/src/SearchUtils.jl b/src/SearchUtils.jl index a1fbac7d8..e11c97112 100644 --- a/src/SearchUtils.jl +++ b/src/SearchUtils.jl @@ -86,16 +86,16 @@ end addprocs_function::Union{Function,Nothing}=nothing, heap_size_hint_in_bytes::Union{Integer,Nothing}=nothing, runtests::Bool=true, - return_state::Union{Bool,Nothing,Val}=nothing, + return_state::VRS=nothing, run_id::Union{String,Nothing}=nothing, verbosity::Union{Int,Nothing}=nothing, progress::Union{Bool,Nothing}=nothing, v_dim_out::Val{DIM_OUT}=Val(nothing), # Defined from options - options_return_state, - options_verbosity, - options_progress, -) where {DIM_OUT} + options_return_state::Val{ORS}=Val(nothing), + options_verbosity::Union{Integer,Nothing}=nothing, + options_progress::Union{Bool,Nothing}=nothing, +) where {DIM_OUT,ORS,VRS<:Union{Bool,Nothing,Val}} concurrency = if parallelism in (:multithreading, "multithreading") :multithreading elseif parallelism in (:multiprocessing, "multiprocessing") @@ -109,37 +109,32 @@ end ) :serial end - not_distributed = concurrency in (:multithreading, :serial) - not_distributed && - procs !== nothing && - error( + if concurrency in 
(:multithreading, :serial) + numprocs !== nothing && error( + "`numprocs` should not be set when using `parallelism=$(parallelism)`. Please use `:multiprocessing`.", + ) + procs !== nothing && error( "`procs` should not be set when using `parallelism=$(parallelism)`. Please use `:multiprocessing`.", ) - not_distributed && - numprocs !== nothing && + end + verbosity !== nothing && + options_verbosity !== nothing && error( - "`numprocs` should not be set when using `parallelism=$(parallelism)`. Please use `:multiprocessing`.", + "You cannot set `verbosity` in both the search parameters " * + "`AbstractOptions` and the call to `equation_search`.", + ) + progress !== nothing && + options_progress !== nothing && + error( + "You cannot set `progress` in both the search parameters " * + "`AbstractOptions` and the call to `equation_search`.", + ) + ORS !== nothing && + return_state !== nothing && + error( + "You cannot set `return_state` in both the `AbstractOptions` and in the passed arguments.", ) - _return_state = if return_state isa Val - first(typeof(return_state).parameters) - else - if options_return_state === Val(nothing) - return_state === nothing ? false : return_state - else - @assert( - return_state === nothing, - "You cannot set `return_state` in both the `AbstractOptions` and in the passed arguments." - ) - first(typeof(options_return_state).parameters) - end - end - - dim_out = if DIM_OUT === nothing - nout > 1 ? 
2 : 1 - else - DIM_OUT - end _numprocs::Int = if numprocs === nothing if procs === nothing 4 @@ -155,42 +150,17 @@ end end end - _verbosity = if verbosity === nothing && options_verbosity === nothing - 1 - elseif verbosity === nothing && options_verbosity !== nothing - options_verbosity - elseif verbosity !== nothing && options_verbosity === nothing - verbosity - else - error( - "You cannot set `verbosity` in both the search parameters `AbstractOptions` and the call to `equation_search`.", - ) - 1 - end - _progress::Bool = if progress === nothing && options_progress === nothing - (_verbosity > 0) && nout == 1 - elseif progress === nothing && options_progress !== nothing - options_progress - elseif progress !== nothing && options_progress === nothing - progress - else - error( - "You cannot set `progress` in both the search parameters `AbstractOptions` and the call to `equation_search`.", - ) - false - end - - _addprocs_function = addprocs_function === nothing ? addprocs : addprocs_function + _return_state = VRS <: Val ? first(VRS.parameters) : something(ORS, return_state, false) + dim_out = something(DIM_OUT, nout > 1 ? 
2 : 1) + _verbosity = something(verbosity, options_verbosity, 1) + _progress = something(progress, options_progress, (_verbosity > 0) && nout == 1) + _addprocs_function = something(addprocs_function, addprocs) + _run_id = @something(run_id, generate_run_id()) exeflags = if concurrency == :multiprocessing heap_size_hint_in_megabytes = floor( - Int, ( - if heap_size_hint_in_bytes === nothing - (Sys.free_memory() / _numprocs) - else - heap_size_hint_in_bytes - end - ) / 1024^2 + Int, + (@something(heap_size_hint_in_bytes, (Sys.free_memory() / _numprocs))) / 1024^2, ) _verbosity > 0 && heap_size_hint_in_bytes === nothing && @@ -201,12 +171,6 @@ end `` end - _run_id = if run_id === nothing - generate_run_id() - else - run_id - end - return RuntimeOptions{concurrency,dim_out,_return_state}( niterations, _numprocs, @@ -597,10 +561,7 @@ function save_to_file( options::AbstractOptions, ropt::AbstractRuntimeOptions, ) where {T,L} - output_directory = joinpath( - options.output_directory === nothing ? "outputs" : options.output_directory, - ropt.run_id, - ) + output_directory = joinpath(something(options.output_directory, "outputs"), ropt.run_id) mkpath(output_directory) filename = nout > 1 ? 
"hall_of_fame_output$(j).csv" : "hall_of_fame.csv" output_file = joinpath(output_directory, filename) From 86f3f435b4d228fcabd6c6ddd4588adab8881409 Mon Sep 17 00:00:00 2001 From: MilesCranmer Date: Sun, 27 Oct 2024 18:26:47 +0000 Subject: [PATCH 56/74] deps: bump DispatchDoctor with `@nospecialize` support --- Project.toml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Project.toml b/Project.toml index 40b235088..d965e30b0 100644 --- a/Project.toml +++ b/Project.toml @@ -44,7 +44,7 @@ Compat = "^4.16" ConstructionBase = "<1.5.7" Dates = "1" DifferentiationInterface = "0.5, 0.6" -DispatchDoctor = "0.4" +DispatchDoctor = "^0.4.17" Distributed = "<0.0.1, 1" DynamicExpressions = "1" DynamicQuantities = "1" From ed2539fd58a9f0ab19a6af8f1d4925af37914208 Mon Sep 17 00:00:00 2001 From: MilesCranmer Date: Sun, 27 Oct 2024 19:02:21 +0000 Subject: [PATCH 57/74] docs: build Literate.jl example --- docs/Project.toml | 1 + docs/make.jl | 5 + docs/utils.jl | 70 ++++++++++ examples/template_expression_complex.jl | 175 +++++++++++++++++++----- 4 files changed, 213 insertions(+), 38 deletions(-) create mode 100644 docs/utils.jl diff --git a/docs/Project.toml b/docs/Project.toml index 7a01b4d6a..6399bf082 100644 --- a/docs/Project.toml +++ b/docs/Project.toml @@ -2,6 +2,7 @@ Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4" DynamicExpressions = "a40a106e-89c9-4ca8-8020-a735e8728b6b" Gumbo = "708ec375-b3d6-5a57-a7ce-8257bf98657a" +Literate = "98b081ad-f1c9-55d3-8b20-4c87d4299306" SymbolicUtils = "d1185830-fcd6-423d-90d6-eec64667417b" [compat] diff --git a/docs/make.jl b/docs/make.jl index b0546821b..1d4cf7570 100644 --- a/docs/make.jl +++ b/docs/make.jl @@ -16,6 +16,11 @@ using SymbolicRegression: AbstractSearchState, @extend_operators using DynamicExpressions +using Literate: markdown + +include("utils.jl") +process_literate_blocks("test") +process_literate_blocks("examples") DocMeta.setdocmeta!( SymbolicRegression, :DocTestSetup, :(using LossFunctions); 
recursive=true diff --git a/docs/utils.jl b/docs/utils.jl new file mode 100644 index 000000000..b1ea4247c --- /dev/null +++ b/docs/utils.jl @@ -0,0 +1,70 @@ + +# Function to process literate blocks in test files +function process_literate_blocks(base_path="test") + test_dir = joinpath(@__DIR__, "..", base_path) + for file in readdir(test_dir) + if endswith(file, ".jl") + process_file(joinpath(test_dir, file)) + end + end +end + +function process_file(filepath) + content = read(filepath, String) + blocks = match_literate_blocks(content) + for (output_file, block_content) in blocks + process_literate_block(output_file, block_content, filepath) + end +end + +function match_literate_blocks(content) + pattern = r"^(\s*)#literate_begin\s+file=\"(.*?)\"\n(.*?)#literate_end"sm + matches = collect(eachmatch(pattern, content)) + return Dict( + m.captures[2] => process_block_content(m.captures[1], m.captures[3]) for + m in matches + ) +end + +function process_block_content(indent, block_content) + if isempty(block_content) + return "" + end + indent_length = length(indent) + lines = split(block_content, '\n') + stripped_lines = [ + if length(line) > indent_length + line[(indent_length + 1):end] + else + "" + end for line in lines + ] + return strip(join(stripped_lines, '\n')) +end + +function process_literate_block(output_file, content, source_file) + # Create a temporary .jl file + temp_file = tempname() * ".jl" + write(temp_file, content) + + # Process the temporary file with Literate.markdown + output_dir = joinpath(@__DIR__, "src", "examples") + base_name = first(splitext(basename(output_file))) # Remove any existing extension + + markdown(temp_file, output_dir; name=base_name, documenter=true) + + # Generate the relative path for EditURL + edit_path = relpath(source_file, output_dir) + + # Read the generated markdown file + md_file = joinpath(output_dir, base_name * ".md") + md_content = read(md_file, String) + + # Replace the existing EditURL with the correct one + 
new_content = replace(md_content, r"EditURL = .*" => "EditURL = \"$edit_path\"") + + # Write the updated content back to the file + write(md_file, new_content) + + @info "Processed literate block to $md_file with EditURL set to $edit_path" +end diff --git a/examples/template_expression_complex.jl b/examples/template_expression_complex.jl index e20d6bcb6..b57900d01 100644 --- a/examples/template_expression_complex.jl +++ b/examples/template_expression_complex.jl @@ -1,7 +1,10 @@ -using SymbolicRegression -using Random: AbstractRNG, default_rng, MersenneTwister -using MLJBase: machine, fit!, report -using Test: @test +#! format: off +#literate_begin file=src/examples/template_expression.md +#= +# Searching with template expressions +=# +using SymbolicRegression, MLJBase, Random +using Test: @test #src function cross((a1, a2, a3), (b1, b2, b3)) return (a2 * b3 - a3 * b2, a3 * b1 - a1 * b3, a1 * b2 - a2 * b1) @@ -10,44 +13,76 @@ end options = Options(; binary_operators=(+, *, /, -), unary_operators=(sin, cos)) operators = options.operators -# First, let's generate our example data. -# Let's take 1000 trials: +#= +First, let's generate our example data. +Let's take 1000 trials: +=# n = 1000 -rng = MersenneTwister(0) +rng = Random.MersenneTwister(0); -# Say that each time we run the experiment, the temperature is a bit different: +#= +Say that each time we run the experiment, the temperature is a bit different: +=# T = 298.15 .+ 0.5 .* rand(rng, n) +T[1:3] -# We run the experiment, and record the velocity at a random time -# between 0 and 10 seconds. +#= +We run the experiment, and record the velocity at a random time +between 0 and 10 seconds. 
+=# t = 10 .* rand(rng, n) +t[1:3] -# We introduce a particle at a random velocity between -1 and 1 +#= +We introduce a particle at a random velocity between -1 and 1 +=# v = [ntuple(_ -> 2 * rand(rng) - 1, 3) for _ in 1:n] +v[1:3] -### TRUE (unknown) MODEL ### -# Let's assume magnetic field is sinusoidal, with frequency 1 Hz, -# along axes x and y, and decays over t along the z-axis. +#= +**Now, let's create the true unknown model.** + +Let's assume magnetic field is sinusoidal, with frequency 1 Hz, +along axes x and y, and decays over t along the z-axis. +=# ω = 2π B = [(sin(ω * ti), cos(ω * ti), exp(-ti / 10)) for ti in t] +B[1:3] -# We assume the drag force is linear in the velocity and -# depends on the temperature with a power law. +#= +We assume the drag force is linear in the velocity and +depends on the temperature with a power law. +=# F_d = [-1e-5 * Ti^(3//2) .* vi for (Ti, vi) in zip(T, v)] -############################ +F_d[1:3] -# Now, let's compute the true magnetic force: +#= +Now, let's compute the true magnetic force: +=# F_mag = [cross(vi, Bi) for (vi, Bi) in zip(v, B)] -# And sum it to get the total force: +F_mag[1:3] + +#= +And sum it to get the total force: +=# F = [fd .+ fm for (fd, fm) in zip(F_d, F_mag)] +F[1:3] -# And some random other expression to spice things up: +#= +And some random other expression to spice things up: +=# E = [sin(ω * ti) * cos(ω * ti) for ti in t] +E[1:3] -# This forms our dataset! +#= +This forms our dataset! +=# data = (; t, v, T, F, B, F_d, F_mag, E) +keys(data) -# Now, let's format it for input to the regressor: +#= +Now, let's format it for input to the regressor: +=# X = (; t=data.t, v_x=[vi[1] for vi in data.v], @@ -56,21 +91,33 @@ X = (; T=data.T, E=data.E, ) +keys(X) -# We can regress directly on a struct! 
-struct ForceVector{T} +#= +Template expressions allow us to regress directly on a struct, +so here we can define a `Force` type: +=# +struct Force{T} x::T y::T z::T E::T end -y = [ForceVector(F..., E) for (F, E) in zip(data.F, data.E)] +y = [Force(F..., E) for (F, E) in zip(data.F, data.E)] +y[1:3] -# Our variable names are the keys of the struct: +#= +Our variable names are the keys of the struct: +=# variable_names = ["t", "v_x", "v_y", "v_z", "T"] -# The trick is to define the right structure function. -# First, let's just make a function that prints the expression: +#= +Template expressions require you to define a _structure_ function, +which describes how to combine the sub-expressions into a single +expression, numerically evaluate them, and print them. + +First, let's just make a function that prints the expression: +=# function combine_strings(e) # e is a named tuple of strings representing each formula B_x_padded = e.B_x @@ -79,12 +126,13 @@ function combine_strings(e) return " ╭ 𝐁 = [ $(B_x_padded) , $(B_y_padded) , $(B_z_padded) ]\n │ 𝐅 = ($(e.F_d_scale)) * 𝐯\n ╰ E = $(e.E)" end -# So, this will just print the separate B and F_d expressions we've learned. - -# Then, let's define an expression that takes the numerical values -# evaluated in the TemplateExpression, and combines them into the resultant -# force vector. Inside this function, we can do whatever we want. +#= +So, this will just print the separate B and F_d expressions we've learned. +Then, let's define an expression that takes the numerical values +evaluated in the TemplateExpression, and combines them into the resultant +force vector. Inside this function, we can do whatever we want. +=# function combine_vectors(e, X) # This time, e is a named tuple of *vectors*, representing the batched # evaluation of each formula. 
@@ -107,18 +155,44 @@ function combine_vectors(e, X) return [ForceVector((fd .+ fm)..., ei) for (fd, fm, ei) in zip(F_d, F_mag, E)] end -# For the functions we wish to learn, we can constraint what variables -# each of them depends on, explicitly. Let's say B only depends on time, -# and the drag force scale only depends on temperature (we explicitly -# multiply the velocity in) +#= +For the functions we wish to learn, we can constraint what variables +each of them depends on, explicitly. Let's say B only depends on time, +and the drag force scale only depends on temperature (we explicitly +multiply the velocity in). +=# variable_constraints = (; B_x=[1], B_y=[1], B_z=[1], F_d_scale=[5], E=[1]) +#= +Now, we can create our template expression: +=# structure = TemplateStructure{(:B_x, :B_y, :B_z, :F_d_scale, :E)}(; combine_strings=combine_strings, combine_vectors=combine_vectors, variable_constraints=variable_constraints, ) +#= +Let's look at an example of how this would be used +in a TemplateExpression: +=# +t = Expression(Node{Float64}(; feature=1); operators, variable_names) +T = Expression(Node{Float64}(; feature=5); operators, variable_names) +B_x = B_y = B_z = 2.1 * cos(t) +F_d_scale = 1.0 * sqrt(T) +E = 2.1 * sin(t) * cos(t) + +ex = TemplateExpression( + (; B_x, B_y, B_z, F_d_scale, E); + structure, operators, variable_names +) + +#= +So we can see that it prints the expression as we've defined it. + +Now, we can create a regressor that builds template expressions +which follow this structure: +=# model = SRRegressor(; binary_operators=(+, -, *, /), unary_operators=(sin, cos, sqrt, exp), @@ -132,7 +206,32 @@ model = SRRegressor(; mutation_weights=MutationWeights(; rotate_tree=0.5), batching=true, batch_size=30, -) +); + +#= +Note how we also have to define the custom `elementwise_loss` +function. This is because our `combine_vectors` function +returns a `Force` struct, so we need to combine it against the truth! 
+ +Next, we can set up our machine and fit: +=# mach = machine(model, X, y) + +#= +At this point, you would run: +```julia fit!(mach) +``` + +which should print using your `combine_strings` function +during the search. The final result is accessible with: +```julia +report(mach) +``` +which would return a named tuple of the fitted results, +including the `.equations` field, which is a vector +of `TemplateExpression` objects that dominated the Pareto front. +=# +#literate_end +#! format: on From 2597e96870951fe1b1fe817f29bb253910498ffa Mon Sep 17 00:00:00 2001 From: MilesCranmer Date: Sun, 27 Oct 2024 21:53:19 +0000 Subject: [PATCH 58/74] docs: greatly improve TemplateExpression example --- docs/make.jl | 24 ++-- examples/template_expression_complex.jl | 159 +++++++++++++++++------- 2 files changed, 128 insertions(+), 55 deletions(-) diff --git a/docs/make.jl b/docs/make.jl index 1d4cf7570..c169b552d 100644 --- a/docs/make.jl +++ b/docs/make.jl @@ -22,13 +22,6 @@ include("utils.jl") process_literate_blocks("test") process_literate_blocks("examples") -DocMeta.setdocmeta!( - SymbolicRegression, :DocTestSetup, :(using LossFunctions); recursive=true -) -DocMeta.setdocmeta!( - SymbolicRegression, :DocTestSetup, :(using DynamicExpressions); recursive=true -) - readme = open(dirname(@__FILE__) * "/../README.md") do io read(io, String) end @@ -93,6 +86,12 @@ open(dirname(@__FILE__) * "/src/index.md", "w") do io write(io, index_base) end +DocMeta.setdocmeta!( + SymbolicRegression, + :DocTestSetup, + :(using LossFunctions, DynamicExpressions); + recursive=true, +) makedocs(; sitename="SymbolicRegression.jl", authors="Miles Cranmer", @@ -105,7 +104,10 @@ makedocs(; pages=[ "Contents" => "index_base.md", "Home" => "index.md", - "Examples" => "examples.md", + "Examples" => [ + "Short Examples" => "examples.md", + "Template Expressions" => "examples/template_expression.md", + ], "API" => "api.md", "Losses" => "losses.md", "Types" => "types.md", @@ -138,9 +140,11 @@ 
apply_to_a_href!(html.root) do element element.attributes["href"] = "#LossFunctions." * element.children[1].text end -# Then, we write the new html to the file: +# Then, we write the new html to the file, only if it has changed: open("docs/build/losses/index.html", "w") do io write(io, string(html)) end -deploydocs(; repo="github.com/MilesCranmer/SymbolicRegression.jl.git") +if !haskey(ENV, "JL_LIVERELOAD") + deploydocs(; repo="github.com/MilesCranmer/SymbolicRegression.jl.git") +end diff --git a/examples/template_expression_complex.jl b/examples/template_expression_complex.jl index b57900d01..48a2bab41 100644 --- a/examples/template_expression_complex.jl +++ b/examples/template_expression_complex.jl @@ -1,27 +1,69 @@ #! format: off -#literate_begin file=src/examples/template_expression.md +#literate_begin file="src/examples/template_expression.md" #= # Searching with template expressions -=# -using SymbolicRegression, MLJBase, Random -using Test: @test #src -function cross((a1, a2, a3), (b1, b2, b3)) - return (a2 * b3 - a3 * b2, a3 * b1 - a1 * b3, a1 * b2 - a2 * b1) -end +Template expressions are a powerful feature in SymbolicRegression.jl that allow you to impose structure +on the symbolic regression search. Rather than searching for a completely free-form expression, you can +specify a template that combines multiple sub-expressions in a prescribed way. 
-options = Options(; binary_operators=(+, *, /, -), unary_operators=(sin, cos)) -operators = options.operators +This is particularly useful when: +- You have domain knowledge about the functional form of your solution +- You want to learn vector-valued expressions (e.g., force fields, velocity fields) +- You need to enforce constraints on which variables can appear in different parts of the expression +- You want to share sub-expressions between multiple components + +For example, you might know that your system follows a pattern like: +`sin(f(x1, x2)) + g(x3)^2` +where `f` and `g` are unknown functions you want to learn. With template expressions, you can encode +this structure while still letting the symbolic regression search discover the optimal form of the +sub-expressions. + +In this tutorial, we'll walk through a complete example of using template expressions to learn +the components of a particle's motion under magnetic and drag forces. We'll see how to: + +1. Define the structure of our template +2. Specify constraints on which variables each sub-expression can access +3. Set up the symbolic regression search +4. Interpret and use the results + +Let's get started! +=# +using SymbolicRegression, Random +using MLJBase: machine, fit!, predict, report #= -First, let's generate our example data. -Let's take 1000 trials: + +## The Physical Problem + +We'll study a charged particle moving through a magnetic field with temperature-dependent drag. +The total force on the particle will have two components: + +```math +\mathbf{F} = \mathbf{F}_\text{drag} + \mathbf{F}_\text{magnetic} = -\eta(T)\mathbf{v} + q \mathbf{v} \times \mathbf{B}(t) +``` +where we will take ``q = 1`` for simplicity. 
+ +From physics, we know: +- The magnetic force comes from a cross product with the field: ``\mathbf{F}_\text{magnetic} = \mathbf{v} \times \mathbf{B}`` +- The drag force opposes motion, and we'll define a simple model for it: ``\mathbf{F}_\text{drag} = -\eta(T)\mathbf{v}`` + +Now, the parts of this model we don't know: +- The magnetic field ``\mathbf{B}(t)`` varies in time throughout the experiment, but this pattern repeats for each experiment. We want to learn the components of this field, symbolically! +- The drag coefficient ``\eta(T)`` depends only on temperature. We also want to figure out what this is! + +We'll generate synthetic data from a known model and then try to rediscover these relationships, +**only knowing the total force** on a particle for a given experiment, as well as the input variables: +time, velocity, and temperature. +We will do this with template expressions to encode the physical structure of the problem. + +Let's say we run this experiment 1000 times: =# n = 1000 rng = Random.MersenneTwister(0); #= -Say that each time we run the experiment, the temperature is a bit different: +Each time we run the experiment, the temperature is a bit different: =# T = 298.15 .+ 0.5 .* rand(rng, n) T[1:3] @@ -42,46 +84,69 @@ v[1:3] #= **Now, let's create the true unknown model.** -Let's assume magnetic field is sinusoidal, with frequency 1 Hz, -along axes x and y, and decays over t along the z-axis. 
+Let's assume the magnetic field is sinusoidal with frequency 1 Hz along axes x and y, +and decays exponentially along the z-axis: + +```math +\mathbf{B}(t) = \begin{pmatrix} +\sin(\omega t) \\ +\cos(\omega t) \\ +e^{-t/10} +\end{pmatrix} +\quad \text{where} \quad \omega = 2\pi +``` + +This gives us a rotating magnetic field in the x-y plane that weakens along z: =# -ω = 2π -B = [(sin(ω * ti), cos(ω * ti), exp(-ti / 10)) for ti in t] +omega = 2π +B = [(sin(omega * ti), cos(omega * ti), exp(-ti / 10)) for ti in t] B[1:3] #= We assume the drag force is linear in the velocity and -depends on the temperature with a power law. +depends on the temperature with a power law: + +```math +\mathbf{F}_\text{drag} = -\alpha T^{3/2} \mathbf{v} +\quad \text{where} \quad \alpha = 10^{-5} +``` + +This creates a temperature-dependent damping effect: =# F_d = [-1e-5 * Ti^(3//2) .* vi for (Ti, vi) in zip(T, v)] F_d[1:3] #= -Now, let's compute the true magnetic force: +Now, let's compute the true magnetic force, in 3D: =# +cross((a1, a2, a3), (b1, b2, b3)) = (a2 * b3 - a3 * b2, a3 * b1 - a1 * b3, a1 * b2 - a2 * b1) F_mag = [cross(vi, Bi) for (vi, Bi) in zip(v, B)] F_mag[1:3] #= -And sum it to get the total force: +We then sum these to get the total force: =# F = [fd .+ fm for (fd, fm) in zip(F_d, F_mag)] F[1:3] #= -And some random other expression to spice things up: +Just for fun, let's add another variable that we have to predict, +some ``\psi(t)``: +```math +\psi(t) = \sin(3 t) \cos(2 t) +``` =# -E = [sin(ω * ti) * cos(ω * ti) for ti in t] -E[1:3] +psi = [sin(3 * ti) * cos(2 * ti) for ti in t] +psi[1:3] #= This forms our dataset! 
=# -data = (; t, v, T, F, B, F_d, F_mag, E) +data = (; t, v, T, F, B, F_d, F_mag, psi) keys(data) #= -Now, let's format it for input to the regressor: +Now, let's format the input variables for input to the regressor: =# X = (; t=data.t, @@ -89,7 +154,7 @@ X = (; v_y=[vi[2] for vi in data.v], v_z=[vi[3] for vi in data.v], T=data.T, - E=data.E, + psi=data.psi, ) keys(X) @@ -101,9 +166,9 @@ struct Force{T} x::T y::T z::T - E::T + psi::T end -y = [Force(F..., E) for (F, E) in zip(data.F, data.E)] +y = [Force(F..., psi) for (F, psi) in zip(data.F, data.psi)] y[1:3] #= @@ -119,11 +184,11 @@ expression, numerically evaluate them, and print them. First, let's just make a function that prints the expression: =# function combine_strings(e) - # e is a named tuple of strings representing each formula + ## e is a named tuple of strings representing each formula B_x_padded = e.B_x B_y_padded = e.B_y B_z_padded = e.B_z - return " ╭ 𝐁 = [ $(B_x_padded) , $(B_y_padded) , $(B_z_padded) ]\n │ 𝐅 = ($(e.F_d_scale)) * 𝐯\n ╰ E = $(e.E)" + return " ╭ 𝐁 = [ $(B_x_padded) , $(B_y_padded) , $(B_z_padded) ]\n │ 𝐅 = ($(e.F_d_scale)) * 𝐯\n ╰ Ψ = $(e.psi)" end #= @@ -134,25 +199,25 @@ evaluated in the TemplateExpression, and combines them into the resultant force vector. Inside this function, we can do whatever we want. =# function combine_vectors(e, X) - # This time, e is a named tuple of *vectors*, representing the batched - # evaluation of each formula. + ## This time, e is a named tuple of *vectors*, representing the batched + ## evaluation of each formula. 
- # First, extract the 3D velocity vectors from the input matrix: + ## First, extract the 3D velocity vectors from the input matrix: v = [(X[2, i], X[3, i], X[4, i]) for i in eachindex(axes(X, 2))] - # Use this to compute the full drag force: + ## Use this to compute the full drag force: F_d = [F_d_scale_i .* vi for (F_d_scale_i, vi) in zip(e.F_d_scale, v)] - # Collect the magnetic field components that we've learned into the vector: + ## Collect the magnetic field components that we've learned into the vector: B = [(bx, by, bz) for (bx, by, bz) in zip(e.B_x, e.B_y, e.B_z)] - # Using this, we compute the magnetic force with a cross product: + ## Using this, we compute the magnetic force with a cross product: F_mag = [cross(vi, Bi) for (vi, Bi) in zip(v, B)] - E = e.E + psi = e.psi - # Finally, we combine the drag and magnetic forces into the total force: - return [ForceVector((fd .+ fm)..., ei) for (fd, fm, ei) in zip(F_d, F_mag, E)] + ## Finally, we combine the drag and magnetic forces into the total force: + return [Force((fd .+ fm)..., ei) for (fd, fm, ei) in zip(F_d, F_mag, psi)] end #= @@ -161,12 +226,12 @@ each of them depends on, explicitly. Let's say B only depends on time, and the drag force scale only depends on temperature (we explicitly multiply the velocity in). 
=# -variable_constraints = (; B_x=[1], B_y=[1], B_z=[1], F_d_scale=[5], E=[1]) +variable_constraints = (; B_x=[1], B_y=[1], B_z=[1], F_d_scale=[5], psi=[1]) #= Now, we can create our template expression: =# -structure = TemplateStructure{(:B_x, :B_y, :B_z, :F_d_scale, :E)}(; +structure = TemplateStructure{(:B_x, :B_y, :B_z, :F_d_scale, :psi)}(; combine_strings=combine_strings, combine_vectors=combine_vectors, variable_constraints=variable_constraints, @@ -174,16 +239,20 @@ structure = TemplateStructure{(:B_x, :B_y, :B_z, :F_d_scale, :E)}(; #= Let's look at an example of how this would be used -in a TemplateExpression: +in a TemplateExpression, for some guess at the form of +the solution: =# +options = Options(; binary_operators=(+, *, /, -), unary_operators=(sin, cos, sqrt, exp)) +## The inner operators are an `DynamicExpressions.OperatorEnum` which is used by `Expression`: +operators = options.operators t = Expression(Node{Float64}(; feature=1); operators, variable_names) T = Expression(Node{Float64}(; feature=5); operators, variable_names) B_x = B_y = B_z = 2.1 * cos(t) F_d_scale = 1.0 * sqrt(T) -E = 2.1 * sin(t) * cos(t) +psi = 2.1 * sin(t) * cos(t) ex = TemplateExpression( - (; B_x, B_y, B_z, F_d_scale, E); + (; B_x, B_y, B_z, F_d_scale, psi); structure, operators, variable_names ) @@ -200,9 +269,9 @@ model = SRRegressor(; maxsize=35, expression_type=TemplateExpression, expression_options=(; structure=structure), - # The elementwise needs to operate directly on each row of `y`: + ## The elementwise needs to operate directly on each row of `y`: elementwise_loss=(F1, F2) -> - (F1.x - F2.x)^2 + (F1.y - F2.y)^2 + (F1.z - F2.z)^2 + (F1.E - F2.E)^2, + (F1.x - F2.x)^2 + (F1.y - F2.y)^2 + (F1.z - F2.z)^2 + (F1.psi - F2.psi)^2, mutation_weights=MutationWeights(; rotate_tree=0.5), batching=true, batch_size=30, From b587afa13d9521e090a0325a9f21c6874cf6150f Mon Sep 17 00:00:00 2001 From: MilesCranmer Date: Sun, 27 Oct 2024 21:57:04 +0000 Subject: [PATCH 59/74] feat: 
declare safe operators to have easy aliases --- src/Operators.jl | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/src/Operators.jl b/src/Operators.jl index e7b99ea10..f99cc3bed 100644 --- a/src/Operators.jl +++ b/src/Operators.jl @@ -106,6 +106,15 @@ DE.get_op_name(::typeof(safe_log1p)) = "log1p" DE.get_op_name(::typeof(safe_acosh)) = "acosh" DE.get_op_name(::typeof(safe_sqrt)) = "sqrt" +# Expression algebra +DE.declare_operator_alias(::typeof(safe_pow), ::Val{2}) = ^ +DE.declare_operator_alias(::typeof(safe_log), ::Val{1}) = log +DE.declare_operator_alias(::typeof(safe_log2), ::Val{1}) = log2 +DE.declare_operator_alias(::typeof(safe_log10), ::Val{1}) = log10 +DE.declare_operator_alias(::typeof(safe_log1p), ::Val{1}) = log1p +DE.declare_operator_alias(::typeof(safe_acosh), ::Val{1}) = acosh +DE.declare_operator_alias(::typeof(safe_sqrt), ::Val{1}) = sqrt + # Deprecated operations: @deprecate pow(x, y) safe_pow(x, y) @deprecate pow_abs(x, y) safe_pow(x, y) From 5ace09ba959f970cd6967d54dc40a6a01d180c58 Mon Sep 17 00:00:00 2001 From: MilesCranmer Date: Sun, 27 Oct 2024 21:59:37 +0000 Subject: [PATCH 60/74] docs: simplify TemplateExpression example --- examples/template_expression_complex.jl | 33 ++++++------------------- 1 file changed, 8 insertions(+), 25 deletions(-) diff --git a/examples/template_expression_complex.jl b/examples/template_expression_complex.jl index 48a2bab41..13d3baecb 100644 --- a/examples/template_expression_complex.jl +++ b/examples/template_expression_complex.jl @@ -129,20 +129,10 @@ We then sum these to get the total force: F = [fd .+ fm for (fd, fm) in zip(F_d, F_mag)] F[1:3] -#= -Just for fun, let's add another variable that we have to predict, -some ``\psi(t)``: -```math -\psi(t) = \sin(3 t) \cos(2 t) -``` -=# -psi = [sin(3 * ti) * cos(2 * ti) for ti in t] -psi[1:3] - #= This forms our dataset! 
=# -data = (; t, v, T, F, B, F_d, F_mag, psi) +data = (; t, v, T, F, B, F_d, F_mag) keys(data) #= @@ -154,7 +144,6 @@ X = (; v_y=[vi[2] for vi in data.v], v_z=[vi[3] for vi in data.v], T=data.T, - psi=data.psi, ) keys(X) @@ -166,9 +155,8 @@ struct Force{T} x::T y::T z::T - psi::T end -y = [Force(F..., psi) for (F, psi) in zip(data.F, data.psi)] +y = [Force(F...) for F in data.F] y[1:3] #= @@ -188,7 +176,7 @@ function combine_strings(e) B_x_padded = e.B_x B_y_padded = e.B_y B_z_padded = e.B_z - return " ╭ 𝐁 = [ $(B_x_padded) , $(B_y_padded) , $(B_z_padded) ]\n │ 𝐅 = ($(e.F_d_scale)) * 𝐯\n ╰ Ψ = $(e.psi)" + return " ╭ 𝐁 = [ $(B_x_padded) , $(B_y_padded) , $(B_z_padded) ]\n ╰ 𝐅 = ($(e.F_d_scale)) * 𝐯" end #= @@ -214,10 +202,8 @@ function combine_vectors(e, X) ## Using this, we compute the magnetic force with a cross product: F_mag = [cross(vi, Bi) for (vi, Bi) in zip(v, B)] - psi = e.psi - ## Finally, we combine the drag and magnetic forces into the total force: - return [Force((fd .+ fm)..., ei) for (fd, fm, ei) in zip(F_d, F_mag, psi)] + return [Force((fd .+ fm)...) for (fd, fm) in zip(F_d, F_mag)] end #= @@ -226,12 +212,12 @@ each of them depends on, explicitly. Let's say B only depends on time, and the drag force scale only depends on temperature (we explicitly multiply the velocity in). 
=# -variable_constraints = (; B_x=[1], B_y=[1], B_z=[1], F_d_scale=[5], psi=[1]) +variable_constraints = (; B_x=[1], B_y=[1], B_z=[1], F_d_scale=[5]) #= Now, we can create our template expression: =# -structure = TemplateStructure{(:B_x, :B_y, :B_z, :F_d_scale, :psi)}(; +structure = TemplateStructure{(:B_x, :B_y, :B_z, :F_d_scale)}(; combine_strings=combine_strings, combine_vectors=combine_vectors, variable_constraints=variable_constraints, @@ -249,10 +235,9 @@ t = Expression(Node{Float64}(; feature=1); operators, variable_names) T = Expression(Node{Float64}(; feature=5); operators, variable_names) B_x = B_y = B_z = 2.1 * cos(t) F_d_scale = 1.0 * sqrt(T) -psi = 2.1 * sin(t) * cos(t) ex = TemplateExpression( - (; B_x, B_y, B_z, F_d_scale, psi); + (; B_x, B_y, B_z, F_d_scale); structure, operators, variable_names ) @@ -270,9 +255,7 @@ model = SRRegressor(; expression_type=TemplateExpression, expression_options=(; structure=structure), ## The elementwise needs to operate directly on each row of `y`: - elementwise_loss=(F1, F2) -> - (F1.x - F2.x)^2 + (F1.y - F2.y)^2 + (F1.z - F2.z)^2 + (F1.psi - F2.psi)^2, - mutation_weights=MutationWeights(; rotate_tree=0.5), + elementwise_loss=(F1, F2) -> (F1.x - F2.x)^2 + (F1.y - F2.y)^2 + (F1.z - F2.z)^2, batching=true, batch_size=30, ); From 142f721629e3af1056e823d7312866e651969b97 Mon Sep 17 00:00:00 2001 From: MilesCranmer Date: Sun, 27 Oct 2024 22:12:14 +0000 Subject: [PATCH 61/74] docs: improve TemplateExpression example --- examples/template_expression_complex.jl | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/examples/template_expression_complex.jl b/examples/template_expression_complex.jl index 13d3baecb..4061b768c 100644 --- a/examples/template_expression_complex.jl +++ b/examples/template_expression_complex.jl @@ -176,7 +176,7 @@ function combine_strings(e) B_x_padded = e.B_x B_y_padded = e.B_y B_z_padded = e.B_z - return " ╭ 𝐁 = [ $(B_x_padded) , $(B_y_padded) , $(B_z_padded) ]\n ╰ 𝐅 = 
($(e.F_d_scale)) * 𝐯" + return " ╭ 𝐁 = [ $(B_x_padded) , $(B_y_padded) , $(B_z_padded) ]\n ╰ 𝐅 = ($(e.F_d_scale)) * 𝐯" end #= @@ -228,7 +228,7 @@ Let's look at an example of how this would be used in a TemplateExpression, for some guess at the form of the solution: =# -options = Options(; binary_operators=(+, *, /, -), unary_operators=(sin, cos, sqrt, exp)) +options = Options(; binary_operators=(+, *, /, -, ^), unary_operators=(sin, cos, sqrt, exp)) ## The inner operators are an `DynamicExpressions.OperatorEnum` which is used by `Expression`: operators = options.operators t = Expression(Node{Float64}(; feature=1); operators, variable_names) From f3be7066f4b561ed40adf20aaf48b5919c6c692b Mon Sep 17 00:00:00 2001 From: MilesCranmer Date: Sun, 27 Oct 2024 22:15:37 +0000 Subject: [PATCH 62/74] docs: tweak readme --- docs/src/examples.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/src/examples.md b/docs/src/examples.md index 38b393600..106d526a3 100644 --- a/docs/src/examples.md +++ b/docs/src/examples.md @@ -439,4 +439,4 @@ The above code demonstrates how template expressions can be used to: - Constrains which variables can be used in each component - Create expressions that can output multiple values -You can even output custom structs - see `examples/template_expression_complex.jl` +You can even output custom structs - see the more detailed Template Expression example! 
From acd3fc555421d3b41ba25c7047dfab3908de23b3 Mon Sep 17 00:00:00 2001 From: MilesCranmer Date: Sun, 27 Oct 2024 22:22:51 +0000 Subject: [PATCH 63/74] docs: tweak temperature dependency --- examples/template_expression_complex.jl | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/examples/template_expression_complex.jl b/examples/template_expression_complex.jl index 4061b768c..a990ee6eb 100644 --- a/examples/template_expression_complex.jl +++ b/examples/template_expression_complex.jl @@ -107,13 +107,13 @@ We assume the drag force is linear in the velocity and depends on the temperature with a power law: ```math -\mathbf{F}_\text{drag} = -\alpha T^{3/2} \mathbf{v} +\mathbf{F}_\text{drag} = -\alpha T^{1/2} \mathbf{v} \quad \text{where} \quad \alpha = 10^{-5} ``` This creates a temperature-dependent damping effect: =# -F_d = [-1e-5 * Ti^(3//2) .* vi for (Ti, vi) in zip(T, v)] +F_d = [-1e-5 * Ti^(1//2) .* vi for (Ti, vi) in zip(T, v)] F_d[1:3] #= @@ -228,7 +228,7 @@ Let's look at an example of how this would be used in a TemplateExpression, for some guess at the form of the solution: =# -options = Options(; binary_operators=(+, *, /, -, ^), unary_operators=(sin, cos, sqrt, exp)) +options = Options(; binary_operators=(+, *, /, -), unary_operators=(sin, cos, sqrt, exp)) ## The inner operators are an `DynamicExpressions.OperatorEnum` which is used by `Expression`: operators = options.operators t = Expression(Node{Float64}(; feature=1); operators, variable_names) From e126d4312f1717f7f8c6a3caef54ea68bbdae21c Mon Sep 17 00:00:00 2001 From: MilesCranmer Date: Sun, 27 Oct 2024 22:32:50 +0000 Subject: [PATCH 64/74] docs: simplify TemplateExpression example --- docs/src/index_base.md | 2 +- examples/template_expression_complex.jl | 5 +---- 2 files changed, 2 insertions(+), 5 deletions(-) diff --git a/docs/src/index_base.md b/docs/src/index_base.md index 57a3c4e72..058f08767 100644 --- a/docs/src/index_base.md +++ b/docs/src/index_base.md @@ -1,5 +1,5 @@ # 
Contents ```@contents -Pages = ["examples.md", "api.md", "types.md", "losses.md"] +Pages = ["examples.md", "examples/template_expression.md", "api.md", "types.md", "losses.md"] ``` diff --git a/examples/template_expression_complex.jl b/examples/template_expression_complex.jl index a990ee6eb..ba315726c 100644 --- a/examples/template_expression_complex.jl +++ b/examples/template_expression_complex.jl @@ -173,10 +173,7 @@ First, let's just make a function that prints the expression: =# function combine_strings(e) ## e is a named tuple of strings representing each formula - B_x_padded = e.B_x - B_y_padded = e.B_y - B_z_padded = e.B_z - return " ╭ 𝐁 = [ $(B_x_padded) , $(B_y_padded) , $(B_z_padded) ]\n ╰ 𝐅 = ($(e.F_d_scale)) * 𝐯" + return " ╭ 𝐁 = [ $(e.B_x) , $(e.B_y) , $(e.B_z) ]\n ╰ 𝐅 = ($(e.F_d_scale)) * 𝐯" end #= From 76d1da0d5a64a1c45584a3c267dfc389af7602c0 Mon Sep 17 00:00:00 2001 From: MilesCranmer Date: Mon, 28 Oct 2024 10:23:25 +0000 Subject: [PATCH 65/74] docs: show raw source code at bottom --- docs/make.jl | 1 - docs/utils.jl | 26 +++++++++++++++++++++++++- 2 files changed, 25 insertions(+), 2 deletions(-) diff --git a/docs/make.jl b/docs/make.jl index c169b552d..336f2fcb0 100644 --- a/docs/make.jl +++ b/docs/make.jl @@ -16,7 +16,6 @@ using SymbolicRegression: AbstractSearchState, @extend_operators using DynamicExpressions -using Literate: markdown include("utils.jl") process_literate_blocks("test") diff --git a/docs/utils.jl b/docs/utils.jl index b1ea4247c..bcb9b3519 100644 --- a/docs/utils.jl +++ b/docs/utils.jl @@ -1,3 +1,4 @@ +using Literate: Literate # Function to process literate blocks in test files function process_literate_blocks(base_path="test") @@ -51,7 +52,7 @@ function process_literate_block(output_file, content, source_file) output_dir = joinpath(@__DIR__, "src", "examples") base_name = first(splitext(basename(output_file))) # Remove any existing extension - markdown(temp_file, output_dir; name=base_name, documenter=true) + 
Literate.markdown(temp_file, output_dir; name=base_name, documenter=true) # Generate the relative path for EditURL edit_path = relpath(source_file, output_dir) @@ -63,6 +64,29 @@ function process_literate_block(output_file, content, source_file) # Replace the existing EditURL with the correct one new_content = replace(md_content, r"EditURL = .*" => "EditURL = \"$edit_path\"") + # Add a codeblock at the end with the raw julia source + new_content = replace( + new_content, + r"\*This page was generated using \[Literate\.jl\]\(https://github\.com/fredrikekre/Literate\.jl\)\.\*" => """ + + ```@raw html + <details>
+ <summary>Show raw source code</summary> + ``` + + ```julia + $(replace(content, r"```" => "\\```")) + ``` + + which uses Literate.jl to generate this page. + + ```@raw html + </details>
+ ``` + + """, + ) + # Write the updated content back to the file write(md_file, new_content) From 147edb9ff911d34cd2e796ef6d879ba48c7132c5 Mon Sep 17 00:00:00 2001 From: MilesCranmer Date: Mon, 28 Oct 2024 10:41:02 +0000 Subject: [PATCH 66/74] deps: constraint DE version --- Project.toml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Project.toml b/Project.toml index d965e30b0..dba0fbe0d 100644 --- a/Project.toml +++ b/Project.toml @@ -46,7 +46,7 @@ Dates = "1" DifferentiationInterface = "0.5, 0.6" DispatchDoctor = "^0.4.17" Distributed = "<0.0.1, 1" -DynamicExpressions = "1" +DynamicExpressions = "1.4" DynamicQuantities = "1" Enzyme = "0.12" JSON3 = "1" From 8a4ce91a7807875ab488463665f1e3caee63d6e2 Mon Sep 17 00:00:00 2001 From: MilesCranmer Date: Mon, 28 Oct 2024 21:48:35 +0000 Subject: [PATCH 67/74] feat: color outputs of TemplateExpression --- Project.toml | 2 ++ examples/template_expression_complex.jl | 5 ++++- src/HallOfFame.jl | 12 +++++++----- src/TemplateExpression.jl | 8 +++++++- src/Utils.jl | 18 ++++++++++++++++++ 5 files changed, 38 insertions(+), 7 deletions(-) diff --git a/Project.toml b/Project.toml index dba0fbe0d..db21d45ac 100644 --- a/Project.toml +++ b/Project.toml @@ -26,6 +26,7 @@ Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c" Reexport = "189a3867-3050-52da-a836-e630ba90ab69" SpecialFunctions = "276daf66-3868-5448-9aa4-cd146d93841b" StatsBase = "2913bbd2-ae8a-5f71-8c99-4fb6c76f3a91" +StyledStrings = "f489334b-da3d-4c2e-b8f0-e476e12c162b" TOML = "fa267f1f-6049-4f14-aa54-33bafae1ed76" [weakdeps] @@ -63,6 +64,7 @@ Random = "<0.0.1, 1" Reexport = "1" SpecialFunctions = "0.10.1, 1, 2" StatsBase = "0.33, 0.34" +StyledStrings = "1" SymbolicUtils = "0.19, ^1.0.5, 2, 3" TOML = "<0.0.1, 1" julia = "1.10" diff --git a/examples/template_expression_complex.jl b/examples/template_expression_complex.jl index ba315726c..b3794e823 100644 --- a/examples/template_expression_complex.jl +++ b/examples/template_expression_complex.jl @@ -173,7 
+173,8 @@ First, let's just make a function that prints the expression: =# function combine_strings(e) ## e is a named tuple of strings representing each formula - return " ╭ 𝐁 = [ $(e.B_x) , $(e.B_y) , $(e.B_z) ]\n ╰ 𝐅 = ($(e.F_d_scale)) * 𝐯" + return " ╭ 𝐁 = [ " * e.B_x * " , " * e.B_y * " , " * e.B_z * " ]\n ╰ 𝐅 = (" * e.F_d_scale * ") * 𝐯" + ## (Note that string interpolation will erase the colors, so use `*` instead) end #= @@ -284,3 +285,5 @@ of `TemplateExpression` objects that dominated the Pareto front. =# #literate_end #! format: on + +fit!(mach) diff --git a/src/HallOfFame.jl b/src/HallOfFame.jl index b99ac03a2..859a01866 100644 --- a/src/HallOfFame.jl +++ b/src/HallOfFame.jl @@ -1,7 +1,7 @@ module HallOfFameModule using DynamicExpressions: AbstractExpression, string_tree -using ..UtilsModule: split_string +using ..UtilsModule: split_string, AnnotatedIOBuffer, dump_buffer using ..CoreModule: MAX_DEGREE, AbstractOptions, Dataset, DATA_TYPE, LOSS_TYPE, relu, create_expression using ..ComplexityModule: compute_complexity @@ -123,7 +123,8 @@ function string_dominating_pareto_curve( hallOfFame, dataset, options; width::Union{Integer,Nothing}=nothing ) terminal_width = (width === nothing) ? 100 : max(100, width::Integer) - buffer = IOBuffer() + _buffer = IOBuffer() + buffer = AnnotatedIOBuffer(_buffer) println(buffer, "Hall of Fame:") println(buffer, '-'^(terminal_width - 1)) print( @@ -161,13 +162,14 @@ function string_dominating_pareto_curve( ) end print(buffer, '-'^(terminal_width - 1)) - return String(take!(buffer)) + return dump_buffer(buffer) end function wrap_equation_string(eqn_string, left_cols_width, terminal_width) dots = "..." 
equation_width = (terminal_width - 1) - left_cols_width - length(dots) - buffer = IOBuffer() + _buffer = IOBuffer() + buffer = AnnotatedIOBuffer(_buffer) forced_split_eqn = split(eqn_string, '\n') print_pad = false @@ -187,7 +189,7 @@ function wrap_equation_string(eqn_string, left_cols_width, terminal_width) print_pad = true end end - return String(take!(buffer)) + return dump_buffer(buffer) end function format_hall_of_fame(hof::HallOfFame{T,L}, options) where {T,L} diff --git a/src/TemplateExpression.jl b/src/TemplateExpression.jl index ed9e60bf3..189290e79 100644 --- a/src/TemplateExpression.jl +++ b/src/TemplateExpression.jl @@ -3,6 +3,7 @@ module TemplateExpressionModule using Random: AbstractRNG using Compat: Fix using DispatchDoctor: @unstable +using StyledStrings: @styled_str using DynamicExpressions: DynamicExpressions as DE, AbstractStructuredExpression, @@ -337,16 +338,21 @@ function ComplexityModule.compute_complexity( ) end +_color_string(s::AbstractString, c::Symbol) = styled"{$c:$s}" function DE.string_tree( tree::TemplateExpression, operators::Union{AbstractOperatorEnum,Nothing}=nothing; kws... ) raw_contents = get_contents(tree) if can_combine_strings(tree) function_keys = keys(raw_contents) + colors = Base.Iterators.cycle((:magenta, :green, :red, :blue, :yellow, :cyan)) inner_strings = NamedTuple{function_keys}( map(ex -> DE.string_tree(ex, operators; kws...), values(raw_contents)) ) - return combine_strings(tree, inner_strings) + colored_strings = NamedTuple{function_keys}( + map(_color_string, inner_strings, colors) + ) + return combine_strings(tree, colored_strings) else @assert can_combine(tree) return DE.string_tree(combine(tree, raw_contents), operators; kws...) diff --git a/src/Utils.jl b/src/Utils.jl index 64058fc9d..06935c4d7 100644 --- a/src/Utils.jl +++ b/src/Utils.jl @@ -3,6 +3,7 @@ module UtilsModule using Printf: @printf using MacroTools: splitdef +using StyledStrings: StyledStrings macro ignore(args...) 
end @@ -267,4 +268,21 @@ function safe_call(f::F, x::T, default::D) where {F,T<:Tuple,D} return output end +@static if VERSION >= v"1.11.0-" + @eval begin + const AnnotatedIOBuffer = Base.AnnotatedIOBuffer + const AnnotatedString = Base.AnnotatedString + end +else + @eval begin + const AnnotatedIOBuffer = StyledStrings.AnnotatedStrings.AnnotatedIOBuffer + const AnnotatedString = StyledStrings.AnnotatedStrings.AnnotatedString + end +end + +dump_buffer(buffer::IOBuffer) = String(take!(buffer)) +function dump_buffer(buffer::AnnotatedIOBuffer) + return AnnotatedString(dump_buffer(buffer.io), buffer.annotations) +end + end From f87e7cc269892503e2583e2b1100c8470f82afa4 Mon Sep 17 00:00:00 2001 From: MilesCranmer Date: Mon, 28 Oct 2024 22:19:42 +0000 Subject: [PATCH 68/74] refactor: extra annotations within TemplateExpression --- src/TemplateExpression.jl | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/src/TemplateExpression.jl b/src/TemplateExpression.jl index 189290e79..586589fab 100644 --- a/src/TemplateExpression.jl +++ b/src/TemplateExpression.jl @@ -123,7 +123,7 @@ end # single callable function. 
function combine(template::TemplateStructure, nt::NamedTuple) - return (template.combine::Function)(nt) + return (template.combine::Function)(nt)::AbstractExpression end function combine_vectors( template::TemplateStructure, nt::NamedTuple, X::Union{AbstractMatrix,Nothing}=nothing @@ -131,13 +131,13 @@ function combine_vectors( combiner = template.combine_vectors::Function if X !== nothing && hasmethod(combiner, typeof((nt, X))) # TODO: Refactor this - return combiner(nt, X) + return combiner(nt, X)::AbstractVector else - return combiner(nt) + return combiner(nt)::AbstractVector end end function combine_strings(template::TemplateStructure, nt::NamedTuple) - return (template.combine_strings::Function)(nt) + return (template.combine_strings::Function)(nt)::AbstractString end function (template::TemplateStructure)( From 5af1357dfcc83ec95cde00ae0d7d01af016cade8 Mon Sep 17 00:00:00 2001 From: MilesCranmer Date: Tue, 29 Oct 2024 18:42:26 +0000 Subject: [PATCH 69/74] feat: stylize printout --- src/HallOfFame.jl | 23 ++++++++++++++++------- 1 file changed, 16 insertions(+), 7 deletions(-) diff --git a/src/HallOfFame.jl b/src/HallOfFame.jl index 859a01866..d09990ad7 100644 --- a/src/HallOfFame.jl +++ b/src/HallOfFame.jl @@ -1,5 +1,6 @@ module HallOfFameModule +using StyledStrings: @styled_str using DynamicExpressions: AbstractExpression, string_tree using ..UtilsModule: split_string, AnnotatedIOBuffer, dump_buffer using ..CoreModule: @@ -119,18 +120,26 @@ function calculate_pareto_frontier(hallOfFame::HallOfFame{T,L,N}) where {T,L,N} return dominating end +const HEADER = let + join( + ( + rpad(styled"{bold:{underline:Complexity}}", 10), + rpad(styled"{bold:{underline:Loss}}", 9), + rpad(styled"{bold:{underline:Score}}", 9), + styled"{bold:{underline:Equation}}", + ), + " ", + ) +end + function string_dominating_pareto_curve( hallOfFame, dataset, options; width::Union{Integer,Nothing}=nothing ) terminal_width = (width === nothing) ? 
100 : max(100, width::Integer) _buffer = IOBuffer() buffer = AnnotatedIOBuffer(_buffer) - println(buffer, "Hall of Fame:") - println(buffer, '-'^(terminal_width - 1)) - print( - buffer, - @sprintf("%-10s %-8s %-8s %-8s\n", "Complexity", "Loss", "Score", "Equation") - ) + println(buffer, '─'^(terminal_width - 1)) + println(buffer, HEADER) formatted = format_hall_of_fame(hallOfFame, options) for (tree, score, loss, complexity) in @@ -161,7 +170,7 @@ function string_dominating_pareto_curve( ), ) end - print(buffer, '-'^(terminal_width - 1)) + print(buffer, '─'^(terminal_width - 1)) return dump_buffer(buffer) end From 38cbe0981eb4431c09720f3a65f6a771192cbc4a Mon Sep 17 00:00:00 2001 From: MilesCranmer Date: Tue, 29 Oct 2024 22:10:04 +0000 Subject: [PATCH 70/74] feat: switch from ProgressBars.jl to ProgressMeter.jl --- Project.toml | 4 ++-- src/ProgressBars.jl | 48 +++++++++++++++++++++++---------------- src/SearchUtils.jl | 21 +++++++++-------- src/SymbolicRegression.jl | 2 +- 4 files changed, 44 insertions(+), 31 deletions(-) diff --git a/Project.toml b/Project.toml index db21d45ac..ba95ee1f4 100644 --- a/Project.toml +++ b/Project.toml @@ -21,7 +21,7 @@ Optim = "429524aa-4258-5aef-a3af-852621145aeb" Pkg = "44cfe95a-1eb2-52ea-b672-e2afdf69b78f" PrecompileTools = "aea7be01-6a6a-4083-8856-8a6e6704d82a" Printf = "de0858da-6303-5e67-8744-51eddeeeb8d7" -ProgressBars = "49802e3a-d2f1-5c88-81d8-b72133a6f568" +ProgressMeter = "92933f4c-e287-5a05-a399-4b506db050ca" Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c" Reexport = "189a3867-3050-52da-a836-e630ba90ab69" SpecialFunctions = "276daf66-3868-5448-9aa4-cd146d93841b" @@ -59,7 +59,7 @@ Optim = "~1.8, ~1.9" Pkg = "<0.0.1, 1" PrecompileTools = "1" Printf = "<0.0.1, 1" -ProgressBars = "~1.4, ~1.5" +ProgressMeter = "1.10" Random = "<0.0.1, 1" Reexport = "1" SpecialFunctions = "0.10.1, 1, 2" diff --git a/src/ProgressBars.jl b/src/ProgressBars.jl index 551bac013..ed2e97e90 100644 --- a/src/ProgressBars.jl +++ b/src/ProgressBars.jl 
@@ -1,38 +1,48 @@ module ProgressBarsModule -using ProgressBars: ProgressBar, set_multiline_postfix +using Compat: Fix +using ProgressMeter: Progress, next! +using StyledStrings: @styled_str +using ..UtilsModule: AnnotatedString # Simple wrapper for a progress bar which stores its own state mutable struct WrappedProgressBar - bar::ProgressBar - state::Union{Int,Nothing} - cycle::Union{Int,Nothing} + bar::Progress + postfix::Vector{Tuple{AnnotatedString,AnnotatedString}} - function WrappedProgressBar(args...; kwargs...) - if haskey(ENV, "SYMBOLIC_REGRESSION_TEST") && - ENV["SYMBOLIC_REGRESSION_TEST"] == "true" - output_stream = devnull - return new(ProgressBar(args...; output_stream, kwargs...), nothing, nothing) + function WrappedProgressBar(n::Integer, niterations::Integer; kwargs...) + init_vector = Tuple{AnnotatedString,AnnotatedString}[] + kwargs = (; kwargs..., desc="Evolving for $niterations iterations...") + if get(ENV, "SYMBOLIC_REGRESSION_TEST", "false") == "true" + # For testing, create a progress bar that writes to devnull + output = devnull + return new(Progress(n; output, kwargs...), init_vector) end - return new(ProgressBar(args...; kwargs...), nothing, nothing) + return new(Progress(n; kwargs...), init_vector) end end -precompile(Tuple{typeof(Base.setproperty!),WrappedProgressBar,Symbol,Int64}) +function barlen(pbar::WrappedProgressBar) + return @something(pbar.bar.barlen, displaysize(stdout)[2]) +end """Iterate a progress bar without needing to store cycle/state externally.""" function manually_iterate!(pbar::WrappedProgressBar) - cur_cycle = pbar.cycle - if cur_cycle === nothing - pbar.cycle, pbar.state = iterate(pbar.bar) - else - pbar.cycle, pbar.state = iterate(pbar.bar, pbar.state) - end + width = barlen(pbar) + postfix = map(Fix{2}(format_for_meter, width), pbar.postfix) + next!(pbar.bar; showvalues=postfix, valuecolor=:none) return nothing end -function set_multiline_postfix!(t::WrappedProgressBar, postfix::AbstractString) - return 
set_multiline_postfix(t.bar, postfix) +function format_for_meter((k, s), width::Integer) + new_s = if occursin('\n', s) + pieces = [rpad(line, width) for line in split(s, '\n')] + left_margin = length(" $(string(k)): ") + ' '^(width - left_margin) * join(pieces) + else + s + end + return (k, new_s) end end diff --git a/src/SearchUtils.jl b/src/SearchUtils.jl index e11c97112..ed433df65 100644 --- a/src/SearchUtils.jl +++ b/src/SearchUtils.jl @@ -7,6 +7,7 @@ using Printf: @printf, @sprintf using Dates: Dates using Distributed: Distributed, @spawnat, Future, procs, addprocs using StatsBase: mean +using StyledStrings: @styled_str using DispatchDoctor: @unstable using Compat: Fix @@ -17,7 +18,7 @@ using ..ComplexityModule: compute_complexity using ..PopulationModule: Population using ..PopMemberModule: PopMember using ..HallOfFameModule: HallOfFame, string_dominating_pareto_curve -using ..ProgressBarsModule: WrappedProgressBar, set_multiline_postfix!, manually_iterate! +using ..ProgressBarsModule: WrappedProgressBar, manually_iterate!, barlen using ..AdaptiveParsimonyModule: RunningSearchStatistics """ @@ -419,23 +420,25 @@ function update_progress_bar!( head_node_occupation::Float64, parallelism=:serial, ) where {T,L} - equation_strings = string_dominating_pareto_curve( - hall_of_fame, dataset, options; width=progress_bar.bar.width - ) # TODO - include command about "q" here. load_string = if length(equation_speed) > 0 average_speed = sum(equation_speed) / length(equation_speed) @sprintf( - "Expressions evaluated per second: %-5.2e. ", + "Full dataset evaluations per second: %-5.2e. ", round(average_speed, sigdigits=3) ) else - @sprintf("Expressions evaluated per second: [.....]. ") + @sprintf("Full dataset evaluations per second: [.....]. 
") end load_string *= get_load_string(; head_node_occupation, parallelism) - load_string *= @sprintf("Press 'q' and then <enter> to stop execution early.\n") - equation_strings = load_string * equation_strings - set_multiline_postfix!(progress_bar, equation_strings) + load_string *= @sprintf("Press 'q' and then <enter> to stop execution early.") + equation_strings = string_dominating_pareto_curve( + hall_of_fame, dataset, options; width=barlen(progress_bar) + ) + progress_bar.postfix = [ + (styled"{italic:Info}", styled"{italic:$load_string}"), + (styled"{italic:Hall of Fame}", equation_strings), + ] manually_iterate!(progress_bar) return nothing end diff --git a/src/SymbolicRegression.jl b/src/SymbolicRegression.jl index c15996129..f4fe3f70d 100644 --- a/src/SymbolicRegression.jl +++ b/src/SymbolicRegression.jl @@ -788,7 +788,7 @@ function _main_search_loop!( #TODO: need to iterate this on the max cycles remaining! sum_cycle_remaining = sum(state.cycles_remaining) progress_bar = WrappedProgressBar( - 1:sum_cycle_remaining; width=options.terminal_width + sum_cycle_remaining, ropt.niterations; barlen=options.terminal_width ) end last_print_time = time() From ede3f7845effb84c1de3251af300c58823e7b6fe Mon Sep 17 00:00:00 2001 From: MilesCranmer Date: Tue, 29 Oct 2024 22:13:35 +0000 Subject: [PATCH 71/74] fix: type instability in `barlen` --- src/ProgressBars.jl | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/ProgressBars.jl b/src/ProgressBars.jl index ed2e97e90..c0755b13d 100644 --- a/src/ProgressBars.jl +++ b/src/ProgressBars.jl @@ -22,7 +22,7 @@ mutable struct WrappedProgressBar end end -function barlen(pbar::WrappedProgressBar) +function barlen(pbar::WrappedProgressBar)::Int return @something(pbar.bar.barlen, displaysize(stdout)[2]) end From 80dce0d75b671ce327b1b230e5327fdcc508f5e2 Mon Sep 17 00:00:00 2001 From: MilesCranmer Date: Tue, 29 Oct 2024 22:17:19 +0000 Subject: [PATCH 72/74] refactor: modularize progress bars --- src/ProgressBars.jl | 10 
+++++++---
src/SymbolicRegression.jl | 8 +++++--- 2 files changed, 12 insertions(+), 6 deletions(-) diff --git a/src/ProgressBars.jl b/src/ProgressBars.jl index c0755b13d..f11955937 100644 --- a/src/ProgressBars.jl +++ b/src/ProgressBars.jl @@ -26,7 +26,7 @@ function barlen(pbar::WrappedProgressBar)::Int return @something(pbar.bar.barlen, displaysize(stdout)[2]) end -"""Iterate a progress bar without needing to store cycle/state externally.""" +"""Iterate a progress bar.""" function manually_iterate!(pbar::WrappedProgressBar) width = barlen(pbar) postfix = map(Fix{2}(format_for_meter, width), pbar.postfix) @@ -36,13 +36,17 @@ end function format_for_meter((k, s), width::Integer) new_s = if occursin('\n', s) - pieces = [rpad(line, width) for line in split(s, '\n')] left_margin = length(" $(string(k)): ") - ' '^(width - left_margin) * join(pieces) + left_padding = ' '^(width - left_margin) + left_padding * newlines_to_spaces(s, width) else s end return (k, new_s) end +function newlines_to_spaces(s::AbstractString, width::Integer) + return join([rpad(line, width) for line in split(s, '\n')]) +end + end diff --git a/src/SymbolicRegression.jl b/src/SymbolicRegression.jl index f4fe3f70d..75bdbfa19 100644 --- a/src/SymbolicRegression.jl +++ b/src/SymbolicRegression.jl @@ -784,12 +784,14 @@ function _main_search_loop!( ropt.verbosity > 0 && @info "Started!" nout = length(datasets) start_time = time() - if ropt.progress + progress_bar = if ropt.progress #TODO: need to iterate this on the max cycles remaining! 
sum_cycle_remaining = sum(state.cycles_remaining) - progress_bar = WrappedProgressBar( + WrappedProgressBar( sum_cycle_remaining, ropt.niterations; barlen=options.terminal_width ) + else + nothing end last_print_time = time() last_speed_recording_time = time() @@ -937,7 +939,7 @@ function _main_search_loop!( options, total_cycles, cycles_remaining=state.cycles_remaining[j] ) move_window!(state.all_running_search_statistics[j]) - if ropt.progress + if progress_bar !== nothing head_node_occupation = estimate_work_fraction(resource_monitor) update_progress_bar!( progress_bar, From e84f6ab9d03aec500d41b62de2dc7ace918b9342 Mon Sep 17 00:00:00 2001 From: MilesCranmer Date: Tue, 29 Oct 2024 23:43:28 +0000 Subject: [PATCH 73/74] fix: type instability in annotated string See https://github.com/JuliaLang/StyledStrings.jl/issues/102 --- src/ProgressBars.jl | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/src/ProgressBars.jl b/src/ProgressBars.jl index f11955937..1b6bc402d 100644 --- a/src/ProgressBars.jl +++ b/src/ProgressBars.jl @@ -2,7 +2,7 @@ module ProgressBarsModule using Compat: Fix using ProgressMeter: Progress, next! 
-using StyledStrings: @styled_str +using StyledStrings: @styled_str, annotatedstring using ..UtilsModule: AnnotatedString # Simple wrapper for a progress bar which stores its own state @@ -38,7 +38,7 @@ function format_for_meter((k, s), width::Integer) new_s = if occursin('\n', s) left_margin = length(" $(string(k)): ") left_padding = ' '^(width - left_margin) - left_padding * newlines_to_spaces(s, width) + annotatedstring(left_padding, newlines_to_spaces(s, width)) else s end @@ -46,7 +46,7 @@ function format_for_meter((k, s), width::Integer) end function newlines_to_spaces(s::AbstractString, width::Integer) - return join([rpad(line, width) for line in split(s, '\n')]) + return join(rpad(line, width) for line in split(s, '\n')) end end From a084404d28c8ecd554f2f636058c772982d6d92c Mon Sep 17 00:00:00 2001 From: MilesCranmer Date: Wed, 30 Oct 2024 02:21:58 +0000 Subject: [PATCH 74/74] docs: update description of TemplateExpression --- CHANGELOG.md | 74 +++++++++++++++++++++++++++++----------------------- 1 file changed, 42 insertions(+), 32 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index ca1989c04..170c6dfc5 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -12,8 +12,9 @@ Summary of major recent changes, described in more detail below: - This gives us new features, improves user hackability, and greatly improves ergonomics! - [Created "_Template Expressions_", for fitting expressions under a user-specified functional form (`TemplateExpression <: AbstractExpression`)](#created-template-expressions-for-fitting-expressions-under-a-user-specified-functional-form-templateexpression--abstractexpression) - Template expressions are quite flexible: they are a meta-expression that wraps multiple other expressions, and combines them using a user-specified function. - - This enables **vector expressions** - in other words, you can learn multiple components of a vector, simultaneously, with a single expression! 
- - (Note that this still does not permit learning using vector operators, though we are working on that!) + - This enables **vector expressions** - in other words, you can learn multiple components of a vector, simultaneously, with a single expression! Or more generally, you can learn expressions onto any Julia struct. + - (Note that this still does not permit learning using non-scalar operators, though we are working on that!) + - Template expressions also make use of colored strings to represent each part in the printout, to improve readability. - [Created "_Parametric Expressions_", for custom functional forms with per-class parameters: (`ParametricExpression <: AbstractExpression`)](#created-parametric-expressions-for-custom-functional-forms-with-per-class-parameters-parametricexpression--abstractexpression) - This lets you fit expressions that act as _models of multiple systems_, with per-system parameters! - [Introduced a variety of new abstractions for user extensibility](#introduced-a-variety-of-new-abstractions-for-user-extensibility) (**and to support new research on symbolic regression!**) @@ -32,6 +33,8 @@ Summary of major recent changes, described in more detail below: - Segmentation faults caused by this are a likely culprit for some crashes reported during multi-day multi-node searches. - Introduced a new test for aliasing throughout the entire search state to prevent this from happening again. - Increased documentation and examples. +- Improved progress bar. +- StyledStrings integration. - Julia 1.10 is now the minimum supported Julia version. - [Other small features](#other-small-features-in-v100) - Also see the [Update Guide](#update-guide) below for more details on upgrading. @@ -83,8 +86,7 @@ This also lets you fit vector expressions using SymbolicRegression.jl, where vec A `TemplateExpression` is constructed by specifying: - A named tuple of sub-expressions (e.g., `(; f=x1 - x2 * x2, g=1.5 * x3)`). 
-- A structure function that defines how these sub-expressions are combined both numerically and when printing. -- A `variable_mapping` that defines which variables each sub-expression can access. +- A structure function that defines how these sub-expressions are combined in different contexts. For example, you can create a `TemplateExpression` that enforces the constraint: `sin(f(x1, x2)) + g(x3)^2` - where we evolve `f` and `g` simultaneously. @@ -104,32 +106,44 @@ x2 = Expression(Node{Float64}(; feature=2); operators, variable_names) x3 = Expression(Node{Float64}(; feature=3); operators, variable_names) ``` -A `TemplateExpression` is basically a named tuple of expressions, with a structure function that defines how to combine them -in different contexts. -It also has a `variable_mapping` that defines which variables each sub-expression can access. For example: +To build a `TemplateExpression`, we specify the structure using +a `TemplateStructure` object. This class has several fields: + +- `combine`: Optional function taking a `NamedTuple` of function keys => expressions, + returning a single expression. Fallback method used by `get_tree` + on a `TemplateExpression` to generate a single `Expression`. +- `combine_vectors`: Optional function taking a `NamedTuple` of function keys => vectors, + returning a single vector. Used for evaluating the expression tree. + You may optionally define a method with a second argument `X` for if you wish + to include the data matrix `X` (of shape `[num_features, num_rows]`) in the + computation. +- `combine_strings`: Optional function taking a `NamedTuple` of function keys => strings, + returning a single string. Used for printing the expression tree. +- `variable_constraints`: Optional `NamedTuple` that defines which variables each sub-expression is allowed to access. + For example, requesting `f(x1, x2)` and `g(x3)` would be equivalent to `(; f=[1, 2], g=[3])`. 
+ +Let's see an example: ```julia -variable_mapping = (; f=[1, 2], g=[3]) # We have functions f(x1, x2) and g(x3) # Combine f and g them into a single scalar expression: -function my_structure(nt::NamedTuple{<:Any,<:Tuple{Vararg{AbstractVector}}}) - return @. sin(nt.f) + nt.g * nt.g -end -function my_structure(nt::NamedTuple{<:Any,<:Tuple{Vararg{AbstractString}}}) - return "sin($(nt.f)) + $(nt.g)^2" # Generates a string representation of the expression -end +structure = TemplateStructure(; + combine_strings=e -> "sin(" * e.f * ") + (" * e.g * ")^2", + combine_vectors=e -> map((f, g) -> sin(f) + g * g, e.f, e.g), + variable_constraints = (; f=[1, 2], g=[3]), # We constrain it to f(x1, x2) and g(x3) +) ``` This defines how the `TemplateExpression` should be evaluated numerically on a given input, and also how it should be represented as a string: ```julia -julia> f_example = x1 - x2 * x2 +julia> f_example = x1 - x2 * x2; # Normal `Expression` object -julia> g_example = 1.5 * x3 # Normal `Expression` object +julia> g_example = 1.5 * x3; julia> # Create TemplateExpression from these sub-expressions: - st_expr = TemplateExpression((; f=f_example, g=g_example); structure=my_structure, operators, variable_names, variable_mapping); + st_expr = TemplateExpression((; f=f_example, g=g_example); structure, operators, variable_names); julia> st_expr # Prints using `my_structure`! sin(x1 - (x2 * x2)) + 1.5 * x3^2 @@ -147,25 +161,18 @@ We can also use this `TemplateExpression` in SymbolicRegression.jl searches! 
```julia using SymbolicRegression using MLJBase: machine, fit!, report - ``` -We first define our structure: - -```julia -function my_structure2(nt::NamedTuple{<:Any,<:Tuple{Vararg{AbstractString}}}) - return "( $(nt.f) + $(nt.g1), $(nt.f) + $(nt.g2) )" -end -function my_structure2(nt::NamedTuple{<:Any,<:Tuple{Vararg{AbstractVector}}}) - return map(i -> (nt.f[i] + nt.g1[i], nt.f[i] + nt.g2[i]), eachindex(nt.f)) -end -``` - -As well as our variable mapping, which says +We first define our structure. +This also has our variable mapping, which says we are fitting `f(x1, x2)`, `g1(x3)`, and `g2(x3)`: ```julia -variable_mapping = (; f=[1, 2], g1=[3], g2=[3]) +structure = TemplateStructure(; + combine_strings=e -> "( " * e.f * " + " * e.g1 * ", " * e.f * " + " * e.g2 * " )", + combine_vectors=e -> map(i -> (e.f[i] + e.g1[i], e.f[i] + e.g2[i]), eachindex(e.f)), + variable_constraints = (; f=[1, 2], g1=[3], g2=[3]), +) ``` Now, our dataset is a regular 2D array of inputs for `X`. @@ -198,9 +205,10 @@ model = SRRegressor(; binary_operators=(+, *), unary_operators=(sin,), maxsize=15, + elementwise_loss=elementwise_loss, expression_type=TemplateExpression, # Note - this is where we pass custom options to the expression type: - expression_options=(; structure=my_structure2, variable_mapping), + expression_options=(; structure), ) mach = machine(model, X, y) @@ -216,6 +224,8 @@ report(mach) We can also check the expression is split up correctly: ```julia +r = report(mach) +idx = r.best_idx best_expr = r.equations[idx] best_f = get_contents(best_expr).f best_g1 = get_contents(best_expr).g1