From aa3994a518504ee4dcb00101f0047ee4b4c41a5b Mon Sep 17 00:00:00 2001
From: "Anthony D. Blaom" <anthony.blaom@gmail.com>
Date: Fri, 2 Aug 2024 11:01:49 +1200
Subject: [PATCH 1/2] update model registry

---
 src/registry/Metadata.toml | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/src/registry/Metadata.toml b/src/registry/Metadata.toml
index 3ed0464..75b9043 100644
--- a/src/registry/Metadata.toml
+++ b/src/registry/Metadata.toml
@@ -4590,7 +4590,7 @@
 ":supports_weights" = "`false`"
 ":supports_class_weights" = "`false`"
 ":supports_online" = "`false`"
-":docstring" = "```\nPipeline(component1, component2, ... , componentk; options...)\nPipeline(name1=component1, name2=component2, ..., namek=componentk; options...)\ncomponent1 |> component2 |> ... |> componentk\n```\n\nCreate an instance of a composite model type which sequentially composes the specified components in order. This means `component1` receives inputs, whose output is passed to `component2`, and so forth. A \"component\" is either a `Model` instance, a model type (converted immediately to its default instance) or any callable object. Here the \"output\" of a model is what `predict` returns if it is `Supervised`, or what `transform` returns if it is `Unsupervised`.\n\nNames for the component fields are automatically generated unless explicitly specified, as in\n\n```\nPipeline(encoder=ContinuousEncoder(drop_last=false),\n         stand=Standardizer())\n```\n\nThe `Pipeline` constructor accepts keyword `options` discussed further below.\n\nOrdinary functions (and other callables) may be inserted in the pipeline as shown in the following example:\n\n```\nPipeline(X->coerce(X, :age=>Continuous), OneHotEncoder, ConstantClassifier)\n```\n\n### Syntactic sugar\n\nThe `|>` operator is overloaded to construct pipelines out of models, callables, and existing pipelines:\n\n```julia\nLinearRegressor = @load LinearRegressor pkg=MLJLinearModels add=true\nPCA = @load PCA pkg=MultivariateStats add=true\n\npipe1 = MLJBase.table |> ContinuousEncoder |> Standardizer\npipe2 = PCA |> LinearRegressor\npipe1 |> pipe2\n```\n\nAt most one of the components may be a supervised model, but this model can appear in any position. A pipeline with a `Supervised` component is itself `Supervised` and implements the `predict` operation.  
It is otherwise `Unsupervised` (possibly `Static`) and implements `transform`.\n\n### Special operations\n\nIf all the `components` are invertible unsupervised models (ie, implement `inverse_transform`) then `inverse_transform` is implemented for the pipeline. If there are no supervised models, then `predict` is nevertheless implemented, assuming the last component is a model that implements it (some clustering models). Similarly, calling `transform` on a supervised pipeline calls `transform` on the supervised component.\n\n### Optional key-word arguments\n\n  * `prediction_type`  - prediction type of the pipeline; possible values: `:deterministic`, `:probabilistic`, `:interval` (default=`:deterministic` if not inferable)\n  * `operation` - operation applied to the supervised component model, when present; possible values: `predict`, `predict_mean`, `predict_median`, `predict_mode` (default=`predict`)\n  * `cache` - whether the internal machines created for component models should cache model-specific representations of data (see [`machine`](@ref)) (default=`true`)\n\n!!! warning\n    Set `cache=false` to guarantee data anonymization.\n\n\nTo build more complicated non-branching pipelines, refer to the MLJ manual sections on composing models.\n"
+":docstring" = "```\nPipeline(component1, component2, ... , componentk; options...)\nPipeline(name1=component1, name2=component2, ..., namek=componentk; options...)\ncomponent1 |> component2 |> ... |> componentk\n```\n\nCreate an instance of a composite model type which sequentially composes the specified components in order. This means `component1` receives inputs, whose output is passed to `component2`, and so forth. A \"component\" is either a `Model` instance, a model type (converted immediately to its default instance) or any callable object. Here the \"output\" of a model is what `predict` returns if it is `Supervised`, or what `transform` returns if it is `Unsupervised`.\n\nNames for the component fields are automatically generated unless explicitly specified, as in\n\n```julia\nPipeline(encoder=ContinuousEncoder(drop_last=false),\n         stand=Standardizer())\n```\n\nThe `Pipeline` constructor accepts keyword `options` discussed further below.\n\nOrdinary functions (and other callables) may be inserted in the pipeline as shown in the following example:\n\n```\nPipeline(X->coerce(X, :age=>Continuous), OneHotEncoder, ConstantClassifier)\n```\n\n### Syntactic sugar\n\nThe `|>` operator is overloaded to construct pipelines out of models, callables, and existing pipelines:\n\n```julia\nLinearRegressor = @load LinearRegressor pkg=MLJLinearModels add=true\nPCA = @load PCA pkg=MultivariateStats add=true\n\npipe1 = MLJBase.table |> ContinuousEncoder |> Standardizer\npipe2 = PCA |> LinearRegressor\npipe1 |> pipe2\n```\n\nAt most one of the components may be a supervised model, but this model can appear in any position. A pipeline with a `Supervised` component is itself `Supervised` and implements the `predict` operation.  
It is otherwise `Unsupervised` (possibly `Static`) and implements `transform`.\n\n### Special operations\n\nIf all the `components` are invertible unsupervised models (ie, implement `inverse_transform`) then `inverse_transform` is implemented for the pipeline. If there are no supervised models, then `predict` is nevertheless implemented, provided the last component is a model that implements it (as some clustering models do). Similarly, calling `transform` on a supervised pipeline calls `transform` on the supervised component.\n\n### Transformers that need a target in training\n\nSome transformers that have type `Unsupervised` (so that the output of `transform` is propagated in pipelines) may require a target variable for training. Examples include so-called target encoders, which transform categorical input features based on target observations. Such models are supported, provided they appear before any `Supervised` component in the pipeline. Of course, a target must be provided whenever training such a pipeline, whether or not it contains a `Supervised` component.\n\n### Optional key-word arguments\n\n  * `prediction_type` - prediction type of the pipeline; possible values: `:deterministic`, `:probabilistic`, `:interval` (default=`:deterministic` if not inferable)\n  * `operation` - operation applied to the supervised component model, when present; possible values: `predict`, `predict_mean`, `predict_median`, `predict_mode` (default=`predict`)\n  * `cache` - whether the internal machines created for component models should cache model-specific representations of data (see [`machine`](@ref)) (default=`true`)\n\n!!! warning\n    Set `cache=false` to guarantee data anonymization.\n\n\nTo build more complicated non-branching pipelines, refer to the MLJ manual sections on composing models.\n"
 ":name" = "Pipeline"
 ":human_name" = "static pipeline"
 ":is_supervised" = "`false`"
@@ -5850,7 +5850,7 @@
 ":supports_weights" = "`false`"
 ":supports_class_weights" = "`false`"
 ":supports_online" = "`false`"
-":docstring" = "```\ntuned_model = TunedModel(; model=<model to be mutated>,\n                         tuning=RandomSearch(),\n                         resampling=Holdout(),\n                         range=nothing,\n                         measure=nothing,\n                         n=default_n(tuning, range),\n                         operation=nothing,\n                         other_options...)\n```\n\nConstruct a model wrapper for hyper-parameter optimization of a supervised learner, specifying the `tuning` strategy and `model` whose hyper-parameters are to be mutated.\n\n```\ntuned_model = TunedModel(; models=<models to be compared>,\n                         resampling=Holdout(),\n                         measure=nothing,\n                         n=length(models),\n                         operation=nothing,\n                         other_options...)\n```\n\nConstruct a wrapper for multiple `models`, for selection of an optimal one (equivalent to specifying `tuning=Explicit()` and `range=models` above). Elements of the iterator `models` need not have a common type, but they must all be `Deterministic` or all be `Probabilistic` *and this is not checked* but inferred from the first element generated.\n\nSee below for a complete list of options.\n\n### Training\n\nCalling `fit!(mach)` on a machine `mach=machine(tuned_model, X, y)` or `mach=machine(tuned_model, X, y, w)` will:\n\n  * Instigate a search, over clones of `model`, with the hyperparameter mutations specified by `range`, for a model optimizing the specified `measure`, using performance evaluations carried out using the specified `tuning` strategy and `resampling` strategy. In the case `models` is explictly listed, the search is instead over the models generated by the iterator `models`.\n  * Fit an internal machine, based on the optimal model `fitted_params(mach).best_model`, wrapping the optimal `model` object in *all* the provided data `X`, `y`(, `w`). 
Calling `predict(mach, Xnew)` then returns predictions on `Xnew` of this internal machine. The final train can be supressed by setting `train_best=false`.\n\n### Search space\n\nThe `range` objects supported depend on the `tuning` strategy specified. Query the `strategy` docstring for details. To optimize over an explicit list `v` of models of the same type, use `strategy=Explicit()` and specify `model=v[1]` and `range=v`.\n\nThe number of models searched is specified by `n`. If unspecified, then `MLJTuning.default_n(tuning, range)` is used. When `n` is increased and `fit!(mach)` called again, the old search history is re-instated and the search continues where it left off.\n\n### Measures (metrics)\n\nIf more than one `measure` is specified, then only the first is optimized (unless `strategy` is multi-objective) but the performance against every measure specified will be computed and reported in `report(mach).best_performance` and other relevant attributes of the generated report. Options exist to pass per-observation weights or class weights to measures; see below.\n\n*Important.* If a custom measure, `my_measure` is used, and the measure is a score, rather than a loss, be sure to check that `MLJ.orientation(my_measure) == :score` to ensure maximization of the measure, rather than minimization. 
Override an incorrect value with `MLJ.orientation(::typeof(my_measure)) = :score`.\n\n### Accessing the fitted parameters and other training (tuning) outcomes\n\nA Plots.jl plot of performance estimates is returned by `plot(mach)` or `heatmap(mach)`.\n\nOnce a tuning machine `mach` has bee trained as above, then `fitted_params(mach)` has these keys/values:\n\n|                  key |                                   value |\n| --------------------:| ---------------------------------------:|\n|         `best_model` |                  optimal model instance |\n| `best_fitted_params` | learned parameters of the optimal model |\n\nThe named tuple `report(mach)` includes these keys/values:\n\n|                  key |                                                              value |\n| --------------------:| ------------------------------------------------------------------:|\n|         `best_model` |                                             optimal model instance |\n| `best_history_entry` | corresponding entry in the history, including performance estimate |\n|        `best_report` |          report generated by fitting the optimal model to all data |\n|            `history` |                tuning strategy-specific history of all evaluations |\n\nplus other key/value pairs specific to the `tuning` strategy.\n\nEach element of `history` is a property-accessible object with these properties:\n\n|           key |                                                             value |\n| -------------:| -----------------------------------------------------------------:|\n|     `measure` |                                      vector of measures (metrics) |\n| `measurement` |                           vector of measurements, one per measure |\n|    `per_fold` |           vector of vectors of unaggregated per-fold measurements |\n|  `evaluation` | full `PerformanceEvaluation`/`CompactPerformaceEvaluation` object |\n\n### Complete list of key-word options\n\n  * `model`: 
`Supervised` model prototype that is cloned and mutated to generate models for evaluation\n  * `models`: Alternatively, an iterator of MLJ models to be explicitly evaluated. These may have varying types.\n  * `tuning=RandomSearch()`: tuning strategy to be applied (eg, `Grid()`). See the [Tuning Models](https://alan-turing-institute.github.io/MLJ.jl/dev/tuning_models/#Tuning-Models) section of the MLJ manual for a complete list of options.\n  * `resampling=Holdout()`: resampling strategy (eg, `Holdout()`, `CV()`), `StratifiedCV()`) to be applied in performance evaluations\n  * `measure`: measure or measures to be applied in performance evaluations; only the first used in optimization (unless the strategy is multi-objective) but all reported to the history\n  * `weights`: per-observation weights to be passed the measure(s) in performance evaluations, where supported. Check support with `supports_weights(measure)`.\n  * `class_weights`: class weights to be passed the measure(s) in performance evaluations, where supported. Check support with `supports_class_weights(measure)`.\n  * `repeats=1`: for generating train/test sets multiple times in resampling (\"Monte Carlo\" resampling); see [`evaluate!`](@ref) for details\n  * `operation`/`operations` - One of `predict`, `predict_mean`, `predict_mode`, `predict_median`, or `predict_joint`, or a vector of these of the same length as `measure`/`measures`. Automatically inferred if left unspecified.\n  * `range`: range object; tuning strategy documentation describes supported types\n  * `selection_heuristic`: the rule determining how the best model is decided. According to the default heuristic, `NaiveSelection()`, `measure` (or the first element of `measure`) is evaluated for each resample and these per-fold measurements are aggregrated. The model with the lowest (resp. highest) aggregate is chosen if the measure is a `:loss` (resp. 
a `:score`).\n  * `n`: number of iterations (ie, models to be evaluated); set by tuning strategy if left unspecified\n  * `train_best=true`: whether to train the optimal model\n  * `acceleration=default_resource()`: mode of parallelization for tuning strategies that support this\n  * `acceleration_resampling=CPU1()`: mode of parallelization for resampling\n  * `check_measure=true`: whether to check `measure` is compatible with the specified `model` and `operation`)\n  * `cache=true`: whether to cache model-specific representations of user-suplied data; set to `false` to conserve memory. Speed gains likely limited to the case `resampling isa Holdout`.\n  * `compact_history=true`: whether to write `CompactPerformanceEvaluation`](@ref) or regular [`PerformanceEvaluation`](@ref) objects to the history (accessed via the `:evaluation` key); the compact form excludes some fields to conserve memory.\n"
+":docstring" = "```\ntuned_model = TunedModel(; model=<model to be mutated>,\n                         tuning=RandomSearch(),\n                         resampling=Holdout(),\n                         range=nothing,\n                         measure=nothing,\n                         n=default_n(tuning, range),\n                         operation=nothing,\n                         other_options...)\n```\n\nConstruct a model wrapper for hyper-parameter optimization of a supervised learner, specifying the `tuning` strategy and `model` whose hyper-parameters are to be mutated.\n\n```\ntuned_model = TunedModel(; models=<models to be compared>,\n                         resampling=Holdout(),\n                         measure=nothing,\n                         n=length(models),\n                         operation=nothing,\n                         other_options...)\n```\n\nConstruct a wrapper for multiple `models`, for selection of an optimal one (equivalent to specifying `tuning=Explicit()` and `range=models` above). Elements of the iterator `models` need not have a common type, but they must all be `Deterministic` or all be `Probabilistic` *and this is not checked* but inferred from the first element generated.\n\nSee below for a complete list of options.\n\n### Training\n\nCalling `fit!(mach)` on a machine `mach=machine(tuned_model, X, y)` or `mach=machine(tuned_model, X, y, w)` will:\n\n  * Instigate a search, over clones of `model`, with the hyperparameter mutations specified by `range`, for a model optimizing the specified `measure`, using performance evaluations carried out using the specified `tuning` strategy and `resampling` strategy. In the case `models` is explicitly listed, the search is instead over the models generated by the iterator `models`.\n  * Fit an internal machine, based on the optimal model `fitted_params(mach).best_model`, wrapping the optimal `model` object in *all* the provided data `X`, `y`(, `w`). 
Calling `predict(mach, Xnew)` then returns this internal machine's predictions on `Xnew`. The final training can be suppressed by setting `train_best=false`.\n\n### Search space\n\nThe `range` objects supported depend on the `tuning` strategy specified. Query the docstring of the `tuning` strategy for details. To optimize over an explicit list `v` of models of the same type, use `tuning=Explicit()` and specify `model=v[1]` and `range=v`.\n\nThe number of models searched is specified by `n`. If unspecified, then `MLJTuning.default_n(tuning, range)` is used. When `n` is increased and `fit!(mach)` is called again, the old search history is re-instated and the search continues where it left off.\n\n### Measures (metrics)\n\nIf more than one `measure` is specified, then only the first is optimized (unless the `tuning` strategy is multi-objective) but the performance against every measure specified will be computed and reported in `report(mach).best_performance` and other relevant attributes of the generated report. Options exist to pass per-observation weights or class weights to measures; see below.\n\n*Important.* If a custom measure `my_measure` is used, and the measure is a score rather than a loss, be sure to check that `MLJ.orientation(my_measure) == :score` to ensure maximization of the measure, rather than minimization. 
Override an incorrect value with `MLJ.orientation(::typeof(my_measure)) = :score`.\n\n### Accessing the fitted parameters and other training (tuning) outcomes\n\nA Plots.jl plot of performance estimates is returned by `plot(mach)` or `heatmap(mach)`.\n\nOnce a tuning machine `mach` has been trained as above, `fitted_params(mach)` has these keys/values:\n\n|                  key |                                   value |\n| --------------------:| ---------------------------------------:|\n|         `best_model` |                  optimal model instance |\n| `best_fitted_params` | learned parameters of the optimal model |\n\nThe named tuple `report(mach)` includes these keys/values:\n\n|                  key |                                                              value |\n| --------------------:| ------------------------------------------------------------------:|\n|         `best_model` |                                             optimal model instance |\n| `best_history_entry` | corresponding entry in the history, including performance estimate |\n|        `best_report` |          report generated by fitting the optimal model to all data |\n|            `history` |                tuning strategy-specific history of all evaluations |\n\nplus other key/value pairs specific to the `tuning` strategy.\n\nEach element of `history` is a property-accessible object with these properties:\n\n|           key |                                                              value |\n| -------------:| ------------------------------------------------------------------:|\n|     `measure` |                                       vector of measures (metrics) |\n| `measurement` |                            vector of measurements, one per measure |\n|    `per_fold` |            vector of vectors of unaggregated per-fold measurements |\n|  `evaluation` | full `PerformanceEvaluation`/`CompactPerformanceEvaluation` object |\n\n### Complete list of key-word options\n\n  * `model`: 
`Supervised` model prototype that is cloned and mutated to generate models for evaluation\n  * `models`: Alternatively, an iterator of MLJ models to be explicitly evaluated. These may have varying types.\n  * `tuning=RandomSearch()`: tuning strategy to be applied (eg, `Grid()`). See the [Tuning Models](https://alan-turing-institute.github.io/MLJ.jl/dev/tuning_models/#Tuning-Models) section of the MLJ manual for a complete list of options.\n  * `resampling=Holdout()`: resampling strategy (eg, `Holdout()`, `CV()`, `StratifiedCV()`) to be applied in performance evaluations\n  * `measure`: measure or measures to be applied in performance evaluations; only the first is used in optimization (unless the strategy is multi-objective) but all are reported to the history\n  * `weights`: per-observation weights to be passed to the measure(s) in performance evaluations, where supported. Check support with `supports_weights(measure)`.\n  * `class_weights`: class weights to be passed to the measure(s) in performance evaluations, where supported. Check support with `supports_class_weights(measure)`.\n  * `repeats=1`: for generating train/test sets multiple times in resampling (\"Monte Carlo\" resampling); see [`evaluate!`](@ref) for details\n  * `operation`/`operations`: one of `predict`, `predict_mean`, `predict_mode`, `predict_median`, or `predict_joint`, or a vector of these of the same length as `measure`/`measures`. Automatically inferred if left unspecified.\n  * `range`: range object; tuning strategy documentation describes supported types\n  * `selection_heuristic`: the rule determining how the best model is decided. According to the default heuristic, `NaiveSelection()`, `measure` (or the first element of `measure`) is evaluated for each resample and these per-fold measurements are aggregated. The model with the lowest (resp. highest) aggregate is chosen if the measure is a `:loss` (resp. 
a `:score`).\n  * `n`: number of iterations (ie, models to be evaluated); set by tuning strategy if left unspecified\n  * `train_best=true`: whether to train the optimal model\n  * `acceleration=default_resource()`: mode of parallelization for tuning strategies that support this\n  * `acceleration_resampling=CPU1()`: mode of parallelization for resampling\n  * `check_measure=true`: whether to check that `measure` is compatible with the specified `model` and `operation`\n  * `cache=true`: whether to cache model-specific representations of user-supplied data; set to `false` to conserve memory. Speed gains are likely limited to the case `resampling isa Holdout`.\n  * `compact_history=true`: whether to write [`CompactPerformanceEvaluation`](@ref) or regular [`PerformanceEvaluation`](@ref) objects to the history (accessed via the `:evaluation` key); the compact form excludes some fields to conserve memory.\n  * `logger=default_logger()`: a logger for externally reporting model performance evaluations, such as an `MLJFlow.Logger` instance. On startup, `default_logger()=nothing`; use `default_logger(logger)` to set a global logger.\n"
 ":name" = "TunedModel"
 ":human_name" = "probabilistic tuned model"
 ":is_supervised" = "`true`"
@@ -5918,11 +5918,11 @@
 ":load_path" = "FeatureSelection.RecursiveFeatureElimination"
 ":package_uuid" = "33837fe5-dbff-4c9e-8c2f-c5612fe2b8b6"
 ":package_url" = "https://github.com/JuliaAI/FeatureSelection.jl"
-":is_wrapper" = "`false`"
+":is_wrapper" = "`true`"
 ":supports_weights" = "`false`"
 ":supports_class_weights" = "`false`"
 ":supports_online" = "`false`"
-":docstring" = "```\nRecursiveFeatureElimination(model, n_features, step)\n```\n\nThis model implements a recursive feature elimination algorithm for feature selection. It recursively removes features, training a base model on the remaining features and evaluating their importance until the desired number of features is selected.\n\nConstruct an instance with default hyper-parameters using the syntax `rfe_model = RecursiveFeatureElimination(model=...)`. Provide keyword arguments to override hyper-parameter defaults.\n\n# Training data\n\nIn MLJ or MLJBase, bind an instance `rfe_model` to data with\n\n```\nmach = machine(rfe_model, X, y)\n```\n\nOR, if the base model supports weights, as\n\n```\nmach = machine(rfe_model, X, y, w)\n```\n\nHere:\n\n  * `X` is any table of input features (eg, a `DataFrame`) whose columns are of the scitype as that required by the base model; check column scitypes with `schema(X)` and column scitypes required by base model with `input_scitype(basemodel)`.\n  * `y` is the target, which can be any table of responses whose element scitype is   `Continuous` or `Finite` depending on the `target_scitype` required by the base model;   check the scitype with `scitype(y)`.\n  * `w` is the observation weights which can either be `nothing`(default) or an `AbstractVector` whoose element scitype is `Count` or `Continuous`. This is different from `weights` kernel which is an hyperparameter to the model, see below.\n\nTrain the machine using `fit!(mach, rows=...)`.\n\n# Hyper-parameters\n\n  * model: A base model with a `fit` method that provides information on feature feature importance (i.e `reports_feature_importances(model) == true`)\n  * n_features::Real = 0: The number of features to select. If `0`, half of the features are selected. If a positive integer, the parameter is the absolute number of features to select. 
If a real number between 0 and 1, it is the fraction of features to select.\n  * step::Real=1: If the value of step is at least 1, it signifies the quantity of features to eliminate in each iteration. Conversely, if step falls strictly within the range of 0.0 to 1.0, it denotes the proportion (rounded down) of features to remove during each iteration.\n\n# Operations\n\n  * `transform(mach, X)`: transform the input table `X` into a new table containing only\n\ncolumns corresponding to features gotten from the RFE algorithm.\n\n  * `predict(mach, X)`: transform the input table `X` into a new table same as in\n  * `transform(mach, X)` above and predict using the fitted base model on the transformed table.\n\n# Fitted parameters\n\nThe fields of `fitted_params(mach)` are:\n\n  * `features_left`: names of features remaining after recursive feature elimination.\n  * `model_fitresult`: fitted parameters of the base model.\n\n# Report\n\nThe fields of `report(mach)` are:\n\n  * `scores`: dictionary of scores for each feature in the training dataset. The model deems highly scored variables more significant.\n  * `model_report`: report for the fitted base model.\n\n# Examples\n\n```\nusing FeatureSelection, MLJ, StableRNGs\n\nRandomForestRegressor = @load RandomForestRegressor pkg=DecisionTree\n\n# Creates a dataset where the target only depends on the first 5 columns of the input table.\nA = rand(rng, 50, 10);\ny = 10 .* sin.(\n        pi .* A[:, 1] .* A[:, 2]\n    ) + 20 .* (A[:, 3] .- 0.5).^ 2 .+ 10 .* A[:, 4] .+ 5 * A[:, 5]);\nX = MLJ.table(A);\n\n# fit a rfe model\nrf = RandomForestRegressor()\nselector = RecursiveFeatureElimination(model = rf)\nmach = machine(selector, X, y)\nfit!(mach)\n\n# view the feature importances\nfeature_importances(mach)\n\n# predict using the base model\nXnew = MLJ.table(rand(rng, 50, 10));\npredict(mach, Xnew)\n\n```\n"
+":docstring" = "```\nRecursiveFeatureElimination(model; n_features=0, step=1)\n```\n\nThis model implements a recursive feature elimination algorithm for feature selection. It recursively removes features, training a base model on the remaining features and evaluating their importance until the desired number of features is selected.\n\n# Training data\n\nIn MLJ or MLJBase, bind an instance `rfe_model` to data with\n\n```\nmach = machine(rfe_model, X, y)\n```\n\nOR, if the base model supports weights, as\n\n```\nmach = machine(rfe_model, X, y, w)\n```\n\nHere:\n\n  * `X` is any table of input features (eg, a `DataFrame`) whose columns are of the scitype required by the base model; check column scitypes with `schema(X)` and the column scitypes required by the base model with `input_scitype(basemodel)`.\n  * `y` is the target, which can be any table of responses whose element scitype is `Continuous` or `Finite` depending on the `target_scitype` required by the base model; check the scitype with `scitype(y)`.\n  * `w` is the observation weights, which can either be `nothing` (default) or an `AbstractVector` whose element scitype is `Count` or `Continuous`.\n\nTrain the machine using `fit!(mach, rows=...)`.\n\n# Hyper-parameters\n\n  * model: A base model with a `fit` method that provides information on feature importance (i.e., `reports_feature_importances(model) == true`)\n  * n_features::Real = 0: The number of features to select. If `0`, half of the features are selected. If a positive integer, the parameter is the absolute number of features to select. If a real number between 0 and 1, it is the fraction of features to select.\n  * step::Real=1: If the value of step is at least 1, it signifies the quantity of features to eliminate in each iteration. 
Conversely, if step falls strictly within the range of 0.0 to 1.0, it denotes the proportion (rounded down) of features to remove during each iteration.\n\n# Operations\n\n  * `transform(mach, X)`: transform the input table `X` into a new table containing only columns corresponding to features accepted by the RFE algorithm.\n  * `predict(mach, X)`: transform the input table `X` into a new table, the same as in `transform(mach, X)` above, and predict using the fitted base model on the transformed table.\n\n# Fitted parameters\n\nThe fields of `fitted_params(mach)` are:\n\n  * `features_left`: names of features remaining after recursive feature elimination.\n  * `model_fitresult`: fitted parameters of the base model.\n\n# Report\n\nThe fields of `report(mach)` are:\n\n  * `scores`: dictionary of scores for each feature in the training dataset. The model deems highly scored variables more significant.\n  * `model_report`: report for the fitted base model.\n\n# Examples\n\nThe following example assumes you have MLJDecisionTreeInterface in the active package environment.\n\n```\nusing MLJ\n\nRandomForestRegressor = @load RandomForestRegressor pkg=DecisionTree\n\n# Create a dataset where the target only depends on the first 5 columns of the input table.\nA = rand(50, 10);\ny = 10 .* sin.(\n        pi .* A[:, 1] .* A[:, 2]\n    ) + 20 .* (A[:, 3] .- 0.5).^ 2 .+ 10 .* A[:, 4] .+ 5 * A[:, 5];\nX = MLJ.table(A);\n\n# fit an RFE model:\nrf = RandomForestRegressor()\nselector = RecursiveFeatureElimination(rf, n_features=2)\nmach = machine(selector, X, y)\nfit!(mach)\n\n# view the feature importances\nfeature_importances(mach)\n\n# predict using the base model trained on the reduced feature set:\nXnew = MLJ.table(rand(50, 10));\npredict(mach, Xnew)\n\n# transform data with all features to the reduced feature set:\ntransform(mach, Xnew)\n```\n"
 ":name" = "RecursiveFeatureElimination"
 ":human_name" = "deterministic recursive feature elimination"
 ":is_supervised" = "`true`"
@@ -6462,16 +6462,16 @@
 ":supports_weights" = "`true`"
 ":supports_class_weights" = "`false`"
 ":supports_online" = "`false`"
-":docstring" = "```\nMultitargetSRRegressor\n```\n\nA model type for constructing a Multi-Target Symbolic Regression via Evolutionary Search, based on [SymbolicRegression.jl](https://github.com/MilesCranmer/SymbolicRegression.jl), and implementing the MLJ model interface.\n\nFrom MLJ, the type can be imported using\n\n```\nMultitargetSRRegressor = @load MultitargetSRRegressor pkg=SymbolicRegression\n```\n\nDo `model = MultitargetSRRegressor()` to construct an instance with default hyper-parameters. Provide keyword arguments to override hyper-parameter defaults, as in `MultitargetSRRegressor(binary_operators=...)`.\n\nMulti-target Symbolic Regression regressor (`MultitargetSRRegressor`) conducts several searches for expressions that predict each target variable from a set of input variables. All data is assumed to be `Continuous`. The search is performed using an evolutionary algorithm. This algorithm is described in the paper https://arxiv.org/abs/2305.01582.\n\n# Training data\n\nIn MLJ or MLJBase, bind an instance `model` to data with\n\n```\nmach = machine(model, X, y)\n```\n\nOR\n\n```\nmach = machine(model, X, y, w)\n```\n\nHere:\n\n  * `X` is any table of input features (eg, a `DataFrame`) whose columns are of scitype\n\n`Continuous`; check column scitypes with `schema(X)`. Variable names in discovered expressions will be taken from the column names of `X`, if available. Units in columns of `X` (use `DynamicQuantities` for units) will trigger dimensional analysis to be used.\n\n  * `y` is the target, which can be any table of target variables whose element scitype is `Continuous`; check the scitype with `schema(y)`. Units in columns of `y` (use `DynamicQuantities` for units) will trigger dimensional analysis to be used.\n  * `w` is the observation weights which can either be `nothing` (default) or an `AbstractVector` whoose element scitype is `Count` or `Continuous`. 
The same weights are used for all targets.\n\nTrain the machine using `fit!(mach)`, inspect the discovered expressions with `report(mach)`, and predict on new data with `predict(mach, Xnew)`. Note that unlike other regressors, symbolic regression stores a list of lists of trained models. The models chosen from each of these lists is defined by the function `selection_method` keyword argument, which by default balances accuracy and complexity. You can override this at prediction time by passing a named tuple with keys `data` and `idx`.\n\n# Hyper-parameters\n\n  * `binary_operators`: Vector of binary operators (functions) to use.   Each operator should be defined for two input scalars,   and one output scalar. All operators   need to be defined over the entire real line (excluding infinity - these   are stopped before they are input), or return `NaN` where not defined.   For speed, define it so it takes two reals   of the same type as input, and outputs the same type. For the SymbolicUtils   simplification backend, you will need to define a generic method of the   operator so it takes arbitrary types.\n  * `unary_operators`: Same, but for   unary operators (one input scalar, gives an output scalar).\n  * `constraints`: Array of pairs specifying size constraints   for each operator. The constraints for a binary operator should be a 2-tuple   (e.g., `(-1, -1)`) and the constraints for a unary operator should be an `Int`.   A size constraint is a limit to the size of the subtree   in each argument of an operator. e.g., `[(^)=>(-1, 3)]` means that the   `^` operator can have arbitrary size (`-1`) in its left argument,   but a maximum size of `3` in its right argument. Default is   no constraints.\n  * `batching`: Whether to evolve based on small mini-batches of data,   rather than the entire dataset.\n  * `batch_size`: What batch size to use if using batching.\n  * `elementwise_loss`: What elementwise loss function to use. 
Can be one of   the following losses, or any other loss of type   `SupervisedLoss`. You can also pass a function that takes   a scalar target (left argument), and scalar predicted (right   argument), and returns a scalar. This will be averaged   over the predicted data. If weights are supplied, your   function should take a third argument for the weight scalar.   Included losses:       Regression:           - `LPDistLoss{P}()`,           - `L1DistLoss()`,           - `L2DistLoss()` (mean square),           - `LogitDistLoss()`,           - `HuberLoss(d)`,           - `L1EpsilonInsLoss(ϵ)`,           - `L2EpsilonInsLoss(ϵ)`,           - `PeriodicLoss(c)`,           - `QuantileLoss(τ)`,       Classification:           - `ZeroOneLoss()`,           - `PerceptronLoss()`,           - `L1HingeLoss()`,           - `SmoothedL1HingeLoss(γ)`,           - `ModifiedHuberLoss()`,           - `L2MarginLoss()`,           - `ExpLoss()`,           - `SigmoidLoss()`,           - `DWDMarginLoss(q)`.\n  * `loss_function`: Alternatively, you may redefine the loss used   as any function of `tree::AbstractExpressionNode{T}`, `dataset::Dataset{T}`,   and `options::Options`, so long as you output a non-negative   scalar of type `T`. This is useful if you want to use a loss   that takes into account derivatives, or correlations across   the dataset. This also means you could use a custom evaluation   for a particular expression. If you are using   `batching=true`, then your function should   accept a fourth argument `idx`, which is either `nothing`   (indicating that the full dataset should be used), or a vector   of indices to use for the batch.   
For example,\n\n    ```\n      function my_loss(tree, dataset::Dataset{T,L}, options)::L where {T,L}\n          prediction, flag = eval_tree_array(tree, dataset.X, options)\n          if !flag\n              return L(Inf)\n          end\n          return sum((prediction .- dataset.y) .^ 2) / dataset.n\n      end\n    ```\n  * `node_type::Type{N}=Node`: The type of node to use for the search.   For example, `Node` or `GraphNode`.\n  * `populations`: How many populations of equations to use.\n  * `population_size`: How many equations in each population.\n  * `ncycles_per_iteration`: How many generations to consider per iteration.\n  * `tournament_selection_n`: Number of expressions considered in each tournament.\n  * `tournament_selection_p`: The fittest expression in a tournament is to be   selected with probability `p`, the next fittest with probability `p*(1-p)`,   and so forth.\n  * `topn`: Number of equations to return to the host process, and to   consider for the hall of fame.\n  * `complexity_of_operators`: What complexity should be assigned to each operator,   and the occurrence of a constant or variable. By default, this is 1   for all operators. Can be a real number as well, in which case   the complexity of an expression will be rounded to the nearest integer.   Input this in the form of, e.g., [(^) => 3, sin => 2].\n  * `complexity_of_constants`: What complexity should be assigned to use of a constant.   By default, this is 1.\n  * `complexity_of_variables`: What complexity should be assigned to use of a variable,   which can also be a vector indicating different per-variable complexity.   By default, this is 1.\n  * `alpha`: The probability of accepting an equation mutation   during regularized evolution is given by exp(-delta_loss/(alpha * T)),   where T goes from 1 to 0. 
Thus, alpha=infinite is the same as no annealing.\n  * `maxsize`: Maximum size of equations during the search.\n  * `maxdepth`: Maximum depth of equations during the search, by default   this is set equal to the maxsize.\n  * `parsimony`: A multiplicative factor for how much complexity is   punished.\n  * `dimensional_constraint_penalty`: An additive factor if the dimensional   constraint is violated.\n  * `dimensionless_constants_only`: Whether to only allow dimensionless   constants.\n  * `use_frequency`: Whether to use a parsimony that adapts to the   relative proportion of equations at each complexity; this will   ensure that there are a balanced number of equations considered   for every complexity.\n  * `use_frequency_in_tournament`: Whether to use the adaptive parsimony described   above inside the score, rather than just at the mutation accept/reject stage.\n  * `adaptive_parsimony_scaling`: How much to scale the adaptive parsimony term   in the loss. Increase this if the search is spending too much time   optimizing the most complex equations.\n  * `turbo`: Whether to use `LoopVectorization.@turbo` to evaluate expressions.   This can be significantly faster, but is only compatible with certain   operators. *Experimental!*\n  * `bumper`: Whether to use Bumper.jl for faster evaluation. *Experimental!*\n  * `migration`: Whether to migrate equations between processes.\n  * `hof_migration`: Whether to migrate equations from the hall of fame   to processes.\n  * `fraction_replaced`: What fraction of each population to replace with   migrated equations at the end of each cycle.\n  * `fraction_replaced_hof`: What fraction to replace with hall of fame   equations at the end of each cycle.\n  * `should_simplify`: Whether to simplify equations. 
If you   pass a custom objective, this will be set to `false`.\n  * `should_optimize_constants`: Whether to use an optimization algorithm   to periodically optimize constants in equations.\n  * `optimizer_algorithm`: Select algorithm to use for optimizing constants. Default   is `Optim.BFGS(linesearch=LineSearches.BackTracking())`.\n  * `optimizer_nrestarts`: How many different random starting positions to consider   for optimization of constants.\n  * `optimizer_probability`: Probability of performing optimization of constants at   the end of a given iteration.\n  * `optimizer_iterations`: How many optimization iterations to perform. This gets  passed to `Optim.Options` as `iterations`. The default is 8.\n  * `optimizer_f_calls_limit`: How many function calls to allow during optimization.   This gets passed to `Optim.Options` as `f_calls_limit`. The default is   `0` which means no limit.\n  * `optimizer_options`: General options for the constant optimization. For details   we refer to the documentation on `Optim.Options` from the `Optim.jl` package.   Options can be provided here as `NamedTuple`, e.g. `(iterations=16,)`, as a   `Dict`, e.g. Dict(:x_tol => 1.0e-32,), or as an `Optim.Options` instance.\n  * `output_file`: What file to store equations to, as a backup.\n  * `perturbation_factor`: When mutating a constant, either   multiply or divide by (1+perturbation_factor)^(rand()+1).\n  * `probability_negate_constant`: Probability of negating a constant in the equation   when mutating it.\n  * `mutation_weights`: Relative probabilities of the mutations. The struct   `MutationWeights` should be passed to these options.   See its documentation on `MutationWeights` for the different weights.\n  * `crossover_probability`: Probability of performing crossover.\n  * `annealing`: Whether to use simulated annealing.\n  * `warmup_maxsize_by`: Whether to slowly increase the max size from 5 up to   `maxsize`. 
If nonzero, specifies the fraction through the search   at which the maxsize should be reached.\n  * `verbosity`: Whether to print debugging statements or   not.\n  * `print_precision`: How many digits to print when printing   equations. By default, this is 5.\n  * `save_to_file`: Whether to save equations to a file during the search.\n  * `bin_constraints`: See `constraints`. This is the same, but specified for binary   operators only (for example, if you have an operator that is both a binary   and unary operator).\n  * `una_constraints`: Likewise, for unary operators.\n  * `seed`: What random seed to use. `nothing` uses no seed.\n  * `progress`: Whether to use a progress bar output (`verbosity` will   have no effect).\n  * `early_stop_condition`: Float - whether to stop early if the mean loss gets below this value.   Function - a function taking (loss, complexity) as arguments and returning true or false.\n  * `timeout_in_seconds`: Float64 - the time in seconds after which to exit (as an alternative to the number of iterations).\n  * `max_evals`: Int (or Nothing) - the maximum number of evaluations of expressions to perform.\n  * `skip_mutation_failures`: Whether to simply skip over mutations that fail or are rejected, rather than to replace the mutated   expression with the original expression and proceed normally.\n  * `nested_constraints`: Specifies how many times a combination of operators can be nested. For example,   `[sin => [cos => 0], cos => [cos => 2]]` specifies that `cos` may never appear within a `sin`,   but `sin` can be nested with itself an unlimited number of times. The second term specifies that `cos`   can be nested up to 2 times within a `cos`, so that `cos(cos(cos(x)))` is allowed (as well as any combination   of `+` or `-` within it), but `cos(cos(cos(cos(x))))` is not allowed. When an operator is not specified,   it is assumed that it can be nested an unlimited number of times. 
This requires that there is no operator   which is used both in the unary operators and the binary operators (e.g., `-` could be both subtract, and negation).   For binary operators, both arguments are treated the same way, and the max of each argument is constrained.\n  * `deterministic`: Use a global counter for the birth time, rather than calls to `time()`. This gives   perfect resolution, and is therefore deterministic. However, it is not thread safe, and must be used   in serial mode.\n  * `define_helper_functions`: Whether to define helper functions   for constructing and evaluating trees.\n  * `niterations::Int=10`: The number of iterations to perform the search.   More iterations will improve the results.\n  * `parallelism=:multithreading`: What parallelism mode to use.   The options are `:multithreading`, `:multiprocessing`, and `:serial`.   By default, multithreading will be used. Multithreading uses less memory,   but multiprocessing can handle multi-node compute. If using `:multithreading`   mode, the number of threads available to julia are used. If using   `:multiprocessing`, `numprocs` processes will be created dynamically if   `procs` is unset. If you have already allocated processes, pass them   to the `procs` argument and they will be used.   You may also pass a string instead of a symbol, like `\"multithreading\"`.\n  * `numprocs::Union{Int, Nothing}=nothing`:  The number of processes to use,   if you want `equation_search` to set this up automatically. 
By default   this will be `4`, but can be any number (you should pick a number <=   the number of cores available).\n  * `procs::Union{Vector{Int}, Nothing}=nothing`: If you have set up   a distributed run manually with `procs = addprocs()` and `@everywhere`,   pass the `procs` to this keyword argument.\n  * `addprocs_function::Union{Function, Nothing}=nothing`: If using multiprocessing   (`parallelism=:multithreading`), and are not passing `procs` manually,   then they will be allocated dynamically using `addprocs`. However,   you may also pass a custom function to use instead of `addprocs`.   This function should take a single positional argument,   which is the number of processes to use, as well as the `lazy` keyword argument.   For example, if set up on a slurm cluster, you could pass   `addprocs_function = addprocs_slurm`, which will set up slurm processes.\n  * `heap_size_hint_in_bytes::Union{Int,Nothing}=nothing`: On Julia 1.9+, you may set the `--heap-size-hint`   flag on Julia processes, recommending garbage collection once a process   is close to the recommended size. This is important for long-running distributed   jobs where each process has an independent memory, and can help avoid   out-of-memory errors. By default, this is set to `Sys.free_memory() / numprocs`.\n  * `runtests::Bool=true`: Whether to run (quick) tests before starting the   search, to see if there will be any problems during the equation search   related to the host environment.\n  * `loss_type::Type=Nothing`: If you would like to use a different type   for the loss than for the data you passed, specify the type here.   Note that if you pass complex data `::Complex{L}`, then the loss   type will automatically be set to `L`.\n  * `selection_method::Function`: Function to selection expression from   the Pareto frontier for use in `predict`.   See `SymbolicRegression.MLJInterfaceModule.choose_best` for an example.   
This function should return a single integer specifying   the index of the expression to use. By default, this maximizes   the score (a pound-for-pound rating) of expressions reaching the threshold   of 1.5x the minimum loss. To override this at prediction time, you can pass   a named tuple with keys `data` and `idx` to `predict`. See the Operations   section for details.\n  * `dimensions_type::AbstractDimensions`: The type of dimensions to use when storing   the units of the data. By default this is `DynamicQuantities.SymbolicDimensions`.\n\n# Operations\n\n  * `predict(mach, Xnew)`: Return predictions of the target given features `Xnew`, which should have same scitype as `X` above. The expression used for prediction is defined by the `selection_method` function, which can be seen by viewing `report(mach).best_idx`.\n  * `predict(mach, (data=Xnew, idx=i))`: Return predictions of the target given features `Xnew`, which should have same scitype as `X` above. By passing a named tuple with keys `data` and `idx`, you are able to specify the equation you wish to evaluate in `idx`.\n\n# Fitted parameters\n\nThe fields of `fitted_params(mach)` are:\n\n  * `best_idx::Vector{Int}`: The index of the best expression in each Pareto frontier, as determined by the `selection_method` function. Override in `predict` by passing a named tuple with keys `data` and `idx`.\n  * `equations::Vector{Vector{Node{T}}}`: The expressions discovered by the search, represented in a dominating Pareto frontier (i.e., the best expressions found for each complexity). The outer vector is indexed by target variable, and the inner vector is ordered by increasing complexity. 
`T` is equal to the element type of the passed data.\n  * `equation_strings::Vector{Vector{String}}`: The expressions discovered by the search, represented as strings for easy inspection.\n\n# Report\n\nThe fields of `report(mach)` are:\n\n  * `best_idx::Vector{Int}`: The index of the best expression in each Pareto frontier,  as determined by the `selection_method` function. Override in `predict` by passing  a named tuple with keys `data` and `idx`.\n  * `equations::Vector{Vector{Node{T}}}`: The expressions discovered by the search, represented in a dominating Pareto frontier (i.e., the best expressions found for each complexity). The outer vector is indexed by target variable, and the inner vector is ordered by increasing complexity.\n  * `equation_strings::Vector{Vector{String}}`: The expressions discovered by the search, represented as strings for easy inspection.\n  * `complexities::Vector{Vector{Int}}`: The complexity of each expression in each Pareto frontier.\n  * `losses::Vector{Vector{L}}`: The loss of each expression in each Pareto frontier, according to the loss function specified in the model. The type `L` is the loss type, which is usually the same as the element type of data passed (i.e., `T`), but can differ if complex data types are passed.\n  * `scores::Vector{Vector{L}}`: A metric which considers both the complexity and loss of an expression, equal to the change in the log-loss divided by the change in complexity, relative to the previous expression along the Pareto frontier. A larger score aims to indicate an expression is more likely to be the true expression generating the data, but this is very problem-dependent and generally several other factors should be considered.\n\n# Examples\n\n```julia\nusing MLJ\nMultitargetSRRegressor = @load MultitargetSRRegressor pkg=SymbolicRegression\nX = (a=rand(100), b=rand(100), c=rand(100))\nY = (y1=(@. cos(X.c) * 2.1 - 0.9), y2=(@. 
X.a * X.b + X.c))\nmodel = MultitargetSRRegressor(binary_operators=[+, -, *], unary_operators=[exp], niterations=100)\nmach = machine(model, X, Y)\nfit!(mach)\ny_hat = predict(mach, X)\n# View the equations used:\nr = report(mach)\nfor (output_index, (eq, i)) in enumerate(zip(r.equation_strings, r.best_idx))\n    println(\"Equation used for \", output_index, \": \", eq[i])\nend\n```\n\nSee also [`SRRegressor`](@ref).\n"
+":docstring" = "```\nMultitargetSRRegressor\n```\n\nA model type for constructing a Multi-Target Symbolic Regression via Evolutionary Search, based on [SymbolicRegression.jl](https://github.com/MilesCranmer/SymbolicRegression.jl), and implementing the MLJ model interface.\n\nFrom MLJ, the type can be imported using\n\n```\nMultitargetSRRegressor = @load MultitargetSRRegressor pkg=SymbolicRegression\n```\n\nDo `model = MultitargetSRRegressor()` to construct an instance with default hyper-parameters. Provide keyword arguments to override hyper-parameter defaults, as in `MultitargetSRRegressor(binary_operators=...)`.\n\nMulti-target Symbolic Regression regressor (`MultitargetSRRegressor`) conducts several searches for expressions that predict each target variable from a set of input variables. All data is assumed to be `Continuous`. The search is performed using an evolutionary algorithm. This algorithm is described in the paper https://arxiv.org/abs/2305.01582.\n\n# Training data\n\nIn MLJ or MLJBase, bind an instance `model` to data with\n\n```\nmach = machine(model, X, y)\n```\n\nOR\n\n```\nmach = machine(model, X, y, w)\n```\n\nHere:\n\n  * `X` is any table of input features (e.g., a `DataFrame`) whose columns are of scitype `Continuous`; check column scitypes with `schema(X)`. Variable names in discovered expressions will be taken from the column names of `X`, if available. Units in columns of `X` (use `DynamicQuantities` for units) will trigger dimensional analysis to be used.\n\n  * `y` is the target, which can be any table of target variables whose element scitype is `Continuous`; check the scitype with `schema(y)`. Units in columns of `y` (use `DynamicQuantities` for units) will trigger dimensional analysis to be used.\n  * `w` is the observation weights which can either be `nothing` (default) or an `AbstractVector` whose element scitype is `Count` or `Continuous`. 
The same weights are used for all targets.\n\nTrain the machine using `fit!(mach)`, inspect the discovered expressions with `report(mach)`, and predict on new data with `predict(mach, Xnew)`. Note that unlike other regressors, symbolic regression stores a list of lists of trained models. The model chosen from each of these lists is determined by the function passed as the `selection_method` keyword argument, which by default balances accuracy and complexity. You can override this at prediction time by passing a named tuple with keys `data` and `idx`.\n\n# Hyper-parameters\n\n  * `binary_operators`: Vector of binary operators (functions) to use.   Each operator should be defined for two input scalars,   and one output scalar. All operators   need to be defined over the entire real line (excluding infinity - these   are stopped before they are input), or return `NaN` where not defined.   For speed, define it so it takes two reals   of the same type as input, and outputs the same type. For the SymbolicUtils   simplification backend, you will need to define a generic method of the   operator so it takes arbitrary types.\n  * `unary_operators`: Same, but for   unary operators (one input scalar, gives an output scalar).\n  * `constraints`: Array of pairs specifying size constraints   for each operator. The constraints for a binary operator should be a 2-tuple   (e.g., `(-1, -1)`) and the constraints for a unary operator should be an `Int`.   A size constraint is a limit to the size of the subtree   in each argument of an operator. e.g., `[(^)=>(-1, 3)]` means that the   `^` operator can have arbitrary size (`-1`) in its left argument,   but a maximum size of `3` in its right argument. Default is   no constraints.\n  * `batching`: Whether to evolve based on small mini-batches of data,   rather than the entire dataset.\n  * `batch_size`: What batch size to use if using batching.\n  * `elementwise_loss`: What elementwise loss function to use.
Can be one of   the following losses, or any other loss of type   `SupervisedLoss`. You can also pass a function that takes   a scalar target (left argument), and scalar predicted (right   argument), and returns a scalar. This will be averaged   over the predicted data. If weights are supplied, your   function should take a third argument for the weight scalar.   Included losses:       Regression:           - `LPDistLoss{P}()`,           - `L1DistLoss()`,           - `L2DistLoss()` (mean square),           - `LogitDistLoss()`,           - `HuberLoss(d)`,           - `L1EpsilonInsLoss(ϵ)`,           - `L2EpsilonInsLoss(ϵ)`,           - `PeriodicLoss(c)`,           - `QuantileLoss(τ)`,       Classification:           - `ZeroOneLoss()`,           - `PerceptronLoss()`,           - `L1HingeLoss()`,           - `SmoothedL1HingeLoss(γ)`,           - `ModifiedHuberLoss()`,           - `L2MarginLoss()`,           - `ExpLoss()`,           - `SigmoidLoss()`,           - `DWDMarginLoss(q)`.\n  * `loss_function`: Alternatively, you may redefine the loss used   as any function of `tree::Node{T}`, `dataset::Dataset{T}`,   and `options::Options`, so long as you output a non-negative   scalar of type `T`. This is useful if you want to use a loss   that takes into account derivatives, or correlations across   the dataset. This also means you could use a custom evaluation   for a particular expression. If you are using   `batching=true`, then your function should   accept a fourth argument `idx`, which is either `nothing`   (indicating that the full dataset should be used), or a vector   of indices to use for the batch.   
For example,\n\n    ```\n      function my_loss(tree, dataset::Dataset{T,L}, options)::L where {T,L}\n          prediction, flag = eval_tree_array(tree, dataset.X, options)\n          if !flag\n              return L(Inf)\n          end\n          return sum((prediction .- dataset.y) .^ 2) / dataset.n\n      end\n    ```\n  * `populations`: How many populations of equations to use.\n  * `population_size`: How many equations in each population.\n  * `ncycles_per_iteration`: How many generations to consider per iteration.\n  * `tournament_selection_n`: Number of expressions considered in each tournament.\n  * `tournament_selection_p`: The fittest expression in a tournament is to be   selected with probability `p`, the next fittest with probability `p*(1-p)`,   and so forth.\n  * `topn`: Number of equations to return to the host process, and to   consider for the hall of fame.\n  * `complexity_of_operators`: What complexity should be assigned to each operator,   and the occurrence of a constant or variable. By default, this is 1   for all operators. Can be a real number as well, in which case   the complexity of an expression will be rounded to the nearest integer.   Input this in the form of, e.g., [(^) => 3, sin => 2].\n  * `complexity_of_constants`: What complexity should be assigned to use of a constant.   By default, this is 1.\n  * `complexity_of_variables`: What complexity should be assigned to each variable.   By default, this is 1.\n  * `alpha`: The probability of accepting an equation mutation   during regularized evolution is given by exp(-delta_loss/(alpha * T)),   where T goes from 1 to 0. 
Thus, alpha=infinite is the same as no annealing.\n  * `maxsize`: Maximum size of equations during the search.\n  * `maxdepth`: Maximum depth of equations during the search, by default   this is set equal to the maxsize.\n  * `parsimony`: A multiplicative factor for how much complexity is   punished.\n  * `dimensional_constraint_penalty`: An additive factor if the dimensional   constraint is violated.\n  * `use_frequency`: Whether to use a parsimony that adapts to the   relative proportion of equations at each complexity; this will   ensure that there are a balanced number of equations considered   for every complexity.\n  * `use_frequency_in_tournament`: Whether to use the adaptive parsimony described   above inside the score, rather than just at the mutation accept/reject stage.\n  * `adaptive_parsimony_scaling`: How much to scale the adaptive parsimony term   in the loss. Increase this if the search is spending too much time   optimizing the most complex equations.\n  * `turbo`: Whether to use `LoopVectorization.@turbo` to evaluate expressions.   This can be significantly faster, but is only compatible with certain   operators. *Experimental!*\n  * `migration`: Whether to migrate equations between processes.\n  * `hof_migration`: Whether to migrate equations from the hall of fame   to processes.\n  * `fraction_replaced`: What fraction of each population to replace with   migrated equations at the end of each cycle.\n  * `fraction_replaced_hof`: What fraction to replace with hall of fame   equations at the end of each cycle.\n  * `should_simplify`: Whether to simplify equations. If you   pass a custom objective, this will be set to `false`.\n  * `should_optimize_constants`: Whether to use an optimization algorithm   to periodically optimize constants in equations.\n  * `optimizer_nrestarts`: How many different random starting positions to consider   for optimization of constants.\n  * `optimizer_algorithm`: Select algorithm to use for optimizing constants. 
Default   is \"BFGS\", but \"NelderMead\" is also supported.\n  * `optimizer_options`: General options for the constant optimization. For details   we refer to the documentation on `Optim.Options` from the `Optim.jl` package.   Options can be provided here as `NamedTuple`, e.g. `(iterations=16,)`, as a   `Dict`, e.g. Dict(:x_tol => 1.0e-32,), or as an `Optim.Options` instance.\n  * `output_file`: What file to store equations to, as a backup.\n  * `perturbation_factor`: When mutating a constant, either   multiply or divide by (1+perturbation_factor)^(rand()+1).\n  * `probability_negate_constant`: Probability of negating a constant in the equation   when mutating it.\n  * `mutation_weights`: Relative probabilities of the mutations. The struct   `MutationWeights` should be passed to these options.   See its documentation on `MutationWeights` for the different weights.\n  * `crossover_probability`: Probability of performing crossover.\n  * `annealing`: Whether to use simulated annealing.\n  * `warmup_maxsize_by`: Whether to slowly increase the max size from 5 up to   `maxsize`. If nonzero, specifies the fraction through the search   at which the maxsize should be reached.\n  * `verbosity`: Whether to print debugging statements or   not.\n  * `print_precision`: How many digits to print when printing   equations. By default, this is 5.\n  * `save_to_file`: Whether to save equations to a file during the search.\n  * `bin_constraints`: See `constraints`. This is the same, but specified for binary   operators only (for example, if you have an operator that is both a binary   and unary operator).\n  * `una_constraints`: Likewise, for unary operators.\n  * `seed`: What random seed to use. `nothing` uses no seed.\n  * `progress`: Whether to use a progress bar output (`verbosity` will   have no effect).\n  * `early_stop_condition`: Float - whether to stop early if the mean loss gets below this value.   
Function - a function taking (loss, complexity) as arguments and returning true or false.\n  * `timeout_in_seconds`: Float64 - the time in seconds after which to exit (as an alternative to the number of iterations).\n  * `max_evals`: Int (or Nothing) - the maximum number of evaluations of expressions to perform.\n  * `skip_mutation_failures`: Whether to simply skip over mutations that fail or are rejected, rather than replacing the mutated   expression with the original expression and proceeding normally.\n  * `enable_autodiff`: Whether to enable automatic differentiation functionality. This is turned off by default.   If turned on, it will be switched off again if any operator does not have well-defined gradients.\n  * `nested_constraints`: Specifies how many times a combination of operators can be nested. For example,   `[sin => [cos => 0], cos => [cos => 2]]` specifies that `cos` may never appear within a `sin`,   but `sin` can be nested with itself an unlimited number of times. The second term specifies that `cos`   can be nested up to 2 times within a `cos`, so that `cos(cos(cos(x)))` is allowed (as well as any combination   of `+` or `-` within it), but `cos(cos(cos(cos(x))))` is not allowed. When an operator is not specified,   it is assumed that it can be nested an unlimited number of times. This requires that no operator   is used as both a unary and a binary operator (e.g., `-` could be both subtraction and negation).   For binary operators, both arguments are treated the same way, and the max of each argument is constrained.\n  * `deterministic`: Use a global counter for the birth time, rather than calls to `time()`. This gives   perfect resolution, and is therefore deterministic. However, it is not thread safe, and must be used   in serial mode.\n  * `define_helper_functions`: Whether to define helper functions   for constructing and evaluating trees.\n  * `niterations::Int=10`: The number of iterations to perform the search. 
  More iterations will improve the results.\n  * `parallelism=:multithreading`: What parallelism mode to use.   The options are `:multithreading`, `:multiprocessing`, and `:serial`.   By default, multithreading will be used. Multithreading uses less memory,   but multiprocessing can handle multi-node compute. If using `:multithreading`   mode, all threads available to Julia are used. If using   `:multiprocessing`, `numprocs` processes will be created dynamically if   `procs` is unset. If you have already allocated processes, pass them   to the `procs` argument and they will be used.   You may also pass a string instead of a symbol, like `\"multithreading\"`.\n  * `numprocs::Union{Int, Nothing}=nothing`: The number of processes to use,   if you want `equation_search` to set this up automatically. By default   this will be `4`, but can be any number (you should pick a number <=   the number of cores available).\n  * `procs::Union{Vector{Int}, Nothing}=nothing`: If you have set up   a distributed run manually with `procs = addprocs()` and `@everywhere`,   pass the `procs` to this keyword argument.\n  * `addprocs_function::Union{Function, Nothing}=nothing`: If using multiprocessing   (`parallelism=:multiprocessing`) and not passing `procs` manually,   then processes will be allocated dynamically using `addprocs`. However,   you may also pass a custom function to use instead of `addprocs`.   This function should take a single positional argument,   which is the number of processes to use, as well as the `lazy` keyword argument.   For example, if running on a Slurm cluster, you could pass   `addprocs_function = addprocs_slurm`, which will set up Slurm processes.\n  * `heap_size_hint_in_bytes::Union{Int,Nothing}=nothing`: On Julia 1.9+, you may set the `--heap-size-hint`   flag on Julia processes, recommending garbage collection once a process   is close to the recommended size. 
This is important for long-running distributed   jobs where each process has its own independent memory, and can help avoid   out-of-memory errors. By default, this is set to `Sys.free_memory() / numprocs`.\n  * `runtests::Bool=true`: Whether to run (quick) tests before starting the   search, to see if there will be any problems during the equation search   related to the host environment.\n  * `loss_type::Type=Nothing`: If you would like to use a different type   for the loss than for the data you passed, specify the type here.   Note that if you pass complex data `::Complex{L}`, then the loss   type will automatically be set to `L`.\n  * `selection_method::Function`: Function to select an expression from   the Pareto frontier for use in `predict`.   See `SymbolicRegression.MLJInterfaceModule.choose_best` for an example.   This function should return a single integer specifying   the index of the expression to use. By default, this maximizes   the score (a pound-for-pound rating) of expressions reaching the threshold   of 1.5x the minimum loss. To override this at prediction time, you can pass   a named tuple with keys `data` and `idx` to `predict`. See the Operations   section for details.\n  * `dimensions_type::AbstractDimensions`: The type of dimensions to use when storing   the units of the data. By default this is `DynamicQuantities.SymbolicDimensions`.\n\n# Operations\n\n  * `predict(mach, Xnew)`: Return predictions of the target given features `Xnew`, which should have the same scitype as `X` above. The expression used for prediction is chosen by the `selection_method` function; the selected index can be inspected via `report(mach).best_idx`.\n  * `predict(mach, (data=Xnew, idx=i))`: Return predictions of the target given features `Xnew`, which should have the same scitype as `X` above. 
By passing a named tuple with keys `data` and `idx`, you can use `idx` to specify which equation to evaluate.\n\n# Fitted parameters\n\nThe fields of `fitted_params(mach)` are:\n\n  * `best_idx::Vector{Int}`: The index of the best expression in each Pareto frontier, as determined by the `selection_method` function. Override in `predict` by passing a named tuple with keys `data` and `idx`.\n  * `equations::Vector{Vector{Node{T}}}`: The expressions discovered by the search, represented in a dominating Pareto frontier (i.e., the best expressions found for each complexity). The outer vector is indexed by target variable, and the inner vector is ordered by increasing complexity. `T` is equal to the element type of the passed data.\n  * `equation_strings::Vector{Vector{String}}`: The expressions discovered by the search, represented as strings for easy inspection.\n\n# Report\n\nThe fields of `report(mach)` are:\n\n  * `best_idx::Vector{Int}`: The index of the best expression in each Pareto frontier, as determined by the `selection_method` function. Override in `predict` by passing a named tuple with keys `data` and `idx`.\n  * `equations::Vector{Vector{Node{T}}}`: The expressions discovered by the search, represented in a dominating Pareto frontier (i.e., the best expressions found for each complexity). The outer vector is indexed by target variable, and the inner vector is ordered by increasing complexity.\n  * `equation_strings::Vector{Vector{String}}`: The expressions discovered by the search, represented as strings for easy inspection.\n  * `complexities::Vector{Vector{Int}}`: The complexity of each expression in each Pareto frontier.\n  * `losses::Vector{Vector{L}}`: The loss of each expression in each Pareto frontier, according to the loss function specified in the model. 
The type `L` is the loss type, which is usually the same as the element type of data passed (i.e., `T`), but can differ if complex data types are passed.\n  * `scores::Vector{Vector{L}}`: A metric which considers both the complexity and loss of an expression, equal to the change in the log-loss divided by the change in complexity, relative to the previous expression along the Pareto frontier. A larger score is intended to indicate that an expression is more likely to be the true expression generating the data, but this is very problem-dependent and several other factors should generally be considered.\n\n# Examples\n\n```julia\nusing MLJ\nMultitargetSRRegressor = @load MultitargetSRRegressor pkg=SymbolicRegression\nX = (a=rand(100), b=rand(100), c=rand(100))\nY = (y1=(@. cos(X.c) * 2.1 - 0.9), y2=(@. X.a * X.b + X.c))\nmodel = MultitargetSRRegressor(binary_operators=[+, -, *], unary_operators=[exp], niterations=100)\nmach = machine(model, X, Y)\nfit!(mach)\ny_hat = predict(mach, X)\n# View the equations used:\nr = report(mach)\nfor (output_index, (eq, i)) in enumerate(zip(r.equation_strings, r.best_idx))\n    println(\"Equation used for \", output_index, \": \", eq[i])\nend\n```\n\nSee also [`SRRegressor`](@ref).\n"
 ":name" = "MultitargetSRRegressor"
 ":human_name" = "Multi-Target Symbolic Regression via Evolutionary Search"
 ":is_supervised" = "`true`"
 ":prediction_type" = ":deterministic"
 ":abstract_type" = "`MLJModelInterface.Deterministic`"
 ":implemented_methods" = []
-":hyperparameters" = "`(:binary_operators, :unary_operators, :constraints, :elementwise_loss, :loss_function, :tournament_selection_n, :tournament_selection_p, :topn, :complexity_of_operators, :complexity_of_constants, :complexity_of_variables, :parsimony, :dimensional_constraint_penalty, :dimensionless_constants_only, :alpha, :maxsize, :maxdepth, :turbo, :bumper, :migration, :hof_migration, :should_simplify, :should_optimize_constants, :output_file, :node_type, :populations, :perturbation_factor, :annealing, :batching, :batch_size, :mutation_weights, :crossover_probability, :warmup_maxsize_by, :use_frequency, :use_frequency_in_tournament, :adaptive_parsimony_scaling, :population_size, :ncycles_per_iteration, :fraction_replaced, :fraction_replaced_hof, :verbosity, :print_precision, :save_to_file, :probability_negate_constant, :seed, :bin_constraints, :una_constraints, :progress, :terminal_width, :optimizer_algorithm, :optimizer_nrestarts, :optimizer_probability, :optimizer_iterations, :optimizer_f_calls_limit, :optimizer_options, :use_recorder, :recorder_file, :early_stop_condition, :timeout_in_seconds, :max_evals, :skip_mutation_failures, :nested_constraints, :deterministic, :define_helper_functions, :fast_cycle, :npopulations, :npop, :niterations, :parallelism, :numprocs, :procs, :addprocs_function, :heap_size_hint_in_bytes, :runtests, :loss_type, :selection_method, :dimensions_type)`"
-":hyperparameter_types" = "`(\"Any\", \"Any\", \"Any\", \"Union{Nothing, Function, LossFunctions.Traits.SupervisedLoss}\", \"Union{Nothing, Function}\", \"Integer\", \"Real\", \"Integer\", \"Any\", \"Union{Nothing, Real}\", \"Union{Nothing, Real, AbstractVector}\", \"Real\", \"Union{Nothing, Real}\", \"Bool\", \"Real\", \"Integer\", \"Union{Nothing, Integer}\", \"Bool\", \"Bool\", \"Bool\", \"Bool\", \"Union{Nothing, Bool}\", \"Bool\", \"Union{Nothing, AbstractString}\", \"Type\", \"Integer\", \"Real\", \"Bool\", \"Bool\", \"Integer\", \"Union{SymbolicRegression.CoreModule.MutationWeightsModule.MutationWeights, NamedTuple, AbstractVector}\", \"Real\", \"Real\", \"Bool\", \"Bool\", \"Real\", \"Integer\", \"Integer\", \"Real\", \"Real\", \"Union{Nothing, Integer}\", \"Integer\", \"Bool\", \"Real\", \"Any\", \"Any\", \"Any\", \"Union{Nothing, Bool}\", \"Union{Nothing, Integer}\", \"Union{AbstractString, Optim.AbstractOptimizer}\", \"Integer\", \"Real\", \"Union{Nothing, Integer}\", \"Union{Nothing, Integer}\", \"Union{Nothing, Dict, NamedTuple, Optim.Options}\", \"Bool\", \"AbstractString\", \"Union{Nothing, Function, Real}\", \"Union{Nothing, Real}\", \"Union{Nothing, Integer}\", \"Bool\", \"Any\", \"Bool\", \"Bool\", \"Bool\", \"Union{Nothing, Integer}\", \"Union{Nothing, Integer}\", \"Int64\", \"Symbol\", \"Union{Nothing, Int64}\", \"Union{Nothing, Vector{Int64}}\", \"Union{Nothing, Function}\", \"Union{Nothing, Integer}\", \"Bool\", \"Any\", \"Function\", \"Type{D} where D<:DynamicQuantities.AbstractDimensions\")`"
-":hyperparameter_ranges" = "`(nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing)`"
+":hyperparameters" = "`(:binary_operators, :unary_operators, :constraints, :elementwise_loss, :loss_function, :tournament_selection_n, :tournament_selection_p, :topn, :complexity_of_operators, :complexity_of_constants, :complexity_of_variables, :parsimony, :dimensional_constraint_penalty, :alpha, :maxsize, :maxdepth, :turbo, :migration, :hof_migration, :should_simplify, :should_optimize_constants, :output_file, :populations, :perturbation_factor, :annealing, :batching, :batch_size, :mutation_weights, :crossover_probability, :warmup_maxsize_by, :use_frequency, :use_frequency_in_tournament, :adaptive_parsimony_scaling, :population_size, :ncycles_per_iteration, :fraction_replaced, :fraction_replaced_hof, :verbosity, :print_precision, :save_to_file, :probability_negate_constant, :seed, :bin_constraints, :una_constraints, :progress, :terminal_width, :optimizer_algorithm, :optimizer_nrestarts, :optimizer_probability, :optimizer_iterations, :optimizer_options, :val_recorder, :recorder_file, :early_stop_condition, :timeout_in_seconds, :max_evals, :skip_mutation_failures, :enable_autodiff, :nested_constraints, :deterministic, :define_helper_functions, :fast_cycle, :npopulations, :npop, :niterations, :parallelism, :numprocs, :procs, :addprocs_function, :heap_size_hint_in_bytes, :runtests, :loss_type, :selection_method, :dimensions_type)`"
+":hyperparameter_types" = "`(\"Any\", \"Any\", \"Any\", \"Union{Nothing, Function, LossFunctions.Traits.SupervisedLoss}\", \"Union{Nothing, Function}\", \"Integer\", \"Real\", \"Integer\", \"Any\", \"Union{Nothing, Real}\", \"Union{Nothing, Real}\", \"Real\", \"Union{Nothing, Real}\", \"Real\", \"Integer\", \"Union{Nothing, Integer}\", \"Bool\", \"Bool\", \"Bool\", \"Union{Nothing, Bool}\", \"Bool\", \"Union{Nothing, AbstractString}\", \"Integer\", \"Real\", \"Bool\", \"Bool\", \"Integer\", \"Union{SymbolicRegression.CoreModule.OptionsStructModule.MutationWeights, NamedTuple, AbstractVector}\", \"Real\", \"Real\", \"Bool\", \"Bool\", \"Real\", \"Integer\", \"Integer\", \"Real\", \"Real\", \"Union{Nothing, Integer}\", \"Integer\", \"Bool\", \"Real\", \"Any\", \"Any\", \"Any\", \"Union{Nothing, Bool}\", \"Union{Nothing, Integer}\", \"AbstractString\", \"Integer\", \"Real\", \"Union{Nothing, Integer}\", \"Union{Nothing, Dict, NamedTuple, Optim.Options}\", \"Val\", \"AbstractString\", \"Union{Nothing, Function, Real}\", \"Union{Nothing, Real}\", \"Union{Nothing, Integer}\", \"Bool\", \"Bool\", \"Any\", \"Bool\", \"Bool\", \"Bool\", \"Union{Nothing, Integer}\", \"Union{Nothing, Integer}\", \"Int64\", \"Symbol\", \"Union{Nothing, Int64}\", \"Union{Nothing, Vector{Int64}}\", \"Union{Nothing, Function}\", \"Union{Nothing, Integer}\", \"Bool\", \"Any\", \"Function\", \"Type{D} where D<:DynamicQuantities.AbstractDimensions\")`"
+":hyperparameter_ranges" = "`(nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing)`"
 ":iteration_parameter" = "`nothing`"
 ":supports_training_losses" = "`false`"
 ":reports_feature_importances" = "`false`"
@@ -6498,16 +6498,16 @@
 ":supports_weights" = "`true`"
 ":supports_class_weights" = "`false`"
 ":supports_online" = "`false`"
-":docstring" = "```\nSRRegressor\n```\n\nA model type for constructing a Symbolic Regression via Evolutionary Search, based on [SymbolicRegression.jl](https://github.com/MilesCranmer/SymbolicRegression.jl), and implementing the MLJ model interface.\n\nFrom MLJ, the type can be imported using\n\n```\nSRRegressor = @load SRRegressor pkg=SymbolicRegression\n```\n\nDo `model = SRRegressor()` to construct an instance with default hyper-parameters. Provide keyword arguments to override hyper-parameter defaults, as in `SRRegressor(binary_operators=...)`.\n\nSingle-target Symbolic Regression regressor (`SRRegressor`) searches for symbolic expressions that predict a single target variable from a set of input variables. All data is assumed to be `Continuous`. The search is performed using an evolutionary algorithm. This algorithm is described in the paper https://arxiv.org/abs/2305.01582.\n\n# Training data\n\nIn MLJ or MLJBase, bind an instance `model` to data with\n\n```\nmach = machine(model, X, y)\n```\n\nOR\n\n```\nmach = machine(model, X, y, w)\n```\n\nHere:\n\n  * `X` is any table of input features (eg, a `DataFrame`) whose columns are of scitype `Continuous`; check column scitypes with `schema(X)`. Variable names in discovered expressions will be taken from the column names of `X`, if available. Units in columns of `X` (use `DynamicQuantities` for units) will trigger dimensional analysis to be used.\n  * `y` is the target, which can be any `AbstractVector` whose element scitype is   `Continuous`; check the scitype with `scitype(y)`. Units in `y` (use `DynamicQuantities`   for units) will trigger dimensional analysis to be used.\n  * `w` is the observation weights which can either be `nothing` (default) or an `AbstractVector` whoose element scitype is `Count` or `Continuous`.\n\nTrain the machine using `fit!(mach)`, inspect the discovered expressions with `report(mach)`, and predict on new data with `predict(mach, Xnew)`. 
Note that unlike other regressors, symbolic regression stores a list of trained models. The model chosen from this list is defined by the function `selection_method` keyword argument, which by default balances accuracy and complexity. You can override this at prediction time by passing a named tuple with keys `data` and `idx`.\n\n# Hyper-parameters\n\n  * `binary_operators`: Vector of binary operators (functions) to use.   Each operator should be defined for two input scalars,   and one output scalar. All operators   need to be defined over the entire real line (excluding infinity - these   are stopped before they are input), or return `NaN` where not defined.   For speed, define it so it takes two reals   of the same type as input, and outputs the same type. For the SymbolicUtils   simplification backend, you will need to define a generic method of the   operator so it takes arbitrary types.\n  * `unary_operators`: Same, but for   unary operators (one input scalar, gives an output scalar).\n  * `constraints`: Array of pairs specifying size constraints   for each operator. The constraints for a binary operator should be a 2-tuple   (e.g., `(-1, -1)`) and the constraints for a unary operator should be an `Int`.   A size constraint is a limit to the size of the subtree   in each argument of an operator. e.g., `[(^)=>(-1, 3)]` means that the   `^` operator can have arbitrary size (`-1`) in its left argument,   but a maximum size of `3` in its right argument. Default is   no constraints.\n  * `batching`: Whether to evolve based on small mini-batches of data,   rather than the entire dataset.\n  * `batch_size`: What batch size to use if using batching.\n  * `elementwise_loss`: What elementwise loss function to use. Can be one of   the following losses, or any other loss of type   `SupervisedLoss`. You can also pass a function that takes   a scalar target (left argument), and scalar predicted (right   argument), and returns a scalar. 
This will be averaged   over the predicted data. If weights are supplied, your   function should take a third argument for the weight scalar.   Included losses:       Regression:           - `LPDistLoss{P}()`,           - `L1DistLoss()`,           - `L2DistLoss()` (mean square),           - `LogitDistLoss()`,           - `HuberLoss(d)`,           - `L1EpsilonInsLoss(ϵ)`,           - `L2EpsilonInsLoss(ϵ)`,           - `PeriodicLoss(c)`,           - `QuantileLoss(τ)`,       Classification:           - `ZeroOneLoss()`,           - `PerceptronLoss()`,           - `L1HingeLoss()`,           - `SmoothedL1HingeLoss(γ)`,           - `ModifiedHuberLoss()`,           - `L2MarginLoss()`,           - `ExpLoss()`,           - `SigmoidLoss()`,           - `DWDMarginLoss(q)`.\n  * `loss_function`: Alternatively, you may redefine the loss used   as any function of `tree::AbstractExpressionNode{T}`, `dataset::Dataset{T}`,   and `options::Options`, so long as you output a non-negative   scalar of type `T`. This is useful if you want to use a loss   that takes into account derivatives, or correlations across   the dataset. This also means you could use a custom evaluation   for a particular expression. If you are using   `batching=true`, then your function should   accept a fourth argument `idx`, which is either `nothing`   (indicating that the full dataset should be used), or a vector   of indices to use for the batch.   For example,\n\n    ```\n      function my_loss(tree, dataset::Dataset{T,L}, options)::L where {T,L}\n          prediction, flag = eval_tree_array(tree, dataset.X, options)\n          if !flag\n              return L(Inf)\n          end\n          return sum((prediction .- dataset.y) .^ 2) / dataset.n\n      end\n    ```\n  * `node_type::Type{N}=Node`: The type of node to use for the search.   
For example, `Node` or `GraphNode`.\n  * `populations`: How many populations of equations to use.\n  * `population_size`: How many equations in each population.\n  * `ncycles_per_iteration`: How many generations to consider per iteration.\n  * `tournament_selection_n`: Number of expressions considered in each tournament.\n  * `tournament_selection_p`: The fittest expression in a tournament is to be   selected with probability `p`, the next fittest with probability `p*(1-p)`,   and so forth.\n  * `topn`: Number of equations to return to the host process, and to   consider for the hall of fame.\n  * `complexity_of_operators`: What complexity should be assigned to each operator,   and the occurrence of a constant or variable. By default, this is 1   for all operators. Can be a real number as well, in which case   the complexity of an expression will be rounded to the nearest integer.   Input this in the form of, e.g., [(^) => 3, sin => 2].\n  * `complexity_of_constants`: What complexity should be assigned to use of a constant.   By default, this is 1.\n  * `complexity_of_variables`: What complexity should be assigned to use of a variable,   which can also be a vector indicating different per-variable complexity.   By default, this is 1.\n  * `alpha`: The probability of accepting an equation mutation   during regularized evolution is given by exp(-delta_loss/(alpha * T)),   where T goes from 1 to 0. 
Thus, alpha=infinite is the same as no annealing.\n  * `maxsize`: Maximum size of equations during the search.\n  * `maxdepth`: Maximum depth of equations during the search, by default   this is set equal to the maxsize.\n  * `parsimony`: A multiplicative factor for how much complexity is   punished.\n  * `dimensional_constraint_penalty`: An additive factor if the dimensional   constraint is violated.\n  * `dimensionless_constants_only`: Whether to only allow dimensionless   constants.\n  * `use_frequency`: Whether to use a parsimony that adapts to the   relative proportion of equations at each complexity; this will   ensure that there are a balanced number of equations considered   for every complexity.\n  * `use_frequency_in_tournament`: Whether to use the adaptive parsimony described   above inside the score, rather than just at the mutation accept/reject stage.\n  * `adaptive_parsimony_scaling`: How much to scale the adaptive parsimony term   in the loss. Increase this if the search is spending too much time   optimizing the most complex equations.\n  * `turbo`: Whether to use `LoopVectorization.@turbo` to evaluate expressions.   This can be significantly faster, but is only compatible with certain   operators. *Experimental!*\n  * `bumper`: Whether to use Bumper.jl for faster evaluation. *Experimental!*\n  * `migration`: Whether to migrate equations between processes.\n  * `hof_migration`: Whether to migrate equations from the hall of fame   to processes.\n  * `fraction_replaced`: What fraction of each population to replace with   migrated equations at the end of each cycle.\n  * `fraction_replaced_hof`: What fraction to replace with hall of fame   equations at the end of each cycle.\n  * `should_simplify`: Whether to simplify equations. 
If you   pass a custom objective, this will be set to `false`.\n  * `should_optimize_constants`: Whether to use an optimization algorithm   to periodically optimize constants in equations.\n  * `optimizer_algorithm`: Select algorithm to use for optimizing constants. Default   is `Optim.BFGS(linesearch=LineSearches.BackTracking())`.\n  * `optimizer_nrestarts`: How many different random starting positions to consider   for optimization of constants.\n  * `optimizer_probability`: Probability of performing optimization of constants at   the end of a given iteration.\n  * `optimizer_iterations`: How many optimization iterations to perform. This gets  passed to `Optim.Options` as `iterations`. The default is 8.\n  * `optimizer_f_calls_limit`: How many function calls to allow during optimization.   This gets passed to `Optim.Options` as `f_calls_limit`. The default is   `0` which means no limit.\n  * `optimizer_options`: General options for the constant optimization. For details   we refer to the documentation on `Optim.Options` from the `Optim.jl` package.   Options can be provided here as `NamedTuple`, e.g. `(iterations=16,)`, as a   `Dict`, e.g. Dict(:x_tol => 1.0e-32,), or as an `Optim.Options` instance.\n  * `output_file`: What file to store equations to, as a backup.\n  * `perturbation_factor`: When mutating a constant, either   multiply or divide by (1+perturbation_factor)^(rand()+1).\n  * `probability_negate_constant`: Probability of negating a constant in the equation   when mutating it.\n  * `mutation_weights`: Relative probabilities of the mutations. The struct   `MutationWeights` should be passed to these options.   See its documentation on `MutationWeights` for the different weights.\n  * `crossover_probability`: Probability of performing crossover.\n  * `annealing`: Whether to use simulated annealing.\n  * `warmup_maxsize_by`: Whether to slowly increase the max size from 5 up to   `maxsize`. 
If nonzero, specifies the fraction through the search   at which the maxsize should be reached.\n  * `verbosity`: Whether to print debugging statements or   not.\n  * `print_precision`: How many digits to print when printing   equations. By default, this is 5.\n  * `save_to_file`: Whether to save equations to a file during the search.\n  * `bin_constraints`: See `constraints`. This is the same, but specified for binary   operators only (for example, if you have an operator that is both a binary   and unary operator).\n  * `una_constraints`: Likewise, for unary operators.\n  * `seed`: What random seed to use. `nothing` uses no seed.\n  * `progress`: Whether to use a progress bar output (`verbosity` will   have no effect).\n  * `early_stop_condition`: Float - whether to stop early if the mean loss gets below this value.   Function - a function taking (loss, complexity) as arguments and returning true or false.\n  * `timeout_in_seconds`: Float64 - the time in seconds after which to exit (as an alternative to the number of iterations).\n  * `max_evals`: Int (or Nothing) - the maximum number of evaluations of expressions to perform.\n  * `skip_mutation_failures`: Whether to simply skip over mutations that fail or are rejected, rather than to replace the mutated   expression with the original expression and proceed normally.\n  * `nested_constraints`: Specifies how many times a combination of operators can be nested. For example,   `[sin => [cos => 0], cos => [cos => 2]]` specifies that `cos` may never appear within a `sin`,   but `sin` can be nested with itself an unlimited number of times. The second term specifies that `cos`   can be nested up to 2 times within a `cos`, so that `cos(cos(cos(x)))` is allowed (as well as any combination   of `+` or `-` within it), but `cos(cos(cos(cos(x))))` is not allowed. When an operator is not specified,   it is assumed that it can be nested an unlimited number of times. 
This requires that there is no operator   which is used both in the unary operators and the binary operators (e.g., `-` could be both subtract, and negation).   For binary operators, both arguments are treated the same way, and the max of each argument is constrained.\n  * `deterministic`: Use a global counter for the birth time, rather than calls to `time()`. This gives   perfect resolution, and is therefore deterministic. However, it is not thread safe, and must be used   in serial mode.\n  * `define_helper_functions`: Whether to define helper functions   for constructing and evaluating trees.\n  * `niterations::Int=10`: The number of iterations to perform the search.   More iterations will improve the results.\n  * `parallelism=:multithreading`: What parallelism mode to use.   The options are `:multithreading`, `:multiprocessing`, and `:serial`.   By default, multithreading will be used. Multithreading uses less memory,   but multiprocessing can handle multi-node compute. If using `:multithreading`   mode, the number of threads available to julia are used. If using   `:multiprocessing`, `numprocs` processes will be created dynamically if   `procs` is unset. If you have already allocated processes, pass them   to the `procs` argument and they will be used.   You may also pass a string instead of a symbol, like `\"multithreading\"`.\n  * `numprocs::Union{Int, Nothing}=nothing`:  The number of processes to use,   if you want `equation_search` to set this up automatically. 
By default   this will be `4`, but can be any number (you should pick a number <=   the number of cores available).\n  * `procs::Union{Vector{Int}, Nothing}=nothing`: If you have set up   a distributed run manually with `procs = addprocs()` and `@everywhere`,   pass the `procs` to this keyword argument.\n  * `addprocs_function::Union{Function, Nothing}=nothing`: If using multiprocessing   (`parallelism=:multithreading`), and are not passing `procs` manually,   then they will be allocated dynamically using `addprocs`. However,   you may also pass a custom function to use instead of `addprocs`.   This function should take a single positional argument,   which is the number of processes to use, as well as the `lazy` keyword argument.   For example, if set up on a slurm cluster, you could pass   `addprocs_function = addprocs_slurm`, which will set up slurm processes.\n  * `heap_size_hint_in_bytes::Union{Int,Nothing}=nothing`: On Julia 1.9+, you may set the `--heap-size-hint`   flag on Julia processes, recommending garbage collection once a process   is close to the recommended size. This is important for long-running distributed   jobs where each process has an independent memory, and can help avoid   out-of-memory errors. By default, this is set to `Sys.free_memory() / numprocs`.\n  * `runtests::Bool=true`: Whether to run (quick) tests before starting the   search, to see if there will be any problems during the equation search   related to the host environment.\n  * `loss_type::Type=Nothing`: If you would like to use a different type   for the loss than for the data you passed, specify the type here.   Note that if you pass complex data `::Complex{L}`, then the loss   type will automatically be set to `L`.\n  * `selection_method::Function`: Function to selection expression from   the Pareto frontier for use in `predict`.   See `SymbolicRegression.MLJInterfaceModule.choose_best` for an example.   
This function should return a single integer specifying   the index of the expression to use. By default, this maximizes   the score (a pound-for-pound rating) of expressions reaching the threshold   of 1.5x the minimum loss. To override this at prediction time, you can pass   a named tuple with keys `data` and `idx` to `predict`. See the Operations   section for details.\n  * `dimensions_type::AbstractDimensions`: The type of dimensions to use when storing   the units of the data. By default this is `DynamicQuantities.SymbolicDimensions`.\n\n# Operations\n\n  * `predict(mach, Xnew)`: Return predictions of the target given features `Xnew`, which should have same scitype as `X` above. The expression used for prediction is defined by the `selection_method` function, which can be seen by viewing `report(mach).best_idx`.\n  * `predict(mach, (data=Xnew, idx=i))`: Return predictions of the target given features `Xnew`, which should have same scitype as `X` above. By passing a named tuple with keys `data` and `idx`, you are able to specify the equation you wish to evaluate in `idx`.\n\n# Fitted parameters\n\nThe fields of `fitted_params(mach)` are:\n\n  * `best_idx::Int`: The index of the best expression in the Pareto frontier,  as determined by the `selection_method` function. Override in `predict` by passing   a named tuple with keys `data` and `idx`.\n  * `equations::Vector{Node{T}}`: The expressions discovered by the search, represented in a dominating Pareto frontier (i.e., the best expressions found for each complexity). `T` is equal to the element type of the passed data.\n  * `equation_strings::Vector{String}`: The expressions discovered by the search, represented as strings for easy inspection.\n\n# Report\n\nThe fields of `report(mach)` are:\n\n  * `best_idx::Int`: The index of the best expression in the Pareto frontier,  as determined by the `selection_method` function. 
Override in `predict` by passing  a named tuple with keys `data` and `idx`.\n  * `equations::Vector{Node{T}}`: The expressions discovered by the search, represented in a dominating Pareto frontier (i.e., the best expressions found for each complexity).\n  * `equation_strings::Vector{String}`: The expressions discovered by the search, represented as strings for easy inspection.\n  * `complexities::Vector{Int}`: The complexity of each expression in the Pareto frontier.\n  * `losses::Vector{L}`: The loss of each expression in the Pareto frontier, according to the loss function specified in the model. The type `L` is the loss type, which is usually the same as the element type of data passed (i.e., `T`), but can differ if complex data types are passed.\n  * `scores::Vector{L}`: A metric which considers both the complexity and loss of an expression, equal to the change in the log-loss divided by the change in complexity, relative to the previous expression along the Pareto frontier. A larger score aims to indicate an expression is more likely to be the true expression generating the data, but this is very problem-dependent and generally several other factors should be considered.\n\n# Examples\n\n```julia\nusing MLJ\nSRRegressor = @load SRRegressor pkg=SymbolicRegression\nX, y = @load_boston\nmodel = SRRegressor(binary_operators=[+, -, *], unary_operators=[exp], niterations=100)\nmach = machine(model, X, y)\nfit!(mach)\ny_hat = predict(mach, X)\n# View the equation used:\nr = report(mach)\nprintln(\"Equation used:\", r.equation_strings[r.best_idx])\n```\n\nWith units and variable names:\n\n```julia\nusing MLJ\nusing DynamicQuantities\nSRegressor = @load SRRegressor pkg=SymbolicRegression\n\nX = (; x1=rand(32) .* us\"km/h\", x2=rand(32) .* us\"km\")\ny = @. 
X.x2 / X.x1 + 0.5us\"h\"\nmodel = SRRegressor(binary_operators=[+, -, *, /])\nmach = machine(model, X, y)\nfit!(mach)\ny_hat = predict(mach, X)\n# View the equation used:\nr = report(mach)\nprintln(\"Equation used:\", r.equation_strings[r.best_idx])\n```\n\nSee also [`MultitargetSRRegressor`](@ref).\n"
+":docstring" = "```\nSRRegressor\n```\n\nA model type for constructing a Symbolic Regression via Evolutionary Search, based on [SymbolicRegression.jl](https://github.com/MilesCranmer/SymbolicRegression.jl), and implementing the MLJ model interface.\n\nFrom MLJ, the type can be imported using\n\n```\nSRRegressor = @load SRRegressor pkg=SymbolicRegression\n```\n\nDo `model = SRRegressor()` to construct an instance with default hyper-parameters. Provide keyword arguments to override hyper-parameter defaults, as in `SRRegressor(binary_operators=...)`.\n\nSingle-target Symbolic Regression regressor (`SRRegressor`) searches for symbolic expressions that predict a single target variable from a set of input variables. All data is assumed to be `Continuous`. The search is performed using an evolutionary algorithm. This algorithm is described in the paper https://arxiv.org/abs/2305.01582.\n\n# Training data\n\nIn MLJ or MLJBase, bind an instance `model` to data with\n\n```\nmach = machine(model, X, y)\n```\n\nOR\n\n```\nmach = machine(model, X, y, w)\n```\n\nHere:\n\n  * `X` is any table of input features (e.g., a `DataFrame`) whose columns are of scitype `Continuous`; check column scitypes with `schema(X)`. Variable names in discovered expressions will be taken from the column names of `X`, if available. Units in columns of `X` (use `DynamicQuantities` for units) will trigger dimensional analysis to be used.\n  * `y` is the target, which can be any `AbstractVector` whose element scitype is   `Continuous`; check the scitype with `scitype(y)`. Units in `y` (use `DynamicQuantities`   for units) will trigger dimensional analysis to be used.\n  * `w` is the observation weights, which can be either `nothing` (default) or an `AbstractVector` whose element scitype is `Count` or `Continuous`.\n\nTrain the machine using `fit!(mach)`, inspect the discovered expressions with `report(mach)`, and predict on new data with `predict(mach, Xnew)`. 
Note that unlike other regressors, symbolic regression stores a list of trained models. The model chosen from this list is determined by the `selection_method` keyword argument, which by default balances accuracy and complexity. You can override this at prediction time by passing a named tuple with keys `data` and `idx`.\n\n# Hyper-parameters\n\n  * `binary_operators`: Vector of binary operators (functions) to use.   Each operator should be defined for two input scalars,   and one output scalar. All operators   need to be defined over the entire real line (excluding infinity - these   are stopped before they are input), or return `NaN` where not defined.   For speed, define it so it takes two reals   of the same type as input, and outputs the same type. For the SymbolicUtils   simplification backend, you will need to define a generic method of the   operator so it takes arbitrary types.\n  * `unary_operators`: Same, but for   unary operators (one input scalar, gives an output scalar).\n  * `constraints`: Array of pairs specifying size constraints   for each operator. The constraints for a binary operator should be a 2-tuple   (e.g., `(-1, -1)`) and the constraints for a unary operator should be an `Int`.   A size constraint is a limit to the size of the subtree   in each argument of an operator. e.g., `[(^)=>(-1, 3)]` means that the   `^` operator can have arbitrary size (`-1`) in its left argument,   but a maximum size of `3` in its right argument. Default is   no constraints.\n  * `batching`: Whether to evolve based on small mini-batches of data,   rather than the entire dataset.\n  * `batch_size`: What batch size to use if using batching.\n  * `elementwise_loss`: What elementwise loss function to use. Can be one of   the following losses, or any other loss of type   `SupervisedLoss`. You can also pass a function that takes   a scalar target (left argument), and scalar predicted (right   argument), and returns a scalar. 
This will be averaged   over the predicted data. If weights are supplied, your   function should take a third argument for the weight scalar.   Included losses:       Regression:           - `LPDistLoss{P}()`,           - `L1DistLoss()`,           - `L2DistLoss()` (mean square),           - `LogitDistLoss()`,           - `HuberLoss(d)`,           - `L1EpsilonInsLoss(ϵ)`,           - `L2EpsilonInsLoss(ϵ)`,           - `PeriodicLoss(c)`,           - `QuantileLoss(τ)`,       Classification:           - `ZeroOneLoss()`,           - `PerceptronLoss()`,           - `L1HingeLoss()`,           - `SmoothedL1HingeLoss(γ)`,           - `ModifiedHuberLoss()`,           - `L2MarginLoss()`,           - `ExpLoss()`,           - `SigmoidLoss()`,           - `DWDMarginLoss(q)`.\n  * `loss_function`: Alternatively, you may redefine the loss used   as any function of `tree::Node{T}`, `dataset::Dataset{T}`,   and `options::Options`, so long as you output a non-negative   scalar of type `T`. This is useful if you want to use a loss   that takes into account derivatives, or correlations across   the dataset. This also means you could use a custom evaluation   for a particular expression. If you are using   `batching=true`, then your function should   accept a fourth argument `idx`, which is either `nothing`   (indicating that the full dataset should be used), or a vector   of indices to use for the batch.   
For example,\n\n    ```\n      function my_loss(tree, dataset::Dataset{T,L}, options)::L where {T,L}\n          prediction, flag = eval_tree_array(tree, dataset.X, options)\n          if !flag\n              return L(Inf)\n          end\n          return sum((prediction .- dataset.y) .^ 2) / dataset.n\n      end\n    ```\n  * `populations`: How many populations of equations to use.\n  * `population_size`: How many equations in each population.\n  * `ncycles_per_iteration`: How many generations to consider per iteration.\n  * `tournament_selection_n`: Number of expressions considered in each tournament.\n  * `tournament_selection_p`: The fittest expression in a tournament is to be   selected with probability `p`, the next fittest with probability `p*(1-p)`,   and so forth.\n  * `topn`: Number of equations to return to the host process, and to   consider for the hall of fame.\n  * `complexity_of_operators`: What complexity should be assigned to each operator,   and the occurrence of a constant or variable. By default, this is 1   for all operators. Can be a real number as well, in which case   the complexity of an expression will be rounded to the nearest integer.   Input this in the form of, e.g., [(^) => 3, sin => 2].\n  * `complexity_of_constants`: What complexity should be assigned to use of a constant.   By default, this is 1.\n  * `complexity_of_variables`: What complexity should be assigned to each variable.   By default, this is 1.\n  * `alpha`: The probability of accepting an equation mutation   during regularized evolution is given by exp(-delta_loss/(alpha * T)),   where T goes from 1 to 0. 
Thus, alpha=infinite is the same as no annealing.\n  * `maxsize`: Maximum size of equations during the search.\n  * `maxdepth`: Maximum depth of equations during the search, by default   this is set equal to the maxsize.\n  * `parsimony`: A multiplicative factor for how much complexity is   punished.\n  * `dimensional_constraint_penalty`: An additive factor if the dimensional   constraint is violated.\n  * `use_frequency`: Whether to use a parsimony that adapts to the   relative proportion of equations at each complexity; this will   ensure that there are a balanced number of equations considered   for every complexity.\n  * `use_frequency_in_tournament`: Whether to use the adaptive parsimony described   above inside the score, rather than just at the mutation accept/reject stage.\n  * `adaptive_parsimony_scaling`: How much to scale the adaptive parsimony term   in the loss. Increase this if the search is spending too much time   optimizing the most complex equations.\n  * `turbo`: Whether to use `LoopVectorization.@turbo` to evaluate expressions.   This can be significantly faster, but is only compatible with certain   operators. *Experimental!*\n  * `migration`: Whether to migrate equations between processes.\n  * `hof_migration`: Whether to migrate equations from the hall of fame   to processes.\n  * `fraction_replaced`: What fraction of each population to replace with   migrated equations at the end of each cycle.\n  * `fraction_replaced_hof`: What fraction to replace with hall of fame   equations at the end of each cycle.\n  * `should_simplify`: Whether to simplify equations. If you   pass a custom objective, this will be set to `false`.\n  * `should_optimize_constants`: Whether to use an optimization algorithm   to periodically optimize constants in equations.\n  * `optimizer_nrestarts`: How many different random starting positions to consider   for optimization of constants.\n  * `optimizer_algorithm`: Select algorithm to use for optimizing constants. 
Default   is \"BFGS\", but \"NelderMead\" is also supported.\n  * `optimizer_options`: General options for the constant optimization. For details   we refer to the documentation on `Optim.Options` from the `Optim.jl` package.   Options can be provided here as `NamedTuple`, e.g. `(iterations=16,)`, as a   `Dict`, e.g. Dict(:x_tol => 1.0e-32,), or as an `Optim.Options` instance.\n  * `output_file`: What file to store equations to, as a backup.\n  * `perturbation_factor`: When mutating a constant, either   multiply or divide by (1+perturbation_factor)^(rand()+1).\n  * `probability_negate_constant`: Probability of negating a constant in the equation   when mutating it.\n  * `mutation_weights`: Relative probabilities of the mutations. The struct   `MutationWeights` should be passed to these options.   See its documentation on `MutationWeights` for the different weights.\n  * `crossover_probability`: Probability of performing crossover.\n  * `annealing`: Whether to use simulated annealing.\n  * `warmup_maxsize_by`: Whether to slowly increase the max size from 5 up to   `maxsize`. If nonzero, specifies the fraction through the search   at which the maxsize should be reached.\n  * `verbosity`: Whether to print debugging statements or   not.\n  * `print_precision`: How many digits to print when printing   equations. By default, this is 5.\n  * `save_to_file`: Whether to save equations to a file during the search.\n  * `bin_constraints`: See `constraints`. This is the same, but specified for binary   operators only (for example, if you have an operator that is both a binary   and unary operator).\n  * `una_constraints`: Likewise, for unary operators.\n  * `seed`: What random seed to use. `nothing` uses no seed.\n  * `progress`: Whether to use a progress bar output (`verbosity` will   have no effect).\n  * `early_stop_condition`: Float - whether to stop early if the mean loss gets below this value.   
Function - a function taking (loss, complexity) as arguments and returning true or false.\n  * `timeout_in_seconds`: Float64 - the time in seconds after which to exit (as an alternative to the number of iterations).\n  * `max_evals`: Int (or Nothing) - the maximum number of evaluations of expressions to perform.\n  * `skip_mutation_failures`: Whether to simply skip over mutations that fail or are rejected, rather than to replace the mutated   expression with the original expression and proceed normally.\n  * `enable_autodiff`: Whether to enable automatic differentiation functionality. This is turned off by default.   If turned on, this will be turned off if one of the operators does not have well-defined gradients.\n  * `nested_constraints`: Specifies how many times a combination of operators can be nested. For example,   `[sin => [cos => 0], cos => [cos => 2]]` specifies that `cos` may never appear within a `sin`,   but `sin` can be nested with itself an unlimited number of times. The second term specifies that `cos`   can be nested up to 2 times within a `cos`, so that `cos(cos(cos(x)))` is allowed (as well as any combination   of `+` or `-` within it), but `cos(cos(cos(cos(x))))` is not allowed. When an operator is not specified,   it is assumed that it can be nested an unlimited number of times. This requires that there is no operator   which is used both in the unary operators and the binary operators (e.g., `-` could be both subtract, and negation).   For binary operators, both arguments are treated the same way, and the max of each argument is constrained.\n  * `deterministic`: Use a global counter for the birth time, rather than calls to `time()`. This gives   perfect resolution, and is therefore deterministic. However, it is not thread safe, and must be used   in serial mode.\n  * `define_helper_functions`: Whether to define helper functions   for constructing and evaluating trees.\n  * `niterations::Int=10`: The number of iterations to perform the search. 
  More iterations will improve the results.\n  * `parallelism=:multithreading`: What parallelism mode to use.   The options are `:multithreading`, `:multiprocessing`, and `:serial`.   By default, multithreading will be used. Multithreading uses less memory,   but multiprocessing can handle multi-node compute. If using `:multithreading`   mode, the number of threads available to Julia are used. If using   `:multiprocessing`, `numprocs` processes will be created dynamically if   `procs` is unset. If you have already allocated processes, pass them   to the `procs` argument and they will be used.   You may also pass a string instead of a symbol, like `\"multithreading\"`.\n  * `numprocs::Union{Int, Nothing}=nothing`:  The number of processes to use,   if you want `equation_search` to set this up automatically. By default   this will be `4`, but can be any number (you should pick a number <=   the number of cores available).\n  * `procs::Union{Vector{Int}, Nothing}=nothing`: If you have set up   a distributed run manually with `procs = addprocs()` and `@everywhere`,   pass the `procs` to this keyword argument.\n  * `addprocs_function::Union{Function, Nothing}=nothing`: If using multiprocessing   (`parallelism=:multiprocessing`) and are not passing `procs` manually,   then they will be allocated dynamically using `addprocs`. However,   you may also pass a custom function to use instead of `addprocs`.   This function should take a single positional argument,   which is the number of processes to use, as well as the `lazy` keyword argument.   For example, if set up on a slurm cluster, you could pass   `addprocs_function = addprocs_slurm`, which will set up slurm processes.\n  * `heap_size_hint_in_bytes::Union{Int,Nothing}=nothing`: On Julia 1.9+, you may set the `--heap-size-hint`   flag on Julia processes, recommending garbage collection once a process   is close to the recommended size. 
This is important for long-running distributed   jobs where each process has an independent memory, and can help avoid   out-of-memory errors. By default, this is set to `Sys.free_memory() / numprocs`.\n  * `runtests::Bool=true`: Whether to run (quick) tests before starting the   search, to see if there will be any problems during the equation search   related to the host environment.\n  * `loss_type::Type=Nothing`: If you would like to use a different type   for the loss than for the data you passed, specify the type here.   Note that if you pass complex data `::Complex{L}`, then the loss   type will automatically be set to `L`.\n  * `selection_method::Function`: Function to select an expression from   the Pareto frontier for use in `predict`.   See `SymbolicRegression.MLJInterfaceModule.choose_best` for an example.   This function should return a single integer specifying   the index of the expression to use. By default, this maximizes   the score (a pound-for-pound rating) of expressions reaching the threshold   of 1.5x the minimum loss. To override this at prediction time, you can pass   a named tuple with keys `data` and `idx` to `predict`. See the Operations   section for details.\n  * `dimensions_type::AbstractDimensions`: The type of dimensions to use when storing   the units of the data. By default this is `DynamicQuantities.SymbolicDimensions`.\n\n# Operations\n\n  * `predict(mach, Xnew)`: Return predictions of the target given features `Xnew`, which should have the same scitype as `X` above. The expression used for prediction is defined by the `selection_method` function, which can be seen by viewing `report(mach).best_idx`.\n  * `predict(mach, (data=Xnew, idx=i))`: Return predictions of the target given features `Xnew`, which should have the same scitype as `X` above. 
By passing a named tuple with keys `data` and `idx`, you are able to specify the equation you wish to evaluate in `idx`.\n\n# Fitted parameters\n\nThe fields of `fitted_params(mach)` are:\n\n  * `best_idx::Int`: The index of the best expression in the Pareto frontier,  as determined by the `selection_method` function. Override in `predict` by passing   a named tuple with keys `data` and `idx`.\n  * `equations::Vector{Node{T}}`: The expressions discovered by the search, represented in a dominating Pareto frontier (i.e., the best expressions found for each complexity). `T` is equal to the element type of the passed data.\n  * `equation_strings::Vector{String}`: The expressions discovered by the search, represented as strings for easy inspection.\n\n# Report\n\nThe fields of `report(mach)` are:\n\n  * `best_idx::Int`: The index of the best expression in the Pareto frontier,  as determined by the `selection_method` function. Override in `predict` by passing  a named tuple with keys `data` and `idx`.\n  * `equations::Vector{Node{T}}`: The expressions discovered by the search, represented in a dominating Pareto frontier (i.e., the best expressions found for each complexity).\n  * `equation_strings::Vector{String}`: The expressions discovered by the search, represented as strings for easy inspection.\n  * `complexities::Vector{Int}`: The complexity of each expression in the Pareto frontier.\n  * `losses::Vector{L}`: The loss of each expression in the Pareto frontier, according to the loss function specified in the model. The type `L` is the loss type, which is usually the same as the element type of data passed (i.e., `T`), but can differ if complex data types are passed.\n  * `scores::Vector{L}`: A metric which considers both the complexity and loss of an expression, equal to the change in the log-loss divided by the change in complexity, relative to the previous expression along the Pareto frontier. 
A larger score aims to indicate an expression is more likely to be the true expression generating the data, but this is very problem-dependent and generally several other factors should be considered.\n\n# Examples\n\n```julia\nusing MLJ\nSRRegressor = @load SRRegressor pkg=SymbolicRegression\nX, y = @load_boston\nmodel = SRRegressor(binary_operators=[+, -, *], unary_operators=[exp], niterations=100)\nmach = machine(model, X, y)\nfit!(mach)\ny_hat = predict(mach, X)\n# View the equation used:\nr = report(mach)\nprintln(\"Equation used:\", r.equation_strings[r.best_idx])\n```\n\nWith units and variable names:\n\n```julia\nusing MLJ\nusing DynamicQuantities\nSRRegressor = @load SRRegressor pkg=SymbolicRegression\n\nX = (; x1=rand(32) .* us\"km/h\", x2=rand(32) .* us\"km\")\ny = @. X.x2 / X.x1 + 0.5us\"h\"\nmodel = SRRegressor(binary_operators=[+, -, *, /])\nmach = machine(model, X, y)\nfit!(mach)\ny_hat = predict(mach, X)\n# View the equation used:\nr = report(mach)\nprintln(\"Equation used:\", r.equation_strings[r.best_idx])\n```\n\nSee also [`MultitargetSRRegressor`](@ref).\n"
 ":name" = "SRRegressor"
 ":human_name" = "Symbolic Regression via Evolutionary Search"
 ":is_supervised" = "`true`"
 ":prediction_type" = ":deterministic"
 ":abstract_type" = "`MLJModelInterface.Deterministic`"
 ":implemented_methods" = []
-":hyperparameters" = "`(:binary_operators, :unary_operators, :constraints, :elementwise_loss, :loss_function, :tournament_selection_n, :tournament_selection_p, :topn, :complexity_of_operators, :complexity_of_constants, :complexity_of_variables, :parsimony, :dimensional_constraint_penalty, :dimensionless_constants_only, :alpha, :maxsize, :maxdepth, :turbo, :bumper, :migration, :hof_migration, :should_simplify, :should_optimize_constants, :output_file, :node_type, :populations, :perturbation_factor, :annealing, :batching, :batch_size, :mutation_weights, :crossover_probability, :warmup_maxsize_by, :use_frequency, :use_frequency_in_tournament, :adaptive_parsimony_scaling, :population_size, :ncycles_per_iteration, :fraction_replaced, :fraction_replaced_hof, :verbosity, :print_precision, :save_to_file, :probability_negate_constant, :seed, :bin_constraints, :una_constraints, :progress, :terminal_width, :optimizer_algorithm, :optimizer_nrestarts, :optimizer_probability, :optimizer_iterations, :optimizer_f_calls_limit, :optimizer_options, :use_recorder, :recorder_file, :early_stop_condition, :timeout_in_seconds, :max_evals, :skip_mutation_failures, :nested_constraints, :deterministic, :define_helper_functions, :fast_cycle, :npopulations, :npop, :niterations, :parallelism, :numprocs, :procs, :addprocs_function, :heap_size_hint_in_bytes, :runtests, :loss_type, :selection_method, :dimensions_type)`"
-":hyperparameter_types" = "`(\"Any\", \"Any\", \"Any\", \"Union{Nothing, Function, LossFunctions.Traits.SupervisedLoss}\", \"Union{Nothing, Function}\", \"Integer\", \"Real\", \"Integer\", \"Any\", \"Union{Nothing, Real}\", \"Union{Nothing, Real, AbstractVector}\", \"Real\", \"Union{Nothing, Real}\", \"Bool\", \"Real\", \"Integer\", \"Union{Nothing, Integer}\", \"Bool\", \"Bool\", \"Bool\", \"Bool\", \"Union{Nothing, Bool}\", \"Bool\", \"Union{Nothing, AbstractString}\", \"Type\", \"Integer\", \"Real\", \"Bool\", \"Bool\", \"Integer\", \"Union{SymbolicRegression.CoreModule.MutationWeightsModule.MutationWeights, NamedTuple, AbstractVector}\", \"Real\", \"Real\", \"Bool\", \"Bool\", \"Real\", \"Integer\", \"Integer\", \"Real\", \"Real\", \"Union{Nothing, Integer}\", \"Integer\", \"Bool\", \"Real\", \"Any\", \"Any\", \"Any\", \"Union{Nothing, Bool}\", \"Union{Nothing, Integer}\", \"Union{AbstractString, Optim.AbstractOptimizer}\", \"Integer\", \"Real\", \"Union{Nothing, Integer}\", \"Union{Nothing, Integer}\", \"Union{Nothing, Dict, NamedTuple, Optim.Options}\", \"Bool\", \"AbstractString\", \"Union{Nothing, Function, Real}\", \"Union{Nothing, Real}\", \"Union{Nothing, Integer}\", \"Bool\", \"Any\", \"Bool\", \"Bool\", \"Bool\", \"Union{Nothing, Integer}\", \"Union{Nothing, Integer}\", \"Int64\", \"Symbol\", \"Union{Nothing, Int64}\", \"Union{Nothing, Vector{Int64}}\", \"Union{Nothing, Function}\", \"Union{Nothing, Integer}\", \"Bool\", \"Any\", \"Function\", \"Type{D} where D<:DynamicQuantities.AbstractDimensions\")`"
-":hyperparameter_ranges" = "`(nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing)`"
+":hyperparameters" = "`(:binary_operators, :unary_operators, :constraints, :elementwise_loss, :loss_function, :tournament_selection_n, :tournament_selection_p, :topn, :complexity_of_operators, :complexity_of_constants, :complexity_of_variables, :parsimony, :dimensional_constraint_penalty, :alpha, :maxsize, :maxdepth, :turbo, :migration, :hof_migration, :should_simplify, :should_optimize_constants, :output_file, :populations, :perturbation_factor, :annealing, :batching, :batch_size, :mutation_weights, :crossover_probability, :warmup_maxsize_by, :use_frequency, :use_frequency_in_tournament, :adaptive_parsimony_scaling, :population_size, :ncycles_per_iteration, :fraction_replaced, :fraction_replaced_hof, :verbosity, :print_precision, :save_to_file, :probability_negate_constant, :seed, :bin_constraints, :una_constraints, :progress, :terminal_width, :optimizer_algorithm, :optimizer_nrestarts, :optimizer_probability, :optimizer_iterations, :optimizer_options, :val_recorder, :recorder_file, :early_stop_condition, :timeout_in_seconds, :max_evals, :skip_mutation_failures, :enable_autodiff, :nested_constraints, :deterministic, :define_helper_functions, :fast_cycle, :npopulations, :npop, :niterations, :parallelism, :numprocs, :procs, :addprocs_function, :heap_size_hint_in_bytes, :runtests, :loss_type, :selection_method, :dimensions_type)`"
+":hyperparameter_types" = "`(\"Any\", \"Any\", \"Any\", \"Union{Nothing, Function, LossFunctions.Traits.SupervisedLoss}\", \"Union{Nothing, Function}\", \"Integer\", \"Real\", \"Integer\", \"Any\", \"Union{Nothing, Real}\", \"Union{Nothing, Real}\", \"Real\", \"Union{Nothing, Real}\", \"Real\", \"Integer\", \"Union{Nothing, Integer}\", \"Bool\", \"Bool\", \"Bool\", \"Union{Nothing, Bool}\", \"Bool\", \"Union{Nothing, AbstractString}\", \"Integer\", \"Real\", \"Bool\", \"Bool\", \"Integer\", \"Union{SymbolicRegression.CoreModule.OptionsStructModule.MutationWeights, NamedTuple, AbstractVector}\", \"Real\", \"Real\", \"Bool\", \"Bool\", \"Real\", \"Integer\", \"Integer\", \"Real\", \"Real\", \"Union{Nothing, Integer}\", \"Integer\", \"Bool\", \"Real\", \"Any\", \"Any\", \"Any\", \"Union{Nothing, Bool}\", \"Union{Nothing, Integer}\", \"AbstractString\", \"Integer\", \"Real\", \"Union{Nothing, Integer}\", \"Union{Nothing, Dict, NamedTuple, Optim.Options}\", \"Val\", \"AbstractString\", \"Union{Nothing, Function, Real}\", \"Union{Nothing, Real}\", \"Union{Nothing, Integer}\", \"Bool\", \"Bool\", \"Any\", \"Bool\", \"Bool\", \"Bool\", \"Union{Nothing, Integer}\", \"Union{Nothing, Integer}\", \"Int64\", \"Symbol\", \"Union{Nothing, Int64}\", \"Union{Nothing, Vector{Int64}}\", \"Union{Nothing, Function}\", \"Union{Nothing, Integer}\", \"Bool\", \"Any\", \"Function\", \"Type{D} where D<:DynamicQuantities.AbstractDimensions\")`"
+":hyperparameter_ranges" = "`(nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing)`"
 ":iteration_parameter" = "`nothing`"
 ":supports_training_losses" = "`false`"
 ":reports_feature_importances" = "`false`"

From 967ee85df08f1902ed23e7d6f91df3453176789c Mon Sep 17 00:00:00 2001
From: "Anthony D. Blaom" <anthony.blaom@gmail.com>
Date: Fri, 2 Aug 2024 11:05:02 +1200
Subject: [PATCH 2/2] bump 0.17.4

---
 Project.toml     | 2 +-
 src/MLJModels.jl | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/Project.toml b/Project.toml
index 5babd59..13262a4 100644
--- a/Project.toml
+++ b/Project.toml
@@ -1,7 +1,7 @@
 name = "MLJModels"
 uuid = "d491faf4-2d78-11e9-2867-c94bc002c0b7"
 authors = ["Anthony D. Blaom <anthony.blaom@gmail.com>"]
-version = "0.17.3"
+version = "0.17.4"
 
 [deps]
 CategoricalArrays = "324d7699-5711-5eae-9e2f-1d82baa6b597"
diff --git a/src/MLJModels.jl b/src/MLJModels.jl
index daad124..36b2c1a 100755
--- a/src/MLJModels.jl
+++ b/src/MLJModels.jl
@@ -1,4 +1,4 @@
-module MLJModels 
+module MLJModels
 
 import MLJModelInterface
 import MLJModelInterface: Model, metadata_pkg, metadata_model, @mlj_model, info,