
Make it easier to select expression from Pareto front for evaluation #289

Merged: 9 commits merged from fix-selection-method into master on Feb 17, 2024

Conversation

@MilesCranmer (Owner) commented Feb 17, 2024

This enables the following syntax:

model = SRRegressor()
mach = machine(model, X, y)
fit!(mach)

# Predict with 3rd equation:
predict(mach, (data=X, idx=3))

# Predict with most complex equation:
r = report(mach)
predict(mach, (data=X, idx=lastindex(r.equations)))

which lets the user specify the equation they wish to use for prediction from the Pareto front.

For multiple outputs:

model = MultitargetSRRegressor()
mach = machine(model, X, y)
fit!(mach)


# Choose the 1st equation for output 1, 10th for output 2, and 5th for output 3:
predict(mach, (data=X, idx=[1, 10, 5]))

TODO:


@ablaom I would be interested to know whether there is any way to make this sort of behavior compatible with MLJ. As it stands, I think this might break some MLJ interfaces for users who wish to use it. (They would still be able to use the selection_method parameter, but only before fit!.)
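For reference, a minimal sketch of that pre-fit route. The keyword signature of the selection function (losses, etc.) is assumed here for illustration and is not taken from this PR; check the SRRegressor docstring for the exact contract.

# Sketch only: selection_method is a model hyperparameter, fixed at
# construction time, so it only affects plain predict(mach, X) calls.
select_lowest_loss(; losses, kws...) = argmin(losses)  # assumed keywords

model = SRRegressor(; selection_method = select_lowest_loss)
mach = machine(model, X, y)
fit!(mach)

predict(mach, X)  # uses the selection_method fixed above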

github-actions bot (Contributor) commented Feb 17, 2024

Benchmark Results

| Benchmark | master | b5d56e0... | t[master]/t[b5d56e0...] |
|---|---|---|---|
| search/multithreading | 24 ± 0.19 s | 24.4 ± 1.8 s | 0.985 |
| search/serial | 33.8 ± 0.62 s | 33.9 ± 1.2 s | 0.995 |
| utils/best_of_sample | 1.44 ± 0.58 μs | 1.36 ± 0.49 μs | 1.06 |
| utils/check_constraints_x10 | 12.9 ± 3.4 μs | 12.6 ± 3.4 μs | 1.02 |
| utils/compute_complexity_x10/Float64 | 2.25 ± 0.11 μs | 2.27 ± 0.13 μs | 0.992 |
| utils/compute_complexity_x10/Int64 | 2.26 ± 0.12 μs | 2.27 ± 0.12 μs | 1 |
| utils/compute_complexity_x10/nothing | 1.44 ± 0.1 μs | 1.44 ± 0.13 μs | 0.999 |
| utils/optimize_constants_x10 | 31.3 ± 6.8 ms | 30.6 ± 7.1 ms | 1.02 |
| time_to_load | 1.37 ± 0.052 s | 1.39 ± 0.056 s | 0.983 |

Benchmark Plots

A plot of the benchmark results has been uploaded as an artifact to the workflow run for this PR.
Go to "Actions"->"Benchmark a pull request"->[the most recent run]->"Artifacts" (at the bottom).

@MilesCranmer merged commit ccea30f into master on Feb 17, 2024 (17 checks passed)
@MilesCranmer deleted the fix-selection-method branch on February 17, 2024 at 20:52
MilesCranmer added a commit that referenced this pull request Feb 19, 2024
[Diff since v0.23.1](v0.23.1...v0.23.2)

**Merged pull requests:**
- Formatting overhaul (#278) (@MilesCranmer)
- Avoid julia-formatter on pre-commit.ci (#279) (@MilesCranmer)
- Make it easier to select expression from Pareto front for evaluation (#289) (@MilesCranmer)

**Closed issues:**
- Garbage collection too passive on worker processes (#237)
- How can I set the maximum number of nests? (#285)
@ablaom commented Feb 21, 2024

Sorry for not catching this earlier. This looks very breaking to me. In MLJ you cannot attach metadata to data at predict time.

@MilesCranmer (Owner, Author) commented Feb 21, 2024

I wasn't sure of the proper way to incorporate this. SRRegressor basically fits an ensemble of models (one per equation), and the user needs an interface to specify which model they wish to evaluate. The default behavior (predict(mach, X)) uses some selection method for picking from the ensemble, but I wasn't sure how to let the user decide themselves.
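To make the contrast concrete, here is a small sketch of the two paths, using the names from the PR description above (r.equations comes from report(mach)):

r = report(mach)

# Default path: the model's selection_method picks one equation from
# the fitted Pareto front.
yhat_default = predict(mach, X)

# Path added by this PR: the user names the equation index explicitly.
yhat_third = predict(mach, (data=X, idx=3))
yhat_last  = predict(mach, (data=X, idx=lastindex(r.equations)))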

@ablaom commented Feb 22, 2024

Something similar was encountered in hierarchical clustering. In that case we wanted the user to choose, after fitting, the height of the dendrogram. There was quite a bit of discussion and the best we came up with was to return a method in the report to let the user predict with a user-specified height. (A hyperparameter controls the default height for the ordinary predict).
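As an illustration of that pattern transplanted to this PR (purely hypothetical; predict_with_index is not an existing field of the SRRegressor report):

r = report(mach)

# Hypothetical report field: a closure captured at fit time that
# evaluates a chosen Pareto-front equation on new data.
yhat = r.predict_with_index(X, 3)  # evaluate the 3rd equation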

I do have the idea of allowing more flexible predict arguments in the LearnAPI.jl proposal, but I think it's difficult to justify in MLJ for a small number of use cases, as adding this is quite complicated.
