
Make it easier to select expression from Pareto front for evaluation #289

Merged: 9 commits merged from fix-selection-method into master on Feb 17, 2024

Conversation

@MilesCranmer (Owner) commented Feb 17, 2024

This enables the following syntax:

model = SRRegressor()
mach = machine(model, X, y)
fit!(mach)

# Predict with 3rd equation:
predict(mach, (data=X, idx=3))

# Predict with most complex equation:
r = report(mach)
predict(mach, (data=X, idx=lastindex(r.equations)))

which lets the user specify the equation they wish to use for prediction from the Pareto front.

For multiple outputs:

model = MultitargetSRRegressor()
mach = machine(model, X, y)
fit!(mach)


# Choose the 1st equation for output 1, 10th for output 2, and 5th for output 3:
predict(mach, (data=X, idx=[1, 10, 5]))

TODO:


@ablaom I would be interested to know whether there is any way to make this sort of behavior compatible with MLJ. As it stands, I think this might break some MLJ interfaces for users who wish to use it. (They would still be able to use the selection_method parameter, but only before fit!.)
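For reference, a minimal sketch of that pre-fit route. The keyword signature of the selection function (losses, etc.) is assumed here for illustration and is not taken from this PR; check the SRRegressor docstring for the exact contract.

# Sketch only: selection_method is a model hyperparameter, fixed at
# construction time, so it only affects plain predict(mach, X) calls.
select_lowest_loss(; losses, kws...) = argmin(losses)  # assumed keywords

model = SRRegressor(; selection_method = select_lowest_loss)
mach = machine(model, X, y)
fit!(mach)

predict(mach, X)  # uses the selection_method fixed above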

github-actions bot (Contributor) commented Feb 17, 2024

Benchmark Results

| Benchmark | master | b5d56e0... | t[master]/t[b5d56e0...] |
|---|---|---|---|
| search/multithreading | 24 ± 0.19 s | 24.4 ± 1.8 s | 0.985 |
| search/serial | 33.8 ± 0.62 s | 33.9 ± 1.2 s | 0.995 |
| utils/best_of_sample | 1.44 ± 0.58 μs | 1.36 ± 0.49 μs | 1.06 |
| utils/check_constraints_x10 | 12.9 ± 3.4 μs | 12.6 ± 3.4 μs | 1.02 |
| utils/compute_complexity_x10/Float64 | 2.25 ± 0.11 μs | 2.27 ± 0.13 μs | 0.992 |
| utils/compute_complexity_x10/Int64 | 2.26 ± 0.12 μs | 2.27 ± 0.12 μs | 1 |
| utils/compute_complexity_x10/nothing | 1.44 ± 0.1 μs | 1.44 ± 0.13 μs | 0.999 |
| utils/optimize_constants_x10 | 31.3 ± 6.8 ms | 30.6 ± 7.1 ms | 1.02 |
| time_to_load | 1.37 ± 0.052 s | 1.39 ± 0.056 s | 0.983 |

Benchmark Plots

A plot of the benchmark results has been uploaded as an artifact to the workflow run for this PR.
Go to "Actions"->"Benchmark a pull request"->[the most recent run]->"Artifacts" (at the bottom).

@MilesCranmer merged commit ccea30f into master on Feb 17, 2024 (17 checks passed)
@MilesCranmer deleted the fix-selection-method branch on February 17, 2024 at 20:52
MilesCranmer added a commit that referenced this pull request Feb 19, 2024
[Diff since v0.23.1](v0.23.1...v0.23.2)

**Merged pull requests:**
- Formatting overhaul (#278) (@MilesCranmer)
- Avoid julia-formatter on pre-commit.ci (#279) (@MilesCranmer)
- Make it easier to select expression from Pareto front for evaluation (#289) (@MilesCranmer)

**Closed issues:**
- Garbage collection too passive on worker processes (#237)
- How can I set the maximum number of nests? (#285)
@ablaom commented Feb 21, 2024

Sorry for not catching this earlier. This looks very breaking to me. In MLJ you cannot attach metadata to data at predict time.

@MilesCranmer (Owner, Author) commented Feb 21, 2024

I wasn't sure of the proper way to incorporate this. SRRegressor basically fits an ensemble of models (one per equation), and the user needs an interface to specify which model they wish to evaluate. The default behavior (predict(mach, X)) uses some selection method for picking from the ensemble, but I wasn't sure how to let the user decide themselves.
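To make the contrast concrete, here is a small sketch of the two paths, using the names from the PR description above (r.equations comes from report(mach)):

r = report(mach)

# Default path: the model's selection_method picks one equation from
# the fitted Pareto front.
yhat_default = predict(mach, X)

# Path added by this PR: the user names the equation index explicitly.
yhat_third = predict(mach, (data=X, idx=3))
yhat_last  = predict(mach, (data=X, idx=lastindex(r.equations)))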

@ablaom commented Feb 22, 2024

Something similar was encountered in hierarchical clustering. In that case we wanted the user to choose, after fitting, the height of the dendrogram. There was quite a bit of discussion and the best we came up with was to return a method in the report to let the user predict with a user-specified height. (A hyperparameter controls the default height for the ordinary predict).
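As an illustration of that pattern transplanted to this PR (purely hypothetical; predict_with_index is not an existing field of the SRRegressor report):

r = report(mach)

# Hypothetical report field: a closure captured at fit time that
# evaluates a chosen Pareto-front equation on new data.
yhat = r.predict_with_index(X, 3)  # evaluate the 3rd equation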

I do have the idea of allowing more flexible predict arguments in the LearnAPI.jl proposal, but I think it's difficult to justify in MLJ for a small number of use cases, as adding this is quite complicated.
