API Overhaul #187
Comments
1. Pretty much depends on how often users use
hall_of_fame, state = equation_search(dataset; options, return_state=true)
dominating = calculate_pareto_curve(hall_of_fame, dataset, options)
according to you, so it seems we should just have
dataframe, state = equation_search(dataset; options, return_state=true)
Side note: I don't like
On DataFrame
I strongly recommend you NOT to depend on
DataDrivenDiffEq and MLJ
Both are good ideas. For MLJ you just want to make an interface package and register it with the MLJ ecosystem; I don't know about the SciML convention. |
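For concreteness, here is a minimal sketch in plain Julia of what reducing a hall of fame to its dominating curve could look like. It assumes "dominating" means an equation has strictly lower loss than every simpler one. The named tuples and the `dominating_curve` helper are hypothetical stand-ins, not the package's types or the function quoted above.

```julia
# Hypothetical stand-in for hall-of-fame entries; the real package stores
# richer member objects than these named tuples.
candidates = [
    (equation = "x1",           complexity = 1, loss = 2.0),
    (equation = "x1 + x2",      complexity = 3, loss = 0.5),
    (equation = "x1 * x2",      complexity = 3, loss = 0.9),
    (equation = "cos(x1) + x2", complexity = 4, loss = 0.7),
    (equation = "x1 + x2 * x2", complexity = 5, loss = 0.1),
]

# Walk the entries from simplest to most complex and keep one only if its
# loss is strictly lower than the best loss seen so far.
function dominating_curve(entries)
    sorted = sort(entries; by = e -> (e.complexity, e.loss))
    kept = eltype(sorted)[]
    best_loss = Inf
    for e in sorted
        if e.loss < best_loss
            push!(kept, e)
            best_loss = e.loss
        end
    end
    return kept
end

dominating = dominating_curve(candidates)
for e in dominating
    println(e.complexity, "  ", e.loss, "  ", e.equation)
end
```

Because the result is an ordinary vector of rows, it could be handed to any table type without the core package depending on one.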
Good tips, thanks! Yes perhaps that is best. e.g., could return a single object More importantly, printing Then, one could pass either Then, there could be new lightweight frontends for MLJ and SciML. |
I am leaning towards an MLJ-style interface. I think the statefulness of the Regressor objects is nice for warm starts, and would be nice for plotting diagnostic info. This might take the form of some sort of extension package that would load if users also import MLJ.jl. |
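As a rough illustration of why a stateful model object helps with warm starts, here is a toy sketch in plain Julia. `ToyRegressor` and its `fit!` are hypothetical and fit a single slope by gradient descent; they are not the SymbolicRegression.jl or MLJ API, and only show the search state living inside the object.

```julia
# Hypothetical toy model; not the SymbolicRegression.jl or MLJ API.
mutable struct ToyRegressor
    niterations::Int   # gradient steps per call to fit!
    w::Float64         # fitted slope, kept between calls (the "state")
    total_steps::Int   # diagnostic info accumulated across fits
end
ToyRegressor(; niterations = 10) = ToyRegressor(niterations, 0.0, 0)

# Fit y ≈ w * x by gradient descent, resuming from the stored slope.
function fit!(m::ToyRegressor, x::Vector{Float64}, y::Vector{Float64}; lr = 0.01)
    for _ in 1:m.niterations
        grad = 2 * sum((m.w .* x .- y) .* x) / length(x)
        m.w -= lr * grad
    end
    m.total_steps += m.niterations
    return m
end

x = [1.0, 2.0, 3.0, 4.0]
y = 3.0 .* x

warm = ToyRegressor(niterations = 10)
fit!(warm, x, y)   # first fit
fit!(warm, x, y)   # warm start: continues from the stored slope

cold = ToyRegressor(niterations = 20)
fit!(cold, x, y)   # one longer fit from scratch
```

Two short fits on the same object end up where one long fit does, and the user never has to carry a separate state value around.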
I wonder if it should come with both a SciML interface (via ModelingToolkit.jl?), and an MLJ one. And the base interface defines internal types for an MLJ-style model setup. |
Drafted the following
When you see this in a REPL, the comments are printed in a light grey color. |
My 2c:
1/2
A 2-style interface using a
Regarding returning the Pareto curve or the entire HoF, how about having something like this:
hall_of_fame, state = equation_search_full(dataset, options)
dominating, state = equation_search(hall_of_fame, dataset, options)
dominating, state = equation_search(dataset, options)
With two
equation_search(dataset, options) =
equation_search(equation_search_full(dataset, options), dataset, options)
Alternatively, as has already come up,
full_result = equation_search_full(dataset, options)
dominating_result = equation_search(full_result)
dominating_result = equation_search(dataset, options)
Actually, if you made the
3
This sounds like maybe it could be good as a package extension to DataDrivenDiffEq.
4
This sounds like it could be a good package extension to have here. I don't think MLJ interface packages make as much sense now we have package extensions. |
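The "full result, then dominating result" variant in point 1/2 above can be sketched with multiple dispatch. Everything here is hypothetical: `ToyDataset`, `ToyOptions`, `FullResult`, and the fixed candidate list stand in for the real search, and the function names only mirror the ones proposed in the comment.

```julia
# Hypothetical toy types; none of these are the package's real API.
const Member = NamedTuple{(:equation, :complexity, :loss), Tuple{String, Int, Float64}}

struct ToyDataset
    X::Matrix{Float64}
    y::Vector{Float64}
end

struct ToyOptions
    maxsize::Int
end

struct FullResult
    members::Vector{Member}
end

# Stand-in for the expensive search: returns a fixed candidate list,
# truncated to the allowed complexity.
function equation_search_full(dataset::ToyDataset, options::ToyOptions)
    members = Member[
        (equation = "x1",           complexity = 1, loss = 2.0),
        (equation = "x1 + x2",      complexity = 3, loss = 0.5),
        (equation = "x1 * x2",      complexity = 3, loss = 0.9),
        (equation = "x1 + x2 * x2", complexity = 5, loss = 0.1),
    ]
    return FullResult(filter(m -> m.complexity <= options.maxsize, members))
end

# Method 1: reduce an existing full result to its dominating curve.
function equation_search(full::FullResult)
    kept = Member[]
    best_loss = Inf
    for m in sort(full.members; by = m -> (m.complexity, m.loss))
        if m.loss < best_loss
            push!(kept, m)
            best_loss = m.loss
        end
    end
    return kept
end

# Method 2: run the search and reduce it in one call.
equation_search(dataset::ToyDataset, options::ToyOptions) =
    equation_search(equation_search_full(dataset, options))

dataset = ToyDataset(rand(2, 10), rand(10))
options = ToyOptions(3)
full_result = equation_search_full(dataset, options)
dominating_a = equation_search(full_result)
dominating_b = equation_search(dataset, options)
```

Users who only want the dominating curve make one call, while the full result stays reachable for anyone who needs it.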
Speaking of 4., I have an attempt here: #226. Indeed I think it makes the most sense to put it in an extension. I like your ideas for 1-2. I’ll think more about this. |
Moving to mid-importance now that the MLJ interface has matured. Remaining API changes would be to improve the low-level interface. |
(Finished a while ago) |
Nice! |
In either version 0.16 or 0.17 of SymbolicRegression.jl, I would like to do a big API overhaul, to make the package easier to use on the Julia side. I think the current API has stuck around too long and needs to be cleaned up a bit. The PySR frontend is a bit more developed and users seem to find it easy to use, so I'd like to do the same thing here.
I have a few different ideas but I'm interested in hearing opinions from all users with interests in this package, as I'm not sure the best way forward. The API should make SymbolicRegression.jl: (1) easier to use, and (2) easier to interface with other tools.
1. Maintain API, with a few tweaks
This is basically just renaming the current API to PascalCase for types, snake_case for functions/parameters. This would return a HallOfFame object, and a state that the user can pass back to equation_search to continue the search. However, I'm not a big fan of this because it requires the user to go figure out the different types and make a few different calls which they would need to find from the docs.
2. Return a DataFrames.DataFrame object
This is similar to 1, but the returned object would be a DataFrame object from the DataFrames.jl package, with columns: [:equation, :complexity, :loss, :score]. So it would be easier for the user to query. More importantly, I would only return the dominating Pareto curve, rather than the entire hall of fame (I doubt anybody wants the entire curve anyways). The user could sort and query this object as they please. It's not too much of an API change and could make it easier for users to use.
3. Tighten interface with DataDrivenDiffEq.jl
It might be nice to have a tighter interface with DataDrivenDiffEq.jl, which has its own frontend for SymbolicRegression.jl: https://docs.sciml.ai/DataDrivenDiffEq/stable/libs/datadrivensr/examples/example_01/, as well as some other algorithms. e.g., this could look like:
I'm not sure whether it makes sense to integrate with the API on the side of SymbolicRegression.jl though; maybe it's simpler to just have a simple and fully-general core API that others like DataDrivenDiffEq.jl can use in their unified APIs.
4. Integrate with MLJ.jl
It has been nice to integrate PySR with scikit-learn as it lets users stick it into existing sklearn tuning pipelines. Maybe MLJ.jl is the Julia version of that? e.g., something like
This might be nice to take advantage of the API developed in PySR, where, e.g., m(X, 2) would get predictions from the 2nd equation. It might also make it easier for users to restart fits, as they wouldn't need to move around a separate state object. I guess due to the similarities with scikit-learn, it might feel more automatic to users as well?
In general I think it's preferable to make it easy for users to look at the output equations and plot them (this is the major difference between symbolic regression and typical ML algorithms). Maybe some kind of plot(m::FittedSRRegressor, X, y) would be nice for plotting the different equations.
Please let me know what you think and any suggestions.
cc'ing anyone who might be interested. I am eager to hear your ideas! @AlCap23 @ChrisRackauckas @kazewong @johanbluecreek @CharFox1 @Jgmedina95 @patrick-kidger @Moelf @qwertyjl @Remotion @anicusan