Broaden MLJ target_scitype only when using TemplateExpression #392
base: master
Conversation
Force-pushed from 37e7021 to a394da2
Benchmark Results
Benchmark Plots: A plot of the benchmark results has been uploaded as an artifact to the workflow run for this PR.
Pull Request Test Coverage Report for Build 12509324821
💛 - Coveralls
Force-pushed from 643f610 to 0a916a2
Thanks for this. Unfortunately, I'm getting a new problem. It looks like certain kinds of table are not supported, and training is extraordinarily slow.

```julia
using SymbolicRegression
using MLJBase
using Tables

M = SRRegressor
X = (x1 = rand(5), x2 = rand(5))
y = rand(5)
mach = machine(M(), X, y)
@time fit!(mach, verbosity=0) # second attempt
# 6.551545 seconds (209.99 M allocations: 10.681 GiB, 39.67% gc time, 0.10% compilation time: 100% of which was recompilation)

X = Tables.rowtable(X)
machine(M(), X, y) |> fit!
# [ Info: Training machine(SRRegressor(defaults = nothing, …), …).
# ┌ Error: Problem fitting the machine machine(SRRegressor(defaults = nothing, …), …).
# └ @ MLJBase ~/.julia/packages/MLJBase/7nGJF/src/machines.jl:694
# [ Info: Running type checks...
# [ Info: Type checks okay.
# ERROR: MethodError: no method matching haskey(::Vector{@NamedTuple{x1::Float64, x2::Float64}}, ::Symbol)
# The function `haskey` exists, but no method is defined for this combination of argument types.
```
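For context on the `haskey` error above: `Tables.rowtable(X)` produces a `Vector` of `NamedTuple`s, which has no `haskey` method, so any code that treats the table itself like a `NamedTuple` of columns will fail. A minimal sketch of one way to make such a lookup table-agnostic, assuming that is the failure mode here (this is not the actual fix in this PR):

```julia
using Tables

# Hypothetical helper (for illustration only, not code from this PR):
# normalize any Tables.jl source to a NamedTuple of column vectors so that
# lookups by column name work for row tables as well as column tables.
function get_column(X, name::Symbol)
    cols = Tables.columntable(X)   # Vector of NamedTuples -> NamedTuple of Vectors
    haskey(cols, name) || error("table has no column $name")
    return cols[name]
end

X_rows = Tables.rowtable((x1 = rand(5), x2 = rand(5)))
get_column(X_rows, :x1)   # works; haskey(X_rows, :x1) alone would throw a MethodError
```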
Is this related to this PR or is it a different issue? Is this PR itself ok to merge though? I can work on that issue afterwards.
If I run your example I get:

```julia
julia> @time fit!(mach, verbosity=0)
  3.370594 seconds (202.80 M allocations: 10.300 GiB, 22.58% gc time, 1.83% compilation time)
```

Doesn't seem too bad? SRRegressor doesn't change the default parameters if the dataset is small.
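If the goal is just a quick smoke test on tiny data, the search budget can be shrunk explicitly. The hyperparameter names below (`niterations`, `populations`, `population_size`) are SRRegressor options, but the values here are purely illustrative, not a recommendation from this PR:

```julia
using SymbolicRegression, MLJBase

# Illustrative quick-test configuration: a much smaller evolutionary search
# so that fitting a 5-row toy dataset finishes quickly.
model = SRRegressor(niterations=2, populations=2, population_size=20)
mach = machine(model, (x1 = rand(5), x2 = rand(5)), rand(5))
fit!(mach, verbosity=0)
```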
Regarding …
MLJTestIntegration.jl provides more comprehensive testing, and a second signature for testing a single model. However, to improve coverage, you could add MLJTestIntegration tests following the example below, to test with the auto-generated datasets:

```julia
using MLJTestInterface
using SymbolicRegression
using MLJTestIntegration

MLJTestIntegration.test(
    M = SRRegressor;
    verbosity=0, # bump to debug
    throw=true,
    level=2,
)
```

Use …

(Some of the advanced tests require that models be truly reproducible with default hyperparameters. However, as I found this is not always the case, we check this manually before proceeding; for slow models, this can add a bit of test time.)
The first signature above applies a battery of MLJ integration tests to a collection of …

The second signature applies the same tests to a single … Ordinarily, code defining the model types to be tested must already be … The extent of testing is controlled by …

By default, exceptions caught in tests are not thrown. If …

Return value

The first signature returns … In the special case of … The second signature returns a vector of exceptions encountered.

Testing with automatic code loading

World Age issues pose challenges for testing Julia code if some code …

Examples

Testing models in a new MLJ model interface implementation

The following tests the model interface implemented by some model type …

```julia
import MLJTestIntegration
using Test

X, y = MLJTestIntegration.MLJ.make_blobs()
failures, summary = MLJTestIntegration.test([MyClassifier, ], X, y, verbosity=1, mod=@__MODULE__)
@test isempty(failures)
```

Testing models after filtering models in the registry

The following applies comprehensive integration tests to all …

```julia
Pkg.add("MLJGLMInterface")

import MLJBase, MLJTestIntegration
using DataFrames # to view summary

X, y = MLJTestIntegration.MLJ.make_regression();
regressors = MLJTestIntegration.MLJ.models(matching(X, y)) do m
    m.package_name == "GLM"
end

# to test code loading *and* load code:
MLJTestIntegration.test(regressors, X, y, verbosity=1, mod=@__MODULE__, level=1)

# comprehensive tests:
failures, summary =
    MLJTestIntegration.test(regressors, X, y, verbosity=1, mod=@__MODULE__, level=3)

summary |> DataFrame
```

List of tests

Tests are applied in sequence. When a test fails, subsequent tests for … These additional tests are applied to …
I should add that, in our experience, it is rare to find models that pass the interface tests but fail the integration tests.
Got it, thanks. I will likely add those integration tests in the future
Just for reference, the SRRegressor can be made reproducible with the following setup:

```julia
model = SRRegressor(
    parallelism=:serial,
    deterministic=true,
    seed=0, # or whatever other seed
)
```

It's a bit slower, but it does guarantee reproducibility. The default settings do not. Even things like SIMD operations mess with determinism, because floating-point addition is not associative:

```julia
julia> (0.1 + 0.2) - 0.3
5.551115123125783e-17

julia> 0.1 + (0.2 - 0.3)
2.7755575615628914e-17
```
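For a sense of why parallel or SIMD execution breaks bit-for-bit reproducibility, note that accumulating the same numbers in a different order generally changes the result slightly. A small illustration (not taken from the PR):

```julia
xs = rand(10_000)

# Left-to-right versus right-to-left accumulation of identical numbers:
# the two sums typically differ in the last few bits, which is enough to
# break exact reproducibility of a parallel search.
s1 = foldl(+, xs)
s2 = foldl(+, reverse(xs))
s1 == s2       # usually false
abs(s1 - s2)   # tiny, but typically nonzero
```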
The test for reproducibility is not that fussy; it uses …
Oh, what I meant by …
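As an aside on the reproducibility checks discussed above, a manual check might look like the sketch below. This is an assumed workflow, not the actual test used by MLJTestInterface or MLJTestIntegration, and the small `niterations` value is only there to keep the check fast:

```julia
using SymbolicRegression, MLJBase

X = (x1 = rand(20), x2 = rand(20))
y = rand(20)
model = SRRegressor(parallelism=:serial, deterministic=true, seed=0,
                    niterations=5)  # illustrative small search budget

# Fit the same deterministic model twice and compare predictions.
mach1 = machine(model, X, y); fit!(mach1, verbosity=0)
mach2 = machine(model, X, y); fit!(mach2, verbosity=0)
predict(mach1, X) ≈ predict(mach2, X)  # expected to hold with these settings
```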
Force-pushed from 0a916a2 to 1be49f4
Force-pushed from 1be49f4 to 29ddf75
fix: make `:class` col more generic to `X` type
Ok, I merged #393 here.
Fixes #390
@ablaom how is this?