Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: TemplateExpression with parameters #394

Open
wants to merge 29 commits into
base: master
Choose a base branch
from

Conversation

MilesCranmer
Copy link
Owner

@MilesCranmer MilesCranmer commented Dec 26, 2024

@gm89uk @Andrea-gm here's an attempt at MilesCranmer/PySR#787. If you get a chance to try, I'd be very interested to hear what you think.

The API is like this:

structure = TemplateStructure{(:f,)}(
    ((; f), (x, y, i), p) -> f(x) + p[i] * y; num_parameters=2
)

You need to specify how many total parameters you ask for in advance. Usually this is just length(unique(i)) for some vector i that contains the categorization of each row. But you can have multiple categorizations too – just pass those as additional columns to the dataset. (They will get converted back to integers when indexing p here)

Those parameters can be used in any way you want. You can pass them to one of the subexpressions:

structure = TemplateStructure{(:f,)}(
    ((; f), (x, y, i), p) -> f(x, p[i]) + y; num_parameters=2
)

which basically let's you do what ParametricExpression does, but in a fixed functional form way.

You can also get parameters manually, like this:

structure = TemplateStructure{(:f,)}(
    ((; f), (x, y), p) -> f(x) + p[1] * y; num_parameters=1
)

Where you can see we just ask for a single parameter in the structure.

Full example:

using SymbolicRegression
using Random: MersenneTwister
using MLJBase: machine, fit!, report, matrix

# structure => f(x) + p[1] * y, single param
struct_search = TemplateStructure{(:f,)}(
    ((; f), (x, y, i), p) -> f(x) + p[i] * y; num_parameters=2
)

rng = MersenneTwister(0)

# We'll create 2 features, and 1 categorical variable (with 2 categories total)
X = (; x=rand(rng, 32), y=rand(rng, 32), i=rand(rng, 1:2, 32))
true_params = [0.5, -3.0]
true_f(x) = 0.5 * x * x

y = [true_f(X.x[i]) + true_params[X.i[i]] * X.y[i] for i in 1:32]

model = SRRegressor(;
    niterations=20,
    binary_operators=(+, *, -),
    unary_operators=(),
    expression_type=TemplateExpression,
    expression_options=(; structure=struct_search),
    early_stop_condition=(l, c) -> l < 1e-6 && c == 5,
)

mach = machine(model, X, y)
fit!(mach)

r = report(mach)
best_expr_idx = findfirst(
    i -> r.losses[i] < 1e-6 && r.complexities[i] == 5, 1:length(r.equations)
)
best_expr = r.equations[best_expr_idx]

# We can check that we recovered the `true_params`,
# since this problem fully constrains them:
params = get_metadata(best_expr).parameters
@test isapprox(params, true_params; atol=1e-3)

This comment was marked as resolved.

@coveralls
Copy link

coveralls commented Dec 26, 2024

Pull Request Test Coverage Report for Build 12531289252

Details

  • 92 of 127 (72.44%) changed or added relevant lines in 2 files are covered.
  • 2 unchanged lines in 1 file lost coverage.
  • Overall coverage decreased (-0.9%) to 93.812%

Changes Missing Coverage Covered Lines Changed/Added Lines %
src/TemplateExpression.jl 88 123 71.54%
Files with Coverage Reduction New Missed Lines %
src/ComposableExpression.jl 2 96.55%
Totals Coverage Status
Change from base Build 12521558752: -0.9%
Covered Lines: 3214
Relevant Lines: 3426

💛 - Coveralls

@gm89uk
Copy link

gm89uk commented Dec 26, 2024

Apologies in advance for the delay Miles, I'm away over Christmas, and won't be able to test it out till the 28th. Happy holidays

@Andrea-gm
Copy link

Hi Miles. Apologies for the later reply, I am also on and off these days. The API looks great to me, thanks so much for it!
Is it already possible to test this from Python? I am unfortunately not familiar with julia

@gm89uk
Copy link

gm89uk commented Dec 29, 2024

Hi Miles,

Really like the syntax although I misunderstood that parameters = length(unique(i))

For full clarity:
I interpreted (and with ParametricExpression) that parameters were the number of parameters to optimise for each categorial variable. i.e. if you had 1 categorial variable with 4 categories (1,2,3,4), and you set number of parameters to 1, then it would optimise p1 (or whatever) to 4 different values within a function, one for each category; but actually you mean in this instance we should have number of parameters to be 4?

If batching is running, then length(unique(lens)) may be variable with each run.

So lets say i has 4 categories, 1,2,3,4. To find the form p*sin(x) = y, I should set number of parameters to be 4 or 1?
Nevertheless, a little stuck, thank you again for your patience.
I removed and ran ] add SymbolicRegression#master

structure = TemplateStructure{(:f,)}(
  ((; f), (x1, x2, x3, x4, x5, x6, y, cat), c) -> begin
    o = f(x1, x2, x3, x4, x5, x6, c[cat]) 
    if !o.valid
        return ValidVector(Vector{Float32}(undef, length(o.x)), false)
    end
    #Compute gradients for first 3 variables at once
    for varidx in 1:3
    	grad_column = D(f, varidx)(x1, x2, x3, x4, x5, x6, c[cat]).x
    	if !(all(grad_column .>= 0) || all(grad_column .<= 0))
        	return ValidVector(fill(Float32(1e9), length(o.x)), true)
    	end
    end
    return o
  end;
  num_parameters = length(unique(cat)) #I understand that cat is not defined here, but how do I pass a variable number of parameters so that this code works with batching?
)
model = SRRegressor(
    niterations=1000000,
    binary_operators=[+,-,*,/],
    maxsize=50,
    bumper=true,
    turbo=true,
    populations=18,
    expression_type = TemplateExpression,
    expression_options = (; structure),
    population_size=100
    parsimony = 0.01,
    batching=true,
)
mach = machine(model, X, y)
fit!(mach)

@MilesCranmer
Copy link
Owner Author

So lets say i has 4 categories, 1,2,3,4. To find the form p*sin(x) = y, I should set number of parameters to be 4 or 1?

And if i had 4 categories, and you were searching the equation p1 * sin(x) - p2 = y, then you would set the number of parameters to 8.

In other words, num_parameters is literally the length of the vector c in your example. You can use that vector in any way you want – which makes it super flexible.

Note that confusingly, num_parameters here (still just an idea, nothing is fixed) is different than ParametricExpression's max_parameters, which in this case, would be 1 and 2 respectively, rather than 4 and 8. This is because the ParametricExpression automatically infers the length of c based on the number of categories. Whereas here, we are just setting the length explicitly.

The one downside is you do need to do a bit more work to set up a problem. But I feel like it's so explicit and flexible that it's worth it, and maybe even easier to understand?


This also means you could have two category types: i1 and i2. If i1 had 4 categories, and corresponded to the parameter p1, and i2 had 2 categories for parameter p2, then you would set num_parameters to 6.


Do you think num_parameters is the right word? Or is there a better term here that might differentiate it with respect to the ParametricExpression max_parameters?

@MilesCranmer
Copy link
Owner Author

What about num_parameter_values instead of num_parameters? Then it contrasts a bit with ParametricExpression's max_parameters which is more like the number of parameter variables, rather than the number of actual scalar values.

@MilesCranmer MilesCranmer force-pushed the template-parametric-expression branch from ecb3050 to 415fc5c Compare December 29, 2024 01:21
@MilesCranmer
Copy link
Owner Author

Oh and to try this PR, you need to do

]add SymbolicRegression#template-parametric-expression

@gm89uk
Copy link

gm89uk commented Dec 29, 2024

Thank you Miles, it works well now, I was half asleep from a very delayed flight and didn't realise it wasn't merged yet!

structure = TemplateStructure{(:f,)}(
  ((; f), (x1, x2, x3, x4, x5, x6, y, cat), c) -> begin
    o = f(x1, x2, x3, x4, x5, x6, c[cat]) 
    ...
    return o
  end;
  num_parameters = 12
)

Here if I set num_parameters to 12 (which it is for the whole dataset, but batching is true and num_parameters is only 1 or 2 for that particular batch which calls upon structure, does it matter if num_parameters > length(unique(cat)) in that particular batch?

The code seems to be running ok otherwise.

  1. I think the current naming is a little confusing. I think the word parameter is overloaded here.
    "In other words, num_parameters is literally the length of the vector c in your example. You can use that vector in any way you want – which makes it super flexible."

Would it better to name it exactly what it is total_num_unique_cat although a bit unwieldy?
There is no way to have this automatically determined based on feeding in a tuple of categorial variables initially to removal any confusion?

  1. In the event you have >1 category across multiple functions, f and g, would you still total the number of params as before? So:
structure = TemplateStructure{(:f,:g)}(
  ((; f, g), (x1, x2, x3, x4, x5, x6, y, i1, i2), c) -> begin
    o = f(x1, x2, x3, x4, x5, x6, c[i1]) + g(x1,c[i2])
    ...
    return o
  end;
  num_parameters = 6
)

if i1 had 4 unique categories and i2 had 2 unique categories?

@MilesCranmer
Copy link
Owner Author

MilesCranmer commented Dec 29, 2024

Agreed on parameter being an overloaded word here.

total_num_unique_cat

Well technically you could use the same category but have it correspond to multiple parameters. Like cat in 1 to 4 could be used for parameters p1, p2, p3 -> which means 12 independent scalars even though there are only 4 unique categories.

The way you would set it up for this would be:

((; f), (x1, x2, cat), p) -> let
    p1 = p[cat] # gets value 1 to 4
    p2 = p[cat + 4] # gets value 5 to 8
    p3 = p[cat + 8] # gets value 9 to 12

    # for example:
    f(x1) + p1 * x2 + p2 * x2^2 + p3 * x2^3
end

So the number of parameter scalar values is 12, even though there are 4 categories in the dataset.

The other tricky thing is that this p vector doesn’t need to be used for this type of pattern. You could actually just use the p vector to learn a single constant for some sort of functional form — no categories needed. (As an alternative to giving a f() - which is less efficient). So I’m imagining someone could use this to learn parameters for a complex function that they are calling within the template structure.

@MilesCranmer
Copy link
Owner Author

MilesCranmer commented Dec 29, 2024

In your example it would need

- o = f(x1, x2, x3, x4, x5, x6, c[i1]) + g(x1,c[i2])
+ o = f(x1, x2, x3, x4, x5, x6, c[i1]) + g(x1,c[i2+4])

​​

Assuming that i1 is from 0 to 3 and i2 is from 0 to 1. But if you let i2 be from 4 to 5, then you wouldn’t need the offset.

@MilesCranmer
Copy link
Owner Author

I can’t think of another API here that is as flexible and generic. But clearly this API is a bit confusing for this common type of pattern. So maybe try to brainstorm some alternatives if you have any ideas.

@MilesCranmer
Copy link
Owner Author

Oh here’s one other idea:

structure = TemplateStructure{(:f, :g)}(
    parameters = (p1=4, p2=2),  # Each parameter has its own range 1:n
    template = ((; f, g), (; p1, p2), (x1, x2, i1, i2)) -> 
        f(x1) + p1[i1] + g(x2) + p2[i2]
)

So basically you would provide the parameter groups you want (p1, p2) and then say how large you want each of them to be. It’s cleaner than needing to manually offset things. The implementation is probably easy too.

@MilesCranmer
Copy link
Owner Author

MilesCranmer commented Dec 29, 2024

And actually we could infer the number of parameters with this. If we write

structure = TemplateStructure{(:f, :g), (:p1, :p2)}(
    template = ((; f, g), (; p1, p2), (x1, x2, i1, i2)) -> 
        f(x1) + p1[i1] + g(x2, p2[i2])
)

Then technically you could have the constructor figure out that p1 is indexed by i1, and compute the number of parameters needed in the forward pass. But I’m not sure if we should do that or not, since it’s a bit “spooky action at a distance”? If things are explicit it might be easier for the user to see how to modify it to an unusual use case. But maybe not, I’m not sure!

@gm89uk
Copy link

gm89uk commented Dec 29, 2024

Hi Miles,

I have to say (with a large disclaimer) that the last suggestion makes much more intuitive sense for me. The disclaimer is this is not my profession and I just do this stuff in my spare time when I get the chance as it greatly adds to my research output.

What might not be intuitive for me might make perfect sense to other users.

Maybe this modification is best of both worlds?

((; f), (x1, x2, cat), p) -> let
    p1 = p[cat] # gets value 1 to 4
    p2 = p[cat + 4] # gets value 5 to 8
    p3 = p[cat + 8] # gets value 9 to 12

    # for example:
    f(x1) + p1 * x2 + p2 * x2^2 + p3 * x2^3
end

In this example, cat presumably will be a vector of ints ranging from 1 to 4, one category with 4 members of that category. If I am understanding right p[cat] would mean 4 parameters for p1 (one for each member of cat), and by setting 5 to 8, it is finding an additional 4 parameters for each member of cat, hence p2.

I would have thought it makes more sense to have p take a tuple p[cat,<parameter_num>]. So that the input would be:

((; f), (x1, x2, cat), p) -> let
    p1 = p[cat,1] # gets value 1 to 4
    p2 = p[cat,2] # gets value 5 to 8
    p3 = p[cat,3] # gets value 9 to 12

    # for example:
    f(x1) + p1 * x2 + p2 * x2^2 + p3 * x2^3
end

Then the constructor would use a mod and length(unique(cat)) to convert it, something like (although I know it would need more tweaking).

p1 = p[mod(1,length(unique(cat)) * length(unique(cat))] #which should return 4
p2 = p[mod(2,length(unique(cat)) * length(unique(cat))] #which should return 8
p3 = p[mod(3,length(unique(cat)) * length(unique(cat))] #which should return 12

Hopefully that would give you the same functionality but without the user having to keep track of the number of unique categories within each categorial variable, but only have to keep track of how many parameters they have called on for each categorial variable?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants