[Nonlinear] sparsity pattern of Hessian with :(x * y) #2527

Open
amontoison opened this issue Jul 22, 2024 · 16 comments
Labels
Submodule: Nonlinear (About the Nonlinear submodule)

Comments

@amontoison (Contributor) commented Jul 22, 2024

As discussed with @mlubin during JuMP-dev 2024, we recently observed with Guillaume Dalle (@gdalle) that JuMP detects more non-zeros on the diagonal of the Hessian than are needed.

We also observed other discrepancies.

I don't think that it's a bug, just a part of the code that was probably not optimized for corner cases.

@odow (Member) commented Jul 22, 2024

I think this is expected. We don't try to compress the non-zero terms, even in trivial cases. Is there any evidence that this is a performance concern?

@odow added the Project: next-gen nonlinear support and Submodule: Nonlinear labels and removed the Project: next-gen nonlinear support label Jul 22, 2024
@odow changed the title from "[NLP] Sparsity pattern of Hessian" to "[Nonlinear] sparsity pattern of Hessian" Jul 22, 2024
@odow (Member) commented Jul 22, 2024

As a trivial example, a problem with one variable can have > 1 element in the Hessian:

julia> using JuMP, Ipopt

julia> model = Model(Ipopt.Optimizer)
A JuMP Model
Feasibility problem with:
Variables: 0
Model mode: AUTOMATIC
CachingOptimizer state: EMPTY_OPTIMIZER
Solver name: Ipopt

julia> @variable(model, x)
x

julia> @objective(model, Min, sin(x))
sin(x)

julia> @constraint(model, sin(x) >= 0)
sin(x) - 0.0 ≥ 0

julia> optimize!(model)
This is Ipopt version 3.14.14, running with linear solver MUMPS 5.6.2.

Number of nonzeros in equality constraint Jacobian...:        0
Number of nonzeros in inequality constraint Jacobian.:        1
Number of nonzeros in Lagrangian Hessian.............:        2

@amontoison (Contributor, Author) commented Jul 22, 2024

The issue is not that you don't compress the non-zero terms; it is that you detect non-zero terms where there are zeros in the Hessian.

If these additional non-zeros are on the diagonal, they don't impact the coloring and thus the performance.
You just allocate more storage than needed (+n in the worst case).

If the detected non-zeros are not on the diagonal, you might require more colors (and thus more directional derivatives) to retrieve all non-zeros of the Hessian of the Lagrangian.
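
For illustration, here is a small stand-alone sketch of that effect. It uses a naive greedy distance-2 column coloring with a hypothetical helper greedy_column_coloring (not the symmetric star coloring actually used for Hessians), and shows that a single spurious off-diagonal entry can force an extra color, that is, one extra Hessian-vector product:

using SparseArrays

# Naive greedy coloring: two columns may share a color only if they have no
# non-zero row in common. This is a simplification of the star coloring used
# for symmetric Hessians, but it illustrates the cost of false positives.
function greedy_column_coloring(A::SparseMatrixCSC)
    n = size(A, 2)
    colors = zeros(Int, n)
    for j in 1:n
        rows_j = rowvals(A)[nzrange(A, j)]
        forbidden = Set{Int}()
        for k in 1:(j - 1)
            rows_k = rowvals(A)[nzrange(A, k)]
            isempty(intersect(rows_j, rows_k)) || push!(forbidden, colors[k])
        end
        c = 1
        while c in forbidden
            c += 1
        end
        colors[j] = c
    end
    return colors
end

# Exact pattern of the Hessian of x[1]^2 + x[2]^2 + x[3]^2: diagonal only.
H_exact = sparse([1, 2, 3], [1, 2, 3], ones(3))
# Same pattern plus one spurious symmetric off-diagonal entry at (2, 1).
H_extra = sparse([1, 2, 3, 2, 1], [1, 2, 3, 1, 2], ones(5))

maximum(greedy_column_coloring(H_exact))  # 1 color
maximum(greedy_column_coloring(H_extra))  # 2 colors, so one more directional derivative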

@odow (Member) commented Jul 22, 2024

Do you have an example?

@odow (Member) commented Jul 22, 2024

Is the claim something like this:

julia> import MathOptInterface as MOI

julia> x = MOI.VariableIndex.(1:2)
2-element Vector{MathOptInterface.VariableIndex}:
 MOI.VariableIndex(1)
 MOI.VariableIndex(2)

julia> model = MOI.Nonlinear.Model();

julia> objective = :(sin($(x[1])) + $(x[2])^1)
:(sin(MOI.VariableIndex(1)) + MOI.VariableIndex(2) ^ 1)

julia> MOI.Nonlinear.set_objective(model, objective)

julia> evaluator = MOI.Nonlinear.Evaluator(model, MOI.Nonlinear.SparseReverseMode(), x);

julia> MOI.initialize(evaluator, [:Hess])

julia> MOI.hessian_lagrangian_structure(evaluator)
2-element Vector{Tuple{Int64, Int64}}:
 (1, 1)
 (2, 2)

Where (2, 2) in the Hessian is always 0.0?

Are there examples where this matters?
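
For reference, a quick numerical check (continuing from the evaluator above; the point [1.0, 2.0] is arbitrary, and the expected values follow from the analytic Hessian) confirms that the (2, 2) entry is structural only:

julia> H = zeros(length(MOI.hessian_lagrangian_structure(evaluator)));

julia> MOI.eval_hessian_lagrangian(evaluator, H, [1.0, 2.0], 1.0, Float64[])

julia> H  # expected: [-sin(1.0), 0.0], i.e. the slot for (2, 2) stays at zero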

@amontoison (Contributor, Author) commented Jul 22, 2024

using NLPModels, NLPModelsJuMP, JuMP, SparseArrays

nlp = Model()
x0 = [5000, 5000, 5000, 200, 350, 150, 225, 425]
lvar = [100, 1000, 1000, 10, 10, 10, 10, 10]
uvar = [10000, 10000, 10000, 1000, 1000, 1000, 1000, 1000]
@variable(nlp, lvar[i] ≤ x[i = 1:8] ≤ uvar[i], start = x0[i])

@constraint(nlp, 1 - 0.0025 * (x[4] + x[6]) ≥ 0)
@constraint(nlp, 1 - 0.0025 * (x[5] + x[7] - x[4]) ≥ 0)
@constraint(nlp, 1 - 0.01 * (x[8] - x[5]) ≥ 0)
@NLconstraint(nlp, x[1] * x[6] - 833.33252 * x[4] - 100 * x[1] + 83333.333 ≥ 0)
@NLconstraint(nlp, x[2] * x[7] - 1250 * x[5] - x[2] * x[4] + 1250 * x[4] ≥ 0)
@NLconstraint(nlp, x[3] * x[8] - 1250000 - x[3] * x[5] + 2500 * x[5] ≥ 0)

@objective(nlp, Min, x[1] + x[2] + x[3])

nlp2 = MathOptNLPModel(nlp)
rows, cols = hess_structure(nlp2)
nnz_nlp = length(rows)
vals = ones(Int, nnz_nlp)
H = sparse(rows, cols, vals)

julia> H
8×8 SparseMatrixCSC{Int64, Int64} with 13 stored entries:
 1  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅
 ⋅  1  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅
 ⋅  ⋅  1  ⋅  ⋅  ⋅  ⋅  ⋅
 ⋅  1  ⋅  1  ⋅  ⋅  ⋅  ⋅
 ⋅  ⋅  1  ⋅  1  ⋅  ⋅  ⋅
 1  ⋅  ⋅  ⋅  ⋅  1  ⋅  ⋅
 ⋅  1  ⋅  ⋅  ⋅  ⋅  1  ⋅
 ⋅  ⋅  1  ⋅  ⋅  ⋅  ⋅  1

@amontoison (Contributor, Author) commented Jul 22, 2024

It seems that the new nonlinear API fixed something 🤔

using NLPModels, NLPModelsJuMP, JuMP, SparseArrays

nlp = Model()
x0 = [5000, 5000, 5000, 200, 350, 150, 225, 425]
lvar = [100, 1000, 1000, 10, 10, 10, 10, 10]
uvar = [10000, 10000, 10000, 1000, 1000, 1000, 1000, 1000]
@variable(nlp, lvar[i] ≤ x[i = 1:8] ≤ uvar[i], start = x0[i])

@constraint(nlp, 1 - 0.0025 * (x[4] + x[6]) ≥ 0)
@constraint(nlp, 1 - 0.0025 * (x[5] + x[7] - x[4]) ≥ 0)
@constraint(nlp, 1 - 0.01 * (x[8] - x[5]) ≥ 0)
@constraint(nlp, x[1] * x[6] - 833.33252 * x[4] - 100 * x[1] + 83333.333 ≥ 0)
@constraint(nlp, x[2] * x[7] - 1250 * x[5] - x[2] * x[4] + 1250 * x[4] ≥ 0)
@constraint(nlp, x[3] * x[8] - 1250000 - x[3] * x[5] + 2500 * x[5] ≥ 0)

@objective(nlp, Min, x[1] + x[2] + x[3])

nlp2 = MathOptNLPModel(nlp)
rows, cols = hess_structure(nlp2)
nnz_nlp = length(rows)
vals = ones(Int, nnz_nlp)
sparse(rows, cols, vals, 8, 8)
8×8 SparseMatrixCSC{Int64, Int64} with 5 stored entries:
 ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅
 ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅
 ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅
 ⋅  1  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅
 ⋅  ⋅  1  ⋅  ⋅  ⋅  ⋅  ⋅
 1  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅
 ⋅  1  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅
 ⋅  ⋅  1  ⋅  ⋅  ⋅  ⋅  ⋅

Benoît helped me add support for it today, and I had never tested it before tonight.

Update: I don't use the nonlinear API in that case; the constraints are quadratic.

@odow (Member) commented Jul 22, 2024

Your @NLconstraints are actually quadratic, so you don't need to do AD on them.

Here's the underlying issue:

julia> x = MOI.VariableIndex.(1:2)
2-element Vector{MathOptInterface.VariableIndex}:
 MOI.VariableIndex(1)
 MOI.VariableIndex(2)

julia> model = MOI.Nonlinear.Model();

julia> MOI.Nonlinear.set_objective(model, :($(x[1]) * $(x[2])))

julia> evaluator = MOI.Nonlinear.Evaluator(model, MOI.Nonlinear.SparseReverseMode(), x);

julia> MOI.initialize(evaluator, [:Hess])

julia> MOI.hessian_lagrangian_structure(evaluator)
3-element Vector{Tuple{Int64, Int64}}:
 (1, 1)
 (2, 2)
 (2, 1)

@odow (Member) commented Jul 22, 2024

So I assume somewhere we are detecting Hessian sparsity and if op(x, y) then we are adding (x, x), (x, y), and (y, y). We should special-case *.
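
To make that rule concrete, here is an illustrative stand-alone sketch (not the MOI implementation; hessian_entries, vx, and vy are hypothetical names). It assumes the two factors of * share no variables and are each linear in their own variables:

# Which lower-triangular Hessian entries a binary operation contributes,
# given the variable indices appearing in each argument.
function hessian_entries(op::Symbol, vx::Vector{Int}, vy::Vector{Int})
    if op == :* && isempty(intersect(vx, vy))
        # Special case: a product of factors with disjoint variables only
        # needs the cross terms (each factor assumed linear in its variables).
        return [(max(i, j), min(i, j)) for i in vx for j in vy]
    end
    # Conservative fallback: the full lower-triangular block over all variables.
    vars = sort(union(vx, vy))
    return [(vars[i], vars[j]) for i in eachindex(vars) for j in 1:i]
end

hessian_entries(:*, [1], [2])  # [(2, 1)]                   what :(x * y) needs
hessian_entries(:/, [1], [2])  # [(1, 1), (2, 1), (2, 2)]   the conservative block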

@odow changed the title from "[Nonlinear] sparsity pattern of Hessian" to "[Nonlinear] sparsity pattern of Hessian with :(x * y)" Jul 22, 2024
@amontoison (Contributor, Author) commented

Nice catch Oscar 👍
I will try to isolate an example tomorrow with additional non-zeros that are not on the diagonal.

@amontoison (Contributor, Author) commented Jul 22, 2024

@odow I found a simple example with a spurious off-diagonal coefficient:

using NLPModels, NLPModelsJuMP, JuMP, SparseArrays

nlp = Model()

x0 = [1745, 12000, 110, 3048, 1974, 89.2, 92.8, 8, 3.6, 145]
lvar = [0.00001, 0.00001, 0.00001, 0.00001, 0.00001, 85, 90, 3, 1.2, 145]
uvar = [2000, 16000, 120, 5000, 2000, 93, 95, 12, 4, 162]
@variable(nlp, lvar[i] ≤ x[i = 1:10] ≤ uvar[i], start = x0[i])

a = 0.99
b = 0.9

@expression(nlp, g1, 35.82 - 0.222 * x[10] - b * x[9])
@expression(nlp, g2, -133 + 3 * x[7] - a * x[10])
@expression(nlp, g5, 1.12 * x[1] + 0.13167 * x[1] * x[8] - 0.00667 * x[1] * x[8]^2 - a * x[4])
@expression(nlp, g6, 57.425 + 1.098 * x[8] - 0.038 * x[8]^2 + 0.325 * x[6] - a * x[7])

@constraint(nlp, g1 ≥ 0)
@constraint(nlp, g2 ≥ 0)
@constraint(nlp, -g1 + x[9] * (1 / b - b) ≥ 0)
@constraint(nlp, -g2 + (1 / a - a) * x[10] ≥ 0)
@constraint(nlp, g5 ≥ 0)
@constraint(nlp, g6 ≥ 0)
@constraint(nlp, -g5 + (1 / a - a) * x[4] ≥ 0)
@constraint(nlp, -g6 + (1 / a - a) * x[7] ≥ 0)
@constraint(nlp, 1.22 * x[4] - x[1] - x[5] == 0)
@constraint(nlp, 98000 * x[3] / (x[4] * x[9] + 1000 * x[3]) - x[6] == 0)
@constraint(nlp, (x[2] + x[5]) / x[1] - x[8] == 0)

@objective(nlp, Min, 5.04 * x[1] + 0.035 * x[2] + 10 * x[3] + 3.36 * x[5] - 0.063 * x[4] * x[7])
nlp2 = MathOptNLPModel(nlp)
rows, cols = hess_structure(nlp2)
nnz_nlp = length(rows)
vals = ones(Int, nnz_nlp)
H = sparse(rows, cols, vals, 10, 10)
10×10 SparseMatrixCSC{Int64, Int64} with 16 stored entries:
 3  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅
 1  1  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅
 ⋅  ⋅  1  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅
 ⋅  ⋅  1  2  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅
 1  1  ⋅  ⋅  1  ⋅  ⋅  ⋅  ⋅  ⋅
 ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅
 ⋅  ⋅  ⋅  1  ⋅  ⋅  1  ⋅  ⋅  ⋅
 2  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  4  ⋅  ⋅
 ⋅  ⋅  1  1  ⋅  ⋅  ⋅  ⋅  1  ⋅
 ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅

I don't understand why H[5,2] is a non-zero.

I tried:

@constraint(nlp, x[2] / x[1] + x[5] / x[1] - x[8] == 0)

instead of

@constraint(nlp, (x[2] + x[5]) / x[1] - x[8] == 0)

and H[5,2] is no longer detected as a non-zero.
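
The difference can also be reproduced with the MOI.Nonlinear API used earlier in this thread (a minimal sketch; hessian_pattern is a hypothetical helper, and variable 3 stands in for x[5]):

julia> import MathOptInterface as MOI

julia> x = MOI.VariableIndex.(1:3);  # variables 1, 2, 3 play the roles of x[1], x[2], x[5]

julia> function hessian_pattern(expr)
           model = MOI.Nonlinear.Model()
           MOI.Nonlinear.set_objective(model, expr)
           evaluator = MOI.Nonlinear.Evaluator(model, MOI.Nonlinear.SparseReverseMode(), x)
           MOI.initialize(evaluator, [:Hess])
           return MOI.hessian_lagrangian_structure(evaluator)
       end;

julia> hessian_pattern(:(($(x[2]) + $(x[3])) / $(x[1])))  # contains (3, 2), the analogue of H[5, 2]

julia> hessian_pattern(:($(x[2]) / $(x[1]) + $(x[3]) / $(x[1])))  # does not contain (3, 2)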

@amontoison (Contributor, Author) commented

using NLPModels, NLPModelsJuMP, JuMP, SparseArrays

nlp = Model()

@variable(nlp,  x[i = 1:10])
@constraint(nlp, sum(x[i] for i = 1:10) / x[1]  == 0)
@objective(nlp, Min, x[1]^4)

nlp2 = MathOptNLPModel(nlp)
rows, cols = hess_structure(nlp2)
nnz_nlp = length(rows)
vals = ones(Int, nnz_nlp)
H = sparse(rows, cols, vals, 10, 10)

The Hessian is completely dense 😨

10×10 SparseMatrixCSC{Int64, Int64} with 55 stored entries:
 2  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅
 1  1  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅
 1  1  1  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅
 1  1  1  1  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅
 1  1  1  1  1  ⋅  ⋅  ⋅  ⋅  ⋅
 1  1  1  1  1  1  ⋅  ⋅  ⋅  ⋅
 1  1  1  1  1  1  1  ⋅  ⋅  ⋅
 1  1  1  1  1  1  1  1  ⋅  ⋅
 1  1  1  1  1  1  1  1  1  ⋅
 1  1  1  1  1  1  1  1  1  1

@odow (Member) commented Jul 23, 2024

I think this is even somewhat expected from our sparsity algorithm. We don't recognize that sum(x) / y is decomposable. The sparsity algorithm trades some false positives for simplicity. I guess SparseConnectivityTracer makes the opposite trade: it is exact, but in some cases it is much slower than JuMP. I can see how this one might be problematic:
https://github.com/JuliaSmoothOptimizers/OptimizationProblems.jl/blob/458d8141bbe7ffc4552c713ec755241de7b4fa15/src/PureJuMP/tetra.jl#L37-L62

I'm still interested in seeing a model where our choice makes an important runtime difference (other than artificial examples like the one above).
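
For comparison, the exact pattern for the sum(x) / x[1] example can be obtained with SparseConnectivityTracer (a sketch, assuming its TracerSparsityDetector / hessian_sparsity interface; the package version may change the details):

using SparseConnectivityTracer

f(x) = sum(x) / x[1]

# Exact symbolic pattern: x[1] couples with every variable, but x[2], ..., x[10]
# do not couple with each other, so only the first row, the first column, and
# the (1, 1) entry should appear, rather than the dense pattern shown above.
hessian_sparsity(f, rand(10), TracerSparsityDetector())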

@odow (Member) commented Sep 18, 2024

Perhaps this issue is really asking for better insight into how different formulations might perform:

julia> using NLPModels, NLPModelsJuMP, JuMP, SparseArrays

julia> function nlp_hessian(model)
           nlp = MathOptNLPModel(model)
           rows, cols = hess_structure(nlp)
           n = num_variables(model)
           return SparseArrays.sparse(rows, cols, ones(Int, length(rows)), n, n)
       end
nlp_hessian (generic function with 5 methods)

julia> begin
           model = Model()
           @variable(model,  x[1:4])
           @constraint(model, sum(x) / x[1]  == 0)
           nlp_hessian(model)
       end
4×4 SparseMatrixCSC{Int64, Int64} with 10 stored entries:
 1  ⋅  ⋅  ⋅
 1  1  ⋅  ⋅
 1  1  1  ⋅
 1  1  1  1

julia> begin
           model = Model()
           @variable(model,  x[1:4])
           @constraint(model, sum(x[i] / x[1] for i in 1:4) == 0)
           nlp_hessian(model)
       end
4×4 SparseMatrixCSC{Int64, Int64} with 7 stored entries:
 1  ⋅  ⋅  ⋅
 1  1  ⋅  ⋅
 1  ⋅  1  ⋅
 1  ⋅  ⋅  1

julia> begin
           model = Model()
           @variable(model,  x[1:4])
           @variable(model, y)
           @constraint(model, y == sum(x))
           @constraint(model, y / x[1] == 0)
           nlp_hessian(model)
       end
5×5 SparseMatrixCSC{Int64, Int64} with 3 stored entries:
 1  ⋅  ⋅  ⋅  ⋅
 ⋅  ⋅  ⋅  ⋅  ⋅
 ⋅  ⋅  ⋅  ⋅  ⋅
 ⋅  ⋅  ⋅  ⋅  ⋅
 1  ⋅  ⋅  ⋅  1

@gdalle commented Sep 19, 2024

That's a really cool insight actually!

@odow (Member) commented Dec 17, 2024

This came up again in https://discourse.julialang.org/t/too-many-nonzeroes-for-nlp-in-jump-and-examodels/123901


3 participants