Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shifted MINRES Allocations Issue #937

Open
farhadrclass opened this issue Dec 11, 2024 · 7 comments
Open

Shifted MINRES Allocations Issue #937

farhadrclass opened this issue Dec 11, 2024 · 7 comments

Comments

@farhadrclass
Copy link

Description:

The MINRES solver unexpectedly allocates memory when using shifted computations, even though it should not. My tests indicate that the solver does not maintain zero allocations as expected, specifically when running the MINRES_lambda test case.

Steps to Reproduce:

Below is the test code used to verify memory allocation for the shifted MINRES solver:

for FC in (Float32, Float64)
  @testset "Data Type: $FC" begin

    A   = FC.(get_div_grad(18, 18, 18))  # Dimension m x n
    m, n = size(A)
    k   = div(n, 2)
    Au  = A[1:k,:]          # Dimension k x n
    Ao  = A[:,1:k]          # Dimension m x k
    b   = Ao * ones(FC, k)  # Dimension m
    c   = Au * ones(FC, n)  # Dimension k
    λ = FC(1.3)

    T = real(FC)
    shifts = T[1; 2; 3; 4; 5]
    nshifts = 5
    nbits_FC = sizeof(FC)  # 8 bits for ComplexF32 and 16 bits for ComplexF64
    nbits_T = sizeof(T) 

    @testset "MINRES" begin
      # MINRES needs:
      # 6 n-vectors: x, r1, r2, w1, w2, y
      storage_minres_bytes(n) = nbits_FC * 6 * n

      expected_minres_bytes = storage_minres_bytes(n)
      minres(A, b)  # warmup
      actual_minres_bytes = @allocated minres(A, b)
      @test expected_minres_bytes  actual_minres_bytes  1.02 * expected_minres_bytes

      solver = MinresSolver(A, b)
      minres!(solver, A, b)  # warmup
      inplace_minres_bytes = @allocated minres!(solver, A, b)
      @test inplace_minres_bytes == 0
    end

    @testset "MINRES_lambda" begin
      # MINRES needs:
      # 6 n-vectors: x, r1, r2, w1, w2, y
      storage_minres_bytes(n) = nbits_FC * 6 * n

      expected_minres_bytes = storage_minres_bytes(n)
      minres(A, b)  # warmup
      actual_minres_bytes = @allocated minres(A, b)
      @test expected_minres_bytes  actual_minres_bytes  1.02 * expected_minres_bytes

      solver = MinresSolver(A, b)
      minres!(solver, A, b, λ = λ)  # warmup
      inplace_minres_bytes = @allocated minres!(solver, A, b, λ = λ)
      @test inplace_minres_bytes == 0
    end
  end
end

Observed Results:

Testing Running tests...
MINRES_lambda: Test Failed at C:\Users\Farhad\Documents\Github\Farhad-Phd\Krylov.jl\test\test_allocations.jl:149
  Expression: inplace_minres_bytes == 0
   Evaluated: 160 == 0

Test Summary:           | Pass  Fail  Error  Total  Time
allocations             |   12     2      2     16  9.7s
  Data Type: Float32    |    3     1             4  3.8s
    MINRES              |    2                   2  1.4s
    MINRES_lambda       |    1     1             2  0.5s
  Data Type: Float64    |    3     1             4  1.8s
    MINRES              |    2                   2  0.9s
    MINRES_lambda       |    1     1             2  0.2s

Expected Results:

The solver should allocate no additional memory when using minres! with a shift parameter (λ). Specifically, the inplace_minres_bytes should equal 0.

Impact:

This issue affects the expected performance and efficiency of the solver, particularly in memory-constrained environments.

Additional Information:

The observed allocation of 160 bytes occurs consistently during the MINRES_lambda test case for both Float32 and Float64 data types.

@amontoison
Copy link
Member

amontoison commented Dec 11, 2024

Impact:

This issue affects the expected performance and efficiency of the solver, particularly in memory-constrained environments.

I will fix it but it's just 160 bits Farhad. :)
The impact should be 0.000001% on the performance.

@amontoison
Copy link
Member

amontoison commented Dec 11, 2024

@farhadrclass
The function doesn't allocate with a shift parameter, the allocations that you see are related to the named tuple used to pass keyword arguments to a Julia function.
You can use the macro @ballocated for more advanced profiling.
I also suppose that you didn't tested too much before saying that this issue affects the performance.

using BenchmarkTools

for FC in (Float32, Float64)
  @testset "Data Type: $FC" begin

    A   = FC.(get_div_grad(18, 18, 18))  # Dimension m x n
    m, n = size(A)
    k   = div(n, 2)
    Au  = A[1:k,:]          # Dimension k x n
    Ao  = A[:,1:k]          # Dimension m x k
    b   = Ao * ones(FC, k)  # Dimension m
    c   = Au * ones(FC, n)  # Dimension k
    λ = FC(1.3)

    T = real(FC)
    shifts = T[1; 2; 3; 4; 5]
    nshifts = 5
    nbits_FC = sizeof(FC)  # 8 bits for ComplexF32 and 16 bits for ComplexF64
    nbits_T = sizeof(T) 

    @testset "MINRES" begin
      # MINRES needs:
      # 6 n-vectors: x, r1, r2, w1, w2, y
      storage_minres_bytes(n) = nbits_FC * 6 * n

      expected_minres_bytes = storage_minres_bytes(n)
      minres(A, b)  # warmup
      actual_minres_bytes = @allocated minres(A, b)
      @test expected_minres_bytes  actual_minres_bytes  1.02 * expected_minres_bytes

      solver = MinresSolver(A, b)
      minres!(solver, A, b)  # warmup
      inplace_minres_bytes = @allocated minres!(solver, A, b)
      @test inplace_minres_bytes == 0
    end

    @testset "MINRES_lambda" begin
      # MINRES needs:
      # 6 n-vectors: x, r1, r2, w1, w2, y
      storage_minres_bytes(n) = nbits_FC * 6 * n

      expected_minres_bytes = storage_minres_bytes(n)
      minres(A, b)  # warmup
      actual_minres_bytes = @ballocated minres($A, $b)
      @test expected_minres_bytes  actual_minres_bytes  1.02 * expected_minres_bytes

      solver = MinresSolver(A, b)
      minres!(solver, A, b; λ)  # warmup
      inplace_minres_bytes = @ballocated minres!($solver, $A, $b, λ=$λ)
      @test inplace_minres_bytes == 0
    end
  end
end
Test Summary:      | Pass  Total   Time
Data Type: Float32 |    4      4  21.6s
Test Summary:      | Pass  Total   Time
Data Type: Float64 |    4      4  24.6s

@farhadrclass
Copy link
Author

Thanks @amontoison

The issue I have is that when I use MINRES in my solver: https://github.com/Farhad-phd/JSOSolvers.jl/blob/iR2/src/R2N.jl
I get allocation when I use minres and cg_solvers in general

Subsolver Time (μs) Allocations Memory
ShiftedLBFGSSolver 72.800 268 9.50 KiB
CgSolver 94.000 899 63.62 KiB
MinresSolver 49.800 443 40.17 KiB

I still need to clean the code a bit but the allocation is there

@amontoison
Copy link
Member

The allocation could be in Krylov.jl but you have to prove it :)
Please profile the code and ping me again if you can isolate the culprit.
You can see that you also have allocations when it's not a KrylovSolver so I suspect that the issue is in your code.

@farhadrclass
Copy link
Author

@amontoison exactly, I am trying to remove my allocation part of the code, however the 3 algorithm uses the same code except the call to subsolver, so I assumed 268 is from there. For CgSolver the main issue is that I need to shift the H with \sgima*I which creates the memory and I am fixing that right now. I was expecting the MinresSolver however to have around 268 or less allocation but that is not the case.

Would it be possible to check on Minres end and see if allocate when using shift ?

@amontoison
Copy link
Member

I checked it with my unit tests above (no allocation with @ballocated) and I verified quickly the code, there is no reason that it allocates.

@amontoison
Copy link
Member

amontoison commented Dec 23, 2024

@farhadrclass
The number of outer iterations in your optimization method depends on how accurately the linear subproblem is solved.
You should also have 268 allocations with CG, MINRES or Shifted-LBFGS if you perform EXACTLY the same number of iterations in your outer loop.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants