
[BUG] sample in mps.jl broken when using CUDA #92

Closed
tipfom opened this issue Nov 13, 2024 · 4 comments · Fixed by #93
tipfom (Contributor) commented Nov 13, 2024

The following minimal working example crashes with a MethodError: no method matching DenseVector{Float64}(::UndefInitializer, ::Tuple{Int64}) in ITensorMPS version 0.3.1:

using CUDA
using ITensors, ITensorMPS, NDTensors

N = 10
sites = siteinds("Qubit", N)
ψ = NDTensors.cu(MPS(sites, repeat(["+"], N)))

sample!(ψ)

The error can be mitigated by copying the MPS to the CPU first, or by adapting the sample function in the following way:

function sample(rng::AbstractRNG, m::MPS)
  N = length(m)

  if orthocenter(m) != 1
    error("sample: MPS m must have orthocenter(m)==1")
  end
  if abs(1.0 - norm(m[1])) > 1E-8
    error("sample: MPS is not normalized, norm=$(norm(m[1]))")
  end

  result = zeros(Int, N)
  A = m[1]

  for j in 1:N
    s = siteind(m, j)
    d = dim(s)
    # Compute the probability of each state
    # one-by-one and stop when the random
    # number r is below the total prob so far
    pdisc = 0.0
    r = rand(rng)
    # Will need n,An, and pn below
    n = 1
    An = NDTensors.cu(ITensor()) # this line is changed
    pn = 0.0
    while n <= d
      projn = ITensor(s)
      projn[s => n] = 1.0
      An = A * dag(NDTensors.cu(projn)) # this line is changed
      pn = real(scalar(dag(An) * An))
      pdisc += pn
      (r < pdisc) && break
      n += 1
    end
    result[j] = n
    if j < N
      A = m[j + 1] * An
      A *= (1.0 / sqrt(pn))
    end
  end
  return result
end

I suspect there is a more performant fix, since the implementation above is rather slow.

mtfishman (Member) commented:

Thanks for the report. It looks like we are implicitly constructing some tensors on CPU inside that code. The fix you wrote is reasonable, though it can be made more general (covering other GPU backends as well) by using something like adapt(datatype(A), projn), as seen here: https://github.com/ITensor/ITensorMPS.jl/blob/v0.3.1/src/mps.jl#L818.
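A minimal standalone sketch of the adapt pattern being suggested (using only Adapt.jl on CPU arrays; in the actual fix, the target type would come from the NDTensors datatype helper applied to the MPS tensor, and the converted object would be the projector ITensor):

```julia
using Adapt

# adapt(T, x) rewraps x's data in array type T without changing values.
# The same call works whether T is a CuArray, a Metal array, or a plain
# Array, which is why it generalizes the CUDA-only NDTensors.cu fix.
v = Float64[1.0, 2.0, 3.0]
w = adapt(Array{Float32}, v)  # CPU-to-CPU conversion; same pattern as GPU

@assert w isa Vector{Float32}
@assert w == Float32[1.0, 2.0, 3.0]
```

In the sample code this would replace dag(NDTensors.cu(projn)) with dag(adapt(datatype(A), projn)), so the projector always lands on the same backend as A.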

mtfishman (Member) commented:

@kmp5VT could you make a PR?

mtfishman (Member) commented:

Also, I see a lot of places in that code that implicitly use double precision, like https://github.com/ITensor/ITensorMPS.jl/blob/v0.3.1/src/mps.jl#L685 (1.0 is implicitly Float64 in Julia). We should make that function more generic/agnostic about the scalar type in order to preserve the precision of the input.
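A small plain-Julia sketch of the scalar-type-generic style being suggested (the function name here is illustrative, not part of ITensorMPS):

```julia
# Instead of hard-coding Float64 literals like 1.0, derive constants from
# the input's element type, so Float32 inputs stay Float32 throughout.
function normalize_weight(pn::T) where {T<:Real}
    return one(T) / sqrt(pn)  # one(T) instead of the literal 1.0
end

@assert normalize_weight(0.25f0) isa Float32  # precision preserved
@assert normalize_weight(0.25) === 2.0       # 1 / sqrt(0.25) == 2.0
```

Applying this idea to sample would mean, for example, replacing A *= (1.0 / sqrt(pn)) with a constant built from the tensor's scalar type.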

tipfom (Contributor, author) commented Nov 13, 2024

Hi @mtfishman and @kmp5VT, I created a pull request incorporating the desired changes, which fixes the issue in my workflow (an A100 GPU with CUDA.jl). I also tested it with standard CPU-based MPS.
