Out of Memory Error during SBC #1311
ali-akhavan89 asked this question in Q&A
Answered by ali-akhavan89 · Nov 26, 2024
Replies: 3 comments · 4 replies
-
Hi there! Could you try moving the posterior and all observations onto the CPU? Or try moving the entire computation into a with torch.no_grad(): block:

    ranks, dap_samples = run_sbc(
        thetas, xs, posterior,
        num_posterior_samples=num_posterior_samples,
        num_workers=num_workers,
    )
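For concreteness, here is a minimal sketch of that suggestion, assuming thetas, xs, and a trained posterior already exist on the GPU and that num_posterior_samples and num_workers are defined as above (the import path matches recent sbi releases; older versions exposed run_sbc under sbi.analysis):

    import torch
    from sbi.diagnostics import run_sbc

    # Move the SBC inputs off the GPU first; .to("cpu") returns a
    # CPU copy, so the result must be assigned back.
    thetas = thetas.to("cpu")
    xs = xs.to("cpu")

    # no_grad() stops autograd from retaining the computation graph
    # while SBC draws num_posterior_samples samples per observation,
    # which is often what exhausts memory.
    with torch.no_grad():
        ranks, dap_samples = run_sbc(
            thetas,
            xs,
            posterior,
            num_posterior_samples=num_posterior_samples,
            num_workers=num_workers,
        )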
1 reply

Thanks, Michael. I think the prior was already on the CPU, and the method you suggested didn't solve the problem. In particular, posterior.prior.support.base_constraint.lower_bound.device was on CUDA right before run_sbc. I guess this issue happened because I had only transferred the posterior_estimator, with posterior.posterior_estimator = posterior.posterior_estimator.to(device). To resolve this, I replaced the prior within the posterior object with a new prior that's on the CPU, using posterior.prior = prior (is it okay to do so for diagnostic purposes after the NN training is done?). At this point I thought everything was on the CPU. However, run_sbc gave me the same error for reduce_fns=poste…
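A quick device audit right before run_sbc can show what still lives on CUDA. A sketch using the attributes named in this thread, and assuming the posterior_estimator is a torch.nn.Module as in sbi:

    # Any "cuda" in this output can explain the failure above.
    print(thetas.device, xs.device)
    print(next(posterior.posterior_estimator.parameters()).device)
    print(posterior.prior.support.base_constraint.lower_bound.device)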
-
Hey! You are getting this error because the prior is still on the GPU. Could you try:

    prior.base_dist.high = prior.base_dist.high.to("cpu")
    prior.base_dist.low = prior.base_dist.low.to("cpu")

(tensor .to("cpu") returns a copy rather than moving the tensor in place, so the result has to be assigned back). I also created this issue to make moving the posterior simpler.
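If reassigning the distribution's tensors feels fragile, an alternative is the swap already used above (posterior.prior = prior), but with a prior rebuilt on the CPU. A sketch assuming the prior is an sbi BoxUniform, with low and high standing in for the actual bounds:

    import torch
    from sbi.utils import BoxUniform

    # Hypothetical bounds; substitute the ones the original prior used.
    low = torch.zeros(3)
    high = torch.ones(3)

    # A fresh prior whose tensors (including the support-constraint
    # bounds that run_sbc touches) are created on the CPU:
    posterior.prior = BoxUniform(low=low, high=high, device="cpu")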
0 replies
Answer selected by ali-akhavan89