AssertionError: NaN/Inf present in the evaluation of the MoG proposal posterior #1352

Open
ali-akhavan89 opened this issue Jan 6, 2025 · 1 comment
Labels: bug

ali-akhavan89 commented Jan 6, 2025

I was testing inference = NPE(prior=prior, density_estimator=posterior_nn(model='mdn')) with multi-round training, num_dim = 64, and num_rounds = 6, following this tutorial:

https://github.com/sbi-dev/sbi/blob/main/tutorials/02_multiround_inference.ipynb

and I got a NaN/Inf assertion error:

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[5], line 13
      9 for _ in range(num_rounds):
     10     theta, x = simulate_for_sbi(simulator, proposal, num_simulations=500)
     11     density_estimator = inference.append_simulations(
     12         theta, x, proposal=proposal
---> 13     ).train(show_train_summary=True)
     14     posterior = inference.build_posterior(density_estimator)
     15     #posterior = inference.build_posterior(density_estimator, sample_with='mcmc')

File /opt/miniconda3/lib/python3.12/site-packages/sbi/inference/trainers/npe/npe_c.py:189, in NPE_C.train(self, num_atoms, training_batch_size, learning_rate, validation_fraction, stop_after_epochs, max_num_epochs, clip_max_norm, calibration_kernel, resume_training, force_first_round_loss, discard_prior_samples, use_combined_loss, retrain_from_scratch, show_train_summary, dataloader_kwargs)
    185     if self.use_non_atomic_loss:
    186         # Take care of z-scoring, pre-compute and store prior terms.
    187         self._set_state_for_mog_proposal()
--> 189 return super().train(**kwargs)

File /opt/miniconda3/lib/python3.12/site-packages/sbi/inference/trainers/npe/npe_base.py:362, in PosteriorEstimator.train(self, training_batch_size, learning_rate, validation_fraction, stop_after_epochs, max_num_epochs, clip_max_norm, calibration_kernel, resume_training, force_first_round_loss, discard_prior_samples, retrain_from_scratch, show_train_summary, dataloader_kwargs)
    355 # Get batches on current device.
    356 theta_batch, x_batch, masks_batch = (
    357     batch[0].to(self._device),
    358     batch[1].to(self._device),
    359     batch[2].to(self._device),
    360 )
--> 362 train_losses = self._loss(
    363     theta_batch,
    364     x_batch,
    365     masks_batch,
    366     proposal,
    367     calibration_kernel,
    368     force_first_round_loss=force_first_round_loss,
    369 )
    370 train_loss = torch.mean(train_losses)
    371 train_loss_sum += train_losses.sum().item()

File /opt/miniconda3/lib/python3.12/site-packages/sbi/inference/trainers/npe/npe_base.py:610, in PosteriorEstimator._loss(self, theta, x, masks, proposal, calibration_kernel, force_first_round_loss)
    606     loss = self._neural_net.loss(theta, x)
    607 else:
    608     # Currently only works for `DensityEstimator` objects.
    609     # Must be extended ones other Estimators are implemented. See #966,
--> 610     loss = -self._log_prob_proposal_posterior(theta, x, masks, proposal)
    612 return calibration_kernel(x) * loss

File /opt/miniconda3/lib/python3.12/site-packages/sbi/inference/trainers/npe/npe_c.py:298, in NPE_C._log_prob_proposal_posterior(self, theta, x, masks, proposal)
    290     if not (
    291         hasattr(self._neural_net.net, "_distribution")
    292         and isinstance(self._neural_net.net._distribution, mdn)
    293     ):
    294         raise ValueError(
    295             "The density estimator must be a MDNtext for non-atomic loss."
    296         )
--> 298     return self._log_prob_proposal_posterior_mog(theta, x, proposal)
    299 else:
    300     if not hasattr(self._neural_net, "log_prob"):

File /opt/miniconda3/lib/python3.12/site-packages/sbi/inference/trainers/npe/npe_c.py:454, in NPE_C._log_prob_proposal_posterior_mog(self, theta, x, proposal)
    452 # Compute the log_prob of theta under the product.
    453 log_prob_proposal_posterior = mog_log_prob(theta, logits_pp, m_pp, prec_pp)
--> 454 assert_all_finite(
    455     log_prob_proposal_posterior,
    456     """the evaluation of the MoG proposal posterior. This is likely due to a
    457     numerical instability in the training procedure. Please create an issue on
    458     Github.""",
    459 )
    461 return log_prob_proposal_posterior

File /opt/miniconda3/lib/python3.12/site-packages/sbi/utils/torchutils.py:422, in assert_all_finite(quantity, description)
    419 """Raise if tensor quantity contains any NaN or Inf element."""
    421 msg = f"NaN/Inf present in {description}."
--> 422 assert torch.isfinite(quantity).all(), msg

AssertionError: NaN/Inf present in the evaluation of the MoG proposal posterior. This is likely due to a
            numerical instability in the training procedure. Please create an issue on
            Github..
manuelgloeckler (Contributor) commented

Dear ali-akhavan89,

Thanks for reporting. I was able to reproduce the error on the newest version with the following code:

import torch

from sbi.inference import NPE, simulate_for_sbi
from sbi.neural_nets import posterior_nn
from sbi.utils import BoxUniform
from sbi.utils.user_input_checks import (
    check_sbi_inputs,
    process_prior,
    process_simulator,
)

# Multi-round inference: round 1 simulates from the prior; later rounds simulate
# parameter sets sampled from the posterior obtained so far.
num_rounds = 6
num_dim = 64
# The specific observation we want to focus the inference on.
x_o = torch.zeros(num_dim,)
prior = BoxUniform(low=-2 * torch.ones(num_dim), high=2 * torch.ones(num_dim))
simulator = lambda theta: theta + torch.randn_like(theta) * 0.1

# Ensure compliance with sbi's requirements.
prior, num_parameters, prior_returns_numpy = process_prior(prior)
simulator = process_simulator(simulator, prior, prior_returns_numpy)
check_sbi_inputs(simulator, prior)

inference = NPE(prior=prior, density_estimator=posterior_nn(model='mdn'))

posteriors = []
proposal = prior

for _ in range(num_rounds):
    theta, x = simulate_for_sbi(simulator, proposal, num_simulations=500)

    # In `SNLE` and `SNRE`, you should not pass the `proposal` to
    # `.append_simulations()`
    density_estimator = inference.append_simulations(
        theta, x, proposal=proposal
    ).train(show_train_summary=True)
    posterior = inference.build_posterior(density_estimator)
    posteriors.append(posterior)
    proposal = posterior.set_default_x(x_o)

Regarding the error: I do not think this is a coding bug; it is rather specific to the parameterization, i.e., just changing the density estimator but keeping num_dim=2 works fine.

It looks like a numerical error within the training routine (specifically the SNPE-C correction in this case). Potential reasons are:

  • under/overflow in the correction factors (i.e., in the softmax); see the sketch after this list
  • specific to MDNs, the covariance matrix might fail to stay positive semi-definite (p.s.d.)
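
For the first point, the usual remedy is the standard log-sum-exp stabilization. This is just an illustrative sketch of that trick in plain PyTorch, not sbi's actual implementation:

import torch

def stable_log_softmax(logits: torch.Tensor) -> torch.Tensor:
    # torch.logsumexp subtracts the max internally, so exp() never overflows.
    return logits - torch.logsumexp(logits, dim=-1, keepdim=True)

# With extreme correction terms, the naive formula under/overflows:
logits = torch.tensor([1000.0, 0.0, -1000.0])
naive = torch.log(torch.exp(logits) / torch.exp(logits).sum())  # nan / -inf
stable = stable_log_softmax(logits)  # tensor([0., -1000., -2000.])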

Some general remarks:

  • For num_dim=64, the number of simulations is extremely small (num_simulations=500). This will lead to a terrible approximation in the first round, which likely produces extreme values in the correction term and causes the NaNs. For num_dim=30, the above code runs without error.

One likely simply needs more simulations in the first round in this case. Generally, these errors are hard to address; I think the correction is already implemented in as numerically stable a way as float32 allows.
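
To illustrate that suggestion, here is a sketch of a front-loaded simulation budget on top of the reproduction code above (the per-round budgets are made-up illustrative values, not tuned recommendations):

# Spend more budget in the first round so the initial posterior approximation
# is reasonable; later rounds can get away with fewer simulations.
num_simulations_per_round = [20_000] + [2_000] * (num_rounds - 1)

proposal = prior
for num_sims in num_simulations_per_round:
    theta, x = simulate_for_sbi(simulator, proposal, num_simulations=num_sims)
    density_estimator = inference.append_simulations(
        theta, x, proposal=proposal
    ).train(show_train_summary=True)
    posterior = inference.build_posterior(density_estimator)
    proposal = posterior.set_default_x(x_o)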
