neg_binomial_2_rng can obstruct loo_moment_match() #274

Open
kthayashi opened this issue Sep 15, 2024 · 2 comments
Comments

@kthayashi

I'm working on a Stan program that includes a generated quantities block that generates log_lik for use with loo and yrep for posterior predictive checks. This looks something like (with some parts abbreviated as ...):

generated quantities {
  vector[n] log_lik;
  array[n] int yrep;
  ...
  for (i in 1:n) {
    ...
    log_lik[i] = neg_binomial_2_lpmf(y[i] | mu[i], phi[i]);
  }
  yrep = neg_binomial_2_rng(mu, phi);
}

I'm using cmdstanr to fit the model and run loo with moment-matching:

mod <- cmdstanr::cmdstan_model(...)
fit <- mod$sample(...)
fit_loo <- fit$loo(moment_match = TRUE, cores = 4)

While loo_moment_match() is running, I sometimes get a couple of error/exception messages that appear to stem from overflow in the *_rng function in the generated quantities block. These all look like:

Error : Exception: neg_binomial_2_rng: Random number that came from gamma distribution is 1.47285e+09, but must be less than 1073741824.000000 (in '/var/folders/2k/c0vy7xwj4kb9x7hbgtpq5m640000gn/T//RtmpE8xV8L/model-6c3130f99d6e.stan', line 83, column 4 to column 39)

Further, these messages are sometimes (but not usually) followed by an error that causes loo_moment_match() to fail:

Error in mm_list[[ii]]$i : $ operator is invalid for atomic vectors
In addition: Warning message:
In parallel::mclapply(X = I, mc.cores = cores, FUN = function(i) loo_moment_match_i_fun(i)) :
  scheduled cores 4, 1, 3 encountered errors in user code, all values of the jobs will be affected

To the best of my understanding, this appears to happen because loo_moment_match_i_fun() is failing for one or more cases. Perhaps mm_list[[ii]] is NA?

loo/R/loo_moment_matching.R

Lines 130 to 131 in 6e7001e

mm_list <- parallel::mclapply(X = I, mc.cores = cores,
                              FUN = function(i) loo_moment_match_i_fun(i))

loo/R/loo_moment_matching.R

Lines 142 to 143 in 6e7001e

for (ii in seq_along(I)) {
  i <- mm_list[[ii]]$i
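
A minimal base-R sketch of how this `$` error could arise, assuming the failure comes from a worker job that threw an error. `parallel::mclapply()` runs each job inside `try()`, so a failed job comes back as a `"try-error"` character vector rather than the list that the loop expects. `fake_job()` and its simulated message are hypothetical stand-ins, and `lapply()` with `try()` replaces the forked workers so that the sketch runs on any platform.

```r
# Hypothetical job: fails for one case, as an overflowing *_rng call might.
fake_job <- function(i) {
  if (i == 2) stop("neg_binomial_2_rng: overflow (simulated)")
  list(i = i, value = i^2)
}

# Mimic what mclapply() hands back: each job wrapped in try(silent = TRUE).
results <- lapply(1:3, function(i) try(fake_job(i), silent = TRUE))

# A failed job is a character vector with class "try-error", not a list.
failed <- vapply(results, inherits, logical(1), what = "try-error")

# Indexing a failed job with `$` raises an error; capture its message.
msg <- tryCatch(results[[2]]$i, error = function(e) conditionMessage(e))

# One possible guard: keep only the jobs that succeeded before indexing.
ok <- results[!failed]
ids <- vapply(ok, function(x) x$i, numeric(1))
```

Filtering on `inherits(x, "try-error")` before indexing is only one possible guard. Whether loo should skip, retry, or report such cases is a separate design question.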

I get a small number (~1-3) of the error/exception messages pretty consistently, but the error that causes loo_moment_match() to fail is less common. One place that I've been able to produce this error consistently is within a targets pipeline, which suggests to me that this is something that can be influenced by the RNG state. When I did get this error, it was preceded by ~10 of those error/exception messages. I can confirm that this error can also be produced without targets or callr, just less consistently. I'm using cores = 4 here, but the error can still occur with cores = 1. Commenting out the code for yrep and *_rng in the Stan file eliminates the issue entirely, but it is (ever so slightly) inconvenient to have to make this change depending on whether I want to use loo_moment_match() with the fitted model. I haven't encountered this problem when the *_rng function is something that is less likely to overflow than the negative binomial.

I wanted to report this issue here since it seems to have something to do with loo_moment_match(). It feels like it could be something related to or not entirely covered by #262. If this is expected behavior, I would appreciate any tips on how to better deal with having both log_lik and yrep in the generated quantities block when it comes to using loo_moment_match(). I'm sorry if any of this is off base, as I do not have a good understanding of the inner workings of the moment-matching code.

Some system info:

> packageVersion("loo")
[1] ‘2.8.0.9000’
> packageVersion("cmdstanr")
[1] ‘0.8.1’
> cmdstanr::cmdstan_version()
[1] "2.35.0"
> R.version
               _                           
platform       aarch64-apple-darwin20      
arch           aarch64                     
os             darwin20                    
system         aarch64, darwin20           
status                                     
major          4                           
minor          4.1                         
year           2024                        
month          06                          
day            14                          
svn rev        86737                       
language       R                           
version.string R version 4.4.1 (2024-06-14)
nickname       Race for Your Life          
@avehtari
Collaborator

Thanks for reporting this. Unfortunately, we can't solve this on the loo package side. We can improve the message in loo for the case where the generated quantities block causes errors, but that doesn't solve the issue. It might be good to separate the rng part into standalone generated quantities, or to consider whether the priors can be made more informative so that the parameter values are less likely to become unreasonable during moment matching.
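
One way to follow this suggestion with cmdstanr is its `$generate_quantities()` method, which reruns only a generated quantities block over the draws of an existing fit. The sketch below assumes a second Stan file (passed as the hypothetical argument `gq_file`) that repeats the data, parameters, and transformed parameters blocks of the original program but whose generated quantities block contains only the `yrep` line, while the fitted model keeps only `log_lik`. `simulate_yrep()` and its argument names are illustrative and are not part of cmdstanr or loo.

```r
# Hypothetical helper: draw yrep in a separate pass so that *_rng calls
# never run inside loo_moment_match().
simulate_yrep <- function(fit, stan_data, gq_file, seed = 1) {
  # Compile the standalone program; its parameters must match those of `fit`.
  mod_gq <- cmdstanr::cmdstan_model(gq_file)
  # Re-evaluate the generated quantities block for each posterior draw in `fit`.
  gq <- mod_gq$generate_quantities(
    fitted_params = fit,
    data = stan_data,
    seed = seed
  )
  # Return only the posterior predictive draws.
  gq$draws("yrep")
}
```

With this split, `fit$loo(moment_match = TRUE)` would run against a model that has no `*_rng` calls, and `yrep` would be produced afterwards only when posterior predictive checks are needed.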

@kthayashi
Author

Understood - thank you for this input. It sounds like running rng code separately would be a good rule of thumb for avoiding errors like this in moment matching.
