Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(loo_)(s)crps could ask for only one argument with predictions #223

Open
bnicenboim opened this issue Apr 12, 2023 · 5 comments
Open

(loo_)(s)crps could ask for only one argument with predictions #223

bnicenboim opened this issue Apr 12, 2023 · 5 comments

Comments

@bnicenboim
Copy link
Contributor

I think it would make sense to ask the user for only one argument with predictions with at least two columns and split the predictions array. (Also in #221).

In my tests, if you just divide the predictions in two, the estimates are virtually the same:

(ps2 <- crps(ypred1[1:2000,], ypred1[2001:4000,], y = fit$y))

@avehtari said:

I think it would be clearer to have this discussed in a separate issue (and tag @LeeviLindgren). I don't expect there is much difference in the easy case, but if the variability of the importance weights is big then a) the importance weights are different for the two halves (while in the current implementation they are the same), b) the number of draws for the importance weighting is halved, which increases Monte Carlo error. It is possible that in practice a + b are not a problem, but this should be investigated more thoroughly.

@avehtari
Copy link
Collaborator

Thanks for opening this. I agree that one argument would be simpler.

In my tests

Before we discuss any potential implementation issues, can you report the tests you have made? Report also the result for loo_crps() and loo_scrps with khat varying from less than 0.3 to larger than 0.7, and even better if the autocorrelation varies, too. Report the actual values from the two approaches ("virtually the same" is not well defined.). I just want to be sure we're not losing too much in the accuracy, and we need to justify well the splitting. If you don't have time to do such tests, just comment here.

@avehtari
Copy link
Collaborator

@bnicenboim any update on this?

@bnicenboim
Copy link
Contributor Author

I'm sorry, but no :(

Hopefully, I'll get back to it in mid January

@avehtari
Copy link
Collaborator

I just realized that we can use alternative equation (e.g. in Wikipedia)
$$
\mathrm{CRPS}(D,y) = \mathrm{E}{X\sim D}[|X-y|] + \mathrm {E}{X\sim D}[X] -2\mathrm{E}{X\sim D}[X\cdot F{D}(X)]
$$
which does not require two yreps, and no need to divide one yrep to two parts, and no need for permutations.

I can make a PR, but we need to first decide how to handle change in the arguments. The old one

  function(x,
           x2,
           y,
           log_lik,
           ...,
           permutations = 1,
           r_eff = 1,
           cores = getOption("mc.cores", 1))

New one

  function(yrep,
           y,
           log_lik,
           ...,
           r_eff = 1,
           cores = getOption("mc.cores", 1))

@jgabry what is your advice? Do we need to create new functions with new names?

@avehtari
Copy link
Collaborator

When changing (s)crps functions we could also fix #213 by adding possibility to provide psis object as an argument

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants