Add order statistic warning #230
Conversation
… compared is potentially due to random noise
@yannmclatchie Great, thank you for the PR! I put a few suggestions and questions in the review comments.
```r
if (max(elpd_diff) <= order_stat) {
  # flag warning if we suspect no model is theoretically better than the baseline
  warning("Difference in performance potentially due to chance.",
```
Should we point users to the paper in the warning message (e.g., "See McLatchie and Vehtari (2023) for details")? @avehtari what do you think?
yes
I have now added this.
R/loo_compare.R
```r
#' If more than \eqn{11} models are compared, then the worst model by elpd is
#' taken as the baseline model, and the risk of the difference in predictive
```
I'm slightly confused by this, because the printed `loo_compare` results will still show the model with the best ELPD as the baseline (i.e. all `elpd_diff` values are relative to that one), right? If that's still true, then I think we should rephrase this to indicate that it only happens in an internal check and doesn't affect which model is used as the baseline for computing `elpd_diff` in the output. Does that make sense? Or am I just confused (totally possible)?
I have now updated the docstring; let me know if it is clearer now 🙏
Codecov Report
```diff
@@            Coverage Diff             @@
##           master     #230      +/-   ##
==========================================
+ Coverage   92.96%   93.00%   +0.04%
==========================================
  Files          30       30
  Lines        2770     2788      +18
==========================================
+ Hits         2575     2593      +18
  Misses        195      195
```
Thanks for the feedback @jgabry ! I've gone through your suggestions and made the changes, along with a small change in the logic: we now use the median model rather than the worst model to take the internal differences and compute the order statistic. I have also updated the documentation to make it clear that this second layer of elpd differences is strictly internal and differs from what the user eventually sees, as you noted. Let me know if any further changes would help clarify.
@yannmclatchie Sorry for the delay on this! I've been super busy recently. I added one more comment suggesting minor edits to the documentation, but otherwise this looks good. After you take a look at my suggestion and approve it I can merge this PR!
@avehtari Does this look good to you too?
Co-authored-by: Jonah Gabry <[email protected]>
Thanks for updating the doc @yannmclatchie (and thanks for adding this to the package!). @avehtari if you're happy with this I think we can go ahead and merge it once all the automated tests have run again.
I can check this on Tuesday at the latest (too late today)
Sounds good, thanks
OK for me
Merging now. Thanks again!
This PR, developed with @avehtari, adds a warning when we estimate that the elpd differences among a collection of models are potentially due to chance alone.
If more than 11 models are compared, the median model by elpd is taken as the baseline model, and the risk that the difference in predictive performance is due to random noise is estimated as described by McLatchie and Vehtari (2023, Section 3.2). A warning is flagged if there is deemed to be a risk of over-fitting due to the selection process; in that case users are advised to avoid model selection based on LOO-CV and instead to favour model averaging/stacking or projection predictive inference.
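To illustrate why such a check is useful, here is a small simulation sketch (hypothetical, not part of the PR): when many models with identical true predictive performance are compared, the apparent elpd advantage of the best-looking one grows with the number of candidates even though it is pure noise. The model count and standard deviation below are assumed values for illustration.

```r
set.seed(1)
K <- 12       # number of candidate models being compared
sd_diff <- 2  # assumed standard deviation of the pairwise elpd differences

# Draw K - 1 elpd differences against a baseline model under the assumption
# that all models have the same true predictive performance, and record the
# apparent advantage of the "best" model in each simulated comparison.
n_sims <- 5000
best_gain <- replicate(n_sims, max(rnorm(K - 1, mean = 0, sd = sd_diff)))

mean(best_gain)          # around 3.2: the typical spurious advantage
quantile(best_gain, 0.9) # chance alone often produces even larger gaps
```

Selecting the best model by elpd in this setting would consistently overstate its performance, which is exactly the over-fitting risk the new warning points to.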