
test "Prediction accuracy for minority class increases with higher weight" is flaky #747

Closed · MichaelChirico opened this issue Nov 1, 2024 · 2 comments · Fixed by #748

MichaelChirico (Contributor) commented Nov 1, 2024:

pkgload::load_all()
# Run the test file 100 times under the minimal reporter and report the
# fraction of output lines containing an "F" (i.e., a test failure).
mean(grepl("F", capture.output({
  for (ii in 1:100) testthat::test_file(
    "tests/testthat/test_classweights.R",
    reporter = testthat::MinimalReporter)
})))
# [1] 0.03

That is, it fails about 3% of the time. The failing expectation is this one:

expect_gt(acc_minor_weighted, acc_minor)

And the failure reads:

── Failure (test_classweights.R:26:3): Prediction accuracy for minority class increases with higher weight ──
`acc_minor_weighted` is not strictly more than `acc_minor`. Difference: 0

Presumably some tiny numeric difference is being observed (it would be nice if {testthat} helped us here; right now the reported difference is limited to 3 digits: r-lib/testthat#2006).
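
One way to check that presumption (a hypothetical diagnostic, not from the thread; acc_minor_weighted and acc_minor are the values computed in the test) is to print the raw difference at full precision inside the failing test:

# Hypothetical diagnostic: an exact tie prints as "0", while a tiny numeric
# difference shows up at full precision instead of being rounded away.
sprintf("%.17g", acc_minor_weighted - acc_minor)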

mnwright (Member) commented Nov 4, 2024:

Thanks! Such tests are always a little bit dangerous (but useful).

I'll increase the sample size and the number of trees; that should help.
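
For context, a minimal sketch of why that helps (everything below is invented for illustration, not the actual change merged in #748): with more observations and more trees, the two accuracy estimates have lower variance, so exact ties become much rarer.

library(ranger)

set.seed(42)
n <- 1000                                     # larger sample size
x <- rnorm(n)
y <- factor(rbinom(n, 1, plogis(2 * x + 2)))  # class "0" is the minority
dat <- data.frame(x = x, y = y)

# Upweight the minority class (first factor level) in the splitting rule;
# num.trees and class.weights values here are purely illustrative.
rf <- ranger(y ~ x, data = dat,
             num.trees = 500,
             class.weights = c(10, 1))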

MichaelChirico (Contributor, Author) commented:

It's tough to know the right level of tolerable flakiness; IMO 3% is definitely too high (except maybe if it's really costly to increase the precision, but then I would hide such tests from CRAN).
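
(For reference, the CRAN-hiding option would be a one-liner with testthat's skip_on_cran(); the test name below is the real one, the body is elided.)

test_that("Prediction accuracy for minority class increases with higher weight", {
  skip_on_cran()  # keep the costly stochastic check out of CRAN runs
  # ... existing test body ...
})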

Thanks for addressing this!
