
test "Prediction accuracy for minority class increases with higher weight" is flaky #747

Closed · MichaelChirico opened this issue Nov 1, 2024 · 2 comments · Fixed by #748

MichaelChirico (Contributor) commented Nov 1, 2024:

pkgload::load_all()
# Run the test file 100 times under the minimal reporter and report the
# fraction of output lines containing an "F" (i.e., a test failure).
mean(grepl("F", capture.output({
  for (ii in 1:100) testthat::test_file(
    "tests/testthat/test_classweights.R",
    reporter = testthat::MinimalReporter)
})))
# [1] 0.03

That is, it fails about 3% of the time. The failing expectation is this one:

expect_gt(acc_minor_weighted, acc_minor)

And the failure reads:

── Failure (test_classweights.R:26:3): Prediction accuracy for minority class increases with higher weight ──
`acc_minor_weighted` is not strictly more than `acc_minor`. Difference: 0

Presumably some tiny numeric difference is being observed (it would be nice if {testthat} helped us here; right now the reported difference is limited to 3 digits: r-lib/testthat#2006).
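
One way to check that presumption (a hypothetical diagnostic, not from the thread; acc_minor_weighted and acc_minor are the values computed in the test) is to print the raw difference at full precision inside the failing test:

# Hypothetical diagnostic: an exact tie prints as "0", while a tiny numeric
# difference shows up at full precision instead of being rounded away.
sprintf("%.17g", acc_minor_weighted - acc_minor)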

mnwright (Member) commented Nov 4, 2024:

Thanks! Such tests are always a little bit dangerous (but useful).

I'll increase the sample size and the number of trees; that should help.
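
For context, a minimal sketch of why that helps (everything below is invented for illustration, not the actual change merged in #748): with more observations and more trees, the two accuracy estimates have lower variance, so exact ties become much rarer.

library(ranger)

set.seed(42)
n <- 1000                                     # larger sample size
x <- rnorm(n)
y <- factor(rbinom(n, 1, plogis(2 * x + 2)))  # class "0" is the minority
dat <- data.frame(x = x, y = y)

# Upweight the minority class (first factor level) in the splitting rule;
# num.trees and class.weights values here are purely illustrative.
rf <- ranger(y ~ x, data = dat,
             num.trees = 500,
             class.weights = c(10, 1))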

MichaelChirico (Contributor, Author) commented:

It's tough to know the right level of tolerable flakiness; IMO 3% is definitely too high (except maybe if it's really costly to increase the precision, but then I would hide such tests from CRAN).
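
(For reference, the CRAN-hiding option would be a one-liner with testthat's skip_on_cran(); the test name below is the real one, the body is elided.)

test_that("Prediction accuracy for minority class increases with higher weight", {
  skip_on_cran()  # keep the costly stochastic check out of CRAN runs
  # ... existing test body ...
})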

Thanks for addressing this!
