Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix measure bottlenecks #337

Merged
merged 17 commits into from
Nov 20, 2023
8 changes: 7 additions & 1 deletion R/PredictionDataSurv.R
Original file line number Diff line number Diff line change
Expand Up @@ -132,7 +132,13 @@ filter_prediction_data.PredictionDataSurv = function(pdata, row_ids, ...) {
distr = pdata$distr

if (testDistribution(distr)) { # distribution
pdata$distr = distr[keep]
ok = inherits(distr, c("VectorDistribution", "Matdist", "Arrdist")) &&
length(keep) > 1 # edge case: Arrdist(1xYxZ) and keep = FALSE
if (ok) {
pdata$distr = distr[keep] # we can subset row/samples like this
} else {
pdata$distr = base::switch(keep, distr) # one distribution only
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bblodfon can you tell me what's happening here?

Copy link
Collaborator

@bblodfon bblodfon Nov 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to filter Distribution objects (like the prediction object from surv.parametric) since I noticed we don't do any filtering on those and I needed on the tests to compare RCLL old and new code kind-of. The above is the result of testing some edge cases.

In general you can subset with distr[keep] when you have a Matdist, Arrdist or an VectorDistribution if you have two or more observations. The problem starts when 1 observation is left. distr[keep] with a VectorDistribution leaves a Distribution (e.g. WeibullAFT-something in surv.parametric), a WeightedDiscrete() for Matdist, and an Arrdist of dim 1x... (it doesn't degenerate to a Matdist). And that's all fine, we still use pdata$distr = distr[keep] to do this. But then if you redo some more filtering (on that same specific row index or another to go to 0 observations - a very edge case :), keep will be either TRUE or FALSE, and most of the above are not subsettable or don't work as expected:

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image
arrdist_logical_subsetting

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for asking this - based on the above, we might want to make an Arrdist to a Matdist when only one observation is left by design in distr6?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RaphaelS1 Do you have an opinion on this?

} else {
if (length(dim(distr)) == 2) { # 2d matrix
pdata$distr = distr[keep, , drop = FALSE]
Expand Down
4 changes: 3 additions & 1 deletion R/PredictionSurv.R
Original file line number Diff line number Diff line change
Expand Up @@ -171,10 +171,12 @@ PredictionSurv = R6Class("PredictionSurv",
}
},
.distrify_survarray = function(x) {
if (inherits(x, "array")) { # can be matrix as well
if (inherits(x, "array") && nrow(x) > 0) { # can be matrix as well
# create Matdist or Arrdist (default => median curve)
distr6::as.Distribution(1 - x, fun = "cdf",
decorators = c("CoreStatistics", "ExoticStatistics"))
} else {
NULL
bblodfon marked this conversation as resolved.
Show resolved Hide resolved
}
}
)
Expand Down
4 changes: 2 additions & 2 deletions inst/testthat/helper_expectations.R
Original file line number Diff line number Diff line change
Expand Up @@ -30,8 +30,8 @@ expect_prediction_surv = function(p) {
"response", "distr", "lp", "crank"))
checkmate::expect_data_table(data.table::as.data.table(p), nrows = length(p$row_ids))
checkmate::expect_atomic_vector(p$missing)
if ("distr" %in% p$predict_types) {
expect_true(class(p$distr)[[1]] %in% c("VectorDistribution", "Matdist", "Arrdist"))
if ("distr" %in% p$predict_types && !is.null(p$distr)) {
expect_true(class(p$distr)[[1]] %in% c("VectorDistribution", "Matdist", "Arrdist", "WeightedDiscrete"))
}
expect_true(inherits(p, "PredictionSurv"))
}
34 changes: 32 additions & 2 deletions tests/testthat/test_PredictionSurv.R
Original file line number Diff line number Diff line change
Expand Up @@ -176,8 +176,8 @@ test_that("as_prediction_surv", {
})

test_that("filtering", {
p = suppressWarnings(lrn("surv.coxph")$train(task)$predict(task))
p2 = reshape_distr_to_3d(p) # survival array distr
p = suppressWarnings(lrn("surv.coxph")$train(task)$predict(task)) # survival matrix
p2 = reshape_distr_to_3d(p) # survival array
p3 = p$clone()
p4 = p2$clone()
p3$data$distr = p3$distr # Matdist
Expand Down Expand Up @@ -209,4 +209,34 @@ test_that("filtering", {
expect_equal(nrow(p2$data$distr), 3)
expect_true(inherits(p3$data$distr, "Matdist"))
expect_true(inherits(p4$data$distr, "Arrdist"))

# edge case: filter to 1 observation
p$filter(20)
p2$filter(20)
p3$filter(20)
p4$filter(20)
expect_prediction_surv(p)
expect_prediction_surv(p2)
expect_prediction_surv(p3)
expect_prediction_surv(p4)
expect_matrix(p$data$distr, nrows = 1)
expect_array(p2$data$distr, d = 3)
expect_equal(nrow(p2$data$distr), 1)
expect_true(inherits(p3$data$distr, "WeightedDiscrete")) # from Matdist!
expect_true(inherits(p4$data$distr, "Arrdist")) # remains an Arrdist!

# filter to 0 observations using non-existent (positive) id
p$filter(42)
p2$filter(42)
p3$filter(42)
p4$filter(42)

expect_prediction_surv(p)
expect_prediction_surv(p2)
expect_prediction_surv(p3)
expect_prediction_surv(p4)
expect_null(p$distr)
expect_null(p2$distr)
expect_null(p3$distr)
expect_null(p4$distr)
})
Loading