From 3e83535fd9b34c032b1db56bea2200c4d97641ea Mon Sep 17 00:00:00 2001 From: sumny Date: Tue, 21 Apr 2020 23:01:10 +0200 Subject: [PATCH 01/12] add PipeOpFilterRows, PipeOpPredictionUnion, tests and docu --- DESCRIPTION | 2 + NAMESPACE | 2 + R/PipeOpFilterRows.R | 171 +++++++++++++++++++ R/PipeOpPredictionUnion.R | 136 +++++++++++++++ man/PipeOp.Rd | 2 + man/PipeOpEnsemble.Rd | 2 + man/PipeOpImpute.Rd | 2 + man/PipeOpProxy.Rd | 2 + man/PipeOpTaskPreproc.Rd | 2 + man/mlr_pipeops.Rd | 2 + man/mlr_pipeops_boxcox.Rd | 2 + man/mlr_pipeops_branch.Rd | 2 + man/mlr_pipeops_chunk.Rd | 2 + man/mlr_pipeops_classbalancing.Rd | 2 + man/mlr_pipeops_classifavg.Rd | 2 + man/mlr_pipeops_classweights.Rd | 2 + man/mlr_pipeops_colapply.Rd | 2 + man/mlr_pipeops_collapsefactors.Rd | 2 + man/mlr_pipeops_copy.Rd | 2 + man/mlr_pipeops_datefeatures.Rd | 2 + man/mlr_pipeops_encode.Rd | 2 + man/mlr_pipeops_encodeimpact.Rd | 2 + man/mlr_pipeops_encodelmer.Rd | 2 + man/mlr_pipeops_featureunion.Rd | 2 + man/mlr_pipeops_filter.Rd | 2 + man/mlr_pipeops_filterrows.Rd | 155 +++++++++++++++++ man/mlr_pipeops_fixfactors.Rd | 2 + man/mlr_pipeops_histbin.Rd | 2 + man/mlr_pipeops_ica.Rd | 2 + man/mlr_pipeops_imputehist.Rd | 2 + man/mlr_pipeops_imputemean.Rd | 2 + man/mlr_pipeops_imputemedian.Rd | 2 + man/mlr_pipeops_imputemode.Rd | 2 + man/mlr_pipeops_imputenewlvl.Rd | 2 + man/mlr_pipeops_imputesample.Rd | 2 + man/mlr_pipeops_kernelpca.Rd | 2 + man/mlr_pipeops_learner.Rd | 2 + man/mlr_pipeops_missind.Rd | 2 + man/mlr_pipeops_modelmatrix.Rd | 2 + man/mlr_pipeops_mutate.Rd | 2 + man/mlr_pipeops_nop.Rd | 2 + man/mlr_pipeops_pca.Rd | 2 + man/mlr_pipeops_predictionunion.Rd | 151 ++++++++++++++++ man/mlr_pipeops_quantilebin.Rd | 2 + man/mlr_pipeops_regravg.Rd | 2 + man/mlr_pipeops_removeconstants.Rd | 2 + man/mlr_pipeops_scale.Rd | 2 + man/mlr_pipeops_scalemaxabs.Rd | 2 + man/mlr_pipeops_scalerange.Rd | 2 + man/mlr_pipeops_select.Rd | 2 + man/mlr_pipeops_smote.Rd | 2 + man/mlr_pipeops_spatialsign.Rd | 2 + 
man/mlr_pipeops_subsample.Rd | 2 + man/mlr_pipeops_textvectorizer.Rd | 2 + man/mlr_pipeops_threshold.Rd | 2 + man/mlr_pipeops_unbranch.Rd | 2 + man/mlr_pipeops_yeojohnson.Rd | 2 + tests/testthat/test_pipeop_filterrows.R | 90 ++++++++++ tests/testthat/test_pipeop_predictionunion.R | 107 ++++++++++++ 59 files changed, 916 insertions(+) create mode 100644 R/PipeOpFilterRows.R create mode 100644 R/PipeOpPredictionUnion.R create mode 100644 man/mlr_pipeops_filterrows.Rd create mode 100644 man/mlr_pipeops_predictionunion.Rd create mode 100644 tests/testthat/test_pipeop_filterrows.R create mode 100644 tests/testthat/test_pipeop_predictionunion.R diff --git a/DESCRIPTION b/DESCRIPTION index 81f4c84fa..8f4e2ac87 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -104,6 +104,7 @@ Collate: 'PipeOpEncodeLmer.R' 'PipeOpFeatureUnion.R' 'PipeOpFilter.R' + 'PipeOpFilterRows.R' 'PipeOpFixFactors.R' 'PipeOpHistBin.R' 'PipeOpICA.R' @@ -122,6 +123,7 @@ Collate: 'PipeOpMutate.R' 'PipeOpNOP.R' 'PipeOpPCA.R' + 'PipeOpPredictionUnion.R' 'PipeOpProxy.R' 'PipeOpQuantileBin.R' 'PipeOpRegrAvg.R' diff --git a/NAMESPACE b/NAMESPACE index b755bd88c..497e48071 100644 --- a/NAMESPACE +++ b/NAMESPACE @@ -38,6 +38,7 @@ export(PipeOpEncodeLmer) export(PipeOpEnsemble) export(PipeOpFeatureUnion) export(PipeOpFilter) +export(PipeOpFilterRows) export(PipeOpFixFactors) export(PipeOpHistBin) export(PipeOpICA) @@ -56,6 +57,7 @@ export(PipeOpModelMatrix) export(PipeOpMutate) export(PipeOpNOP) export(PipeOpPCA) +export(PipeOpPredictionUnion) export(PipeOpProxy) export(PipeOpQuantileBin) export(PipeOpRegrAvg) diff --git a/R/PipeOpFilterRows.R b/R/PipeOpFilterRows.R new file mode 100644 index 000000000..cd95e60c6 --- /dev/null +++ b/R/PipeOpFilterRows.R @@ -0,0 +1,171 @@ +#' @title PipeOpFilterRows +#' +#' @usage NULL +#' @name mlr_pipeops_filterrows +#' @format [`R6Class`] object inheriting from [`PipeOpTaskPreproc`]. +#' +#' @description +#' Filter rows of the data of a task. 
Also directly allows for the removal of rows holding missing +#' values. +#' +#' @section Construction: +#' ``` +#' PipeOpFilterRows$new(id = "filterrows", param_vals = list()) +#' ``` +#' +#' * `id` :: `character(1)`\cr +#' Identifier of resulting object, default `"filterrows"`. +#' * `param_vals` :: named `list`\cr +#' List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise +#' be set during construction. Default `list()`. +#' +#' @section Input and Output Channels: +#' Input and output channels are inherited from [`PipeOpTaskPreproc`]. +#' +#' The output during training is the input [`Task`][mlr3::Task] with rows kept according to the +#' filtering (see Parameters) and (possible) rows with missing values removed. +#' +#' The output during prediction is the unchanged input [`Task`][mlr3::Task] if the parameter +#' `skip_during_predict` is `TRUE`. Otherwise it is analogously handled as the output during +#' training. +#' +#' @section State: +#' The `$state` is a named `list` with the `$state` elements inherited from [`PipeOpTaskPreproc`], +#' as well as the following elements: +#' * `na_ids` :: `integer`\cr +#' The row identifiers that had missing values during training and therefore were removed. See the +#' parameter `na_column`. +#' * `row_ids` :: `integer`\cr +#' The row identifiers that were kept during training according to the parameters `filter`, +#' `na_column` and `invert`. +#' +#' @section Parameters: +#' The parameters are the parameters inherited from [`PipeOpTaskPreproc`], as well as: +#' * `filter` :: `NULL` | `character(1)` | `expression` | `integer`\cr +#' How the rows of the data of the input [`Task`][mlr3::Task] should be filtered. 
This can be a +#' character vector of length 1 indicating a feature column of logicals in the data of the input +#' [`Task`][mlr3::Task] which forms the basis of the filtering, i.e., all rows that are `TRUE` +#' with respect to this column are kept in the data of the output [`Task`][mlr3::Task]. Moreover, +#' this can be an expression that will result in a logical vector of length `$nrow` of the data of +#' the input [`Task`][mlr3::Task] when evaluated within the environment of the `$data()` of the +#' input [`Task`][mlr3::Task]. Finally, this can also be an integerish vector that directly +#' specifies the row identifiers of the rows of the data of the input [`Task`][mlr3::Task] that +#' should be kept. Default is `NULL`, i.e., no filtering is done. +#' * `na_column` :: `NULL` | `character`\cr +#' A character vector that specifies the columns of the data of the input [`Task`][mlr3::Task] +#' that should be checked for missing values. If set to `"all"`, all columns of the data are used. A +#' row is removed if at least one missing value is found with respect to the columns specified. +#' Default is `NULL`, i.e., no removal of missing values is done. +#' * `invert` :: `logical(1)`\cr +#' Should the filtering rule be set-theoretically inverted? Note that this happens after +#' (possible) missing values were removed if `na_column` is specified. Default is `FALSE`. +#' * `skip_during_predict` :: `logical(1)`\cr +#' Should the filtering and missing value removal steps be skipped during prediction? If `TRUE`, +#' the input [`Task`][mlr3::Task] is returned unaltered during prediction. Default is `FALSE`. +#' +#' @section Internals: +#' Uses the [`is.na()`][base::is.na] function for the checking of missing values. +#' +#' @section Methods: +#' Only methods inherited from [`PipeOpTaskPreproc`]/[`PipeOp`]. 
+#' +#' @examples +#' library("mlr3") +#' task = tsk("pima") +#' po = PipeOpFilterRows$new(param_vals = list( +#' filter = expression(age < median(age) & mass > 30), +#' na_column = "all") +#' ) +#' po$train(list(task)) +#' po$state +#' @family PipeOps +#' @include PipeOpTaskPreproc.R +#' @export +PipeOpFilterRows = R6Class("PipeOpFilterRows", + inherit = PipeOpTaskPreproc, + public = list( + initialize = function(id = "filterrows", param_vals = list()) { + ps = ParamSet$new(params = list( + ParamUty$new("filter", default = NULL, tags = c("train", "predict"), custom_check = function(x) { + ok = test_character(x, any.missing = FALSE, len = 1L) || + is.expression(x) || + test_integerish(x, lower = 1, min.len = 1L) || + is.null(x) + if (!ok) return("Must either be a character vector of length 1, an expression, or an integerish object of row ids") + TRUE + }), + ParamUty$new("na_column", default = NULL, tags = c("train", "predict"), custom_check = function(x) { + check_character(x, any.missing = FALSE, min.len = 1L, null.ok = TRUE) + }), + ParamLgl$new("invert", default = FALSE, tags = c("train", "predict")), + ParamLgl$new("skip_during_predict", default = FALSE, tags = "predict")) + ) + ps$values = list(filter = NULL, na_column = NULL, invert = FALSE, skip_during_predict = FALSE) + super$initialize(id, param_set = ps, param_vals = param_vals) + } + ), + private = list( + .na_and_filter = function(task, skip, set_state) { + if (skip) { + return(task) # early exit if skipped (if skip_during_predict) + } + + row_ids = task$row_ids + + # NA column(s) handling + na = self$param_set$values$na_column + if (!is.null(na)) { + assert_subset(na, choices = c("all", colnames(task$data()))) + if ("all" %in% na) na = colnames(task$data()) + na_ids = task$row_ids[rowSums(is.na(task$data(cols = na))) > 0L] + row_ids = setdiff(row_ids, na_ids) + } else { + na_ids = integer(0L) + } + + # filtering + filter = self$param_set$values$filter + filter_ids = + if (is.null(filter)) { + row_ids + } else 
if (is.character(filter)) { + assert_subset(filter, choices = task$feature_names) + filter_column = task$data(cols = filter)[[1L]] + assert_logical(filter_column) + task$row_ids[which(filter_column)] + } else if (is.expression(filter)) { + filter_expression = eval(filter, envir = task$data()) + assert_logical(filter_expression, len = task$nrow) + task$row_ids[which(filter_expression)] + } else { + filter = as.integer(filter) + assert_subset(filter, choices = task$row_ids) + filter + } + + row_ids = if (self$param_set$values$invert) { + setdiff(row_ids, filter_ids) + } else { + intersect(row_ids, filter_ids) + } + + # only set the state if required (during training) + if (set_state) { + self$state$na_ids = na_ids + self$state$row_ids = row_ids + } + + task$filter(row_ids) + }, + + .train_task = function(task) { + private$.na_and_filter(task, skip = FALSE, set_state = TRUE) + }, + + .predict_task = function(task) { + private$.na_and_filter(task, skip = self$param_set$values$skip_during_predict, set_state = FALSE) + } + ) +) + +mlr_pipeops$add("filterrows", PipeOpFilterRows) diff --git a/R/PipeOpPredictionUnion.R b/R/PipeOpPredictionUnion.R new file mode 100644 index 000000000..3fb6ef034 --- /dev/null +++ b/R/PipeOpPredictionUnion.R @@ -0,0 +1,136 @@ +#' @title PipeOpPredictionUnion +#' +#' @usage NULL +#' @name mlr_pipeops_predictionunion +#' @format [`R6Class`] object inheriting from [`PipeOp`]. +#' +#' @description +#' Unite predictions from all input predictions into a single +#' [`Prediction`][mlr3::Prediction]. +#' +#' `task_type`s and `predict_types` must be equal across all input predictions. +#' +#' Note that predictions are combined as is, i.e., no checks for duplicated row +#' identifiers etc. are performed. +#' +#' Currently only supports task types `classif` and `regr` by constructing a new +#' [`PredictionClassif`][mlr3::PredictionClassif] or +#' [`PredictionRegr`][mlr3::PredictionRegr], respectively. 
+#' +#' @section Construction: +#' ``` +#' PipeOpPredictionUnion$new(innum = 0, id = "predictionunion", param_vals = list()) +#' ``` +#' +#' * `innum` :: `numeric(1)` | `character`\cr +#' Determines the number of input channels. If `innum` is 0 (default), a vararg input channel is +#' created that can take an arbitrary number of inputs. If `innum` is a `character` vector, the +#' number of input channels is the length of `innum`. +#' * `id` :: `character(1)`\cr +#' Identifier of the resulting object, default `"predictionunion"`. +#' * `param_vals` :: named `list`\cr +#' List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise +#' be set during construction. Default `list()`. +#' +#' @section Input and Output Channels: +#' [`PipeOpPredictionUnion`] has multiple input channels depending on the `innum` construction +#' argument, named `"input1"`, `"input2"`, ... if `innum` is nonzero; if `innum` is 0, there is only +#' one *vararg* input channel named `"..."`. All input channels take `NULL` during training and a +#' [`Prediction`][mlr3::Prediction] during prediction. +#' +#' [`PipeOpPredictionUnion`] has one output channel named `"output"`, producing `NULL` during +#' training and a [`Prediction`][mlr3::Prediction] during prediction. +#' +#' The output during prediction is a [`Prediction`][mlr3::Prediction] constructed by combining all +#' input [`Prediction`][mlr3::Prediction]s. +#' +#' @section State: +#' The `$state` is left empty (`list()`). +#' +#' @section Parameters: +#' [`PipeOpPredictionUnion`] has no Parameters. +#' +#' @section Internals: +#' Only sets the fields `row_ids`, `truth`, `response` and if applicable `prob` and `se` during +#' construction of the output [`Prediction`][mlr3::Prediction]. +#' +#' @section Fields: +#' Only fields inherited from [`PipeOp`]. +#' +#' @section Methods: +#' Only methods inherited from [`PipeOp`]. 
+#' +#' @family PipeOps +#' @include PipeOp.R +#' @export +#' @examples +#' library("mlr3") +#' +#' task = tsk("iris") +#' filter = expression(Sepal.Length < median(Sepal.Length)) +#' gr = po("copy", outnum = 2) %>>% gunion(list( +#' po("filterrows", id = "filter1", +#' param_vals = list(filter = filter)) %>>% +#' lrn("classif.rpart", id = "learner1"), +#' po("filterrows", id = "filter2", +#' param_vals = list(filter = filter, invert = TRUE)) %>>% +#' lrn("classif.rpart", id = "learner2") +#' )) %>>% po("predictionunion") +#' +#' gr$train(task) +#' gr$predict(task) +PipeOpPredictionUnion = R6Class("PipeOpPredictionUnion", + inherit = PipeOp, + public = list( + initialize = function(innum = 0L, id = "predictionunion", param_vals = list()) { + assert( + check_int(innum, lower = 0L), + check_character(innum, min.len = 1L, any.missing = FALSE) + ) + if (!is.numeric(innum)) { + innum = length(innum) + } + inname = if (innum) rep_suffix("input", innum) else "..." + super$initialize(id, param_vals = param_vals, + input = data.table(name = inname, train = "NULL", predict = "Prediction"), + output = data.table(name = "output", train = "NULL", predict = "Prediction")) + } + ), + private = list( + .train = function(inputs) { + self$state = list() + list(NULL) + }, + .predict = function(inputs) { + # currently only works for task_type "classif" or "regr" + check = all((unlist(map(inputs[-1L], .f = `[[`, "task_type")) == inputs[[1L]]$task_type) & + unlist(map(inputs[-1L], .f = `[[`, "predict_types")) == inputs[[1L]]$predict_types) + if (!check) { + stopf("Can only unite predictions of the same task type and predict types.") + } + + type = inputs[[1L]]$task_type + if (type %nin% c("classif", "regr")) { + stopf("Currently only supports task types `classif` and `regr`.") + } + + row_ids = unlist(map(inputs, .f = `[[`, "row_ids"), use.names = FALSE) + truth = unlist(map(inputs, .f = `[[`, "truth"), use.names = FALSE) + response = unlist(map(inputs, .f = `[[`, "response"), use.names 
= FALSE) + + prediction = + if(type == "classif") { + prob = do.call(rbind, map(inputs, .f = `[[`, "prob")) + PredictionClassif$new(row_ids = row_ids, truth = truth, response = response, prob = prob) + } else { + se = unlist(map(inputs, .f = `[[`, "se"), use.names = FALSE) + if (length(se) == 0L) se = NULL + PredictionRegr$new(row_ids = row_ids, truth = truth, response = response, se = se) + } + + list(prediction) + } + ) +) + +mlr_pipeops$add("predictionunion", PipeOpPredictionUnion) diff --git a/man/PipeOp.Rd b/man/PipeOp.Rd index 21479adfc..642907b6a 100644 --- a/man/PipeOp.Rd +++ b/man/PipeOp.Rd @@ -226,6 +226,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, @@ -243,6 +244,7 @@ Other PipeOps: \code{\link{mlr_pipeops_mutate}}, \code{\link{mlr_pipeops_nop}}, \code{\link{mlr_pipeops_pca}}, +\code{\link{mlr_pipeops_predictionunion}}, \code{\link{mlr_pipeops_quantilebin}}, \code{\link{mlr_pipeops_regravg}}, \code{\link{mlr_pipeops_removeconstants}}, diff --git a/man/PipeOpEnsemble.Rd b/man/PipeOpEnsemble.Rd index 58fabe329..5082edaa2 100644 --- a/man/PipeOpEnsemble.Rd +++ b/man/PipeOpEnsemble.Rd @@ -109,6 +109,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, @@ -126,6 +127,7 @@ Other PipeOps: \code{\link{mlr_pipeops_mutate}}, \code{\link{mlr_pipeops_nop}}, \code{\link{mlr_pipeops_pca}}, +\code{\link{mlr_pipeops_predictionunion}}, \code{\link{mlr_pipeops_quantilebin}}, \code{\link{mlr_pipeops_regravg}}, \code{\link{mlr_pipeops_removeconstants}}, diff --git a/man/PipeOpImpute.Rd b/man/PipeOpImpute.Rd index 
030c747bc..68cedb604 100644 --- a/man/PipeOpImpute.Rd +++ b/man/PipeOpImpute.Rd @@ -130,6 +130,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, @@ -147,6 +148,7 @@ Other PipeOps: \code{\link{mlr_pipeops_mutate}}, \code{\link{mlr_pipeops_nop}}, \code{\link{mlr_pipeops_pca}}, +\code{\link{mlr_pipeops_predictionunion}}, \code{\link{mlr_pipeops_quantilebin}}, \code{\link{mlr_pipeops_regravg}}, \code{\link{mlr_pipeops_removeconstants}}, diff --git a/man/PipeOpProxy.Rd b/man/PipeOpProxy.Rd index 06ce69248..5557cc9a5 100644 --- a/man/PipeOpProxy.Rd +++ b/man/PipeOpProxy.Rd @@ -105,6 +105,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, @@ -122,6 +123,7 @@ Other PipeOps: \code{\link{mlr_pipeops_mutate}}, \code{\link{mlr_pipeops_nop}}, \code{\link{mlr_pipeops_pca}}, +\code{\link{mlr_pipeops_predictionunion}}, \code{\link{mlr_pipeops_quantilebin}}, \code{\link{mlr_pipeops_regravg}}, \code{\link{mlr_pipeops_removeconstants}}, diff --git a/man/PipeOpTaskPreproc.Rd b/man/PipeOpTaskPreproc.Rd index dc5c1234b..5f86fc53c 100644 --- a/man/PipeOpTaskPreproc.Rd +++ b/man/PipeOpTaskPreproc.Rd @@ -197,6 +197,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, @@ -214,6 +215,7 @@ Other PipeOps: \code{\link{mlr_pipeops_mutate}}, \code{\link{mlr_pipeops_nop}}, \code{\link{mlr_pipeops_pca}}, 
+\code{\link{mlr_pipeops_predictionunion}}, \code{\link{mlr_pipeops_quantilebin}}, \code{\link{mlr_pipeops_regravg}}, \code{\link{mlr_pipeops_removeconstants}}, diff --git a/man/mlr_pipeops.Rd b/man/mlr_pipeops.Rd index 3bf6cc90f..9191fbdac 100644 --- a/man/mlr_pipeops.Rd +++ b/man/mlr_pipeops.Rd @@ -84,6 +84,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, @@ -101,6 +102,7 @@ Other PipeOps: \code{\link{mlr_pipeops_mutate}}, \code{\link{mlr_pipeops_nop}}, \code{\link{mlr_pipeops_pca}}, +\code{\link{mlr_pipeops_predictionunion}}, \code{\link{mlr_pipeops_quantilebin}}, \code{\link{mlr_pipeops_regravg}}, \code{\link{mlr_pipeops_removeconstants}}, diff --git a/man/mlr_pipeops_boxcox.Rd b/man/mlr_pipeops_boxcox.Rd index 16529dc31..6b335f5b2 100644 --- a/man/mlr_pipeops_boxcox.Rd +++ b/man/mlr_pipeops_boxcox.Rd @@ -95,6 +95,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, @@ -112,6 +113,7 @@ Other PipeOps: \code{\link{mlr_pipeops_mutate}}, \code{\link{mlr_pipeops_nop}}, \code{\link{mlr_pipeops_pca}}, +\code{\link{mlr_pipeops_predictionunion}}, \code{\link{mlr_pipeops_quantilebin}}, \code{\link{mlr_pipeops_regravg}}, \code{\link{mlr_pipeops_removeconstants}}, diff --git a/man/mlr_pipeops_branch.Rd b/man/mlr_pipeops_branch.Rd index f3feffeac..01a3b7337 100644 --- a/man/mlr_pipeops_branch.Rd +++ b/man/mlr_pipeops_branch.Rd @@ -115,6 +115,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, 
\code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, @@ -132,6 +133,7 @@ Other PipeOps: \code{\link{mlr_pipeops_mutate}}, \code{\link{mlr_pipeops_nop}}, \code{\link{mlr_pipeops_pca}}, +\code{\link{mlr_pipeops_predictionunion}}, \code{\link{mlr_pipeops_quantilebin}}, \code{\link{mlr_pipeops_regravg}}, \code{\link{mlr_pipeops_removeconstants}}, diff --git a/man/mlr_pipeops_chunk.Rd b/man/mlr_pipeops_chunk.Rd index 12fd9ec0e..d42a4f311 100644 --- a/man/mlr_pipeops_chunk.Rd +++ b/man/mlr_pipeops_chunk.Rd @@ -94,6 +94,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, @@ -111,6 +112,7 @@ Other PipeOps: \code{\link{mlr_pipeops_mutate}}, \code{\link{mlr_pipeops_nop}}, \code{\link{mlr_pipeops_pca}}, +\code{\link{mlr_pipeops_predictionunion}}, \code{\link{mlr_pipeops_quantilebin}}, \code{\link{mlr_pipeops_regravg}}, \code{\link{mlr_pipeops_removeconstants}}, diff --git a/man/mlr_pipeops_classbalancing.Rd b/man/mlr_pipeops_classbalancing.Rd index b2fda8775..077c1127e 100644 --- a/man/mlr_pipeops_classbalancing.Rd +++ b/man/mlr_pipeops_classbalancing.Rd @@ -135,6 +135,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, @@ -152,6 +153,7 @@ Other PipeOps: \code{\link{mlr_pipeops_mutate}}, \code{\link{mlr_pipeops_nop}}, \code{\link{mlr_pipeops_pca}}, +\code{\link{mlr_pipeops_predictionunion}}, \code{\link{mlr_pipeops_quantilebin}}, \code{\link{mlr_pipeops_regravg}}, \code{\link{mlr_pipeops_removeconstants}}, diff --git a/man/mlr_pipeops_classifavg.Rd b/man/mlr_pipeops_classifavg.Rd 
index e1464b311..ef4958d1a 100644 --- a/man/mlr_pipeops_classifavg.Rd +++ b/man/mlr_pipeops_classifavg.Rd @@ -102,6 +102,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, @@ -119,6 +120,7 @@ Other PipeOps: \code{\link{mlr_pipeops_mutate}}, \code{\link{mlr_pipeops_nop}}, \code{\link{mlr_pipeops_pca}}, +\code{\link{mlr_pipeops_predictionunion}}, \code{\link{mlr_pipeops_quantilebin}}, \code{\link{mlr_pipeops_regravg}}, \code{\link{mlr_pipeops_removeconstants}}, diff --git a/man/mlr_pipeops_classweights.Rd b/man/mlr_pipeops_classweights.Rd index 4387ddaaa..447534767 100644 --- a/man/mlr_pipeops_classweights.Rd +++ b/man/mlr_pipeops_classweights.Rd @@ -103,6 +103,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, @@ -120,6 +121,7 @@ Other PipeOps: \code{\link{mlr_pipeops_mutate}}, \code{\link{mlr_pipeops_nop}}, \code{\link{mlr_pipeops_pca}}, +\code{\link{mlr_pipeops_predictionunion}}, \code{\link{mlr_pipeops_quantilebin}}, \code{\link{mlr_pipeops_regravg}}, \code{\link{mlr_pipeops_removeconstants}}, diff --git a/man/mlr_pipeops_colapply.Rd b/man/mlr_pipeops_colapply.Rd index 94a704eac..a1eac4e49 100644 --- a/man/mlr_pipeops_colapply.Rd +++ b/man/mlr_pipeops_colapply.Rd @@ -124,6 +124,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, @@ -141,6 +142,7 @@ Other PipeOps: \code{\link{mlr_pipeops_mutate}}, 
\code{\link{mlr_pipeops_nop}}, \code{\link{mlr_pipeops_pca}}, +\code{\link{mlr_pipeops_predictionunion}}, \code{\link{mlr_pipeops_quantilebin}}, \code{\link{mlr_pipeops_regravg}}, \code{\link{mlr_pipeops_removeconstants}}, diff --git a/man/mlr_pipeops_collapsefactors.Rd b/man/mlr_pipeops_collapsefactors.Rd index 5a380a30d..96370f7b2 100644 --- a/man/mlr_pipeops_collapsefactors.Rd +++ b/man/mlr_pipeops_collapsefactors.Rd @@ -91,6 +91,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, @@ -108,6 +109,7 @@ Other PipeOps: \code{\link{mlr_pipeops_mutate}}, \code{\link{mlr_pipeops_nop}}, \code{\link{mlr_pipeops_pca}}, +\code{\link{mlr_pipeops_predictionunion}}, \code{\link{mlr_pipeops_quantilebin}}, \code{\link{mlr_pipeops_regravg}}, \code{\link{mlr_pipeops_removeconstants}}, diff --git a/man/mlr_pipeops_copy.Rd b/man/mlr_pipeops_copy.Rd index 8623b9ccb..be46dcb8f 100644 --- a/man/mlr_pipeops_copy.Rd +++ b/man/mlr_pipeops_copy.Rd @@ -113,6 +113,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, @@ -130,6 +131,7 @@ Other PipeOps: \code{\link{mlr_pipeops_mutate}}, \code{\link{mlr_pipeops_nop}}, \code{\link{mlr_pipeops_pca}}, +\code{\link{mlr_pipeops_predictionunion}}, \code{\link{mlr_pipeops_quantilebin}}, \code{\link{mlr_pipeops_regravg}}, \code{\link{mlr_pipeops_removeconstants}}, diff --git a/man/mlr_pipeops_datefeatures.Rd b/man/mlr_pipeops_datefeatures.Rd index f7d63913e..5ad3152b4 100644 --- a/man/mlr_pipeops_datefeatures.Rd +++ b/man/mlr_pipeops_datefeatures.Rd @@ -130,6 +130,7 @@ Other PipeOps: 
\code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, @@ -147,6 +148,7 @@ Other PipeOps: \code{\link{mlr_pipeops_mutate}}, \code{\link{mlr_pipeops_nop}}, \code{\link{mlr_pipeops_pca}}, +\code{\link{mlr_pipeops_predictionunion}}, \code{\link{mlr_pipeops_quantilebin}}, \code{\link{mlr_pipeops_regravg}}, \code{\link{mlr_pipeops_removeconstants}}, diff --git a/man/mlr_pipeops_encode.Rd b/man/mlr_pipeops_encode.Rd index 9e2266a5b..4f8eebdf6 100644 --- a/man/mlr_pipeops_encode.Rd +++ b/man/mlr_pipeops_encode.Rd @@ -116,6 +116,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodeimpact}}, \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, @@ -133,6 +134,7 @@ Other PipeOps: \code{\link{mlr_pipeops_mutate}}, \code{\link{mlr_pipeops_nop}}, \code{\link{mlr_pipeops_pca}}, +\code{\link{mlr_pipeops_predictionunion}}, \code{\link{mlr_pipeops_quantilebin}}, \code{\link{mlr_pipeops_regravg}}, \code{\link{mlr_pipeops_removeconstants}}, diff --git a/man/mlr_pipeops_encodeimpact.Rd b/man/mlr_pipeops_encodeimpact.Rd index 875e3e83b..3d3e7519e 100644 --- a/man/mlr_pipeops_encodeimpact.Rd +++ b/man/mlr_pipeops_encodeimpact.Rd @@ -108,6 +108,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, @@ -125,6 +126,7 @@ Other PipeOps: \code{\link{mlr_pipeops_mutate}}, \code{\link{mlr_pipeops_nop}}, \code{\link{mlr_pipeops_pca}}, +\code{\link{mlr_pipeops_predictionunion}}, 
\code{\link{mlr_pipeops_quantilebin}}, \code{\link{mlr_pipeops_regravg}}, \code{\link{mlr_pipeops_removeconstants}}, diff --git a/man/mlr_pipeops_encodelmer.Rd b/man/mlr_pipeops_encodelmer.Rd index b79e35c2a..bc8f841a4 100644 --- a/man/mlr_pipeops_encodelmer.Rd +++ b/man/mlr_pipeops_encodelmer.Rd @@ -119,6 +119,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodeimpact}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, @@ -136,6 +137,7 @@ Other PipeOps: \code{\link{mlr_pipeops_mutate}}, \code{\link{mlr_pipeops_nop}}, \code{\link{mlr_pipeops_pca}}, +\code{\link{mlr_pipeops_predictionunion}}, \code{\link{mlr_pipeops_quantilebin}}, \code{\link{mlr_pipeops_regravg}}, \code{\link{mlr_pipeops_removeconstants}}, diff --git a/man/mlr_pipeops_featureunion.Rd b/man/mlr_pipeops_featureunion.Rd index 41888cf0d..7be40aacc 100644 --- a/man/mlr_pipeops_featureunion.Rd +++ b/man/mlr_pipeops_featureunion.Rd @@ -124,6 +124,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodeimpact}}, \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, @@ -141,6 +142,7 @@ Other PipeOps: \code{\link{mlr_pipeops_mutate}}, \code{\link{mlr_pipeops_nop}}, \code{\link{mlr_pipeops_pca}}, +\code{\link{mlr_pipeops_predictionunion}}, \code{\link{mlr_pipeops_quantilebin}}, \code{\link{mlr_pipeops_regravg}}, \code{\link{mlr_pipeops_removeconstants}}, diff --git a/man/mlr_pipeops_filter.Rd b/man/mlr_pipeops_filter.Rd index 817b0ad20..65ccb3cc7 100644 --- a/man/mlr_pipeops_filter.Rd +++ b/man/mlr_pipeops_filter.Rd @@ -138,6 +138,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, 
+\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, \code{\link{mlr_pipeops_ica}}, @@ -154,6 +155,7 @@ Other PipeOps: \code{\link{mlr_pipeops_mutate}}, \code{\link{mlr_pipeops_nop}}, \code{\link{mlr_pipeops_pca}}, +\code{\link{mlr_pipeops_predictionunion}}, \code{\link{mlr_pipeops_quantilebin}}, \code{\link{mlr_pipeops_regravg}}, \code{\link{mlr_pipeops_removeconstants}}, diff --git a/man/mlr_pipeops_filterrows.Rd b/man/mlr_pipeops_filterrows.Rd new file mode 100644 index 000000000..0216b2cbd --- /dev/null +++ b/man/mlr_pipeops_filterrows.Rd @@ -0,0 +1,155 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/PipeOpFilterRows.R +\name{mlr_pipeops_filterrows} +\alias{mlr_pipeops_filterrows} +\alias{PipeOpFilterRows} +\title{PipeOpFilterRows} +\format{ +\code{\link{R6Class}} object inheriting from \code{\link{PipeOpTaskPreproc}}. +} +\description{ +Filter rows of the data of a task. Also directly allows for the removal of rows holding missing +values. +} +\section{Construction}{ +\preformatted{PipeOpFilterRows$new(id = "filterrows", param_vals = list()) +} +\itemize{ +\item \code{id} :: \code{character(1)}\cr +Identifier of resulting object, default \code{"filterrows"}. +\item \code{param_vals} :: named \code{list}\cr +List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise +be set during construction. Default \code{list()}. +} +} + +\section{Input and Output Channels}{ + +Input and output channels are inherited from \code{\link{PipeOpTaskPreproc}}. + +The output during training is the input \code{\link[mlr3:Task]{Task}} with rows kept according to the +filtering (see Parameters) and (possible) rows with missing values removed. + +The output during prediction is the unchanged input \code{\link[mlr3:Task]{Task}} if the parameter +\code{skip_during_predict} is \code{TRUE}. Otherwise it is analogously handled as the output during +training. 
+} + +\section{State}{ + +The \verb{$state} is a named \code{list} with the \verb{$state} elements inherited from \code{\link{PipeOpTaskPreproc}}, +as well as the following elements: +\itemize{ +\item \code{na_ids} :: \code{integer}\cr +The row identifiers that had missing values during training and therefore were removed. See the +parameter \code{na_column}. +\item \code{row_ids} :: \code{integer}\cr +The row identifiers that were kept during training according to the parameters \code{filter}, +\code{na_column} and \code{invert}. +} +} + +\section{Parameters}{ + +The parameters are the parameters inherited from \code{\link{PipeOpTaskPreproc}}, as well as: +\itemize{ +\item \code{filter} :: \code{NULL} | \code{character(1)} | \code{expression} | \code{integer}\cr +How the rows of the data of the input \code{\link[mlr3:Task]{Task}} should be filtered. This can be a +character vector of length 1 indicating a feature column of logicals in the data of the input +\code{\link[mlr3:Task]{Task}} which forms the basis of the filtering, i.e., all rows that are \code{TRUE} +with respect to this column are kept in the data of the output \code{\link[mlr3:Task]{Task}}. Moreover, +this can be an expression that will result in a logical vector of length \verb{$nrow} of the data of +the input \code{\link[mlr3:Task]{Task}} when evaluated within the environment of the \verb{$data()} of the +input \code{\link[mlr3:Task]{Task}}. Finally, this can also be an integerish vector that directly +specifies the row identifiers of the rows of the data of the input \code{\link[mlr3:Task]{Task}} that +should be kept. Default is \code{NULL}, i.e., no filtering is done. +\item \code{na_column} :: \code{NULL} | \code{character}\cr +A character vector that specifies the columns of the data of the input \code{\link[mlr3:Task]{Task}} +that should be checked for missing values. If set to \code{all}, all columns of the data are used.
A +row is removed if at least one missing value is found with respect to the columns specified. +Default is \code{NULL}, i.e., no removal of missing values is done. +\item \code{invert} :: \code{logical(1)}\cr +Should the filtering rule be set-theoretically inverted? Note that this happens after +(possible) missing values were removed if \code{na_column} is specified. Default is \code{FALSE}. +\item \code{skip_during_predict} :: \code{logical(1)}\cr +Should the filtering and missing value removal steps be skipped during prediction? If \code{TRUE}, +the input \code{\link[mlr3:Task]{Task}} is returned unaltered during prediction. Default is \code{FALSE}. +} +} + +\section{Internals}{ + +Uses the \code{\link[base:is.na]{is.na()}} function for the checking of missing values. +} + +\section{Methods}{ + +Only methods inherited from \code{\link{PipeOpTaskPreproc}}/\code{\link{PipeOp}}. +} + +\examples{ +library("mlr3") +task = tsk("pima") +po = PipeOpFilterRows$new(param_vals = list( + filter = expression(age < median(age) & mass > 30), + na_column = "all") +) +po$train(list(task)) +po$state +} +\seealso{ +Other PipeOps: +\code{\link{PipeOpEnsemble}}, +\code{\link{PipeOpImpute}}, +\code{\link{PipeOpProxy}}, +\code{\link{PipeOpTaskPreproc}}, +\code{\link{PipeOp}}, +\code{\link{mlr_pipeops_boxcox}}, +\code{\link{mlr_pipeops_branch}}, +\code{\link{mlr_pipeops_chunk}}, +\code{\link{mlr_pipeops_classbalancing}}, +\code{\link{mlr_pipeops_classifavg}}, +\code{\link{mlr_pipeops_classweights}}, +\code{\link{mlr_pipeops_colapply}}, +\code{\link{mlr_pipeops_collapsefactors}}, +\code{\link{mlr_pipeops_copy}}, +\code{\link{mlr_pipeops_datefeatures}}, +\code{\link{mlr_pipeops_encodeimpact}}, +\code{\link{mlr_pipeops_encodelmer}}, +\code{\link{mlr_pipeops_encode}}, +\code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filter}}, +\code{\link{mlr_pipeops_fixfactors}}, +\code{\link{mlr_pipeops_histbin}}, +\code{\link{mlr_pipeops_ica}}, +\code{\link{mlr_pipeops_imputehist}}, 
+\code{\link{mlr_pipeops_imputemean}}, +\code{\link{mlr_pipeops_imputemedian}}, +\code{\link{mlr_pipeops_imputemode}}, +\code{\link{mlr_pipeops_imputenewlvl}}, +\code{\link{mlr_pipeops_imputesample}}, +\code{\link{mlr_pipeops_kernelpca}}, +\code{\link{mlr_pipeops_learner}}, +\code{\link{mlr_pipeops_missind}}, +\code{\link{mlr_pipeops_modelmatrix}}, +\code{\link{mlr_pipeops_mutate}}, +\code{\link{mlr_pipeops_nop}}, +\code{\link{mlr_pipeops_pca}}, +\code{\link{mlr_pipeops_predictionunion}}, +\code{\link{mlr_pipeops_quantilebin}}, +\code{\link{mlr_pipeops_regravg}}, +\code{\link{mlr_pipeops_removeconstants}}, +\code{\link{mlr_pipeops_scalemaxabs}}, +\code{\link{mlr_pipeops_scalerange}}, +\code{\link{mlr_pipeops_scale}}, +\code{\link{mlr_pipeops_select}}, +\code{\link{mlr_pipeops_smote}}, +\code{\link{mlr_pipeops_spatialsign}}, +\code{\link{mlr_pipeops_subsample}}, +\code{\link{mlr_pipeops_textvectorizer}}, +\code{\link{mlr_pipeops_threshold}}, +\code{\link{mlr_pipeops_unbranch}}, +\code{\link{mlr_pipeops_yeojohnson}}, +\code{\link{mlr_pipeops}} +} +\concept{PipeOps} diff --git a/man/mlr_pipeops_fixfactors.Rd b/man/mlr_pipeops_fixfactors.Rd index 8092c239f..df209d220 100644 --- a/man/mlr_pipeops_fixfactors.Rd +++ b/man/mlr_pipeops_fixfactors.Rd @@ -84,6 +84,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_histbin}}, \code{\link{mlr_pipeops_ica}}, @@ -100,6 +101,7 @@ Other PipeOps: \code{\link{mlr_pipeops_mutate}}, \code{\link{mlr_pipeops_nop}}, \code{\link{mlr_pipeops_pca}}, +\code{\link{mlr_pipeops_predictionunion}}, \code{\link{mlr_pipeops_quantilebin}}, \code{\link{mlr_pipeops_regravg}}, \code{\link{mlr_pipeops_removeconstants}}, diff --git a/man/mlr_pipeops_histbin.Rd b/man/mlr_pipeops_histbin.Rd index 40a11b330..745e722e3 100644 --- a/man/mlr_pipeops_histbin.Rd +++ 
b/man/mlr_pipeops_histbin.Rd @@ -96,6 +96,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_ica}}, @@ -112,6 +113,7 @@ Other PipeOps: \code{\link{mlr_pipeops_mutate}}, \code{\link{mlr_pipeops_nop}}, \code{\link{mlr_pipeops_pca}}, +\code{\link{mlr_pipeops_predictionunion}}, \code{\link{mlr_pipeops_quantilebin}}, \code{\link{mlr_pipeops_regravg}}, \code{\link{mlr_pipeops_removeconstants}}, diff --git a/man/mlr_pipeops_ica.Rd b/man/mlr_pipeops_ica.Rd index e74f40941..ce68478ba 100644 --- a/man/mlr_pipeops_ica.Rd +++ b/man/mlr_pipeops_ica.Rd @@ -122,6 +122,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, @@ -138,6 +139,7 @@ Other PipeOps: \code{\link{mlr_pipeops_mutate}}, \code{\link{mlr_pipeops_nop}}, \code{\link{mlr_pipeops_pca}}, +\code{\link{mlr_pipeops_predictionunion}}, \code{\link{mlr_pipeops_quantilebin}}, \code{\link{mlr_pipeops_regravg}}, \code{\link{mlr_pipeops_removeconstants}}, diff --git a/man/mlr_pipeops_imputehist.Rd b/man/mlr_pipeops_imputehist.Rd index 97cce6cd4..b21a1981b 100644 --- a/man/mlr_pipeops_imputehist.Rd +++ b/man/mlr_pipeops_imputehist.Rd @@ -83,6 +83,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, @@ -99,6 +100,7 @@ Other PipeOps: \code{\link{mlr_pipeops_mutate}}, \code{\link{mlr_pipeops_nop}}, \code{\link{mlr_pipeops_pca}}, +\code{\link{mlr_pipeops_predictionunion}}, 
\code{\link{mlr_pipeops_quantilebin}}, \code{\link{mlr_pipeops_regravg}}, \code{\link{mlr_pipeops_removeconstants}}, diff --git a/man/mlr_pipeops_imputemean.Rd b/man/mlr_pipeops_imputemean.Rd index 65b93f2a9..1f045cdd6 100644 --- a/man/mlr_pipeops_imputemean.Rd +++ b/man/mlr_pipeops_imputemean.Rd @@ -83,6 +83,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, @@ -99,6 +100,7 @@ Other PipeOps: \code{\link{mlr_pipeops_mutate}}, \code{\link{mlr_pipeops_nop}}, \code{\link{mlr_pipeops_pca}}, +\code{\link{mlr_pipeops_predictionunion}}, \code{\link{mlr_pipeops_quantilebin}}, \code{\link{mlr_pipeops_regravg}}, \code{\link{mlr_pipeops_removeconstants}}, diff --git a/man/mlr_pipeops_imputemedian.Rd b/man/mlr_pipeops_imputemedian.Rd index 5b2fd1952..9351869a1 100644 --- a/man/mlr_pipeops_imputemedian.Rd +++ b/man/mlr_pipeops_imputemedian.Rd @@ -83,6 +83,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, @@ -99,6 +100,7 @@ Other PipeOps: \code{\link{mlr_pipeops_mutate}}, \code{\link{mlr_pipeops_nop}}, \code{\link{mlr_pipeops_pca}}, +\code{\link{mlr_pipeops_predictionunion}}, \code{\link{mlr_pipeops_quantilebin}}, \code{\link{mlr_pipeops_regravg}}, \code{\link{mlr_pipeops_removeconstants}}, diff --git a/man/mlr_pipeops_imputemode.Rd b/man/mlr_pipeops_imputemode.Rd index 14f37c460..d78dec3f4 100644 --- a/man/mlr_pipeops_imputemode.Rd +++ b/man/mlr_pipeops_imputemode.Rd @@ -90,6 +90,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, 
+\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, @@ -106,6 +107,7 @@ Other PipeOps: \code{\link{mlr_pipeops_mutate}}, \code{\link{mlr_pipeops_nop}}, \code{\link{mlr_pipeops_pca}}, +\code{\link{mlr_pipeops_predictionunion}}, \code{\link{mlr_pipeops_quantilebin}}, \code{\link{mlr_pipeops_regravg}}, \code{\link{mlr_pipeops_removeconstants}}, diff --git a/man/mlr_pipeops_imputenewlvl.Rd b/man/mlr_pipeops_imputenewlvl.Rd index 003bd11df..ef2086550 100644 --- a/man/mlr_pipeops_imputenewlvl.Rd +++ b/man/mlr_pipeops_imputenewlvl.Rd @@ -84,6 +84,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, @@ -100,6 +101,7 @@ Other PipeOps: \code{\link{mlr_pipeops_mutate}}, \code{\link{mlr_pipeops_nop}}, \code{\link{mlr_pipeops_pca}}, +\code{\link{mlr_pipeops_predictionunion}}, \code{\link{mlr_pipeops_quantilebin}}, \code{\link{mlr_pipeops_regravg}}, \code{\link{mlr_pipeops_removeconstants}}, diff --git a/man/mlr_pipeops_imputesample.Rd b/man/mlr_pipeops_imputesample.Rd index 191c8bc1c..d63d6dd39 100644 --- a/man/mlr_pipeops_imputesample.Rd +++ b/man/mlr_pipeops_imputesample.Rd @@ -85,6 +85,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, @@ -101,6 +102,7 @@ Other PipeOps: \code{\link{mlr_pipeops_mutate}}, \code{\link{mlr_pipeops_nop}}, \code{\link{mlr_pipeops_pca}}, +\code{\link{mlr_pipeops_predictionunion}}, \code{\link{mlr_pipeops_quantilebin}}, \code{\link{mlr_pipeops_regravg}}, \code{\link{mlr_pipeops_removeconstants}}, diff --git 
a/man/mlr_pipeops_kernelpca.Rd b/man/mlr_pipeops_kernelpca.Rd index cb2319de8..a19625d79 100644 --- a/man/mlr_pipeops_kernelpca.Rd +++ b/man/mlr_pipeops_kernelpca.Rd @@ -97,6 +97,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, @@ -113,6 +114,7 @@ Other PipeOps: \code{\link{mlr_pipeops_mutate}}, \code{\link{mlr_pipeops_nop}}, \code{\link{mlr_pipeops_pca}}, +\code{\link{mlr_pipeops_predictionunion}}, \code{\link{mlr_pipeops_quantilebin}}, \code{\link{mlr_pipeops_regravg}}, \code{\link{mlr_pipeops_removeconstants}}, diff --git a/man/mlr_pipeops_learner.Rd b/man/mlr_pipeops_learner.Rd index e7af54e53..95fa74c62 100644 --- a/man/mlr_pipeops_learner.Rd +++ b/man/mlr_pipeops_learner.Rd @@ -114,6 +114,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, @@ -130,6 +131,7 @@ Other PipeOps: \code{\link{mlr_pipeops_mutate}}, \code{\link{mlr_pipeops_nop}}, \code{\link{mlr_pipeops_pca}}, +\code{\link{mlr_pipeops_predictionunion}}, \code{\link{mlr_pipeops_quantilebin}}, \code{\link{mlr_pipeops_regravg}}, \code{\link{mlr_pipeops_removeconstants}}, diff --git a/man/mlr_pipeops_missind.Rd b/man/mlr_pipeops_missind.Rd index 4389f6e3a..67edafe74 100644 --- a/man/mlr_pipeops_missind.Rd +++ b/man/mlr_pipeops_missind.Rd @@ -109,6 +109,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, @@ -125,6 +126,7 @@ Other PipeOps: 
\code{\link{mlr_pipeops_mutate}}, \code{\link{mlr_pipeops_nop}}, \code{\link{mlr_pipeops_pca}}, +\code{\link{mlr_pipeops_predictionunion}}, \code{\link{mlr_pipeops_quantilebin}}, \code{\link{mlr_pipeops_regravg}}, \code{\link{mlr_pipeops_removeconstants}}, diff --git a/man/mlr_pipeops_modelmatrix.Rd b/man/mlr_pipeops_modelmatrix.Rd index 9ba6d24c1..9e5d8dc3b 100644 --- a/man/mlr_pipeops_modelmatrix.Rd +++ b/man/mlr_pipeops_modelmatrix.Rd @@ -89,6 +89,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, @@ -105,6 +106,7 @@ Other PipeOps: \code{\link{mlr_pipeops_mutate}}, \code{\link{mlr_pipeops_nop}}, \code{\link{mlr_pipeops_pca}}, +\code{\link{mlr_pipeops_predictionunion}}, \code{\link{mlr_pipeops_quantilebin}}, \code{\link{mlr_pipeops_regravg}}, \code{\link{mlr_pipeops_removeconstants}}, diff --git a/man/mlr_pipeops_mutate.Rd b/man/mlr_pipeops_mutate.Rd index 65c6eff5b..071326a67 100644 --- a/man/mlr_pipeops_mutate.Rd +++ b/man/mlr_pipeops_mutate.Rd @@ -99,6 +99,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, @@ -115,6 +116,7 @@ Other PipeOps: \code{\link{mlr_pipeops_modelmatrix}}, \code{\link{mlr_pipeops_nop}}, \code{\link{mlr_pipeops_pca}}, +\code{\link{mlr_pipeops_predictionunion}}, \code{\link{mlr_pipeops_quantilebin}}, \code{\link{mlr_pipeops_regravg}}, \code{\link{mlr_pipeops_removeconstants}}, diff --git a/man/mlr_pipeops_nop.Rd b/man/mlr_pipeops_nop.Rd index 4595b71f4..291860ea3 100644 --- a/man/mlr_pipeops_nop.Rd +++ b/man/mlr_pipeops_nop.Rd @@ -91,6 +91,7 @@ Other PipeOps: 
\code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, @@ -107,6 +108,7 @@ Other PipeOps: \code{\link{mlr_pipeops_modelmatrix}}, \code{\link{mlr_pipeops_mutate}}, \code{\link{mlr_pipeops_pca}}, +\code{\link{mlr_pipeops_predictionunion}}, \code{\link{mlr_pipeops_quantilebin}}, \code{\link{mlr_pipeops_regravg}}, \code{\link{mlr_pipeops_removeconstants}}, diff --git a/man/mlr_pipeops_pca.Rd b/man/mlr_pipeops_pca.Rd index 3e6fdf578..ec3c9f410 100644 --- a/man/mlr_pipeops_pca.Rd +++ b/man/mlr_pipeops_pca.Rd @@ -100,6 +100,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, @@ -116,6 +117,7 @@ Other PipeOps: \code{\link{mlr_pipeops_modelmatrix}}, \code{\link{mlr_pipeops_mutate}}, \code{\link{mlr_pipeops_nop}}, +\code{\link{mlr_pipeops_predictionunion}}, \code{\link{mlr_pipeops_quantilebin}}, \code{\link{mlr_pipeops_regravg}}, \code{\link{mlr_pipeops_removeconstants}}, diff --git a/man/mlr_pipeops_predictionunion.Rd b/man/mlr_pipeops_predictionunion.Rd new file mode 100644 index 000000000..536cf4159 --- /dev/null +++ b/man/mlr_pipeops_predictionunion.Rd @@ -0,0 +1,151 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/PipeOpPredictionUnion.R +\name{mlr_pipeops_predictionunion} +\alias{mlr_pipeops_predictionunion} +\alias{PipeOpPredictionUnion} +\title{PipeOpPredictionUnion} +\format{ +\code{\link{R6Class}} object inheriting from \code{\link{PipeOp}}. +} +\description{ +Unite predictions from all input predictions into a single +\code{\link[mlr3:Prediction]{Prediction}}. 
+ +\code{task_type}s and \code{predict_type}s must be equal across all input predictions. + +Note that predictions are combined as is, i.e., no checks for duplicated row +identifiers etc. are performed. + +Currently only supports task types \code{classif} and \code{regr} by constructing a new +\code{\link[mlr3:PredictionClassif]{PredictionClassif}} or +\code{\link[mlr3:PredictionRegr]{PredictionRegr}}, respectively. +} +\section{Construction}{ +\preformatted{PipeOpPredictionUnion$new(innum = 0, id = "predictionunion", param_vals = list()) +} +\itemize{ +\item \code{innum} :: \code{numeric(1)} | \code{character}\cr +Determines the number of input channels. If \code{innum} is 0 (default), a vararg input channel is +created that can take an arbitrary number of inputs. If \code{innum} is a \code{character} vector, the +number of input channels is the length of \code{innum}. +\item \code{id} :: \code{character(1)}\cr +Identifier of the resulting object, default \code{"predictionunion"}. +\item \code{param_vals} :: named \code{list}\cr +List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise +be set during construction. Default \code{list()}. +} +} + +\section{Input and Output Channels}{ + +\code{\link{PipeOpPredictionUnion}} has multiple input channels depending on the \code{innum} construction +argument, named \code{"input1"}, \code{"input2"}, ... if \code{innum} is nonzero; if \code{innum} is 0, there is only +one \emph{vararg} input channel named \code{"..."}. All input channels take \code{NULL} during training and a +\code{\link[mlr3:Prediction]{Prediction}} during prediction. + +\code{\link{PipeOpPredictionUnion}} has one output channel named \code{"output"}, producing \code{NULL} during +training and a \code{\link[mlr3:Prediction]{Prediction}} during prediction. + +The output during prediction is a \code{\link[mlr3:Prediction]{Prediction}} constructed by combining all +input \code{\link[mlr3:Prediction]{Prediction}}s.
+} + +\section{State}{ + +The \verb{$state} is left empty (\code{list()}). +} + +\section{Parameters}{ + +\code{\link{PipeOpPredictionUnion}} has no Parameters. +} + +\section{Internals}{ + +Only sets the fields \code{row_ids}, \code{truth}, \code{response} and if applicable \code{prob} and \code{se} during +construction of the output \code{\link[mlr3:Prediction]{Prediction}}. +} + +\section{Fields}{ + +Only fields inherited from \code{\link{PipeOp}}. +} + +\section{Methods}{ + +Only methods inherited from \code{\link{PipeOp}}. +} + +\examples{ +library("mlr3") + +task = tsk("iris") +filter = expression(Sepal.Length < median(Sepal.Length)) +gr = po("copy", outnum = 2) \%>>\% gunion(list( + po("filterrows", id = "filter1", + param_vals = list(filter = filter)) \%>>\% + lrn("classif.rpart", id = "learner1"), + po("filterrows", id = "filter2", + param_vals = list(filter = filter, invert = TRUE)) \%>>\% + lrn("classif.rpart", id = "learner2") +)) \%>>\% po("predictionunion") + +gr$train(task) +gr$predict(task) +} +\seealso{ +Other PipeOps: +\code{\link{PipeOpEnsemble}}, +\code{\link{PipeOpImpute}}, +\code{\link{PipeOpProxy}}, +\code{\link{PipeOpTaskPreproc}}, +\code{\link{PipeOp}}, +\code{\link{mlr_pipeops_boxcox}}, +\code{\link{mlr_pipeops_branch}}, +\code{\link{mlr_pipeops_chunk}}, +\code{\link{mlr_pipeops_classbalancing}}, +\code{\link{mlr_pipeops_classifavg}}, +\code{\link{mlr_pipeops_classweights}}, +\code{\link{mlr_pipeops_colapply}}, +\code{\link{mlr_pipeops_collapsefactors}}, +\code{\link{mlr_pipeops_copy}}, +\code{\link{mlr_pipeops_datefeatures}}, +\code{\link{mlr_pipeops_encodeimpact}}, +\code{\link{mlr_pipeops_encodelmer}}, +\code{\link{mlr_pipeops_encode}}, +\code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, +\code{\link{mlr_pipeops_filter}}, +\code{\link{mlr_pipeops_fixfactors}}, +\code{\link{mlr_pipeops_histbin}}, +\code{\link{mlr_pipeops_ica}}, +\code{\link{mlr_pipeops_imputehist}}, +\code{\link{mlr_pipeops_imputemean}}, 
+\code{\link{mlr_pipeops_imputemedian}}, +\code{\link{mlr_pipeops_imputemode}}, +\code{\link{mlr_pipeops_imputenewlvl}}, +\code{\link{mlr_pipeops_imputesample}}, +\code{\link{mlr_pipeops_kernelpca}}, +\code{\link{mlr_pipeops_learner}}, +\code{\link{mlr_pipeops_missind}}, +\code{\link{mlr_pipeops_modelmatrix}}, +\code{\link{mlr_pipeops_mutate}}, +\code{\link{mlr_pipeops_nop}}, +\code{\link{mlr_pipeops_pca}}, +\code{\link{mlr_pipeops_quantilebin}}, +\code{\link{mlr_pipeops_regravg}}, +\code{\link{mlr_pipeops_removeconstants}}, +\code{\link{mlr_pipeops_scalemaxabs}}, +\code{\link{mlr_pipeops_scalerange}}, +\code{\link{mlr_pipeops_scale}}, +\code{\link{mlr_pipeops_select}}, +\code{\link{mlr_pipeops_smote}}, +\code{\link{mlr_pipeops_spatialsign}}, +\code{\link{mlr_pipeops_subsample}}, +\code{\link{mlr_pipeops_textvectorizer}}, +\code{\link{mlr_pipeops_threshold}}, +\code{\link{mlr_pipeops_unbranch}}, +\code{\link{mlr_pipeops_yeojohnson}}, +\code{\link{mlr_pipeops}} +} +\concept{PipeOps} diff --git a/man/mlr_pipeops_quantilebin.Rd b/man/mlr_pipeops_quantilebin.Rd index 2e56732de..69eb391ea 100644 --- a/man/mlr_pipeops_quantilebin.Rd +++ b/man/mlr_pipeops_quantilebin.Rd @@ -88,6 +88,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, @@ -105,6 +106,7 @@ Other PipeOps: \code{\link{mlr_pipeops_mutate}}, \code{\link{mlr_pipeops_nop}}, \code{\link{mlr_pipeops_pca}}, +\code{\link{mlr_pipeops_predictionunion}}, \code{\link{mlr_pipeops_regravg}}, \code{\link{mlr_pipeops_removeconstants}}, \code{\link{mlr_pipeops_scalemaxabs}}, diff --git a/man/mlr_pipeops_regravg.Rd b/man/mlr_pipeops_regravg.Rd index f27102315..f1f5f82b7 100644 --- a/man/mlr_pipeops_regravg.Rd +++ b/man/mlr_pipeops_regravg.Rd @@ -97,6 +97,7 @@ Other PipeOps: 
\code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, @@ -114,6 +115,7 @@ Other PipeOps: \code{\link{mlr_pipeops_mutate}}, \code{\link{mlr_pipeops_nop}}, \code{\link{mlr_pipeops_pca}}, +\code{\link{mlr_pipeops_predictionunion}}, \code{\link{mlr_pipeops_quantilebin}}, \code{\link{mlr_pipeops_removeconstants}}, \code{\link{mlr_pipeops_scalemaxabs}}, diff --git a/man/mlr_pipeops_removeconstants.Rd b/man/mlr_pipeops_removeconstants.Rd index 4c073ce50..eb845aa8b 100644 --- a/man/mlr_pipeops_removeconstants.Rd +++ b/man/mlr_pipeops_removeconstants.Rd @@ -93,6 +93,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, @@ -110,6 +111,7 @@ Other PipeOps: \code{\link{mlr_pipeops_mutate}}, \code{\link{mlr_pipeops_nop}}, \code{\link{mlr_pipeops_pca}}, +\code{\link{mlr_pipeops_predictionunion}}, \code{\link{mlr_pipeops_quantilebin}}, \code{\link{mlr_pipeops_regravg}}, \code{\link{mlr_pipeops_scalemaxabs}}, diff --git a/man/mlr_pipeops_scale.Rd b/man/mlr_pipeops_scale.Rd index 894966526..a1106fd1f 100644 --- a/man/mlr_pipeops_scale.Rd +++ b/man/mlr_pipeops_scale.Rd @@ -101,6 +101,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, @@ -118,6 +119,7 @@ Other PipeOps: \code{\link{mlr_pipeops_mutate}}, \code{\link{mlr_pipeops_nop}}, \code{\link{mlr_pipeops_pca}}, +\code{\link{mlr_pipeops_predictionunion}}, 
\code{\link{mlr_pipeops_quantilebin}}, \code{\link{mlr_pipeops_regravg}}, \code{\link{mlr_pipeops_removeconstants}}, diff --git a/man/mlr_pipeops_scalemaxabs.Rd b/man/mlr_pipeops_scalemaxabs.Rd index 7c0cb6d12..42fb8fe84 100644 --- a/man/mlr_pipeops_scalemaxabs.Rd +++ b/man/mlr_pipeops_scalemaxabs.Rd @@ -82,6 +82,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, @@ -99,6 +100,7 @@ Other PipeOps: \code{\link{mlr_pipeops_mutate}}, \code{\link{mlr_pipeops_nop}}, \code{\link{mlr_pipeops_pca}}, +\code{\link{mlr_pipeops_predictionunion}}, \code{\link{mlr_pipeops_quantilebin}}, \code{\link{mlr_pipeops_regravg}}, \code{\link{mlr_pipeops_removeconstants}}, diff --git a/man/mlr_pipeops_scalerange.Rd b/man/mlr_pipeops_scalerange.Rd index ffb726879..b056a0e63 100644 --- a/man/mlr_pipeops_scalerange.Rd +++ b/man/mlr_pipeops_scalerange.Rd @@ -86,6 +86,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, @@ -103,6 +104,7 @@ Other PipeOps: \code{\link{mlr_pipeops_mutate}}, \code{\link{mlr_pipeops_nop}}, \code{\link{mlr_pipeops_pca}}, +\code{\link{mlr_pipeops_predictionunion}}, \code{\link{mlr_pipeops_quantilebin}}, \code{\link{mlr_pipeops_regravg}}, \code{\link{mlr_pipeops_removeconstants}}, diff --git a/man/mlr_pipeops_select.Rd b/man/mlr_pipeops_select.Rd index 99c8570d3..21f411009 100644 --- a/man/mlr_pipeops_select.Rd +++ b/man/mlr_pipeops_select.Rd @@ -103,6 +103,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, 
\code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, @@ -120,6 +121,7 @@ Other PipeOps: \code{\link{mlr_pipeops_mutate}}, \code{\link{mlr_pipeops_nop}}, \code{\link{mlr_pipeops_pca}}, +\code{\link{mlr_pipeops_predictionunion}}, \code{\link{mlr_pipeops_quantilebin}}, \code{\link{mlr_pipeops_regravg}}, \code{\link{mlr_pipeops_removeconstants}}, diff --git a/man/mlr_pipeops_smote.Rd b/man/mlr_pipeops_smote.Rd index 6ed6082a3..3fcf4cf4d 100644 --- a/man/mlr_pipeops_smote.Rd +++ b/man/mlr_pipeops_smote.Rd @@ -106,6 +106,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, @@ -123,6 +124,7 @@ Other PipeOps: \code{\link{mlr_pipeops_mutate}}, \code{\link{mlr_pipeops_nop}}, \code{\link{mlr_pipeops_pca}}, +\code{\link{mlr_pipeops_predictionunion}}, \code{\link{mlr_pipeops_quantilebin}}, \code{\link{mlr_pipeops_regravg}}, \code{\link{mlr_pipeops_removeconstants}}, diff --git a/man/mlr_pipeops_spatialsign.Rd b/man/mlr_pipeops_spatialsign.Rd index 02ba82df1..f4b80d9c8 100644 --- a/man/mlr_pipeops_spatialsign.Rd +++ b/man/mlr_pipeops_spatialsign.Rd @@ -82,6 +82,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, @@ -99,6 +100,7 @@ Other PipeOps: \code{\link{mlr_pipeops_mutate}}, \code{\link{mlr_pipeops_nop}}, \code{\link{mlr_pipeops_pca}}, +\code{\link{mlr_pipeops_predictionunion}}, \code{\link{mlr_pipeops_quantilebin}}, \code{\link{mlr_pipeops_regravg}}, \code{\link{mlr_pipeops_removeconstants}}, diff --git a/man/mlr_pipeops_subsample.Rd b/man/mlr_pipeops_subsample.Rd index 
d34a6b259..aed2ac300 100644 --- a/man/mlr_pipeops_subsample.Rd +++ b/man/mlr_pipeops_subsample.Rd @@ -97,6 +97,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, @@ -114,6 +115,7 @@ Other PipeOps: \code{\link{mlr_pipeops_mutate}}, \code{\link{mlr_pipeops_nop}}, \code{\link{mlr_pipeops_pca}}, +\code{\link{mlr_pipeops_predictionunion}}, \code{\link{mlr_pipeops_quantilebin}}, \code{\link{mlr_pipeops_regravg}}, \code{\link{mlr_pipeops_removeconstants}}, diff --git a/man/mlr_pipeops_textvectorizer.Rd b/man/mlr_pipeops_textvectorizer.Rd index 2ded507bf..75af2f761 100644 --- a/man/mlr_pipeops_textvectorizer.Rd +++ b/man/mlr_pipeops_textvectorizer.Rd @@ -182,6 +182,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, @@ -199,6 +200,7 @@ Other PipeOps: \code{\link{mlr_pipeops_mutate}}, \code{\link{mlr_pipeops_nop}}, \code{\link{mlr_pipeops_pca}}, +\code{\link{mlr_pipeops_predictionunion}}, \code{\link{mlr_pipeops_quantilebin}}, \code{\link{mlr_pipeops_regravg}}, \code{\link{mlr_pipeops_removeconstants}}, diff --git a/man/mlr_pipeops_threshold.Rd b/man/mlr_pipeops_threshold.Rd index 2d921797f..75e96953f 100644 --- a/man/mlr_pipeops_threshold.Rd +++ b/man/mlr_pipeops_threshold.Rd @@ -87,6 +87,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, @@ -104,6 +105,7 @@ Other PipeOps: \code{\link{mlr_pipeops_mutate}}, 
\code{\link{mlr_pipeops_nop}}, \code{\link{mlr_pipeops_pca}}, +\code{\link{mlr_pipeops_predictionunion}}, \code{\link{mlr_pipeops_quantilebin}}, \code{\link{mlr_pipeops_regravg}}, \code{\link{mlr_pipeops_removeconstants}}, diff --git a/man/mlr_pipeops_unbranch.Rd b/man/mlr_pipeops_unbranch.Rd index fc53b1934..1a04ae478 100644 --- a/man/mlr_pipeops_unbranch.Rd +++ b/man/mlr_pipeops_unbranch.Rd @@ -94,6 +94,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, @@ -111,6 +112,7 @@ Other PipeOps: \code{\link{mlr_pipeops_mutate}}, \code{\link{mlr_pipeops_nop}}, \code{\link{mlr_pipeops_pca}}, +\code{\link{mlr_pipeops_predictionunion}}, \code{\link{mlr_pipeops_quantilebin}}, \code{\link{mlr_pipeops_regravg}}, \code{\link{mlr_pipeops_removeconstants}}, diff --git a/man/mlr_pipeops_yeojohnson.Rd b/man/mlr_pipeops_yeojohnson.Rd index 04532a33f..5fa90b145 100644 --- a/man/mlr_pipeops_yeojohnson.Rd +++ b/man/mlr_pipeops_yeojohnson.Rd @@ -97,6 +97,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, @@ -114,6 +115,7 @@ Other PipeOps: \code{\link{mlr_pipeops_mutate}}, \code{\link{mlr_pipeops_nop}}, \code{\link{mlr_pipeops_pca}}, +\code{\link{mlr_pipeops_predictionunion}}, \code{\link{mlr_pipeops_quantilebin}}, \code{\link{mlr_pipeops_regravg}}, \code{\link{mlr_pipeops_removeconstants}}, diff --git a/tests/testthat/test_pipeop_filterrows.R b/tests/testthat/test_pipeop_filterrows.R new file mode 100644 index 000000000..abc239451 --- /dev/null +++ b/tests/testthat/test_pipeop_filterrows.R @@ -0,0 +1,90 @@ +context("PipeOpFilterRows") + 
+test_that("PipeOpFilterRows - basic properties", { + op = PipeOpFilterRows$new() + task = mlr_tasks$get("iris") + expect_pipeop(op) + expect_equal(train_pipeop(op, inputs = list(task))[[1L]], task) + expect_equal(predict_pipeop(op, inputs = list(task))[[1L]], task) + + expect_datapreproc_pipeop_class(PipeOpFilterRows, task = task) +}) + +test_that("PipeOpFilterRows - NA handling", { + dat = iris + dat$Sepal.Length[c(1L, 3L, 100L)] = NA + dat$Petal.Width[c(1L, 4L)] = NA + dat$Petal.Length[5] = NA + dat$Species[2L] = NA + task = TaskClassif$new("test", backend = dat, target = "Species") + + op = PipeOpFilterRows$new(param_vals = list(na_column = "Species")) + + op$param_set$values$na_column = "Sepal.Length" + train_out1 = op$train(list(task))[[1L]] + expect_equal(op$state$na_ids, c(1, 3, 100)) + + op$param_set$values$na_column = "all" + train_out2 = op$train(list(task))[[1L]] + expect_equal(op$state$na_ids, c(1, 2, 3, 4, 5, 100)) + predict_out2 = op$predict(list(task))[[1L]] + expect_equal(train_out2, predict_out2) + + op$param_set$values$skip_during_predict = TRUE + expect_equal(op$predict(list(task))[[1L]], task) +}) + +test_that("PipeOpFilterRows - filter by column name", { + set.seed(1) + dat = iris + dat$filter = sample(c(FALSE, TRUE), size = 150, replace = TRUE) + task = TaskClassif$new("test", backend = dat, target = "Species") + + op = PipeOpFilterRows$new(param_vals = list(filter = "filter")) + + train_out1 = op$train(list(task))[[1L]] + expect_equal(op$state$row_ids, which(dat$filter)) + expect_equal(train_out1$data(), task$data(which(dat$filter))) + + op$param_set$values$invert = TRUE + train_out2 = op$train(list(task))[[1L]] + expect_equal(op$state$row_ids, which(!dat$filter)) + expect_equal(train_out2$data(), task$data(which(!dat$filter))) +}) + +test_that("PipeOpFilterRows - filter by expression", { + task = mlr_tasks$get("iris") + task_cp = task$clone(deep = TRUE) + + op = PipeOpFilterRows$new(param_vals = list(filter = expression(Sepal.Length < 6 & 
Petal.Width > 1))) + train_out1 = op$train(list(task))[[1L]] + expect_equal(op$state$row_ids, which(iris$Sepal.Length < 6 & iris$Petal.Width > 1)) + + # zero indices left + op$param_set$values$filter = expression(Petal.Length < 3 & Petal.Width > 1) + train_out2 = op$train(list(task))[[1L]] + expect_equal(op$state$row_ids, integer(0)) + expect_equal(train_out2, task_cp$filter(integer(0))) + op$param_set$values$invert = TRUE + train_out3 = op$train(list(task))[[1L]] + expect_equal(op$state$row_ids, 1:150) + expect_equal(train_out3, task) +}) + +test_that("PipeOpFilterRows - filter by ids", { + # in combination with NA + dat = iris + dat$Sepal.Length[c(1L, 3L, 100L)] = NA + dat$Petal.Width[c(1L, 4L)] = NA + dat$Petal.Length[5] = NA + dat$Species[2L] = NA + task = TaskClassif$new("test", backend = dat, target = "Species") + + op = PipeOpFilterRows$new(param_vals = list(na_column = "all", filter = 1:10)) + train_out1 = op$train(list(task))[[1L]] + expect_equal(op$state$row_ids, setdiff(1:10, c(1, 2, 3, 4, 5))) + op$param_set$values$invert = TRUE + train_out2 = op$train(list(task))[[1L]] + expect_equal(op$state$na_ids, c(1, 2, 3, 4, 5, 100)) + expect_true(!(100 %in% op$state$row_ids)) +}) diff --git a/tests/testthat/test_pipeop_predictionunion.R b/tests/testthat/test_pipeop_predictionunion.R new file mode 100644 index 000000000..ecacf0d60 --- /dev/null +++ b/tests/testthat/test_pipeop_predictionunion.R @@ -0,0 +1,107 @@ +context("PipeOpPredictionUnion") + +test_that("PipeOpPredictionUnion - basic properties", { + po = PipeOpPredictionUnion$new(3) + expect_pipeop(po) + expect_data_table(po$input, nrows = 3) + expect_data_table(po$output, nrows = 1) + + expect_pipeop_class(PipeOpFeatureUnion, list(1)) + expect_pipeop_class(PipeOpFeatureUnion, list(3)) + + po = PipeOpFeatureUnion$new() + expect_pipeop(po) + expect_data_table(po$input, nrows = 1) + expect_data_table(po$output, nrows = 1) +}) + +test_that("PipeOpPredictionUnion - train and predict classif", { + set.seed(1) + 
ids = sample(150, size = 75) + task1 = mlr_tasks$get("iris") + task2 = task1$clone(deep = TRUE) + task1$filter(ids) + learner1 = mlr_learners$get("classif.rpart") + learner2 = learner1$clone(deep = TRUE) + learner1$train(task1) + learner2$train(task2) + + po = PipeOpPredictionUnion$new(2) + expect_null(po$train(list(NULL, NULL))[[1L]]) + + predict_out1 = po$predict(list(learner1$predict(task1), learner2$predict(task2)))[[1L]] + expect_equal(predict_out1$row_ids, c(task1$row_ids, task2$row_ids)) + expect_equal(predict_out1$truth, + unlist(list(learner1$predict(task1)$truth, learner2$predict(task2)$truth), use.names = FALSE)) + expect_equal(predict_out1$response, + unlist(list(learner1$predict(task1)$response, learner2$predict(task2)$response), use.names = FALSE)) + + learner1$predict_type = "prob" + learner2$predict_type = "prob" + + predict_out2 = po$predict(list(learner1$predict(task1), learner2$predict(task2)))[[1L]] + expect_equal(predict_out2$prob, + do.call(rbind, list(learner1$predict(task1)$prob, learner2$predict(task2)$prob))) + + learner1$predict_type = "response" + expect_error(po$predict(list(learner1$predict(task1), learner2$predict(task2))), regexp = "same task type and predict types") +}) + +test_that("PipeOpPredictionUnion - train and predict regr", { + library(mlr3learners) + set.seed(1) + ids = sample(32, size = 15) + task1 = mlr_tasks$get("mtcars") + task2 = task1$clone(deep = TRUE) + task1$filter(ids) + learner1 = mlr_learners$get("regr.lm") + learner2 = learner1$clone(deep = TRUE) + learner1$train(task1) + learner2$train(task2) + + po = PipeOpPredictionUnion$new(2) + expect_null(po$train(list(NULL, NULL))[[1L]]) + + predict_out1 = po$predict(list(learner1$predict(task1), learner2$predict(task2)))[[1L]] + expect_equal(predict_out1$row_ids, c(task1$row_ids, task2$row_ids)) + expect_equal(predict_out1$truth, + unlist(list(learner1$predict(task1)$truth, learner2$predict(task2)$truth), use.names = FALSE)) + expect_equal(predict_out1$response, + 
unlist(list(learner1$predict(task1)$response, learner2$predict(task2)$response), use.names = FALSE)) + + learner1$predict_type = "se" + learner2$predict_type = "se" + + predict_out2 = po$predict(list(learner1$predict(task1), learner2$predict(task2)))[[1L]] + expect_equal(predict_out2$se, + unlist(list(learner1$predict(task1)$se, learner2$predict(task2)$se), use.names = FALSE)) + + learner1$predict_type = "response" + expect_error(po$predict(list(learner1$predict(task1), learner2$predict(task2))), regexp = "same task type and predict types") +}) + +test_that("PipeOpFilterRows and PipeOpPredictionUnion - use case", { + task = mlr_tasks$get("pima") + age_ids = which(task$data(cols = "age")[[1L]] < median(task$data(cols = "age")[[1L]])) + na_ids = which(rowSums(is.na(task$data())) > 0L) + filter = expression(age < median(age)) + + g = PipeOpCopy$new(2) %>>% + gunion(list( + PipeOpFilterRows$new("filter1", param_vals = list(filter = filter, na_column = "all")) %>>% + PipeOpLearner$new(LearnerClassifRpart$new(), "learner1"), + PipeOpFilterRows$new("filter2", param_vals = list(filter = filter, na_column = "all", invert = TRUE)) %>>% + PipeOpLearner$new(LearnerClassifRpart$new(), "learner2")) + ) %>>% + PipeOpPredictionUnion$new() + + expect_null(g$train(task)[[1L]]) + expect_equal(g$state$filter1$na_ids, na_ids) + expect_equal(g$state$filter2$na_ids, na_ids) + expect_equal(g$state$filter1$row_ids, age_ids[age_ids %nin% na_ids]) + expect_equal(g$state$filter2$row_ids, setdiff(1:768, age_ids)[setdiff(1:768, age_ids) %nin% na_ids]) + + predict_out = g$predict(task)[[1L]] + expect_prediction(predict_out) + expect_setequal(predict_out$row_ids, setdiff(1:768, na_ids)) +}) From 92c9fa767d20bbae04d41c7ad3994db8bf8ac7fc Mon Sep 17 00:00:00 2001 From: sumny Date: Wed, 22 Apr 2020 13:22:47 +0200 Subject: [PATCH 02/12] adjust defaults, touch up docs, add more tests --- R/PipeOpFilterRows.R | 29 ++++++++++---------- man/mlr_pipeops_filterrows.Rd | 15 +++++----- 
tests/testthat/test_pipeop_filterrows.R | 20 ++++++++++++-- tests/testthat/test_pipeop_predictionunion.R | 19 +++++++++---- 4 files changed, 54 insertions(+), 29 deletions(-) diff --git a/R/PipeOpFilterRows.R b/R/PipeOpFilterRows.R index cd95e60c6..bce9adc18 100644 --- a/R/PipeOpFilterRows.R +++ b/R/PipeOpFilterRows.R @@ -6,7 +6,8 @@ #' #' @description #' Filter rows of the data of a task. Also directly allows for the removal of rows holding missing -#' values. +#' values. If both filtering and missing value removal is performed, filtering is done after missing +#' value removal. #' #' @section Construction: #' ``` @@ -51,17 +52,17 @@ #' input [`Task`][mlr3::Task]. Finally, this can also be an integerish vector that directly #' specifies the row identifiers of the rows of the data of the input [`Task`][mlr3::Task] that #' should be kept. Default is `NULL`, i.e., no filtering is done. -#' * `na_column` :: `NULL` | `character`\cr +#' * `na_column` :: `character`\cr #' A character vector that specifies the columns of the data of the input [`Task`][mlr3::Task] -#' that should be checked for missing values. If set to `all`, all columns of the data are used. A +#' that should be checked for missing values. If set to `_all_`, all columns of the data are used. A #' row is removed if at least one missing value is found with respect to the columns specified. -#' Default is `NULL`, i.e., no removal of missing values is done. +#' Default is `character(0)`, i.e., no removal of missing values is done. #' * `invert` :: `logical(1)`\cr #' Should the filtering rule be set-theoretically inverted? Note that this happens after #' (possible) missing values were removed if `na_column` is specified. Default is `FALSE`. #' * `skip_during_predict` :: `logical(1)`\cr -#' Should the filtering and missing value removal steps be skipped during prediction? If `TRUE`, -#' the input [`Task`][mlr3::Task] is returned unaltered during prediction. Default is `FALSE`. 
+#' Should the filtering and missing value removal steps be skipped during prediction? Default is +#' `TRUE`, i.e., the input [`Task`][mlr3::Task] is returned unaltered during prediction. #' #' @section Internals: #' Uses the [`is.na()`][base::is.na] function for the checking of missing values. @@ -74,7 +75,7 @@ #' task = tsk("pima") #' po = PipeOpFilterRows$new(param_vals = list( #' filter = expression(age < median(age) & mass > 30), -#' na_column = "all") +#' na_column = "_all_") #' ) #' po$train(list(task)) #' po$state @@ -94,13 +95,13 @@ PipeOpFilterRows = R6Class("PipeOpFilterRows", if (!ok) return("Must either be a character vector of length 1, an expression, or an integerish object of row ids") TRUE }), - ParamUty$new("na_column", default = NULL, tags = c("train", "predict"), custom_check = function(x) { - check_character(x, any.missing = FALSE, min.len = 1L, null.ok = TRUE) + ParamUty$new("na_column", default = character(0L), tags = c("train", "predict"), custom_check = function(x) { + check_character(x, any.missing = FALSE, null.ok = TRUE) }), ParamLgl$new("invert", default = FALSE, tags = c("train", "predict")), - ParamLgl$new("skip_during_predict", default = FALSE, tags = "predict")) + ParamLgl$new("skip_during_predict", default = TRUE, tags = "predict")) ) - ps$values = list(filter = NULL, na_column = NULL, invert = FALSE, skip_during_predict = FALSE) + ps$values = list(filter = NULL, na_column = character(0L), invert = FALSE, skip_during_predict = TRUE) super$initialize(id, param_set = ps, param_vals = param_vals) } ), @@ -114,9 +115,9 @@ PipeOpFilterRows = R6Class("PipeOpFilterRows", # NA column(s) handling na = self$param_set$values$na_column - if (!is.null(na)) { - assert_subset(na, choices = c("all", colnames(task$data()))) - if (na == "all") na = colnames(task$data()) + if (length(na)) { + assert_subset(na, choices = c("_all_", colnames(task$data()))) + if (na == "_all_") na = colnames(task$data()) na_ids = which(rowSums(is.na(task$data(cols = 
na))) > 0L) row_ids = setdiff(row_ids, na_ids) } else { diff --git a/man/mlr_pipeops_filterrows.Rd b/man/mlr_pipeops_filterrows.Rd index 0216b2cbd..ee5c87335 100644 --- a/man/mlr_pipeops_filterrows.Rd +++ b/man/mlr_pipeops_filterrows.Rd @@ -9,7 +9,8 @@ } \description{ Filter rows of the data of a task. Also directly allows for the removal of rows holding missing -values. +values. If both filtering and missing value removal is performed, filtering is done after missing +value removal. } \section{Construction}{ \preformatted{PipeOpFilterRows$new(id = "filterrows", param_vals = list()) @@ -63,17 +64,17 @@ the input \code{\link[mlr3:Task]{Task}} when evaluated withing the environment o input \code{\link[mlr3:Task]{Task}}. Finally, this can also be an integerish vector that directly specifies the row identifiers of the rows of the data of the input \code{\link[mlr3:Task]{Task}} that should be kept. Default is \code{NULL}, i.e., no filtering is done. -\item \code{na_column} :: \code{NULL} | \code{character}\cr +\item \code{na_column} :: \code{character}\cr A character vector that specifies the columns of the data of the input \code{\link[mlr3:Task]{Task}} -that should be checked for missing values. If set to \code{all}, all columns of the data are used. A +that should be checked for missing values. If set to \verb{_all_}, all columns of the data are used. A row is removed if at least one missing value is found with respect to the columns specified. -Default is \code{NULL}, i.e., no removal of missing values is done. +Default is \code{character(0)}, i.e., no removal of missing values is done. \item \code{invert} :: \code{logical(1)}\cr Should the filtering rule be set-theoretically inverted? Note that this happens after (possible) missing values were removed if \code{na_column} is specified. Default is \code{FALSE}. \item \code{skip_during_predict} :: \code{logical(1)}\cr -Should the filtering and missing value removal steps be skipped during prediction? 
If \code{TRUE}, -the input \code{\link[mlr3:Task]{Task}} is returned unaltered during prediction. Default is \code{FALSE}. +Should the filtering and missing value removal steps be skipped during prediction? Default is +\code{TRUE}, i.e., the input \code{\link[mlr3:Task]{Task}} is returned unaltered during prediction. } } @@ -92,7 +93,7 @@ library("mlr3") task = tsk("pima") po = PipeOpFilterRows$new(param_vals = list( filter = expression(age < median(age) & mass > 30), - na_column = "all") + na_column = "_all_") ) po$train(list(task)) po$state diff --git a/tests/testthat/test_pipeop_filterrows.R b/tests/testthat/test_pipeop_filterrows.R index abc239451..660bc2977 100644 --- a/tests/testthat/test_pipeop_filterrows.R +++ b/tests/testthat/test_pipeop_filterrows.R @@ -8,6 +8,8 @@ test_that("PipeOpFilterRows - basic properties", { expect_equal(predict_pipeop(op, inputs = list(task))[[1L]], task) expect_datapreproc_pipeop_class(PipeOpFilterRows, task = task) + + expect_error(PipeOpFilterRows$new(param_vals = list(filter = list()))) }) test_that("PipeOpFilterRows - NA handling", { @@ -24,7 +26,8 @@ test_that("PipeOpFilterRows - NA handling", { train_out1 = op$train(list(task))[[1L]] expect_equal(op$state$na_ids, c(1, 3, 100)) - op$param_set$values$na_column = "all" + op$param_set$values$na_column = "_all_" + op$param_set$values$skip_during_predict = FALSE train_out2 = op$train(list(task))[[1L]] expect_equal(op$state$na_ids, c(1, 2, 3, 4, 5, 100)) predict_out2 = op$predict(list(task))[[1L]] @@ -80,7 +83,7 @@ test_that("PipeOpFilterRows - filter by ids", { dat$Species[2L] = NA task = TaskClassif$new("test", backend = dat, target = "Species") - op = PipeOpFilterRows$new(param_vals = list(na_column = "all", filter = 1:10)) + op = PipeOpFilterRows$new(param_vals = list(na_column = "_all_", filter = 1:10)) train_out1 = op$train(list(task))[[1L]] expect_equal(op$state$row_ids, setdiff(1:10, c(1, 2, 3, 4, 5))) op$param_set$values$invert = TRUE @@ -88,3 +91,16 @@ 
test_that("PipeOpFilterRows - filter by ids", { expect_equal(op$state$na_ids, c(1, 2, 3, 4, 5, 100)) expect_true(!(100 %in% op$state$row_ids)) }) + +test_that("PipeOpFilterRows - use case iris", { + task = mlr_tasks$get("iris") + + g = PipeOpFilterRows$new(param_vals = list(filter = expression(Sepal.Length < median(Sepal.Length)))) %>>% + PipeOpLearnerCV$new(LearnerClassifRpart$new()) + + train_out = g$train(task)[[1L]] + predict_out = g$predict(task)[[1L]] + expect_equal(g$state$filterrows$row_ids, which(with(task$data(), Sepal.Length < median(Sepal.Length)))) + expect_equal(g$state$filterrows$row_ids, train_out$row_ids) + expect_equal(task$row_ids, predict_out$row_ids) +}) diff --git a/tests/testthat/test_pipeop_predictionunion.R b/tests/testthat/test_pipeop_predictionunion.R index ecacf0d60..73f32578a 100644 --- a/tests/testthat/test_pipeop_predictionunion.R +++ b/tests/testthat/test_pipeop_predictionunion.R @@ -6,13 +6,20 @@ test_that("PipeOpPredictionUnion - basic properties", { expect_data_table(po$input, nrows = 3) expect_data_table(po$output, nrows = 1) - expect_pipeop_class(PipeOpFeatureUnion, list(1)) - expect_pipeop_class(PipeOpFeatureUnion, list(3)) + expect_pipeop_class(PipeOpPredictionUnion, list(1)) + expect_pipeop_class(PipeOpPredictionUnion, list(3)) - po = PipeOpFeatureUnion$new() + po = PipeOpPredictionUnion$new() expect_pipeop(po) expect_data_table(po$input, nrows = 1) expect_data_table(po$output, nrows = 1) + + po = PipeOpPredictionUnion$new(innum = "test") + expect_equal(po$input$name, "input1") + + prediction = PredictionRegr$new(row_ids = 1, truth = 1, response = 1) + prediction$task_type = "test" + expect_error(po$predict(list(prediction)), regexp = "task types") }) test_that("PipeOpPredictionUnion - train and predict classif", { @@ -80,7 +87,7 @@ test_that("PipeOpPredictionUnion - train and predict regr", { expect_error(po$predict(list(learner1$predict(task1), learner2$predict(task2))), regexp = "same task type and predict types") }) 
-test_that("PipeOpFilterRows and PipeOpPredictionUnion - use case", { +test_that("PipeOpFilterRows and PipeOpPredictionUnion - use case pima", { task = mlr_tasks$get("pima") age_ids = which(task$data(cols = "age")[[1L]] < median(task$data(cols = "age")[[1L]])) na_ids = which(rowSums(is.na(task$data())) > 0L) @@ -88,9 +95,9 @@ test_that("PipeOpFilterRows and PipeOpPredictionUnion - use case", { g = PipeOpCopy$new(2) %>>% gunion(list( - PipeOpFilterRows$new("filter1", param_vals = list(filter = filter, na_column = "all")) %>>% + PipeOpFilterRows$new("filter1", param_vals = list(filter = filter, na_column = "_all_", skip_during_predict = FALSE)) %>>% PipeOpLearner$new(LearnerClassifRpart$new(), "learner1"), - PipeOpFilterRows$new("filter2", param_vals = list(filter = filter, na_column = "all", invert = TRUE)) %>>% + PipeOpFilterRows$new("filter2", param_vals = list(filter = filter, na_column = "_all_", invert = TRUE, skip_during_predict = FALSE)) %>>% PipeOpLearner$new(LearnerClassifRpart$new(), "learner2")) ) %>>% PipeOpPredictionUnion$new() From f7798ad78ca11ae5a90897ee7f386d1a72074cf4 Mon Sep 17 00:00:00 2001 From: sumny Date: Sun, 7 Jun 2020 23:17:16 +0200 Subject: [PATCH 03/12] Merge remote-tracking branch 'origin/master' into pipeop_filterrows --- man/PipeOpTargetTrafo.Rd | 2 ++ man/mlr_pipeops_filterrows.Rd | 5 +++++ man/mlr_pipeops_predictionunion.Rd | 5 +++++ man/mlr_pipeops_targetinverter.Rd | 2 ++ man/mlr_pipeops_targettrafoscalerange.Rd | 4 +++- man/mlr_pipeops_targettrafosimple.Rd | 2 ++ man/mlr_pipeops_updatetarget.Rd | 2 ++ 7 files changed, 21 insertions(+), 1 deletion(-) diff --git a/man/PipeOpTargetTrafo.Rd b/man/PipeOpTargetTrafo.Rd index 1f89ef2e2..1e4b0e5b2 100644 --- a/man/PipeOpTargetTrafo.Rd +++ b/man/PipeOpTargetTrafo.Rd @@ -147,6 +147,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, 
\code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, @@ -164,6 +165,7 @@ Other PipeOps: \code{\link{mlr_pipeops_mutate}}, \code{\link{mlr_pipeops_nop}}, \code{\link{mlr_pipeops_pca}}, +\code{\link{mlr_pipeops_predictionunion}}, \code{\link{mlr_pipeops_quantilebin}}, \code{\link{mlr_pipeops_regravg}}, \code{\link{mlr_pipeops_removeconstants}}, diff --git a/man/mlr_pipeops_filterrows.Rd b/man/mlr_pipeops_filterrows.Rd index ee5c87335..62a14e584 100644 --- a/man/mlr_pipeops_filterrows.Rd +++ b/man/mlr_pipeops_filterrows.Rd @@ -103,6 +103,7 @@ Other PipeOps: \code{\link{PipeOpEnsemble}}, \code{\link{PipeOpImpute}}, \code{\link{PipeOpProxy}}, +\code{\link{PipeOpTargetTrafo}}, \code{\link{PipeOpTaskPreproc}}, \code{\link{PipeOp}}, \code{\link{mlr_pipeops_boxcox}}, @@ -147,9 +148,13 @@ Other PipeOps: \code{\link{mlr_pipeops_smote}}, \code{\link{mlr_pipeops_spatialsign}}, \code{\link{mlr_pipeops_subsample}}, +\code{\link{mlr_pipeops_targetinverter}}, +\code{\link{mlr_pipeops_targettrafoscalerange}}, +\code{\link{mlr_pipeops_targettrafosimple}}, \code{\link{mlr_pipeops_textvectorizer}}, \code{\link{mlr_pipeops_threshold}}, \code{\link{mlr_pipeops_unbranch}}, +\code{\link{mlr_pipeops_updatetarget}}, \code{\link{mlr_pipeops_yeojohnson}}, \code{\link{mlr_pipeops}} } diff --git a/man/mlr_pipeops_predictionunion.Rd b/man/mlr_pipeops_predictionunion.Rd index 536cf4159..df5969aca 100644 --- a/man/mlr_pipeops_predictionunion.Rd +++ b/man/mlr_pipeops_predictionunion.Rd @@ -98,6 +98,7 @@ Other PipeOps: \code{\link{PipeOpEnsemble}}, \code{\link{PipeOpImpute}}, \code{\link{PipeOpProxy}}, +\code{\link{PipeOpTargetTrafo}}, \code{\link{PipeOpTaskPreproc}}, \code{\link{PipeOp}}, \code{\link{mlr_pipeops_boxcox}}, @@ -142,9 +143,13 @@ Other PipeOps: \code{\link{mlr_pipeops_smote}}, \code{\link{mlr_pipeops_spatialsign}}, \code{\link{mlr_pipeops_subsample}}, +\code{\link{mlr_pipeops_targetinverter}}, +\code{\link{mlr_pipeops_targettrafoscalerange}}, 
+\code{\link{mlr_pipeops_targettrafosimple}}, \code{\link{mlr_pipeops_textvectorizer}}, \code{\link{mlr_pipeops_threshold}}, \code{\link{mlr_pipeops_unbranch}}, +\code{\link{mlr_pipeops_updatetarget}}, \code{\link{mlr_pipeops_yeojohnson}}, \code{\link{mlr_pipeops}} } diff --git a/man/mlr_pipeops_targetinverter.Rd b/man/mlr_pipeops_targetinverter.Rd index e71385564..2919d81a2 100644 --- a/man/mlr_pipeops_targetinverter.Rd +++ b/man/mlr_pipeops_targetinverter.Rd @@ -77,6 +77,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, @@ -94,6 +95,7 @@ Other PipeOps: \code{\link{mlr_pipeops_mutate}}, \code{\link{mlr_pipeops_nop}}, \code{\link{mlr_pipeops_pca}}, +\code{\link{mlr_pipeops_predictionunion}}, \code{\link{mlr_pipeops_quantilebin}}, \code{\link{mlr_pipeops_regravg}}, \code{\link{mlr_pipeops_removeconstants}}, diff --git a/man/mlr_pipeops_targettrafoscalerange.Rd b/man/mlr_pipeops_targettrafoscalerange.Rd index 274bb598e..7605e795d 100644 --- a/man/mlr_pipeops_targettrafoscalerange.Rd +++ b/man/mlr_pipeops_targettrafoscalerange.Rd @@ -42,7 +42,7 @@ The \verb{$state} is a named \code{list} with a vector of the two transformation The parameters are the parameters inherited from \code{\link{PipeOpTargetTrafo}}, as well as: \itemize{ -\item \code{lower} :: \code{numeric(1)} \cr +\item \code{lower} :: \code{numeric(1)} \cr Target value of smallest item of input target. Default is 0. \item \code{upper} :: \code{numeric(1)} \cr Target value of greatest item of input target. Default is 1. 
@@ -97,6 +97,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, @@ -114,6 +115,7 @@ Other PipeOps: \code{\link{mlr_pipeops_mutate}}, \code{\link{mlr_pipeops_nop}}, \code{\link{mlr_pipeops_pca}}, +\code{\link{mlr_pipeops_predictionunion}}, \code{\link{mlr_pipeops_quantilebin}}, \code{\link{mlr_pipeops_regravg}}, \code{\link{mlr_pipeops_removeconstants}}, diff --git a/man/mlr_pipeops_targettrafosimple.Rd b/man/mlr_pipeops_targettrafosimple.Rd index de7af4af6..e748f9efb 100644 --- a/man/mlr_pipeops_targettrafosimple.Rd +++ b/man/mlr_pipeops_targettrafosimple.Rd @@ -116,6 +116,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, @@ -133,6 +134,7 @@ Other PipeOps: \code{\link{mlr_pipeops_mutate}}, \code{\link{mlr_pipeops_nop}}, \code{\link{mlr_pipeops_pca}}, +\code{\link{mlr_pipeops_predictionunion}}, \code{\link{mlr_pipeops_quantilebin}}, \code{\link{mlr_pipeops_regravg}}, \code{\link{mlr_pipeops_removeconstants}}, diff --git a/man/mlr_pipeops_updatetarget.Rd b/man/mlr_pipeops_updatetarget.Rd index 5af684f58..fe7e1901f 100644 --- a/man/mlr_pipeops_updatetarget.Rd +++ b/man/mlr_pipeops_updatetarget.Rd @@ -101,6 +101,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, @@ -118,6 +119,7 @@ Other PipeOps: \code{\link{mlr_pipeops_mutate}}, \code{\link{mlr_pipeops_nop}}, \code{\link{mlr_pipeops_pca}}, 
+\code{\link{mlr_pipeops_predictionunion}}, \code{\link{mlr_pipeops_quantilebin}}, \code{\link{mlr_pipeops_regravg}}, \code{\link{mlr_pipeops_removeconstants}}, From aca5d5e4beb407828e6f28d9ad8601855ad03683 Mon Sep 17 00:00:00 2001 From: sumny Date: Sat, 10 Oct 2020 14:46:11 +0200 Subject: [PATCH 04/12] drop prediction union - provide separate PR later --- R/PipeOpPredictionUnion.R | 136 ---------------- man/mlr_pipeops_predictionunion.Rd | 156 ------------------- tests/testthat/test_pipeop_predictionunion.R | 114 -------------- 3 files changed, 406 deletions(-) delete mode 100644 R/PipeOpPredictionUnion.R delete mode 100644 man/mlr_pipeops_predictionunion.Rd delete mode 100644 tests/testthat/test_pipeop_predictionunion.R diff --git a/R/PipeOpPredictionUnion.R b/R/PipeOpPredictionUnion.R deleted file mode 100644 index 3fb6ef034..000000000 --- a/R/PipeOpPredictionUnion.R +++ /dev/null @@ -1,136 +0,0 @@ -#' @title PipeOpPredictionUnion -#' -#' @usage NULL -#' @name mlr_pipeops_predictionunion -#' @format [`R6Class`] object inheriting from [`PipeOp`]. -#' -#' @description -#' Unite predictions from all input predictions into a single -#' [`Prediction`][mlr3::Prediction]. -#' -#' `task_type`s and `predict_types` must be equal across all input predictions. -#' -#' Note that predictions are combined as is, i.e., no checks for duplicated row -#' identifiers etc. are performed. -#' -#' Currently only supports task types `classif` and `regr` by constructing a new -#' [`PredictionClassif`][mlr3::PredictionClassif] and respectively -#' [`PredictionRegr`][mlr3::PredictionRegr]. -#' -#' @section Construction: -#' ``` -#' PipeOpPredictionUnion$new(innum = 0, id = "predictionunion", param_vals = list()) -#' ``` -#' -#' * `innum` :: `numeric(1)` | `character`\cr -#' Determines the number of input channels. If `innum` is 0 (default), a vararg input channel is -#' created that can take an arbitrary number of inputs. 
If `innum` is a `character` vector, the -#' number of input channels is the length of `innum`. -#' * `id` :: `character(1)`\cr -#' Identifier of the resulting object, default `"predictionunion"`. -#' * `param_vals` :: named `list`\cr -#' List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise -#' be set during construction. Default `list()`. -#' -#' @section Input and Output Channels: -#' [`PipeOpPredictionUnion`] has multiple input channels depending on the `innum` construction -#' argument, named `"input1"`, `"input2"`, ... if `innum` is nonzero; if `innum` is 0, there is only -#' one *vararg* input channel named `"..."`. All input channels take `NULL` during training and a -#' [`Prediction`][mlr3::Prediction] during prediction. -#' -#' [`PipeOpPredictionUnion`] has one output channel named `"output"`, producing `NULL` during -#' training and a [`Prediction`][mlr3::Prediction] during prediction. -#' -#' The output during prediction is a [`Prediction`][mlr3::Prediction] constructed by combining all -#' input [`Prediction`][mlr3::Prediction]s. -#' -#' @section State: -#' The `$state` is left empty (`list()`). -#' -#' @section Parameters: -#' [`PipeOpPredictionUnion`] has no Parameters. -#' -#' @section Internals: -#' Only sets the fields `row_ids`, `truth`, `response` and if applicable `prob` and `se` during -#' construction of the output [`Prediction`][mlr3::Prediction]. -#' -#' @section Fields: -#' Only fields inherited from [`PipeOp`]. -#' -#' @section Methods: -#' Only methods inherited from [`PipeOp`]. 
-#' -#' @family PipeOps -#' @include PipeOp.R -#' @export -#' @examples -#' library("mlr3") -#' -#' task = tsk("iris") -#' filter = expression(Sepal.Length < median(Sepal.Length)) -#' gr = po("copy", outnum = 2) %>>% gunion(list( -#' po("filterrows", id = "filter1", -#' param_vals = list(filter = filter)) %>>% -#' lrn("classif.rpart", id = "learner1"), -#' po("filterrows", id = "filter2", -#' param_vals = list(filter = filter, invert = TRUE)) %>>% -#' lrn("classif.rpart", id = "learner2") -#' )) %>>% po("predictionunion") -#' -#' gr$train(task) -#' gr$predict(task) -PipeOpPredictionUnion = R6Class("PipeOpPredictionUnion", - inherit = PipeOp, - public = list( - initialize = function(innum = 0L, id = "predictionunion", param_vals = list()) { - assert( - check_int(innum, lower = 0L), - check_character(innum, min.len = 1L, any.missing = FALSE) - ) - if (!is.numeric(innum)) { - innum = length(innum) - } - inname = if (innum) rep_suffix("input", innum) else "..." - super$initialize(id, param_vals = param_vals, - input = data.table(name = inname, train = "NULL", predict = "Prediction"), - output = data.table(name = "output", train = "NULL", predict = "Prediction")) - } - ), - private = list( - .train = function(inputs) { - self$state = list() - list(NULL) - }, - .predict = function(inputs) { - # currently only works for task_type "classif" or "regr" - check = all((unlist(map(inputs[-1L], .f = `[[`, "task_type")) == inputs[[1L]]$task_type) & - unlist(map(inputs[-1L], .f = `[[`, "predict_types")) == inputs[[1L]]$predict_types) - if (!check) { - stopf("Can only unite predictions of the same task type and predict types.") - } - - type = inputs[[1L]]$task_type - if (type %nin% c("classif", "regr")) { - stopf("Currently only supports task types `classif` and `regr`.") - } - - row_ids = unlist(map(inputs, .f = `[[`, "row_ids"), use.names = FALSE) - truth = unlist(map(inputs, .f = `[[`, "truth"), use.names = FALSE) - response = unlist(map(inputs, .f = `[[`, "response"), use.names 
= FALSE) - - prediction = - if(type == "classif") { - prob = do.call(rbind, map(inputs, .f = `[[`, "prob")) - PredictionClassif$new(row_ids = row_ids, truth = truth, response = response, prob = prob) - } else { - se = unlist(map(inputs, .f = `[[`, "se"), use.names = FALSE) - if (length(se) == 0L) se = NULL - PredictionRegr$new(row_ids = row_ids, truth = truth, response = response, se = se) - } - - list(prediction) - } - ) -) - -mlr_pipeops$add("predictionunion", PipeOpPredictionUnion) diff --git a/man/mlr_pipeops_predictionunion.Rd b/man/mlr_pipeops_predictionunion.Rd deleted file mode 100644 index df5969aca..000000000 --- a/man/mlr_pipeops_predictionunion.Rd +++ /dev/null @@ -1,156 +0,0 @@ -% Generated by roxygen2: do not edit by hand -% Please edit documentation in R/PipeOpPredictionUnion.R -\name{mlr_pipeops_predictionunion} -\alias{mlr_pipeops_predictionunion} -\alias{PipeOpPredictionUnion} -\title{PipeOpPredictionUnion} -\format{ -\code{\link{R6Class}} object inheriting from \code{\link{PipeOp}}. -} -\description{ -Unite predictions from all input predictions into a single -\code{\link[mlr3:Prediction]{Prediction}}. - -\code{task_type}s and \code{predict_types} must be equal across all input predictions. - -Note that predictions are combined as is, i.e., no checks for duplicated row -identifiers etc. are performed. - -Currently only supports task types \code{classif} and \code{regr} by constructing a new -\code{\link[mlr3:PredictionClassif]{PredictionClassif}} and respectively -\code{\link[mlr3:PredictionRegr]{PredictionRegr}}. -} -\section{Construction}{ -\preformatted{PipeOpPredictionUnion$new(innum = 0, id = "predictionunion", param_vals = list()) -} -\itemize{ -\item \code{innum} :: \code{numeric(1)} | \code{character}\cr -Determines the number of input channels. If \code{innum} is 0 (default), a vararg input channel is -created that can take an arbitrary number of inputs. 
If \code{innum} is a \code{character} vector, the -number of input channels is the length of \code{innum}. -\item \code{id} :: \code{character(1)}\cr -Identifier of the resulting object, default \code{"predictionunion"}. -\item \code{param_vals} :: named \code{list}\cr -List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise -be set during construction. Default \code{list()}. -} -} - -\section{Input and Output Channels}{ - -\code{\link{PipeOpPredictionUnion}} has multiple input channels depending on the \code{innum} construction -argument, named \code{"input1"}, \code{"input2"}, ... if \code{innum} is nonzero; if \code{innum} is 0, there is only -one \emph{vararg} input channel named \code{"..."}. All input channels take \code{NULL} during training and a -\code{\link[mlr3:Prediction]{Prediction}} during prediction. - -\code{\link{PipeOpPredictionUnion}} has one output channel named \code{"output"}, producing \code{NULL} during -training and a \code{\link[mlr3:Prediction]{Prediction}} during prediction. - -The output during prediction is a \code{\link[mlr3:Prediction]{Prediction}} constructed by combining all -input \code{\link[mlr3:Prediction]{Prediction}}s. -} - -\section{State}{ - -The \verb{$state} is left empty (\code{list()}). -} - -\section{Parameters}{ - -\code{\link{PipeOpPredictionUnion}} has no Parameters. -} - -\section{Internals}{ - -Only sets the fields \code{row_ids}, \code{truth}, \code{response} and if applicable \code{prob} and \code{se} during -construction of the output \code{\link[mlr3:Prediction]{Prediction}}. -} - -\section{Fields}{ - -Only fields inherited from \code{\link{PipeOp}}. -} - -\section{Methods}{ - -Only methods inherited from \code{\link{PipeOp}}. 
-} - -\examples{ -library("mlr3") - -task = tsk("iris") -filter = expression(Sepal.Length < median(Sepal.Length)) -gr = po("copy", outnum = 2) \%>>\% gunion(list( - po("filterrows", id = "filter1", - param_vals = list(filter = filter)) \%>>\% - lrn("classif.rpart", id = "learner1"), - po("filterrows", id = "filter2", - param_vals = list(filter = filter, invert = TRUE)) \%>>\% - lrn("classif.rpart", id = "learner2") -)) \%>>\% po("predictionunion") - -gr$train(task) -gr$predict(task) -} -\seealso{ -Other PipeOps: -\code{\link{PipeOpEnsemble}}, -\code{\link{PipeOpImpute}}, -\code{\link{PipeOpProxy}}, -\code{\link{PipeOpTargetTrafo}}, -\code{\link{PipeOpTaskPreproc}}, -\code{\link{PipeOp}}, -\code{\link{mlr_pipeops_boxcox}}, -\code{\link{mlr_pipeops_branch}}, -\code{\link{mlr_pipeops_chunk}}, -\code{\link{mlr_pipeops_classbalancing}}, -\code{\link{mlr_pipeops_classifavg}}, -\code{\link{mlr_pipeops_classweights}}, -\code{\link{mlr_pipeops_colapply}}, -\code{\link{mlr_pipeops_collapsefactors}}, -\code{\link{mlr_pipeops_copy}}, -\code{\link{mlr_pipeops_datefeatures}}, -\code{\link{mlr_pipeops_encodeimpact}}, -\code{\link{mlr_pipeops_encodelmer}}, -\code{\link{mlr_pipeops_encode}}, -\code{\link{mlr_pipeops_featureunion}}, -\code{\link{mlr_pipeops_filterrows}}, -\code{\link{mlr_pipeops_filter}}, -\code{\link{mlr_pipeops_fixfactors}}, -\code{\link{mlr_pipeops_histbin}}, -\code{\link{mlr_pipeops_ica}}, -\code{\link{mlr_pipeops_imputehist}}, -\code{\link{mlr_pipeops_imputemean}}, -\code{\link{mlr_pipeops_imputemedian}}, -\code{\link{mlr_pipeops_imputemode}}, -\code{\link{mlr_pipeops_imputenewlvl}}, -\code{\link{mlr_pipeops_imputesample}}, -\code{\link{mlr_pipeops_kernelpca}}, -\code{\link{mlr_pipeops_learner}}, -\code{\link{mlr_pipeops_missind}}, -\code{\link{mlr_pipeops_modelmatrix}}, -\code{\link{mlr_pipeops_mutate}}, -\code{\link{mlr_pipeops_nop}}, -\code{\link{mlr_pipeops_pca}}, -\code{\link{mlr_pipeops_quantilebin}}, -\code{\link{mlr_pipeops_regravg}}, 
-\code{\link{mlr_pipeops_removeconstants}}, -\code{\link{mlr_pipeops_scalemaxabs}}, -\code{\link{mlr_pipeops_scalerange}}, -\code{\link{mlr_pipeops_scale}}, -\code{\link{mlr_pipeops_select}}, -\code{\link{mlr_pipeops_smote}}, -\code{\link{mlr_pipeops_spatialsign}}, -\code{\link{mlr_pipeops_subsample}}, -\code{\link{mlr_pipeops_targetinverter}}, -\code{\link{mlr_pipeops_targettrafoscalerange}}, -\code{\link{mlr_pipeops_targettrafosimple}}, -\code{\link{mlr_pipeops_textvectorizer}}, -\code{\link{mlr_pipeops_threshold}}, -\code{\link{mlr_pipeops_unbranch}}, -\code{\link{mlr_pipeops_updatetarget}}, -\code{\link{mlr_pipeops_yeojohnson}}, -\code{\link{mlr_pipeops}} -} -\concept{PipeOps} diff --git a/tests/testthat/test_pipeop_predictionunion.R b/tests/testthat/test_pipeop_predictionunion.R deleted file mode 100644 index 73f32578a..000000000 --- a/tests/testthat/test_pipeop_predictionunion.R +++ /dev/null @@ -1,114 +0,0 @@ -context("PipeOpPredictionUnion") - -test_that("PipeOpPredictionUnion - basic properties", { - po = PipeOpPredictionUnion$new(3) - expect_pipeop(po) - expect_data_table(po$input, nrows = 3) - expect_data_table(po$output, nrows = 1) - - expect_pipeop_class(PipeOpPredictionUnion, list(1)) - expect_pipeop_class(PipeOpPredictionUnion, list(3)) - - po = PipeOpPredictionUnion$new() - expect_pipeop(po) - expect_data_table(po$input, nrows = 1) - expect_data_table(po$output, nrows = 1) - - po = PipeOpPredictionUnion$new(innum = "test") - expect_equal(po$input$name, "input1") - - prediction = PredictionRegr$new(row_ids = 1, truth = 1, response = 1) - prediction$task_type = "test" - expect_error(po$predict(list(prediction)), regexp = "task types") -}) - -test_that("PipeOpPredictionUnion - train and predict classif", { - set.seed(1) - ids = sample(150, size = 75) - task1 = mlr_tasks$get("iris") - task2 = task1$clone(deep = TRUE) - task1$filter(ids) - learner1 = mlr_learners$get("classif.rpart") - learner2 = learner1$clone(deep = TRUE) - learner1$train(task1) - 
learner2$train(task2) - - po = PipeOpPredictionUnion$new(2) - expect_null(po$train(list(NULL, NULL))[[1L]]) - - predict_out1 = po$predict(list(learner1$predict(task1), learner2$predict(task2)))[[1L]] - expect_equal(predict_out1$row_ids, c(task1$row_ids, task2$row_ids)) - expect_equal(predict_out1$truth, - unlist(list(learner1$predict(task1)$truth, learner2$predict(task2)$truth), use.names = FALSE)) - expect_equal(predict_out1$response, - unlist(list(learner1$predict(task1)$response, learner2$predict(task2)$response), use.names = FALSE)) - - learner1$predict_type = "prob" - learner2$predict_type = "prob" - - predict_out2 = po$predict(list(learner1$predict(task1), learner2$predict(task2)))[[1L]] - expect_equal(predict_out2$prob, - do.call(rbind, list(learner1$predict(task1)$prob, learner2$predict(task2)$prob))) - - learner1$predict_type = "response" - expect_error(po$predict(list(learner1$predict(task1), learner2$predict(task2))), regexp = "same task type and predict types") -}) - -test_that("PipeOpPredictionUnion - train and predict regr", { - library(mlr3learners) - set.seed(1) - ids = sample(32, size = 15) - task1 = mlr_tasks$get("mtcars") - task2 = task1$clone(deep = TRUE) - task1$filter(ids) - learner1 = mlr_learners$get("regr.lm") - learner2 = learner1$clone(deep = TRUE) - learner1$train(task1) - learner2$train(task2) - - po = PipeOpPredictionUnion$new(2) - expect_null(po$train(list(NULL, NULL))[[1L]]) - - predict_out1 = po$predict(list(learner1$predict(task1), learner2$predict(task2)))[[1L]] - expect_equal(predict_out1$row_ids, c(task1$row_ids, task2$row_ids)) - expect_equal(predict_out1$truth, - unlist(list(learner1$predict(task1)$truth, learner2$predict(task2)$truth), use.names = FALSE)) - expect_equal(predict_out1$response, - unlist(list(learner1$predict(task1)$response, learner2$predict(task2)$response), use.names = FALSE)) - - learner1$predict_type = "se" - learner2$predict_type = "se" - - predict_out2 = po$predict(list(learner1$predict(task1), 
learner2$predict(task2)))[[1L]] - expect_equal(predict_out2$se, - unlist(list(learner1$predict(task1)$se, learner2$predict(task2)$se), use.names = FALSE)) - - learner1$predict_type = "response" - expect_error(po$predict(list(learner1$predict(task1), learner2$predict(task2))), regexp = "same task type and predict types") -}) - -test_that("PipeOpFilterRows and PipeOpPredictionUnion - use case pima", { - task = mlr_tasks$get("pima") - age_ids = which(task$data(cols = "age")[[1L]] < median(task$data(cols = "age")[[1L]])) - na_ids = which(rowSums(is.na(task$data())) > 0L) - filter = expression(age < median(age)) - - g = PipeOpCopy$new(2) %>>% - gunion(list( - PipeOpFilterRows$new("filter1", param_vals = list(filter = filter, na_column = "_all_", skip_during_predict = FALSE)) %>>% - PipeOpLearner$new(LearnerClassifRpart$new(), "learner1"), - PipeOpFilterRows$new("filter2", param_vals = list(filter = filter, na_column = "_all_", invert = TRUE, skip_during_predict = FALSE)) %>>% - PipeOpLearner$new(LearnerClassifRpart$new(), "learner2")) - ) %>>% - PipeOpPredictionUnion$new() - - expect_null(g$train(task)[[1L]]) - expect_equal(g$state$filter1$na_ids, na_ids) - expect_equal(g$state$filter2$na_ids, na_ids) - expect_equal(g$state$filter1$row_ids, age_ids[age_ids %nin% na_ids]) - expect_equal(g$state$filter2$row_ids, setdiff(1:768, age_ids)[setdiff(1:768, age_ids) %nin% na_ids]) - - predict_out = g$predict(task)[[1L]] - expect_prediction(predict_out) - expect_setequal(predict_out$row_ids, setdiff(1:768, na_ids)) -}) From a5744778f62b68724aec050ca0142a72c413fe7b Mon Sep 17 00:00:00 2001 From: sumny Date: Sat, 10 Oct 2020 16:44:51 +0200 Subject: [PATCH 05/12] rework PipeOpFilterRows, update docs and tests --- tests/testthat/test_pipeop_filterrows.R | 157 +++++++++++------------- 1 file changed, 73 insertions(+), 84 deletions(-) diff --git a/tests/testthat/test_pipeop_filterrows.R b/tests/testthat/test_pipeop_filterrows.R index 660bc2977..acc496035 100644 --- 
a/tests/testthat/test_pipeop_filterrows.R +++ b/tests/testthat/test_pipeop_filterrows.R @@ -2,105 +2,94 @@ context("PipeOpFilterRows") test_that("PipeOpFilterRows - basic properties", { op = PipeOpFilterRows$new() - task = mlr_tasks$get("iris") + task = mlr_tasks$get("pima") expect_pipeop(op) expect_equal(train_pipeop(op, inputs = list(task))[[1L]], task) expect_equal(predict_pipeop(op, inputs = list(task))[[1L]], task) - expect_datapreproc_pipeop_class(PipeOpFilterRows, task = task) - - expect_error(PipeOpFilterRows$new(param_vals = list(filter = list()))) + expect_datapreproc_pipeop_class(PipeOpFilterRows, + constargs = list(param_vals = list(filter_formula = ~ age < median(age), + na_selector = selector_all())), + task = task) }) -test_that("PipeOpFilterRows - NA handling", { - dat = iris - dat$Sepal.Length[c(1L, 3L, 100L)] = NA - dat$Petal.Width[c(1L, 4L)] = NA - dat$Petal.Length[5] = NA - dat$Species[2L] = NA - task = TaskClassif$new("test", backend = dat, target = "Species") - - op = PipeOpFilterRows$new(param_vals = list(na_column = "Species")) - - op$param_set$values$na_column = "Sepal.Length" - train_out1 = op$train(list(task))[[1L]] - expect_equal(op$state$na_ids, c(1, 3, 100)) - - op$param_set$values$na_column = "_all_" - op$param_set$values$skip_during_predict = FALSE - train_out2 = op$train(list(task))[[1L]] - expect_equal(op$state$na_ids, c(1, 2, 3, 4, 5, 100)) - predict_out2 = op$predict(list(task))[[1L]] - expect_equal(train_out2, predict_out2) - - op$param_set$values$skip_during_predict = TRUE - expect_equal(op$predict(list(task))[[1L]], task) -}) - -test_that("PipeOpFilterRows - filter by column name", { +test_that("PipeOpFilterRows - filtering", { set.seed(1) - dat = iris - dat$filter = sample(c(FALSE, TRUE), size = 150, replace = TRUE) - task = TaskClassif$new("test", backend = dat, target = "Species") + task = tsk("pima") + train_ids = sample(task$row_ids, size = 200) + task_train = task$clone(deep = TRUE)$filter(train_ids) + task_predict = 
task$clone(deep = TRUE)$filter(setdiff(task$row_ids, train_ids)) + dt_train = task_train$data(cols = task_train$feature_names) + dt_predict = task_predict$data(cols = task_predict$feature_names) - op = PipeOpFilterRows$new(param_vals = list(filter = "filter")) + op = PipeOpFilterRows$new(param_vals = list( + filter_formula = ~ (age < 31 & glucose > median(glucose)) | pedigree < mean(pedigree))) - train_out1 = op$train(list(task))[[1L]] - expect_equal(op$state$row_ids, which(dat$filter)) - expect_equal(train_out1$data(), task$data(which(dat$filter))) + train_out = op$train(list(task_train))[[1L]] - op$param_set$values$invert = TRUE - train_out2 = op$train(list(task))[[1L]] - expect_equal(op$state$row_ids, which(!dat$filter)) - expect_equal(train_out2$data(), task$data(which(!dat$filter))) -}) + expect_equal(dt_train[(age < 31 & glucose > median(glucose)) | pedigree < mean(pedigree), ], + train_out$data(cols = task_train$feature_names)) -test_that("PipeOpFilterRows - filter by expression", { - task = mlr_tasks$get("iris") - task_cp = task$clone(deep = TRUE) - - op = PipeOpFilterRows$new(param_vals = list(filter = expression(Sepal.Length < 6 & Petal.Width > 1))) - train_out1 = op$train(list(task))[[1L]] - expect_equal(op$state$row_ids, which(iris$Sepal.Length < 6 & iris$Petal.Width > 1)) - - # zero indices left - op$param_set$values$filter = expression(Petal.Length < 3 & Petal.Width > 1) - train_out2 = op$train(list(task))[[1L]] - expect_equal(op$state$row_ids, integer(0)) - expect_equal(train_out2, task_cp$filter(integer(0))) - op$param_set$values$invert = TRUE - train_out3 = op$train(list(task))[[1L]] - expect_equal(op$state$row_ids, 1:150) - expect_equal(train_out3, task) + predict_out = op$predict(list(task_predict))[[1L]] + + expect_equal(dt_predict[(age < 31 & glucose > median(glucose)) | pedigree < mean(pedigree), ], + predict_out$data(cols = task_predict$feature_names)) + + # Works with variables from an env + some_test_val = 7 + filter_formula = ~ pregnant == 
some_test_val + op$param_set$values$filter_formula = filter_formula + expect_true(all(op$train(list(task))[[1L]]$data(cols = "pregnant")[[1L]] == 7L)) }) -test_that("PipeOpFilterRows - filter by ids", { - # in combination with NA - dat = iris - dat$Sepal.Length[c(1L, 3L, 100L)] = NA - dat$Petal.Width[c(1L, 4L)] = NA - dat$Petal.Length[5] = NA - dat$Species[2L] = NA - task = TaskClassif$new("test", backend = dat, target = "Species") - - op = PipeOpFilterRows$new(param_vals = list(na_column = "_all_", filter = 1:10)) - train_out1 = op$train(list(task))[[1L]] - expect_equal(op$state$row_ids, setdiff(1:10, c(1, 2, 3, 4, 5))) - op$param_set$values$invert = TRUE - train_out2 = op$train(list(task))[[1L]] - expect_equal(op$state$na_ids, c(1, 2, 3, 4, 5, 100)) - expect_true(!(100 %in% op$state$row_ids)) +test_that("PipeOpFilterRows - missing values removal", { + set.seed(2) + task = tsk("pima") + train_ids = sample(task$row_ids, size = 200) + task_train = task$clone(deep = TRUE)$filter(train_ids) + task_predict = task$clone(deep = TRUE)$filter(setdiff(task$row_ids, train_ids)) + dt_train = task_train$data(cols = task_train$feature_names) + dt_predict = task_predict$data(cols = task_predict$feature_names) + + op = PipeOpFilterRows$new(param_vals = list(na_selector = selector_name("insulin"))) + + train_out = op$train(list(task_train))[[1L]] + + expect_equal(dt_train[!is.na(insulin), ], + train_out$data(cols = task_train$feature_names)) + + predict_out = op$predict(list(task_predict))[[1L]] + + expect_equal(dt_predict[!is.na(insulin), ], + predict_out$data(cols = task_predict$feature_names)) }) -test_that("PipeOpFilterRows - use case iris", { - task = mlr_tasks$get("iris") - g = PipeOpFilterRows$new(param_vals = list(filter = expression(Sepal.Length < median(Sepal.Length)))) %>>% - PipeOpLearnerCV$new(LearnerClassifRpart$new()) +test_that("PipeOpFilterRows - filtering and missing values removal", { + set.seed(3) + task = tsk("pima") + train_ids = sample(task$row_ids, size = 
200) + task_train = task$clone(deep = TRUE)$filter(train_ids) + task_predict = task$clone(deep = TRUE)$filter(setdiff(task$row_ids, train_ids)) + dt_train = task_train$data(cols = task_train$feature_names) + dt_predict = task_predict$data(cols = task_predict$feature_names) + + op = PipeOpFilterRows$new(param_vals = list(filter_formula = ~ age > median(age), + na_selector = selector_all())) + + train_out = op$train(list(task_train))[[1L]] + + expect_equal(na.omit(dt_train)[age > median(age)], + train_out$data(cols = task_train$feature_names)) + + predict_out = op$predict(list(task_predict))[[1L]] + + expect_equal(na.omit(dt_predict)[age > median(age)], + predict_out$data(cols = task_predict$feature_names)) +}) - train_out = g$train(task)[[1L]] - predict_out = g$predict(task)[[1L]] - expect_equal(g$state$filterrows$row_ids, which(with(task$data(), Sepal.Length < median(Sepal.Length)))) - expect_equal(g$state$filterrows$row_ids, train_out$row_ids) - expect_equal(task$row_ids, predict_out$row_ids) +test_that("PipeOpFilterRows - check_filter_formulae", { + expect_true(check_filter_formulae(NULL)) + expect_true(check_filter_formulae(~ age < 1)) + expect_character(check_filter_formulae(y ~ x)) }) From 2f3ee4e4d296fd486c6dee5161a1995025a7d449 Mon Sep 17 00:00:00 2001 From: sumny Date: Sat, 10 Oct 2020 17:07:51 +0200 Subject: [PATCH 06/12] update docs, fix tests, update NEWS --- DESCRIPTION | 1 + NAMESPACE | 1 + NEWS.md | 2 + R/PipeOpFilterRows.R | 169 ++++++++--------------- man/PipeOp.Rd | 1 + man/PipeOpEnsemble.Rd | 1 + man/PipeOpImpute.Rd | 1 + man/PipeOpTargetTrafo.Rd | 1 + man/PipeOpTaskPreproc.Rd | 1 + man/PipeOpTaskPreprocSimple.Rd | 1 + man/mlr_pipeops.Rd | 1 + man/mlr_pipeops_boxcox.Rd | 1 + man/mlr_pipeops_branch.Rd | 1 + man/mlr_pipeops_chunk.Rd | 1 + man/mlr_pipeops_classbalancing.Rd | 1 + man/mlr_pipeops_classifavg.Rd | 1 + man/mlr_pipeops_classweights.Rd | 1 + man/mlr_pipeops_colapply.Rd | 1 + man/mlr_pipeops_collapsefactors.Rd | 1 + 
man/mlr_pipeops_colroles.Rd | 1 + man/mlr_pipeops_copy.Rd | 1 + man/mlr_pipeops_datefeatures.Rd | 1 + man/mlr_pipeops_encode.Rd | 1 + man/mlr_pipeops_encodeimpact.Rd | 1 + man/mlr_pipeops_encodelmer.Rd | 1 + man/mlr_pipeops_featureunion.Rd | 1 + man/mlr_pipeops_filter.Rd | 1 + man/mlr_pipeops_filterrows.Rd | 100 +++++++------- man/mlr_pipeops_fixfactors.Rd | 1 + man/mlr_pipeops_histbin.Rd | 1 + man/mlr_pipeops_ica.Rd | 1 + man/mlr_pipeops_imputeconstant.Rd | 1 + man/mlr_pipeops_imputehist.Rd | 1 + man/mlr_pipeops_imputelearner.Rd | 1 + man/mlr_pipeops_imputemean.Rd | 1 + man/mlr_pipeops_imputemedian.Rd | 1 + man/mlr_pipeops_imputemode.Rd | 1 + man/mlr_pipeops_imputeoor.Rd | 1 + man/mlr_pipeops_imputesample.Rd | 1 + man/mlr_pipeops_kernelpca.Rd | 1 + man/mlr_pipeops_learner.Rd | 1 + man/mlr_pipeops_missind.Rd | 1 + man/mlr_pipeops_modelmatrix.Rd | 1 + man/mlr_pipeops_multiplicityexply.Rd | 1 + man/mlr_pipeops_multiplicityimply.Rd | 1 + man/mlr_pipeops_mutate.Rd | 1 + man/mlr_pipeops_nmf.Rd | 1 + man/mlr_pipeops_nop.Rd | 1 + man/mlr_pipeops_ovrsplit.Rd | 1 + man/mlr_pipeops_ovrunite.Rd | 1 + man/mlr_pipeops_pca.Rd | 1 + man/mlr_pipeops_proxy.Rd | 1 + man/mlr_pipeops_quantilebin.Rd | 1 + man/mlr_pipeops_randomprojection.Rd | 1 + man/mlr_pipeops_randomresponse.Rd | 1 + man/mlr_pipeops_regravg.Rd | 1 + man/mlr_pipeops_removeconstants.Rd | 1 + man/mlr_pipeops_renamecolumns.Rd | 1 + man/mlr_pipeops_replicate.Rd | 1 + man/mlr_pipeops_scale.Rd | 1 + man/mlr_pipeops_scalemaxabs.Rd | 1 + man/mlr_pipeops_scalerange.Rd | 1 + man/mlr_pipeops_select.Rd | 1 + man/mlr_pipeops_smote.Rd | 1 + man/mlr_pipeops_spatialsign.Rd | 1 + man/mlr_pipeops_subsample.Rd | 1 + man/mlr_pipeops_targetinvert.Rd | 1 + man/mlr_pipeops_targetmutate.Rd | 1 + man/mlr_pipeops_targettrafoscalerange.Rd | 1 + man/mlr_pipeops_textvectorizer.Rd | 1 + man/mlr_pipeops_threshold.Rd | 1 + man/mlr_pipeops_tunethreshold.Rd | 1 + man/mlr_pipeops_unbranch.Rd | 1 + man/mlr_pipeops_updatetarget.Rd | 1 + 
man/mlr_pipeops_vtreat.Rd | 1 + man/mlr_pipeops_yeojohnson.Rd | 1 + tests/testthat/test_pipeop_filterrows.R | 4 - 77 files changed, 184 insertions(+), 164 deletions(-) diff --git a/DESCRIPTION b/DESCRIPTION index a9774f547..5531ce671 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -117,6 +117,7 @@ Collate: 'PipeOpEncodeLmer.R' 'PipeOpFeatureUnion.R' 'PipeOpFilter.R' + 'PipeOpFilterRows.R' 'PipeOpFixFactors.R' 'PipeOpHistBin.R' 'PipeOpICA.R' diff --git a/NAMESPACE b/NAMESPACE index f4c424ba8..318c490af 100644 --- a/NAMESPACE +++ b/NAMESPACE @@ -44,6 +44,7 @@ export(PipeOpEncodeLmer) export(PipeOpEnsemble) export(PipeOpFeatureUnion) export(PipeOpFilter) +export(PipeOpFilterRows) export(PipeOpFixFactors) export(PipeOpHistBin) export(PipeOpICA) diff --git a/NEWS.md b/NEWS.md index 2b03fdcb9..96219ae53 100644 --- a/NEWS.md +++ b/NEWS.md @@ -1,4 +1,6 @@ # mlr3pipelines 0.3.0-9000 +* New PipeOps: + - PipeOpFilterRows # mlr3pipelines 0.3.0 diff --git a/R/PipeOpFilterRows.R b/R/PipeOpFilterRows.R index bce9adc18..47816b65c 100644 --- a/R/PipeOpFilterRows.R +++ b/R/PipeOpFilterRows.R @@ -2,80 +2,66 @@ #' #' @usage NULL #' @name mlr_pipeops_filterrows -#' @format [`R6Class`] object inheriting from [`PipeOpTaskPreproc`]. +#' @format [`R6Class`] object inheriting from [`PipeOpTaskPreprocSimple`]/[`PipeOpTaskPreproc`]/[`PipeOp`]. #' #' @description -#' Filter rows of the data of a task. Also directly allows for the removal of rows holding missing -#' values. If both filtering and missing value removal is performed, filtering is done after missing -#' value removal. +#' Filter rows of the data of a [`Task`][mlr3::Task]. +#' Also directly allows for the removal of rows with missing values with respect to some user-defined features. +#' If both row filtering and missing value removal are performed, filtering is done after missing value removal.
#' #' @section Construction: #' ``` #' PipeOpFilterRows$new(id = "filterrows", param_vals = list()) #' ``` #' -#' * `id` :: `character(1)`\cr +#' * `id` :: `character(1)` \cr #' Identifier of resulting object, default `"filterrows"`. -#' * `param_vals` :: named `list`\cr +#' * `param_vals` :: named `list` \cr #' List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise #' be set during construction. Default `list()`. #' #' @section Input and Output Channels: #' Input and output channels are inherited from [`PipeOpTaskPreproc`]. #' -#' The output during training is the input [`Task`][mlr3::Task] with rows kept according to the -#' filtering (see Parameters) and (possible) rows with missing values removed. -#' -#' The output during prediction is the unchanged input [`Task`][mlr3::Task] if the parameter -#' `skip_during_predict` is `TRUE`. Otherwise it is analogously handled as the output during -#' training. +#' The output is the input [`Task`][mlr3::Task] with rows kept according to the filtering expression and +#' rows with missing values with respect to the user-defined features removed. #' #' @section State: -#' The `$state` is a named `list` with the `$state` elements inherited from [`PipeOpTaskPreproc`], -#' as well as the following elements: -#' * `na_ids` :: `integer`\cr -#' The row identifiers that had missing values during training and therefore were removed. See the -#' parameter `na_column`. -#' * `row_ids` :: `integer`\cr -#' The row identifiers that were kept during training according to the parameters `filter`, -#' `na_column` and `invert`. +#' The `$state` is a named `list` with the `$state` elements inherited from [`PipeOpTaskPreproc`], as well as: +#' * `na_selection` :: `character` \cr +#' A `character` vector of all feature names that are checked for missing values in the [`Task`][mlr3::Task]. +#' Initialized to [`selector_none()`]. 
#' #' @section Parameters: #' The parameters are the parameters inherited from [`PipeOpTaskPreproc`], as well as: -#' * `filter` :: `NULL` | `character(1)` | `expression` | `integer`\cr -#' How the rows of the data of the input [`Task`][mlr3::Task] should be filtered. This can be a -#' character vector of length 1 indicating a feature column of logicals in the data of the input -#' [`Task`][mlr3::Task] which forms the basis of the filtering, i.e., all rows that are `TRUE` -#' with respect to this column are kept in the data of the output [`Task`][mlr3::Task]. Moreover, -#' this can be an expression that will result in a logical vector of length `$nrow` of the data of -#' the input [`Task`][mlr3::Task] when evaluated withing the environment of the `$data()` of the -#' input [`Task`][mlr3::Task]. Finally, this can also be an integerish vector that directly -#' specifies the row identifiers of the rows of the data of the input [`Task`][mlr3::Task] that -#' should be kept. Default is `NULL`, i.e., no filtering is done. -#' * `na_column` :: `character`\cr -#' A character vector that specifies the columns of the data of the input [`Task`][mlr3::Task] -#' that should be checked for missing values. If set to `_all_`, all columns of the data are used. A -#' row is removed if at least one missing value is found with respect to the columns specified. -#' Default is `character(0)`, i.e., no removal of missing values is done. -#' * `invert` :: `logical(1)`\cr -#' Should the filtering rule be set-theoretically inverted? Note that this happens after -#' (possible) missing values were removed if `na_column` is specified. Default is `FALSE`. -#' * `skip_during_predict` :: `logical(1)`\cr -#' Should the filtering and missing value removal steps be skipped during prediction? Default is -#' `TRUE`, i.e., the input [`Task`][mlr3::Task] is returned unaltered during prediction. 
+#' * `filter_formula` :: `NULL` | `formula` \cr +#' Expression of the filtering to be performed, in the form of a `formula` that evaluates to `TRUE` or `FALSE` +#' for each row within the data of the [`Task`][mlr3::Task]. +#' Rows for which the evaluation is `TRUE` are kept, others are removed. +#' Initialized to `NULL`, i.e., no filtering is performed and all rows are kept. +#' * `na_selector` :: `function` | [`Selector`] \cr +#' [`Selector`] function, takes a [`Task`][mlr3::Task] as an argument and returns a `character` vector of features +#' to check for missing values. +#' Rows with missing values with respect to these features are removed. +#' See [`Selector`] for example functions. +#' Initialized to `selector_none()`, i.e., no missing value removal is performed. #' #' @section Internals: +#' A `formula` created using the `~` operator always contains a reference to the `environment` in which +#' the `formula` is created. This makes it possible to use variables in the `~`-expressions that +#' reference either column names or variable names. +#' #' Uses the [`is.na()`][base::is.na] function for the checking of missing values. #' #' @section Methods: -#' Only methods inherited from [`PipeOpTaskPreproc`]/[`PipeOp`]. +#' Only methods inherited from [`PipeOpTaskPreprocSimple`]/[`PipeOpTaskPreproc`]/[`PipeOp`].
#' #' @examples #' library("mlr3") #' task = tsk("pima") #' po = PipeOpFilterRows$new(param_vals = list( -#' filter = expression(age < median(age) & mass > 30), -#' na_column = "_all_") +#' filter_formula = ~ age < 31 & glucose > median(glucose), +#' na_selector = selector_all()) #' ) #' po$train(list(task)) #' po$state @@ -83,90 +69,53 @@ #' @include PipeOpTaskPreproc.R #' @export PipeOpFilterRows = R6Class("PipeOpFilterRows", - inherit = PipeOpTaskPreproc, + inherit = PipeOpTaskPreprocSimple, public = list( initialize = function(id = "filterrows", param_vals = list()) { ps = ParamSet$new(params = list( - ParamUty$new("filter", default = NULL, tags = c("train", "predict"), custom_check = function(x) { - ok = test_character(x, any.missing = FALSE, len = 1L) || - is.expression(x) || - test_integerish(x, lower = 1, min.len = 1L) || - is.null(x) - if (!ok) return("Must either be a character vector of length 1, an expression, or an integerish object of row ids") - TRUE - }), - ParamUty$new("na_column", default = character(0L), tags = c("train", "predict"), custom_check = function(x) { - check_character(x, any.missing = FALSE, null.ok = TRUE) - }), - ParamLgl$new("invert", default = FALSE, tags = c("train", "predict")), - ParamLgl$new("skip_during_predict", default = TRUE, tags = "predict")) - ) - ps$values = list(filter = NULL, na_column = character(0L), invert = FALSE, skip_during_predict = TRUE) + ParamUty$new("filter_formula", tags = c("train", "predict"), custom_check = check_filter_formulae), + ParamUty$new("na_selector", tags = c("train", "required"), custom_check = check_function) + )) + ps$values = list(filter_formula = NULL, na_selector = selector_none()) super$initialize(id, param_set = ps, param_vals = param_vals) } ), private = list( - .na_and_filter = function(task, skip, set_state) { - if (skip) { - return(task) # early exit if skipped (if skip_during_predict) - } + .get_state = function(task) { + na_selection = self$param_set$values$na_selector(task) + 
assert_subset(na_selection, task$feature_names) + list(na_selection = na_selection) + }, + .transform = function(task) { row_ids = task$row_ids - # NA column(s) handling - na = self$param_set$values$na_column - if (length(na)) { - assert_subset(na, choices = c("_all_", colnames(task$data()))) - if (na == "_all_") na = colnames(task$data()) - na_ids = which(rowSums(is.na(task$data(cols = na))) > 0L) - row_ids = setdiff(row_ids, na_ids) - } else { - na_ids = integer(0L) - } - - # filtering - filter = self$param_set$values$filter - filter_ids = - if (is.null(filter)) { - row_ids - } else if (is.character(filter)) { - assert_subset(filter, choices = task$feature_names) - filter_column = task$data(cols = filter)[[1L]] - assert_logical(filter_column) - which(filter_column) - } else if(is.expression(filter)) { - filter_expression = eval(filter, envir = task$data()) - assert_logical(filter_expression, len = task$nrow) - which(filter_expression) + na_ids = if (length(self$state$na_selection)) { + row_ids[which(rowSums(is.na(task$data(cols = self$state$na_selection))) > 0L)] } else { - filter = as.integer(filter) - assert_subset(filter, choices = task$row_ids) - filter + integer(0L) } + row_ids = setdiff(row_ids, na_ids) - row_ids = if (self$param_set$values$invert) { - setdiff(row_ids, filter_ids) - } else { - intersect(row_ids, filter_ids) - } - - # only set the state if required (during training) - if (set_state) { - self$state$na_ids = na_ids - self$state$row_ids = row_ids + if (length(self$param_set$values$filter_formula)) { + row_ids = row_ids[which(eval(self$param_set$values$filter_formula[[2L]], envir = task$data(row_ids, cols = task$feature_names)))] } task$filter(row_ids) - }, - - .train_task = function(task) { - private$.na_and_filter(task, skip = FALSE, set_state = TRUE) - }, - - .predict_task = function(task) { - private$.na_and_filter(task, skip = self$param_set$values$skip_during_predict, set_state = FALSE) } ) ) +# check the `filter_formula` parameter of 
PipeOpFilterRows +# @param x [formula] whatever `filter_formula` is being set to +# checks that `filter_formula` is `formula` with only a rhs (or NULL) +check_filter_formulae = function(x) { + check_formula(x, null.ok = TRUE) %check&&% + if (!is.null(x) && length(x) != 2L) { + sprintf("formula %s must not have a left hand side.", deparse(x, nlines = 1L, width.cutoff = 500)) + } else { + TRUE + } +} + mlr_pipeops$add("filterrows", PipeOpFilterRows) diff --git a/man/PipeOp.Rd b/man/PipeOp.Rd index e3fa48e09..e29573e42 100644 --- a/man/PipeOp.Rd +++ b/man/PipeOp.Rd @@ -238,6 +238,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, diff --git a/man/PipeOpEnsemble.Rd b/man/PipeOpEnsemble.Rd index 4e22cbac2..8b4add6b8 100644 --- a/man/PipeOpEnsemble.Rd +++ b/man/PipeOpEnsemble.Rd @@ -115,6 +115,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, diff --git a/man/PipeOpImpute.Rd b/man/PipeOpImpute.Rd index 2a0269cc3..50949de0d 100644 --- a/man/PipeOpImpute.Rd +++ b/man/PipeOpImpute.Rd @@ -145,6 +145,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, diff --git a/man/PipeOpTargetTrafo.Rd b/man/PipeOpTargetTrafo.Rd index 933bb9e5f..808289ab7 100644 --- a/man/PipeOpTargetTrafo.Rd +++ b/man/PipeOpTargetTrafo.Rd @@ -156,6 +156,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, 
\code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, diff --git a/man/PipeOpTaskPreproc.Rd b/man/PipeOpTaskPreproc.Rd index 2a6b2cffc..0e4496466 100644 --- a/man/PipeOpTaskPreproc.Rd +++ b/man/PipeOpTaskPreproc.Rd @@ -204,6 +204,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, diff --git a/man/PipeOpTaskPreprocSimple.Rd b/man/PipeOpTaskPreprocSimple.Rd index ea26e7a64..5a7293df4 100644 --- a/man/PipeOpTaskPreprocSimple.Rd +++ b/man/PipeOpTaskPreprocSimple.Rd @@ -148,6 +148,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, diff --git a/man/mlr_pipeops.Rd b/man/mlr_pipeops.Rd index 156975a4d..e1b72dc5a 100644 --- a/man/mlr_pipeops.Rd +++ b/man/mlr_pipeops.Rd @@ -88,6 +88,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, diff --git a/man/mlr_pipeops_boxcox.Rd b/man/mlr_pipeops_boxcox.Rd index 85ee439c2..dcddf42c0 100644 --- a/man/mlr_pipeops_boxcox.Rd +++ b/man/mlr_pipeops_boxcox.Rd @@ -97,6 +97,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, 
\code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, diff --git a/man/mlr_pipeops_branch.Rd b/man/mlr_pipeops_branch.Rd index 9baa39ef7..b5754fc1b 100644 --- a/man/mlr_pipeops_branch.Rd +++ b/man/mlr_pipeops_branch.Rd @@ -117,6 +117,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, diff --git a/man/mlr_pipeops_chunk.Rd b/man/mlr_pipeops_chunk.Rd index c1a3de8c7..f6163c010 100644 --- a/man/mlr_pipeops_chunk.Rd +++ b/man/mlr_pipeops_chunk.Rd @@ -96,6 +96,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, diff --git a/man/mlr_pipeops_classbalancing.Rd b/man/mlr_pipeops_classbalancing.Rd index d408b5579..0e9169613 100644 --- a/man/mlr_pipeops_classbalancing.Rd +++ b/man/mlr_pipeops_classbalancing.Rd @@ -137,6 +137,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, diff --git a/man/mlr_pipeops_classifavg.Rd b/man/mlr_pipeops_classifavg.Rd index 1a9d00ab7..e86ee80e4 100644 --- a/man/mlr_pipeops_classifavg.Rd +++ b/man/mlr_pipeops_classifavg.Rd @@ -111,6 +111,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, diff --git a/man/mlr_pipeops_classweights.Rd 
b/man/mlr_pipeops_classweights.Rd index 38e168358..d9c3e381a 100644 --- a/man/mlr_pipeops_classweights.Rd +++ b/man/mlr_pipeops_classweights.Rd @@ -105,6 +105,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, diff --git a/man/mlr_pipeops_colapply.Rd b/man/mlr_pipeops_colapply.Rd index 73e27d840..fc9fcbfd0 100644 --- a/man/mlr_pipeops_colapply.Rd +++ b/man/mlr_pipeops_colapply.Rd @@ -126,6 +126,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, diff --git a/man/mlr_pipeops_collapsefactors.Rd b/man/mlr_pipeops_collapsefactors.Rd index eab323975..057455f23 100644 --- a/man/mlr_pipeops_collapsefactors.Rd +++ b/man/mlr_pipeops_collapsefactors.Rd @@ -93,6 +93,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, diff --git a/man/mlr_pipeops_colroles.Rd b/man/mlr_pipeops_colroles.Rd index 30eb5efd0..7734a69f1 100644 --- a/man/mlr_pipeops_colroles.Rd +++ b/man/mlr_pipeops_colroles.Rd @@ -85,6 +85,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, diff --git a/man/mlr_pipeops_copy.Rd b/man/mlr_pipeops_copy.Rd index 3b12f3798..eca49c128 100644 --- a/man/mlr_pipeops_copy.Rd +++ 
b/man/mlr_pipeops_copy.Rd @@ -115,6 +115,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, diff --git a/man/mlr_pipeops_datefeatures.Rd b/man/mlr_pipeops_datefeatures.Rd index 29756ecb0..c434e5a2c 100644 --- a/man/mlr_pipeops_datefeatures.Rd +++ b/man/mlr_pipeops_datefeatures.Rd @@ -132,6 +132,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, diff --git a/man/mlr_pipeops_encode.Rd b/man/mlr_pipeops_encode.Rd index ce51d7194..db73d21d3 100644 --- a/man/mlr_pipeops_encode.Rd +++ b/man/mlr_pipeops_encode.Rd @@ -118,6 +118,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodeimpact}}, \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, diff --git a/man/mlr_pipeops_encodeimpact.Rd b/man/mlr_pipeops_encodeimpact.Rd index 45907d0d2..7dcd90de8 100644 --- a/man/mlr_pipeops_encodeimpact.Rd +++ b/man/mlr_pipeops_encodeimpact.Rd @@ -110,6 +110,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, diff --git a/man/mlr_pipeops_encodelmer.Rd b/man/mlr_pipeops_encodelmer.Rd index 731b93a64..00fa339b2 100644 --- a/man/mlr_pipeops_encodelmer.Rd +++ b/man/mlr_pipeops_encodelmer.Rd @@ -121,6 +121,7 @@ Other PipeOps: 
\code{\link{mlr_pipeops_encodeimpact}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, diff --git a/man/mlr_pipeops_featureunion.Rd b/man/mlr_pipeops_featureunion.Rd index f75ab6865..5b6607abc 100644 --- a/man/mlr_pipeops_featureunion.Rd +++ b/man/mlr_pipeops_featureunion.Rd @@ -130,6 +130,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodeimpact}}, \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, diff --git a/man/mlr_pipeops_filter.Rd b/man/mlr_pipeops_filter.Rd index 7461dd5e6..cc3b7ea21 100644 --- a/man/mlr_pipeops_filter.Rd +++ b/man/mlr_pipeops_filter.Rd @@ -140,6 +140,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, \code{\link{mlr_pipeops_ica}}, diff --git a/man/mlr_pipeops_filterrows.Rd b/man/mlr_pipeops_filterrows.Rd index 62a14e584..9300abb96 100644 --- a/man/mlr_pipeops_filterrows.Rd +++ b/man/mlr_pipeops_filterrows.Rd @@ -5,20 +5,20 @@ \alias{PipeOpFilterRows} \title{PipeOpFilterRows} \format{ -\code{\link{R6Class}} object inheriting from \code{\link{PipeOpTaskPreproc}}. +\code{\link{R6Class}} object inheriting from \code{\link{PipeOpTaskPreprocSimple}}/\code{\link{PipeOpTaskPreproc}}/\code{\link{PipeOp}}. } \description{ -Filter rows of the data of a task. Also directly allows for the removal of rows holding missing -values. If both filtering and missing value removal is performed, filtering is done after missing -value removal. +Filter rows of the data of a \code{\link[mlr3:Task]{Task}}. 
+Also directly allows for the removal of rows with missing values with respect to some user-defined features. +If both row filtering and missing value removal is performed, filtering is done after missing value removal. } \section{Construction}{ \preformatted{PipeOpFilterRows$new(id = "filterrows", param_vals = list()) } \itemize{ -\item \code{id} :: \code{character(1)}\cr +\item \code{id} :: \code{character(1)} \cr Identifier of resulting object, default \code{"filterrows"}. -\item \code{param_vals} :: named \code{list}\cr +\item \code{param_vals} :: named \code{list} \cr List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default \code{list()}. } @@ -28,25 +28,17 @@ be set during construction. Default \code{list()}. Input and output channels are inherited from \code{\link{PipeOpTaskPreproc}}. -The output during training is the input \code{\link[mlr3:Task]{Task}} with rows kept according to the -filtering (see Parameters) and (possible) rows with missing values removed. - -The output during prediction is the unchanged input \code{\link[mlr3:Task]{Task}} if the parameter -\code{skip_during_predict} is \code{TRUE}. Otherwise it is analogously handled as the output during -training. +The output is the input \code{\link[mlr3:Task]{Task}} with rows kept according to the filtering expression and +rows with missing values with respect to the user-defined features removed. } \section{State}{ -The \verb{$state} is a named \code{list} with the \verb{$state} elements inherited from \code{\link{PipeOpTaskPreproc}}, -as well as the following elements: +The \verb{$state} is a named \code{list} with the \verb{$state} elements inherited from \code{\link{PipeOpTaskPreproc}}, as well as: \itemize{ -\item \code{na_ids} :: \code{integer}\cr -The row identifiers that had missing values during training and therefore were removed. See the -parameter \code{na_column}. 
-\item \code{row_ids} :: \code{integer}\cr -The row identifiers that were kept during training according to the parameters \code{filter}, -\code{na_column} and \code{invert}. +\item \code{na_selection} :: \code{character} \cr +A \code{character} vector of all feature names that are checked for missing values in the \code{\link[mlr3:Task]{Task}}. +Initialized to \code{\link[=selector_none]{selector_none()}}. } } @@ -54,46 +46,40 @@ The row identifiers that were kept during training according to the parameters \ The parameters are the parameters inherited from \code{\link{PipeOpTaskPreproc}}, as well as: \itemize{ -\item \code{filter} :: \code{NULL} | \code{character(1)} | \code{expression} | \code{integer}\cr -How the rows of the data of the input \code{\link[mlr3:Task]{Task}} should be filtered. This can be a -character vector of length 1 indicating a feature column of logicals in the data of the input -\code{\link[mlr3:Task]{Task}} which forms the basis of the filtering, i.e., all rows that are \code{TRUE} -with respect to this column are kept in the data of the output \code{\link[mlr3:Task]{Task}}. Moreover, -this can be an expression that will result in a logical vector of length \verb{$nrow} of the data of -the input \code{\link[mlr3:Task]{Task}} when evaluated withing the environment of the \verb{$data()} of the -input \code{\link[mlr3:Task]{Task}}. Finally, this can also be an integerish vector that directly -specifies the row identifiers of the rows of the data of the input \code{\link[mlr3:Task]{Task}} that -should be kept. Default is \code{NULL}, i.e., no filtering is done. -\item \code{na_column} :: \code{character}\cr -A character vector that specifies the columns of the data of the input \code{\link[mlr3:Task]{Task}} -that should be checked for missing values. If set to \verb{_all_}, all columns of the data are used. A -row is removed if at least one missing value is found with respect to the columns specified. 
-Default is \code{character(0)}, i.e., no removal of missing values is done. -\item \code{invert} :: \code{logical(1)}\cr -Should the filtering rule be set-theoretically inverted? Note that this happens after -(possible) missing values were removed if \code{na_column} is specified. Default is \code{FALSE}. -\item \code{skip_during_predict} :: \code{logical(1)}\cr -Should the filtering and missing value removal steps be skipped during prediction? Default is -\code{TRUE}, i.e., the input \code{\link[mlr3:Task]{Task}} is returned unaltered during prediction. +\item \code{filter_formula} :: \code{NULL} | \code{formula} \cr +Expression of the filtering to be performed, in the form of a \code{formula} that evaluates to \code{TRUE} or \code{FALSE} +for each row within the data of the \code{\link[mlr3:Task]{Task}}. +Rows for which the evaluation is \code{TRUE} are kept, others are removed. +Initialized to \code{NULL}, i.e., no filtering is performed and all rows are kept. +\item \code{na_selector} :: \code{function} | \code{\link{Selector}} \cr +\code{\link{Selector}} function, takes a \code{\link[mlr3:Task]{Task}} as an argument and returns a \code{character} vector of features +to check for missing values. +Rows with missing values with respect to these features are removed. +See \code{\link{Selector}} for example functions. +Initialized to \code{selector_none()}, i.e., no missing value removal is performed. } } \section{Internals}{ -Uses the \code{\link[base:is.na]{is.na()}} function for the checking of missing values. +A \code{formula} created using the \code{~} operator always contains a reference to the \code{environment} in which +the \code{formula} is created. This makes it possible to use variables in the \code{~}-expressions that both +reference either column names or variable names. + +Uses the \code{\link[base:NA]{is.na()}} function for the checking of missing values. 
} \section{Methods}{ -Only methods inherited from \code{\link{PipeOpTaskPreproc}}/\code{\link{PipeOp}}. +Only methods inherited from \code{\link{PipeOpTaskPreprocSimple}}/\code{\link{PipeOpTaskPreproc}}/\code{\link{PipeOp}}. } \examples{ library("mlr3") task = tsk("pima") po = PipeOpFilterRows$new(param_vals = list( - filter = expression(age < median(age) & mass > 30), - na_column = "_all_") + filter_formula = ~ age < 31 & glucose > median(glucose), + na_selector = selector_all()) ) po$train(list(task)) po$state @@ -102,8 +88,8 @@ po$state Other PipeOps: \code{\link{PipeOpEnsemble}}, \code{\link{PipeOpImpute}}, -\code{\link{PipeOpProxy}}, \code{\link{PipeOpTargetTrafo}}, +\code{\link{PipeOpTaskPreprocSimple}}, \code{\link{PipeOpTaskPreproc}}, \code{\link{PipeOp}}, \code{\link{mlr_pipeops_boxcox}}, @@ -114,6 +100,7 @@ Other PipeOps: \code{\link{mlr_pipeops_classweights}}, \code{\link{mlr_pipeops_colapply}}, \code{\link{mlr_pipeops_collapsefactors}}, +\code{\link{mlr_pipeops_colroles}}, \code{\link{mlr_pipeops_copy}}, \code{\link{mlr_pipeops_datefeatures}}, \code{\link{mlr_pipeops_encodeimpact}}, @@ -124,23 +111,34 @@ Other PipeOps: \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, \code{\link{mlr_pipeops_ica}}, +\code{\link{mlr_pipeops_imputeconstant}}, \code{\link{mlr_pipeops_imputehist}}, +\code{\link{mlr_pipeops_imputelearner}}, \code{\link{mlr_pipeops_imputemean}}, \code{\link{mlr_pipeops_imputemedian}}, \code{\link{mlr_pipeops_imputemode}}, -\code{\link{mlr_pipeops_imputenewlvl}}, +\code{\link{mlr_pipeops_imputeoor}}, \code{\link{mlr_pipeops_imputesample}}, \code{\link{mlr_pipeops_kernelpca}}, \code{\link{mlr_pipeops_learner}}, \code{\link{mlr_pipeops_missind}}, \code{\link{mlr_pipeops_modelmatrix}}, +\code{\link{mlr_pipeops_multiplicityexply}}, +\code{\link{mlr_pipeops_multiplicityimply}}, \code{\link{mlr_pipeops_mutate}}, +\code{\link{mlr_pipeops_nmf}}, \code{\link{mlr_pipeops_nop}}, +\code{\link{mlr_pipeops_ovrsplit}}, 
+\code{\link{mlr_pipeops_ovrunite}}, \code{\link{mlr_pipeops_pca}}, -\code{\link{mlr_pipeops_predictionunion}}, +\code{\link{mlr_pipeops_proxy}}, \code{\link{mlr_pipeops_quantilebin}}, +\code{\link{mlr_pipeops_randomprojection}}, +\code{\link{mlr_pipeops_randomresponse}}, \code{\link{mlr_pipeops_regravg}}, \code{\link{mlr_pipeops_removeconstants}}, +\code{\link{mlr_pipeops_renamecolumns}}, +\code{\link{mlr_pipeops_replicate}}, \code{\link{mlr_pipeops_scalemaxabs}}, \code{\link{mlr_pipeops_scalerange}}, \code{\link{mlr_pipeops_scale}}, @@ -148,13 +146,15 @@ Other PipeOps: \code{\link{mlr_pipeops_smote}}, \code{\link{mlr_pipeops_spatialsign}}, \code{\link{mlr_pipeops_subsample}}, -\code{\link{mlr_pipeops_targetinverter}}, +\code{\link{mlr_pipeops_targetinvert}}, +\code{\link{mlr_pipeops_targetmutate}}, \code{\link{mlr_pipeops_targettrafoscalerange}}, -\code{\link{mlr_pipeops_targettrafosimple}}, \code{\link{mlr_pipeops_textvectorizer}}, \code{\link{mlr_pipeops_threshold}}, +\code{\link{mlr_pipeops_tunethreshold}}, \code{\link{mlr_pipeops_unbranch}}, \code{\link{mlr_pipeops_updatetarget}}, +\code{\link{mlr_pipeops_vtreat}}, \code{\link{mlr_pipeops_yeojohnson}}, \code{\link{mlr_pipeops}} } diff --git a/man/mlr_pipeops_fixfactors.Rd b/man/mlr_pipeops_fixfactors.Rd index c628bd09f..1953ff35c 100644 --- a/man/mlr_pipeops_fixfactors.Rd +++ b/man/mlr_pipeops_fixfactors.Rd @@ -86,6 +86,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_histbin}}, \code{\link{mlr_pipeops_ica}}, diff --git a/man/mlr_pipeops_histbin.Rd b/man/mlr_pipeops_histbin.Rd index 2b50a748b..88e2b24e8 100644 --- a/man/mlr_pipeops_histbin.Rd +++ b/man/mlr_pipeops_histbin.Rd @@ -98,6 +98,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, 
+\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_ica}}, diff --git a/man/mlr_pipeops_ica.Rd b/man/mlr_pipeops_ica.Rd index 1a607cf2c..09d45abda 100644 --- a/man/mlr_pipeops_ica.Rd +++ b/man/mlr_pipeops_ica.Rd @@ -124,6 +124,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, diff --git a/man/mlr_pipeops_imputeconstant.Rd b/man/mlr_pipeops_imputeconstant.Rd index e6326f66c..abe1d0b05 100644 --- a/man/mlr_pipeops_imputeconstant.Rd +++ b/man/mlr_pipeops_imputeconstant.Rd @@ -100,6 +100,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, diff --git a/man/mlr_pipeops_imputehist.Rd b/man/mlr_pipeops_imputehist.Rd index 3475a874a..5ed4ccd68 100644 --- a/man/mlr_pipeops_imputehist.Rd +++ b/man/mlr_pipeops_imputehist.Rd @@ -85,6 +85,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, diff --git a/man/mlr_pipeops_imputelearner.Rd b/man/mlr_pipeops_imputelearner.Rd index e33964982..40fb93989 100644 --- a/man/mlr_pipeops_imputelearner.Rd +++ b/man/mlr_pipeops_imputelearner.Rd @@ -114,6 +114,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, 
\code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, diff --git a/man/mlr_pipeops_imputemean.Rd b/man/mlr_pipeops_imputemean.Rd index ac0e0a3cf..574a25da1 100644 --- a/man/mlr_pipeops_imputemean.Rd +++ b/man/mlr_pipeops_imputemean.Rd @@ -85,6 +85,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, diff --git a/man/mlr_pipeops_imputemedian.Rd b/man/mlr_pipeops_imputemedian.Rd index 1e1474ea0..7d6f39645 100644 --- a/man/mlr_pipeops_imputemedian.Rd +++ b/man/mlr_pipeops_imputemedian.Rd @@ -85,6 +85,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, diff --git a/man/mlr_pipeops_imputemode.Rd b/man/mlr_pipeops_imputemode.Rd index da306870c..7adbd0d84 100644 --- a/man/mlr_pipeops_imputemode.Rd +++ b/man/mlr_pipeops_imputemode.Rd @@ -92,6 +92,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, diff --git a/man/mlr_pipeops_imputeoor.Rd b/man/mlr_pipeops_imputeoor.Rd index e9e63b8ef..42551f570 100644 --- a/man/mlr_pipeops_imputeoor.Rd +++ b/man/mlr_pipeops_imputeoor.Rd @@ -114,6 +114,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, diff --git 
a/man/mlr_pipeops_imputesample.Rd b/man/mlr_pipeops_imputesample.Rd index 39b731411..5f8bdb177 100644 --- a/man/mlr_pipeops_imputesample.Rd +++ b/man/mlr_pipeops_imputesample.Rd @@ -87,6 +87,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, diff --git a/man/mlr_pipeops_kernelpca.Rd b/man/mlr_pipeops_kernelpca.Rd index 1af9ab1bf..6e436983c 100644 --- a/man/mlr_pipeops_kernelpca.Rd +++ b/man/mlr_pipeops_kernelpca.Rd @@ -99,6 +99,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, diff --git a/man/mlr_pipeops_learner.Rd b/man/mlr_pipeops_learner.Rd index f68d72a57..8e7d0a6b3 100644 --- a/man/mlr_pipeops_learner.Rd +++ b/man/mlr_pipeops_learner.Rd @@ -118,6 +118,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, diff --git a/man/mlr_pipeops_missind.Rd b/man/mlr_pipeops_missind.Rd index 94657cf2a..daea2ff90 100644 --- a/man/mlr_pipeops_missind.Rd +++ b/man/mlr_pipeops_missind.Rd @@ -114,6 +114,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, diff --git a/man/mlr_pipeops_modelmatrix.Rd b/man/mlr_pipeops_modelmatrix.Rd index ae497fe92..9a9b87282 100644 --- 
a/man/mlr_pipeops_modelmatrix.Rd +++ b/man/mlr_pipeops_modelmatrix.Rd @@ -91,6 +91,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, diff --git a/man/mlr_pipeops_multiplicityexply.Rd b/man/mlr_pipeops_multiplicityexply.Rd index e6215987e..a59a4e64d 100644 --- a/man/mlr_pipeops_multiplicityexply.Rd +++ b/man/mlr_pipeops_multiplicityexply.Rd @@ -97,6 +97,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, diff --git a/man/mlr_pipeops_multiplicityimply.Rd b/man/mlr_pipeops_multiplicityimply.Rd index ff89f4b78..8fb5d555b 100644 --- a/man/mlr_pipeops_multiplicityimply.Rd +++ b/man/mlr_pipeops_multiplicityimply.Rd @@ -103,6 +103,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, diff --git a/man/mlr_pipeops_mutate.Rd b/man/mlr_pipeops_mutate.Rd index e09813493..6d09ab1d8 100644 --- a/man/mlr_pipeops_mutate.Rd +++ b/man/mlr_pipeops_mutate.Rd @@ -108,6 +108,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, diff --git a/man/mlr_pipeops_nmf.Rd b/man/mlr_pipeops_nmf.Rd index 78a4a5140..1a59a4ab0 100644 --- a/man/mlr_pipeops_nmf.Rd +++ b/man/mlr_pipeops_nmf.Rd @@ -102,6 +102,7 @@ Other 
PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, diff --git a/man/mlr_pipeops_nop.Rd b/man/mlr_pipeops_nop.Rd index 1e87398d1..6eb987720 100644 --- a/man/mlr_pipeops_nop.Rd +++ b/man/mlr_pipeops_nop.Rd @@ -93,6 +93,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, diff --git a/man/mlr_pipeops_ovrsplit.Rd b/man/mlr_pipeops_ovrsplit.Rd index 2f4a88685..77c9ac962 100644 --- a/man/mlr_pipeops_ovrsplit.Rd +++ b/man/mlr_pipeops_ovrsplit.Rd @@ -108,6 +108,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, diff --git a/man/mlr_pipeops_ovrunite.Rd b/man/mlr_pipeops_ovrunite.Rd index e760d7456..c708b7ddf 100644 --- a/man/mlr_pipeops_ovrunite.Rd +++ b/man/mlr_pipeops_ovrunite.Rd @@ -103,6 +103,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, diff --git a/man/mlr_pipeops_pca.Rd b/man/mlr_pipeops_pca.Rd index e80a84c99..20aece5ae 100644 --- a/man/mlr_pipeops_pca.Rd +++ b/man/mlr_pipeops_pca.Rd @@ -102,6 +102,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, 
\code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, diff --git a/man/mlr_pipeops_proxy.Rd b/man/mlr_pipeops_proxy.Rd index a1ef41cbe..bc5c8b238 100644 --- a/man/mlr_pipeops_proxy.Rd +++ b/man/mlr_pipeops_proxy.Rd @@ -114,6 +114,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, diff --git a/man/mlr_pipeops_quantilebin.Rd b/man/mlr_pipeops_quantilebin.Rd index 445c260dc..6018b9238 100644 --- a/man/mlr_pipeops_quantilebin.Rd +++ b/man/mlr_pipeops_quantilebin.Rd @@ -90,6 +90,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, diff --git a/man/mlr_pipeops_randomprojection.Rd b/man/mlr_pipeops_randomprojection.Rd index 284c53c95..fc1e40ba1 100644 --- a/man/mlr_pipeops_randomprojection.Rd +++ b/man/mlr_pipeops_randomprojection.Rd @@ -102,6 +102,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, \code{\link{mlr_pipeops_histbin}}, diff --git a/man/mlr_pipeops_randomresponse.Rd b/man/mlr_pipeops_randomresponse.Rd index cf5d5cc81..20038e05e 100644 --- a/man/mlr_pipeops_randomresponse.Rd +++ b/man/mlr_pipeops_randomresponse.Rd @@ -117,6 +117,7 @@ Other PipeOps: \code{\link{mlr_pipeops_encodelmer}}, \code{\link{mlr_pipeops_encode}}, \code{\link{mlr_pipeops_featureunion}}, +\code{\link{mlr_pipeops_filterrows}}, \code{\link{mlr_pipeops_filter}}, \code{\link{mlr_pipeops_fixfactors}}, 
 \code{\link{mlr_pipeops_histbin}},
diff --git a/man/mlr_pipeops_regravg.Rd b/man/mlr_pipeops_regravg.Rd
index 5f18cfb88..951d0a044 100644
--- a/man/mlr_pipeops_regravg.Rd
+++ b/man/mlr_pipeops_regravg.Rd
@@ -103,6 +103,7 @@ Other PipeOps:
 \code{\link{mlr_pipeops_encodelmer}},
 \code{\link{mlr_pipeops_encode}},
 \code{\link{mlr_pipeops_featureunion}},
+\code{\link{mlr_pipeops_filterrows}},
 \code{\link{mlr_pipeops_filter}},
 \code{\link{mlr_pipeops_fixfactors}},
 \code{\link{mlr_pipeops_histbin}},
diff --git a/man/mlr_pipeops_removeconstants.Rd b/man/mlr_pipeops_removeconstants.Rd
index 9c345e985..1dd4cb169 100644
--- a/man/mlr_pipeops_removeconstants.Rd
+++ b/man/mlr_pipeops_removeconstants.Rd
@@ -95,6 +95,7 @@ Other PipeOps:
 \code{\link{mlr_pipeops_encodelmer}},
 \code{\link{mlr_pipeops_encode}},
 \code{\link{mlr_pipeops_featureunion}},
+\code{\link{mlr_pipeops_filterrows}},
 \code{\link{mlr_pipeops_filter}},
 \code{\link{mlr_pipeops_fixfactors}},
 \code{\link{mlr_pipeops_histbin}},
diff --git a/man/mlr_pipeops_renamecolumns.Rd b/man/mlr_pipeops_renamecolumns.Rd
index 6070ad815..ae08aaf6b 100644
--- a/man/mlr_pipeops_renamecolumns.Rd
+++ b/man/mlr_pipeops_renamecolumns.Rd
@@ -94,6 +94,7 @@ Other PipeOps:
 \code{\link{mlr_pipeops_encodelmer}},
 \code{\link{mlr_pipeops_encode}},
 \code{\link{mlr_pipeops_featureunion}},
+\code{\link{mlr_pipeops_filterrows}},
 \code{\link{mlr_pipeops_filter}},
 \code{\link{mlr_pipeops_fixfactors}},
 \code{\link{mlr_pipeops_histbin}},
diff --git a/man/mlr_pipeops_replicate.Rd b/man/mlr_pipeops_replicate.Rd
index d7936bdf4..aaad037d0 100644
--- a/man/mlr_pipeops_replicate.Rd
+++ b/man/mlr_pipeops_replicate.Rd
@@ -87,6 +87,7 @@ Other PipeOps:
 \code{\link{mlr_pipeops_encodelmer}},
 \code{\link{mlr_pipeops_encode}},
 \code{\link{mlr_pipeops_featureunion}},
+\code{\link{mlr_pipeops_filterrows}},
 \code{\link{mlr_pipeops_filter}},
 \code{\link{mlr_pipeops_fixfactors}},
 \code{\link{mlr_pipeops_histbin}},
diff --git a/man/mlr_pipeops_scale.Rd b/man/mlr_pipeops_scale.Rd
index cab7fad11..688d34a15 100644
--- a/man/mlr_pipeops_scale.Rd
+++ b/man/mlr_pipeops_scale.Rd
@@ -109,6 +109,7 @@ Other PipeOps:
 \code{\link{mlr_pipeops_encodelmer}},
 \code{\link{mlr_pipeops_encode}},
 \code{\link{mlr_pipeops_featureunion}},
+\code{\link{mlr_pipeops_filterrows}},
 \code{\link{mlr_pipeops_filter}},
 \code{\link{mlr_pipeops_fixfactors}},
 \code{\link{mlr_pipeops_histbin}},
diff --git a/man/mlr_pipeops_scalemaxabs.Rd b/man/mlr_pipeops_scalemaxabs.Rd
index 3fc05aee4..845e02eb2 100644
--- a/man/mlr_pipeops_scalemaxabs.Rd
+++ b/man/mlr_pipeops_scalemaxabs.Rd
@@ -84,6 +84,7 @@ Other PipeOps:
 \code{\link{mlr_pipeops_encodelmer}},
 \code{\link{mlr_pipeops_encode}},
 \code{\link{mlr_pipeops_featureunion}},
+\code{\link{mlr_pipeops_filterrows}},
 \code{\link{mlr_pipeops_filter}},
 \code{\link{mlr_pipeops_fixfactors}},
 \code{\link{mlr_pipeops_histbin}},
diff --git a/man/mlr_pipeops_scalerange.Rd b/man/mlr_pipeops_scalerange.Rd
index e9c09a2b1..e6964d8f2 100644
--- a/man/mlr_pipeops_scalerange.Rd
+++ b/man/mlr_pipeops_scalerange.Rd
@@ -89,6 +89,7 @@ Other PipeOps:
 \code{\link{mlr_pipeops_encodelmer}},
 \code{\link{mlr_pipeops_encode}},
 \code{\link{mlr_pipeops_featureunion}},
+\code{\link{mlr_pipeops_filterrows}},
 \code{\link{mlr_pipeops_filter}},
 \code{\link{mlr_pipeops_fixfactors}},
 \code{\link{mlr_pipeops_histbin}},
diff --git a/man/mlr_pipeops_select.Rd b/man/mlr_pipeops_select.Rd
index 261de549c..9f7578a5d 100644
--- a/man/mlr_pipeops_select.Rd
+++ b/man/mlr_pipeops_select.Rd
@@ -105,6 +105,7 @@ Other PipeOps:
 \code{\link{mlr_pipeops_encodelmer}},
 \code{\link{mlr_pipeops_encode}},
 \code{\link{mlr_pipeops_featureunion}},
+\code{\link{mlr_pipeops_filterrows}},
 \code{\link{mlr_pipeops_filter}},
 \code{\link{mlr_pipeops_fixfactors}},
 \code{\link{mlr_pipeops_histbin}},
diff --git a/man/mlr_pipeops_smote.Rd b/man/mlr_pipeops_smote.Rd
index dc3bb82ba..d7b7cec74 100644
--- a/man/mlr_pipeops_smote.Rd
+++ b/man/mlr_pipeops_smote.Rd
@@ -106,6 +106,7 @@ Other PipeOps:
 \code{\link{mlr_pipeops_encodelmer}},
 \code{\link{mlr_pipeops_encode}},
 \code{\link{mlr_pipeops_featureunion}},
+\code{\link{mlr_pipeops_filterrows}},
 \code{\link{mlr_pipeops_filter}},
 \code{\link{mlr_pipeops_fixfactors}},
 \code{\link{mlr_pipeops_histbin}},
diff --git a/man/mlr_pipeops_spatialsign.Rd b/man/mlr_pipeops_spatialsign.Rd
index 0412d0c33..adf5f6de4 100644
--- a/man/mlr_pipeops_spatialsign.Rd
+++ b/man/mlr_pipeops_spatialsign.Rd
@@ -84,6 +84,7 @@ Other PipeOps:
 \code{\link{mlr_pipeops_encodelmer}},
 \code{\link{mlr_pipeops_encode}},
 \code{\link{mlr_pipeops_featureunion}},
+\code{\link{mlr_pipeops_filterrows}},
 \code{\link{mlr_pipeops_filter}},
 \code{\link{mlr_pipeops_fixfactors}},
 \code{\link{mlr_pipeops_histbin}},
diff --git a/man/mlr_pipeops_subsample.Rd b/man/mlr_pipeops_subsample.Rd
index 18e824caa..e1527d05f 100644
--- a/man/mlr_pipeops_subsample.Rd
+++ b/man/mlr_pipeops_subsample.Rd
@@ -99,6 +99,7 @@ Other PipeOps:
 \code{\link{mlr_pipeops_encodelmer}},
 \code{\link{mlr_pipeops_encode}},
 \code{\link{mlr_pipeops_featureunion}},
+\code{\link{mlr_pipeops_filterrows}},
 \code{\link{mlr_pipeops_filter}},
 \code{\link{mlr_pipeops_fixfactors}},
 \code{\link{mlr_pipeops_histbin}},
diff --git a/man/mlr_pipeops_targetinvert.Rd b/man/mlr_pipeops_targetinvert.Rd
index 9f300abd0..b956ca2f0 100644
--- a/man/mlr_pipeops_targetinvert.Rd
+++ b/man/mlr_pipeops_targetinvert.Rd
@@ -84,6 +84,7 @@ Other PipeOps:
 \code{\link{mlr_pipeops_encodelmer}},
 \code{\link{mlr_pipeops_encode}},
 \code{\link{mlr_pipeops_featureunion}},
+\code{\link{mlr_pipeops_filterrows}},
 \code{\link{mlr_pipeops_filter}},
 \code{\link{mlr_pipeops_fixfactors}},
 \code{\link{mlr_pipeops_histbin}},
diff --git a/man/mlr_pipeops_targetmutate.Rd b/man/mlr_pipeops_targetmutate.Rd
index 75ca70a6c..73e56866e 100644
--- a/man/mlr_pipeops_targetmutate.Rd
+++ b/man/mlr_pipeops_targetmutate.Rd
@@ -130,6 +130,7 @@ Other PipeOps:
 \code{\link{mlr_pipeops_encodelmer}},
 \code{\link{mlr_pipeops_encode}},
 \code{\link{mlr_pipeops_featureunion}},
+\code{\link{mlr_pipeops_filterrows}},
 \code{\link{mlr_pipeops_filter}},
 \code{\link{mlr_pipeops_fixfactors}},
 \code{\link{mlr_pipeops_histbin}},
diff --git a/man/mlr_pipeops_targettrafoscalerange.Rd b/man/mlr_pipeops_targettrafoscalerange.Rd
index 283edcdaf..a985666d1 100644
--- a/man/mlr_pipeops_targettrafoscalerange.Rd
+++ b/man/mlr_pipeops_targettrafoscalerange.Rd
@@ -96,6 +96,7 @@ Other PipeOps:
 \code{\link{mlr_pipeops_encodelmer}},
 \code{\link{mlr_pipeops_encode}},
 \code{\link{mlr_pipeops_featureunion}},
+\code{\link{mlr_pipeops_filterrows}},
 \code{\link{mlr_pipeops_filter}},
 \code{\link{mlr_pipeops_fixfactors}},
 \code{\link{mlr_pipeops_histbin}},
diff --git a/man/mlr_pipeops_textvectorizer.Rd b/man/mlr_pipeops_textvectorizer.Rd
index 9d9f769a8..dd16d03f8 100644
--- a/man/mlr_pipeops_textvectorizer.Rd
+++ b/man/mlr_pipeops_textvectorizer.Rd
@@ -194,6 +194,7 @@ Other PipeOps:
 \code{\link{mlr_pipeops_encodelmer}},
 \code{\link{mlr_pipeops_encode}},
 \code{\link{mlr_pipeops_featureunion}},
+\code{\link{mlr_pipeops_filterrows}},
 \code{\link{mlr_pipeops_filter}},
 \code{\link{mlr_pipeops_fixfactors}},
 \code{\link{mlr_pipeops_histbin}},
diff --git a/man/mlr_pipeops_threshold.Rd b/man/mlr_pipeops_threshold.Rd
index fe07d21f5..945296081 100644
--- a/man/mlr_pipeops_threshold.Rd
+++ b/man/mlr_pipeops_threshold.Rd
@@ -89,6 +89,7 @@ Other PipeOps:
 \code{\link{mlr_pipeops_encodelmer}},
 \code{\link{mlr_pipeops_encode}},
 \code{\link{mlr_pipeops_featureunion}},
+\code{\link{mlr_pipeops_filterrows}},
 \code{\link{mlr_pipeops_filter}},
 \code{\link{mlr_pipeops_fixfactors}},
 \code{\link{mlr_pipeops_histbin}},
diff --git a/man/mlr_pipeops_tunethreshold.Rd b/man/mlr_pipeops_tunethreshold.Rd
index 7f104e6f8..a5d42fbbd 100644
--- a/man/mlr_pipeops_tunethreshold.Rd
+++ b/man/mlr_pipeops_tunethreshold.Rd
@@ -110,6 +110,7 @@ Other PipeOps:
 \code{\link{mlr_pipeops_encodelmer}},
 \code{\link{mlr_pipeops_encode}},
 \code{\link{mlr_pipeops_featureunion}},
+\code{\link{mlr_pipeops_filterrows}},
 \code{\link{mlr_pipeops_filter}},
 \code{\link{mlr_pipeops_fixfactors}},
 \code{\link{mlr_pipeops_histbin}},
diff --git a/man/mlr_pipeops_unbranch.Rd b/man/mlr_pipeops_unbranch.Rd
index 1289a9e04..8fe8fd707 100644
--- a/man/mlr_pipeops_unbranch.Rd
+++ b/man/mlr_pipeops_unbranch.Rd
@@ -96,6 +96,7 @@ Other PipeOps:
 \code{\link{mlr_pipeops_encodelmer}},
 \code{\link{mlr_pipeops_encode}},
 \code{\link{mlr_pipeops_featureunion}},
+\code{\link{mlr_pipeops_filterrows}},
 \code{\link{mlr_pipeops_filter}},
 \code{\link{mlr_pipeops_fixfactors}},
 \code{\link{mlr_pipeops_histbin}},
diff --git a/man/mlr_pipeops_updatetarget.Rd b/man/mlr_pipeops_updatetarget.Rd
index a6a727c05..185a0adfa 100644
--- a/man/mlr_pipeops_updatetarget.Rd
+++ b/man/mlr_pipeops_updatetarget.Rd
@@ -110,6 +110,7 @@ Other PipeOps:
 \code{\link{mlr_pipeops_encodelmer}},
 \code{\link{mlr_pipeops_encode}},
 \code{\link{mlr_pipeops_featureunion}},
+\code{\link{mlr_pipeops_filterrows}},
 \code{\link{mlr_pipeops_filter}},
 \code{\link{mlr_pipeops_fixfactors}},
 \code{\link{mlr_pipeops_histbin}},
diff --git a/man/mlr_pipeops_vtreat.Rd b/man/mlr_pipeops_vtreat.Rd
index 3a5011d26..c681c23b4 100644
--- a/man/mlr_pipeops_vtreat.Rd
+++ b/man/mlr_pipeops_vtreat.Rd
@@ -162,6 +162,7 @@ Other PipeOps:
 \code{\link{mlr_pipeops_encodelmer}},
 \code{\link{mlr_pipeops_encode}},
 \code{\link{mlr_pipeops_featureunion}},
+\code{\link{mlr_pipeops_filterrows}},
 \code{\link{mlr_pipeops_filter}},
 \code{\link{mlr_pipeops_fixfactors}},
 \code{\link{mlr_pipeops_histbin}},
diff --git a/man/mlr_pipeops_yeojohnson.Rd b/man/mlr_pipeops_yeojohnson.Rd
index 7f452ff05..bc5949fb2 100644
--- a/man/mlr_pipeops_yeojohnson.Rd
+++ b/man/mlr_pipeops_yeojohnson.Rd
@@ -99,6 +99,7 @@ Other PipeOps:
 \code{\link{mlr_pipeops_encodelmer}},
 \code{\link{mlr_pipeops_encode}},
 \code{\link{mlr_pipeops_featureunion}},
+\code{\link{mlr_pipeops_filterrows}},
 \code{\link{mlr_pipeops_filter}},
 \code{\link{mlr_pipeops_fixfactors}},
 \code{\link{mlr_pipeops_histbin}},
diff --git a/tests/testthat/test_pipeop_filterrows.R b/tests/testthat/test_pipeop_filterrows.R
index acc496035..2f0241b2b 100644
--- a/tests/testthat/test_pipeop_filterrows.R
+++ b/tests/testthat/test_pipeop_filterrows.R
@@ -7,10 +7,6 @@ test_that("PipeOpFilterRows - basic properties", {
   expect_equal(train_pipeop(op, inputs = list(task))[[1L]], task)
   expect_equal(predict_pipeop(op, inputs = list(task))[[1L]], task)
   expect_datapreproc_pipeop_class(PipeOpFilterRows, task = task)
-  expect_datapreproc_pipeop_class(PipeOpFilterRows,
-    constargs = list(param_vals = list(filter_formula = ~ age < median(age),
-      na_selector = selector_all())),
-    task = task)
 })
 
 test_that("PipeOpFilterRows - filtering", {

From 23c4bed741276be47275f98d1752331ea8b8bb80 Mon Sep 17 00:00:00 2001
From: sumny
Date: Sat, 10 Oct 2020 17:39:13 +0200
Subject: [PATCH 07/12] fix env of formula

---
 R/PipeOpFilterRows.R                    | 3 ++-
 tests/testthat/test_pipeop_filterrows.R | 5 ++++-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/R/PipeOpFilterRows.R b/R/PipeOpFilterRows.R
index 47816b65c..e99877326 100644
--- a/R/PipeOpFilterRows.R
+++ b/R/PipeOpFilterRows.R
@@ -98,7 +98,8 @@ PipeOpFilterRows = R6Class("PipeOpFilterRows",
       row_ids = setdiff(row_ids, na_ids)
 
       if (length(self$param_set$values$filter_formula)) {
-        row_ids = row_ids[which(eval(self$param_set$values$filter_formula[[2L]], envir = task$data(row_ids, cols = task$feature_names)))]
+        frm = self$param_set$values$filter_formula
+        row_ids = row_ids[which(eval(frm[[2L]], envir = task$data(row_ids, cols = task$feature_names), enclos = environment(frm)))]
       }
 
       task$filter(row_ids)
diff --git a/tests/testthat/test_pipeop_filterrows.R b/tests/testthat/test_pipeop_filterrows.R
index 2f0241b2b..30922d6e6 100644
--- a/tests/testthat/test_pipeop_filterrows.R
+++ b/tests/testthat/test_pipeop_filterrows.R
@@ -32,8 +32,11 @@ test_that("PipeOpFilterRows - filtering", {
     predict_out$data(cols = task_predict$feature_names))
 
   # Works with variables from an env
-  some_test_val = 7
+  env = new.env()
+  assign("some_test_val", 7, envir = env)
+  some_test_val = -100 # this should not be taken!
   filter_formula = ~ pregnant == some_test_val
+  environment(filter_formula) = env
   op$param_set$values$filter_formula = filter_formula
   expect_true(all(op$train(list(task))[[1L]]$data(cols = "pregnant")[[1L]] == 7L))
 })

From 229284c792e9f6e5adced910f53c3b6530ac8685 Mon Sep 17 00:00:00 2001
From: sumny
Date: Mon, 19 Oct 2020 18:11:38 +0200
Subject: [PATCH 08/12] finalize first version

---
 R/PipeOpFilterRows.R                    | 98 ++++++++++++++-----------
 man/mlr_pipeops_filterrows.Rd           | 48 ++++++------
 tests/testthat/test_pipeop_filterrows.R | 34 ++++++---
 3 files changed, 100 insertions(+), 80 deletions(-)

diff --git a/R/PipeOpFilterRows.R b/R/PipeOpFilterRows.R
index e99877326..4b254337e 100644
--- a/R/PipeOpFilterRows.R
+++ b/R/PipeOpFilterRows.R
@@ -2,12 +2,10 @@
 #'
 #' @usage NULL
 #' @name mlr_pipeops_filterrows
-#' @format [`R6Class`] object inheriting from [`PipeOpTaskPreprocSimple`]/[`PipeOpTaskPreproc`]/[`PipeOp`].
+#' @format [`R6Class`] object inheriting from [`PipeOpTaskPreproc`]/[`PipeOp`].
 #'
 #' @description
 #' Filter rows of the data of a [`Task`][mlr3::Task].
-#' Also directly allows for the removal of rows with missing values with respect to some user-defined features.
-#' If both row filtering and missing value removal is performed, filtering is done after missing value removal.
 #'
 #' @section Construction:
 #' ```
@@ -23,86 +21,82 @@
 #' @section Input and Output Channels:
 #' Input and output channels are inherited from [`PipeOpTaskPreproc`].
 #'
-#' The output is the input [`Task`][mlr3::Task] with rows kept according to the filtering expression and
-#' rows with missing values with respect to the user-defined features removed.
+#' The output is the input [`Task`][mlr3::Task] with rows kept according to the filtering expression.
+#' Whether filtering is performed during training and/or prediction can be specified via the `SDcols` parameter, see below.
 #'
 #' @section State:
-#' The `$state` is a named `list` with the `$state` elements inherited from [`PipeOpTaskPreproc`], as well as:
-#' * `na_selection` :: `character` \cr
-#' A `character` vector of all feature names that are checked for missing values in the [`Task`][mlr3::Task].
-#' Initialized to [`selector_none()`].
+#' The `$state` is left empty (`list()`).
 #'
 #' @section Parameters:
 #' The parameters are the parameters inherited from [`PipeOpTaskPreproc`], as well as:
 #' * `filter_formula` :: `NULL` | `formula` \cr
 #' Expression of the filtering to be performed, in the form of a `formula` that evaluates to `TRUE` or `FALSE`
-#' for each row within the data of the [`Task`][mlr3::Task].
-#' Rows for which the evaluation is `TRUE` are kept, others are removed.
+#' for each row within the [`data.table`] [`DataBackend`][mlr3::DataBackend] of the [`Task`][mlr3::Task].
+#' Rows for which the evaluation is `TRUE` are kept in the output [`Task`][mlr3::Task], others are removed.
 #' Initialized to `NULL`, i.e., no filtering is performed and all rows are kept.
-#' * `na_selector` :: `function` | [`Selector`] \cr
-#' [`Selector`] function, takes a [`Task`][mlr3::Task] as an argument and returns a `character` vector of features
-#' to check for missing values.
-#' Rows with missing values with respect to these features are removed.
-#' See [`Selector`] for example functions.
-#' Initialized to `selector_none()`, i.e., no missing value removal is performed.
+#' * `SDcols` :: `function` | [`Selector`] \cr
+#' [`Selector`] function, takes a [`Task`][mlr3::Task] as an argument and returns a `character` vector of features.
+#' This character vector is set as the `.SDcols` argument when the formula above is evaluated within the columns of the
+#' [`data.table`] [`DataBackend`][mlr3::DataBackend] of the [`Task`][mlr3::Task].
+#' Initialized to [`selector_all()`], i.e., all features can be used as the `.SD` variable.
+#' * `phase` :: `character(1)` \cr
+#' Character specifying the phase when filtering should be performed. Can either be `"always"`, `"train"`, or `"predict"`.
+#' Initialized to `"always"`, i.e., filtering is performed both during training and prediction.
 #'
 #' @section Internals:
 #' A `formula` created using the `~` operator always contains a reference to the `environment` in which
 #' the `formula` is created. This makes it possible to use variables in the `~`-expressions that both
 #' reference either column names or variable names.
 #'
-#' Uses the [`is.na()`][base::is.na] function for the checking of missing values.
-#'
 #' @section Methods:
-#' Only methods inherited from [`PipeOpTaskPreprocSimple`]/[`PipeOpTaskPreproc`]/[`PipeOp`].
+#' Only methods inherited from [`PipeOpTaskPreproc`]/[`PipeOp`].
 #'
 #' @examples
 #' library("mlr3")
 #' task = tsk("pima")
+#' # filter based on some formula
 #' po = PipeOpFilterRows$new(param_vals = list(
-#' filter_formula = ~ age < 31 & glucose > median(glucose),
-#' na_selector = selector_all())
+#' filter_formula = ~ age < 31 & glucose > median(glucose, na.rm = TRUE))
 #' )
 #' po$train(list(task))
-#' po$state
+#' # missing value removal for all features
+#' po$param_set$values$filter_formula = ~ !apply(is.na(.SD), MARGIN = 1L, FUN = any)
+#' po$train(list(task))
+#' # missing value removal only for some features
+#' po$param_set$values$SDcols = selector_name(c("mass", "pressure"))
+#' po$train(list(task))
 #' @family PipeOps
 #' @include PipeOpTaskPreproc.R
 #' @export
 PipeOpFilterRows = R6Class("PipeOpFilterRows",
-  inherit = PipeOpTaskPreprocSimple,
+  inherit = PipeOpTaskPreproc,
   public = list(
     initialize = function(id = "filterrows", param_vals = list()) {
       ps = ParamSet$new(params = list(
         ParamUty$new("filter_formula", tags = c("train", "predict"), custom_check = check_filter_formulae),
-        ParamUty$new("na_selector", tags = c("train", "required"), custom_check = check_function)
+        ParamUty$new("SDcols", tags = c("train", "predict"), custom_check = check_function),
+        ParamFct$new("phase", levels = c("always", "train", "predict"), tags = c("train", "predict"))
       ))
-      ps$values = list(filter_formula = NULL, na_selector = selector_none())
+      ps$values = list(filter_formula = NULL, SDcols = selector_all(), phase = "always")
       super$initialize(id, param_set = ps, param_vals = param_vals)
     }
   ),
   private = list(
-    .get_state = function(task) {
-      na_selection = self$param_set$values$na_selector(task)
-      assert_subset(na_selection, task$feature_names)
-      list(na_selection = na_selection)
-    },
-
-    .transform = function(task) {
-      row_ids = task$row_ids
-
-      na_ids = if (length(self$state$na_selection)) {
-        row_ids[which(rowSums(is.na(task$data(cols = self$state$na_selection))) > 0L)]
+    .train_task = function(task) {
+      self$state = list(NULL)
+      if (self$param_set$values$phase %in% c("always", "train") && length(self$param_set$values$filter_formula)) {
+        filter_task(task, frm = self$param_set$values$filter_formula, SDcols = self$param_set$values$SDcols(task))
       } else {
-        integer(0L)
+        task
       }
-      row_ids = setdiff(row_ids, na_ids)
+    },
 
-      if (length(self$param_set$values$filter_formula)) {
-        frm = self$param_set$values$filter_formula
-        row_ids = row_ids[which(eval(frm[[2L]], envir = task$data(row_ids, cols = task$feature_names), enclos = environment(frm)))]
+    .predict_task = function(task) {
+      if (self$param_set$values$phase %in% c("always", "predict") && length(self$param_set$values$filter_formula)) {
+        filter_task(task, frm = self$param_set$values$filter_formula, SDcols = self$param_set$values$SDcols(task))
+      } else {
+        task
       }
-
-      task$filter(row_ids)
     }
   )
 )
@@ -119,4 +113,20 @@ check_filter_formulae = function(x) {
   }
 }
 
+# helper function to filter a task based on a formula
+# the formula is evaluated within the data.table backend of a task where .SDcols is set to SDcols
+# (but only if required)
+# @param task [Task]
+# @param frm [formula]
+# @param SDcols [character]
+filter_task = function(task, frm, SDcols) {
+  taskdata = task$data()
+  row_ids = if (any(grepl(".SD", x = frm[[2L]]))) {
+    task$row_ids[which(task$data()[, eval(frm[[2L]], envir = NULL, enclos = environment(frm)), .SDcols = SDcols])]
+  } else {
+    task$row_ids[which(task$data()[, eval(frm[[2L]], enclos = environment(frm))])]
+  }
+  task$filter(row_ids)
+}
+
 mlr_pipeops$add("filterrows", PipeOpFilterRows)
diff --git a/man/mlr_pipeops_filterrows.Rd b/man/mlr_pipeops_filterrows.Rd
index 9300abb96..3fcea8037 100644
--- a/man/mlr_pipeops_filterrows.Rd
+++ b/man/mlr_pipeops_filterrows.Rd
@@ -5,12 +5,10 @@
 \alias{PipeOpFilterRows}
 \title{PipeOpFilterRows}
 \format{
-\code{\link{R6Class}} object inheriting from \code{\link{PipeOpTaskPreprocSimple}}/\code{\link{PipeOpTaskPreproc}}/\code{\link{PipeOp}}.
+\code{\link{R6Class}} object inheriting from \code{\link{PipeOpTaskPreproc}}/\code{\link{PipeOp}}.
 }
 \description{
 Filter rows of the data of a \code{\link[mlr3:Task]{Task}}.
-Also directly allows for the removal of rows with missing values with respect to some user-defined features.
-If both row filtering and missing value removal is performed, filtering is done after missing value removal.
 }
 \section{Construction}{
 \preformatted{PipeOpFilterRows$new(id = "filterrows", param_vals = list())
@@ -28,18 +26,13 @@ be set during construction. Default \code{list()}.
 
 Input and output channels are inherited from \code{\link{PipeOpTaskPreproc}}.
 
-The output is the input \code{\link[mlr3:Task]{Task}} with rows kept according to the filtering expression and
-rows with missing values with respect to the user-defined features removed.
+The output is the input \code{\link[mlr3:Task]{Task}} with rows kept according to the filtering expression.
+Whether filtering is performed during training and/or prediction can be specified via the \code{SDcols} parameter, see below.
 }
 
 \section{State}{
 
-The \verb{$state} is a named \code{list} with the \verb{$state} elements inherited from \code{\link{PipeOpTaskPreproc}}, as well as:
-\itemize{
-\item \code{na_selection} :: \code{character} \cr
-A \code{character} vector of all feature names that are checked for missing values in the \code{\link[mlr3:Task]{Task}}.
-Initialized to \code{\link[=selector_none]{selector_none()}}.
-}
+The \verb{$state} is left empty (\code{list()}).
 }
 
 \section{Parameters}{
@@ -48,15 +41,17 @@ The parameters are the parameters inherited from \code{\link{PipeOpTaskPreproc}}
 \itemize{
 \item \code{filter_formula} :: \code{NULL} | \code{formula} \cr
 Expression of the filtering to be performed, in the form of a \code{formula} that evaluates to \code{TRUE} or \code{FALSE}
-for each row within the data of the \code{\link[mlr3:Task]{Task}}.
-Rows for which the evaluation is \code{TRUE} are kept, others are removed.
+for each row within the \code{\link{data.table}} \code{\link[mlr3:DataBackend]{DataBackend}} of the \code{\link[mlr3:Task]{Task}}.
+Rows for which the evaluation is \code{TRUE} are kept in the output \code{\link[mlr3:Task]{Task}}, others are removed.
 Initialized to \code{NULL}, i.e., no filtering is performed and all rows are kept.
-\item \code{na_selector} :: \code{function} | \code{\link{Selector}} \cr
-\code{\link{Selector}} function, takes a \code{\link[mlr3:Task]{Task}} as an argument and returns a \code{character} vector of features
-to check for missing values.
-Rows with missing values with respect to these features are removed.
-See \code{\link{Selector}} for example functions.
-Initialized to \code{selector_none()}, i.e., no missing value removal is performed.
+\item \code{SDcols} :: \code{function} | \code{\link{Selector}} \cr
+\code{\link{Selector}} function, takes a \code{\link[mlr3:Task]{Task}} as an argument and returns a \code{character} vector of features.
+This character vector is set as the \code{.SDcols} argument when the formula above is evaluated within the columns of the
+\code{\link{data.table}} \code{\link[mlr3:DataBackend]{DataBackend}} of the \code{\link[mlr3:Task]{Task}}.
+Initialized to \code{\link[=selector_all]{selector_all()}}, i.e., all features can be used as the \code{.SD} variable.
+\item \code{phase} :: \code{character(1)} \cr
+Character specifying the phase when filtering should be performed. Can either be \code{"always"}, \code{"train"}, or \code{"predict"}.
+Initialized to \code{"always"}, i.e., filtering is performed both during training and prediction.
 }
 }
 
@@ -65,24 +60,27 @@ Initialized to \code{selector_none()}, i.e., no missing value removal is perform
 A \code{formula} created using the \code{~} operator always contains a reference to the \code{environment} in which
 the \code{formula} is created. This makes it possible to use variables in the \code{~}-expressions that both
 reference either column names or variable names.
-
-Uses the \code{\link[base:NA]{is.na()}} function for the checking of missing values.
 }
 
 \section{Methods}{
 
-Only methods inherited from \code{\link{PipeOpTaskPreprocSimple}}/\code{\link{PipeOpTaskPreproc}}/\code{\link{PipeOp}}.
+Only methods inherited from \code{\link{PipeOpTaskPreproc}}/\code{\link{PipeOp}}.
 }
 
 \examples{
 library("mlr3")
 task = tsk("pima")
+# filter based on some formula
 po = PipeOpFilterRows$new(param_vals = list(
-filter_formula = ~ age < 31 & glucose > median(glucose),
-na_selector = selector_all())
+filter_formula = ~ age < 31 & glucose > median(glucose, na.rm = TRUE))
 )
 po$train(list(task))
-po$state
+# missing value removal for all features
+po$param_set$values$filter_formula = ~ !apply(is.na(.SD), MARGIN = 1L, FUN = any)
+po$train(list(task))
+# missing value removal only for some features
+po$param_set$values$SDcols = selector_name(c("mass", "pressure"))
+po$train(list(task))
 }
 \seealso{
 Other PipeOps:
diff --git a/tests/testthat/test_pipeop_filterrows.R b/tests/testthat/test_pipeop_filterrows.R
index 30922d6e6..48f532d11 100644
--- a/tests/testthat/test_pipeop_filterrows.R
+++ b/tests/testthat/test_pipeop_filterrows.R
@@ -19,17 +19,18 @@ test_that("PipeOpFilterRows - filtering", {
   dt_predict = task_predict$data(cols = task_predict$feature_names)
 
   op = PipeOpFilterRows$new(param_vals = list(
-    filter_formula = ~ (age < 31 & glucose > median(glucose)) | pedigree < mean(pedigree)))
+    filter_formula = ~ (age < 31 & glucose > median(glucose, na.rm = TRUE)) |
+      pedigree < mean(pedigree, na.rm = TRUE)))
 
   train_out = op$train(list(task_train))[[1L]]
 
-  expect_equal(dt_train[(age < 31 & glucose > median(glucose)) | pedigree < mean(pedigree), ],
-    train_out$data(cols = task_train$feature_names))
+  expect_equal(dt_train[(age < 31 & glucose > median(glucose, na.rm = TRUE)) |
+    pedigree < mean(pedigree, na.rm = TRUE), ], train_out$data(cols = task_train$feature_names))
 
   predict_out = op$predict(list(task_predict))[[1L]]
 
-  expect_equal(dt_predict[(age < 31 & glucose > median(glucose)) | pedigree < mean(pedigree), ],
-    predict_out$data(cols = task_predict$feature_names))
+  expect_equal(dt_predict[(age < 31 & glucose > median(glucose, na.rm = TRUE)) |
+    pedigree < mean(pedigree, na.rm = TRUE), ], predict_out$data(cols = task_predict$feature_names))
 
   # Works with variables from an env
   env = new.env()
@@ -50,7 +51,7 @@ test_that("PipeOpFilterRows - missing values removal", {
   dt_train = task_train$data(cols = task_train$feature_names)
   dt_predict = task_predict$data(cols = task_predict$feature_names)
 
-  op = PipeOpFilterRows$new(param_vals = list(na_selector = selector_name("insulin")))
+  op = PipeOpFilterRows$new(param_vals = list(filter_formula = ~ !is.na(insulin)))
 
   train_out = op$train(list(task_train))[[1L]]
 
@@ -61,8 +62,13 @@ test_that("PipeOpFilterRows - missing values removal", {
 
   expect_equal(dt_predict[!is.na(insulin), ],
     predict_out$data(cols = task_predict$feature_names))
-})
 
+  op$param_set$values$phase = "train"
+  expect_equal(op$predict(list(task_predict))[[1L]], task_predict)
+
+  op$param_set$values$phase = "predict"
+  expect_equal(op$train(list(task_train))[[1L]], task_train)
+})
 
 test_that("PipeOpFilterRows - filtering and missing values removal", {
   set.seed(3)
@@ -73,18 +79,24 @@ test_that("PipeOpFilterRows - filtering and missing values removal", {
   dt_train = task_train$data(cols = task_train$feature_names)
   dt_predict = task_predict$data(cols = task_predict$feature_names)
 
-  op = PipeOpFilterRows$new(param_vals = list(filter_formula = ~ age > median(age),
-    na_selector = selector_all()))
+  op = PipeOpFilterRows$new(param_vals = list(
+    filter_formula = ~ age > median(age, na.rm = TRUE) &
+      !apply(is.na(.SD), MARGIN = 1L, FUN = any)))
 
   train_out = op$train(list(task_train))[[1L]]
 
-  expect_equal(na.omit(dt_train)[age > median(age)],
+  expect_equal(na.omit(dt_train[age > median(age, na.rm = TRUE)]),
     train_out$data(cols = task_train$feature_names))
 
   predict_out = op$predict(list(task_predict))[[1L]]
 
-  expect_equal(na.omit(dt_predict)[age > median(age)],
+  expect_equal(na.omit(dt_predict[age > median(age, na.rm = TRUE)]),
     predict_out$data(cols = task_predict$feature_names))
+
+  # Test with SDcols selector being explicitly set
+  op$param_set$values$filter_formula = ~ !apply(is.na(.SD), MARGIN = 1L, FUN = any)
+  op$param_set$values$SDcols = selector_name("insulin")
+  expect_equal(op$train(list(task))[[1L]]$data(), task$data()[!is.na(insulin), ])
 })
 
 test_that("PipeOpFilterRows - check_filter_formulae", {

From 5ecf756f732673d1ccab3bbfd59062d34ea77f35 Mon Sep 17 00:00:00 2001
From: sumny
Date: Tue, 20 Oct 2020 16:50:14 +0200
Subject: [PATCH 09/12] fix environment stuff for formula, add one more test and update docs

---
 R/PipeOpFilterRows.R                    | 17 ++++++++---------
 man/mlr_pipeops_filterrows.Rd           |  8 ++++----
 tests/testthat/test_pipeop_filterrows.R |  5 +++++
 3 files changed, 17 insertions(+), 13 deletions(-)

diff --git a/R/PipeOpFilterRows.R b/R/PipeOpFilterRows.R
index 4b254337e..ca73bc98b 100644
--- a/R/PipeOpFilterRows.R
+++ b/R/PipeOpFilterRows.R
@@ -22,21 +22,21 @@
 #' Input and output channels are inherited from [`PipeOpTaskPreproc`].
 #'
 #' The output is the input [`Task`][mlr3::Task] with rows kept according to the filtering expression.
-#' Whether filtering is performed during training and/or prediction can be specified via the `SDcols` parameter, see below.
+#' Whether filtering is performed during training and/or prediction can be specified via the `phase` parameter, see below.
 #'
 #' @section State:
-#' The `$state` is left empty (`list()`).
+#' The `$state` is a named `list` with the `$state` elements inherited from [`PipeOpTaskPreproc`].
 #'
 #' @section Parameters:
 #' The parameters are the parameters inherited from [`PipeOpTaskPreproc`], as well as:
 #' * `filter_formula` :: `NULL` | `formula` \cr
 #' Expression of the filtering to be performed, in the form of a `formula` that evaluates to `TRUE` or `FALSE`
-#' for each row within the [`data.table`] [`DataBackend`][mlr3::DataBackend] of the [`Task`][mlr3::Task].
+#' for each row within the frame of the [`data.table`] [`DataBackend`][mlr3::DataBackend] of the [`Task`][mlr3::Task].
 #' Rows for which the evaluation is `TRUE` are kept in the output [`Task`][mlr3::Task], others are removed.
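 #' Initialized to `NULL`, i.e., no filtering is performed and all rows are kept.
 #' * `SDcols` :: `function` | [`Selector`] \cr
 #' [`Selector`] function, takes a [`Task`][mlr3::Task] as an argument and returns a `character` vector of features.
-#' This character vector is set as the `.SDcols` argument when the formula above is evaluated within the columns of the
+#' This character vector is set as the `.SDcols` argument when the formula above is evaluated within the frame of the
 #' [`data.table`] [`DataBackend`][mlr3::DataBackend] of the [`Task`][mlr3::Task].
 #' Initialized to [`selector_all()`], i.e., all features can be used as the `.SD` variable.
 #' * `phase` :: `character(1)` \cr
@@ -83,7 +83,7 @@ PipeOpFilterRows = R6Class("PipeOpFilterRows",
   ),
   private = list(
     .train_task = function(task) {
-      self$state = list(NULL)
+      self$state = list()
       if (self$param_set$values$phase %in% c("always", "train") && length(self$param_set$values$filter_formula)) {
         filter_task(task, frm = self$param_set$values$filter_formula, SDcols = self$param_set$values$SDcols(task))
       } else {
@@ -114,17 +114,16 @@ check_filter_formulae = function(x) {
 }
 
 # helper function to filter a task based on a formula
-# the formula is evaluated within the data.table backend of a task where .SDcols is set to SDcols
+# the formula is evaluated within the frame of the data.table backend of a task where .SDcols is set to SDcols
 # (but only if required)
 # @param task [Task]
 # @param frm [formula]
 # @param SDcols [character]
 filter_task = function(task, frm, SDcols) {
-  taskdata = task$data()
   row_ids = if (any(grepl(".SD", x = frm[[2L]]))) {
-    task$row_ids[which(task$data()[, eval(frm[[2L]], envir = NULL, enclos = environment(frm)), .SDcols = SDcols])]
+    task$row_ids[which(task$data()[, (eval(frm[[2L]], envir = as.list(environment(frm)))), .SDcols = SDcols])]
   } else {
-    task$row_ids[which(task$data()[, eval(frm[[2L]], enclos = environment(frm))])]
+    task$row_ids[which(task$data()[, (eval(frm[[2L]], envir = as.list(environment(frm))))])]
   }
   task$filter(row_ids)
 }
diff --git a/man/mlr_pipeops_filterrows.Rd b/man/mlr_pipeops_filterrows.Rd
index 3fcea8037..aae999649 100644
--- a/man/mlr_pipeops_filterrows.Rd
+++ b/man/mlr_pipeops_filterrows.Rd
@@ -27,12 +27,12 @@ be set during construction. Default \code{list()}.
 Input and output channels are inherited from \code{\link{PipeOpTaskPreproc}}.
 
 The output is the input \code{\link[mlr3:Task]{Task}} with rows kept according to the filtering expression.
-Whether filtering is performed during training and/or prediction can be specified via the \code{SDcols} parameter, see below.
+Whether filtering is performed during training and/or prediction can be specified via the \code{phase} parameter, see below.
 }
 
 \section{State}{
 
-The \verb{$state} is left empty (\code{list()}).
+The \verb{$state} is a named \code{list} with the \verb{$state} elements inherited from \code{\link{PipeOpTaskPreproc}}.
 }
 
 \section{Parameters}{
@@ -41,12 +41,12 @@ The parameters are the parameters inherited from \code{\link{PipeOpTaskPreproc}}
 \itemize{
 \item \code{filter_formula} :: \code{NULL} | \code{formula} \cr
 Expression of the filtering to be performed, in the form of a \code{formula} that evaluates to \code{TRUE} or \code{FALSE}
-for each row within the \code{\link{data.table}} \code{\link[mlr3:DataBackend]{DataBackend}} of the \code{\link[mlr3:Task]{Task}}.
+for each row within the frame of the \code{\link{data.table}} \code{\link[mlr3:DataBackend]{DataBackend}} of the \code{\link[mlr3:Task]{Task}}.
 Rows for which the evaluation is \code{TRUE} are kept in the output \code{\link[mlr3:Task]{Task}}, others are removed.
 Initialized to \code{NULL}, i.e., no filtering is performed and all rows are kept.
 \item \code{SDcols} :: \code{function} | \code{\link{Selector}} \cr
 \code{\link{Selector}} function, takes a \code{\link[mlr3:Task]{Task}} as an argument and returns a \code{character} vector of features.
-This character vector is set as the \code{.SDcols} argument when the formula above is evaluated within the columns of the
+This character vector is set as the \code{.SDcols} argument when the formula above is evaluated within the frame of the
 \code{\link{data.table}} \code{\link[mlr3:DataBackend]{DataBackend}} of the \code{\link[mlr3:Task]{Task}}.
 Initialized to \code{\link[=selector_all]{selector_all()}}, i.e., all features can be used as the \code{.SD} variable.
 \item \code{phase} :: \code{character(1)} \cr
diff --git a/tests/testthat/test_pipeop_filterrows.R b/tests/testthat/test_pipeop_filterrows.R
index 48f532d11..8e9daed4c 100644
--- a/tests/testthat/test_pipeop_filterrows.R
+++ b/tests/testthat/test_pipeop_filterrows.R
@@ -40,6 +40,11 @@ test_that("PipeOpFilterRows - filtering", {
   environment(filter_formula) = env
   op$param_set$values$filter_formula = filter_formula
   expect_true(all(op$train(list(task))[[1L]]$data(cols = "pregnant")[[1L]] == 7L))
+
+  filter_formula = ~ pregnant == some_test_val & !apply(is.na(.SD), MARGIN = 1L, FUN = any)
+  environment(filter_formula) = env
+  op$param_set$values$filter_formula = filter_formula
+  expect_equal(op$train(list(task))[[1L]]$data(), na.omit(task$data())[pregnant == 7, ])
 })
 
 test_that("PipeOpFilterRows - missing values removal", {

From 2e3fb8b87e38466f4186b2a711763e05a6e118f3 Mon Sep 17 00:00:00 2001
From: sumny
Date: Thu, 11 Mar 2021 14:56:16 +0100
Subject: [PATCH 10/12] minor reiterate

---
 NEWS.md              | 3 ---
 R/PipeOpFilterRows.R | 8 ++++----
 2 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/NEWS.md b/NEWS.md
index 1180d3a77..f56e76b75 100644
--- a/NEWS.md
+++ b/NEWS.md
@@ -15,9 +15,6 @@
 
 * Changed PipeOps:
   - PipeOpColApply: now allows for an applicator function with multiple columns as a return value; also inherits from PipeOpTaskPreprocSimple now
- New PipeOps:
- - PipeOpFilterRows
-
 # mlr3pipelines 0.3.1
 
 * Changed PipeOps:
diff --git a/R/PipeOpFilterRows.R b/R/PipeOpFilterRows.R
index ca73bc98b..c3dc595ce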
-This character vector is set as the \code{.SDcols} argument when the formula above is evaluated within the columns of the
+This character vector is set as the \code{.SDcols} argument when the formula above is evaluated within the frame of the
 \code{\link{data.table}} \code{\link[mlr3:DataBackend]{DataBackend}} of the \code{\link[mlr3:Task]{Task}}.
 Initialized to \code{\link[=selector_all]{selector_all()}}, i.e., all features can be used as the \code{.SD} variable.
 \item \code{phase} :: \code{character(1)} \cr
diff --git a/tests/testthat/test_pipeop_filterrows.R b/tests/testthat/test_pipeop_filterrows.R
index 48f532d11..8e9daed4c 100644
--- a/tests/testthat/test_pipeop_filterrows.R
+++ b/tests/testthat/test_pipeop_filterrows.R
@@ -40,6 +40,11 @@ test_that("PipeOpFilterRows - filtering", {
   environment(filter_formula) = env
   op$param_set$values$filter_formula = filter_formula
   expect_true(all(op$train(list(task))[[1L]]$data(cols = "pregnant")[[1L]] == 7L))
+
+  filter_formula = ~ pregnant == some_test_val & !apply(is.na(.SD), MARGIN = 1L, FUN = any)
+  environment(filter_formula) = env
+  op$param_set$values$filter_formula = filter_formula
+  expect_equal(op$train(list(task))[[1L]]$data(), na.omit(task$data())[pregnant == 7, ])
 })

 test_that("PipeOpFilterRows - missing values removal", {
From 2e3fb8b87e38466f4186b2a711763e05a6e118f3 Mon Sep 17 00:00:00 2001
From: sumny
Date: Thu, 11 Mar 2021 14:56:16 +0100
Subject: [PATCH 10/12] minor reiterate

---
 NEWS.md              | 3 ---
 R/PipeOpFilterRows.R | 8 ++++----
 2 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/NEWS.md b/NEWS.md
index 1180d3a77..f56e76b75 100644
--- a/NEWS.md
+++ b/NEWS.md
@@ -15,9 +15,6 @@
 * Changed PipeOps:
   - PipeOpColApply: now allows for an applicator function with multiple columns as a return value; also inherits from PipeOpTaskPreprocSimple now

-  New PipeOps:
-  - PipeOpFilterRows
-
 # mlr3pipelines 0.3.1

 * Changed PipeOps:
diff --git a/R/PipeOpFilterRows.R b/R/PipeOpFilterRows.R
index ca73bc98b..c3dc595ce
100644
--- a/R/PipeOpFilterRows.R
+++ b/R/PipeOpFilterRows.R
@@ -29,7 +29,7 @@
 #'
 #' @section Parameters:
 #' The parameters are the parameters inherited from [`PipeOpTaskPreproc`], as well as:
-#' * `filter_formula` :: `NULL` | `formula` \cr
+#' * `filter_formula` :: `formula` | `NULL` \cr
 #'   Expression of the filtering to be performed, in the form of a `formula` that evaluates to `TRUE` or `FALSE`
 #'   for each row within the frame of the [`data.table`] [`DataBackend`][mlr3::DataBackend] of the [`Task`][mlr3::Task].
 #'   Rows for which the evaluation is `TRUE` are kept in the output [`Task`][mlr3::Task], others are removed.
@@ -73,9 +73,9 @@ PipeOpFilterRows = R6Class("PipeOpFilterRows",
   public = list(
     initialize = function(id = "filterrows", param_vals = list()) {
       ps = ParamSet$new(params = list(
-        ParamUty$new("filter_formula", tags = c("train", "predict"), custom_check = check_filter_formulae),
-        ParamUty$new("SDcols", tags = c("train", "predict"), custom_check = check_function),
-        ParamFct$new("phase", levels = c("always", "train", "predict"), tags = c("train", "predict"))
+        ParamUty$new("filter_formula", tags = c("train", "predict", "required"), custom_check = check_filter_formulae),
+        ParamUty$new("SDcols", tags = c("train", "predict", "required"), custom_check = check_function),
+        ParamFct$new("phase", levels = c("always", "train", "predict"), tags = c("train", "predict", "required"))
       ))
       ps$values = list(filter_formula = NULL, SDcols = selector_all(), phase = "always")
       super$initialize(id, param_set = ps, param_vals = param_vals)
From 511d2d1a9926fd36daf103be095c0ee310a5c959 Mon Sep 17 00:00:00 2001
From: sumny
Date: Thu, 11 Mar 2021 14:57:47 +0100
Subject: [PATCH 11/12] news and docs

---
 NEWS.md                       | 2 ++
 man/mlr_pipeops_filterrows.Rd | 2 +-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/NEWS.md b/NEWS.md
index f56e76b75..92776eec9 100644
--- a/NEWS.md
+++ b/NEWS.md
@@ -1,4 +1,6 @@
 # mlr3pipelines 0.3.4-9000
+* New PipeOps:
+  - PipeOpFilterRows

 # 
mlr3pipelines 0.3.4

diff --git a/man/mlr_pipeops_filterrows.Rd b/man/mlr_pipeops_filterrows.Rd
index aae999649..a6aee8c34 100644
--- a/man/mlr_pipeops_filterrows.Rd
+++ b/man/mlr_pipeops_filterrows.Rd
@@ -39,7 +39,7 @@ The \verb{$state} is a named \code{list} with the \verb{$state} elements inherit

 The parameters are the parameters inherited from \code{\link{PipeOpTaskPreproc}}, as well as:
 \itemize{
-\item \code{filter_formula} :: \code{NULL} | \code{formula} \cr
+\item \code{filter_formula} :: \code{formula} | \code{NULL} \cr
 Expression of the filtering to be performed, in the form of a \code{formula} that evaluates to \code{TRUE} or \code{FALSE}
 for each row within the frame of the \code{\link{data.table}} \code{\link[mlr3:DataBackend]{DataBackend}} of the \code{\link[mlr3:Task]{Task}}.
 Rows for which the evaluation is \code{TRUE} are kept in the output \code{\link[mlr3:Task]{Task}}, others are removed.
From 47d414b3077f39a5fccac63da05e47b0d4093b31 Mon Sep 17 00:00:00 2001
From: sumny
Date: Thu, 22 Apr 2021 19:21:41 +0200
Subject: [PATCH 12/12] rerun docs

---
 man/mlr_graphs_robustify.Rd | 1 -
 1 file changed, 1 deletion(-)

diff --git a/man/mlr_graphs_robustify.Rd b/man/mlr_graphs_robustify.Rd
index a6043ff89..f30fb6552 100644
--- a/man/mlr_graphs_robustify.Rd
+++ b/man/mlr_graphs_robustify.Rd
@@ -45,7 +45,6 @@ Performs the following steps:
 \item Imputes \code{factor} features using \code{\link{PipeOpImputeOOR}}
 \item Encodes \code{factors} using \code{one-hot-encoding}.
 Factors with a cardinality > max_cardinality are collapsed using \code{\link{PipeOpCollapseFactors}}
-\item If \code{scaling}, numeric features are scaled to mean 0 and standard deviation 1
 }

 The graph is built conservatively, i.e. the function always tries to assure everything works.