Skip to content

Commit

Permalink
updates to README (simplify, remove twitter link)
Browse files Browse the repository at this point in the history
  • Loading branch information
bblodfon committed May 7, 2024
1 parent bfb184c commit b3c3b4c
Show file tree
Hide file tree
Showing 2 changed files with 119 additions and 164 deletions.
127 changes: 56 additions & 71 deletions README.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -23,124 +23,109 @@ Probabilistic Supervised Learning for **[mlr3](https://github.com/mlr-org/mlr3/)
[![Mattermost](https://img.shields.io/badge/chat-mattermost-orange.svg)](https://lmmisld-lmu-stats-slds.srv.mwn.de/mlr_invite/)
<!-- badges: end -->

## What is mlr3proba ?
## What is mlr3proba?

**mlr3proba** is a machine learning toolkit for making probabilistic predictions within the **[mlr3](https://github.com/mlr-org/mlr3)** ecosystem. It currently supports the following tasks:
`mlr3proba` is a machine learning toolkit for making probabilistic predictions within the **[mlr3](https://github.com/mlr-org/mlr3)** ecosystem.
It currently supports the following tasks:

* **Probabilistic supervised regression** - Supervised regression with a predictive distribution as the return type.
* **Predictive survival analysis** - Survival analysis where individual predictive hazards can be queried. This is equivalent to probabilistic supervised regression with censored observations.
* **Unconditional distribution estimation**, where the distribution is returned. Sub-cases are density estimation and unconditional survival estimation.

Key features of **mlr3proba** are

* A unified fit/predict model interface to any probabilistic predictive model (frequentist, Bayesian, or other)
* Pipeline/model composition
* Task reduction strategies
* Domain-agnostic evaluation workflows using task specific algorithmic performance measures.

**mlr3proba** makes use of the **[distr6](https://github.com/alan-turing-institute/distr6)** probability distribution interface as its probabilistic predictive return type.
1. **Predictive survival analysis**: survival analysis where individual hazards and survival distributions can be queried.
2. **Unconditional distribution estimation**: main returned output is the distribution.
Sub-cases are density estimation and unconditional survival estimation.
3. **Probabilistic supervised regression**: Supervised regression with a predictive distribution as the return type.

The survival analysis part is considered in a mature state, the rest are in early stages of development.

## Feature Overview

The current **mlr3proba** release focuses on survival analysis, and contains:

* Task frameworks for survival analysis (`TaskSurv`)
* A comprehensive selection of predictive survival learners (mostly via [mlr3extralearners](https://github.com/mlr-org/mlr3extralearners/))
* A comprehensive selection of performance measures for predictive survival learners, with respect to prognostic index (continuous rank) prediction, and probabilistic (distribution) prediction
* PipeOps integrated with **[mlr3pipelines](https://github.com/mlr-org/mlr3pipelines)**, for basic pipeline building, and reduction/composition strategies using linear predictors and baseline hazards.

## Roadmap

The vision of **mlr3proba** is to provide comprehensive machine learning functionality to the mlr3 ecosystem for continuous probabilistic return types.

The lifecycle of the survival task and features are considered `maturing` and any major changes are unlikely.

The density and probabilistic supervised regression tasks are currently in the early stages of development. Task frameworks have been drawn up, but may not be stable; learners need to be interfaced, and contributions are very welcome (see [issues](https://github.com/mlr-org/mlr3proba/issues)).
Key features of `mlr3proba` focus on survival analysis and are:

- Task frameworks for survival analysis (`TaskSurv`)
- A comprehensive selection of predictive survival learners (mostly via
[mlr3extralearners](https://github.com/mlr-org/mlr3extralearners/))
- A unified `train`/`predict` model interface to any probabilistic predictive model (frequentist, Bayesian, Deep Learning, or other)
- Use of the **[distr6](https://github.com/alan-turing-institute/distr6)**
probability distribution interface as its probabilistic predictive return type
- A comprehensive selection of measures for evaluating the performance of survival learners, with respect to prognostic index (continuous rank) prediction, and probabilistic (distribution) prediction
- Basic ML pipeline building integrated with **[mlr3pipelines](https://github.com/mlr-org/mlr3pipelines)**
- Reduction/composition strategies using linear predictors and baseline hazards

## Installation

`mlr3proba` is not on CRAN and is unlikely to be reuploaded (see [here](https://twitter.com/RaphaelS101/status/1506321623250571265) for reasons). As such you must install with one of the following methods:
`mlr3proba` is not currently on CRAN.
Please follow one of the two following methods to install it:

### Install from r-universe:
### R-universe

```{r eval=FALSE}
options(repos=c(
mlrorg = 'https://mlr-org.r-universe.dev',
raphaels1 = 'https://raphaels1.r-universe.dev',
CRAN = 'https://cloud.r-project.org'
))
install.packages("mlr3proba")
```

or

```{r eval=FALSE}
```r
install.packages("mlr3proba", repos = "https://mlr-org.r-universe.dev")
```

### Or for easier installation going forward:
Or for easier installation going forward:

1. Run `usethis::edit_r_environ()` then in the file that opened add or edit `options` to look something like

```{r eval=FALSE}
1. Run `usethis::edit_r_environ()` then in the file that opened add or edit `options` to look something like:
```r
options(repos = c(
raphaels1 = "https://raphaels1.r-universe.dev",
mlrorg = "https://mlr-org.r-universe.dev",
CRAN = 'https://cloud.r-project.org'
raphaels1 = "https://raphaels1.r-universe.dev",
mlrorg = "https://mlr-org.r-universe.dev", # add this line
CRAN = "https://cloud.r-project.org"
))
```

2. Save and close the file, restart your R session
2. Save and close the file, restart your `R` session
3. Run `install.packages("mlr3proba")` as usual

### Install from GitHub:
### GitHub

```{r eval=FALSE}
Install the latest development version:
```r
remotes::install_github("mlr-org/mlr3proba")
```

## Learners

Core learners are implemented in [mlr3proba](https://mlr3proba.mlr-org.com/reference/index.html#survival-learners), recommended common learners are implemented in [mlr3learners](https://github.com/mlr-org/mlr3learners), and many more are implemented in [mlr3extralearners](https://github.com/mlr-org/mlr3extralearners). Use the [interactive search table](https://mlr-org.com/learners.html) to search for available survival learners and see the [learner status page](https://mlr3extralearners.mlr-org.com/articles/learner_status.html) for their live status.
- [Core learners](https://mlr3proba.mlr-org.com/reference/index.html#survival-learners) are implemented in `mlr3proba` and include the Kaplan-Meier Estimator, the Cox Proportional Hazards model and the Survival Tree learner.
- In [mlr3extralearners](https://github.com/mlr-org/mlr3extralearners) we have interfaced several advanced ML survival learners.
Use the [interactive search table](https://mlr-org.com/learners.html) to search for THE available survival learners and see the [learner status page](https://mlr3extralearners.mlr-org.com/articles/learner_status.html) for their live status.

## Measures

For density estimation only the log-loss is currently implemented, for survival analysis, see full list [here](https://mlr3proba.mlr-org.com/reference/index.html#survival-measures).
For density estimation and probabilistic regression only the **log-loss** is currently implemented.
For survival analysis, see full list [here](https://mlr3proba.mlr-org.com/reference/index.html#survival-measures).

Some commonly used measures are the following:

| ID | Measure | Package | Type |
| :--| :------ | :------ | :------ |
| [surv.dcalib](https://mlr3proba.mlr-org.com/reference/mlr_measures_surv.dcalib.html) | D-Calibration | [mlr3proba](https://CRAN.R-project.org/package=mlr3proba) | Calibration
| [surv.cindex](https://mlr3proba.mlr-org.com/reference/mlr_measures_surv.cindex.html) | Concordance Index | [mlr3proba](https://CRAN.R-project.org/package=mlr3proba) | Discrimination
| [surv.uno_auc](https://mlr3proba.mlr-org.com/reference/mlr_measures_surv.uno_auc.html) | Uno's AUC | [survAUC](https://CRAN.R-project.org/package=survAUC) | Discrimination
| [surv.graf](https://mlr3proba.mlr-org.com/reference/mlr_measures_surv.graf.html) | Integrated Brier Score | [mlr3proba](https://CRAN.R-project.org/package=mlr3proba) | Scoring Rule
| [surv.rcll](https://mlr3proba.mlr-org.com/reference/mlr_measures_surv.rcll.html) | Right-Censored Log loss | [mlr3proba](https://CRAN.R-project.org/package=mlr3proba) | Scoring Rule
| [surv.intlogloss](https://mlr3proba.mlr-org.com/reference/mlr_measures_surv.intlogloss.html) | Integrated Log Loss | [mlr3proba](https://CRAN.R-project.org/package=mlr3proba) | Scoring Rule
| ID | Measure | Package | Category | Prediction Type
| :--| :------ | :------ | :------ | :------- |
| [surv.dcalib](https://mlr3proba.mlr-org.com/reference/mlr_measures_surv.dcalib.html) | D-Calibration | [mlr3proba](https://CRAN.R-project.org/package=mlr3proba) | Calibration | distr
| [surv.cindex](https://mlr3proba.mlr-org.com/reference/mlr_measures_surv.cindex.html) | Concordance Index | [mlr3proba](https://CRAN.R-project.org/package=mlr3proba) | Discrimination | crank
| [surv.uno_auc](https://mlr3proba.mlr-org.com/reference/mlr_measures_surv.uno_auc.html) | Uno's AUC | [survAUC](https://CRAN.R-project.org/package=survAUC) | Discrimination | lp
| [surv.graf](https://mlr3proba.mlr-org.com/reference/mlr_measures_surv.graf.html) | Integrated Brier Score | [mlr3proba](https://CRAN.R-project.org/package=mlr3proba) | Scoring Rule | distr
| [surv.rcll](https://mlr3proba.mlr-org.com/reference/mlr_measures_surv.rcll.html) | Right-Censored Log loss | [mlr3proba](https://CRAN.R-project.org/package=mlr3proba) | Scoring Rule | distr
| [surv.intlogloss](https://mlr3proba.mlr-org.com/reference/mlr_measures_surv.intlogloss.html) | Integrated Log-Likelihood | [mlr3proba](https://CRAN.R-project.org/package=mlr3proba) | Scoring Rule | distr

## Bugs, Questions, Feedback

**mlr3proba** is a free and open source software project that
encourages participation and feedback. If you have any issues,
questions, suggestions or feedback, please do not hesitate to open an
**mlr3proba** is a free and open source software project that encourages participation and feedback.
If you have any issues, questions, suggestions or feedback, please do not hesitate to open an
"issue" about it on the [GitHub page](https://github.com/mlr-org/mlr3proba/issues)\!

In case of problems / bugs, it is often helpful if you provide a
"minimum working example" that showcases the behavior (but don't worry about this if the bug is obvious).

In case of problems / bugs, it is often helpful if you provide a "minimum working example" using [reprex](https://reprex.tidyverse.org/) that showcases the behavior.

## Similar Projects

Predecessors to this package are previous instances of survival modelling in **[mlr](https://github.com/mlr-org/mlr)**.
The **[skpro](https://github.com/alan-turing-institute/skpro)** package in the python/scikit-learn ecosystem follows a similar interface for probabilistic supervised learning and is an architectural predecessor.
Several packages exist which allow probabilistic predictive modelling with a Bayesian model specific general interface, such as **[rjags](https://CRAN.R-project.org/package=rjags)** and **[stan](https://github.com/stan-dev/rstan)**.
For implementation of a few survival models and measures, a central package is **[survival](https://github.com/therneau/survival)**.
There does not appear to be a package that provides an architectural framework for distribution/density estimation, see **[this list](https://vita.had.co.nz/papers/density-estimation.pdf)** for a review of density estimation packages in R.
There does not appear to be a package that provides an architectural framework for distribution/density estimation, see **[this list](https://vita.had.co.nz/papers/density-estimation.pdf)** for a review of density estimation packages in `R`.

## Acknowledgements

Several people contributed to the building of `mlr3proba`. Firstly, thanks to Michel Lang for writing `mlr3survival`. Several learners and measures implemented in `mlr3proba`, as well as the prediction, task, and measure surv objects, were written initially in `mlr3survival` before being absorbed into `mlr3proba`. Secondly thanks to Franz Kiraly for major contributions towards the design of the proba-specific parts of the package, including compositors and predict types. Also for mathematical contributions towards the scoring rules implemented in the package. Finally thanks to Bernd Bischl and the rest of the mlr core team for building `mlr3` and for many conversations about the design of `mlr3proba`.
Several people contributed to the building of `mlr3proba`.
Firstly, thanks to Michel Lang for writing `mlr3survival`.
Several learners and measures implemented in `mlr3proba`, as well as the prediction, task, and measure surv objects, were written initially in `mlr3survival` before being absorbed into `mlr3proba`.
Secondly thanks to Franz Kiraly for major contributions towards the design of the proba-specific parts of the package, including compositors and predict types.
Also for mathematical contributions towards the scoring rules implemented in the package.
Finally thanks to Bernd Bischl and the rest of the mlr core team for building `mlr3` and for many conversations about the design of `mlr3proba`.

## Citing mlr3proba

Expand Down
Loading

0 comments on commit b3c3b4c

Please sign in to comment.