diff --git a/CHANGELOG.md b/CHANGELOG.md index c118ecab..0ae7c569 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,6 +1,6 @@ # Changelog -## 0.2.3 - ???? +## 0.2.3 - 2021-01-27 ### Features diff --git a/documentation/installation.md b/documentation/installation.md index ace7a620..a5201221 100644 --- a/documentation/installation.md +++ b/documentation/installation.md @@ -15,6 +15,11 @@ interfaces. * [Linux / MacOS](#linux--macos) * [Windows](#windows) * [Running a minimal example](#running-a-minimal-example) + * [Compilation on and for Raspberry Pi](#compilation-on-and-for-raspberry-pi) + * [Install requirements](#install-requirements) + * [Compile Bazel](#compile-bazel) + * [Compile YDF](#compile-ydf) + * [Test YDF](#test-ydf) * [Using the C++ library](#using-the-c-library) * [Troubleshooting](#troubleshooting) diff --git a/documentation/learners.md b/documentation/learners.md index f2c111ee..d7195616 100644 --- a/documentation/learners.md +++ b/documentation/learners.md @@ -105,6 +105,23 @@ the gradient of the loss relative to the model output). - Rolling number of trees used to detect validation loss increase and trigger early stopping. +#### [focal_loss_alpha](../yggdrasil_decision_forests/learner/gradient_boosted_trees/gradient_boosted_trees.proto?q=symbol:focal_loss_alpha) + +- **Type:** Real **Default:** 0.5 **Possible values:** min:0 max:1 + +- EXPERIMENTAL. Weighting parameter for focal loss: positive samples are weighted + by alpha, negative samples by (1-alpha). The default value of 0.5 means no + active class-level weighting. Only used with focal loss, i.e. + `loss="BINARY_FOCAL_LOSS"`. + +#### [focal_loss_gamma](../yggdrasil_decision_forests/learner/gradient_boosted_trees/gradient_boosted_trees.proto?q=symbol:focal_loss_gamma) + +- **Type:** Real **Default:** 2 **Possible values:** min:0 + +- EXPERIMENTAL. Exponent of the misprediction term in focal loss; + corresponds to the gamma parameter in https://arxiv.org/pdf/1708.02002.pdf (a formula sketch is given after the list of `loss` values below). Only + used with focal loss, i.e. `loss="BINARY_FOCAL_LOSS"`. + #### [forest_extraction](../yggdrasil_decision_forests/learner/gradient_boosted_trees/gradient_boosted_trees.proto?q=symbol:forest_extraction) - **Type:** Categorical **Default:** MART **Possible values:** MART, DART @@ -133,6 +150,17 @@ the gradient of the loss relative to the model output). - How to grow the tree.<br>
- `LOCAL`: Each node is split independently of the other nodes. In other words, as long as a node satisfies the split constraints (e.g. maximum depth, minimum number of observations), the node will be split. This is the "classical" way to grow decision trees.<br>
- `BEST_FIRST_GLOBAL`: The node with the best loss reduction among all the nodes of the tree is selected for splitting. This method is also called "best first" or "leaf-wise growth". See "Best-first decision tree learning", Shi, and "Additive logistic regression: A statistical view of boosting", Friedman, for more details. +#### [honest](../yggdrasil_decision_forests/learner/decision_tree/decision_tree.proto?q=symbol:honest) + +- **Type:** Categorical **Default:** false **Possible values:** true, false + +- In honest trees, different training examples are used to infer the structure + and the leaf values. This regularization technique trades training examples for + less biased leaf estimates. It might increase or reduce the quality of the model. See + "Generalized Random Forests", Athey et al. In that paper, honest trees are + trained with the Random Forest algorithm with sampling without + replacement. + #### [in_split_min_examples_check](../yggdrasil_decision_forests/learner/decision_tree/decision_tree.proto?q=symbol:in_split_min_examples_check) - **Type:** Categorical **Default:** true **Possible values:** true, false @@ -185,7 +213,7 @@ the gradient of the loss relative to the model output). - **Type:** Categorical **Default:** DEFAULT **Possible values:** DEFAULT, BINOMIAL_LOG_LIKELIHOOD, SQUARED_ERROR, MULTINOMIAL_LOG_LIKELIHOOD, - LAMBDA_MART_NDCG5, XE_NDCG_MART + LAMBDA_MART_NDCG5, XE_NDCG_MART, BINARY_FOCAL_LOSS - The loss optimized by the model. If not specified (DEFAULT), the loss is selected automatically according to the "task" and label statistics. For example, if task=CLASSIFICATION and the label has two possible values, the loss will be set to BINOMIAL_LOG_LIKELIHOOD. Possible values are:<br>
- `DEFAULT`: Select the loss automatically according to the task and label statistics.
- `BINOMIAL_LOG_LIKELIHOOD`: Binomial log likelihood. Only valid for binary classification.
- `SQUARED_ERROR`: Least squares loss. Only valid for regression.<br>
- `MULTINOMIAL_LOG_LIKELIHOOD`: Multinomial log likelihood, i.e. cross-entropy. Only valid for binary or multi-class classification.<br>
- `LAMBDA_MART_NDCG5`: LambdaMART with NDCG5.
- `XE_NDCG_MART`: Cross Entropy Loss NDCG. See arxiv.org/abs/1911.09798.
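For reference, here is a sketch of the binary focal loss that the new `BINARY_FOCAL_LOSS` value and the `focal_loss_alpha` / `focal_loss_gamma` hyper-parameters above refer to, assuming the standard formulation of Lin et al. (https://arxiv.org/pdf/1708.02002.pdf); the exact normalization used in the implementation may differ:

$$
\mathrm{FL}(p_t) = -\,\alpha_t \,(1 - p_t)^{\gamma}\,\log(p_t),
\qquad
p_t = \begin{cases} p & \text{if } y = 1 \\ 1 - p & \text{otherwise} \end{cases}
\qquad
\alpha_t = \begin{cases} \alpha & \text{if } y = 1 \\ 1 - \alpha & \text{otherwise} \end{cases}
$$

where $p$ is the predicted probability of the positive class, $\alpha$ is `focal_loss_alpha`, and $\gamma$ is `focal_loss_gamma`. With $\gamma = 0$ and $\alpha = 0.5$, the loss reduces, up to a constant factor of $1/2$, to the binomial log likelihood.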
@@ -501,6 +529,17 @@ It is probably the most well-known of the Decision Forest training algorithms. - How to grow the tree.
- `LOCAL`: Each node is split independently of the other nodes. In other words, as long as a node satisfies the split constraints (e.g. maximum depth, minimum number of observations), the node will be split. This is the "classical" way to grow decision trees.<br>
- `BEST_FIRST_GLOBAL`: The node with the best loss reduction among all the nodes of the tree is selected for splitting. This method is also called "best first" or "leaf-wise growth". See "Best-first decision tree learning", Shi, and "Additive logistic regression: A statistical view of boosting", Friedman, for more details. +#### [honest](../yggdrasil_decision_forests/learner/decision_tree/decision_tree.proto?q=symbol:honest) + +- **Type:** Categorical **Default:** false **Possible values:** true, false + +- In honest trees, different training examples are used to infer the structure + and the leaf values. This regularization technique trades training examples for + less biased leaf estimates. It might increase or reduce the quality of the model. See + "Generalized Random Forests", Athey et al. In that paper, honest trees are + trained with the Random Forest algorithm with sampling without + replacement. + #### [in_split_min_examples_check](../yggdrasil_decision_forests/learner/decision_tree/decision_tree.proto?q=symbol:in_split_min_examples_check) - **Type:** Categorical **Default:** true **Possible values:** true, false @@ -610,6 +649,16 @@ It is probably the most well-known of the Decision Forest training algorithms. - Random seed for the training of the model. Learners are expected to be deterministic given the random seed. +#### [sampling_with_replacement](../yggdrasil_decision_forests/learner/random_forest/random_forest.proto?q=symbol:sampling_with_replacement) + +- **Type:** Categorical **Default:** true **Possible values:** true, false + +- If true, the training examples are sampled with replacement. If false, the + training examples are sampled without replacement. Only used when + "bootstrap_training_dataset=true". If false (sampling without replacement) + and if "bootstrap_size_ratio=1" (default), all the examples are used to + train all the trees (you probably do not want that). See the sketch at the + end of this section. + #### [sorting_strategy](../yggdrasil_decision_forests/learner/decision_tree/decision_tree.proto?q=symbol:sorting_strategy) - **Type:** Categorical **Default:** PRESORT **Possible values:** IN_NODE, @@ -741,6 +790,17 @@ used to grow the tree while the second is used to prune the tree. - How to grow the tree.<br>
- `LOCAL`: Each node is split independently of the other nodes. In other words, as long as a node satisfies the split constraints (e.g. maximum depth, minimum number of observations), the node will be split. This is the "classical" way to grow decision trees.<br>
- `BEST_FIRST_GLOBAL`: The node with the best loss reduction among all the nodes of the tree is selected for splitting. This method is also called "best first" or "leaf-wise growth". See "Best-first decision tree learning", Shi, and "Additive logistic regression: A statistical view of boosting", Friedman, for more details. A pseudocode sketch of both growing strategies is given at the end of this section. +#### [honest](../yggdrasil_decision_forests/learner/decision_tree/decision_tree.proto?q=symbol:honest) + +- **Type:** Categorical **Default:** false **Possible values:** true, false + +- In honest trees, different training examples are used to infer the structure + and the leaf values. This regularization technique trades training examples for + less biased leaf estimates. It might increase or reduce the quality of the model. See + "Generalized Random Forests", Athey et al. In that paper, honest trees are + trained with the Random Forest algorithm with sampling without + replacement. A small sketch combining honest estimation with sampling + without replacement is given at the end of this section. + #### [in_split_min_examples_check](../yggdrasil_decision_forests/learner/decision_tree/decision_tree.proto?q=symbol:in_split_min_examples_check) - **Type:** Categorical **Default:** true **Possible values:** true, false
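To make the `honest` and `sampling_with_replacement` descriptions above more concrete, here is a minimal, illustrative sketch of honest estimation on a single regression stump. This is not the YDF implementation: the function names and the stump itself are hypothetical, and only the idea mirrors the documentation, i.e. sampling without replacement, then using disjoint halves of the sample to choose the structure and to estimate the leaf values.

```python
import numpy as np

def sample_indices(n, ratio=1.0, with_replacement=True, rng=None):
    """Pick the examples used to grow one tree.

    with_replacement=True  -> classical bootstrap (duplicates are possible).
    with_replacement=False -> every example appears at most once; combined with
    ratio=1.0 this selects *all* examples, which is usually not what you want.
    """
    rng = rng or np.random.default_rng(0)
    return rng.choice(n, size=int(ratio * n), replace=with_replacement)

def honest_stump(x, y, rng=None):
    """One-split regression 'tree' grown honestly: half of the sample chooses
    the split threshold, the other half estimates the two leaf values."""
    rng = rng or np.random.default_rng(0)
    idx = sample_indices(len(y), with_replacement=False, rng=rng)
    rng.shuffle(idx)
    split_idx, leaf_idx = idx[: len(idx) // 2], idx[len(idx) // 2:]

    # Structure: choose the threshold minimizing the squared error on split_idx.
    def sse(threshold):
        mask = x[split_idx] <= threshold
        parts = (y[split_idx][mask], y[split_idx][~mask])
        return sum(((p - p.mean()) ** 2).sum() for p in parts if p.size)
    threshold = min(np.unique(x[split_idx]), key=sse)

    # Leaf values: estimated only on the held-out leaf_idx examples.
    left = y[leaf_idx][x[leaf_idx] <= threshold]
    right = y[leaf_idx][x[leaf_idx] > threshold]
    fallback = y[leaf_idx].mean()
    return (threshold,
            left.mean() if left.size else fallback,
            right.mean() if right.size else fallback)
```

A real honest tree applies the same idea recursively at every node; the stump only keeps the sketch short.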
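Similarly, a rough sketch of the two `growing_strategy` modes described above. The `find_best_split`, `can_split`, `node.apply(split)`, and `split.loss_reduction` interfaces are illustrative assumptions, not YDF APIs:

```python
import heapq

def grow_local(node, find_best_split, can_split):
    """LOCAL: each node is split independently (depth-first), as long as it
    satisfies the split constraints (maximum depth, minimum examples, ...)."""
    if not can_split(node):
        return
    split = find_best_split(node)
    if split is None:
        return
    for child in node.apply(split):
        grow_local(child, find_best_split, can_split)

def grow_best_first(root, find_best_split, max_leaves):
    """BEST_FIRST_GLOBAL (leaf-wise): among all open leaves of the whole tree,
    always split the one with the largest loss reduction, until max_leaves."""
    heap = []

    def push(node):
        split = find_best_split(node)
        if split is not None:
            # heapq is a min-heap: negate the loss reduction; id() breaks ties.
            heapq.heappush(heap, (-split.loss_reduction, id(node), node, split))

    push(root)
    num_leaves = 1
    while heap and num_leaves < max_leaves:
        _, _, node, split = heapq.heappop(heap)
        children = node.apply(split)
        num_leaves += len(children) - 1
        for child in children:
            push(child)
```

The key difference: `LOCAL` decides each split in isolation, while `BEST_FIRST_GLOBAL` spends its split budget on whichever leaf of the whole tree currently offers the largest loss reduction.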