Internal change
PiperOrigin-RevId: 416571722
achoum authored and copybara-github committed Dec 15, 2021
1 parent 9e9f777 commit b553989
Showing 6 changed files with 96 additions and 9 deletions.
2 changes: 1 addition & 1 deletion CHANGELOG.md
@@ -1,6 +1,6 @@
# Changelog

## ????
## 0.2.2 - 2021-12-13

### Features

2 changes: 0 additions & 2 deletions documentation/developer_manual.md
@@ -12,8 +12,6 @@
* [How to test the code](#how-to-test-the-code)
* [Models and Learners](#models-and-learners)

<!-- Added by: gbm, at: Mon 31 May 2021 06:16:20 PM CEST -->

<!--te-->

## Design principles
2 changes: 0 additions & 2 deletions documentation/installation.md
@@ -18,8 +18,6 @@ interfaces.
* [Using the C++ library](#using-the-c-library)
* [Troubleshooting](#troubleshooting)

<!-- Added by: gbm, at: Mon 31 May 2021 06:16:20 PM CEST -->

<!--te-->

## Installing the pre-compiled command-line interface
2 changes: 0 additions & 2 deletions documentation/learner_distributed_gradient_boosted_trees.md
@@ -16,8 +16,6 @@
* [IO](#io)
* [Limitations](#limitations)

<!-- Added by: gbm, at: Fri 08 Oct 2021 02:54:48 PM CEST -->

<!--te-->

## Introduction
95 changes: 95 additions & 0 deletions documentation/learners.md
@@ -181,6 +181,14 @@ the gradient of the loss relative to the model output).
- Lambda regularization applied to certain training loss functions. Only for
NDCG loss.

#### [loss](../yggdrasil_decision_forests/learner/gradient_boosted_trees/gradient_boosted_trees.proto?q=symbol:loss)

- **Type:** Categorical **Default:** DEFAULT **Possible values:** DEFAULT,
BINOMIAL_LOG_LIKELIHOOD, SQUARED_ERROR, MULTINOMIAL_LOG_LIKELIHOOD,
LAMBDA_MART_NDCG5, XE_NDCG_MART

- The loss optimized by the model. If not specified (DEFAULT), the loss is selected automatically according to the "task" and label statistics. For example, if task=CLASSIFICATION and the label has two possible values, the loss is set to BINOMIAL_LOG_LIKELIHOOD. Possible values are:<br>- `DEFAULT`: Select the loss automatically according to the task and label statistics.<br>- `BINOMIAL_LOG_LIKELIHOOD`: Binomial log-likelihood. Only valid for binary classification.<br>- `SQUARED_ERROR`: Least squares loss. Only valid for regression.<br>- `MULTINOMIAL_LOG_LIKELIHOOD`: Multinomial log-likelihood, i.e. cross-entropy. Only valid for binary or multi-class classification.<br>- `LAMBDA_MART_NDCG5`: LambdaMART with NDCG5.<br>- `XE_NDCG_MART`: Cross-entropy NDCG loss. See arxiv.org/abs/1911.09798.<br>
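The automatic selection rule described above can be sketched as a small lookup. This is an illustrative sketch of the stated rules only, not YDF's actual code; the ranking default shown is an assumption.

```python
# Hypothetical sketch of DEFAULT loss resolution (not YDF internals).
def select_loss(task: str, num_label_classes: int = 0) -> str:
    """Maps a task and label statistics to a concrete loss."""
    if task == "CLASSIFICATION":
        # Binary labels get the binomial loss, otherwise multinomial.
        if num_label_classes == 2:
            return "BINOMIAL_LOG_LIKELIHOOD"
        return "MULTINOMIAL_LOG_LIKELIHOOD"
    if task == "REGRESSION":
        return "SQUARED_ERROR"
    if task == "RANKING":
        # Assumed default for ranking tasks.
        return "LAMBDA_MART_NDCG5"
    raise ValueError(f"No default loss for task {task!r}")
```

For example, `select_loss("CLASSIFICATION", num_label_classes=2)` resolves to `BINOMIAL_LOG_LIKELIHOOD`, matching the example in the description.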

#### [max_depth](../yggdrasil_decision_forests/learner/decision_tree/decision_tree.proto?q=symbol:max_depth)

- **Type:** Integer **Default:** 6 **Possible values:** min:-1
@@ -315,6 +323,13 @@ the gradient of the loss relative to the model output).
number of random projections to test at each node as
`num_features^num_projections_exponent`.
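The projection-count formula can be illustrated as follows. The documentation only gives `num_features^num_projections_exponent`; the rounding to an integer and the lower bound of one projection are assumptions of this sketch.

```python
def num_projections(num_features: int, exponent: float) -> int:
    """Illustrative count of random projections tested at each node,
    computed as num_features**exponent (rounding is an assumption)."""
    return max(1, round(num_features ** exponent))
```

For instance, with 16 features and an exponent of 0.5, this sketch tests 4 projections per node.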

#### [sparse_oblique_weights](../yggdrasil_decision_forests/learner/decision_tree/decision_tree.proto?q=symbol:sparse_oblique_weights)

- **Type:** Categorical **Default:** BINARY **Possible values:** BINARY,
CONTINUOUS

- For sparse oblique splits, i.e. `split_axis=SPARSE_OBLIQUE`. Possible values:<br>- `BINARY`: The oblique weights are sampled in {-1,1} (default).<br>- `CONTINUOUS`: The oblique weights are sampled in [-1,1].
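The two sampling schemes can be sketched as follows. This is illustrative only; the function name and the use of Python's `random` module are not part of YDF.

```python
import random

def sample_oblique_weights(num_features: int, mode: str = "BINARY"):
    """Illustrative sampling of oblique projection weights (not YDF's code)."""
    if mode == "BINARY":
        # Each weight is drawn from the two-element set {-1, 1}.
        return [random.choice((-1.0, 1.0)) for _ in range(num_features)]
    if mode == "CONTINUOUS":
        # Each weight is drawn uniformly from the interval [-1, 1].
        return [random.uniform(-1.0, 1.0) for _ in range(num_features)]
    raise ValueError(f"Unknown mode: {mode!r}")
```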

#### [split_axis](../yggdrasil_decision_forests/learner/decision_tree/decision_tree.proto?q=symbol:split_axis)

- **Type:** Categorical **Default:** AXIS_ALIGNED **Possible values:**
@@ -329,6 +344,19 @@ the gradient of the loss relative to the model output).
- Ratio of the dataset (sampling without replacement) used to train individual
trees for the random sampling method.

#### [uplift_min_examples_in_treatment](../yggdrasil_decision_forests/learner/decision_tree/decision_tree.proto?q=symbol:uplift_min_examples_in_treatment)

- **Type:** Integer **Default:** 5 **Possible values:** min:0

- For uplift models only. Minimum number of examples per treatment in a node.

#### [uplift_split_score](../yggdrasil_decision_forests/learner/decision_tree/decision_tree.proto?q=symbol:uplift_split_score)

- **Type:** Categorical **Default:** KULLBACK_LEIBLER **Possible values:**
KULLBACK_LEIBLER, KL, EUCLIDEAN_DISTANCE, ED, CHI_SQUARED, CS

- For uplift models only. Splitter score i.e. score optimized by the splitters. The scores are introduced in "Decision trees for uplift modeling with single and multiple treatments", Rzepakowski et al. Notation: `p` probability / average value of the positive outcome, `q` probability / average value in the control group.<br>- `KULLBACK_LEIBLER` or `KL`: - p log (p/q)<br>- `EUCLIDEAN_DISTANCE` or `ED`: (p-q)^2<br>- `CHI_SQUARED` or `CS`: (p-q)^2/q<br>
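The three scores can be computed as in the following sketch. The dash before `p log (p/q)` in the list above is read as a list marker, so the KL term is implemented as `p*log(p/q)`, following Rzepakowski et al.; this interpretation is an assumption.

```python
import math

def uplift_split_score(p: float, q: float, score: str = "KULLBACK_LEIBLER") -> float:
    """Illustrative splitter scores; p: treatment outcome, q: control outcome."""
    if score in ("KULLBACK_LEIBLER", "KL"):
        # KL term p*log(p/q), as in Rzepakowski et al.
        return p * math.log(p / q)
    if score in ("EUCLIDEAN_DISTANCE", "ED"):
        return (p - q) ** 2
    if score in ("CHI_SQUARED", "CS"):
        return (p - q) ** 2 / q
    raise ValueError(f"Unknown score: {score!r}")
```

All three scores are zero when treatment and control outcomes agree (p = q) and grow as they diverge, which is what makes them usable as split criteria.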

#### [use_goss](../yggdrasil_decision_forests/learner/gradient_boosted_trees/gradient_boosted_trees.proto?q=symbol:use_goss)

- **Type:** Categorical **Default:** false **Possible values:** true, false
@@ -399,6 +427,24 @@ It is probably the most well-known of the Decision Forest training algorithms.
- If true, the tree training evaluates conditions of the type `X is NA` i.e.
`X is missing`.

#### [bootstrap_size_ratio](../yggdrasil_decision_forests/learner/random_forest/random_forest.proto?q=symbol:bootstrap_size_ratio)

- **Type:** Real **Default:** 1 **Possible values:** min:0

- Number of examples used to train each tree; expressed as a ratio of the
  training dataset size.

#### [bootstrap_training_dataset](../yggdrasil_decision_forests/learner/random_forest/random_forest.proto?q=symbol:bootstrap_training_dataset)

- **Type:** Categorical **Default:** true **Possible values:** true, false

- If true (default), each tree is trained on a separate dataset sampled with
  replacement from the original dataset. If false, all the trees are trained
  on the same full dataset. If `bootstrap_training_dataset=false`, OOB metrics
  are not available. `bootstrap_training_dataset=false` is used in "Extremely
  randomized trees"
  (https://link.springer.com/content/pdf/10.1007%2Fs10994-006-6226-1.pdf).
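The two modes can be sketched as follows. This is illustrative, not YDF internals; `ratio` plays the role of `bootstrap_size_ratio` and the function name is hypothetical.

```python
import random

def tree_training_sample(dataset, bootstrap=True, ratio=1.0):
    """Illustrative per-tree sampling: with replacement when bootstrapping,
    otherwise the same full dataset for every tree (as in Extra-Trees)."""
    if not bootstrap:
        return list(dataset)  # every tree sees the identical full dataset
    # Bootstrap: draw ratio * |dataset| examples with replacement.
    n = max(1, round(ratio * len(dataset)))
    return [random.choice(dataset) for _ in range(n)]
```

With `bootstrap=True`, roughly a third of the examples are left out of each tree's sample, which is what makes OOB evaluation possible; with `bootstrap=False` no example is ever out-of-bag, hence no OOB metrics.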

#### [categorical_algorithm](../yggdrasil_decision_forests/learner/decision_tree/decision_tree.proto?q=symbol:categorical_algorithm)

- **Type:** Categorical **Default:** CART **Possible values:** CART, ONE_HOT,
@@ -540,6 +586,15 @@ It is probably the most well-known of the Decision Forest training algorithms.
as well as -1. If not set or equal to -1, the `num_candidate_attributes` is
used.

#### [num_oob_variable_importances_permutations](../yggdrasil_decision_forests/learner/random_forest/random_forest.proto?q=symbol:num_oob_variable_importances_permutations)

- **Type:** Integer **Default:** 1 **Possible values:** min:1

- Number of times the dataset is re-shuffled to compute the permutation
  variable importances. Increasing this value increases the training time (if
  `compute_oob_variable_importances=true`) as well as the stability of the OOB
  variable importance metrics.
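The re-shuffling procedure can be sketched as below. This is an illustrative sketch of permutation variable importance with assumed names (`accuracy_fn`, row-major `features`), not YDF's implementation or API.

```python
import random

def permutation_importance(accuracy_fn, features, column, num_permutations=1):
    """Illustrative permutation importance: average accuracy drop over
    num_permutations shuffles of a single feature column."""
    baseline = accuracy_fn(features)
    drops = []
    for _ in range(num_permutations):
        # Copy the rows, then shuffle one column's values across examples.
        shuffled = [row[:] for row in features]
        values = [row[column] for row in shuffled]
        random.shuffle(values)
        for row, v in zip(shuffled, values):
            row[column] = v
        drops.append(baseline - accuracy_fn(shuffled))
    # Averaging over several shuffles stabilizes the estimate.
    return sum(drops) / num_permutations
```

A larger `num_permutations` averages out the randomness of individual shuffles, at the cost of evaluating the model more times.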

#### [num_trees](../yggdrasil_decision_forests/learner/random_forest/random_forest.proto?q=symbol:num_trees)

- **Type:** Integer **Default:** 300 **Possible values:** min:1
@@ -585,13 +640,33 @@ It is probably the most well-known of the Decision Forest training algorithms.
number of random projections to test at each node as
`num_features^num_projections_exponent`.

#### [sparse_oblique_weights](../yggdrasil_decision_forests/learner/decision_tree/decision_tree.proto?q=symbol:sparse_oblique_weights)

- **Type:** Categorical **Default:** BINARY **Possible values:** BINARY,
CONTINUOUS

- For sparse oblique splits, i.e. `split_axis=SPARSE_OBLIQUE`. Possible values:<br>- `BINARY`: The oblique weights are sampled in {-1,1} (default).<br>- `CONTINUOUS`: The oblique weights are sampled in [-1,1].

#### [split_axis](../yggdrasil_decision_forests/learner/decision_tree/decision_tree.proto?q=symbol:split_axis)

- **Type:** Categorical **Default:** AXIS_ALIGNED **Possible values:**
AXIS_ALIGNED, SPARSE_OBLIQUE

- What structure of split to consider for numerical features.<br>- `AXIS_ALIGNED`: Axis-aligned splits (i.e. one condition at a time). This is the "classical" way to train a tree. Default value.<br>- `SPARSE_OBLIQUE`: Sparse oblique splits (i.e. splits on a small number of features) from "Sparse Projection Oblique Random Forests", Tomita et al., 2020.

#### [uplift_min_examples_in_treatment](../yggdrasil_decision_forests/learner/decision_tree/decision_tree.proto?q=symbol:uplift_min_examples_in_treatment)

- **Type:** Integer **Default:** 5 **Possible values:** min:0

- For uplift models only. Minimum number of examples per treatment in a node.

#### [uplift_split_score](../yggdrasil_decision_forests/learner/decision_tree/decision_tree.proto?q=symbol:uplift_split_score)

- **Type:** Categorical **Default:** KULLBACK_LEIBLER **Possible values:**
KULLBACK_LEIBLER, KL, EUCLIDEAN_DISTANCE, ED, CHI_SQUARED, CS

- For uplift models only. Splitter score i.e. score optimized by the splitters. The scores are introduced in "Decision trees for uplift modeling with single and multiple treatments", Rzepakowski et al. Notation: `p` probability / average value of the positive outcome, `q` probability / average value in the control group.<br>- `KULLBACK_LEIBLER` or `KL`: - p log (p/q)<br>- `EUCLIDEAN_DISTANCE` or `ED`: (p-q)^2<br>- `CHI_SQUARED` or `CS`: (p-q)^2/q<br>

#### [winner_take_all](../yggdrasil_decision_forests/learner/random_forest/random_forest.proto?q=symbol:winner_take_all_inference)

- **Type:** Categorical **Default:** true **Possible values:** true, false
@@ -788,13 +863,33 @@ used to grow the tree while the second is used to prune the tree.
number of random projections to test at each node as
`num_features^num_projections_exponent`.

#### [sparse_oblique_weights](../yggdrasil_decision_forests/learner/decision_tree/decision_tree.proto?q=symbol:sparse_oblique_weights)

- **Type:** Categorical **Default:** BINARY **Possible values:** BINARY,
CONTINUOUS

- For sparse oblique splits, i.e. `split_axis=SPARSE_OBLIQUE`. Possible values:<br>- `BINARY`: The oblique weights are sampled in {-1,1} (default).<br>- `CONTINUOUS`: The oblique weights are sampled in [-1,1].

#### [split_axis](../yggdrasil_decision_forests/learner/decision_tree/decision_tree.proto?q=symbol:split_axis)

- **Type:** Categorical **Default:** AXIS_ALIGNED **Possible values:**
AXIS_ALIGNED, SPARSE_OBLIQUE

- What structure of split to consider for numerical features.<br>- `AXIS_ALIGNED`: Axis-aligned splits (i.e. one condition at a time). This is the "classical" way to train a tree. Default value.<br>- `SPARSE_OBLIQUE`: Sparse oblique splits (i.e. splits on a small number of features) from "Sparse Projection Oblique Random Forests", Tomita et al., 2020.

#### [uplift_min_examples_in_treatment](../yggdrasil_decision_forests/learner/decision_tree/decision_tree.proto?q=symbol:uplift_min_examples_in_treatment)

- **Type:** Integer **Default:** 5 **Possible values:** min:0

- For uplift models only. Minimum number of examples per treatment in a node.

#### [uplift_split_score](../yggdrasil_decision_forests/learner/decision_tree/decision_tree.proto?q=symbol:uplift_split_score)

- **Type:** Categorical **Default:** KULLBACK_LEIBLER **Possible values:**
KULLBACK_LEIBLER, KL, EUCLIDEAN_DISTANCE, ED, CHI_SQUARED, CS

- For uplift models only. Splitter score i.e. score optimized by the splitters. The scores are introduced in "Decision trees for uplift modeling with single and multiple treatments", Rzepakowski et al. Notation: `p` probability / average value of the positive outcome, `q` probability / average value in the control group.<br>- `KULLBACK_LEIBLER` or `KL`: - p log (p/q)<br>- `EUCLIDEAN_DISTANCE` or `ED`: (p-q)^2<br>- `CHI_SQUARED` or `CS`: (p-q)^2/q<br>

#### [validation_ratio](../yggdrasil_decision_forests/learner/cart/cart.proto?q=symbol:validation_ratio)

- **Type:** Real **Default:** 0.1 **Possible values:** min:0 max:1
2 changes: 0 additions & 2 deletions documentation/user_manual.md
Expand Up @@ -38,8 +38,6 @@ It is complementary to the beginner example available in `examples/`.
* [Fast engine](#fast-engine)
* [Advanced features](#advanced-features)

<!-- Added by: gbm, at: Mon 31 May 2021 06:16:20 PM CEST -->

<!--te-->

## Interfaces
