From 8bf9db09166de189c0e0c2f8633a17b4bcb7a845 Mon Sep 17 00:00:00 2001 From: Richard Stotz Date: Wed, 21 Aug 2024 03:21:05 -0700 Subject: [PATCH] Prepare release of TF-DF 1.10.0 and YDF 1.10.0 and PYDF 0.7.0 PiperOrigin-RevId: 665797043 --- CHANGELOG.md | 3 +- documentation/public/docs/hyperparameters.md | 92 +++++++------------ .../port/python/CHANGELOG.md | 14 ++- .../port/python/config/setup.py | 4 +- .../port/python/dev_requirements.txt | 2 +- .../pybind11_protobuf/workspace.bzl | 4 +- .../port/python/tools/build_test_linux.sh | 2 +- .../port/python/tools/release_macos.sh | 3 +- .../port/python/tools/release_windows.bat | 2 +- .../port/python/ydf/cc/BUILD | 4 + .../port/python/ydf/model/generic_model.py | 6 -- .../port/python/ydf/version.py | 2 +- .../utils/compatibility.h | 1 + 13 files changed, 63 insertions(+), 76 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 7fc4feb9..376703a8 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -3,7 +3,7 @@ Note: This is the changelog of the C++ library. The Python port has a separate Changelog under `yggdrasil_decision_forests/port/python/CHANGELOG.md`. -## Head +## 1.10.0 - 2024-08-21 ### Features @@ -11,6 +11,7 @@ Changelog under `yggdrasil_decision_forests/port/python/CHANGELOG.md`. - The default value of `num_candidate_attributes` in the CART learner is changed from 0 (Random Forest style sampling) to -1 (no sampling). This is the generally accepted logic of CART. +- Added support for GCS for file I/O. ## 1.9.0 - 2024-03-12 diff --git a/documentation/public/docs/hyperparameters.md b/documentation/public/docs/hyperparameters.md index 37883238..8c0075ed 100644 --- a/documentation/public/docs/hyperparameters.md +++ b/documentation/public/docs/hyperparameters.md @@ -24,11 +24,6 @@ learner: "RANDOM_FOREST" num_trees: 1000 } ``` - -## Table of content - -[TOC] - ## GRADIENT_BOOSTED_TREES A [Gradient Boosted Trees](https://statweb.stanford.edu/~jhf/ftp/trebst.pdf) @@ -427,14 +422,15 @@ reasonable time. - Coefficient applied to each tree prediction. A small value (0.02) tends to give more accurate results (assuming enough trees are trained), but results - in larger models. Analogous to neural network learning rate. + in larger models. Analogous to neural network learning rate. Fixed to 1.0 + for DART models. #### [sorting_strategy](https://github.com/google/yggdrasil-decision-forests/blob/main/yggdrasil_decision_forests/learner/decision_tree/decision_tree.proto) - **Type:** Categorical **Default:** PRESORT **Possible values:** IN_NODE, - PRESORT + PRESORT, FORCE_PRESORT, AUTO -- How are sorted the numerical features in order to find the splits
- PRESORT: The features are pre-sorted at the start of the training. This solution is faster but consumes much more memory than IN_NODE.
- IN_NODE: The features are sorted just before being used in the node. This solution is slow but consumes little amount of memory.
. +- How the numerical features are sorted to find the splits
- AUTO: Selects the most efficient method among IN_NODE, FORCE_PRESORT, and LAYER.
- IN_NODE: The features are sorted just before being used in the node. This solution is slow but consumes little memory.
- FORCE_PRESORT: The features are pre-sorted at the start of training. This solution is faster but consumes much more memory than IN_NODE.
- PRESORT: Automatically chooses between FORCE_PRESORT and IN_NODE.
. #### [sparse_oblique_max_num_projections](https://github.com/google/yggdrasil-decision-forests/blob/main/yggdrasil_decision_forests/learner/decision_tree/decision_tree.proto) @@ -473,7 +469,7 @@ reasonable time. - **Type:** Categorical **Default:** AXIS_ALIGNED **Possible values:** AXIS_ALIGNED, SPARSE_OBLIQUE, MHLD_OBLIQUE -- What structure of split to consider for numerical features.
- `AXIS_ALIGNED`: Axis aligned splits (i.e. one condition at a time). This is the "classical" way to train a tree. Default value.
- `SPARSE_OBLIQUE`: Sparse oblique splits (i.e. random splits one a small number of features) from "Sparse Projection Oblique Random Forests", Tomita et al., 2020.
- `MHLD_OBLIQUE`: Multi-class Hellinger Linear Discriminant splits from "Classification Based on Multivariate Contrast Patterns", Canete-Sifuentes et al., 2029 +- What structure of split to consider for numerical features.
- `AXIS_ALIGNED`: Axis aligned splits (i.e. one condition at a time). This is the "classical" way to train a tree. Default value.
- `SPARSE_OBLIQUE`: Sparse oblique splits (i.e. random splits on a small number of features) from "Sparse Projection Oblique Random Forests", Tomita et al., 2020.
- `MHLD_OBLIQUE`: Multi-class Hellinger Linear Discriminant splits from "Classification Based on Multivariate Contrast Patterns", Canete-Sifuentes et al., 2019

#### [subsample](https://github.com/google/yggdrasil-decision-forests/blob/main/yggdrasil_decision_forests/learner/gradient_boosted_trees/gradient_boosted_trees.proto)

@@ -536,8 +532,8 @@ reasonable time.

## RANDOM_FOREST

-A Random Forest (https://www.stat.berkeley.edu/~breiman/randomforest2001.pdf) is
-a collection of deep CART decision trees trained independently and without
+A [Random Forest](https://www.stat.berkeley.edu/~breiman/randomforest2001.pdf)
+is a collection of deep CART decision trees trained independently and without
 pruning. Each tree is trained on a random subset of the original training
 dataset (sampled with replacement).

@@ -853,9 +849,9 @@ reasonable time.

#### [sorting_strategy](https://github.com/google/yggdrasil-decision-forests/blob/main/yggdrasil_decision_forests/learner/decision_tree/decision_tree.proto)

- **Type:** Categorical **Default:** PRESORT **Possible values:** IN_NODE,
-  PRESORT
+  PRESORT, FORCE_PRESORT, AUTO

-- How are sorted the numerical features in order to find the splits
- PRESORT: The features are pre-sorted at the start of the training. This solution is faster but consumes much more memory than IN_NODE.
- IN_NODE: The features are sorted just before being used in the node. This solution is slow but consumes little amount of memory.
. +- How the numerical features are sorted to find the splits
- AUTO: Selects the most efficient method among IN_NODE, FORCE_PRESORT, and LAYER.
- IN_NODE: The features are sorted just before being used in the node. This solution is slow but consumes little memory.
- FORCE_PRESORT: The features are pre-sorted at the start of training. This solution is faster but consumes much more memory than IN_NODE.
- PRESORT: Automatically chooses between FORCE_PRESORT and IN_NODE.
. #### [sparse_oblique_max_num_projections](https://github.com/google/yggdrasil-decision-forests/blob/main/yggdrasil_decision_forests/learner/decision_tree/decision_tree.proto) @@ -894,7 +890,7 @@ reasonable time. - **Type:** Categorical **Default:** AXIS_ALIGNED **Possible values:** AXIS_ALIGNED, SPARSE_OBLIQUE, MHLD_OBLIQUE -- What structure of split to consider for numerical features.
- `AXIS_ALIGNED`: Axis aligned splits (i.e. one condition at a time). This is the "classical" way to train a tree. Default value.
- `SPARSE_OBLIQUE`: Sparse oblique splits (i.e. random splits one a small number of features) from "Sparse Projection Oblique Random Forests", Tomita et al., 2020.
- `MHLD_OBLIQUE`: Multi-class Hellinger Linear Discriminant splits from "Classification Based on Multivariate Contrast Patterns", Canete-Sifuentes et al., 2029 +- What structure of split to consider for numerical features.
- `AXIS_ALIGNED`: Axis aligned splits (i.e. one condition at a time). This is the "classical" way to train a tree. Default value.
- `SPARSE_OBLIQUE`: Sparse oblique splits (i.e. random splits on a small number of features) from "Sparse Projection Oblique Random Forests", Tomita et al., 2020.
- `MHLD_OBLIQUE`: Multi-class Hellinger Linear Discriminant splits from "Classification Based on Multivariate Contrast Patterns", Canete-Sifuentes et al., 2019

#### [uplift_min_examples_in_treatment](https://github.com/google/yggdrasil-decision-forests/blob/main/yggdrasil_decision_forests/learner/decision_tree/decision_tree.proto)

@@ -1134,9 +1130,9 @@ The hyper-parameter protobuffers are used with the C++ and CLI APIs.

#### [sorting_strategy](https://github.com/google/yggdrasil-decision-forests/blob/main/yggdrasil_decision_forests/learner/decision_tree/decision_tree.proto)

- **Type:** Categorical **Default:** IN_NODE **Possible values:** IN_NODE,
-  PRESORT
+  PRESORT, FORCE_PRESORT, AUTO

-- How are sorted the numerical features in order to find the splits
- PRESORT: The features are pre-sorted at the start of the training. This solution is faster but consumes much more memory than IN_NODE.
- IN_NODE: The features are sorted just before being used in the node. This solution is slow but consumes little amount of memory.
. +- How the numerical features are sorted to find the splits
- AUTO: Selects the most efficient method among IN_NODE, FORCE_PRESORT, and LAYER.
- IN_NODE: The features are sorted just before being used in the node. This solution is slow but consumes little memory.
- FORCE_PRESORT: The features are pre-sorted at the start of training. This solution is faster but consumes much more memory than IN_NODE.
- PRESORT: Automatically chooses between FORCE_PRESORT and IN_NODE.
. #### [sparse_oblique_max_num_projections](https://github.com/google/yggdrasil-decision-forests/blob/main/yggdrasil_decision_forests/learner/decision_tree/decision_tree.proto) @@ -1175,7 +1171,7 @@ The hyper-parameter protobuffers are used with the C++ and CLI APIs. - **Type:** Categorical **Default:** AXIS_ALIGNED **Possible values:** AXIS_ALIGNED, SPARSE_OBLIQUE, MHLD_OBLIQUE -- What structure of split to consider for numerical features.
- `AXIS_ALIGNED`: Axis aligned splits (i.e. one condition at a time). This is the "classical" way to train a tree. Default value.
- `SPARSE_OBLIQUE`: Sparse oblique splits (i.e. random splits one a small number of features) from "Sparse Projection Oblique Random Forests", Tomita et al., 2020.
- `MHLD_OBLIQUE`: Multi-class Hellinger Linear Discriminant splits from "Classification Based on Multivariate Contrast Patterns", Canete-Sifuentes et al., 2029 +- What structure of split to consider for numerical features.
- `AXIS_ALIGNED`: Axis aligned splits (i.e. one condition at a time). This is the "classical" way to train a tree. Default value.
- `SPARSE_OBLIQUE`: Sparse oblique splits (i.e. random splits on a small number of features) from "Sparse Projection Oblique Random Forests", Tomita et al., 2020.
- `MHLD_OBLIQUE`: Multi-class Hellinger Linear Discriminant splits from "Classification Based on Multivariate Contrast Patterns", Canete-Sifuentes et al., 2019

#### [uplift_min_examples_in_treatment](https://github.com/google/yggdrasil-decision-forests/blob/main/yggdrasil_decision_forests/learner/decision_tree/decision_tree.proto)

@@ -1325,7 +1321,8 @@ The hyper-parameter protobuffers are used with the C++ and CLI APIs.

 - Coefficient applied to each tree prediction. A small value (0.02) tends to
   give more accurate results (assuming enough trees are trained), but results
-  in larger models. Analogous to neural network learning rate.
+  in larger models. Analogous to neural network learning rate. Fixed to 1.0
+  for DART models.

#### [use_hessian_gain](https://github.com/google/yggdrasil-decision-forests/blob/main/yggdrasil_decision_forests/learner/gradient_boosted_trees/gradient_boosted_trees.proto)

@@ -1343,8 +1340,8 @@ The hyper-parameter protobuffers are used with the C++ and CLI APIs.

## ISOLATION_FOREST

-An Isolation Forest (https://ieeexplore.ieee.org/abstract/document/4781136) is a
-collection of decision trees trained without labels and independently to
+An [Isolation Forest](https://ieeexplore.ieee.org/abstract/document/4781136) is
+a collection of decision trees trained without labels and independently to
 partition the feature space. The Isolation Forest prediction is an anomaly
 score that indicates whether an example originates from the same distribution
 as the training examples. We refer to Isolation Forest as both the original algorithm

@@ -1455,11 +1452,12 @@ The hyper-parameter protobuffers are used with the C++ and CLI APIs.

#### [max_depth](https://github.com/google/yggdrasil-decision-forests/blob/main/yggdrasil_decision_forests/learner/decision_tree/decision_tree.proto)

-- **Type:** Integer **Default:** 16 **Possible values:** min:-1
+- **Type:** Integer **Default:** -2 **Possible values:** min:-2

 - Maximum depth of the tree. `max_depth=1` means that all trees will be roots.
-  `max_depth=-1` means that tree depth is not restricted by this parameter.
-  Values <= -2 will be ignored.
+  `max_depth=-1` means that tree depth is unconstrained by this parameter.
+  `max_depth=-2` means that the maximum depth is log2(number of sampled
+  examples per tree) (default).

#### [max_num_nodes](https://github.com/google/yggdrasil-decision-forests/blob/main/yggdrasil_decision_forests/learner/decision_tree/decision_tree.proto)

@@ -1496,15 +1494,6 @@ The hyper-parameter protobuffers are used with the C++ and CLI APIs.

   numerical features, the value is capped automatically. The value 1 is
   allowed but results in ordinary (non-oblique) splits.

-#### [mhld_oblique_sample_attributes](https://github.com/google/yggdrasil-decision-forests/blob/main/yggdrasil_decision_forests/learner/decision_tree/decision_tree.proto)
-
-- **Type:** Categorical **Default:** false **Possible values:** true, false
-
-- For MHLD oblique splits i.e. `split_axis=MHLD_OBLIQUE`. If true, applies the
-  attribute sampling controlled by the "num_candidate_attributes" or
-  "num_candidate_attributes_ratio" parameters. If false, all the attributes
-  are tested.
-
#### [min_examples](https://github.com/google/yggdrasil-decision-forests/blob/main/yggdrasil_decision_forests/learner/decision_tree/decision_tree.proto)

- **Type:** Integer **Default:** 5 **Possible values:** min:1

@@ -1566,16 +1555,10 @@ The hyper-parameter protobuffers are used with the C++ and CLI APIs.

 
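The sampling and depth defaults above interact: with the isolation forest's default of 256 sampled examples per tree (see `subsample_count` below) and `max_depth=-2`, each tree is capped at depth log2(256) = 8. A minimal sketch with the YDF Python API; the `ydf` package and the `IsolationForestLearner` parameter names are assumed from this documentation rather than a verified release:

```python
# Sketch: how subsample_count bounds tree depth when max_depth=-2 (default).
import math

import numpy as np
import ydf  # assumed: the YDF Python port exposing these hyperparameters

rng = np.random.default_rng(0)
data = {"f1": rng.normal(size=1000), "f2": rng.normal(size=1000)}

subsample_count = 256  # default: 256 examples sampled per tree
implied_depth_cap = int(math.log2(subsample_count))  # = 8

model = ydf.IsolationForestLearner(
    subsample_count=subsample_count,  # max_depth left at its -2 default
).train(data)

scores = model.predict(data)  # higher scores indicate more anomalous examples
```

Setting `max_depth` explicitly overrides the log2 cap, per the description above.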
#### [sorting_strategy](https://github.com/google/yggdrasil-decision-forests/blob/main/yggdrasil_decision_forests/learner/decision_tree/decision_tree.proto) -- **Type:** Categorical **Default:** PRESORT **Possible values:** IN_NODE, - PRESORT - -- How are sorted the numerical features in order to find the splits
- PRESORT: The features are pre-sorted at the start of the training. This solution is faster but consumes much more memory than IN_NODE.
- IN_NODE: The features are sorted just before being used in the node. This solution is slow but consumes little amount of memory.
. - -#### [sparse_oblique_max_num_projections](https://github.com/google/yggdrasil-decision-forests/blob/main/yggdrasil_decision_forests/learner/decision_tree/decision_tree.proto) - -- **Type:** Integer **Default:** 6000 **Possible values:** min:1 +- **Type:** Categorical **Default:** AUTO **Possible values:** IN_NODE, + PRESORT, FORCE_PRESORT, AUTO -- For sparse oblique splits i.e. `split_axis=SPARSE_OBLIQUE`. Maximum number of projections (applied after the num_projections_exponent).
Oblique splits try out max(p^num_projections_exponent, max_num_projections) random projections for choosing a split, where p is the number of numerical features. Increasing "max_num_projections" increases the training time but not the inference time. In late stage model development, if every bit of accuracy if important, increase this value.
The paper "Sparse Projection Oblique Random Forests" (Tomita et al, 2020) does not define this hyperparameter. +- How are sorted the numerical features in order to find the splits
- AUTO: Selects the most efficient method among IN_NODE, FORCE_PRESORT, and LAYER.
- IN_NODE: The features are sorted just before being used in the node. This solution is slow but consumes little memory.
- FORCE_PRESORT: The features are pre-sorted at the start of training. This solution is faster but consumes much more memory than IN_NODE.
- PRESORT: Automatically chooses between FORCE_PRESORT and IN_NODE.
. #### [sparse_oblique_normalization](https://github.com/google/yggdrasil-decision-forests/blob/main/yggdrasil_decision_forests/learner/decision_tree/decision_tree.proto) @@ -1584,12 +1567,6 @@ The hyper-parameter protobuffers are used with the C++ and CLI APIs. - For sparse oblique splits i.e. `split_axis=SPARSE_OBLIQUE`. Normalization applied on the features, before applying the sparse oblique projections.
- `NONE`: No normalization.
- `STANDARD_DEVIATION`: Normalize the feature by the estimated standard deviation on the entire train dataset. Also known as Z-Score normalization.
- `MIN_MAX`: Normalize the feature by the range (i.e. max-min) estimated on the entire train dataset. -#### [sparse_oblique_num_projections_exponent](https://github.com/google/yggdrasil-decision-forests/blob/main/yggdrasil_decision_forests/learner/decision_tree/decision_tree.proto) - -- **Type:** Real **Default:** 2 **Possible values:** min:0 - -- For sparse oblique splits i.e. `split_axis=SPARSE_OBLIQUE`. Controls of the number of random projections to test at each node.
Increasing this value very likely improves the quality of the model, drastically increases the training time, and doe not impact the inference time.
Oblique splits try out max(p^num_projections_exponent, max_num_projections) random projections for choosing a split, where p is the number of numerical features. Therefore, increasing this `num_projections_exponent` and possibly `max_num_projections` may improve model quality, but will also significantly increase training time.
Note that the complexity of (classic) Random Forests is roughly proportional to `num_projections_exponent=0.5`, since it considers sqrt(num_features) for a split. The complexity of (classic) GBDT is roughly proportional to `num_projections_exponent=1`, since it considers all features for a split.
The paper "Sparse Projection Oblique Random Forests" (Tomita et al, 2020) recommends values in [1/4, 2]. - #### [sparse_oblique_projection_density_factor](https://github.com/google/yggdrasil-decision-forests/blob/main/yggdrasil_decision_forests/learner/decision_tree/decision_tree.proto) - **Type:** Real **Default:** 2 **Possible values:** min:0 @@ -1606,27 +1583,28 @@ The hyper-parameter protobuffers are used with the C++ and CLI APIs. #### [split_axis](https://github.com/google/yggdrasil-decision-forests/blob/main/yggdrasil_decision_forests/learner/decision_tree/decision_tree.proto) - **Type:** Categorical **Default:** AXIS_ALIGNED **Possible values:** - AXIS_ALIGNED, SPARSE_OBLIQUE, MHLD_OBLIQUE + AXIS_ALIGNED, SPARSE_OBLIQUE -- What structure of split to consider for numerical features.
- `AXIS_ALIGNED`: Axis aligned splits (i.e. one condition at a time). This is the "classical" way to train a tree. Default value.
- `SPARSE_OBLIQUE`: Sparse oblique splits (i.e. random splits one a small number of features) from "Sparse Projection Oblique Random Forests", Tomita et al., 2020.
- `MHLD_OBLIQUE`: Multi-class Hellinger Linear Discriminant splits from "Classification Based on Multivariate Contrast Patterns", Canete-Sifuentes et al., 2029 +- What structure of split to consider for numerical features.
- `AXIS_ALIGNED`: Axis aligned splits (i.e. one condition at a time). This is the "classical" way to train a tree. Default value.
- `SPARSE_OBLIQUE`: Sparse oblique splits (i.e. random splits on a small number of features) from "Sparse Projection Oblique Random Forests", Tomita et al., 2020. This includes the splits described in "Extended Isolation Forests" (Sahand Hariri et al., 2018). #### [subsample_count](https://github.com/google/yggdrasil-decision-forests/blob/main/yggdrasil_decision_forests/learner/isolation_forest/isolation_forest.proto) -- **Type:** Integer **Default:** 300 **Possible values:** min:0 +- **Type:** Integer **Default:** 256 **Possible values:** min:0 - Number of examples used to grow each tree. Only one of "subsample_ratio" and - "subsample_count" can be set. If neither is set, "subsample_count" is - assumed to be equal to 256. This is the default value recommended in the - isolation forest paper. + "subsample_count" can be set. By default, sample 256 examples per tree. Note + that this parameter also restricts the tree's maximum depth to log2(examples + used per tree) unless max_depth is set explicitly. #### [subsample_ratio](https://github.com/google/yggdrasil-decision-forests/blob/main/yggdrasil_decision_forests/learner/isolation_forest/isolation_forest.proto) -- **Type:** Integer **Default:** 300 **Possible values:** min:0 +- **Type:** Real **Default:** 1 **Possible values:** min:0 - Ratio of number of training examples used to grow each tree. Only one of - "subsample_ratio" and "subsample_count" can be set. If neither is set, - "subsample_count" is assumed to be equal to 256. This is the default value - recommended in the isolation forest paper. + "subsample_ratio" and "subsample_count" can be set. By default, sample 256 + examples per tree. Note that this parameter also restricts the tree's + maximum depth to log2(examples used per tree) unless max_depth is set + explicitly. #### [uplift_min_examples_in_treatment](https://github.com/google/yggdrasil-decision-forests/blob/main/yggdrasil_decision_forests/learner/decision_tree/decision_tree.proto) diff --git a/yggdrasil_decision_forests/port/python/CHANGELOG.md b/yggdrasil_decision_forests/port/python/CHANGELOG.md index e015d990..bedb1c8a 100644 --- a/yggdrasil_decision_forests/port/python/CHANGELOG.md +++ b/yggdrasil_decision_forests/port/python/CHANGELOG.md @@ -1,6 +1,6 @@ # Changelog -## Head +## 0.7.0 - 2024-08-21 ### Feature @@ -13,12 +13,13 @@ - Models can be pickled safely. - Native support for Xarray as a dataset format for all operations (e.g., training, evaluation, predictions). -- The output of `model.to_jax_function` can then be converted to a TensorFlow - Lite model. +- The output of `model.to_jax_function` can be converted to a TensorFlow Lite + model. - Change the default number of examples to scan when training on files to determine the semantic and dictionaries of columns from 10k to 100k. - Various improvements of error messages. - Evaluation for Anomaly Detection models. +- Oblique splits for Anomaly Detection models. ### Fix @@ -31,6 +32,13 @@ multidimensional categorical integers. - Fix error when defining categorical sets for non-ragged multidimensional inputs. +- MacOS: Fix compatibility with other protobuf-using libraries such as + Tensorflow. + +#### Release music + +Rondo Alla ingharese quasi un capriccio "Die Wut über den verlorenen Groschen", +Op. 129. 
Ludwig van Beethoven ## 0.6.0 - 2024-07-04 diff --git a/yggdrasil_decision_forests/port/python/config/setup.py b/yggdrasil_decision_forests/port/python/config/setup.py index 36e4fb19..29d657d7 100644 --- a/yggdrasil_decision_forests/port/python/config/setup.py +++ b/yggdrasil_decision_forests/port/python/config/setup.py @@ -22,13 +22,13 @@ from setuptools.command.install import install from setuptools.dist import Distribution -_VERSION = "0.6.0" +_VERSION = "0.7.0" with open("README.md", "r", encoding="utf-8") as fh: long_description = fh.read() REQUIRED_PACKAGES = [ - "numpy<2.0.0", + "numpy", "absl_py", "protobuf>=3.14", ] diff --git a/yggdrasil_decision_forests/port/python/dev_requirements.txt b/yggdrasil_decision_forests/port/python/dev_requirements.txt index d644b780..d2fbf648 100644 --- a/yggdrasil_decision_forests/port/python/dev_requirements.txt +++ b/yggdrasil_decision_forests/port/python/dev_requirements.txt @@ -4,7 +4,7 @@ pydantic requests fastapi[standard]>=0.112.0,<0.113.0 tensorflow_decision_forests; platform_machine != 'aarch64' and python_version >= '3.9' and python_version < '3.12' -tensorflow; platform_machine != 'aarch64' +tensorflow; platform_machine != 'aarch64' and python_version >= '3.9' and python_version < '3.12' portpicker matplotlib scikit-learn diff --git a/yggdrasil_decision_forests/port/python/oss_third_party/pybind11_protobuf/workspace.bzl b/yggdrasil_decision_forests/port/python/oss_third_party/pybind11_protobuf/workspace.bzl index 4a0f35a1..99280dea 100644 --- a/yggdrasil_decision_forests/port/python/oss_third_party/pybind11_protobuf/workspace.bzl +++ b/yggdrasil_decision_forests/port/python/oss_third_party/pybind11_protobuf/workspace.bzl @@ -3,8 +3,8 @@ load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive") def deps(): - PYBIND_PROTOBUF_COMMIT_HASH = "3d7834b607758bbd2e3d210c6c478453922f20c0" - PYBIND_PROTOBUF_SHA = "89ba0a6eb92a834dc08dc199da5b94b4648168c56d5409116f9b7699e5350f11" + PYBIND_PROTOBUF_COMMIT_HASH = "f1b245929759230f31cdd1e5f9e0e69f817fed95" + PYBIND_PROTOBUF_SHA = "7eeabdaa39d5b1f48f1feb0894d6b7f02f77964e2a6bc1eaa4a90fe243e0a34c" http_archive( name = "com_google_pybind11_protobuf", strip_prefix = "pybind11_protobuf-{commit}".format(commit = PYBIND_PROTOBUF_COMMIT_HASH), diff --git a/yggdrasil_decision_forests/port/python/tools/build_test_linux.sh b/yggdrasil_decision_forests/port/python/tools/build_test_linux.sh index 557fdb96..695f5cc9 100755 --- a/yggdrasil_decision_forests/port/python/tools/build_test_linux.sh +++ b/yggdrasil_decision_forests/port/python/tools/build_test_linux.sh @@ -33,7 +33,7 @@ build_and_maybe_test () { echo " Compiler : $CC" bazel version - local ARCHITECTURE=$(uname --m) + local ARCHITECTURE=$(uname -m) local flags="--config=linux_cpp17 --features=-fully_static_link" if [ "$ARCHITECTURE" == "x86_64" ]; then diff --git a/yggdrasil_decision_forests/port/python/tools/release_macos.sh b/yggdrasil_decision_forests/port/python/tools/release_macos.sh index 5d6333bf..9ee5d655 100755 --- a/yggdrasil_decision_forests/port/python/tools/release_macos.sh +++ b/yggdrasil_decision_forests/port/python/tools/release_macos.sh @@ -14,6 +14,7 @@ # limitations under the License. +# Running this script inside a python venv may not work. 
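+# If a venv is already active, `deactivate` it first (assumed remedy): the
+# loop below activates its own venv for each Python version.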
set -vex declare -a python_versions=("3.8" "3.9" "3.10" "3.11" "3.12") @@ -27,7 +28,7 @@ do source ${TMPDIR}venv/bin/activate pip install --upgrade pip - echo "Building with $(python3 -V 2>&1)" + echo "Building with $(python -V 2>&1)" bazel clean --expunge RUN_TESTS=0 CC="clang" ./tools/build_test_linux.sh diff --git a/yggdrasil_decision_forests/port/python/tools/release_windows.bat b/yggdrasil_decision_forests/port/python/tools/release_windows.bat index 19c7c035..814613c4 100644 --- a/yggdrasil_decision_forests/port/python/tools/release_windows.bat +++ b/yggdrasil_decision_forests/port/python/tools/release_windows.bat @@ -34,7 +34,7 @@ cls setlocal -set YDF_VERSION=0.5.0 +set YDF_VERSION=0.7.0 set BAZEL=bazel.exe set BAZEL_SH=C:\msys64\usr\bin\bash.exe set BAZEL_FLAGS=--config=windows_cpp20 --config=windows_avx2 diff --git a/yggdrasil_decision_forests/port/python/ydf/cc/BUILD b/yggdrasil_decision_forests/port/python/ydf/cc/BUILD index 40ea0854..3ce2aacc 100644 --- a/yggdrasil_decision_forests/port/python/ydf/cc/BUILD +++ b/yggdrasil_decision_forests/port/python/ydf/cc/BUILD @@ -11,6 +11,10 @@ package( pybind_extension( name = "ydf", srcs = ["ydf.cc"], + linkopts = select({ + "@bazel_tools//src/conditions:darwin": ["-Wl,-exported_symbol,_PyInit_ydf"], + "//conditions:default": [], + }), deps = [ "//ydf/dataset:dataset_cc", "//ydf/learner:learner_cc", diff --git a/yggdrasil_decision_forests/port/python/ydf/model/generic_model.py b/yggdrasil_decision_forests/port/python/ydf/model/generic_model.py index 409c2376..558b6e2b 100644 --- a/yggdrasil_decision_forests/port/python/ydf/model/generic_model.py +++ b/yggdrasil_decision_forests/port/python/ydf/model/generic_model.py @@ -897,12 +897,6 @@ def pre_processing(raw_features): force: Try to export even in currently unsupported environments. WARNING: Setting this to true may crash the Python runtime. """ - if platform.system() == "Darwin" and not force: - raise ValueError( - "Exporting to TensorFlow is currently broken on MacOS and may crash" - " the current Python process. To proceed anyway, add parameter" - " `force=True`." - ) if mode == "keras": log.warning( diff --git a/yggdrasil_decision_forests/port/python/ydf/version.py b/yggdrasil_decision_forests/port/python/ydf/version.py index abef65f6..5bd6b189 100644 --- a/yggdrasil_decision_forests/port/python/ydf/version.py +++ b/yggdrasil_decision_forests/port/python/ydf/version.py @@ -12,4 +12,4 @@ # See the License for the specific language governing permissions and # limitations under the License. -version = "0.6.0" +version = "0.7.0" diff --git a/yggdrasil_decision_forests/utils/compatibility.h b/yggdrasil_decision_forests/utils/compatibility.h index fac95744..9ec2e11a 100644 --- a/yggdrasil_decision_forests/utils/compatibility.h +++ b/yggdrasil_decision_forests/utils/compatibility.h @@ -24,6 +24,7 @@ #include +#include #include #include "absl/types/optional.h"
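For context on the `generic_model.py` hunk above, which removes the macOS guard: a minimal sketch of the TensorFlow export it re-enables. The dataset and model are hypothetical, and `to_tensorflow_saved_model` with a `mode` argument is inferred from the `mode == "keras"` check visible in the hunk, not from a verified signature:

```python
# Sketch: TensorFlow export that previously raised ValueError on macOS
# unless force=True, and now runs through the normal path.
import numpy as np
import ydf  # assumed: the YDF Python port

data = {
    "f1": np.random.uniform(size=200),
    "label": np.random.uniform(size=200) > 0.5,
}
model = ydf.RandomForestLearner(label="label").train(data)

# "keras" is the mode checked (and warned about) in the hunk above; it is
# used here only to illustrate the call shape.
model.to_tensorflow_saved_model("/tmp/ydf_saved_model", mode="keras")
```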