This repository has been archived by the owner on Jul 7, 2023. It is now read-only.
Commit: Merge pull request #570 from rsepassi/push (v1.4.4)
Showing 43 changed files with 1,854 additions and 365 deletions.
@@ -0,0 +1,80 @@
# Running on Cloud ML Engine

Google Cloud Platform offers a managed training environment for TensorFlow
models called [Cloud ML Engine](https://cloud.google.com/ml-engine/), and
you can easily launch Tensor2Tensor on it, including for hyperparameter tuning.

# Launch

It's the same `t2t-trainer` you know and love, with the addition of the
`--cloud_mlengine` flag, which by default launches on a 1-GPU machine.

```
# Note that both the data dir and output dir have to be on GCS
DATA_DIR=gs://my-bucket/data
OUTPUT_DIR=gs://my-bucket/train
t2t-trainer \
  --problems=translate_ende_wmt32k \
  --model=transformer \
  --hparams_set=transformer_base \
  --data_dir=$DATA_DIR \
  --output_dir=$OUTPUT_DIR \
  --cloud_mlengine
```
Passing `--worker_gpu=4` or `--worker_gpu=8` automatically launches on
machines with 4 or 8 GPUs.

You can additionally pass `--cloud_mlengine_master_type` to select another
kind of machine (see the [docs for
`masterType`](https://cloud.google.com/ml-engine/reference/rest/v1/projects.jobs#traininginput)
for your options). If you provide this flag yourself, make sure you pass the
correct value for `--worker_gpu`.
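For example, an 8-GPU launch might look like the sketch below; the
`complex_model_l_gpu` machine type is taken from the `masterType` options
linked above, so verify it against the current docs before relying on it:

```shell
# Hypothetical 8-GPU launch; machine type is an assumption from the
# masterType docs, and DATA_DIR/OUTPUT_DIR are GCS paths as set earlier.
t2t-trainer \
  --problems=translate_ende_wmt32k \
  --model=transformer \
  --hparams_set=transformer_base \
  --data_dir=$DATA_DIR \
  --output_dir=$OUTPUT_DIR \
  --cloud_mlengine \
  --worker_gpu=8 \
  --cloud_mlengine_master_type=complex_model_l_gpu
```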
**Note**: `t2t-trainer` currently only supports launching with single machines,
possibly with multiple GPUs. Multi-machine setups are not yet supported out of
the box with the `--cloud_mlengine` flag, though multi-machine training should
in principle work just fine. Contributions/testers welcome.

## `--t2t_usr_dir`

Launching on Cloud ML Engine works with `--t2t_usr_dir` as well, as long as the
directory is fully self-contained (i.e. its imports only refer to other modules
in the directory). If you need additional PyPI dependencies, you can include a
`setup.py` file in your directory (ensure that it uses
`setuptools.find_packages`).
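A minimal `setup.py` for such a directory might look like the following sketch;
the package name and the extra dependency are illustrative placeholders, not
anything Tensor2Tensor requires:

```python
# Hypothetical setup.py for a self-contained --t2t_usr_dir.
# "my_usr_dir" and the "munch" dependency are placeholders.
from setuptools import find_packages, setup

setup(
    name="my_usr_dir",
    version="0.1",
    packages=find_packages(),    # per the note above, use find_packages
    install_requires=["munch"],  # extra PyPI dependencies go here
)
```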
# Hyperparameter Tuning

Hyperparameter tuning with `t2t-trainer` and Cloud ML Engine is also a breeze
with `--hparams_range` and the `--autotune_*` flags:

```
t2t-trainer \
  --problems=translate_ende_wmt32k \
  --model=transformer \
  --hparams_set=transformer_base \
  --data_dir=$DATA_DIR \
  --output_dir=$OUTPUT_DIR \
  --cloud_mlengine \
  --hparams_range=transformer_base_range \
  --autotune_objective='metrics-translate_ende_wmt32k/neg_log_perplexity' \
  --autotune_maximize \
  --autotune_max_trials=100 \
  --autotune_parallel_trials=3
```

The `--hparams_range` flag specifies the search space and should be registered
with `@register_ranged_hparams`. It defines a `RangedHParams` object that sets
search ranges and scales for various parameters. See `transformer_base_range`
in
[`transformer.py`](https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/models/transformer.py)
for an example.
The metric name passed as `--autotune_objective` should be exactly what you'd
see in TensorBoard. To minimize a metric, set `--autotune_maximize=False`.

You control how many total trials to run with `--autotune_max_trials` and the
number of jobs to launch in parallel with `--autotune_parallel_trials`: with
the settings above, at most 3 trials run concurrently until 100 have completed.

Happy tuning!
@@ -5,7 +5,7 @@
 setup(
     name='tensor2tensor',
-    version='1.4.3',
+    version='1.4.4',
     description='Tensor2Tensor',
     author='Google Inc.',
     author_email='[email protected]',

@@ -35,9 +35,9 @@
     'flask',
     'future',
     'gevent',
     'google-api-python-client',
     'gunicorn',
     'gym<=0.9.5',  # gym in version 0.9.6 has some temporary issues.
     'munch',
     'numpy',
     'requests',
     'scipy',
This file was deleted.
@@ -0,0 +1,50 @@
# coding=utf-8
# Copyright 2017 The Tensor2Tensor Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""Tests for t2t_trainer."""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

# Dependency imports

from tensor2tensor.bin import t2t_trainer
from tensor2tensor.utils import trainer_lib_test

import tensorflow as tf

FLAGS = tf.flags.FLAGS


class TrainerTest(tf.test.TestCase):

  @classmethod
  def setUpClass(cls):
    trainer_lib_test.TrainerLibTest.setUpClass()

  def testTrain(self):
    FLAGS.problems = "tiny_algo"
    FLAGS.model = "transformer"
    FLAGS.hparams_set = "transformer_tiny"
    FLAGS.train_steps = 1
    FLAGS.eval_steps = 1
    FLAGS.output_dir = tf.test.get_temp_dir()
    FLAGS.data_dir = tf.test.get_temp_dir()
    t2t_trainer.main(None)


if __name__ == "__main__":
  tf.test.main()