text-localization-agent

The code to train the agent.

Prerequisites

You need Python 3 (preferably 3.6) installed, as well as the requirements from requirements.txt:

$ pip install -r requirements.txt 

Furthermore, you need to install the text-localization-environment by following its Installation instructions.

Usage

Training an agent requires two files:

  1. A text file in which each line contains the path to one image of the training dataset
  2. A numpy file (.npy) that contains the bounding boxes associated with each image. For n images this file contains a list with n entries where each entry is a list of bounding boxes in the format ((xtopleft, ytopleft), (xbottomright, ybottomright))
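The two input files can be produced and read back with NumPy. The following is a minimal sketch, assuming the box format described above. The file names images.txt and boxes.npy, the helper functions write_dataset/read_dataset and the example data are hypothetical and not part of this repository. Because images can contain different numbers of boxes, the sketch stores them as a 1-D object array, which NumPy pickles and which therefore has to be loaded with allow_pickle=True.

```python
import os
import tempfile

import numpy as np


def write_dataset(directory, image_paths, bounding_boxes):
    """Write an image list (.txt) and a box file (.npy); the file names are hypothetical."""
    if len(image_paths) != len(bounding_boxes):
        raise ValueError("expected exactly one list of boxes per image")
    image_file = os.path.join(directory, "images.txt")
    box_file = os.path.join(directory, "boxes.npy")
    # One image path per line.
    with open(image_file, "w") as f:
        f.write("\n".join(image_paths) + "\n")
    # Images may contain different numbers of boxes, so build a 1-D object array
    # explicitly instead of letting NumPy guess a (possibly multi-dimensional) shape.
    boxes = np.empty(len(bounding_boxes), dtype=object)
    for i, image_boxes in enumerate(bounding_boxes):
        boxes[i] = image_boxes
    np.save(box_file, boxes)
    return image_file, box_file


def read_dataset(image_file, box_file):
    """Read the two files back into a list of paths and a list of per-image box lists."""
    with open(image_file) as f:
        paths = [line.strip() for line in f if line.strip()]
    # Object arrays are stored with pickle, so loading them needs allow_pickle=True.
    boxes = np.load(box_file, allow_pickle=True)
    return paths, list(boxes)


# Hypothetical example data: the first image has two text boxes, the second has one.
# Each box is ((x_topleft, y_topleft), (x_bottomright, y_bottomright)).
IMAGE_PATHS = ["images/0.png", "images/1.png"]
BOUNDING_BOXES = [
    [((10, 20), (110, 60)), ((15, 80), (200, 120))],
    [((5, 5), (50, 30))],
]

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        image_file, box_file = write_dataset(directory, IMAGE_PATHS, BOUNDING_BOXES)
        paths, boxes = read_dataset(image_file, box_file)
        print(paths)
        print(boxes)
```

Building the object array element by element avoids a subtle pitfall: if every image happened to have the same number of boxes, np.array(bounding_boxes) would silently produce a multi-dimensional integer array instead of one entry per image.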

Datasets generated by the dataset generator fulfill these requirements. With these two files you can start training by running the train_agent.py script, which reads its options from a config file. Specify a config file similar to example.ini as follows:

$ python3 train_agent.py --config config.ini
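A script that takes its settings from an INI file via --config can be built from Python's standard argparse and configparser modules. The sketch below is hypothetical and is not the actual code of train_agent.py. The section name [training] and the keys imagefile/boxfile are invented for illustration; check example.ini for the real names.

```python
import argparse
import configparser
import os
import tempfile


def load_config(argv=None):
    """Parse --config and return the INI file as a ConfigParser (hypothetical sketch)."""
    parser = argparse.ArgumentParser(description="Train the agent (sketch)")
    parser.add_argument("--config", required=True, help="path to an .ini file like example.ini")
    args = parser.parse_args(argv)
    config = configparser.ConfigParser()
    # ConfigParser.read() silently skips missing files and returns the list of files it read.
    if not config.read(args.config):
        raise FileNotFoundError(args.config)
    return config


if __name__ == "__main__":
    # Hypothetical section and key names; example.ini defines the real ones.
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "config.ini")
        with open(path, "w") as f:
            f.write("[training]\nimagefile = images.txt\nboxfile = boxes.npy\n")
        config = load_config(["--config", path])
        print(config["training"]["imagefile"])
```

Raising an error on a missing file is a deliberate choice: without the check, a typo in the path would yield an empty configuration and fail much later with a confusing KeyError.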

Overview of the executable Python scripts:

File                        Purpose
iou.py                      Measures the IoUs of an agent over 100 episodes
train_agent.py              Trains the agent
visualize_agent_graph.py    Creates a .dot file of the agent's computational graph
visualize_agent.py          Creates an image sequence of at most 15 steps of an episode
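For reference, the intersection over union (IoU) of two boxes in the ((xtopleft, ytopleft), (xbottomright, ybottomright)) format from the Usage section can be computed as in the sketch below. This is the standard definition, not necessarily the exact code in iou.py. In particular, the sketch treats coordinates as continuous, so a box's width is x2 - x1 with no "+1" inclusive-pixel convention.

```python
def iou(box_a, box_b):
    """Intersection over union of two ((x_tl, y_tl), (x_br, y_br)) boxes."""
    (ax1, ay1), (ax2, ay2) = box_a
    (bx1, by1), (bx2, by2) = box_b
    # Overlap along each axis; clamp at 0 when the boxes do not overlap.
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection
    # Degenerate (zero-area) boxes get an IoU of 0 instead of a division by zero.
    return intersection / union if union > 0 else 0.0


if __name__ == "__main__":
    # Two 10x10 boxes overlapping in a 5x5 square: 25 / (100 + 100 - 25) = 1/7.
    print(iou(((0, 0), (10, 10)), ((5, 5), (15, 15))))
```

An IoU of 1 means the predicted box matches the ground truth exactly, and 0 means the boxes do not overlap at all.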

TensorBoard

If you would like the program to generate log-files appropriate for visualization in TensorBoard, you need to:

  • Install tensorflow
    $ pip install tensorflow
    (If you use Python 3.7 and the installation fails, use pip install --upgrade https://storage.googleapis.com/tensorflow/mac/cpu/tensorflow-1.12.0-py3-none-any.whl (a macOS CPU build of TensorFlow 1.12.0) instead. See here for why.)
  • Run the text-localization-agent program with the --tensorboard flag
    $ python train_agent.py --tensorboard --imagefile … --boxfile …
  • Start TensorBoard pointing to the tensorboard/ directory inside the text-localization-agent project
    $ tensorboard --logdir=<path to text-localization-agent>/tensorboard/
    …
    TensorBoard 1.12.0 at <link to TensorBoard UI> (Press CTRL+C to quit)
  • Open the TensorBoard UI via the link that is provided when the tensorboard program is started (usually: http://localhost:6006)
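If TensorBoard shows no data, it helps to check whether training actually wrote any log files. The sketch below assumes the logs land in the tensorboard/ directory described above and use TensorFlow's usual events.out.tfevents.* file naming, which is what TensorBoard reads. The helper name find_event_files is hypothetical.

```python
import glob
import os


def find_event_files(log_dir="tensorboard"):
    """Return TensorBoard event files below log_dir, newest first."""
    # With recursive=True, "**" matches zero or more directories, so files
    # directly in log_dir and in any run subdirectory are both found.
    pattern = os.path.join(log_dir, "**", "events.out.tfevents.*")
    return sorted(glob.glob(pattern, recursive=True), key=os.path.getmtime, reverse=True)


if __name__ == "__main__":
    files = find_event_files()
    if files:
        print("Newest event file:", files[0])
    else:
        print("No event files yet; was training started with --tensorboard?")
```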

Training on the chair's servers

To run the training on one of the chair's servers you need to:

  • Clone the necessary repositories
  • Create a new virtual environment. Note that everything requires Python 3.6 or newer. The server's default Python may be older; in that case, pass the correct interpreter to virtualenv via the -p parameter, for example
    $ virtualenv -p python3.6 <envname>
    (If neither Python 3.6 nor 3.7 is installed, you are out of luck, because we don't have sudo access.)
  • Activate the environment via
    $ source <envname>/bin/activate
  • Install the required packages (see section "Prerequisites"). Don't forget cupy, tb_chainer and tensorflow!
  • Prepare the training data (either generate it using the dataset-generator or transfer existing data on the server)
  • To keep the training running after you disconnect from the server, you might want to use a terminal multiplexer such as tmux or screen
  • Set the CUDA_PATH and LD_LIBRARY_PATH variables if they are not already set. The commands should look something like
    $ export CUDA_PATH=/usr/local/cuda
    $ export LD_LIBRARY_PATH=$CUDA_PATH/lib64:$LD_LIBRARY_PATH
  • Download the ResNet-50 caffemodel (see link), since it is not downloaded automatically, and save it in the expected location. If the file is missing, creating a TextLocEnv raises an error that tells you where it should go.
  • Start training!
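Before starting a long training run, a quick sanity check of the setup steps above can save time. The following is a minimal, hypothetical helper (check_environment is not part of the repository). It checks the Python version, the CUDA_PATH and LD_LIBRARY_PATH variables as set by the export commands above, and whether cupy, tb_chainer and tensorflow can be imported under those names. Paths are split with os.pathsep, which is ":" on the Linux servers.

```python
import importlib.util
import os
import sys

REQUIRED_MODULES = ("cupy", "tb_chainer", "tensorflow")


def check_environment(env=None, version_info=None, modules=REQUIRED_MODULES):
    """Return a list of setup problems; an empty list means nothing obvious is missing."""
    env = os.environ if env is None else env
    version = tuple(sys.version_info[:2] if version_info is None else version_info[:2])
    problems = []
    if version < (3, 6):
        problems.append("Python %d.%d found, but at least 3.6 is required" % version)
    cuda_path = env.get("CUDA_PATH")
    if not cuda_path:
        problems.append("CUDA_PATH is not set")
    else:
        # The export commands put $CUDA_PATH/lib64 on LD_LIBRARY_PATH.
        lib_dir = os.path.normpath(os.path.join(cuda_path, "lib64"))
        entries = {
            os.path.normpath(p)
            for p in env.get("LD_LIBRARY_PATH", "").split(os.pathsep)
            if p
        }
        if lib_dir not in entries:
            problems.append("LD_LIBRARY_PATH does not contain " + lib_dir)
    for name in modules:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is None:
            problems.append("Python package '%s' is not installed" % name)
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

Run it inside the activated virtual environment; no output means none of these checks failed. It does not verify the ResNet-50 caffemodel or the training data.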

These instructions assume you are starting from scratch. If, for example, a suitable virtual environment already exists, you obviously don't need to create a new one.