
CRNN

An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition

1. Introduction

Convolutional Recurrent Neural Network (CRNN) integrates CNN feature extraction and RNN sequence modeling as well as transcription into a unified framework.

As shown in the architecture diagram (Figure 1), CRNN first extracts a feature sequence from the input image via convolutional layers. The image is then represented by a sequence of extracted feature vectors, where each vector is associated with a receptive field on the input image. To further process the features, CRNN adopts recurrent layers to predict a label distribution for each frame. To map these distributions to text, CRNN adds a transcription layer that translates the per-frame predictions into the final label sequence. [1]

Figure 1. Architecture of CRNN [1]
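
The transcription step described above can be illustrated with a minimal sketch of greedy CTC-style decoding: take the most likely class per frame, merge consecutive repeats, and drop the blank class. This is not MindOCR's implementation. The helper names (`ctc_greedy_decode`, `one_hot`) are hypothetical, and placing the blank at the index after the last character is an assumption made for this example.

```python
from itertools import groupby

# Default CRNN dictionary from this README; indices 0..35 map to characters.
CHARSET = "0123456789abcdefghijklmnopqrstuvwxyz"
# Assumption for this sketch: the blank class uses the extra index after the last character.
BLANK = len(CHARSET)


def ctc_greedy_decode(frame_probs):
    """Collapse per-frame label distributions into a string (greedy decoding)."""
    # 1. Pick the most likely class index for each frame.
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    # 2. Merge consecutive repeats of the same class.
    collapsed = [k for k, _ in groupby(best)]
    # 3. Drop blanks and map the remaining indices to characters.
    return "".join(CHARSET[k] for k in collapsed if k != BLANK)


def one_hot(index, size=len(CHARSET) + 1):
    """Demo helper (hypothetical): a distribution with all mass on one class."""
    return [1.0 if i == index else 0.0 for i in range(size)]


# Seven frames predicting: c, c, blank, a, blank, t, t
frames = [one_hot(i) for i in (12, 12, BLANK, 10, BLANK, 29, 29)]
decoded = ctc_greedy_decode(frames)
print(decoded)
```

The blank class is what lets the decoder keep genuinely repeated characters: two `a` frames separated by a blank decode to `aa`, while two adjacent `a` frames collapse to a single `a`.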

2. Results

Training Perf.

According to our experiments, the training (following the steps in Model Training) performance and evaluation (following the steps in Model Evaluation) accuracy are as follows:

Performance tested on Ascend 910 in graph mode

Model	Device Card	Backbone	Train Dataset	Model Params	Batch size per card	Graph train 8P (s/epoch)	Graph train 8P (ms/step)	Graph train 8P (FPS)	Avg Eval Accuracy	Recipe	Download
CRNN	8P	VGG7	MJ+ST	8.72 M	16	2488.82	22.06	5802.71	82.03%	yaml	ckpt \| mindir
CRNN	8P	ResNet34_vd	MJ+ST	24.48 M	64	2157.18	76.48	6694.84	84.45%	yaml	ckpt \| mindir

Detailed accuracy results for each benchmark dataset (IC03, IC13, IC15, IIIT, SVT, SVTP, CUTE):

Model	Backbone	IC03_860	IC03_867	IC13_857	IC13_1015	IC15_1811	IC15_2077	IIIT5k_3000	SVT	SVTP	CUTE80	Average
CRNN	VGG7	94.53%	94.00%	92.18%	90.74%	71.95%	66.06%	84.10%	83.93%	73.33%	69.44%	82.03%
CRNN	ResNet34_vd	94.42%	94.23%	93.35%	92.02%	75.92%	70.15%	87.73%	86.40%	76.28%	73.96%	84.45%

Performance tested on Ascend 910* in graph mode

Model	Device Card	Backbone	Train Dataset	Model Params	Batch size per card	Graph train 8P (s/epoch)	Graph train 8P (ms/step)	Graph train 8P (FPS)	Avg Eval Accuracy	Recipe	Download
CRNN	8P	VGG7	MJ+ST	8.72 M	16	2488.82	14.76	8672.09	81.31%	yaml	ckpt
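
The FPS column in the training tables appears to be consistent with the global batch size divided by the step time. This relation is inferred from the reported numbers rather than stated by the authors, so treat the sketch below as a sanity check only. The function name `throughput_fps` is hypothetical.

```python
def throughput_fps(batch_size_per_card, num_cards, ms_per_step):
    """Images per second, assuming FPS = global batch size / step time in seconds."""
    global_batch = batch_size_per_card * num_cards
    return global_batch / (ms_per_step / 1000.0)


# VGG7 on Ascend 910*: batch size 16 per card, 8 cards, 14.76 ms/step.
fps_910_star = throughput_fps(16, 8, 14.76)
print(round(fps_910_star, 2))
```

For the Ascend 910 rows the same formula gives values within about 0.01% of the reported FPS; the small gap is plausibly due to the ms/step figures being rounded.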

Inference Perf.

The inference performance is tested on MindSpore Lite. Please refer to MindSpore Lite Inference for more details.

Device	Env	Model	Backbone	Params	Test Dataset	Batch size	Graph infer 1P (FPS)
Ascend310P	Lite2.0	CRNN	ResNet34_vd	24.48 M	IC15	1	361.09
Ascend310P	Lite2.0	CRNN	ResNet34_vd	24.48 M	SVT	1	274.67

Notes:

To reproduce the results in other contexts, please ensure the global batch size is the same.
The characters supported by the model are lowercase English letters from a to z and digits from 0 to 9. For more details on the dictionary, please refer to 4. Character Dictionary.
The models are trained from scratch without any pre-training. For more details on the training and evaluation datasets, please refer to the Dataset Download & Dataset Usage section.
The input shapes of the MindIR files of CRNN_VGG7 and CRNN_ResNet34_vd are both (1, 3, 32, 100).

3. Quick Start

3.1 Preparation

3.1.1 Installation

Please refer to the installation instruction in MindOCR.

3.1.2 Dataset Download

Please download the LMDB dataset for training and evaluation from here (ref: deep-text-recognition-benchmark). There are several zip files:

data_lmdb_release.zip contains the entire datasets including training data, validation data and evaluation data.
- training/ contains two datasets: MJSynth (MJ) and SynthText (ST)
- validation/ is the union of the training sets of IC13, IC15, IIIT, and SVT.
- evaluation/ contains several benchmarking datasets, which are IIIT, SVT, IC03, IC13, IC15, SVTP, and CUTE.
validation.zip: same as the validation/ within data_lmdb_release.zip
evaluation.zip: same as the evaluation/ within data_lmdb_release.zip

After unzipping data_lmdb_release.zip, the data structure should look like this:

data_lmdb_release/
├── evaluation
│   ├── CUTE80
│   │   ├── data.mdb
│   │   └── lock.mdb
│   ├── IC03_860
│   │   ├── data.mdb
│   │   └── lock.mdb
│   ├── IC03_867
│   │   ├── data.mdb
│   │   └── lock.mdb
│   ├── IC13_1015
│   │   ├── data.mdb
│   │   └── lock.mdb
│   ├── ...
├── training
│   ├── MJ
│   │   ├── MJ_test
│   │   │   ├── data.mdb
│   │   │   └── lock.mdb
│   │   ├── MJ_train
│   │   │   ├── data.mdb
│   │   │   └── lock.mdb
│   │   └── MJ_valid
│   │       ├── data.mdb
│   │       └── lock.mdb
│   └── ST
│       ├── data.mdb
│       └── lock.mdb
└── validation
    ├── data.mdb
    └── lock.mdb
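
To confirm that the unzipped layout matches the tree above, a small standard-library sketch can list every folder that contains a `data.mdb` file. The function name `find_lmdb_dirs` is hypothetical, and the demo builds a throwaway directory that mimics only part of the real layout; point the function at your own `data_lmdb_release/` path in practice.

```python
from pathlib import Path
import tempfile


def find_lmdb_dirs(root):
    """Return sorted relative paths of folders under root that contain data.mdb."""
    root = Path(root)
    return sorted(
        p.parent.relative_to(root).as_posix() for p in root.rglob("data.mdb")
    )


# Demo on a temporary tree that mimics part of the layout shown above.
with tempfile.TemporaryDirectory() as tmp:
    release = Path(tmp) / "data_lmdb_release"
    for sub in ("evaluation/CUTE80", "training/MJ/MJ_train", "training/ST", "validation"):
        folder = release / sub
        folder.mkdir(parents=True)
        (folder / "data.mdb").touch()
        (folder / "lock.mdb").touch()
    found = find_lmdb_dirs(release)

print(found)
```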

3.1.3 Dataset Usage

Here we used the datasets under the training/ folder for training, and the union dataset validation/ for validation. After training, we used the datasets under evaluation/ to evaluate model accuracy.

Training: (total 14,442,049 samples)

MJSynth (MJ)
- Train: 21.2 GB, 7224586 samples
- Valid: 2.36 GB, 802731 samples
- Test: 2.61 GB, 891924 samples
SynthText (ST)
- Train: 16.0 GB, 5522808 samples

Validation:

Valid: 138 MB, 6992 samples

Evaluation: (total 12,067 samples)

CUTE80: 8.8 MB, 288 samples
IC03_860: 36 MB, 860 samples
IC03_867: 4.9 MB, 867 samples
IC13_857: 72 MB, 857 samples
IC13_1015: 77 MB, 1015 samples
IC15_1811: 21 MB, 1811 samples
IC15_2077: 25 MB, 2077 samples
IIIT5k_3000: 50 MB, 3000 samples
SVT: 2.4 MB, 647 samples
SVTP: 1.8 MB, 645 samples

Data configuration for model training

To reproduce the training of model, it is recommended that you modify the configuration yaml as follows:

...
train:
  ...
  dataset:
    type: LMDBDataset
    dataset_root: dir/to/data_lmdb_release/                           # Root dir of training dataset
    data_dir: training/                                               # Dir of training dataset, concatenated with `dataset_root` to be the complete dir of training dataset
    # label_file:                                                     # Path of training label file, concatenated with `dataset_root` to be the complete path of training label file, not required when using LMDBDataset
...
eval:
  dataset:
    type: LMDBDataset
    dataset_root: dir/to/data_lmdb_release/                           # Root dir of validation dataset
    data_dir: validation/                                             # Dir of validation dataset, concatenated with `dataset_root` to be the complete dir of validation dataset
    # label_file:                                                     # Path of validation label file, concatenated with `dataset_root` to be the complete path of validation label file, not required when using LMDBDataset
  ...

Data configuration for model evaluation

We use the dataset under evaluation/ as the benchmark dataset. On each individual dataset (e.g. CUTE80, IC03_860, etc.), we perform a full evaluation by setting the dataset's directory to the evaluation dataset. This way, we get a list of the corresponding accuracies for each dataset, and then the reported accuracies are the average of these values.

To reproduce the reported evaluation results, you can:

Option 1: Repeat the evaluation step for each individual dataset: CUTE80, IC03_860, IC03_867, IC13_857, IC13_1015, IC15_1811, IC15_2077, IIIT5k_3000, SVT, SVTP. Then take the average score.
Option 2: Put all the benchmark dataset folders under the same directory, e.g. evaluation/. Modify the eval.dataset.data_dir in the config yaml accordingly. Then execute the script tools/benchmarking/multi_dataset_eval.py.
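
The averaging in Option 1 is a plain unweighted mean over the ten benchmark datasets. As a quick check, the sketch below recomputes the reported average for the VGG7 backbone from the per-dataset accuracies listed in Section 2. The variable and function names are illustrative only.

```python
# Per-dataset accuracies (%) for CRNN with the VGG7 backbone, copied from Section 2.
vgg7_acc = {
    "IC03_860": 94.53,
    "IC03_867": 94.00,
    "IC13_857": 92.18,
    "IC13_1015": 90.74,
    "IC15_1811": 71.95,
    "IC15_2077": 66.06,
    "IIIT5k_3000": 84.10,
    "SVT": 83.93,
    "SVTP": 73.33,
    "CUTE80": 69.44,
}


def average_accuracy(scores):
    """Unweighted mean over datasets: every dataset counts equally, regardless of size."""
    return sum(scores.values()) / len(scores)


avg = average_accuracy(vgg7_acc)
print(f"{avg:.2f}%")
```

Because the mean is unweighted, small datasets such as CUTE80 (288 samples) influence the average as much as IIIT5k_3000 (3000 samples).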

Evaluate on one specific dataset

For example, you can evaluate the model on dataset CUTE80 by modifying the config yaml as follows:

...
train:
  # NO NEED TO CHANGE ANYTHING IN TRAIN SINCE IT IS NOT USED
...
eval:
  dataset:
    type: LMDBDataset
    dataset_root: dir/to/data_lmdb_release/                           # Root dir of evaluation dataset
    data_dir: evaluation/CUTE80/                                      # Dir of evaluation dataset, concatenated with `dataset_root` to be the complete dir of evaluation dataset
    # label_file:                                                     # Path of evaluation label file, concatenated with `dataset_root` to be the complete path of evaluation label file, not required when using LMDBDataset
  ...

By running tools/eval.py as noted in section Model Evaluation with the above config yaml, you can get the accuracy performance on dataset CUTE80.

Evaluate on multiple datasets under the same folder

Assume you have put all benchmark datasets under evaluation/ as shown below:

data_lmdb_release/
├── evaluation
│   ├── CUTE80
│   │   ├── data.mdb
│   │   └── lock.mdb
│   ├── IC03_860
│   │   ├── data.mdb
│   │   └── lock.mdb
│   ├── IC03_867
│   │   ├── data.mdb
│   │   └── lock.mdb
│   ├── IC13_1015
│   │   ├── data.mdb
│   │   └── lock.mdb
│   ├── ...

then you can evaluate on each dataset by modifying the config yaml as follows, and execute the script tools/benchmarking/multi_dataset_eval.py.

...
train:
  # NO NEED TO CHANGE ANYTHING IN TRAIN SINCE IT IS NOT USED
...
eval:
  dataset:
    type: LMDBDataset
    dataset_root: dir/to/data_lmdb_release/                           # Root dir of evaluation dataset
    data_dir: evaluation/                                             # Dir of evaluation dataset, concatenated with `dataset_root` to be the complete dir of evaluation dataset
    # label_file:                                                     # Path of evaluation label file, concatenated with `dataset_root` to be the complete path of evaluation label file, not required when using LMDBDataset
  ...

3.1.4 Check YAML Config Files

Apart from the dataset setting, please also check the following important args: system.distribute, system.val_while_train, common.batch_size, train.ckpt_save_dir, train.dataset.dataset_root, train.dataset.data_dir, train.dataset.label_file, eval.ckpt_load_path, eval.dataset.dataset_root, eval.dataset.data_dir, eval.dataset.label_file, eval.loader.batch_size. Explanations of these important args:

system:
  distribute: True                                                    # `True` for distributed training, `False` for standalone training
  amp_level: 'O3'
  seed: 42
  val_while_train: True                                               # Validate while training
  drop_overflow_update: False
common:
  ...
  batch_size: &batch_size 64                                          # Batch size for training
...
train:
  ckpt_save_dir: './tmp_rec'                                          # The training result (including checkpoints, per-epoch performance and curves) saving directory
  dataset_sink_mode: False
  dataset:
    type: LMDBDataset
    dataset_root: dir/to/data_lmdb_release/                           # Root dir of training dataset
    data_dir: training/                                               # Dir of training dataset, concatenated with `dataset_root` to be the complete dir of training dataset
    # label_file:                                                     # Path of training label file, concatenated with `dataset_root` to be the complete path of training label file, not required when using LMDBDataset
...
eval:
  ckpt_load_path: './tmp_rec/best.ckpt'                               # checkpoint file path
  dataset_sink_mode: False
  dataset:
    type: LMDBDataset
    dataset_root: dir/to/data_lmdb_release/                           # Root dir of validation/evaluation dataset
    data_dir: validation/                                             # Dir of validation/evaluation dataset, concatenated with `dataset_root` to be the complete dir of validation/evaluation dataset
    # label_file:                                                     # Path of validation/evaluation label file, concatenated with `dataset_root` to be the complete path of validation/evaluation label file, not required when using LMDBDataset
  ...
  loader:
      shuffle: False
      batch_size: 64                                                  # Batch size for validation/evaluation
...

Notes:

As the global batch size (batch_size x num_devices) is important for reproducing the result, please adjust batch_size accordingly to keep the global batch size unchanged for a different number of NPUs, or adjust the learning rate linearly to a new global batch size.
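
The note above can be made concrete with a little arithmetic. The sketch below shows both options: keeping the global batch size fixed when the number of NPUs changes, or scaling the learning rate linearly with a new global batch size. The function names and the base learning rate of 0.001 are hypothetical placeholders, not values taken from the recipe.

```python
def per_card_batch_size(global_batch_size, num_devices):
    """Per-card batch size that keeps the global batch size unchanged."""
    if global_batch_size % num_devices != 0:
        raise ValueError("global batch size must be divisible by the number of devices")
    return global_batch_size // num_devices


def scaled_learning_rate(base_lr, base_global_batch, new_global_batch):
    """Linear scaling rule: learning rate proportional to the global batch size."""
    return base_lr * new_global_batch / base_global_batch


# Reference setup for ResNet34_vd in this README: batch size 64 per card on 8 cards.
reference_global = 64 * 8

# Option A: train on 4 cards and keep the global batch size at 512.
batch_on_4_cards = per_card_batch_size(reference_global, 4)

# Option B: keep 64 per card on 4 cards (global 256) and scale the learning rate.
# base_lr=0.001 is a placeholder; use the value from your yaml recipe.
new_lr = scaled_learning_rate(0.001, reference_global, 64 * 4)

print(batch_on_4_cards, new_lr)
```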

3.2 Model Training

Distributed Training

It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please set the configuration parameter system.distribute to True and run:

# distributed training on multiple Ascend devices
mpirun --allow-run-as-root -n 8 python tools/train.py --config configs/rec/crnn/crnn_resnet34.yaml

Standalone Training

If you want to train or finetune the model on a smaller dataset without distributed training, please set the configuration parameter system.distribute to False and run:

# standalone training on a CPU/Ascend device
python tools/train.py --config configs/rec/crnn/crnn_resnet34.yaml

The training results (including checkpoints, per-epoch performance and curves) will be saved in the directory specified by the arg train.ckpt_save_dir. The default directory is ./tmp_rec.

3.3 Model Evaluation

To evaluate the accuracy of the trained model, you can use eval.py. Please set the checkpoint path to the arg eval.ckpt_load_path in the yaml config file, set the evaluation dataset path to the arg eval.dataset.data_dir, set system.distribute to be False, and then run:

python tools/eval.py --config configs/rec/crnn/crnn_resnet34.yaml

Similarly, the accuracy of the trained model can be evaluated on multiple evaluation datasets by properly setting the args eval.ckpt_load_path, eval.dataset.data_dir, and system.distribute in the yaml config file, and then running:

python tools/benchmarking/multi_dataset_eval.py --config configs/rec/crnn/crnn_resnet34.yaml

4. Character Dictionary

Default Setting

To transform the ground-truth text into label IDs, we have to provide a character dictionary where keys are characters and values are IDs. By default, the dictionary is "0123456789abcdefghijklmnopqrstuvwxyz", which means id=0 corresponds to the character "0". In this case, the dictionary only considers digits and lowercase English letters, excluding spaces.
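
As a minimal sketch of this mapping, the snippet below builds the default dictionary and encodes a label into IDs. It is not MindOCR's label encoder: the function name `encode_label` is hypothetical, and silently skipping characters that are missing from the dictionary is an assumption made for this example.

```python
# Default dictionary from this section: the position in the string is the ID.
CHARSET = "0123456789abcdefghijklmnopqrstuvwxyz"
char_to_id = {ch: idx for idx, ch in enumerate(CHARSET)}


def encode_label(text, lower=True):
    """Map ground-truth text to a list of label IDs."""
    if lower:
        # Fold to lowercase, since the default dictionary has no uppercase letters.
        text = text.lower()
    # Assumption: characters outside the dictionary (e.g. spaces) are skipped.
    return [char_to_id[ch] for ch in text if ch in char_to_id]


ids = encode_label("Hello 42")
print(ids)
```

With `lower=False`, the uppercase `H` would not be found in the default dictionary and would be dropped under the same assumption.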

Built-in Dictionaries

There are some built-in dictionaries, which are placed in mindocr/utils/dict/, and you can choose the appropriate dictionary to use.

en_dict.txt is an English dictionary containing 94 characters, including numbers, common symbols, and uppercase and lowercase English letters.
ch_dict.txt is a Chinese dictionary containing 6623 characters, including commonly used simplified and traditional Chinese, numbers, common symbols, uppercase and lowercase English letters.

Customized Dictionary

You can also create a customized dictionary file (***.txt) and place it under mindocr/utils/dict/. The dictionary file should be a .txt file with one character per line.

To use a specific dictionary, set the parameter common.character_dict_path to the path of the dictionary, and change the parameter common.num_classes to the corresponding number, which is the number of characters in the dictionary + 1.
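
The two steps above (one character per line, then num_classes = number of characters + 1) can be sketched as follows. The file name `hex_dict.txt` and both helper functions are hypothetical, and the demo writes to a temporary directory rather than mindocr/utils/dict/.

```python
from pathlib import Path
import tempfile


def write_dict_file(path, characters):
    """Write a dictionary file with one character per line."""
    Path(path).write_text("\n".join(characters) + "\n", encoding="utf-8")


def num_classes_for(path):
    """Value for common.num_classes: number of dictionary characters + 1."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    chars = [line for line in lines if line != ""]
    return len(chars) + 1


# Demo: a hypothetical 16-character hexadecimal dictionary.
with tempfile.TemporaryDirectory() as tmp:
    dict_path = Path(tmp) / "hex_dict.txt"
    write_dict_file(dict_path, "0123456789abcdef")
    n_classes = num_classes_for(dict_path)

print(n_classes)
```

The resulting value would go into common.num_classes, with common.character_dict_path pointing at the new file.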

Notes:

You can include the space character by setting the parameter common.use_space_char in configuration yaml to True.
Remember to check the value of dataset->transform_pipeline->RecCTCLabelEncode->lower in the configuration yaml. Set it to False if you prefer case-sensitive encoding.

5. Chinese Text Recognition Model Training

Currently, this model supports multilingual recognition and provides pre-trained models for different languages. Details are as follows:

Chinese Dataset Preparation and Configuration

We use a public Chinese text benchmark dataset Benchmarking-Chinese-Text-Recognition for CRNN training and evaluation.

For detailed instructions on data preparation and yaml configuration, please refer to ch_dataset.

Training

To train with the prepared datasets and config file, please run:

mpirun --allow-run-as-root -n 4 python tools/train.py --config configs/rec/crnn/crnn_resnet34_ch.yaml

Results and Pretrained Weights

After training, evaluation results on the benchmark test set are as follows, where we also provide the model config and pretrained weights.

Model	Language	Context	Backbone	Scene	Web	Document	Train T.	FPS	Recipe	Download
CRNN	Chinese	D910x4-MS1.10-G	ResNet34_vd	60.45%	65.95%	97.68%	647 s/epoch	1180	crnn_resnet34_ch.yaml	ckpt \| mindir

Notes:

The input shape for exported MindIR file in the download link is (1, 3, 32, 320).

Training with Custom Datasets

You can train models for different languages with your own custom datasets. Loading the pretrained Chinese model to finetune on your own dataset usually yields better results than training from scratch. Please refer to the tutorial Training Recognition Network with Custom Datasets.

6. MindSpore Lite Inference

To run inference with MindSpore Lite on Ascend 310, please refer to the tutorial MindOCR Inference. In short, the whole process consists of the following steps:

1. Model Export

Please download the exported MindIR file first, or refer to the Model Export tutorial and use the following command to export the trained ckpt model to MindIR file:

python tools/export.py --model_name_or_config crnn_resnet34 --data_shape 32 100 --local_ckpt_path /path/to/local_ckpt.ckpt
# or
python tools/export.py --model_name_or_config configs/rec/crnn/crnn_resnet34.yaml --data_shape 32 100 --local_ckpt_path /path/to/local_ckpt.ckpt

The data_shape is the height and width of the model input for the MindIR file. The shape of the MindIR file in the download link can be found in the Notes under the results table.

2. Environment Installation

Please refer to Environment Installation tutorial to configure the MindSpore Lite inference environment.

3. Model Conversion

Please refer to Model Conversion, and use the converter_lite tool for offline conversion of the MindIR file.

4. Inference

Assuming that you obtain output.mindir after model conversion, go to the deploy/py_infer directory, and use the following command for inference:

python infer.py \
    --input_images_dir=/your_path_to/test_images \
    --rec_model_path=your_path_to/output.mindir \
    --rec_model_name_or_config=../../configs/rec/crnn/crnn_resnet34.yaml \
    --res_save_dir=results_dir

References

[1] Baoguang Shi, Xiang Bai, Cong Yao. An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition. arXiv preprint arXiv:1507.05717, 2015.
