TensorFlow implementation of the paper "CUTIE: Learning to Understand Documents with Convolutional Universal Text Information Extractor" (Xiaohui Zhao, arXiv 2019).
CUTIE: Learning to Understand Documents with Convolutional Universal Text Information Extractor
This paper proposes a learning-based key information extraction method that requires limited human resources. It combines the semantic meaning and the spatial distribution of texts in documents. The proposed model applies convolutional neural networks to gridded texts, where texts are embedded as features with semantic connotations.
The proposed model tackles the key information extraction problem in two steps:
- First, gridded texts are created with the proposed grid positional mapping method. To generate the grid data for the convolutional neural network, the scanned document image is processed by an OCR engine to acquire the texts and their absolute/relative positions. The texts are then mapped from the original scanned document image to the target grid, such that the mapped grid preserves the original spatial relationships among texts yet is more suitable as input for the convolutional neural network.
- Then, the CUTIE model is applied to the gridded texts. The rich semantic information is encoded from the gridded texts at the very first stage of the convolutional neural network by a word embedding layer.
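
The grid positional mapping step can be illustrated with a minimal sketch. This is not the repository's implementation: the function name `map_to_grid`, the `(text, x_min, y_min, x_max, y_max)` word format, and the collision rule (shift right to the next free cell) are all assumptions made for illustration. It only shows the general idea of scaling OCR box centers onto a small rows x cols grid while keeping the relative layout.

```python
from typing import List, Optional, Tuple

# Hypothetical OCR word format: (text, x_min, y_min, x_max, y_max) in pixels.
Word = Tuple[str, float, float, float, float]


def map_to_grid(
    words: List[Word],
    image_width: float,
    image_height: float,
    rows: int,
    cols: int,
) -> List[List[Optional[str]]]:
    """Place each OCR word into a rows x cols grid by its box center."""
    placed = []
    for text, x0, y0, x1, y1 in words:
        cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
        # Scale the center into grid coordinates and clamp to the last cell.
        row = min(rows - 1, int(cy / image_height * rows))
        col = min(cols - 1, int(cx / image_width * cols))
        placed.append((row, cx, col, text))

    # Process words top-to-bottom, left-to-right so shifts keep reading order.
    placed.sort(key=lambda p: (p[0], p[1]))

    grid: List[List[Optional[str]]] = [[None] * cols for _ in range(rows)]
    for row, _cx, col, text in placed:
        # Assumed collision rule: move right until a free cell is found.
        while col < cols and grid[row][col] is not None:
            col += 1
        if col == cols:
            raise ValueError("grid too small for this row; increase cols")
        grid[row][col] = text
    return grid


# Toy receipt: a header word and a "TOTAL 12.50" line on a 400x400 image.
words = [
    ("TOTAL", 40, 300, 120, 320),
    ("12.50", 300, 300, 360, 320),
    ("TAXI", 150, 20, 250, 50),
]
grid = map_to_grid(words, image_width=400, image_height=400, rows=4, cols=4)
for grid_row in grid:
    print(grid_row)
```

In a real pipeline each cell would hold a vocabulary index instead of a string, so that the grid can be fed to the word embedding layer in front of the convolutional network.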
Source: Nanonets
pip install -r requirements.txt
- Run clovaai_api.py for OCR on the training image dataset.
- Use textbox_generation.py to convert the OCR JSON file to a model-compatible dataset.
- Add the remaining invoice fields using add_remianing.py.
- Open dataset_creater.html in a browser to annotate the invoice fields.
- Create a new vocabulary for your dataset using create_vocab.py.
- Generate your own dictionary with main_build_dict.py / main_data_tokenizer.py
- Train your model with main_train_json.py
CUTIE achieves its best performance when the grid rows/cols are well configured. For more insights, refer to the statistics in others/TrainingStatistic.xlsx.
Results are evaluated on 4,484 receipt documents, including taxi receipts, meals entertainment receipts, and hotel receipts, with 9 different key information classes. Scores are reported as AP / softAP.
Method | #Params | Taxi | Hotel |
---|---|---|---|
CloudScan | - | 82.0 / - | 60.0 / - |
BERT | 110M | 88.1 / - | 71.7 / - |
CUTIE | 14M | 94.0 / 97.3 | 74.6 / 87.0 |
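
The README does not define AP and softAP, so the sketch below rests on an assumption about how the two scores differ: under the strict score a class counts as correct only when the predicted tokens match the ground truth exactly, while under the soft score extra false positives are tolerated as long as every ground-truth token is found. The function names and the dictionary-of-token-index-sets data format are hypothetical, and this is not the evaluation code used to produce the table above.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical format: class name -> set of token indices labeled with it.
Labels = Dict[str, Set[int]]


def class_hits(predicted: Labels, truth: Labels) -> Dict[str, Tuple[bool, bool]]:
    """Return (strict_hit, soft_hit) for every class in the ground truth."""
    hits = {}
    for name, true_tokens in truth.items():
        pred_tokens = predicted.get(name, set())
        # Strict: no false positives and no false negatives.
        strict = pred_tokens == true_tokens
        # Soft (assumed): all true tokens found, extra predictions allowed.
        soft = true_tokens <= pred_tokens
        hits[name] = (strict, soft)
    return hits


def average_scores(documents: Iterable[Tuple[Labels, Labels]]) -> Tuple[float, float]:
    """Average strict and soft hit rates over all (predicted, truth) pairs."""
    strict_total = soft_total = count = 0
    for predicted, truth in documents:
        for strict, soft in class_hits(predicted, truth).values():
            strict_total += strict
            soft_total += soft
            count += 1
    return strict_total / count, soft_total / count


documents = [
    # "total" has one extra predicted token: soft hit only. "date" is exact.
    ({"total": {5, 6}, "date": {1, 2}}, {"total": {5}, "date": {1, 2}}),
    # "total" is missed entirely: neither strict nor soft hit.
    ({"total": set()}, {"total": {3}}),
]
ap, soft_ap = average_scores(documents)
print(f"AP={ap:.3f} softAP={soft_ap:.3f}")
```

Under this reading the soft score can never be lower than the strict one, which is consistent with the CUTIE row in the table.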