How to do incremental-training on tesseract-ocr? #391
Comments
Are you using very old instructions (old Tesseract release, old repository URL, ...)?
@stweil Thank you for your response. Yes, I'm using tesseract-4.1.1 and the old repository. The first training run is working fine with
@stweil I just want to know how I can do incremental training on my existing trained model. What steps should I follow?
What about reading Tesseract documentation and Readme of this repository?
@zaryabRiasat, the first step is using a recent software release instead of an old one and also reading the current documentation.
I'm working with tesseract-4.1.1 and trying to do training (fine-tuning). For this I followed these steps:
1. Downloaded eng.traineddata from tessdata_best and pasted it into /usr/share/tesseract-ocr/4.00/tessdata.
2. Created image crops using craft-text-detector in Python and made a ground-truth file (.gt.txt) for each image crop.
3. Cloned the repository with git clone https://github.com/tesseract-ocr/ocrd-train.git and then ran cd ocrd-train.
4. Inside the ocrd-train/data folder, created a my-model-ground-truth folder and pasted the .png and .gt.txt files into it.
5. Ran this command in the terminal:
make tesseract-langdata
6. Finally, ran this command:
make training MODEL_NAME=my-model MAX_ITERATIONS=20000 PSM=7 FINETUNE_TYPE=Impact DEBUG_INTERVAL=-1 START_MODEL=eng TESSDATA=/usr/share/tesseract-ocr/4.00/tessdata/
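The ground-truth step pairs each .png crop with a .gt.txt file of the same base name, so a quick pairing check before running make can catch a missing transcription early. The following is a minimal sketch only: check_ground_truth is a hypothetical helper and is not part of ocrd-train, and it assumes the .png/.gt.txt naming used in the steps above.

```shell
#!/bin/sh
# Hypothetical helper (not part of ocrd-train): report every .png without a
# matching .gt.txt, and every .gt.txt without a matching .png.
# Usage: check_ground_truth DIR
check_ground_truth() {
    dir="$1"
    missing=0
    # Every image needs a transcription with the same base name.
    for img in "$dir"/*.png; do
        [ -e "$img" ] || continue   # glob matched nothing
        gt="${img%.png}.gt.txt"
        if [ ! -f "$gt" ]; then
            echo "missing transcription: $gt"
            missing=$((missing + 1))
        fi
    done
    # Every transcription needs an image with the same base name.
    for gt in "$dir"/*.gt.txt; do
        [ -e "$gt" ] || continue
        img="${gt%.gt.txt}.png"
        if [ ! -f "$img" ]; then
            echo "missing image: $img"
            missing=$((missing + 1))
        fi
    done
    echo "unpaired files: $missing"
    # Succeed only when every file has its partner.
    [ "$missing" -eq 0 ]
}
```

From inside ocrd-train, it could be called as check_ground_truth data/my-model-ground-truth; a non-zero exit status means at least one file has no partner.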
The training took some time and produced a my-model.traineddata file in ocrd-train/data/. I pasted that file into /usr/share/tesseract-ocr/4.00/tessdata, and it gives better results than eng.traineddata.
I used 20 images for that training, and now I want to do incremental training: I want to train 30 more images on the previously trained my-model.traineddata. Here I'm confused, because after the previous training completed there are several items in ocrd-train/data/:
my-model (folder)
my-model-ground-truth (folder)
eng (folder)
langdata (folder)
my-model.traineddata (file)
Now what should I do for incremental training?
Do I only need to remove the files in my-model-ground-truth, paste in the new .png and .gt.txt files for the 30 images, and use my-model as START_MODEL? Or do I need to remove other folders as well?
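For concreteness, the first option in the question (swap the ground-truth files, then retrain from my-model) could be staged roughly as below. This is only a sketch of the option being asked about, not a confirmed procedure: swap_ground_truth and the backup folder are hypothetical, and the earlier files are moved aside rather than deleted so that nothing is lost if the approach turns out to be wrong.

```shell
#!/bin/sh
# Hypothetical helper (not part of ocrd-train): move the existing .png and
# .gt.txt pairs out of the ground-truth folder and copy the new pairs in.
# Usage: swap_ground_truth GT_DIR NEW_DIR BACKUP_DIR
swap_ground_truth() {
    gt_dir="$1"
    new_dir="$2"
    backup_dir="$3"
    mkdir -p "$gt_dir" "$backup_dir"
    # Move the earlier pairs aside instead of deleting them.
    for f in "$gt_dir"/*.png "$gt_dir"/*.gt.txt; do
        [ -e "$f" ] || continue   # glob matched nothing
        mv "$f" "$backup_dir"/
    done
    # Copy in the new pairs; the originals stay in NEW_DIR.
    for f in "$new_dir"/*.png "$new_dir"/*.gt.txt; do
        [ -e "$f" ] || continue
        cp "$f" "$gt_dir"/
    done
}

# The question then proposes rerunning "make training" with
# START_MODEL=my-model. Whether that works, and whether other folders
# must be removed first, is exactly what this issue asks.
```

A call such as swap_ground_truth data/my-model-ground-truth /path/to/new-crops data/ground-truth-round1 would stage the 30 new images; both /path/to/new-crops and data/ground-truth-round1 are placeholder names.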