OCR examples with Tesseract
An application that uses Tesseract and Tess4J to expose a REST API for experimenting with various OCR options. The tests also contain a few very simple snippets.
- OCR an image by providing the absolute path to the file
- OCR an image by uploading the file
- Selecting Tesseract engine mode and page segmentation mode
- Return the result as plain text or hOCR
- Specifying languages (missing dictionaries will be automatically downloaded)
- Saving a file after OCR (text file, PDF with text layer)
See Swagger for details: http://localhost:8080/swagger-ui/
The application must be running locally (see below).
- The simplest usage of Tesseract
- Generating hOCR
- OCR from PDF file using PDFBox
See the test folder for details: pl.marcinkowalczyk.ocr.examples.tesseract.
Prerequisites: JDK 11 installed (for example, AdoptOpenJDK).
- Build application:
- Linux:
./mvnw clean package
- Windows:
mvnw.cmd clean package
- Run application:
java -jar target/ocr-examples-0.0.1-SNAPSHOT.jar
- Open http://localhost:8080/ in a browser
Send an HTTP request, passing the absolute path of the image file to OCR in the absolute parameter.
http://localhost:8080/api/tess/path?absolute=<absolute_path_to_image_file>
Example:
http://localhost:8080/api/tess/path?absolute=C:/dev/ocr/ocr-examples/src/test/resources/test_image.png
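The same request can also be sent from code. Below is a minimal sketch using the HTTP client built into JDK 11. The class name OcrPathClient and the helper buildUri are made up for this illustration, and it assumes the endpoint accepts GET (as when opening the URL in a browser) and returns the OCR result in the response body. The path is percent-encoded so that spaces and other special characters survive the trip.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical client for illustration; not part of this repository.
class OcrPathClient {

    // Builds the request URI for the /api/tess/path endpoint.
    static URI buildUri(String baseUrl, String absolutePath) {
        // URLEncoder emits form encoding ("+" for spaces); switch to %20.
        String encoded = URLEncoder.encode(absolutePath, StandardCharsets.UTF_8)
                .replace("+", "%20");
        return URI.create(baseUrl + "/api/tess/path?absolute=" + encoded);
    }

    public static void main(String[] args) throws InterruptedException {
        // Use the first argument as the image path, else the README example.
        String path = args.length > 0 ? args[0]
                : "C:/dev/ocr/ocr-examples/src/test/resources/test_image.png";
        URI uri = buildUri("http://localhost:8080", path);

        // Assumption: the endpoint responds to GET.
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        try {
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println("HTTP " + response.statusCode());
            System.out.println(response.body());
        } catch (IOException e) {
            // Typically means the application is not running locally.
            System.err.println("Could not reach the application: " + e.getMessage());
        }
    }
}
```

With JDK 11 the file can be run directly, for example: java OcrPathClient.java <absolute_path_to_image_file>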
For more endpoints and parameters, explore Swagger: http://localhost:8080/swagger-ui/