# OCR Processing with Tesseract and Magick in R

This project demonstrates how to use the tesseract and magick packages in R for Optical Character Recognition (OCR) and image preprocessing. It covers text extraction from images and PDFs, adding or updating OCR language data, and custom OCR configurations.

## About

This project uses the tesseract and magick R packages to:

- Perform OCR on images and PDFs.
- Preprocess images for better OCR accuracy.
- Support multilingual text extraction.
- Customize OCR with specific parameters.
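
The preprocessing and OCR steps listed above can be sketched as follows. This is a minimal illustration, not the code shipped in this repository: the helper names `preprocess_for_ocr` and `ocr_image`, the file name `scan.png`, and the specific resize and trim values are assumptions chosen for the example.

```r
# Sketch: clean up an image with magick, then pass it to tesseract.
# Helper names and parameter values are illustrative assumptions.

preprocess_for_ocr <- function(path) {
  img <- magick::image_read(path)
  img <- magick::image_resize(img, "2000x")               # scale to 2000 px wide
  img <- magick::image_convert(img, type = "Grayscale")   # drop colour information
  img <- magick::image_trim(img, fuzz = 40)               # crop uniform borders
  # With no path given, image_write() returns the encoded image as a raw vector.
  magick::image_write(img, format = "png", density = "300x300")
}

ocr_image <- function(path, language = "eng") {
  engine <- tesseract::tesseract(language)
  tesseract::ocr(preprocess_for_ocr(path), engine = engine)
}

input_path <- "scan.png"  # hypothetical input file
if (file.exists(input_path)) {
  cat(ocr_image(input_path))
}
```

Which preprocessing steps help depends on the source image, so it is worth comparing the OCR output with and without each step.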

## Features

- ✅ OCR support for multiple languages.
- ✅ Image preprocessing with Magick.
- ✅ OCR from PDF files.
- ✅ Custom OCR settings (e.g., character whitelists).
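
As a rough illustration of two of these features, the sketch below builds an engine restricted to digits via the `tessedit_char_whitelist` option and runs OCR over the pages of a PDF rendered with `pdftools::pdf_convert()`. The helper names `digits_engine` and `ocr_pdf`, the file name `document.pdf`, and the 300 dpi default are assumptions made for this example, not part of the repository.

```r
# Sketch: a digits-only engine and page-by-page OCR of a PDF.
# Helper names, file names, and defaults are illustrative assumptions.

digits_engine <- function() {
  # Whether the whitelist is honoured can depend on the installed
  # Tesseract version, so check the output on a known sample.
  tesseract::tesseract(
    language = "eng",
    options = list(tessedit_char_whitelist = "0123456789")
  )
}

ocr_pdf <- function(pdf_path, dpi = 300, engine = tesseract::tesseract("eng")) {
  n_pages <- pdftools::pdf_info(pdf_path)$pages
  # Render each page to a temporary PNG so the working directory stays clean.
  png_files <- file.path(tempdir(), sprintf("page_%03d.png", seq_len(n_pages)))
  on.exit(unlink(png_files), add = TRUE)
  pdftools::pdf_convert(pdf_path, format = "png", dpi = dpi, filenames = png_files)
  # One character string per page.
  vapply(
    png_files,
    function(f) tesseract::ocr(f, engine = engine),
    character(1),
    USE.NAMES = FALSE
  )
}

pdf_path <- "document.pdf"  # hypothetical input file
if (file.exists(pdf_path)) {
  pages <- ocr_pdf(pdf_path)
  cat(pages, sep = "\n\n")
}
```

To read only digits from the same PDF, pass the restricted engine instead: `ocr_pdf(pdf_path, engine = digits_engine())`.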

## Prerequisites

Ensure the following are installed on your system:

- R (version >= 4.0)
- Tesseract OCR (installed and configured)
- R packages: tesseract, magick, pdftools
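
One way to check these prerequisites from an R session is sketched below. It uses only base R plus `tesseract::tesseract_info()`, and the variable names are chosen for this example.

```r
# Sketch: verify the R version and the required packages.
required <- c("tesseract", "magick", "pdftools")

# TRUE for each package that can be loaded, FALSE otherwise.
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required[!installed]

r_ok <- getRversion() >= "4.0.0"

if (!r_ok) {
  message("R 4.0 or newer is expected; found ", as.character(getRversion()))
}
if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
}
if (installed[["tesseract"]]) {
  # Reports the engine version and the language data that is available.
  info <- tesseract::tesseract_info()
  message("Tesseract engine version: ", info$version)
  message("Languages available: ", paste(info$available, collapse = ", "))
}
```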

## Installation

Install the required R packages:

    install.packages("tesseract")
    install.packages("magick")
    install.packages("pdftools")
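
Once the packages are installed, additional language data can be fetched with `tesseract::tesseract_download()`. The sketch below is an assumption about how this might be wrapped, not code from the repository: the helper name `ensure_language`, the language code `fra`, and the file name `scan_fr.png` are examples only. On some Linux systems, language data may instead be provided by the distribution's packages.

```r
# Sketch: download a language if it is not yet available, then build an engine.
# The helper name and the example inputs are illustrative assumptions.

ensure_language <- function(lang) {
  available <- tesseract::tesseract_info()$available
  if (!lang %in% available) {
    tesseract::tesseract_download(lang)
  }
  tesseract::tesseract(lang)
}

input_path <- "scan_fr.png"  # hypothetical French-language scan
if (file.exists(input_path)) {
  french <- ensure_language("fra")
  cat(tesseract::ocr(input_path, engine = french))
}
```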


shahan24h/ocr-with-tesseract