This script extracts the images from a PDF and runs Tesseract OCR on them to recognize their text.
Before you can run it, you need to install Python 3.8 or later and Tesseract OCR.
You can download Tesseract for your OS from its official download page.
For Windows, you can download the binary installer from here.
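To check that Tesseract is installed and to find where its executable lives (useful when the script later asks for that path), a small helper like the one below can be used. It is a sketch and not part of the original script: `find_tesseract` and `tesseract_version` are hypothetical names, and the lookup assumes the binary is named `tesseract` and is on your `PATH`. The Windows installer does not necessarily add it to `PATH`, in which case you will have to locate `tesseract.exe` yourself.

```python
import shutil
import subprocess
from typing import Optional


def find_tesseract() -> Optional[str]:
    """Return the full path of the tesseract executable on PATH, or None."""
    # On Windows, shutil.which also tries the extensions listed in PATHEXT (.exe, ...).
    return shutil.which("tesseract")


def tesseract_version(cmd: str) -> str:
    """Return the first line printed by `tesseract --version`."""
    result = subprocess.run(
        [cmd, "--version"], capture_output=True, text=True, check=True
    )
    # Depending on the Tesseract build, the banner is printed to stdout or stderr.
    output = result.stdout or result.stderr
    return output.splitlines()[0] if output else ""


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        print("tesseract not found on PATH")
    else:
        print(path, "->", tesseract_version(path))
```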
Then install the required Python packages:
$ pip install pillow
$ pip install pytesseract
$ pip install opencv-python
$ pip install PyMuPDF
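The pip package names do not all match the names used in `import` statements: `pillow` is imported as `PIL`, `opencv-python` as `cv2` and `PyMuPDF` as `fitz`. A quick way to confirm the installation worked is to check that each module can be found. The sketch below assumes these four packages are everything the script needs; `missing_packages` is a hypothetical helper.

```python
import importlib.util

# pip package name -> module name used in `import` statements
REQUIRED = {
    "pillow": "PIL",
    "pytesseract": "pytesseract",
    "opencv-python": "cv2",
    "PyMuPDF": "fitz",
}


def missing_packages() -> list:
    """Return the pip names of the packages whose module cannot be found."""
    return [pkg for pkg, module in REQUIRED.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages, install with: pip install " + " ".join(missing))
    else:
        print("All Python dependencies are installed.")
```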
When the installation is finished, you can run the script:
$ cd tesseract-ocr-pdf
$ python main.py
You need to provide the path to your tesseract.exe. For example:
> [!] Insert path to your tesseract.exe
> C:\Users\User\tesseract\tesseract.exe
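pytesseract does not run OCR by itself; it calls the Tesseract executable, so the path you enter has to end up in `pytesseract.pytesseract.tesseract_cmd`. A minimal sketch of how such a path could be cleaned and applied follows. The helper names `clean_path` and `configure_tesseract` are hypothetical and not taken from `main.py`; the quote stripping assumes the path may have been pasted with Windows Explorer's "Copy as path", which wraps it in double quotes.

```python
import os


def clean_path(raw: str) -> str:
    """Strip surrounding whitespace and quotes from a pasted path."""
    return raw.strip().strip('"').strip("'")


def configure_tesseract(path: str) -> None:
    """Point pytesseract at the given tesseract executable."""
    path = clean_path(path)
    if not os.path.isfile(path):
        raise FileNotFoundError(f"tesseract executable not found: {path}")
    import pytesseract  # imported here so the path check works without it
    pytesseract.pytesseract.tesseract_cmd = path
```

For example, `configure_tesseract(input("[!] Insert path to your tesseract.exe\n> "))` would read the path interactively and fail early if it does not point to an existing file.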
Then, enter the path to your PDF file:
> [!] Insert path to your PDF file
> C:\Users\User\Documents\file.pdf
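Before handing the file to PyMuPDF, it can help to fail early on a wrong path. The sketch below is an assumption about how such a check could look, not the script's actual behavior: the hypothetical `looks_like_pdf` helper verifies that the file exists and begins with the `%PDF-` header that PDF files normally start with. This is a strict check; a damaged PDF with stray bytes before the header would be rejected even though some readers can still open it.

```python
import os


def looks_like_pdf(path: str) -> bool:
    """Return True if `path` is an existing file that starts with the PDF header."""
    path = path.strip().strip('"')
    if not os.path.isfile(path):
        return False
    with open(path, "rb") as f:
        # PDF files begin with a header such as b"%PDF-1.7".
        return f.read(5) == b"%PDF-"
```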
The script then extracts the images, scans them with Tesseract, and writes the recognized text to an output file.
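The extraction and OCR steps can be sketched with PyMuPDF and pytesseract as follows. This is a minimal illustration under stated assumptions, not the script's actual code: the helper names are hypothetical, OCR uses Tesseract's English model (`lang="eng"`), the output is assumed to be a `.txt` file next to the PDF, and any OpenCV preprocessing the script may do is left out.

```python
import io
from pathlib import Path


def output_path_for(pdf_path: str) -> Path:
    """Hypothetical naming scheme: file.pdf -> file.txt in the same folder."""
    return Path(pdf_path).with_suffix(".txt")


def extract_images(pdf_path: str) -> list:
    """Return the raw bytes of every image embedded in the PDF, page by page."""
    import fitz  # PyMuPDF

    images = []
    with fitz.open(pdf_path) as doc:
        for page in doc:
            for info in page.get_images(full=True):
                xref = info[0]  # first item is the image's cross-reference number
                images.append(doc.extract_image(xref)["image"])
    return images


def ocr_pdf(pdf_path: str, tesseract_cmd: str, lang: str = "eng") -> Path:
    """Extract the PDF's images, OCR each one and write all text to a .txt file."""
    import pytesseract
    from PIL import Image

    pytesseract.pytesseract.tesseract_cmd = tesseract_cmd
    texts = []
    for data in extract_images(pdf_path):
        with Image.open(io.BytesIO(data)) as img:
            texts.append(pytesseract.image_to_string(img, lang=lang))
    out = output_path_for(pdf_path)
    out.write_text("\n".join(texts), encoding="utf-8")
    return out
```

With the example paths above, this would be called as `ocr_pdf(r"C:\Users\User\Documents\file.pdf", r"C:\Users\User\tesseract\tesseract.exe")`. Note that an image reused on several pages is extracted and scanned once per page in this sketch.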