Pdf Squirrel

Overview

Pdf Squirrel is a set of Python scripts for detecting and manipulating graphical and text blocks in document images. The scripts find blocks in images, convert PDF files to images, normalize document images, blur selected sections, and draw rectangles around detected sentences.

Features

Block Detection: Identify and outline blocks in images.
PDF to Image Conversion: Convert PDF documents into images.
Image Normalization: Process images to enhance text visibility and remove graphical elements.
Blur Effects: Apply blur effects to text blocks to visually separate them.
Sentence Detection: Draw rectangles around detected text sentences.

Installation

Clone the repository and navigate to the directory:

git clone https://github.com/otanadzetsotne/pdf_squirrel.git
cd pdf_squirrel

Install the required Python packages:

pip install -r requirements.txt

Usage

Each script in the repository can be run independently, depending on the task. Below are examples of how to use each script:
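The commands that follow all pass --source_dir and --output_dir. As a rough illustration of that interface, here is a minimal sketch of how a script could parse the two flags with argparse. The function name parse_args and the help texts are assumptions made for this example; the repository's scripts may parse their arguments differently.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse the two directory flags used in the usage examples (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Process every file in a directory.")
    parser.add_argument("--source_dir", type=Path, required=True,
                        help="directory containing the input files")
    parser.add_argument("--output_dir", type=Path, required=True,
                        help="directory where results are written")
    # argv=None makes argparse read sys.argv[1:], as a command-line script would.
    return parser.parse_args(argv)
```

Both the --flag=value form used in this README and the --flag value form are accepted by argparse.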

Finding Blocks

python find_blocks.py --source_dir=path_to_images --output_dir=path_to_save_images
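To give an idea of what block detection involves, the sketch below finds bounding boxes of connected foreground regions in a binarized image using a breadth-first flood fill. It is an illustration only: this README does not describe how find_blocks.py works, and the function name, the list-of-rows image format, and the 4-connectivity choice are all assumptions.

```python
from collections import deque


def find_blocks(mask):
    """Return bounding boxes (left, top, right, bottom) of 4-connected regions.

    `mask` is a list of rows; truthy cells are foreground ("ink").
    Box coordinates are inclusive. Hypothetical helper, not the repository's code.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            left = right = x
            top = bottom = y
            queue = deque([(x, y)])
            seen[y][x] = True
            while queue:
                cx, cy = queue.popleft()
                left, right = min(left, cx), max(right, cx)
                top, bottom = min(top, cy), max(bottom, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    inside = 0 <= nx < width and 0 <= ny < height
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((left, top, right, bottom))
    return boxes
```

On a real page, nearby characters would usually be merged first (for example by blurring or dilating the mask) so that each box covers a paragraph or figure rather than a single glyph.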

Converting PDFs to Images

python pdf_to_img.py --source_dir=path_to_pdfs --output_dir=path_to_save_images
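For readers who want to see what a PDF-to-image step can look like, here is a minimal sketch that renders each page to a PNG. It assumes the third-party pdf2image package (a wrapper around Poppler) is installed; this README does not say which library pdf_to_img.py uses, and the function names, the dpi default, and the output file naming are hypothetical.

```python
from pathlib import Path


def page_image_path(pdf_path, page_number, output_dir):
    """Build an output name such as out/report_page_001.png (pages are 1-based)."""
    stem = Path(pdf_path).stem
    return Path(output_dir) / f"{stem}_page_{page_number:03d}.png"


def convert_pdf(pdf_path, output_dir, dpi=200):
    """Render every page of one PDF to a PNG file and return the written paths."""
    # Assumption: pdf2image and Poppler are available; the import is local so
    # that page_image_path can be used without them.
    from pdf2image import convert_from_path

    Path(output_dir).mkdir(parents=True, exist_ok=True)
    written = []
    pages = convert_from_path(str(pdf_path), dpi=dpi)  # list of PIL images
    for number, page in enumerate(pages, start=1):
        target = page_image_path(pdf_path, number, output_dir)
        page.save(target, "PNG")
        written.append(target)
    return written
```

A higher dpi gives sharper text for later detection steps at the cost of larger files.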

Normalizing Images

python img_normalize.py --source_dir=path_to_images --output_dir=path_to_save_images
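One simple way to make text stand out is to stretch the contrast and then binarize the image. The sketch below does that on a grayscale image stored as a list of rows. It is only an assumed illustration of "normalization": it does not remove graphical elements, and the function name and the threshold default are not taken from img_normalize.py.

```python
def normalize(gray, threshold=128):
    """Stretch contrast to 0..255, then binarize: dark pixels -> 0, others -> 255.

    `gray` is a non-empty list of rows of integer intensities.
    Hypothetical helper, not the repository's implementation.
    """
    lo = min(min(row) for row in gray)
    hi = max(max(row) for row in gray)
    span = (hi - lo) or 1  # avoid division by zero on a flat image
    out = []
    for row in gray:
        # Map the darkest pixel to 0 and the brightest to 255.
        stretched = [(value - lo) * 255 // span for value in row]
        out.append([0 if value < threshold else 255 for value in stretched])
    return out
```

The stretch matters for faint scans: without it, a page whose values all lie between 100 and 180 would be split arbitrarily by a fixed threshold of 128.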

Applying Blur to Text Blocks

python img_blur.py --source_dir=path_to_images --output_dir=path_to_save_images --box_blur_radius=5
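The --box_blur_radius flag suggests a box blur, in which each pixel becomes the mean of a square window of side 2 * radius + 1 centred on it (Pillow offers this as ImageFilter.BoxBlur). The pure-Python sketch below shows the idea on a grayscale image stored as a list of rows, optionally restricted to one rectangle. The function name, the box parameter, and the edge handling (the window is clipped at the image border) are assumptions for illustration, not a description of img_blur.py.

```python
def box_blur(gray, radius, box=None):
    """Box-blur a grayscale image given as a list of rows of ints.

    Pixels inside `box` (left, top, right, bottom, inclusive) are replaced by
    the integer mean of their (2 * radius + 1)-wide square window, clipped at
    the image edges. box=None blurs the whole image. Hypothetical helper.
    """
    height, width = len(gray), len(gray[0])
    if box is None:
        box = (0, 0, width - 1, height - 1)
    left, top, right, bottom = box
    out = [row[:] for row in gray]  # pixels outside the box stay unchanged
    for y in range(top, bottom + 1):
        for x in range(left, right + 1):
            ys = range(max(0, y - radius), min(height, y + radius + 1))
            xs = range(max(0, x - radius), min(width, x + radius + 1))
            total = sum(gray[j][i] for j in ys for i in xs)
            out[y][x] = total // (len(ys) * len(xs))
    return out
```

A larger radius averages over more neighbours, so with a value such as 5 individual letters smear together into solid regions.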

Drawing Rectangles Around Sentences

python sentence_blocks.py --source_dir=path_to_images --output_dir=path_to_save_images
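Drawing a rectangle around a sentence comes down to grouping word-level boxes into sentences and taking the union of each group. The sketch below assumes the words and their boxes are already available in reading order (for example from an OCR step) and splits sentences naively at '.', '!' or '?'. The presence of nltk_download.py hints that the repository may use NLTK for sentence splitting, but this README does not confirm it, so the function below and its input format are hypothetical.

```python
def sentence_boxes(words):
    """Group (text, (left, top, right, bottom)) word boxes into sentence boxes.

    Words must be in reading order. A sentence ends at a word whose text,
    ignoring trailing quotes or brackets, ends in '.', '!' or '?'.
    Returns a list of (sentence_text, box). Hypothetical helper.
    """
    sentences, texts, box = [], [], None
    for text, (left, top, right, bottom) in words:
        texts.append(text)
        if box is None:
            box = (left, top, right, bottom)
        else:
            # Grow the current box so that it also covers this word.
            box = (min(box[0], left), min(box[1], top),
                   max(box[2], right), max(box[3], bottom))
        if text.rstrip("\"')]").endswith((".", "!", "?")):
            sentences.append((" ".join(texts), box))
            texts, box = [], None
    if texts:  # trailing words without closing punctuation
        sentences.append((" ".join(texts), box))
    return sentences
```

This naive rule splits after abbreviations such as "Dr.", and a sentence that wraps across lines gets one box spanning both lines; a trained sentence tokenizer and per-line boxes would handle those cases better.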

Contributing

Contributions are welcome! Please feel free to submit a pull request or create an issue if you have suggestions or find a bug.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Repository: otanadzetsotne/pdf_squirrel