Pdf Squirrel detects and manipulates graphical and text blocks in document images. The repository includes scripts for finding blocks in images, converting PDF files to images, normalizing and processing document images, blurring specific sections, and drawing rectangles around detected sentences.
- Block Detection: Identify and outline blocks in images.
- PDF to Image Conversion: Convert PDF documents into images.
- Image Normalization: Process images to enhance text visibility and remove graphical elements.
- Blur Effects: Apply blur effects to text blocks to visually separate them.
- Sentence Detection: Draw rectangles around detected sentences.
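
To make "block detection" concrete, here is a minimal sketch of one common approach: treat the page as a binary grid and report the bounding box of each connected group of foreground pixels. This is an illustration only, not the code in `find_blocks.py`; the function name `find_blocks`, the grid representation, and the choice of 4-connectivity are all assumptions made for the example.

```python
from collections import deque


def find_blocks(grid):
    """Return bounding boxes (left, top, right, bottom) of connected regions.

    `grid` is a list of rows; truthy cells are foreground pixels.
    Regions are 4-connected (hypothetical choice for this sketch).
    """
    height = len(grid)
    width = len(grid[0]) if grid else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []

    for y in range(height):
        for x in range(width):
            if not grid[y][x] or seen[y][x]:
                continue
            # Breadth-first search over this region, tracking its extent.
            left = right = x
            top = bottom = y
            queue = deque([(x, y)])
            seen[y][x] = True
            while queue:
                cx, cy = queue.popleft()
                left, right = min(left, cx), max(right, cx)
                top, bottom = min(top, cy), max(bottom, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    inside = 0 <= nx < width and 0 <= ny < height
                    if inside and grid[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((left, top, right, bottom))
    return boxes


if __name__ == "__main__":
    page = [
        [0, 1, 1, 0, 0],
        [0, 1, 1, 0, 0],
        [0, 0, 0, 0, 1],
        [0, 0, 0, 0, 1],
    ]
    for box in find_blocks(page):
        print(box)
```

In practice an image library would usually handle thresholding and contour extraction; the pure-Python version above only shows the idea of turning pixel regions into rectangles.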
Clone the repository and navigate to the directory:
git clone https://github.com/yourusername/pdf_squirrel.git
cd pdf_squirrel
Install the required Python packages:
pip install -r requirements.txt
Each script can be run independently, depending on the task. The example commands below cover, in order, block detection, PDF-to-image conversion, image normalization, blurring, and sentence detection:
python find_blocks.py --source_dir=path_to_images --output_dir=path_to_save_images
python pdf_to_img.py --source_dir=path_to_pdfs --output_dir=path_to_save_images
python img_normalize.py --source_dir=path_to_images --output_dir=path_to_save_images
python img_blur.py --source_dir=path_to_images --output_dir=path_to_save_images --box_blur_radius=5
python sentence_blocks.py --source_dir=path_to_images --output_dir=path_to_save_images
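
The `--box_blur_radius` flag in the `img_blur.py` command above is the only numeric parameter shown, so a short sketch of what a box blur radius usually means may help. Assuming the flag has the conventional meaning, each pixel inside a block is replaced by the mean of the square window of side `2 * radius + 1` centred on it. The function below is a hypothetical pure-Python illustration on a grayscale grid, not the repository's implementation; in particular, clipping the window at the image border is an assumption of this sketch.

```python
def box_blur_region(pixels, box, radius):
    """Return a copy of `pixels` with a box blur applied inside `box`.

    `pixels` is a list of rows of grayscale ints.
    `box` is (left, top, right, bottom), inclusive, as a block detector
    might report it. Pixels outside the box are left unchanged.
    """
    height = len(pixels)
    width = len(pixels[0]) if pixels else 0
    left, top, right, bottom = box
    out = [row[:] for row in pixels]

    for y in range(max(top, 0), min(bottom, height - 1) + 1):
        for x in range(max(left, 0), min(right, width - 1) + 1):
            total = 0
            count = 0
            # Average over the (2 * radius + 1)-wide window, clipped to the image.
            for ny in range(max(y - radius, 0), min(y + radius, height - 1) + 1):
                for nx in range(max(x - radius, 0), min(x + radius, width - 1) + 1):
                    total += pixels[ny][nx]
                    count += 1
            out[y][x] = total // count
    return out


if __name__ == "__main__":
    image = [
        [0, 0, 0],
        [0, 90, 0],
        [0, 0, 0],
    ]
    for row in box_blur_region(image, (0, 0, 2, 2), radius=1):
        print(row)
```

A larger radius averages over a wider neighbourhood and so smooths more aggressively; a radius of 0 leaves the image unchanged.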
Contributions are welcome! Please feel free to submit a pull request or create an issue if you have suggestions or find a bug.
This project is licensed under the MIT License; see the license file in the repository for details.