PDF Extractor using Natural Language Processing
- Download the repository
- Install the requirements:
pip3 install -r requirements.txt
- Download the English language model for spaCy:
python3 -m spacy download en
- Copy the PDF files to be processed into the "PDFs" directory
- Run the extraction tool:
python3 run.py
- The output is written to the directory "output"
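Before running the tool, it can help to confirm that the input directory actually contains PDFs. The sketch below is not part of this repository. The function name `find_input_pdfs` is hypothetical, and it assumes that `run.py` processes every `.pdf` file in "PDFs" and writes into "output". The tool's own file discovery may differ.

```python
from pathlib import Path
from typing import List


def find_input_pdfs(input_dir: str = "PDFs", output_dir: str = "output") -> List[Path]:
    """Hypothetical pre-flight check, not part of run.py.

    Returns the sorted PDF files in input_dir. The directory names follow
    the README; how run.py itself selects files is an assumption.
    """
    src = Path(input_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"input directory {src} does not exist")

    # Create the output directory up front so a missing folder is caught early.
    Path(output_dir).mkdir(parents=True, exist_ok=True)

    # Match the extension case-insensitively (report.PDF as well as report.pdf).
    pdfs = sorted(
        p for p in src.iterdir() if p.is_file() and p.suffix.lower() == ".pdf"
    )
    if not pdfs:
        raise FileNotFoundError(f"no PDF files found in {src}")
    return pdfs
```

Called from the repository root as `find_input_pdfs()`, it returns the sorted list of PDF paths, or raises `FileNotFoundError` if "PDFs" is missing or holds no PDF files.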