katharinawuensche / NLPdf Public


PDF Extractor using Natural Language Processing


Files

__pycache__
PDF_Processor.py
README.md
requirements.txt
run.py
setup.py


NLPdf

PDF Extractor using Natural Language Processing

Quickstart

Download the repository
Install the requirements:

pip3 install -r requirements.txt

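Load the English language model for spaCy (spaCy 3.x deprecates the "en" shortcut in favor of the full package name "en_core_web_sm", so use that name if requirements.txt installs a 3.x release):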

python3 -m spacy download en

Copy the PDF files to be cleaned into the directory "PDFs"
Run the extraction tool:

python3 run.py

The output is written to the directory "output"
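
The copy-and-run steps above can be scripted. The following is a minimal sketch, not part of the repository: the helper names stage_pdfs and run_extraction are hypothetical, and it assumes only what the Quickstart states, namely that run.py reads from "PDFs" and writes to "output". Whether run.py creates those directories itself is not documented, so the sketch creates "PDFs" if it is missing and tolerates a missing "output".

```python
import shutil
import subprocess
import sys
from pathlib import Path


def stage_pdfs(sources, pdf_dir="PDFs"):
    """Copy every existing .pdf file in `sources` into `pdf_dir`.

    The directory is created if it does not exist. Returns the list of
    destination paths that were written. (Hypothetical helper.)
    """
    target = Path(pdf_dir)
    target.mkdir(parents=True, exist_ok=True)
    staged = []
    for src in map(Path, sources):
        # Skip missing paths and anything that is not a PDF file.
        if src.is_file() and src.suffix.lower() == ".pdf":
            staged.append(Path(shutil.copy2(src, target / src.name)))
    return staged


def run_extraction(repo_dir="."):
    """Run run.py inside `repo_dir` and return the file names in "output".

    Assumes run.py takes no arguments, as in the Quickstart.
    (Hypothetical helper.)
    """
    # check=True raises CalledProcessError if run.py exits with an error.
    subprocess.run([sys.executable, "run.py"], cwd=repo_dir, check=True)
    out = Path(repo_dir) / "output"
    return sorted(p.name for p in out.iterdir()) if out.is_dir() else []
```

Called from the repository root, stage_pdfs(["/path/to/paper.pdf"]) followed by run_extraction() mirrors the last three Quickstart steps; the path shown is a placeholder.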


Languages

Python 100.0%