Patent-Claims-Topic-Modeling

Our goal is to better understand competition in the mobile communications sector by studying patent claims. We want to create a method that groups patent claims into clear topics and use this method in a simple interactive tool. This tool should let users pick how many groups they want and show the names and number of claims in each group. For this project, you'll need to install the following dependencies:

Python 3 Installations Needed:

Libraries:

requests: To make HTTP requests for web scraping.
beautifulsoup4: For parsing HTML and XML documents.
nltk: Natural Language Toolkit for text processing.
gensim: For topic modeling using Latent Dirichlet Allocation (LDA).
pandas: For data manipulation and analysis.
Flask: Web framework for building the interactive application.
pyLDAvis: For visualizing the LDA topics.

Running the Notebooks and Flask Application:

To ensure proper execution, follow the sequence below:

Data-Gathering.ipynb:

Run this notebook first to scrape patent claims data from various sources.
Save the extracted data as pickle files in the same directory as the code.

Patent-Claims-Data-Cleaning.ipynb:

Proceed to run this notebook after Data-Gathering.ipynb.
Clean the extracted patent claims data to prepare it for analysis.
Save any cleaned data or intermediate results as pickle files in the same directory.

LDA.ipynb:

Run this notebook after Patent-Claims-Data-Cleaning.ipynb.
Build Latent Dirichlet Allocation (LDA) models to identify topics within the patent claims.
Save any generated models or processed data as necessary.

flaskApp.py:

Lastly, run the Flask application to interactively explore the LDA topic modeling results.
Ensure all necessary files, including pickle files generated from previous steps, are saved in the same directory as the code.

Name		Name	Last commit message	Last commit date
Latest commit History 16 Commits
code		code
README.md		README.md
all_claims.pkl		all_claims.pkl
corpus.pkl		corpus.pkl
data_words.pkl		data_words.pkl
id2word.pkl		id2word.pkl
new_corpus.pkl		new_corpus.pkl
num_corpus.pkl		num_corpus.pkl

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Patent-Claims-Topic-Modeling

Python 3 Installations Needed:

Libraries:

Running the Notebooks and Flask Application:

To ensure proper execution, follow the sequence below:

Data-Gathering.ipynb:

Patent-Claims-Data-Cleaning.ipynb:

LDA.ipynb:

flaskApp.py:

About

Releases

Packages

Languages

karinfridkin/Patent-Claims-Topic-Modeling

Folders and files

Latest commit

History

Repository files navigation

Patent-Claims-Topic-Modeling

Python 3 Installations Needed:

Libraries:

Running the Notebooks and Flask Application:

To ensure proper execution, follow the sequence below:

Data-Gathering.ipynb:

Patent-Claims-Data-Cleaning.ipynb:

LDA.ipynb:

flaskApp.py:

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages