This repository uses NLP relationship extraction to extract entity-relationship combinations from a body of text and loads them into a graph database.
This repository loads data into a Neo4j database. To use this code, you'll need to download, install, and set up Neo4j Desktop.
This repository uses the Open IE project to extract entities and relationships from text. To use this code, you'll need to clone the Open IE repository, follow the setup instructions on its GitHub page, and run the project as an HTTP service.
This repository uses the text files in the data directory as its data source. They are manual extracts from ten White House Press Briefing documents. To use another data source, replace the text files in the data directory.
Results from Open IE are cached to a file. If you change the data in the data directory after this file is created, you'll need to delete the file so that the results are regenerated.
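The reason the cache file has to be deleted by hand can be seen in a minimal sketch of this kind of file cache. This is not the code in this repository: the function name `load_or_extract` and its parameters are hypothetical, and `extract` stands in for whatever call sends text to Open IE.

```python
import json
from pathlib import Path


def load_or_extract(cache_path, documents, extract):
    """Return cached results if the cache file exists, else extract and save.

    All names here are hypothetical. `documents` maps a file name to its
    text, and `extract` maps a text to a JSON-serialisable result.
    """
    cache_file = Path(cache_path)
    if cache_file.exists():
        # The cache is trusted as-is: changes to the documents are not detected.
        return json.loads(cache_file.read_text(encoding="utf-8"))

    # No cache yet: run the (slow) extraction once and store the results.
    results = {name: extract(text) for name, text in documents.items()}
    cache_file.write_text(json.dumps(results), encoding="utf-8")
    return results
```

With a cache like this, editing the input text has no effect until the cache file is removed, which matches the behaviour described above.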
Run the following commands to create a virtual environment and install the dependencies.
python -m venv env
source env/bin/activate # or for Windows: .\env\Scripts\activate
pip install -r requirements.txt
To run the code, use the following command.
python main.py
Once the data is loaded into Neo4j, you can verify the load with the following Cypher query.
MATCH (n:Entity) RETURN n LIMIT 1000
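For orientation, the queries in this README refer to `Entity` nodes with a `name` property and `RELATION` relationships with a `confidence` property. A write pattern consistent with that shape might look like the sketch below. It is an assumption, not the code in main.py: the function names, the `name` property on the relationship, and the connection details are all hypothetical, and `load_triples` needs the `neo4j` Python driver installed.

```python
def build_triple_statement(subject, relation, obj, confidence):
    """Build a Cypher statement and parameters for one extracted triple.

    Hypothetical helper: storing the relation phrase in a `name` property
    on the relationship is an assumption made for this sketch.
    """
    query = (
        "MERGE (s:Entity {name: $subject}) "
        "MERGE (o:Entity {name: $object}) "
        "MERGE (s)-[r:RELATION {name: $relation}]->(o) "
        "SET r.confidence = $confidence"
    )
    params = {
        "subject": subject,
        "object": obj,
        "relation": relation,
        "confidence": confidence,
    }
    return query, params


def load_triples(uri, user, password, triples):
    """Write (subject, relation, object, confidence) tuples to Neo4j."""
    # Imported here so the statement builder works without the driver installed.
    from neo4j import GraphDatabase

    driver = GraphDatabase.driver(uri, auth=(user, password))
    try:
        with driver.session() as session:
            for triple in triples:
                query, params = build_triple_statement(*triple)
                session.run(query, params).consume()
    finally:
        driver.close()
```

Using MERGE rather than CREATE means that an entity mentioned in several sentences ends up as a single node, which is what makes graph algorithms over the result meaningful.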
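To explore the data using Neo4j's graph algorithms, you'll first need to create a graph catalog, as shown below (see the page on graph management for more details). In version 2.0 and later of the Graph Data Science library, this procedure is named gds.graph.project.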
CALL gds.graph.create('knowledge-graph-catalog', 'Entity', 'RELATION', { relationshipProperties:'confidence'})
Once your graph catalog is created, you can run the graph algorithm of your choice. For example, the following query streams PageRank scores, highest first.
CALL gds.pageRank.stream('knowledge-graph-catalog')
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC, name ASC
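To give a feel for what the PageRank score in the query above measures, here is a small textbook power-iteration sketch in plain Python. It is illustrative only: it ignores the `confidence` property, it is not how the Graph Data Science library implements the algorithm, and its scores will not match the library's numerically. The function name and parameters are hypothetical.

```python
def pagerank(edges, damping=0.85, iterations=20):
    """Textbook PageRank over a list of (source, target) pairs."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)

    # Start from a uniform distribution over all nodes.
    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        # Every node receives a small baseline share each round.
        new_rank = {node: (1.0 - damping) / len(nodes) for node in nodes}
        for source, targets in out_links.items():
            if targets:
                # A node splits its damped rank equally among its targets.
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A node with no outgoing links spreads its rank over all nodes.
                share = damping * rank[source] / len(nodes)
                for node in nodes:
                    new_rank[node] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    example = [("a", "b"), ("c", "b"), ("b", "a")]
    for name, score in sorted(pagerank(example).items(), key=lambda kv: (-kv[1], kv[0])):
        print(name, round(score, 4))
```

Entities that many other entities point to, directly or through other well-connected entities, end up with the highest scores.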