Leveraging Large Language Models for Automated Knowledge Graph Generation in Non-Destructive Testing
This repository contains the code and resources for the "Automated Knowledge Graph Generation for Non-Destructive Testing Using Large Language Models" paper. The project aims to extract information from heterogeneous scientific articles in the Non-Destructive Testing (NDT) domain and organize it into a Knowledge Graph (KG) using Neo4j and OpenAI's GPT-4o.
- Introduction
- Features
- Installation
- Usage
- Results
- Challenges and Future Work
- Acknowledgements
- License
Non-Destructive Testing (NDT) is a crucial field in materials science and engineering, offering techniques to evaluate the properties of materials without causing damage. This project leverages advanced natural language processing (NLP) techniques, specifically large language models (LLMs) such as OpenAI's GPT-4o, to automate the extraction and organization of NDT methods, deterioration mechanisms, and physical changes into a comprehensive Knowledge Graph (KG).
- Automated Data Extraction: Uses GPT-4o to extract NDT-related information from scientific literature.
- Knowledge Graph Construction: Structures the extracted data into a Neo4j graph database.
- Querying and Analysis: Enables exploration and analysis of relationships between materials, deterioration mechanisms, physical changes, and NDT methods.
- Python 3.7 or higher
- Neo4j 4.0 or higher
- Required Python packages (listed in requirements.txt)
- Clone the Repository
git clone https://github.com/ghezalahmad/LLM_NDT_Knowledge_Graph.git
cd LLM_NDT_Knowledge_Graph
- Set Up a Virtual Environment (Optional)
Using pipenv:
pip install pipenv
pipenv install
pipenv shell
Or using venv:
python -m venv venv
source venv/bin/activate # On Windows use `venv\Scripts\activate`
And then:
pip install -r requirements.txt
- Set Up Neo4j
- Follow the instructions to install Neo4j: Neo4j Installation Guide
- Start the Neo4j service:
sudo systemctl start neo4j
sudo systemctl enable neo4j
- Configure Neo4j Connection
- Update the connection details in the config.py file with your Neo4j credentials and address:
NEO4J_URI = "bolt://localhost:7687"
NEO4J_USER = "neo4j"
NEO4J_PASSWORD = "your-password"
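Before running the scripts, it can help to confirm that these settings actually reach the database. The following is a minimal sketch, assuming the official `neo4j` Python driver is installed (`pip install neo4j`); the `check_connection` helper is hypothetical and is not part of this repository.

```python
# Values copied from the config.py example above; replace with your own.
NEO4J_URI = "bolt://localhost:7687"
NEO4J_USER = "neo4j"
NEO4J_PASSWORD = "your-password"


def check_connection(uri=NEO4J_URI, user=NEO4J_USER, password=NEO4J_PASSWORD):
    """Return True if Neo4j is reachable with the given credentials.

    Hypothetical helper for illustration. The driver raises an exception
    (for example on a wrong password or a stopped service) if it cannot
    connect, so a normal return means the settings work.
    """
    # Imported lazily so the module still loads where the driver is missing.
    from neo4j import GraphDatabase

    driver = GraphDatabase.driver(uri, auth=(user, password))
    try:
        driver.verify_connectivity()
    finally:
        # Always release the connection pool, even when the check fails.
        driver.close()
    return True
```

Calling `check_connection()` once after editing config.py surfaces credential or address mistakes early, rather than partway through graph construction.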
- Sign up for an API key at OpenAI.
- Replace Your_API_KEY in the script with your actual OpenAI API key.
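As an alternative to pasting the key into the script, it could be read from an environment variable so that it never lands in version control. This is a sketch under that assumption, not how the repository's scripts are known to work; `get_openai_api_key` is a hypothetical helper, and `OPENAI_API_KEY` is the variable name the OpenAI Python library conventionally reads.

```python
import os


def get_openai_api_key(env=None):
    """Return the OpenAI API key from the environment, or raise if unset.

    Hypothetical helper: `env` defaults to os.environ and can be replaced
    with a plain dict for testing.
    """
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    # Reject the empty value and the placeholder left over from the script.
    if not key or key == "Your_API_KEY":
        raise RuntimeError("Set OPENAI_API_KEY to your actual OpenAI API key.")
    return key
```

On Linux or macOS the variable can be set for the current shell with `export OPENAI_API_KEY=...` before running the extraction script.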
- Data Collection: Ensure your dataset of scientific literature and technical documents is ready. Place your documents in the data/ directory.
- Run the Extraction Script
python agent_bricks.py
- Generate the Knowledge Graph
python create_kg.py
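To illustrate how one extracted record could become graph data, the sketch below turns a dictionary into a parameterized Cypher statement. It is an assumption-laden illustration, not the repository's actual code: the `build_merge_statement` helper and the record field names are hypothetical, while the node labels and relationship types are the ones used by the example query in this README.

```python
REQUIRED_FIELDS = ("material", "mechanism", "physical_change", "ndt_method")


def build_merge_statement(record):
    """Return (cypher, params) that merges one extracted path into the graph.

    MERGE is used instead of CREATE so that running the same record twice
    does not duplicate nodes or relationships.
    """
    # Normalize whitespace first so blank strings count as missing.
    params = {k: str(record.get(k, "")).strip() for k in REQUIRED_FIELDS}
    missing = [k for k, v in params.items() if not v]
    if missing:
        raise ValueError(f"record is missing fields: {missing}")
    cypher = (
        "MERGE (m:Material {name: $material}) "
        "MERGE (d:DeteriorationMechanism {name: $mechanism}) "
        "MERGE (p:PhysicalChange {name: $physical_change}) "
        "MERGE (n:NDTMethod {name: $ndt_method}) "
        "MERGE (m)-[:HAS_DETERIORATION_MECHANISM]->(d) "
        "MERGE (d)-[:CAUSES_PHYSICAL_CHANGE]->(p) "
        "MERGE (p)-[:DETECTED_BY]->(n)"
    )
    return cypher, params


if __name__ == "__main__":
    # Made-up record for illustration only; not taken from the paper's data.
    example = {
        "material": "concrete",
        "mechanism": "carbonation",
        "physical_change": "cracking",
        "ndt_method": "ultrasonic pulse velocity",
    }
    statement, parameters = build_merge_statement(example)
    print(statement)
    print(parameters)
```

Passing values as query parameters rather than formatting them into the string avoids quoting problems when extracted names contain apostrophes or other special characters.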
ndt-knowledge-graph/
├── data/
│   ├── Concrete.rtf
│   ├── Wood.rtf
│   ├── Bricks.rtf
│   └── Metal.rtf
├── Code/
│   ├── agent_bricks.py
│   ├── agent_woods.py
│   ├── agent_concrete.py
│   ├── agent_steel.py
│   └── agent_kg.py
├── requirements.txt
└── README.md
- You can query the Neo4j database using Cypher queries. Access the Neo4j browser at http://localhost:7474 and use the provided queries in the queries/ directory to explore the KG.
MATCH (m:Material)-[:HAS_DETERIORATION_MECHANISM]->(d:DeteriorationMechanism)-[:CAUSES_PHYSICAL_CHANGE]->(p:PhysicalChange)-[:DETECTED_BY]->(n:NDTMethod)
RETURN m.name, d.name, p.name, n.name
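The same query can also be run from Python instead of the Neo4j browser. The sketch below assumes a session object from the official `neo4j` Python driver; `fetch_paths` and `group_methods_by_material` are hypothetical helpers written for this example, not functions from the repository.

```python
# The query above, kept as a single string without extra whitespace in the pattern.
PATH_QUERY = (
    "MATCH (m:Material)-[:HAS_DETERIORATION_MECHANISM]->(d:DeteriorationMechanism)"
    "-[:CAUSES_PHYSICAL_CHANGE]->(p:PhysicalChange)-[:DETECTED_BY]->(n:NDTMethod) "
    "RETURN m.name, d.name, p.name, n.name"
)

COLUMNS = ("m.name", "d.name", "p.name", "n.name")


def fetch_paths(session):
    """Run PATH_QUERY and return rows as (material, mechanism, change, method)."""
    return [tuple(record[c] for c in COLUMNS) for record in session.run(PATH_QUERY)]


def group_methods_by_material(rows):
    """Map each material name to the set of NDT method names that reach it."""
    grouped = {}
    for material, _mechanism, _change, method in rows:
        grouped.setdefault(material, set()).add(method)
    return grouped
```

With the driver installed, a session would typically come from `GraphDatabase.driver(NEO4J_URI, auth=(NEO4J_USER, NEO4J_PASSWORD)).session()`, using the values from config.py.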
The constructed Knowledge Graph includes nodes representing four primary materials: concrete, steel, wood, and bricks. Each material node is linked to various deterioration mechanisms, physical changes, and corresponding NDT methods. The KG enables the exploration and analysis of how different NDT techniques are applied to detect specific types of deterioration across various materials.
- Fork the repository.
- Create a new branch (git checkout -b feature-branch).
- Make your changes.
- Commit your changes (git commit -m 'Add new feature').
- Push to the branch (git push origin feature-branch).
- Open a pull request.
- Ensuring consistent terminology across diverse documents.
- Balancing specificity and generalizability in the extracted data.
- Distinguishing between natural features and actual deterioration mechanisms.
- Expanding the corpus of scientific articles to include more diverse sources and materials.
- Improving the accuracy and depth of entity and relationship extraction through advanced machine learning techniques.
- Integrating the KG with other scientific databases and ontologies to enrich its content.
We thank Reincarnate for funding this project. Special thanks to Benjamin, Andre, and Sabine for their support and contributions.
This project is licensed under the MIT License. See the LICENSE file for details.
For any questions or suggestions, please open an issue or contact [email protected].