- Project Overview
- Features
- Prerequisites
- Project Setup
- Contributing
- How to get started with TiDB Vectors
The Semantic Search Engine for Code Repositories is an AI-powered tool designed to help developers find relevant code snippets, functions, or entire libraries based on natural language queries. By leveraging advanced NLP techniques, large language models (LLMs), and TiDB Serverless with Vector Search, this tool allows users to efficiently locate specific code patterns, structures, or algorithms within a codebase.
- Natural Language Querying: Search for code using plain English queries like "find the piece of code that initializes the BST" or "locate the function that performs quicksort."
- Vector Search: Utilizes TiDB Serverless's Vector Search capabilities to identify and retrieve semantically similar code snippets.
- File and Line Number Retrieval: Provides the exact file path and line number where the relevant code appears, along with a code snippet for context.
- Contextual Understanding: Employs LLMs to interpret the context and intent behind a query, so results match what the query means rather than only the keywords it contains.
- Code Reuse Encouragement: Facilitates code reuse by making it easy to find existing solutions, reducing redundancy in development.
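To make the vector search and file/line retrieval ideas above concrete, here is a minimal sketch of ranking code chunks by cosine similarity. It is not this project's implementation: the project delegates similarity search to TiDB Serverless, whereas this sketch does a brute-force comparison in memory. The `CodeChunk` type, the `search` function, the sample file paths, and the tiny 3-dimensional embeddings are all hypothetical and chosen only for illustration.

```python
import math
from typing import List, NamedTuple, Sequence, Tuple


class CodeChunk(NamedTuple):
    """Hypothetical record: one indexed piece of code and its embedding."""
    file_path: str
    line_number: int
    snippet: str
    embedding: Tuple[float, ...]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(
    query_embedding: Sequence[float],
    chunks: Sequence[CodeChunk],
    top_k: int = 3,
) -> List[Tuple[float, CodeChunk]]:
    """Return the top_k (score, chunk) pairs, most similar first."""
    scored = [(cosine_similarity(query_embedding, c.embedding), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy data: made-up embeddings standing in for real model output.
SAMPLE_CHUNKS = [
    CodeChunk("trees/bst.py", 12, "def init_bst(): ...", (0.9, 0.1, 0.0)),
    CodeChunk("sorting/quicksort.py", 4, "def quicksort(items): ...", (0.1, 0.9, 0.2)),
]

if __name__ == "__main__":
    # Pretend this vector is the embedding of "initialize the BST".
    for score, chunk in search((1.0, 0.0, 0.0), SAMPLE_CHUNKS, top_k=1):
        print(f"{chunk.file_path}:{chunk.line_number} score={score:.3f}")
        print(chunk.snippet)
```

In a real deployment the query and the code chunks would be embedded by the same model, and the nearest-neighbour lookup would run inside the database rather than in a Python loop.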
Before you begin, ensure you have the following installed on your machine:
- Python 3.8+
- Git
- Virtual Environment (Optional but recommended)
You'll also need:
- A TiDB Cloud account and a Serverless instance set up.
- Clone the Repository
git clone https://github.com/jackabald/TiDB-Hack-NL-repo-search.git
cd TiDB-Hack-NL-repo-search
- Set Up a Virtual Environment (Optional but Recommended)
python -m venv venv
source venv/bin/activate # On Windows, use `venv\Scripts\activate`
- Install Dependencies
pip install -r requirements.txt
- Install and configure Ollama, then connect to large language models via the Ollama server.
ollama pull deepseek-coder
- Set up your `secrets.toml` file under the `.streamlit` directory: copy `example.secrets.toml` into `secrets.toml` and replace the keys
TIDB_URL="<your-tidb-pymysql>"
GITHUB_TOKEN="<your-github-token>"
JINA_API_KEY="<your-jina-api-key>"
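Because the three keys above ship as `<...>` placeholders, it is easy to start the app with one of them still unset. The following is a small sketch of a check that could run at startup. The `find_unset_secrets` helper is hypothetical and not part of this repository; it assumes the secrets are available as a mapping (for example, Streamlit exposes the contents of `secrets.toml` through `st.secrets`).

```python
from typing import List, Mapping

# Key names taken from the example configuration above.
REQUIRED_KEYS = ("TIDB_URL", "GITHUB_TOKEN", "JINA_API_KEY")


def find_unset_secrets(secrets: Mapping[str, str]) -> List[str]:
    """Return required keys that are missing, empty, or still a <placeholder>."""
    unset = []
    for key in REQUIRED_KEYS:
        value = str(secrets.get(key, "")).strip()
        # Values like "<your-github-token>" mean the template was never edited.
        if not value or (value.startswith("<") and value.endswith(">")):
            unset.append(key)
    return unset


if __name__ == "__main__":
    example = {
        "TIDB_URL": "<your-tidb-pymysql>",
        "GITHUB_TOKEN": "a-real-token",
        "JINA_API_KEY": "",
    }
    missing = find_unset_secrets(example)
    if missing:
        print("Please set these keys in .streamlit/secrets.toml:", ", ".join(missing))
```

Returning the list of offending keys, rather than raising on the first one, lets the app report everything that needs fixing in a single message.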
Contributions to this project are welcome! If you find any issues or have suggestions for improvement, please open an issue or submit a pull request on the project's GitHub repository.
Looking to implement TiDB vectors in your own application and curious about how to get started? The official TiDB docs are the best place to begin.
This project is licensed under the Apache License. Feel free to use, modify, and distribute the code as per the terms of the license.