Continuous advances in large language models for natural language processing (NLP) have changed the way we interact with and interpret textual data. Although these models have shown remarkable capabilities, concerns about data security and privacy remain substantial hurdles. The DB-GPT project addresses these concerns by keeping both your data and the model entirely under your control. In this post, we will look at what DB-GPT offers, its distinguishing features, and how it is changing the NLP scene.
What is DB-GPT?
DB-GPT is a new natural language processing (NLP) project that pairs your database with a large language model so you can work with your data in plain language. It can be used to automate a wide range of database tasks, such as:
- Querying data
- Generating reports
- Translating data
- Classifying data
- Answering questions
DB-GPT is still under active development, but it has the potential to change the way we interact with databases. Instead of writing SQL by hand, you can ask a question in natural language, such as "How many users signed up last week?", and have it answered directly from your data, which makes everyday work with databases far more efficient and productive.
Installation
Hardware Requirements
Because the project relies on a locally hosted model that reaches over 85% of ChatGPT's capability, there are some hardware requirements. In general, though, it can be deployed and used on consumer-grade graphics cards. The specific requirements for deployment are as follows:
| GPU | VRAM Size | Performance |
| --- | --- | --- |
| RTX 4090 | 24 GB | Smooth conversational inference |
| RTX 3090 | 24 GB | Smooth conversational inference, better than V100 |
| V100 | 16 GB | Conversational inference possible, noticeable stutter |
Install
This project depends on a local MySQL database service, which you need to install yourself. We suggest running it with Docker:
$ docker run --name=mysql -p 3306:3306 -e MYSQL_ROOT_PASSWORD=aa12345678 -dit mysql:latest
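Once the container is up, you can quickly check that MySQL is reachable on port 3306. The short Python check below is only a convenience sketch (it assumes the pymysql package is installed; it is not part of DB-GPT):

import pymysql

# Connect with the same credentials used in the docker run command above.
conn = pymysql.connect(host="127.0.0.1", port=3306, user="root", password="aa12345678")
with conn.cursor() as cur:
    cur.execute("SELECT VERSION()")
    print("MySQL is up, version:", cur.fetchone()[0])
conn.close()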
Chroma is used as the default vector database, so no separate installation is required for it. If you want to connect to other vector databases, you can install and configure them by following the project's instructions. A miniconda3 virtual environment is used throughout the DB-GPT installation procedure. Install Python 3.10 or later and create the virtual environment:
python>=3.10
conda create -n dbgpt_env python=3.10
conda activate dbgpt_env
pip install -r requirements.txt
Run
You can refer to the Vicuna documentation to obtain the Vicuna weights.
If you have difficulty with this step, you can also directly use the model from this link as a replacement.
- Run the server:
$ python pilot/server/llmserver.py
- Run the Gradio web UI:
$ python pilot/server/webserver.py
Because the webserver needs to connect to the llmserver, you must edit the .env file: replace MODEL_SERVER = "http://127.0.0.1:8000" with the address of the machine running the llmserver. Do not skip this step.
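For example, if the llmserver runs on another machine on your network, the line in .env might look like the following (the IP address is illustrative; use your own host and port):

MODEL_SERVER = "http://192.168.1.20:8000"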
Usage Instructions
The project provides a Gradio user interface through which you can use DB-GPT. The team has also written a number of reference documents (in Chinese) that introduce the code and the principles behind the project.
Multi LLMs Usage
To switch between the supported models, set the LLM_MODEL option in the .env configuration file.
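For instance, switching from Vicuna to ChatGLM-6b would mean changing the LLM_MODEL line in .env to something like the following (the exact model identifier depends on your checkout, so treat the value as illustrative):

LLM_MODEL = "chatglm-6b"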
To build your own knowledge repository:
1. Place your personal knowledge files or folders in the pilot/datasets directory.
2. Run the knowledge repository script from the tools directory (an example invocation appears after this list):
$ python tools/knowledge_init.py
--vector_name: the name of your vector store (default: default)
--append: append mode; True to append, False not to append (default: False)
3. In the interface, enter the name of your knowledge repository (if you did not provide one, use "default") to run Q&A against your own knowledge base.
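For example, to build a repository named my_kb in append mode (my_kb is only an illustrative name, and the exact flag syntax may differ slightly between versions), the invocation would look like:

$ python tools/knowledge_init.py --vector_name my_kb --append True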
Note that the default embedding model is text2vec-large-chinese (a large model; if your machine cannot handle it, text2vec-base-chinese is recommended instead). Make sure you download the model and place it in the models directory.
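One convenient way to fetch the embedding model is with the huggingface_hub package, as sketched below; the repository id and destination path are illustrative, so check them against the names your DB-GPT version expects:

from huggingface_hub import snapshot_download

# Download the embedding model into the local models directory
# (repo id and destination are illustrative; adjust them to your setup).
snapshot_download(repo_id="GanymedeNil/text2vec-large-chinese",
                  local_dir="models/text2vec-large-chinese")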
If you encounter nltk-related issues when using the knowledge base, you need to install the nltk data. See the nltk documentation for details, then open the Python interpreter and enter the following commands:
>>> import nltk
>>> nltk.download()
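Calling nltk.download() with no arguments opens an interactive downloader. If you prefer a non-interactive download, you can name the resource directly; punkt is used below only as a common example, so fetch whichever resource the error message asks for:

>>> nltk.download('punkt')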
SQL Generation
Generate Create Table SQL
To generate executable SQL, first select the appropriate database; the model then generates SQL based on that database's schema information.
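The general idea can be sketched in a few lines of Python. This is not DB-GPT's internal code; ask_llm is a placeholder for whatever model endpoint you are running, and here it simply returns a canned answer so the sketch executes end to end:

def ask_llm(prompt: str) -> str:
    # Placeholder: in a real setup this would call the running llmserver.
    return "SELECT city, COUNT(*) AS user_count FROM users GROUP BY city;"

def generate_sql(question: str, schema: str) -> str:
    # Give the model the schema so it can produce SQL that actually runs on that database.
    prompt = (
        "Database schema:\n" + schema + "\n"
        "Write one SQL query that answers: " + question + "\n"
        "Return only the SQL."
    )
    return ask_llm(prompt)

schema = "CREATE TABLE users (id INT PRIMARY KEY, name VARCHAR(50), city VARCHAR(50));"
print(generate_sql("How many users are there in each city?", schema))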
DB-GPT Architecture
DB-GPT builds on FastChat to provide a runtime for large models and ships with Vicuna as its default large language model. Private domain knowledge base question answering is provided via LangChain. The project also supports plugins, and its architecture natively supports the Auto-GPT plugin.
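Before looking at the overall architecture, the snippet below gives a flavour of the knowledge base side: it shows the general LangChain-plus-Chroma pattern this kind of question answering builds on. It is a minimal sketch rather than DB-GPT's own pipeline, and the file path and embedding model path are illustrative:

from langchain.document_loaders import TextLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Chroma

# Load a private document, split it into chunks, and embed the chunks into Chroma.
docs = TextLoader("pilot/datasets/my_notes.txt").load()
chunks = CharacterTextSplitter(chunk_size=500, chunk_overlap=50).split_documents(docs)
embeddings = HuggingFaceEmbeddings(model_name="models/text2vec-large-chinese")
store = Chroma.from_documents(chunks, embeddings, persist_directory="vector_store/default")

# Retrieve the chunks most relevant to a question; an LLM would then answer from them.
print(store.similarity_search("What does the document say about backups?", k=2))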
The architecture of the entire DB-GPT is shown in the following figure:
The core capabilities mainly consist of the following parts:
- Knowledge base support: Provides question-answering functionality for private domain knowledge bases.
- Large model management: Provides a large-model runtime environment based on FastChat.
- Unified data vector storage and indexing: A unified method for storing and indexing multiple data types.
- Connection module: Connects different modules and data sources to enable data flow and interaction.
- Agents and plugins: Provides agent and plugin capabilities, allowing users to customize and extend the system's behavior.
- Prompt creation and optimization: Generates high-quality prompts automatically and optimizes them to increase system response efficiency.
- Multi-platform product interface: Supports a wide range of client products, including web, mobile, and desktop applications.
Features
- SQL language capabilities
- SQL generation
- SQL diagnosis
- Private domain Q&A and data processing
- Database knowledge Q&A
- Data processing
- Plugins
- Supports custom plugin execution tasks and natively supports the Auto-GPT plugin, for example:
- Automatic execution of SQL and retrieval of query results
- Automatic crawling and learning of knowledge.
- Unified vector storage/indexing of knowledge base
- Support for unstructured data such as PDF, Markdown, CSV, and WebURL
- Multi LLMs Support
- Supports multiple large language models, currently supporting Vicuna (7b, 13b), ChatGLM-6b (int4, int8)
- TODO: codegen2, codet5p
This article is meant to help you get started with DB-GPT, and we trust it has been helpful to you. Please feel free to share your thoughts and feedback in the comment section below.