LaWGPT is a large language model (LLM) fine-tuned for the legal sector. It has been trained on a wide range of legal literature, which allows it to understand and generate legal language.
What is LaWGPT?
LaWGPT is a large language model (LLM) fine-tuned for the legal domain. It is an open-source project whose code is hosted in the pengxiao-song/LaWGPT repository.
LaWGPT has been trained on a wide range of legal text, allowing it to understand and generate legal language. It can be used for a variety of tasks in the legal profession, including:
- Legal research: LaWGPT can help you find essential legal information quickly and conveniently.
- Legal writing: LaWGPT can generate legal documents such as contracts and petitions.
- Legal analysis: LaWGPT can be used to examine legal arguments and find flaws in them.
- Legal education: LaWGPT can assist law students in learning the law.
- Legal practice: Lawyers can use LaWGPT to give better service to their clients.
LaWGPT is still under development, but it has the potential to transform the legal profession. By automating tasks that are currently performed by people, LaWGPT can free attorneys to focus on more complex and strategic work. It can also help attorneys serve their clients better by giving them access to more information and helping them analyze legal arguments more efficiently.
LaWGPT is a powerful new tool with the potential to significantly change the legal profession. As it develops, it is likely to play a larger role in how law is practiced.
To get started quickly with the LaWGPT project, follow these steps to download the code and create the environment:
1. Download the code:
    git clone git@github.com:pengxiao-song/LaWGPT.git
    cd LaWGPT
2. Create the environment:
    conda create -n lawgpt python=3.10 -y
    conda activate lawgpt
    pip install -r requirements.txt
3. Launch the Web UI (optional, for easy parameter adjustment):
- Execute the service startup script, scripts/webui.sh.
- Access the Web UI by opening your web browser and navigating to http://127.0.0.1:7860.
4. Command line inference (optional, batch testing supported)
- First, construct the test sample set, using the content of the project's reference file as a model.
- Second, execute the inference script. Its parameter is the path to the test sample set; if the parameter is empty or the path is wrong, the script runs in interactive mode.
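The fallback described in step 4 can be sketched as follows. This is a minimal illustration, not the project's inference script: the function name `choose_mode` and the JSON-list sample format are assumptions made for this example.

```python
import json
from pathlib import Path
from typing import Optional


def choose_mode(sample_path: Optional[str]) -> tuple[str, list]:
    """Return ("batch", samples) for a readable sample file, else ("interactive", []).

    Hypothetical helper: it mirrors the behavior described above, where an
    empty or wrong path makes the script fall back to interactive mode.
    """
    if not sample_path:
        return "interactive", []
    path = Path(sample_path)
    if not path.is_file():
        return "interactive", []
    with path.open(encoding="utf-8") as f:
        samples = json.load(f)  # assumed format: a JSON list of test samples
    return "batch", samples
```

Calling `choose_mode("")` or passing a path that does not exist yields interactive mode, while a valid file yields batch mode together with the loaded samples.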
LaWGPT project directory structure
LaWGPT
├── assets # Static resources
├── resources # Project resources
├── models # Base models and LoRA weights
│   ├── base_models
│   └── lora_weights
├── outputs # Fine-tuned instruction outputs
├── data # Experimental data
├── scripts # Script directory
│   ├── finetune.sh # Instruction fine-tuning script
│   └── webui.sh # Service startup script
├── templates # Prompt templates
├── tools # Toolkits
├── utils
├── train_clm.py # Secondary training
├── finetune.py # Instruction fine-tuning
├── webui.py # Service startup
├── README.md
└── requirements.txt
Here’s a brief description of the main directories and files:
- assets: The project's static resources.
- resources: Project-specific resources.
- models: The base models and LoRA weights.
- outputs: The output weights produced by instruction fine-tuning.
- data: Experimental data.
- scripts: Various scripts, including finetune.sh for instruction fine-tuning and webui.sh for service startup.
- templates: Prompt templates.
- tools: Toolkits required for the project.
- utils: Utility functions and modules.
- train_clm.py: The script used for secondary training.
- finetune.py: The script used for instruction fine-tuning.
- webui.py: The script used to start the service.
- README.md: A Markdown file containing information about the project.
- requirements.txt: A file listing the Python packages the project requires.
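After cloning, it can be useful to confirm that the checkout matches the directory structure listed above. The following is a small sketch using only the standard library; the function name `missing_entries` is hypothetical, and some directories (for example outputs) may legitimately be absent until training has been run.

```python
from pathlib import Path

# Entries taken from the directory structure listed above.
EXPECTED_DIRS = ["assets", "resources", "models", "outputs", "data",
                 "scripts", "templates", "tools", "utils"]
EXPECTED_FILES = ["train_clm.py", "finetune.py", "webui.py",
                  "README.md", "requirements.txt"]


def missing_entries(repo_root: str) -> list[str]:
    """Return the expected directories and files that are absent under repo_root."""
    root = Path(repo_root)
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing
```

An empty list from `missing_entries("LaWGPT")` means every listed entry was found.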
This project is based on datasets such as legal document data and judicial examination data released by the Chinese Judgment Document Network; for more information, please see the Chinese legal data summary.
- Primary data generation: Generate conversational Q&A data using the Stanford Alpaca and self-instruct approaches.
- Knowledge-guided data generation: Generate data from structured Chinese legal knowledge using a knowledge-based self-instruct approach.
- Data cleaning: Use ChatGPT to clean the data and help build high-quality datasets.
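Stanford Alpaca stores each example as a record with `instruction`, `input`, and `output` fields. The sketch below shows records in that shape and a simple rule-based filter that drops incomplete or duplicate records. The filter is only a stand-in for illustration: it is not the ChatGPT-based cleaning the project describes, and the sample text is invented.

```python
def clean_records(records: list[dict]) -> list[dict]:
    """Drop records with an empty instruction or output, and exact duplicates."""
    seen = set()
    cleaned = []
    for rec in records:
        instruction = rec.get("instruction", "").strip()
        extra_input = rec.get("input", "").strip()
        output = rec.get("output", "").strip()
        if not instruction or not output:
            continue  # incomplete Q&A pair
        key = (instruction, extra_input, output)
        if key in seen:
            continue  # exact duplicate of an earlier record
        seen.add(key)
        cleaned.append(
            {"instruction": instruction, "input": extra_input, "output": output}
        )
    return cleaned


# Invented example data in the Alpaca record shape.
raw = [
    {"instruction": "What is a contract?", "input": "",
     "output": "An agreement that creates legal obligations."},
    {"instruction": "What is a contract?", "input": "",
     "output": "An agreement that creates legal obligations."},
    {"instruction": "Define tort.", "input": "", "output": "  "},
]
print(len(clean_records(raw)))  # prints 1
```

Only the first record survives: the second is an exact duplicate and the third has a blank output.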
The LaWGPT series models are trained in two phases:
- Phase 1: Expand the legal-domain vocabulary and pre-train Chinese-LLaMA on large-scale legal documents and legal code data.
- Phase 2: Construct a legal-domain question-and-answer dialogue dataset, then perform instruction fine-tuning on top of the pre-trained model.
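To see why Phase 1 expands the vocabulary, consider a toy greedy longest-match tokenizer. This illustrates the idea only: real LLaMA tokenizers use learned subword models, and the vocabulary and example term here are invented.

```python
def tokenize(text: str, vocab: set[str]) -> list[str]:
    """Greedy longest-match tokenization; unknown characters become single tokens."""
    max_len = max((len(v) for v in vocab), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then fall back to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


base_vocab = {"合同", "法"}              # toy general-purpose vocabulary
legal_vocab = base_vocab | {"不可抗力"}  # expanded with a legal term ("force majeure")

text = "不可抗力合同"
print(tokenize(text, base_vocab))   # ['不', '可', '抗', '力', '合同']
print(tokenize(text, legal_vocab))  # ['不可抗力', '合同']
```

With the expanded vocabulary the legal term becomes a single token instead of four, so the same text is represented by a shorter sequence.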
Secondary training process
- Secondary training: refer to "Constructing a secondary training dataset".
- Instruction fine-tuning: refer to "Constructing an instruction fine-tuning dataset".
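Instruction fine-tuning datasets of this kind are commonly stored as a JSON list of instruction/input/output records, with a prompt template applied at training time. The sketch below assumes that layout; the template text and the names `build_prompt` and `save_dataset` are hypothetical and are not taken from the project's templates directory.

```python
import json

# Hypothetical templates, loosely modeled on Alpaca-style prompts.
TEMPLATE_WITH_INPUT = (
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
)
TEMPLATE_NO_INPUT = "### Instruction:\n{instruction}\n\n### Response:\n"


def build_prompt(record: dict) -> str:
    """Render one record into the text the model would be asked to complete."""
    if record.get("input"):
        return TEMPLATE_WITH_INPUT.format(
            instruction=record["instruction"], input=record["input"]
        )
    return TEMPLATE_NO_INPUT.format(instruction=record["instruction"])


def save_dataset(records: list[dict], path: str) -> None:
    """Write records as a UTF-8 JSON list, keeping non-ASCII text readable."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False, indent=2)
```

During fine-tuning, the record's `output` field would serve as the target text that follows the rendered prompt.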
Because of limited computing resources, data scale, and other factors, LaWGPT has several limitations at this stage:
- Limited data and model capacity constrain the model's memory and language ability, so it may give inaccurate answers to questions of factual knowledge.
- The models in the series have only a basic alignment with human intent, so they may generate potentially dangerous content or content that does not conform to human preferences and values.
- The models have issues with self-awareness, and their Chinese comprehension needs improvement.
This article is intended to help you learn about LaWGPT. We hope it has been helpful. Please feel free to share your thoughts and feedback in the comment section below.