FastChat is a platform that aims to simplify and speed up the development and evaluation of chatbots. As chatbots become more popular and useful in areas such as customer service, education, entertainment, and health care, building and testing them remains a challenging, resource-intensive task. In this article, we introduce FastChat, a platform that streamlines the chatbot development and evaluation process.
FastChat provides a user-friendly interface that lets users create, test, and deploy chatbots in minutes, along with a rich set of features such as natural language understanding, dialogue management, response generation, and analytics. It supports chatbots for different purposes and scenarios, including conversational agents, question-answering systems, task-oriented bots, and social chatbots, across multiple languages and platforms (web, mobile, and voice). FastChat is designed to help users build high-quality chatbots that engage and satisfy their target audiences.
Installation
Method 1: Install With pip
pip3 install fschat
Method 2: Get source from GitHub
1. Clone this repository and navigate to the FastChat folder.
git clone https://github.com/lm-sys/FastChat.git
cd FastChat
If you are running on Mac:
brew install rust cmake
2. Install Package
pip3 install --upgrade pip # enable PEP 660 support
pip3 install -e .
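To confirm that the package installed correctly, you can ask pip for its metadata:
pip3 show fschat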
Model Weights
Vicuna Weights
To comply with the LLaMA model license, the FastChat team releases Vicuna weights as delta weights. To obtain the Vicuna weights, apply these deltas to the original LLaMA weights.
Instructions:
1. Get the original LLaMA weights in the Hugging Face format by following the instructions here.
2. Apply the delta weights using the scripts below to obtain the Vicuna weights. The scripts will automatically download the delta weights from the LMSYS Hugging Face account.
Weights v1.1 are only compatible with transformers>=4.28.0 and fschat>=0.2.0. Please update your local packages as needed. If you follow the installation instructions above to perform a clean install, you should obtain all of the right versions.
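If you need to update the packages manually, one way to do so is:
pip3 install --upgrade "fschat>=0.2.0" "transformers>=4.28.0"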
Vicuna-7B
This conversion command requires around 30 GB of CPU RAM. If you don’t have enough memory, see the “Low CPU Memory Conversion” section below.
python3 -m fastchat.model.apply_delta \
--base-model-path /path/to/llama-7b \
--target-model-path /output/path/to/vicuna-7b \
--delta-path lmsys/vicuna-7b-delta-v1.1
Vicuna-13B
This conversion command needs around 60 GB of CPU RAM. See the “Low CPU Memory Conversion” section below if you do not have enough memory.
python3 -m fastchat.model.apply_delta \
--base-model-path /path/to/llama-13b \
--target-model-path /output/path/to/vicuna-13b \
--delta-path lmsys/vicuna-13b-delta-v1.1
Old Weights
For older versions of the Vicuna weights and their differences, refer to the FastChat repository documentation.
Low CPU Memory Conversion
You can try these methods to reduce the CPU RAM requirement of weight conversion.
- Add --low-cpu-mem to the preceding commands to split large files into smaller ones and use the disk as temporary storage. This can keep peak RAM usage below 16 GB, as shown in the example below.
- Create a large swap file and rely on the operating system to use the disk as virtual memory automatically.
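For example, the Vicuna-13B conversion command above becomes:
python3 -m fastchat.model.apply_delta \
--base-model-path /path/to/llama-13b \
--target-model-path /output/path/to/vicuna-13b \
--delta-path lmsys/vicuna-13b-delta-v1.1 \
--low-cpu-mem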
FastChat-T5
Simply run the line below to start chatting. It will automatically download the weights from a Hugging Face repo.
python3 -m fastchat.serve.cli --model-path lmsys/fastchat-t5-3b-v1.0
Supported Models
The following models are tested:
- Vicuna, Alpaca, LLaMA, Koala
- lmsys/fastchat-t5-3b-v1.0
- BlinkDL/RWKV-4-Raven
- databricks/dolly-v2-12b
- OpenAssistant/oasst-sft-1-pythia-12b
- project-baize/baize-lora-7B
- StabilityAI/stablelm-tuned-alpha-7b
- THUDM/chatglm-6b
Single GPU
The following command requires around 28 GB of GPU memory for Vicuna-13B and 14 GB for Vicuna-7B. If you don’t have enough memory, see the “Not Enough Memory” section below.
python3 -m fastchat.serve.cli --model-path /path/to/model/weights
Multiple GPUs
Model parallelism can aggregate GPU memory from multiple GPUs on the same machine.
python3 -m fastchat.serve.cli --model-path /path/to/model/weights --num-gpus 2
CPU Only
This runs entirely on the CPU and does not require a GPU. Vicuna-13B requires around 60 GB of CPU memory, whereas Vicuna-7B requires approximately 30 GB.
python3 -m fastchat.serve.cli --model-path /path/to/model/weights --device cpu
Metal Backend (Mac Computers with Apple Silicon or AMD GPUs)
Use --device mps to enable GPU acceleration on Mac computers (requires torch >= 2.0). Use --load-8bit to turn on 8-bit compression.
python3 -m fastchat.serve.cli --model-path /path/to/model/weights --device mps --load-8bit
Vicuna-7B can run on a 32 GB M1 MacBook at 1-2 words per second.
Not Enough Memory
If you don’t have enough RAM, you can enable 8-bit compression by adding --load-8bit to the preceding commands. With somewhat reduced model quality, this can cut memory consumption roughly in half. It works with the CPU, GPU, and Metal backends. With 8-bit compression, Vicuna-13B can run on a single NVIDIA 3090/4080/T4/V100 (16 GB) GPU.
python3 -m fastchat.serve.cli --model-path /path/to/model/weights --load-8bit
Additionally, you can add --cpu-offloading to the commands above to offload weights that do not fit on your GPU to CPU memory. This requires 8-bit compression to be enabled and the bitsandbytes package to be installed, which is only available on Linux. A combined example is shown below.
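For example, the single-GPU command with 8-bit compression and CPU offloading enabled looks like this:
python3 -m fastchat.serve.cli --model-path /path/to/model/weights --load-8bit --cpu-offloading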
More Platforms
- MLC LLM, backed by TVM Unity compiler, deploys Vicuna natively on phones, consumer-class GPUs and web browsers via Vulkan, Metal, CUDA and WebGPU.
Serving with Web GUI
You’ll need three main components to serve models with the web UI: web servers that interact with users, model workers that host one or more models, and a controller that coordinates the web servers and model workers. Enter the following commands in your terminal:
Launch the controller
python3 -m fastchat.serve.controller
This controller manages the distributed workers.
Launch the model worker
python3 -m fastchat.serve.model_worker --model-path /path/to/model/weights
Wait until the model loading process finishes and you see “Uvicorn running on …”. You can launch multiple model workers at the same time to serve different models; each worker will connect to the controller automatically.
Send a test message using the following command to confirm that your model worker is correctly linked to your controller:
python3 -m fastchat.serve.test_message --model-name vicuna-7b
You will see a short output.
Launch the Gradio web server
python3 -m fastchat.serve.gradio_web_server
This is the user interface with which users will interact.
You will be able to serve your models via the web UI if you follow these instructions. You may now open your browser and start chatting with a model.
APIs
Huggingface Generation APIs
See fastchat/serve/huggingface_api.py
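As a minimal sketch, assuming the script accepts the same --model-path argument as the serving commands above, you could run it directly:
python3 -m fastchat.serve.huggingface_api --model-path /path/to/model/weights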
OpenAI-compatible RESTful APIs & SDK
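As a rough illustration only (the local host, port 8000, and model name here are assumptions, not values from this article), a chat request against an OpenAI-compatible endpoint typically looks like this:
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "vicuna-7b-v1.1", "messages": [{"role": "user", "content": "Hello, FastChat!"}]}'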
Evaluation
The AI-enhanced evaluation pipeline is built on GPT-4. This section gives a high-level overview of the pipeline; please see the evaluation documents for further information.
Pipeline Steps
1. Generate answers with different models: for ChatGPT, use qa_baseline_gpt35.py; for Vicuna and other models, specify the model checkpoint and run get_model_answer.py.
2. Generate reviews with GPT-4: GPT-4 can be used to generate reviews automatically. If you do not have access to the GPT-4 API, you can do this step manually.
3. Generate visualization data: run generate_webpage_data_from_table.py to create data for a static website that lets you visualize the evaluation results.
4. Visualize the data: a static website is provided under the webpage directory. To serve the webpage locally, use:
python3 -m http.server
Data Format and Contribution
For evaluation, FastChat uses a data format encoded with JSON Lines. The structure includes models, prompts, reviewers, questions, answers, and reviews.
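Purely as an illustration of the JSON Lines layout (the field names below are hypothetical, not the exact FastChat schema), each record is a single JSON object on its own line:
{"question_id": 1, "category": "generic", "text": "What is the best way to learn a new language?"}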
With access to the underlying data, you can customize the evaluation process or contribute your own data.
For detailed instructions, please refer to the evaluation documentation.
This article introduced FastChat and walked through installing it, obtaining model weights, serving models, and evaluating them. We hope it has been helpful; please feel free to share your thoughts and feedback in the comments section below.