docs/model_support.md
This document describes how to support a new model in FastChat.
To support a new local model in FastChat, you need to correctly handle its prompt template and model loading. The goal is to make the following command run with the correct prompts.
```
python3 -m fastchat.serve.cli --model-path [YOUR_MODEL_PATH]
```
You can run this example command to learn the code logic.
```
python3 -m fastchat.serve.cli --model-path lmsys/vicuna-7b-v1.5
```
You can add `--debug` to see the actual prompt sent to the model.
FastChat uses the `Conversation` class to handle prompt templates and the `BaseModelAdapter` class to handle model loading.
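To make the role of the conversation template concrete, here is a standalone sketch of how a `Conversation`-style object assembles a prompt string from chat turns. The class and field names below are simplified stand-ins modeled on FastChat's `Conversation`, not the actual implementation:

```python
# Simplified illustration of a conversation template: a system message,
# role names, and a separator are combined with the message history to
# produce the single prompt string sent to the model.
from dataclasses import dataclass, field


@dataclass
class MiniConversation:
    system_message: str
    roles: tuple              # e.g. ("USER", "ASSISTANT")
    sep: str                  # separator appended after each turn
    messages: list = field(default_factory=list)

    def append_message(self, role, text):
        # text=None leaves an open slot for the model to complete.
        self.messages.append((role, text))

    def get_prompt(self):
        out = self.system_message + self.sep
        for role, text in self.messages:
            if text is None:
                out += role + ":"      # model continues from here
            else:
                out += role + ": " + text + self.sep
        return out


conv = MiniConversation(
    system_message="A chat between a user and an assistant.",
    roles=("USER", "ASSISTANT"),
    sep="\n",
)
conv.append_message(conv.roles[0], "Hello!")
conv.append_message(conv.roles[1], None)   # open assistant turn
print(conv.get_prompt())
```

Running the CLI with `--debug` shows the real template's equivalent of this string for your model.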
1. Implement a conversation template for the new model in fastchat/conversation.py. You can follow the existing examples and use `register_conv_template` to add a new one. Please also add a link to the official reference code if possible.
2. Implement a model adapter for the new model in fastchat/model/model_adapter.py. You can follow the existing examples and use `register_model_adapter` to add a new one.

After these steps, the new model should be compatible with most FastChat features, such as the CLI, web UI, model worker, and OpenAI-compatible API server. Please do some testing with these features as well.
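The registration pattern behind `register_model_adapter` can be sketched as follows. This is a self-contained toy version of the idea (a global list of adapters matched against the model path in order); the names `MiniAdapter`, `model_adapters`, and `get_adapter` are illustrative, not the actual FastChat API:

```python
# Toy adapter registry: adapters are registered in order, and the first
# one whose match() accepts the model path is used to load the model.
model_adapters = []


def register_model_adapter(cls):
    model_adapters.append(cls())


class MiniAdapter:
    """Fallback adapter: matches any model path."""
    def match(self, model_path):
        return True


class VicunaAdapter(MiniAdapter):
    def match(self, model_path):
        return "vicuna" in model_path.lower()


register_model_adapter(VicunaAdapter)
register_model_adapter(MiniAdapter)   # register the fallback last


def get_adapter(model_path):
    # First registered adapter whose match() accepts the path wins,
    # which is why the catch-all fallback must be registered last.
    for adapter in model_adapters:
        if adapter.match(model_path):
            return adapter
    raise ValueError(f"No adapter for {model_path}")
```

For example, `get_adapter("lmsys/vicuna-7b-v1.5")` returns the `VicunaAdapter` instance, while an unknown path falls through to the fallback.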
Example commands for some supported models:

```
python3 -m fastchat.serve.cli --model-path meta-llama/Llama-2-7b-chat-hf
python3 -m fastchat.serve.cli --model-path lmsys/vicuna-7b-v1.5
python3 -m fastchat.serve.cli --model-path ~/model_weights/RWKV-4-Raven-7B-v11x-Eng99%-Other1%-20230429-ctx8192.pth
python3 -m fastchat.serve.cli --model-path mosaicml/mpt-7b-chat
```

Peft adapters are also supported. To activate one, include `peft` in the model path. Note: if loading multiple Peft models, you can have them share the base model weights by setting the environment variable `PEFT_SHARE_BASE_WEIGHTS=true` in any model worker.

To support an API-based model, consider learning from the existing OpenAI example. If the model is compatible with the OpenAI API, then a configuration file is all that is needed, without any additional code. For custom protocols, you must implement a streaming generator in fastchat/serve/api_provider.py, following the provided examples. Currently, FastChat is compatible with OpenAI, Anthropic, Google Vertex AI, Mistral, Nvidia NGC, YandexGPT and Reka.
Specify the endpoint information in a JSON configuration file, e.g. a file named `api_endpoints.json`:

```json
{
  "gpt-3.5-turbo": {
    "model_name": "gpt-3.5-turbo",
    "api_type": "openai",
    "api_base": "https://api.openai.com/v1",
    "api_key": "sk-******",
    "anony_only": false,
    "recommended_config": {
      "temperature": 0.7,
      "top_p": 1.0
    },
    "text-arena": true,
    "vision-arena": false
  }
}
```
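Since a malformed endpoint file only surfaces at serve time, it can help to sanity-check it first. The following is a hypothetical checker (not part of FastChat); the required field names simply mirror the example above:

```python
# Hypothetical pre-flight check for an endpoint registry file: parse the
# JSON and verify each endpoint carries the core connection fields.
import json

REQUIRED_FIELDS = {"model_name", "api_type", "api_base", "api_key"}


def check_endpoints(path):
    with open(path) as f:
        endpoints = json.load(f)   # raises on malformed JSON
    for name, cfg in endpoints.items():
        missing = REQUIRED_FIELDS - cfg.keys()
        if missing:
            raise ValueError(f"{name}: missing fields {sorted(missing)}")
    return sorted(endpoints)


if __name__ == "__main__":
    print(check_endpoints("api_endpoints.json"))
```

Note that strict JSON forbids trailing commas, so `json.load` will reject a file that has one after the last field.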
Then launch the Gradio web server with the argument `--register api_endpoints.json`:

```
python3 -m fastchat.serve.gradio_web_server --controller "" --share --register api_endpoints.json
```
Now, you can open a browser and interact with the model.