docs/arena.md
Chatbot Arena is an LLM benchmark platform featuring anonymous, randomized battles, available at https://lmarena.ai. We invite the entire community to join this benchmarking effort by contributing your votes and models.
If you want to see a specific model in the arena, you can follow the methods below.
If you have a model hosted by a 3rd party API provider or yourself, please give us access to an API endpoint.
To launch the vision arena:

1. Run `python3 -m fastchat.serve.controller` to start the controller and begin registering local model workers and API-provided workers.
2. Run `python3 -m fastchat.serve.sglang_worker --model-path <model-path> --tokenizer-path <tokenizer-path>` to run local vision-language models. Currently supported models include the LLaVA and Yi-VL series.
3. Register API-based models in `api_endpoints.json`.
4. Run the web server with the `--vision-arena` flag on. Optionally, pass `--use-remote-storage` and `--random-questions metadata_sampled.json`. Check the sections below for how to generate this file.

Example command:
```bash
python3 -m fastchat.serve.gradio_web_server_multi --share --register-api-endpoint-file api_endpoints.json --vision-arena --use-remote-storage --random-questions metadata_sampled.json
```
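As a sketch of what registering an API-based model might look like, the snippet below writes an `api_endpoints.json` mapping a model identifier to its endpoint details. The field names (`model_name`, `api_type`, `api_base`, `api_key`) and the example model/URL are assumptions for illustration; consult the FastChat source for the authoritative schema.

```python
import json

# Illustrative only: field names below are assumed, not FastChat's
# authoritative schema -- check the FastChat docs/source before use.
endpoints = {
    "my-vision-model": {
        "model_name": "my-vision-model",            # display name (assumed field)
        "api_type": "openai",                       # OpenAI-compatible API (assumed field)
        "api_base": "https://api.example.com/v1",   # placeholder endpoint URL
        "api_key": "<api-key>",                     # placeholder credential
    }
}

# Write the registration file that --register-api-endpoint-file consumes.
with open("api_endpoints.json", "w") as f:
    json.dump(endpoints, f, indent=2)
```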
The image moderation pipeline requires the following environment variables:

- `AZURE_IMG_MODERATION_ENDPOINT`: The endpoint where the NSFW moderator is hosted (e.g. `https://{endpoint}/contentmoderator/moderate/v1.0/ProcessImage/Evaluate`). Change the endpoint to your own.
- `AZURE_IMG_MODERATION_API_KEY`: Your API key for this endpoint.
- `PHOTODNA_API_KEY`: The API key that runs the CSAM detector endpoint.

Example in `~/.bashrc`:
```bash
export AZURE_IMG_MODERATION_ENDPOINT=https://<endpoint>/contentmoderator/moderate/v1.0/ProcessImage/Evaluate
export AZURE_IMG_MODERATION_API_KEY=<api-key>
export PHOTODNA_API_KEY=<api-key>
```
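For reference, here is a minimal sketch of how a caller could hit the NSFW moderation endpoint using these variables. The helper functions are illustrative, not FastChat's actual moderation code; the `Ocp-Apim-Subscription-Key` header is the standard way Azure Content Moderator authenticates requests.

```python
import json
import os
import urllib.request


def build_moderation_request(image_bytes, endpoint=None, api_key=None):
    """Build an HTTP POST request for the Azure image moderation endpoint.

    Reads AZURE_IMG_MODERATION_ENDPOINT / AZURE_IMG_MODERATION_API_KEY
    from the environment when not given explicitly. Illustrative helper,
    not part of FastChat.
    """
    endpoint = endpoint or os.environ["AZURE_IMG_MODERATION_ENDPOINT"]
    api_key = api_key or os.environ["AZURE_IMG_MODERATION_API_KEY"]
    return urllib.request.Request(
        endpoint,
        data=image_bytes,
        headers={
            # Azure Content Moderator authenticates via this header.
            "Ocp-Apim-Subscription-Key": api_key,
            "Content-Type": "application/octet-stream",
        },
        method="POST",
    )


def moderate_image(image_bytes):
    """POST the image and return the parsed moderation verdict as a dict."""
    req = build_moderation_request(image_bytes)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read())
```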
We provide random samples of example images for users to interact with, drawn from various datasets including DocVQA, RealWorldQA, ChartQA, and VizWiz-VQA.
To set up these examples locally, run:

```bash
python fastchat/serve/vision/create_vqa_examples_dir.py
```
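To show the idea behind the `metadata_sampled.json` file mentioned earlier, here is a rough sketch of per-dataset random sampling. The directory layout, metadata field names, and the `sample_vqa_metadata` helper are all assumptions for illustration, not FastChat's actual code.

```python
import json
import random
from pathlib import Path


def sample_vqa_metadata(examples_dir, n_per_dataset=50, seed=0,
                        out_path="metadata_sampled.json"):
    """Draw a random subset of example images from each dataset directory
    and write the combined metadata to a single JSON file.

    Assumes examples_dir contains one subdirectory per dataset
    (e.g. DocVQA/, ChartQA/) holding .jpg images -- an illustrative
    layout, not necessarily what create_vqa_examples_dir.py produces.
    """
    rng = random.Random(seed)  # fixed seed for reproducible sampling
    sampled = []
    for dataset_dir in sorted(Path(examples_dir).iterdir()):
        if not dataset_dir.is_dir():
            continue
        images = sorted(p.name for p in dataset_dir.glob("*.jpg"))
        # Sample at most n_per_dataset images from this dataset.
        for name in rng.sample(images, min(n_per_dataset, len(images))):
            sampled.append({
                "dataset": dataset_dir.name,
                "path": str(dataset_dir / name),
            })
    Path(out_path).write_text(json.dumps(sampled, indent=2))
    return sampled
```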