# LocalScore
LocalScore is an open-source benchmarking tool designed to measure how fast Large Language Models (LLMs) run on your specific hardware. It also provides a public database for benchmark results, helping you make informed decisions about running AI models locally.
You can view the leaderboard at [localscore.ai](https://localscore.ai).
<p align="center">
  <i><a href="https://localscore.ai">LocalScore</a> is a <a href="https://builders.mozilla.org/">Mozilla Builders</a> project.</i>
</p>

LocalScore evaluates three key performance metrics:

1. **Prompt processing speed**: how quickly your system processes input text, in tokens per second
2. **Generation speed**: how fast your system generates new text, in tokens per second
3. **Time to first token (TTFT)**: how long it takes for the first token of the response to appear, in milliseconds
These metrics are combined into a single LocalScore value using a geometric mean:
$\text{score} = 10 \cdot \sqrt[3]{\text{avg\_prompt\_tps} \cdot \text{avg\_gen\_tps} \cdot \frac{1000}{\text{avg\_ttft\_ms}}}$
As a general guideline, higher scores are better: the score grows with prompt-processing and generation throughput and shrinks as time to first token increases.
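For a concrete sense of the formula, here is a worked example using hypothetical averages of 600 prompt tokens/s, 30 generation tokens/s, and a 200 ms time to first token:

$\text{score} = 10 \cdot \sqrt[3]{600 \cdot 30 \cdot \frac{1000}{200}} = 10 \cdot \sqrt[3]{90000} \approx 448$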
There are four primary ways to run LocalScore. The first two are covered on the LocalScore download page.
In addition to the following instructions, you can always build from source; in that case, follow the Llamafile build instructions.
Each LocalScore bundle includes the LocalScore binary and a model. Visit [localscore.ai](https://localscore.ai) to get the current bundles.
Download the appropriate binary for your operating system from the latest release download page.

On macOS and Linux:

```sh
chmod +x localscore
./localscore -m path/to/model.gguf
```

On Windows:

```bat
localscore.exe -m path\to\model.gguf
```
Every new llamafile (>=v0.9.2) contains commands to run LocalScore.
Example:
```sh
# Download a llamafile from Hugging Face
curl -O https://huggingface.co/Mozilla/Llama-3.2-1B-Instruct-llamafile/resolve/main/Llama-3.2-1B-Instruct.Q4_K_M.llamafile

# Run LocalScore
chmod +x Llama-3.2-1B-Instruct.Q4_K_M.llamafile
./Llama-3.2-1B-Instruct.Q4_K_M.llamafile --bench
```
On Windows, download any llamafile smaller than 4 GB from Hugging Face, rename it so the filename ends in `.exe`, and run it:

```bat
Llama-3.2-1B-Instruct.Q4_K_M.llamafile.exe --bench
```
If you have Llamafile installed, you can run LocalScore directly from it.
```sh
# Run LocalScore
llamafile --bench -m path/to/model.gguf
```

On Windows:

```bat
llamafile.exe --bench -m path\to\model.gguf
```
```
usage: localscore [options]

options:
  -h, --help                  Show this help message
  -m, --model <filename>      Model to benchmark (default: path/to/default)
  -c, --cpu                   Disable GPU acceleration (alias for --gpu=disabled)
  -g, --gpu <auto|amd|apple|nvidia|disabled>
                              GPU backend to use (default: "auto")
  -i, --gpu-index <i>         Select GPU by index (default: 0)
      --list-gpus             List available GPUs and exit
  -o, --output <csv|json|md>  Output format (default: md)
  -v, --verbose               Enable verbose output
  -y, --send-results          Send results without confirmation
  -n, --no-send-results       Disable sending results
  -e, --extended              Run 4 repetitions (shortcut for --reps=4)
      --long                  Run 16 repetitions (shortcut for --reps=16)
      --reps <N>              Set custom number of repetitions
```
```sh
# Benchmark on CPU only
./localscore -m path/to/model.gguf --cpu

# Benchmark and submit results without a confirmation prompt
./localscore -m path/to/model.gguf -y

# Run an extended benchmark (4 repetitions)
./localscore -m path/to/model.gguf -e
```
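Flags can also be combined. As a sketch (the GPU index below is hypothetical and depends on your machine), this lists the detected GPUs and then benchmarks on a specific NVIDIA GPU with JSON output:

```sh
# Discover which GPUs LocalScore can see
./localscore --list-gpus

# Benchmark on the NVIDIA GPU at index 1 and print JSON
./localscore -m path/to/model.gguf -g nvidia -i 1 -o json
```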
When you submit benchmark results to localscore.ai, the following non-personally identifiable system information is collected: your operating system, CPU, GPU, and memory configuration.
This data helps build a comprehensive database of hardware performance for LLM inference, allowing users to compare different setups and make informed decisions.
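If you would rather benchmark without submitting anything, the `-n` flag from the options above keeps results local:

```sh
# Run the benchmark but do not send results to localscore.ai
./localscore -m path/to/model.gguf -n
```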
Check out the Troubleshooting doc for common issues and solutions.
Contributions are welcome! See the main Llamafile README for building instructions and development guidelines.
LocalScore was created with support from Mozilla Builders and builds upon the excellent work of the Llamafile and llama.cpp projects.
LocalScore is released under the MIT License.