benchmark/asr/README.md
This benchmark evaluates the performance and accuracy (Word Error Rate - WER) of Automatic Speech Recognition (ASR) models served via SGLang.
Supported models:
- openai/whisper-large-v3
- openai/whisper-large-v3-turbo
- Qwen/Qwen3-ASR-1.7B
- Qwen/Qwen3-ASR-0.6B

Install the required dependencies:
apt install ffmpeg
pip install librosa soundfile datasets evaluate jiwer transformers openai torchcodec torch
Launch the SGLang server with a Whisper model:
python -m sglang.launch_server --model-path openai/whisper-large-v3 --port 30000
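Before benchmarking, you can confirm the server is reachable through its OpenAI-compatible API, for example by listing the served models. A minimal sketch using the standard OpenAI Python client; the dummy API key is only there to satisfy the client:

```python
from openai import OpenAI

# Point the standard OpenAI client at the local SGLang server.
client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

# List served models to confirm the server is up and the model is loaded.
for model in client.models.list():
    print(model.id)
```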
Basic usage (using chat completions API):
python bench_sglang.py --base-url http://localhost:30000 --model openai/whisper-large-v3 --n-examples 10
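The chat path sends the audio inside a chat message. A minimal sketch of one such request, assuming the server accepts OpenAI-style input_audio content parts and using a hypothetical local file sample.wav; the exact payload built by bench_sglang.py may differ:

```python
import base64
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

# Read a local audio file (hypothetical path) and base64-encode it,
# as expected by OpenAI-style input_audio content parts.
with open("sample.wav", "rb") as f:
    audio_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="openai/whisper-large-v3",
    messages=[{
        "role": "user",
        "content": [
            {"type": "input_audio",
             "input_audio": {"data": audio_b64, "format": "wav"}},
        ],
    }],
)
print(response.choices[0].message.content)
```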
Using the OpenAI-compatible transcription API:
python bench_sglang.py \
--base-url http://localhost:30000 \
--model openai/whisper-large-v3 \
--api-type transcription \
--language English \
--n-examples 10
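This mode exercises the OpenAI-compatible /v1/audio/transcriptions endpoint. A minimal single-request sketch with the standard OpenAI Python client, using a hypothetical local file sample.wav:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

# Send one audio file through the OpenAI-compatible transcriptions endpoint.
with open("sample.wav", "rb") as f:  # hypothetical local file
    transcript = client.audio.transcriptions.create(
        model="openai/whisper-large-v3",
        file=f,
        language="en",  # ISO 639-1 code; see --language below
    )
print(transcript.text)
```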
Run with streaming and show real-time output:
python bench_sglang.py \
--base-url http://localhost:30000 \
--model openai/whisper-large-v3 \
--api-type transcription \
--stream \
--show-predictions \
--concurrency 1
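With --stream, partial text is returned as it is decoded. A rough sketch of consuming a streamed transcription with the OpenAI client; the event fields (such as delta) are assumptions about the OpenAI-compatible streaming format and may differ from what the server actually returns:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

# Stream partial transcription text as it is produced.
with open("sample.wav", "rb") as f:  # hypothetical local file
    stream = client.audio.transcriptions.create(
        model="openai/whisper-large-v3",
        file=f,
        stream=True,
    )
    for event in stream:
        # Delta events carry incremental text; ignore other event types.
        if getattr(event, "delta", None):
            print(event.delta, end="", flush=True)
print()
```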
Run with higher concurrency and save results:
python bench_sglang.py \
--base-url http://localhost:30000 \
--model openai/whisper-large-v3 \
--concurrency 8 \
--n-examples 100 \
--output results.json \
--show-predictions
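Higher --concurrency simply keeps more requests in flight at once. The sketch below illustrates the idea with a thread pool and hypothetical local files; it is not the implementation used by bench_sglang.py:

```python
import time
from concurrent.futures import ThreadPoolExecutor
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

def transcribe(path: str) -> tuple[str, float]:
    """Transcribe one file and return (text, latency in seconds)."""
    start = time.perf_counter()
    with open(path, "rb") as f:
        result = client.audio.transcriptions.create(
            model="openai/whisper-large-v3", file=f
        )
    return result.text, time.perf_counter() - start

# Hypothetical list of local audio files standing in for the dataset.
files = ["sample_0.wav", "sample_1.wav", "sample_2.wav"]

with ThreadPoolExecutor(max_workers=8) as pool:  # mirrors --concurrency 8
    for text, latency in pool.map(transcribe, files):
        print(f"{latency:.2f}s  {text[:60]}")
```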
| Argument | Description | Default |
|---|---|---|
| --base-url | SGLang server URL | http://localhost:30000 |
| --model | Model name on the server | openai/whisper-large-v3 |
| --dataset | HuggingFace dataset for evaluation | D4nt3/esb-datasets-earnings22-validation-tiny-filtered |
| --split | Dataset split to use | validation |
| --concurrency | Number of concurrent requests | 4 |
| --n-examples | Number of examples to process (-1 for all) | -1 |
| --output | Path to save results as JSON | None |
| --show-predictions | Display sample predictions | False |
| --print-n | Number of sample predictions to display | 5 |
| --api-type | API to use: chat (chat completions) or transcription (audio transcriptions) | chat |
| --language | Language for the transcription API (e.g., English, en) | None |
| --stream | Enable streaming mode for the transcription API | False |
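The --dataset and --split arguments map to a Hugging Face datasets load. A minimal sketch of inspecting the default evaluation set; the "text" column name is an assumption about this dataset's schema:

```python
from datasets import load_dataset

# Load the default evaluation dataset and split used by the benchmark.
ds = load_dataset(
    "D4nt3/esb-datasets-earnings22-validation-tiny-filtered",
    split="validation",
)
print(len(ds))        # number of available samples
print(ds[0]["text"])  # reference transcript ("text" column name is an assumption)
```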
The benchmark outputs:
| Metric | Description |
|---|---|
| Total Requests | Number of successful ASR requests processed |
| WER | Word Error Rate (lower is better), computed using the evaluate library |
| Average Latency | Mean time per request (seconds) |
| Median Latency | 50th percentile latency (seconds) |
| 95th Latency | 95th percentile latency (seconds) |
| Throughput | Requests processed per second |
| Token Throughput | Output tokens per second |
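For reference, WER and the latency percentiles can be reproduced from per-request results roughly as follows. The values are placeholders, and reporting WER as a percentage matches the sample output below but is an assumption about bench_sglang.py:

```python
import statistics
import evaluate

# WER over all (reference, prediction) pairs, scaled to a percentage.
wer_metric = evaluate.load("wer")
references = ["we talked about 4.7 gigawatts"]   # placeholder data
predictions = ["we talked about 4.7 gigawatts"]  # placeholder data
wer = 100 * wer_metric.compute(predictions=predictions, references=references)

# Latency statistics over per-request timings, in seconds.
latencies = [1.21, 0.98, 2.99, 1.36]             # placeholder data
avg = statistics.mean(latencies)
p50 = statistics.median(latencies)
p95 = statistics.quantiles(latencies, n=100)[94]  # 95th percentile
print(f"WER: {wer:.4f}  avg: {avg:.4f}s  p50: {p50:.4f}s  p95: {p95:.4f}s")
```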
Example run:
python bench_sglang.py --api-type transcription --concurrency 128 --model openai/whisper-large-v3 --show-predictions
Loading dataset: D4nt3/esb-datasets-earnings22-validation-tiny-filtered...
Using API type: transcription
Performing warmup...
Processing 511 samples...
------------------------------
Results for openai/whisper-large-v3:
Total Requests: 511
WER: 12.7690
Average Latency: 1.3602s
Median Latency: 1.2090s
95th Latency: 2.9986s
Throughput: 19.02 req/s
Token Throughput: 354.19 tok/s
Total Test Time: 26.8726s
------------------------------
==================== Sample Predictions ====================
Sample 1:
REF: on the use of taxonomy i you know i think it is it is early days for us to to make any clear indications to the market about the proportion that would fall under that requirement
PRED: on the eu taxonomy i think it is early days for us to make any clear indications to the market about the proportion that would fall under that requirement
----------------------------------------
Sample 2:
REF: so within fiscal year 2021 say 120 a 100 depending on what the micro will do and next year it is not necessarily payable in q one is we will look at what the cash flows for 2022 look like
PRED: so within fiscal year 2021 say $120000 $100000 depending on what the macro will do and next year it is not necessarily payable in q one is we will look at what the cash flows for 2022 look like
----------------------------------------
Sample 3:
REF: we talked about 4.7 gigawatts
PRED: we talked about 4.7 gigawatts
----------------------------------------
Sample 4:
REF: and you know depending on that working capital build we will we will see what that yields
PRED: and depending on that working capital build we will see what that yields what
----------------------------------------
Sample 5:
REF: so on on sinopec what we have agreed with sinopec way back then is that free cash flows after paying all capexs are distributed out 30 70%
PRED: so on sinopec what we have agreed with sinopec way back then is that free cash flows after paying all capexes are distributed out 30% 70%
----------------------------------------
============================================================
Notes:
- When using --stream with --show-predictions, use --concurrency 1 for clean sequential output.
- The --language option accepts both full names (e.g., English) and ISO 639-1 codes (e.g., en).

Troubleshooting:
- Server connection refused: make sure the server is running and --base-url points to it.
- Out of memory errors: reduce --concurrency to lower GPU memory usage.