# Pyrefly Benchmarks

Benchmarking scripts for comparing pyrefly against other Python type checkers.

Ported from https://github.com/lolpack/type_coverage_py

## Type Checking Speed Benchmark

Measures wall-clock execution time and peak memory usage of type checkers (pyright, pyrefly, ty, mypy, zuban) across popular open-source Python packages.
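The core of such a measurement can be sketched roughly as below. This is illustrative only: the checker command, working directory, and exact memory accounting are placeholders, not necessarily what `typecheck_benchmark.py` does.

```python
# Rough sketch of one timed checker invocation (illustrative only).
import resource
import subprocess
import time


def timed_run(cmd: list[str], cwd: str, timeout: float = 300.0) -> dict:
    """Run one checker command, recording wall-clock time and child peak RSS."""
    start = time.perf_counter()
    proc = subprocess.run(cmd, cwd=cwd, capture_output=True, timeout=timeout)
    wall = time.perf_counter() - start
    # High-water mark across all waited-for child processes; only meaningful
    # when runs are sequential. Linux reports kilobytes, macOS reports bytes.
    peak_rss = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return {"cmd": cmd, "seconds": wall, "returncode": proc.returncode, "peak_rss": peak_rss}


# Hypothetical usage:
# timed_run(["pyrefly", "check", "."], cwd="/path/to/cloned/package")
```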

### Prerequisites

Create a virtual environment and install the type checkers you want to benchmark:

```bash
# e.g. python3 -m venv .venv && source .venv/bin/activate
pip install pyright mypy pyrefly ty zuban
```

### Usage

```bash
# Run all checkers on all packages in install_envs.json
python3 typecheck_benchmark.py

# Run only pyrefly and mypy on the first 5 packages
python3 typecheck_benchmark.py -c pyrefly mypy -p 5

# Run specific packages with 10 measured runs and 2 warmup runs
python3 typecheck_benchmark.py -n requests flask django -r 10 -w 2

# Save results to a custom directory
python3 typecheck_benchmark.py -o ./my_results --os-name macos

# Use a custom install_envs.json
python3 typecheck_benchmark.py --install-envs /path/to/install_envs.json
```

### Options

| Flag | Description |
|------|-------------|
| `-p, --packages N` | Max number of packages to benchmark |
| `-n, --package-names NAME [NAME ...]` | Specific package names to benchmark |
| `-c, --checkers NAME [NAME ...]` | Type checkers to run (default: all five) |
| `-t, --timeout SECS` | Timeout per checker invocation (default: 300s) |
| `-r, --runs N` | Measured runs per checker per package (default: 5) |
| `-w, --warmup N` | Warmup runs to discard before measuring (default: 1) |
| `-o, --output DIR` | Output directory for JSON results |
| `--os-name NAME` | OS name for output filename (e.g., `macos`) |
| `--install-envs PATH` | Path to `install_envs.json` |

### Output

Results are saved as JSON in `results/` (or the directory specified by `-o`). Each run produces a dated file (e.g., `benchmark_2025-03-17.json`) and a `latest.json` symlink.
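For context, writing the dated file and refreshing the `latest.json` symlink could look roughly like this; it is a sketch only, and the script's actual naming and serialization may differ.

```python
# Sketch of writing a dated result file plus a "latest.json" symlink.
import datetime
import json
from pathlib import Path


def write_results(results: dict, out_dir: str = "results") -> Path:
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    dated = out / f"benchmark_{datetime.date.today().isoformat()}.json"
    dated.write_text(json.dumps(results, indent=2))
    latest = out / "latest.json"
    if latest.is_symlink() or latest.exists():
        latest.unlink()
    latest.symlink_to(dated.name)  # relative symlink inside the results dir
    return dated
```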

## LSP Benchmark (Go to Definition)

Measures `textDocument/definition` latency across LSP servers. For each run, the script picks a random Python file and identifier, starts each server, and times the Go to Definition response. All servers are run in parallel on the same position for fair comparison.
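To make the measurement concrete, here is a stripped-down sketch of timing a single `textDocument/definition` request against one server over stdio. It is illustrative only: the real script runs the servers in parallel, handles more of the LSP handshake, and validates responses; the server command, file URI, text, and position below are placeholders.

```python
# Minimal LSP client sketch: initialize, didOpen, then time one definition request.
import json
import subprocess
import time


def _send(proc, msg):
    body = json.dumps(msg).encode()
    proc.stdin.write(b"Content-Length: %d\r\n\r\n" % len(body) + body)
    proc.stdin.flush()


def _recv(proc):
    # Read LSP headers, then the JSON body they describe.
    length = 0
    while True:
        line = proc.stdout.readline().strip()
        if not line:
            break
        if line.lower().startswith(b"content-length:"):
            length = int(line.split(b":")[1].strip())
    return json.loads(proc.stdout.read(length))


def time_definition(server_cmd, root_uri, file_uri, text, line, character):
    proc = subprocess.Popen(server_cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    _send(proc, {"jsonrpc": "2.0", "id": 1, "method": "initialize",
                 "params": {"processId": None, "rootUri": root_uri, "capabilities": {}}})
    while _recv(proc).get("id") != 1:  # skip notifications until the init response
        pass
    _send(proc, {"jsonrpc": "2.0", "method": "initialized", "params": {}})
    _send(proc, {"jsonrpc": "2.0", "method": "textDocument/didOpen",
                 "params": {"textDocument": {"uri": file_uri, "languageId": "python",
                                             "version": 1, "text": text}}})
    start = time.perf_counter()
    _send(proc, {"jsonrpc": "2.0", "id": 2, "method": "textDocument/definition",
                 "params": {"textDocument": {"uri": file_uri},
                            "position": {"line": line, "character": character}}})
    while True:
        msg = _recv(proc)
        if msg.get("id") == 2:
            break
    latency = time.perf_counter() - start
    proc.terminate()  # a real client would send shutdown/exit first
    return latency, msg.get("result")
```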

lsp_benchmark.py has two modes:

  1. Multi-package mode (--install-envs) — auto-clones packages from install_envs.json and benchmarks each one (recommended).
  2. Single-repo mode (--root) — benchmarks an already-cloned repo.

### Prerequisites

The LSP servers must be available as commands. Typical commands:

- pyrefly: `pyrefly lsp`
- ty: `ty server`
- pyright: `pyright-langserver --stdio`
- zuban: `zubanls`

### Usage (multi-package mode)

```bash
# Run all checkers on all packages (auto-clones from GitHub)
python3 lsp_benchmark.py --install-envs install_envs.json

# Run only pyrefly on 3 packages with 10 runs each
python3 lsp_benchmark.py --install-envs install_envs.json -c pyrefly -p 3 -r 10

# Run specific packages
python3 lsp_benchmark.py --install-envs install_envs.json -n requests flask django -r 20

# Save results to a custom directory
python3 lsp_benchmark.py --install-envs install_envs.json -o ./my_results --os-name macos
```

### Usage (single-repo mode)

```bash
# Benchmark only pyrefly with JSON output
python3 lsp_benchmark.py \
    --root ~/projects/some-python-repo \
    --servers pyrefly \
    --pyrefly-cmd "pyrefly lsp" \
    --runs 20 \
    --json results.json

# Reproducible run with a fixed seed
python3 lsp_benchmark.py \
    --root ~/projects/some-python-repo \
    --pyrefly-cmd "pyrefly lsp" \
    --seed 42 --runs 50

# All four servers
python3 lsp_benchmark.py \
    --root ~/projects/some-python-repo \
    --pyrefly-cmd "pyrefly lsp" \
    --ty-cmd "ty server" \
    --pyright-cmd "pyright-langserver --stdio" \
    --zuban-cmd "zubanls" \
    --runs 20
```

### Options

| Flag | Description | Mode |
|------|-------------|------|
| `--install-envs PATH` | Path to `install_envs.json` (enables multi-package mode) | multi |
| `-p, --packages N` | Max number of packages to benchmark | multi |
| `-n, --package-names NAME [NAME ...]` | Specific package names to benchmark | multi |
| `-c, --checkers NAME [NAME ...]` | Type checkers to benchmark (default: all four) | multi |
| `-o, --output DIR` | Output directory for JSON results | multi |
| `--os-name NAME` | OS name for output filename | multi |
| `--root PATH` | Repo root / workspace folder (default: cwd) | single |
| `--servers LIST` | Comma-separated server names (default: auto-detect from provided commands) | single |
| `--pyrefly-cmd CMD` | Command to start pyrefly LSP | single |
| `--ty-cmd CMD` | Command to start ty LSP | single |
| `--pyright-cmd CMD` | Command to start pyright LSP | single |
| `--zuban-cmd CMD` | Command to start zuban LSP | single |
| `--trace` | Verbose LSP wire trace to stderr | single |
| `--json PATH` | Write JSON report to file | single |
| `--settings-json JSON` | JSON settings to send via `workspace/didChangeConfiguration` | single |
| `--pyright-disable-indexing` | Disable Pyright background indexing | single |
| `-r, --runs N` | Runs per package (default: 100 multi, 1 single) | both |
| `-s, --seed N` | RNG seed for reproducibility | both |
| `--timeout SECS` | Per-request timeout (default: 10s) | both |

### Output

Prints a summary table to stdout. With `--json` (single-repo) or `-o` (multi-package), writes a detailed JSON report including per-run results, latency percentiles (p50, p95), and per-server ok/found/valid rates.
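The p50/p95 figures are standard percentiles over the per-run latencies; one common way to compute them (not necessarily the script's exact method) is:

```python
# Percentiles over per-run latency samples, assuming at least two samples.
import statistics


def p50_p95(latencies_ms: list[float]) -> tuple[float, float]:
    qs = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return qs[49], qs[94]  # 50th and 95th percentile cut points
```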

## Package List (install_envs.json)

The `install_envs.json` file defines which packages the type checking benchmark clones and checks, including (a rough example entry is sketched after this list):

- GitHub URLs for cloning
- Subdirectories to type-check (`check_paths`)
- Whether to `pip install -e .` the package (`install`)
- Additional pip dependencies (`deps`)
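
As a rough illustration, an entry might look like the following. Only `check_paths`, `install`, and `deps` are field names taken from the list above; the URL key, the example values, and the overall layout are assumptions, so consult the real `install_envs.json` for the exact schema:

```json
{
  "requests": {
    "github_url": "https://github.com/psf/requests",
    "check_paths": ["src/requests"],
    "install": true,
    "deps": ["certifi", "urllib3"]
  }
}
```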