ARCHITECTURE.md
There are a number of choices when writing a Python type checker. We are taking inspiration from Pyre1, Pyright and MyPy. Some notable choices:
def foo(x): return True would result in something equivalent to had you
written def foo(x: Any) -> bool: ....[] to however it is used first, then fix it
after. For example xs = []; xs.append(1); xs.append("") will infer that
xs: List[int] and then error on the final statement.x: int = 4 will both know
that x has type int, but also that the immediately next usage of x will
be aware the type is Literal[4].Pyrefly is split into a number of crates (mostly under crates/):
pyrefly_util are general purpose utilities, which have nothing to do with
Python or type checking. Examples include IO wrappers, locking, command line
helpers etc.pyrefly_derive are proc-macros for deriving traits such as TypeEq and
Visit.pyrefly_python are Python utilities with no type-checking aspects, such as
modelling modules or sys.info.pyrefly_graph provides utilities for indexing values and caching
computations that may depend on each other.pyrefly_bundled are the third-party
typeshed stubs.pyrefly_config defines the Pyrefly configuration, along with support for
reading Mypy/Pyright configuration.pyrefly_types defines the Pyrefly type along with operations on it.pyrefly_wasm defines the sandbox code that compiles to WASM.pyrefly itself is the type checker and everything else.There are many nuances of design that change on a regular basis. But the basic substrate on which the checker is built involves three steps:
import *
statements transitively.If we encounter unknowable information (e.g. recursion) we use Type::Var to
insert placeholders which are filled in later.
For each module, we solve the steps sequentially and completely. In particular, we do not try and solve a specific identifier first (like Roslyn or TypeScript), and do not use fine-grained incrementality (like Rust Analyzer using Salsa). Instead, we aim for raw performance and a simpler module-centric design - there's no need to solve a single binding in isolation if solving all bindings in a module is fast enough.
Given the program:
1: x: int = 4
2: print(x)
We might produce the bindings:
define int@0 = from builtins import intdefine x@1 = 4: int@0use x@2 = x@1anon @2 = print(x@2)export x = x@2Of note:
define (the definition of something), use (a
usage of a thing) and anon (a statement we need to type check, but don't
care about the result of).export keys and import
values.@line, but in reality it's the byte offset
in the file).VarGiven the program:
1: x = 1
2: while test():
3: x = x
4: print(x)
We end up with the bindings:
x@1 = 1x@3 = phi(x@1, x@3)x@4 = phi(x@1, x@3)The expression phi is the join point of the two values, e.g. phi(int, str)
would be int | str. We skip the distinction between define and use, since
it is not necessary for this example.
When solving x@3 we encounter recursion. Operationally:
x@3.x@1.x@1 to be Literal[1]x@3. But we are currently solving x@3, so we invent a
fresh Var (let's call it ?1) and return that.x@3 must be Literal[1] | ?1.?1 was introduced by x@3 we record that ?1 = Literal[1] | ?1. We
can take the upper reachable bound of that and conclude that
?1 = Literal[1].x@3 to just Literal[1].