docs/types/inference-and-checking.md
In spite of being dynamically-typed language, Enso is built with a sophisticated type checker capable of reasoning about Enso typed system. However, a type checker on its own is quite useless. For Enso to truly be usable, it must also have a powerful type inference engine.
<!-- MarkdownTOC levels="2,3" autolink="true" --> <!-- /MarkdownTOC -->A prototype of a type checker has been developed that does not try to reach all the design goals written below, but tries to perform best effort type checking to provide lints and warnings to developers to aid in development.
As static type checking isn't needed for Enso execution, the type checker is
only enabled if an --enable-static-analysis flag has been passed to the
compiler.
To try out the type checker you may run
./built-distribution/enso-engine-0.0.0-dev-windows-amd64/enso-0.0.0-dev/bin/enso --compile /path/to/Project --enable-static-analysis
Any issues found by the type checker are reported as compilation warnings. They are not treated as errors for two reasons: firstly, because the type checker is still a prototype and so it should not block programs in case there is a mistake in its logic, and secondly, because the program can still run - if the given problematic block of code that is guaranteed to error is not reached during execution it may not cause problems.
Currently the type checker reports the following kinds of warnings:
function (x : Integer) = ... with an argument of type text:
function "Txt",x.method_1 and the type
of x is not Any and does not define method called method_1,The type mismatch error is not reported for a type-asserting expression, because it can still be satisfied by a hidden intersection type. See the documentation about intersection types for more information.
Enso has integrated some form of dynamic type checking by implementing the argument type checks and type assertions which are checked at runtime. However, non-negligible amount of code still depends on more 'dynamic' dispatch. Moreover, Enso allows interoperability with external inherently dynamically-typed languages like JavaScript or Python, so there are cases where the types of values cannot really be known 'statically'.
To alleviate in this, the type inference and checking are implemented in a gradual, best-effort, basis.
The type inference algorithm tries to infer the types wherever it is possible, but it is designed to give up in a graceful way. Static type errors are only reported if the type checker can 'prove' that a given operation will surely fail at runtime (if the piece of code is reached). If the operation may fail or succeed, no errors are reported.
This makes Any a special type. In terms of the subtyping relationship it is a
top type, however in terms of the type
checker it is a bit closer to the
bottom type - because a value of
type Any can be of any particular type, that means no error will be reported
because there is no way to guarantee a failure at runtime. Arguments of type
Any can be passed to methods expecting all kinds of types (then, they can fail
at runtime, but there is no way to prove a guaranteed failure statically), and
all methods can be called on Any (not only those defined on Any, as the
actual value passed in can have the given method defined). It behaves similarly
to
any type in TypeScript
or dynamic type in Kotlin.
Because of that, the Any type can be used to represent unexpected return
values from polyglot calls.
Enso approach to intersection types allows for a value to have multiple different types, some of which are hidden, adds some challenges to the type checking. The current state (where the hidden part of the type can only be uncovered via an explicit cast) provides good balance between the ability to work with intersection types in the interactive GUI (that is capable of inserting the necessary casts) and pass them through various checked methods (which may hide the extra parts, but still being able to uncover them) and the capabilities of static analysis of types. Good balance is namely achieved because of:
y = x:T
narrowing type check that
check the type at run-time and allow to 'uncover' the hidden parts of the
type. They are treated as an instanceof check, so they are not validated by
static analysis but instead serve as evidence that if the code continues
execution, then indeed y must now have type T.x : Integer&Any check, it is no longer possible to report any
type mismatch warnings. Type inference knows the value must conform to
Integer, but it cannot rule out that x was not created as
Integer & Text. Therefore the static analysis must assume that every value
can have any combination of types "in its &Any part".Any changes to the related logic must be done very carefully as it is very easy to introduce a change that will introduce a 'collapse' of the type system that would make it unable to report any useful warnings.
The type inference relies on existing type signatures and type assertions. Since
function argument types are checked at runtime, the type checker treats them as
assertions that an incoming value is of a given type. Similarly, code following
a type assertion inside an expression (y = x : T, or (x:T).method) relies on
the fact that the control flow only proceeds if that assertion was satisfied.
The processing is performed by traversing the IR of each method body from
terminal IR nodes upwards. First types of the leafs - literals or variables -
are inferred, and then based on their types, the type of more complex 'nodes'
(e.g. function application) is derived based on very basic inference rules that
stem from the
simply typed lambda calculus
with some additional extensions for calling methods on objects and other
features of the language like pattern matching. Inside of blocks of code, the
inference is run line-by-line; whenever the assignment operator = creates a
new binding, the inferred type of that binding is saved in a mapping that can
then be used by subsequent lines of the block that can reference that binding.
Calling of methods on objects is implemented via special logic for application
on UnresolvedSymbol. That is because within the IR, the expression a.f b c
is actually translated to UnresolvedSymbol<f> a b c.
Similar desugaring is also applied to standalone methods and atom constructors:
method arg1 arg2 is
translated into UnresolvedSymbol<method> <internal-module-ref> arg1 arg2,
where <internal-module-ref> is a special variable that resolves to the
module in which the method is defined,Atom.Constructor arg1 arg2 becomes
UnresolvedSymbol<Constructor> Atom arg1 arg2.The most important rules for inference are then:
NameResolutionAlgorithm to either a local
binding (and then the type is read from the current LocalBindingsTyping), a
global type/module reference, or an unresolved symbol - in which case it gets
assigned a special UnresolvedSymbol type that is then used in application,f x is processed by looking up the types of f and
x; four possibilities are considered:
f has a function type A -> B - then the type of x is checked for
compatibility with A (this may yield type mismatch warnings) and the
type of the application expression becomes B,f has type UnresolvedSymbol<f> - then the method is looked up (using
MethodTypeResolver) in the scope associated with the type of x; the type
of such application expression is the full type of the referenced method,
f has type Any - then the type of the application expression becomes
Any, as the type of f is not known,f has some non-functional type - then a not invokable warning is
reported.x = e is processed by inferring the type of e and storing it in
the current LocalBindingsTyping under the name x,(a:A)-> b-> e gets its type by first adding the types of
arguments based on their ascriptions to the LocalBindingsTyping (an argument
with no ascription is treated as unknown - Any) and then inferring the type
of e in context that includes these additional bindings; then the type of
the whole expression is built by combining the types of the arguments with the
return type,Text, Integer, Float or Vector,case of expression, similarly to a lambda, the bindings
introduced by the patterns in each case are added to the LocalBindingsTyping
for each case's expression; the type of the whole expression is built by
computing the union of types inferred for each of the branches.These rules are implemented in TypePropagation::tryInferringType and its
related helper methods.
The implementation consists of three phases:
TypeInferenceSignatures analyzes the IR of method definitions and
associates a type signature with each method based on its type ascriptions.
If a method has no type ascriptions, the argument types and the return type
will default to unknown type (Any) but the method will still get a
signature at least indicating the arity of the method.StaticModuleScopeAnalysis builds a static counterpart of ModuleScope that
holds method definitions available in scopes of each module. The static
module scope is then used to resolve methods on types during the static
analysis, in the same way as ModuleScope is used at runtime for method
dispatch. i. The BuildScopeFromModuleAlgorithm and
MethodResolutionAlgorithm encapsulate the core logic, so that the same
logic is guaranteed to be used both at runtime and compile-time.TypeInferencePropagation which analyzes each method, tries to infer types
of each sub-expression and report any issues found. While a type is inferred
for every sub-expression inside a method body, the types are stored in
metadata only for the named bindings to conserve memory. The rationale is
mostly that the types of named bindings are worth storing as in the future
they could be used for features such as auto-complete.More information can be found in the documentation of the relevant classes.
Currently, the algorithm has several places that simply were not yet finished:
TypeRepresentation.ArrowType to hold the name
of each argument and a flag indicating if it has a default value; then the
logic of CurryNode and its relatives needs to be ported to the compiler to
support the argument reordering, deciding when to use default arguments
during the method call and when not to do it (e.g. not all arguments are
provided, or the ... operator is used to explicitly stop the default
application).Private_Access violations statically.
StaticModuleScopeAnalysis pass does not record if a given
method or constructor (or the whole module) is marked as private. This
property should be recorded, and if a private method is called from an
outside module, a warning about the private access could be reported also
statically.Integrating the type checker with the VS Code extension for Enso can lead to vastly improved developer experience. The inferred types of bindings can be used to offer some form of method autocomplete and warnings could be displayed inline inside the code editor.
Vector aEnso's type system is designed to allow for polymorphic types. While a full
implementation may be quite complicated to implement efficiently, doing that for
a specific case of Vector can be a good first step as it allows to use some
'tricks' to 'cheat' and get the desired result for relatively cheap.
It has been suggested that, upon construction, a Vector could compute the most
general type of its elements and store it inside its metadata. Then in runtime,
checking if a value v satisfies the type Vector Integer would merely require
checking if v is a Vector, reading its metadata to see the type the elements
satisfy and checking if that type is a subtype of Integer. This is much
cheaper than having to iterate over all elements of the vector.
There are still some challenges to overcome:
The current approach for inference is very simple, but can already provide basic checking helpful during development. If types of method arguments are known and method definitions on types have checked signatures, then most expressions can be inferred and the type checker can provide warnings in case the inferred types do not match the expectations, allowing the developer to find bugs before running the program.
However, the inference algorithm has been created with simplicity and checking
for program correctness in mind, so the inference does not work with ambiguity.
If function arguments do not have checked types, they are treated as Any and
very little information can be derived from them. Currently, the algorithm does
not try to do bidirectional inference that would try to guess what the types of
the function 'should be' based on how they are used.
In the future, one could try to implement more powerful inference that treats
un-annotated function arguments as type-variables and tries to propagate them
through the data flow, recording any constraints induced by method calls. Then
trying to infer what the type of the argument 'should' be to satisfy the
gathered constraints (the simplest form of these constraints stemming from
checking argument types is unification of type variables, but calling methods on
objects may give raise to other kinds of constraints (ones like 'type that has
method f defined on it with arity N')). In such case, a distinction may need
to be made between function arguments that are 'checked' and ones that have an
inferred type, but it is not checked at runtime. It is also unclear how the
unorthodox approaches like Enso's approach to intersection types would play with
solving these kinds of constraints.
Yet another big area for improvement is allowing for type polymorphism in
methods - being able to change the signature of map from
Vector -> (Any -> Any) -> Vector into Vector a -> (a -> b) -> Vector b would
allow for much more powerful type inference.
Such improvements in type inference may require rather fundamental changes from the current relatively simple 'propagation' algorithm.
[!WARNING]
The actionables for this section are:
- Work out how on earth we do inference and how we maximise inference power.
- Do we want to provide a way to reason about the runtime representation of types? This is 'Levity Polymorphism' style.
- We want error messages to be as informative as possible, and are willing to retain significant extra algorithmic state in the typechecker to ensure that they are. This means both formatting and useful information.
- It is going to be important to retain as much information as possible in order to provide informative error messages. This means that the eventual algorithm is likely to combine techniques from both W and M (context-insensitive and context-sensitive respectively).
In order to make Enso's type inference as helpful and friendly as possible to our users, we want the ability to infer the maximal subset of the types that Enso can express.
[!WARNING] The actionables for this section are:
- How do we do inference for higher-rank and impredicative instantiations.
- How do we infer contexts, and how do we make that inference granular (e.g.
IO.Read,IO.Write, rather than justIO).- How do we propagate inference information as far as possible?
- If it comes to a tension between typechecker speed and inference capability, Enso will err on the side of inference capability in order to promote ease of use. Speed will be increased by performing incremental type-checking where possible on subsequent changes.
- Where are we okay requiring annotations? Polymorphic recursion, higher rank function parameters, constrained data and dependency?
[!WARNING] The actionables for this section are:
- Specify the inference algorithm.
[!WARNING] The actionables for this section are:
- Specify how (if at all) we can infer dependent quantifiers.
[!WARNING] The actionables for this section are:
- Specify the type checking algorithm.