docs/runtime-roadmap.md
This roadmap consists of longer, open-ended tasks that are required to make Enso better in the long term. The tasks here are not in any order that indicates priority, but the dependencies between tasks are described.
The Enso interpreter is written in a mixture of Scala and Java. Scala was originally chosen for the capabilities of its type system in comparison to Java's, but modern Java (as provided by JDK 21 or the Frgaal compiler) now meets most of those needs. The ultimate goal is to write everything in Java, while keeping up with the most recent long-term-support JDK/GraalVM releases.
Enso is a fairly dynamic language, but this doesn't mean that it doesn't admit static analysis. There are a number of areas that can be made better (read: more intuitive, more performant, and so on). These, again, are not in order of priority, but where there are dependencies these are indicated.
The current compiler IR is a bit of a mess. Due to time constraints, we ended up moving forward with it even though it was firmly unsuited to the direction in which we wanted to evolve the compiler. While many of the features listed below are possible on the current IR, they are difficult and inelegant compared to implementing them on an IR suited to the task.
Currently, the IR is:
A new IR for Enso would have to:
While it is a daunting task to wholesale move the entire compiler to a new IR,
it can instead be done in an incremental fashion. First, it makes sense to
design and build the new IR, and then write a translation from the current IR to
the new IR. With that done, the boundary between the two in the compiler can be
gradually shuffled, starting with codegen (`IrToTruffle`), until no usages of
the old IR remain.
If it were up to us, we'd consider basing the new IR on a mutable graph, as this easily admits many common compiler operations and is also likely to reduce the compiler's overall memory usage. Care should be taken when introducing mutability, however. While the current IR is mutable in limited ways (primarily the metadata on nodes), a fully mutable IR will need comprehensive utilities for deep copying and for dealing with cycles. That said, Marcin thinks that it may be worthwhile to stick to an immutable structure.
These two approaches offer a trade-off in terms of what they make easy. A tree-like structure is very easy to reason about (within a module), but it makes certain operations (e.g. alias analysis) more painful than they would otherwise be (we had to build a graph on top of the tree to get this working).
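To illustrate the "graph on top of the tree" point, here is a minimal sketch in Java. All names here are hypothetical toy types, not the actual Enso IR: a tree only records nesting, so an alias-analysis-style pass has to build def-use edges in a side table.

```java
import java.util.*;

// Hypothetical toy IR: a tree of bindings (Let) and uses (Var). The tree
// itself carries no def-use information; linkUses builds that graph on top.
final class AliasDemo {
    sealed interface Expr permits Let, Var {}
    record Let(String name, Expr body) implements Expr {}
    record Var(String name) implements Expr {}

    // Links every Var occurrence to the innermost enclosing Let binding it.
    static Map<Var, Let> linkUses(Expr e) {
        Map<Var, Let> edges = new IdentityHashMap<>();
        link(e, new ArrayDeque<>(), edges);
        return edges;
    }

    private static void link(Expr e, Deque<Let> scope, Map<Var, Let> edges) {
        if (e instanceof Let let) {
            scope.push(let);               // enter the binding's scope
            link(let.body(), scope, edges);
            scope.pop();
        } else if (e instanceof Var v) {
            for (Let l : scope) {          // innermost-first search
                if (l.name().equals(v.name())) { edges.put(v, l); return; }
            }
        }
    }
}
```

With a graph-based IR, these edges would simply be part of the structure; with a tree, every pass that needs them must recompute or carry this side table.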
Unreliably, we can guesstimate at:
Though we're not suggesting moving to a fully-type-checked language any time soon, the current system doesn't make use of most of the information contained in the type signatures. This should involve:
While you do not need to update the IR to do this analysis and the subsequent optimisations, doing so would certainly make many of them easier. Writing more passes on top of the old IR just piles on technical debt; please be aware of this.
With improved static analysis capabilities, we gain the ability to do lots more optimisations statically.
There are multiple points in the language where we create new scopes where this isn't strictly necessary. Eliminating these extra scopes eliminates the need for allocations and dynamic calls.
`if-then-else`. Rather than inserting a function call for each branch, we can hoist (with renaming) variables into the same scope. This means we don't need to perform a function call or allocate a new scope.

For simple programs, GraalVM can usually optimise these additional scopes away. However, performing this flattening removes the need to optimise these things away, and may actually admit further optimisations (claim unverified). In other words, we expect Graal to spend more of its time optimising the parts of the program that matter.
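The hoist-with-renaming step can be sketched as follows. This is a minimal illustration under a toy assumption (a scope is just a list of local names, and `hoist` is a hypothetical helper, not compiler code): branch locals move into the parent scope, receiving fresh names on collision.

```java
import java.util.*;

// Toy sketch of scope flattening: hoist a branch's locals into the parent
// scope, renaming any local whose name would collide with an existing one.
final class ScopeFlatten {
    static List<String> hoist(List<String> parentLocals, List<String> branchLocals) {
        List<String> result = new ArrayList<>(parentLocals);
        for (String local : branchLocals) {
            String fresh = local;
            int i = 0;
            while (result.contains(fresh)) {
                fresh = local + "$" + (++i);  // rename on collision
            }
            result.add(fresh);
        }
        return result;
    }
}
```

For example, hoisting a branch that defines `x` and `y` into a parent that already defines `x` yields the flat scope `[x, x$1, y]`: one frame, no call needed to enter the branch.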
Currently we don't perform any optimisation when desugaring nested pattern matches. This means that the IR (and resultant generated truffle code) is far larger than it needs to be.
`if` branches need to occur to resolve the actual target function of the pattern match.

Currently, Enso keeps every variable alive for as long as it is in scope. This means that we have two major pitfalls:
While we originally proposed to perform scope pruning when capturing variables in closures, a far more sensible approach is to perform liveness analysis:
(`Frame#clear`) for informing GraalVM about this, for increased performance in compiled code.

Note that `Debug.breakpoint` or `Debug.eval` may be used in this code. Under such circumstances, all in-scope variables should be retained for the duration of the call.

Note also that scope pruning could still be a win in rarer circumstances, but it is not needed for the majority of the improvement here.
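The core of the liveness computation is simple for straight-line code. The sketch below is a hypothetical shape, not the Enso compiler: it records the index of each variable's last use, after which a codegen pass could clear the corresponding frame slot (cf. Truffle's `Frame#clear`).

```java
import java.util.*;

// Toy liveness analysis for a straight-line statement sequence:
// uses.get(i) lists the variables read by statement i, and the result maps
// each variable to the index of its final read (its last-use point).
final class Liveness {
    static Map<String, Integer> lastUse(List<List<String>> uses) {
        Map<String, Integer> last = new HashMap<>();
        for (int i = 0; i < uses.size(); i++) {
            for (String v : uses.get(i)) {
                last.put(v, i);  // later uses overwrite earlier ones
            }
        }
        return last;
    }
}
```

A real implementation would also have to handle branches (taking the maximum over paths) and honour the `Debug.eval` caveat above by treating such calls as using every in-scope variable.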
There are multiple features in Enso that generate dynamic calls that do not always need to be dynamic. For example, when the concrete type of an atom is known at compile time, its accessors can be inlined; or, when the type of `a` in `a + b` is known, we can devirtualise to the `+` implementation that specialises based on the type of `b`. If we also know the type of `b`, we can do even better and compile in the specific add implementation. In conjunction with the better static analysis, it should become possible to devirtualise multiple kinds of calls statically, and to inline the generated code instead.
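The `a + b` example can be made concrete with a toy sketch (using Java's `Long` and `String` as stand-ins for Enso types; both method names are hypothetical): the generic entry point must test types at runtime, while the devirtualised form contains no type tests at all.

```java
// Toy devirtualisation sketch: plusDynamic is what a fully dynamic `+` call
// must do; plusLongLong is the form the compiler could call directly once it
// knows statically that both operands are integers.
final class Devirt {
    static Object plusDynamic(Object a, Object b) {
        if (a instanceof Long x && b instanceof Long y) return x + y;
        if (a instanceof String s) return s + b;  // text concatenation case
        throw new IllegalArgumentException("no `+` for " + a.getClass());
    }

    static long plusLongLong(long a, long b) {
        return a + b;  // no dispatch, no boxing, trivially inlinable
    }
}
```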
We recommend a combination of the two, using the latter for non-introspected scopes, and the former for scopes being observed by the IDE. That said, if the first brings enough of a win, there may be little point to the second.
While Enso is fairly semantically complete, there are still a number of things that have proven awkward to work with.
Enso has a concept of extension methods: methods that are not defined "alongside" the type (i.e. in the same compilation unit). Currently, we have no way to define non-extension methods on builtin types without writing them in Java. This is awkward, and it leads to a poor experience both for the developers of Enso and for its users (there is a special-case rule for certain types, and also a hacky form of documentation for these same types).
For types defined in Java, their methods defined in Enso are extensions and are hence not available without importing `Base`. Currently, if I have a `Text` and don't have `Base` imported, I can't call `split` on it, as it's an extension.
This is particularly important for polyglot, as polyglot calls are not handed extension methods: they only have access to the methods defined directly on the type.
To rectify this situation, we recommend implementing a system we have termed "shadow definitions":
`Builtins.enso`.

`Vector` and `Time` are currently defined in `Base`, and are therefore not (Truffle) interop friendly. With this system, we could implement these types in such a way that they are handled properly in interop, making it much more seamless to use them with other Truffle languages.

With this done, it may still be necessary to create a Java DSL for implementing built-in methods and types, but that is unclear at this point.
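The intent behind shadow definitions can be sketched abstractly. All names below are hypothetical (this is not the Enso runtime's actual method-resolution code): methods written in Enso against a builtin type get registered into the builtin's own method table, so any caller, including an interop caller that never imports `Base`, resolves them exactly like intrinsic methods.

```java
import java.util.*;
import java.util.function.Function;

// Toy per-type method table: "shadow" (Enso-defined) methods land in the
// same table as intrinsic (Java-defined) ones, so lookup treats them alike.
final class ShadowTable {
    private final Map<String, Function<String, Object>> methods = new HashMap<>();

    void defineIntrinsic(String name, Function<String, Object> impl) { methods.put(name, impl); }
    void defineShadow(String name, Function<String, Object> impl)    { methods.put(name, impl); }

    Object invoke(String name, String self) {
        Function<String, Object> m = methods.get(name);
        if (m == null) throw new NoSuchElementException("no method: " + name);
        return m.apply(self);  // same dispatch path for both kinds of method
    }
}
```

Under this sketch, a polyglot call site never needs to know whether a method was defined in Java or in Enso; the distinction disappears at lookup time.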
While Enso is performant when it gets JITted by GraalVM, the performance when running in purely interpreted mode is poor. That said, there are still performance improvements that can be made that will benefit compiled code as well.
This can be greatly improved.
`HashMap` and similar implementation decisions can easily be improved.

As Enso's primary mode of use is in the IDE, there are a number of important improvements to the runtime and compiler that will greatly improve the user experience there.
Currently, it is virtually impossible for users to define types in the IDE. This is due to a semantic issue with the IDE's value cache: when a type is defined and an instance of it created, the value of that instance is cached; when a method is later defined on the type, the cached value is retained with the old scope.
See #1662 for more details and options.
Currently, the IDE cache is fairly dumb, maintaining soft references to as many in-scope values as possible. When memory runs out, the entire cache gets evicted, which is costly.
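The behaviour described above can be illustrated with a minimal sketch (a hypothetical shape, not the actual IDE cache): values sit behind `SoftReference`s, so the garbage collector may drop any of them under memory pressure, and a later lookup simply misses.

```java
import java.lang.ref.SoftReference;
import java.util.*;

// Toy soft-reference cache: entries survive only as long as the GC does not
// need the memory; a cleared reference shows up as a cache miss.
final class SoftCache<K, V> {
    private final Map<K, SoftReference<V>> entries = new HashMap<>();

    void put(K key, V value) {
        entries.put(key, new SoftReference<>(value));
    }

    Optional<V> get(K key) {
        SoftReference<V> ref = entries.get(key);
        return ref == null ? Optional.empty() : Optional.ofNullable(ref.get());
    }
}
```

A smarter cache would instead rank entries (e.g. by recomputation cost and recency) and evict incrementally, rather than letting the GC clear everything at once.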
Currently, IDE visualizations are evaluated eagerly on their candidate data. This is a nightmare when working with huge amounts of data (e.g. tables with millions of rows), and can easily lock up both the runtime and IDE. The current solution artificially limits the amount of data sent to the IDE.
In the future, we want to support the ability to cache inside visualization code such that the preprocessor doesn't have to be recomputed every time the IDE changes the parameters. This will enable the ability to view the full data in the IDE without having to send it all at once, or recompute potentially costly preprocessors.
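The caching idea above amounts to memoising the preprocessor on the pair (input data, visualization parameters). The following is a minimal sketch under hypothetical names (not the actual visualization API): repeating a request with the same data and parameters reuses the stored result instead of recomputing.

```java
import java.util.*;
import java.util.function.BiFunction;

// Toy preprocessor memoisation: results are keyed by the input data's
// identity plus the visualization parameters, so only genuinely new
// (data, params) combinations trigger a recomputation.
final class PreprocessorCache {
    private final Map<List<Object>, Object> memo = new HashMap<>();
    private int computations = 0;  // counts actual preprocessor runs

    Object render(Object dataId, Object params,
                  BiFunction<Object, Object, Object> preprocessor) {
        return memo.computeIfAbsent(List.of(dataId, params), key -> {
            computations++;
            return preprocessor.apply(dataId, params);
        });
    }

    int computations() { return computations; }
}
```

In the real system the cache would also need invalidation when the underlying data changes; that bookkeeping is omitted here.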