docs/style-guide/scala.md
Like many style guides, this Scala style guide exists for two primary reasons. The first is to provide guidelines that result in a consistent code style across all of the Enso codebases, while the second is to guide people towards a style that is expressive while still easy to read and understand.
In general, it aims to create a set of 'zero-thought' rules in order to ease the burden on the programmer; there is usually only one way to lay out code correctly.
This section explains the rules for visually laying out your code. They provide a robust set of guidelines for creating a consistent visual style for the code.
Primary formatting is dealt with through use of the Scala formatting tool
scalafmt, which enforces rules around
whitespace, line-wrapping, and alignment. The Enso repository contains the main
.scalafmt.conf configuration file, and this is what
should be used for all new Scala projects.
All files must be formatted using scalafmt before commit, and this should be
set up either as a pre-commit hook or through the integration in IntelliJ. If you
use the IntelliJ integration, please note that you need only have the official
Scala Plugin
installed, and be using IntelliJ 2019.1 or later. You should not use the
independent Scalafmt plugin.
Enso has some fairly simple general naming conventions, though the sections below may provide more rules for use in specific cases.
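- Types are written using `UpperCamelCase`.
- Variables and function names are written using `camelCase`.
- If a name contains an initialism or acronym, all parts of that initialism
  should be of the same case: `httpRequest` or `makeHTTPRequest`.
- Short variable names such as `a` and `b` should only be used in contexts where
  there is no other appropriate name, and should never be used to refer to
  temporary data in a function.
- Any function that performs an unsafe operation that is not documented in its
  type (e.g. `def head[T](ts: List[T]): T`, which fails if the list is empty),
  must be named using the word 'unsafe' (e.g. `unsafeHead`). For more
  information on unsafe function usage, see the section on safety.
  The one exception to this rule is for functions which fail intentionally on a
  broken implementation (e.g. "should not happen"-style fatal crashes).

Enso follows the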
Java convention for naming packages:
package name components may contain only lower case characters and, if
necessary, an underscore character. All Enso package names should be prefixed
with org.enso. For example, the package for the implementation of the File
Manager project should be named org.enso.filemanager.

When the name of the file in the package is the same as the final component of
the package name, the file should be moved one level up. For example, if the
File Manager project contains a FileManager.scala file, then the file should
be placed directly in the org.enso package instead of org.enso.filemanager.
This is to avoid repetitious constructs like org.enso.filemanager.FileManager.
In order to produce as flexible a codebase as possible, we tend not to make use
of access modifiers in our code (public, private, and so on). Instead, we
use the concept of Internal modules to separate public from private.
If you are writing code in a package X.Y.MyType and would like to signal that
a particular construct (e.g. a function) is for internal use in that package,
you should create an X.Y.MyType.Internal package. You can then write the
relevant language construct in that package instead of the source package.
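As a minimal sketch of this convention, the example below uses a nested `Internal` object as a stand-in for the `Internal` package. All names here (`PathUtils`, `normalise`, `collapseSeparators`) are hypothetical and chosen only for illustration; they are not taken from any Enso codebase.

```scala
// Hypothetical example: `PathUtils` and its members are illustrative only.
object PathUtils {

  /** Normalises a path by trimming whitespace and collapsing repeated
   * separators.
   *
   * @param path the path to normalise
   * @return the normalised path
   */
  def normalise(path: String): String =
    Internal.collapseSeparators(path.trim)

  /** Constructs intended for use only within `PathUtils`.
   *
   * Nothing here is hidden by an access modifier; the name alone signals
   * that callers outside `PathUtils` should not depend on it.
   */
  object Internal {

    /** Replaces each run of `/` characters with a single `/`. */
    def collapseSeparators(path: String): String =
      path.replaceAll("/+", "/")
  }
}
```

A caller would normally use `PathUtils.normalise("  a//b///c ")`, while `PathUtils.Internal.collapseSeparators` remains reachable for cases such as testing.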
There are, however, a few notable exceptions to the above:
- Access modifiers (e.g. `private` and `private[this]`) should be used to
  enforce an API contract around safety. An example of this is Scala's
  immutable `List`, which contains a private mutable buffer for performance
  reasons.
- Because the `Internal` module is a separate module, there can (under some
  circumstances) be some overhead for its use. If you are writing code on a
  performance-critical path, you may instead make use of access modifiers.

All Scala projects in the Enso organisation should manage their dependencies and build setup using SBT.
If you are using IntelliJ, please ensure that you select to use the SBT shell for both imports and builds.
Comments in code are a tricky area to get right, as we have found that comments often expire quickly and, in the absence of a way to validate them, remain incorrect for long periods of time. In order to best deal with this problem, we make keeping comments up-to-date an integral part of our programming practice, while also limiting the types and kinds of comments we allow.
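Comments across the Enso codebases fall into three main types:

- Documentation comments, which document the public API of a module.
- Source notes, which record detailed design information about a piece of code.
- TODO comments, which mark things that need doing or fixing.
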
When we write comments, we try to follow one general guideline. A comment should explain what and why, without mentioning how. The how should be self-explanatory from reading the code, and if you find that it is not, that is a sign that the code in question needs refactoring.
Code should be written in such a way that it guides you over what it does, and comments should not be used as a crutch for badly-designed code.
One of the primary forms of comment that we allow across the Enso codebases is the doc comment. We use these comments to document the public API of a module, as defined in The Public API. For constructs that are part of the public API, the following should be documented:
- If multiple definitions share a name (e.g. `trait T` and `object T`), you
  need only document it once. Under these circumstances it is recommended to
  document the first that exists, in the order of: trait, class, object.

An example of a valid set of comments is as follows:

package org.enso.syntax.graph

/** An [[Action]] is a representation of an operation that can be made on a
 * [[SpanTree]].
 */
sealed trait Action
object Action {
  object Insert extends Action
  object Erase  extends Action
  object Set    extends Action
}

/** Values representing sets of [[Action]]s at a given point.
 */
object Actions {
  val All: Set[Action]      = Set(Action.Insert, Action.Erase, Action.Set)
  val Function: Set[Action] = Set(Action.Set)
  val Root: Set[Action]     = Set(Action.Set)

  /** Makes a set from the provided actions.
   *
   * @param actions a variable number of actions
   * @return a set containing the provided actions
   */
  def mkActionSet(actions: Action*): Set[Action] = {
    // ...
  }
}

Documentation comments are intended for consumption by the users of the API, and are written using the standard scaladoc syntax. Doc comments should contain:
- Parameter documentation, written using `@param` annotations. If one or more
  parameters require explanation (for things not expressed in their name or
  type), then all parameters must be annotated.

An example comment that requires a description is as follows (but omits the
necessary comment on `Tree` for brevity):

trait Tree[T] {

  /** Provides a sequence representation of the tree.
   *
   * The function provides configurable behaviour for the order in which the
   * tree is walked. See [[WalkStrategy]] for the provided options.
   *
   * @param order the strategy by which the tree's elements are traversed
   * @return the elements contained in the tree arranged according to the
   *         provided `order`
   */
  def walkToSequence(order: WalkStrategy[Tree[T]]): Seq[T]
}

A simpler example that does not require a description is as follows (but omits
the necessary comment on Tree for brevity):

trait Tree[T] {

  /** Provides a sequence representation of the tree.
   *
   * @return the elements of the tree arranged in preorder-walk sequence
   */
  def toSeq(): Seq[T]
}

Documentation comments should not reference internal implementation details, or be used to explain choices made in the implementation. For this kind of information, you should use Source Notes as described below.
You may document more than what is specified here, but this is the minimum required for acceptance at code-review time.
Source notes are a mechanism for moving detailed design information about a piece of code out of the code itself. In doing so, they retain the key information about the design while not impeding the flow of the code. They are used in the following circumstances:
Source notes are detailed comments that, like all comments, explain both the what and the why of the code being described. In very rare cases, they may include some how, but only to explain why a particular method was chosen to achieve the goals in question.
A source note comment is broken into two parts:

1. The referrer, a small comment of the form `// Note [Note Name]`, where
   `Note Name` is a unique identifier across the codebase. These names should
   be descriptive, and you should search for a name before using it, in case
   it is already in use.
2. The source note itself, which uses the block-comment syntax `/* ... */`, and
   whose first line names the note using the same referrer as above:
   `/* Note [Note Name]`. The name(s) in the note are underlined using a string
   of the `~` (tilde) character.

A source note may contain sections within it where necessary. These are titled
using the following syntax: `== Note [Note Name (Section Name)]`, and can be
referred to from a referrer much as the main source note can be.
Sometimes it is necessary to reference a source note in another module, but this should never be done in-line. Instead, a piece of code should reference a source note in the same module that references the other note while providing additional context to that reference.
An example, based on some code in the GHC codebase, can be seen below:
{
def prepRHS(env: SimplEnv, outExpr: OutExpr): SimplM[SimplEnv, OutExpr] = {
  val (ty1, _ty2) = coercionKind(env) // Note [Float Coercions]

  if (!isUnliftedType(ty1)) {
    val newTy1 = convertTy(ty1) // Note [Float Coercions (Unlifted)]

    ...more expressions defining prepRHS...
  }
}

/* Note [Float Coercions]
 * ~~~~~~~~~~~~~~~~~~~~~~
 * When we find the binding
 *   x = cast(e, co)
 * we'd like to transform it to
 *   x' = e
 *   x = cast(x', co) // A trivial binding
 * There's a chance that e will be a constructor application or function, or
 * something like that, so moving the coercion to the usage site may well cancel
 * the coercions and lead to further optimisation.
 * ...more stuff about coercion floating...
 *
 * Note [Float Coercions (Unlifted)]
 * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 * ...explanations of floating for unlifted types...
 */
}
We follow a simple convention for TODO comments in our codebases:

- Begin the comment with `TODO` or `FIXME`.
- Follow this with the relevant person's identifier, e.g. `[ARA]`, or for
  multiple people `[ARA, MK]`, in square brackets.

For example:
{
// TODO [ARA] This is a bit of a kludge. Instead of X it should do Y, accounting
// for the fact that Z.
}
There are, of course, a few other situations where commenting is very useful:
Any good style guide goes beyond purely stylistic rules, and also talks about design styles to use in code.
While we often have to write complex functionality, we want to ensure that the code itself is kept as simple and easy to read as possible. To do this, please use the following rules:
It is incredibly important that we can trust the code that we use, and hence we tend to disallow the definition of unsafe functions in our public API. When defining an unsafe function, you must account for the following:

- It must be named `unsafeX`, as mentioned above in naming.

Furthermore, we do not allow for code containing pattern matches that can fail.
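To illustrate the naming and pattern-matching rules together, here is a minimal sketch that pairs an unsafe function with a total alternative. The `ListOps` object and `headOption` are hypothetical names used only for this example; `unsafeHead` is the name suggested in the naming section.

```scala
// Hypothetical example: `ListOps` is illustrative only.
object ListOps {

  /** Returns the first element of the list.
   *
   * This is unsafe because the failure case is not expressed in the return
   * type, so the name carries the word 'unsafe'.
   *
   * @param ts the list to take the head of
   * @return the first element of `ts`
   */
  def unsafeHead[T](ts: List[T]): T = ts match {
    case head :: _ => head
    case Nil =>
      throw new NoSuchElementException("unsafeHead called on an empty list")
  }

  /** Returns the first element of the list, if there is one.
   *
   * Both cases of the match are handled, so the match cannot fail, and the
   * possibility of an empty list is visible in the return type.
   *
   * @param ts the list to take the head of
   * @return the first element of `ts`, or `None` if `ts` is empty
   */
  def headOption[T](ts: List[T]): Option[T] = ts match {
    case head :: _ => Some(head)
    case Nil       => None
  }
}
```

Where possible, the total variant is the one to expose, since its type tells the caller everything they need to handle.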
New code should always be accompanied by tests. These can be unit, integration, or some combination of the two, and they should always aim to test the new code in a rigorous fashion.
Any performance-critical code should also be accompanied by a set of benchmarks. These are intended to allow us to catch performance regressions as the code evolves, but also ensure that we have some idea of the code's performance in general.
Do not benchmark a development build as the data you get will often be entirely useless.
In general, we aim for a codebase that is free of warnings and lints, and we do this using the following ideas:
New code should introduce no new warnings onto main. You may build with warnings on your own branch, but the code that is submitted as part of a PR should not introduce new warnings. You should also endeavour to fix any warnings that you come across during development.
Sometimes it is impossible to fix a warning (often in situations involving the use of macros). In such cases, you are allowed to suppress the warning locally, but this must be accompanied by a source note explaining why you are doing so.
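One possible way to suppress a warning locally is sketched below, assuming Scala 2.13.2 or later, where the standard `scala.annotation.nowarn` annotation is available. The `LegacyBridge` object, its methods, and the note name are hypothetical and exist only to show the suppression paired with a source note.

```scala
import scala.annotation.nowarn

// Hypothetical example: `LegacyBridge` and its members are illustrative only.
object LegacyBridge {

  @deprecated("Use `render` instead", "1.0")
  def legacyRender(x: Int): String = x.toString

  // The suppression is limited to this one definition and to the
  // deprecation category, rather than being applied build-wide.
  @nowarn("cat=deprecation") // Note [Deprecated Bridge]
  def render(x: Int): String = legacyRender(x)

  /* Note [Deprecated Bridge]
   * ~~~~~~~~~~~~~~~~~~~~~~~~
   * `render` delegates to `legacyRender` so that existing callers keep the
   * same output while they migrate. The deprecation warning at this call
   * site is expected, so it is suppressed here and nowhere else.
   */
}
```

Keeping the annotation on the smallest possible definition means that any new warning elsewhere in the file is still reported.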