rfcs/001-js-grammar.md
- `BinaryExpression`: name the operands `left` and `right` rather than `expression` and `expression`. Ungrammar supports the `label: Type` syntax, so all we need to do is respect the labels when generating the fields.
- The operator of a `BinaryExpression` can be any of `==`, `+`, `-`, .... That's why it's defined as `operator: ( '==' | '+' | '-' ...)`. The source gen should generate only a single `operator` field that returns a `SyntaxToken`.
- Unions can contain other unions, for example `JsClassMemberName = JsObjectMemberName | JsPrivateClassMemberName`, where `JsObjectMemberName` is a union type as well. This is convenient for maintaining the grammar but makes the Facade more awkward to use because it requires two matches: first on the outer union and then on the inner union. We can avoid this if the source gen flattens nested unions and automatically generates `From<InnerUnion>` implementations.
- Strip the `Js` prefix from field names to reduce the need for explicit labels. For example, `ExpressionStatement` must use an explicit label for its inner expression so that the field isn't named `js_expression`.
- `JsElement = ...`, `JsNode = ...`, `JsToken = ...`
- Prefix node names with `Js` to avoid conflicts with other languages.
- Prefix tokens with `Js` if they don't represent a specific character sequence. For example, name the string token `js_string` because different languages use different quotes for strings, but don't prefix `;` or `true` because these are "pure" character sequences. `true` might be somewhat controversial because different languages use different casing: `TRUE` vs `true` vs `True`. I don't think the casing is relevant. All of these represent a `true` token: the corresponding parser shouldn't generate `TRUE` for a language expecting `true`, and each language implementation can provide helpers to build common tokens.

It's common in languages to have a sequence of elements that are all separated by a specific separator. Examples in JavaScript where elements are comma separated include array elements, object members, function parameters, and call arguments.
There are two ways we can model this. I'll use a simplified version of JS arrays that ignores array holes.
Wrap elements in a node
One option is to create a new node representing an ArrayElement that contains the value and an optional trailing comma.
```
JsArray = '[' elements: JsArrayElement* ']'

JsArrayElement =
  value: JsArrayElementValue
  trailing_comma: ','?

JsArrayElementValue = JsExpression | JsSpread
```
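To make the shape of this option concrete, here is a minimal Rust sketch of Facade types the grammar above could map to. The types are hypothetical, hand-written stand-ins (they hold plain text instead of wrapping syntax nodes), not generated code. What it shows is that every consumer that only wants the values has to reach through the extra `JsArrayElement` node.

```rust
// Hypothetical stand-ins for generated Facade types: they hold plain text
// instead of wrapping syntax nodes.
type Token = &'static str;

/// `JsArrayElementValue = JsExpression | JsSpread`
#[derive(Debug, PartialEq)]
enum JsArrayElementValue {
    JsExpression(&'static str),
    JsSpread(&'static str),
}

/// `JsArrayElement = value: JsArrayElementValue trailing_comma: ','?`
struct JsArrayElement {
    value: JsArrayElementValue,
    trailing_comma: Option<Token>,
}

/// `JsArray = '[' elements: JsArrayElement* ']'`
struct JsArray {
    elements: Vec<JsArrayElement>,
}

impl JsArray {
    /// Reaching the values means looking through every wrapper node.
    fn values(&self) -> impl Iterator<Item = &JsArrayElementValue> + '_ {
        self.elements.iter().map(|element| &element.value)
    }
}

fn main() {
    // Models `[a, ...b]`: only the first element owns a trailing comma.
    let array = JsArray {
        elements: vec![
            JsArrayElement {
                value: JsArrayElementValue::JsExpression("a"),
                trailing_comma: Some(","),
            },
            JsArrayElement {
                value: JsArrayElementValue::JsSpread("b"),
                trailing_comma: None,
            },
        ],
    };
    for (value, element) in array.values().zip(&array.elements) {
        println!("{:?} followed by {:?}", value, element.trailing_comma);
    }
}
```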
Creating a new node has a few downsides:
- Element type?

I saw that Roslyn uses a `SeparatedList`. It's just a convenient view over a list of children, and I think we can adopt a similar approach in the AST Facade by returning `AstSeparatedChildren` for such element lists.

`AstSeparatedChildren` is a view over a range of children that alternate between element and separator. It may provide the following methods:

- `nodes() -> Iterable<Result<T>>`: returns the nodes inside of the list. The return type is a `Result` in case a node is missing between two separators.
- `separators() -> Iterable<SyntaxToken>`: returns all the separators. There should be no need for an `Option` or `Result` because the parser never starts a new element if a separator is missing.
- `elements() -> Iterable<(Result<T>, Option<SyntaxToken>)>`: returns a list of elements with their separator, for example for use inside of the formatter. The separator is optional because the last element may or may not have a trailing separator.

We could add additional methods for mutating a separated list, querying a specific node/separator/element, etc.
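A minimal sketch of such a view in Rust, assuming a simplified model where the list's children are stored as a flat `Vec` of element nodes and separator tokens. `Child`, `SyntaxToken`, and `MissingNode` are hypothetical stand-ins for the real syntax tree types, and the methods return a `Vec` or plain iterators rather than a dedicated `Iterable` type. A separator that appears where an element is expected is reported as a missing node, which is the case the `Result` in `nodes()` exists for.

```rust
// Hypothetical stand-ins for syntax tree types.
#[derive(Debug, Clone, PartialEq)]
struct SyntaxToken {
    text: String,
}

/// A child of the list: either an element node or a separator token.
#[derive(Debug, Clone, PartialEq)]
enum Child<T> {
    Node(T),
    Token(SyntaxToken),
}

/// Returned when the parser recovered from a missing element.
#[derive(Debug, Clone, PartialEq)]
struct MissingNode;

/// A view over children that alternate between element and separator.
struct AstSeparatedChildren<T> {
    children: Vec<Child<T>>,
}

impl<T: Clone> AstSeparatedChildren<T> {
    /// Each element paired with the separator that follows it, if any.
    fn elements(&self) -> Vec<(Result<T, MissingNode>, Option<SyntaxToken>)> {
        let mut elements = Vec::new();
        let mut children = self.children.iter().peekable();
        while let Some(child) = children.next() {
            match child {
                Child::Node(node) => {
                    // Consume the separator directly after the node, if there is one.
                    let separator = match children.next_if(|next| matches!(next, Child::Token(_))) {
                        Some(Child::Token(token)) => Some(token.clone()),
                        _ => None,
                    };
                    elements.push((Ok(node.clone()), separator));
                }
                // A separator where an element was expected: the element is missing.
                Child::Token(token) => elements.push((Err(MissingNode), Some(token.clone()))),
            }
        }
        elements
    }

    /// Only the elements; `Err` marks a missing node between two separators.
    fn nodes(&self) -> impl Iterator<Item = Result<T, MissingNode>> {
        self.elements().into_iter().map(|(node, _)| node)
    }

    /// Only the separators; no `Option` needed because every token here is present.
    fn separators(&self) -> impl Iterator<Item = SyntaxToken> + '_ {
        self.children.iter().filter_map(|child| match child {
            Child::Token(token) => Some(token.clone()),
            Child::Node(_) => None,
        })
    }
}

fn comma<T>() -> Child<T> {
    Child::Token(SyntaxToken { text: ",".to_string() })
}

fn main() {
    // Models the arguments of `f(a, , b)`, where the parser recovered from a missing argument.
    let arguments = AstSeparatedChildren {
        children: vec![Child::Node("a".to_string()), comma(), comma(), Child::Node("b".to_string())],
    };
    println!("{:?}", arguments.nodes().collect::<Vec<_>>());
    println!("{} separators", arguments.separators().count());
    for (node, separator) in arguments.elements() {
        println!("{:?} followed by {:?}", node, separator.map(|token| token.text));
    }
}
```

For the arguments of `f(a, , b)`, `nodes()` yields `Ok("a")`, `Err(MissingNode)`, and `Ok("b")`, `separators()` yields both commas, and the last entry from `elements()` has no trailing separator.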
I believe this gives us the same advantages as having distinct elements without the repetition:
```
JsArray = '[' elements: (JsArrayElement (',' JsArrayElement)* ','?) ']'

JsArrayElement = JsExpression | JsSpread
```
We should even be able to automatically generate the proper return type for the field by detecting the `(T (',' T)* ','?)` pattern inside of the grammar, something that rust-analyzer already does.
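A rough sketch of that detection in Rust. The `Rule` enum is a hypothetical, trimmed-down model of a parsed grammar rule (it only loosely mirrors the shape of Ungrammar's rules, leaves out alternatives and labels, and is not the `ungrammar` crate's API). The function recognises `T (sep T)*` with an optional trailing `sep?`, and only when both occurrences of `T` and both separators agree.

```rust
/// Hypothetical, trimmed-down model of a grammar rule (no alternatives or labels).
#[derive(Debug)]
enum Rule {
    Node(String),
    Token(String),
    Seq(Vec<Rule>),
    Opt(Box<Rule>),
    Rep(Box<Rule>),
}

/// What the source gen would need to emit a separated-list field.
#[derive(Debug, PartialEq)]
struct SeparatedList<'a> {
    element: &'a str,
    separator: &'a str,
    allows_trailing: bool,
}

/// Recognises `T (sep T)*`, optionally followed by `sep?`.
fn as_separated_list(rule: &Rule) -> Option<SeparatedList<'_>> {
    let Rule::Seq(parts) = rule else { return None };
    // Leading `T`.
    let (first, rest) = parts.split_first()?;
    let Rule::Node(element) = first else { return None };
    // Repeated `(sep T)*` with the same `T`; `&**` looks through the `Box`.
    let (repeated, rest) = rest.split_first()?;
    let Rule::Rep(inner) = repeated else { return None };
    let Rule::Seq(pair) = &**inner else { return None };
    let [Rule::Token(separator), Rule::Node(repeated_element)] = pair.as_slice() else {
        return None;
    };
    if repeated_element != element {
        return None;
    }
    // Optional trailing `sep?` that must use the same separator.
    let allows_trailing = match rest {
        [] => false,
        [Rule::Opt(trailing)] => match &**trailing {
            Rule::Token(token) if token == separator => true,
            _ => return None,
        },
        _ => return None,
    };
    Some(SeparatedList {
        element: element.as_str(),
        separator: separator.as_str(),
        allows_trailing,
    })
}

fn node(name: &str) -> Rule {
    Rule::Node(name.to_string())
}

fn token(text: &str) -> Rule {
    Rule::Token(text.to_string())
}

fn main() {
    // JsArrayElement (',' JsArrayElement)* ','?
    let elements = Rule::Seq(vec![
        node("JsArrayElement"),
        Rule::Rep(Box::new(Rule::Seq(vec![token(","), node("JsArrayElement")]))),
        Rule::Opt(Box::new(token(","))),
    ]);
    println!("{:?}", as_separated_list(&elements));
}
```

On the array rule above this yields `element: "JsArrayElement"`, `separator: ","` and `allows_trailing: true`, while a plain repetition such as `JsStatement*` yields `None` and keeps its ordinary list field.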
Should we call object and class members Members or Properties?
- `MemberExpression` has an `object` and a `property` field today, which feels inconsistent. Shouldn't `property` be called `member`, or should the node be a `PropertyExpression`?

The main downside I see is that "property" is a commonly used term in the JS ecosystem. Changing `property` to `member` could be a barrier to entry for people familiar with the term "property".
How strict should our AST API be?
- Should function bodies be `BlockStatement`s, or should we have separate `FunctionBody` and `BlockStatement` nodes where only `FunctionBody` allows directives?
- Should a `script` allow module statements inside of the body?
- Should a `module` have an explicit field for import statements at the top?

There's probably no correct answer to this question, but we can favour one over the other.
A strict AST has the benefit that the mutation API prevents users from constructing invalid trees, but it comes at the cost of a more complex AST and a parser that must fall back to `Unknown*` nodes more often.
I would recommend being more lenient so that the parser can flag the use of `import`/`export` inside a script while we can still provide auto-completion on imports/exports before the user converts the script to a module. It's probably something we must decide case by case, but I would love to hear your thoughts.
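A minimal sketch of the lenient option, using hypothetical types: the root keeps `import`/`export` items even when the file is a script, and a separate validation pass reports them. Because the nodes stay in the tree, features such as auto-completion can still see them. Whether the check lives in the parser itself or in a later pass is an implementation detail this sketch does not settle.

```rust
// Hypothetical, simplified types for a lenient AST.
#[derive(Debug, Clone, Copy, PartialEq)]
enum SourceType {
    Script,
    Module,
}

/// Lenient body item: a script may contain import/export items too.
#[derive(Debug)]
enum JsModuleItem {
    Statement(&'static str),
    Import(&'static str),
    Export(&'static str),
}

struct JsRoot {
    source_type: SourceType,
    items: Vec<JsModuleItem>,
}

#[derive(Debug, PartialEq)]
struct Diagnostic {
    item_index: usize,
    message: &'static str,
}

/// Flags module-only syntax in scripts without removing it from the tree.
fn validate(root: &JsRoot) -> Vec<Diagnostic> {
    if root.source_type == SourceType::Module {
        return Vec::new();
    }
    root.items
        .iter()
        .enumerate()
        .filter_map(|(item_index, item)| {
            let message = match item {
                JsModuleItem::Import(_) => "`import` is only allowed in modules",
                JsModuleItem::Export(_) => "`export` is only allowed in modules",
                JsModuleItem::Statement(_) => return None,
            };
            Some(Diagnostic { item_index, message })
        })
        .collect()
}

fn main() {
    let script = JsRoot {
        source_type: SourceType::Script,
        items: vec![
            JsModuleItem::Statement("let a = 1;"),
            JsModuleItem::Import("import { b } from \"./b\";"),
        ],
    };
    // The import is reported, but it is still part of `script.items`.
    for diagnostic in validate(&script) {
        println!("item {}: {}", diagnostic.item_index, diagnostic.message);
    }
    println!("items kept: {}", script.items.len());
}
```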
Having fine-grained nodes has the advantage that the API allows querying for a very specific node, for example, using `node.descendants::<PreIncrementExpression>()`.
Having more coarse-grained nodes, on the other hand, has the advantage that it's easier to implement common behaviour that, for example, applies to all binary expressions.
I guess this isn't a specific question, but we should think about whether we favour fine-grained nodes, even going as far as having a different node for each `BinaryExpression` operator.
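To make the trade-off concrete, here is a minimal sketch of the coarse-grained option with hypothetical types: a single `JsBinaryExpression` with `left`/`right` labels and one `operator` token field. Behaviour shared by all binary expressions (here `operands()`) is written once, while a fine-grained query such as "all `+` expressions" turns into a filter on the operator instead of a dedicated node type.

```rust
// Hypothetical, simplified AST; tokens are represented by their text.
type Token = &'static str;

enum JsExpression {
    Identifier(&'static str),
    Binary(Box<JsBinaryExpression>),
}

/// One node for every binary operator, with labelled operands.
struct JsBinaryExpression {
    left: JsExpression,
    operator: Token, // e.g. "==", "+", "-"
    right: JsExpression,
}

impl JsBinaryExpression {
    /// Shared behaviour: implemented once for all operators.
    fn operands(&self) -> [&JsExpression; 2] {
        [&self.left, &self.right]
    }
}

/// Collects all binary expressions in pre-order (the node itself, then its operands).
fn binary_descendants<'a>(expression: &'a JsExpression, found: &mut Vec<&'a JsBinaryExpression>) {
    if let JsExpression::Binary(boxed) = expression {
        let binary: &'a JsBinaryExpression = boxed; // deref the Box
        found.push(binary);
        for operand in binary.operands() {
            binary_descendants(operand, found);
        }
    }
}

fn make_binary(left: JsExpression, operator: Token, right: JsExpression) -> JsExpression {
    JsExpression::Binary(Box::new(JsBinaryExpression { left, operator, right }))
}

fn main() {
    // `a + b == c` parses as `(a + b) == c`.
    let expression = make_binary(
        make_binary(JsExpression::Identifier("a"), "+", JsExpression::Identifier("b")),
        "==",
        JsExpression::Identifier("c"),
    );
    let mut found = Vec::new();
    binary_descendants(&expression, &mut found);
    // A fine-grained query becomes a filter on the operator token.
    let additions: Vec<_> = found.iter().filter(|binary| binary.operator == "+").collect();
    println!("{} binary expressions, {} additions", found.len(), additions.len());
}
```

A fine-grained design would instead give `+` and `==` their own node types, which makes such queries a type lookup but requires repeating `operands()`-style behaviour per node (or generating it).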
Add support for /// comments that we can use to document nodes and tokens.