docs/ide/api-designs/Virtual Character System.md
Embedded languages within string literals present a fundamental challenge: they need precise position tracking, but string escaping collapses source text in non-trivial ways. IDE features like colorization, brace matching, completion, and diagnostics require bidirectional mapping between logical characters and their source representations.
Without this mapping, features can only operate on the processed string value (token.ValueText), losing the connection
to what the user actually typed.
Consider a normal string like "Hello\tWorld". In source, the tab character appears as the two-character escape
sequence \t, but the logical string value contains an actual tab character. When the IDE needs to provide features
like completion or diagnostics, it must be able to map that single logical tab character back to its two-character
source representation at span [5, 7).
Verbatim strings present a different challenge. In @"He said ""Hello""", the double-quote character in the logical
string He said "Hello" comes from a doubled quote "" in the source. Again, we need bidirectional mapping between
the single logical character and its two-character source representation.
Unicode escapes add yet another layer of complexity. In "Test\u0041B", the six-character escape sequence \u0041
represents a single A character. The logical string is "TestAB", but features must be able to map the A back to
its full six-character escape sequence in the source.
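To make the mapping concrete, here is a hypothetical enumeration of the virtual characters for "Test\u0041B" (the service and property names follow the structures described later in this document; exact signatures may differ):

```csharp
// Hypothetical sketch: widths of each logical character in "Test\u0041B".
// Assumes an IVirtualCharService-style API as described in this document.
var chars = virtualCharService.TryConvertToVirtualChars(stringLiteralToken);

foreach (var vc in chars)
    Console.WriteLine($"'{vc.Value}' occupies {vc.Span.Length} source character(s)");

// 'T', 'e', 's', 't', and 'B' each occupy 1 source character;
// 'A' occupies 6 (the escape sequence \u0041).
```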
Even XML documentation uses character escaping. Inside a `<code>int x &lt; 5;</code>` block, the entity reference `&lt;`
(four characters) represents a single less-than character. When parsing C# code from documentation, we need to map that
logical `<` back to the `&lt;` entity reference.
The VirtualChar system enables IDE features for multiple embedded language scenarios:
- Regex patterns passed to Regex APIs
- Test markup spans such as `[|...|]` and `{|Name:...|}`
- `<code>` blocks in XML doc comments

All benefit from the same VirtualChar foundation.
VirtualChars solve this problem by pairing each logical character with the exact span of its source representation, giving features a bidirectional mapping between the processed string value and the original source text.
The VirtualChar system follows Roslyn's green/red architecture pattern for memory efficiency and immutability.
Design characteristics:
The VirtualCharGreen is designed to be immutable and position-independent. Once created, it's never modified, and its offset is stored relative to the token start rather than as an absolute file position. This makes instances shareable across different contexts and enables efficient caching. The structure is also highly memory-optimized, packing both the offset and width into a single integer field.
Key fields:
```csharp
internal readonly record struct VirtualCharGreen
{
    private const int MaxWidth = 12;
    private const int WidthMask = 0b1111;   // low 4 bits store the width (values 0-15)
    private const int OffsetShift = 4;      // remaining 28 bits store the offset

    public readonly char Char;              // The logical character
    private readonly int _offsetAndWidth;   // Packed offset + width

    public int Offset => _offsetAndWidth >> OffsetShift;
    public int Width => _offsetAndWidth & WidthMask;

    public VirtualCharGreen(char ch, int offset, int width)
    {
        Char = ch;
        _offsetAndWidth = (offset << OffsetShift) | width;
    }
}
```
Packing details:
The width field is limited to 4 bits, allowing values from 0 to 15. This is sufficient because the longest possible
escape sequence in C# is \UHHHHLLLL for a surrogate pair, which requires 10 characters. The remaining 28 bits are
used for the offset, which supports tokens up to 268 million characters long—far more than any realistic string literal.
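The packing arithmetic can be sketched in isolation (same constants as the structure above):

```csharp
const int WidthMask = 0b1111; // low 4 bits: width
const int OffsetShift = 4;    // high 28 bits: offset

static int Pack(int offset, int width) => (offset << OffsetShift) | width;
static (int Offset, int Width) Unpack(int packed)
    => (packed >> OffsetShift, packed & WidthMask);

// 'A' produced by the escape \u0041 at offset 5 within the token:
var packed = Pack(offset: 5, width: 6);   // 0b0101_0110 == 86
var (offset, width) = Unpack(packed);     // offset == 5, width == 6
```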
Examples:

```csharp
// Regular character 'a' in "abc"
new VirtualCharGreen('a', offset: ..., width: 1)

// Tab character from "\t" escape
new VirtualCharGreen('\t', offset: ..., width: 2)

// 'A' from Unicode escape "\u0041"
new VirtualCharGreen('A', offset: ..., width: 6)
```
Red wrapper properties:
```csharp
internal readonly record struct VirtualChar
{
    internal VirtualCharGreen Green { get; }
    internal int TokenStart { get; }

    public char Value => Green.Char;
    public TextSpan Span => new(TokenStart + Green.Offset, Green.Width);

    public static implicit operator char(VirtualChar ch) => ch.Value;
}
```
The VirtualChar structure wraps a VirtualCharGreen and adds a TokenStart field that provides absolute file position context. The span is computed on-demand by combining the token's absolute start position with the green node's relative offset and width. This wrapper is lightweight—just a green reference plus one integer—and supports implicit conversion to char for convenient usage.
Design rationale: Following Roslyn's green/red split, the green component is immutable, shareable, and position-independent (efficient for caching), while the red wrapper adds positional context on-demand (efficient for consumption).
Represents the complete processed contents of a string token as a sequence of VirtualChars.
Structure:
```csharp
internal partial struct VirtualCharGreenSequence
{
    private readonly Chunk _leafCharacters;  // The actual character storage
    private readonly TextSpan _span;         // Slice into _leafCharacters [inclusive, exclusive)

    public int Length => _span.Length;
    public VirtualCharGreen this[int index] => _leafCharacters[_span.Start + index];

    public VirtualCharGreenSequence Slice(int start, int length)
        => new(_leafCharacters, new TextSpan(_span.Start + start, length));
}

internal readonly struct VirtualCharSequence
{
    private readonly int _tokenStart;
    private readonly VirtualCharGreenSequence _sequence;

    public VirtualChar this[int index] => new(_sequence[index], _tokenStart);

    public VirtualCharSequence Slice(int start, int length)
        => new(_tokenStart, _sequence.Slice(start, length));
}
```
Slicing support: Efficient subsequence extraction without copying. For example, slicing the quotes off the token text `"Hello"` (i.e., `[1..^1]`) yields the five characters Hello with no allocation.

Two implementations optimized for different scenarios:
StringChunk (Common case: no escapes)
```csharp
// For tokens like "Hello World" (no escape sequences)
VirtualCharGreenSequence.Create("Hello World")

// Zero allocation: no array is materialized.
// Direct indexing: each VirtualChar is created on demand from the string.
// Each character has width = 1 and an offset matching its string position.
```
ImmutableSegmentedListChunk (Escapes present)
```csharp
// For tokens like "Hello\tWorld" (contains escapes)
ImmutableSegmentedList<VirtualCharGreen> sequence =
[
    new('H', 0, 1),
    new('e', 1, 1),
    new('l', 2, 1),
    new('l', 3, 1),
    new('o', 4, 1),
    new('\t', 5, 2),  // \t spans 2 source chars
    new('W', 7, 1),
    // ... etc.
];

// Materialized storage: the list holds pre-computed VirtualCharGreens.
// Preserves escape info: each element stores the width of its original escape sequence.
```
Performance impact:
For unescaped strings (the common case), there's no heap allocation beyond the string token itself. When escapes are present, we allocate a single array to hold the materialized VirtualCharGreens. Both implementations support efficient slicing without copying the underlying data.
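A simplified sketch of how a factory might choose between the two chunk implementations (hypothetical; the real Create method handles more cases, such as verbatim and raw strings):

```csharp
static VirtualCharGreenSequence Create(string tokenText)
{
    // Fast path: no escapes, so logical characters map 1:1 to source characters.
    // The string itself is the storage; no array is allocated.
    if (tokenText.IndexOf('\\') < 0)
        return new VirtualCharGreenSequence(new StringChunk(tokenText));

    // Slow path: decode each escape and materialize one VirtualCharGreen per
    // logical character, recording the source width of every escape sequence.
    var builder = ImmutableSegmentedList.CreateBuilder<VirtualCharGreen>();
    // ... decode escapes, appending new VirtualCharGreen(ch, offset, width) ...
    return new VirtualCharGreenSequence(new ImmutableSegmentedListChunk(builder.ToImmutable()));
}
```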
- Each character in token.ValueText maps to exactly one VirtualChar in the sequence.
- The union of all VirtualChar.Spans covers the entire token.Text, excluding the quotes.
- For regular strings, VirtualChar[i].Span.End == VirtualChar[i+1].Span.Start.
  - Exception: Multi-line raw string literals may have gaps (stripped whitespace/newlines).
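These invariants lend themselves to a debug-time check; a hypothetical validation helper might look like:

```csharp
static void AssertInvariants(SyntaxToken token, VirtualCharSequence chars)
{
    // 1:1 correspondence: one VirtualChar per processed character.
    Debug.Assert(chars.Length == token.ValueText.Length);

    // Contiguity: each span ends where the next begins
    // (multi-line raw string literals are the documented exception).
    for (var i = 0; i < chars.Length - 1; i++)
        Debug.Assert(chars[i].Span.End == chars[i + 1].Span.Start);
}
```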
Conversion succeeds only for tokens without diagnostics (and thus well-formed escape sequences).
Failure cases (returns default) include tokens that carry diagnostics and token kinds the service does not recognize as string literals.
Language-specific interface for converting string tokens to VirtualChar sequences.
Key methods:
```csharp
VirtualCharSequence TryConvertToVirtualChars(SyntaxToken token)
bool TryGetEscapeCharacter(VirtualChar ch, out char escapeChar)
```

Responsibilities:
The service is responsible for validating that the token is a well-formed string literal without diagnostics, processing the language-specific escape sequences, generating the VirtualChar sequence while maintaining all invariants, and providing reverse mapping from logical characters back to their escape form.
Implementations:
- CSharpVirtualCharService: Handles C# string escaping rules
- VisualBasicVirtualCharService: Handles VB string escaping rules

The system must identify which string literals contain embedded languages and which specific language they contain.
[StringSyntax] Attribute

The most explicit detection mechanism uses the .NET 7+ System.Diagnostics.CodeAnalysis.StringSyntaxAttribute.
Example:

```csharp
void ProcessRegex([StringSyntax("Regex")] string pattern)
```
The detector checks for this attribute in several locations: method and constructor parameters, field declarations, property declarations, and attribute constructor arguments. The algorithm parses the argument syntax, resolves the parameter symbol via the semantic model, checks for the StringSyntax attribute, and extracts the language identifier from the first constructor argument. Note that this list can be extended as needed as we discover more cases in the future.
Lightweight annotation using special comment syntax: // lang=<identifier>[,<option1>,<option2>,...]
Example:

```csharp
// lang=regex
var pattern = "\\d+";
```
The comment applies to the next statement or declaration. The detector scans both the leading trivia of the statement and the trailing trivia of the previous token to find these annotations. Comments can include comma-separated options that are passed to the parser configuration. Importantly, comments take precedence over attribute detection, allowing developers to override the default detection when needed.
Recognition of framework types and methods known to accept embedded language strings.
Regex APIs: Regex.IsMatch, Regex.Replace, new Regex(...), etc.
The recognition logic maintains a hash set of well-known method names, verifies that the invoked symbol belongs to the
expected type (like System.Text.RegularExpressions.Regex), and then finds parameters with specific names (like
"pattern") to match against arguments. This API registry is built once at the compilation level from the type's
members.
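A sketch of such a compilation-level registry (type and member names here are illustrative, not Roslyn's actual internals):

```csharp
internal sealed class RegexApiRegistry
{
    private readonly INamedTypeSymbol? _regexType;
    private readonly HashSet<string> _methodNames;

    public RegexApiRegistry(Compilation compilation)
    {
        // Resolve and cache the type symbol once per compilation.
        _regexType = compilation.GetTypeByMetadataName("System.Text.RegularExpressions.Regex");
        _methodNames = new HashSet<string> { "IsMatch", "Match", "Matches", "Replace", "Split" };
    }

    public bool IsPatternParameter(IMethodSymbol method, IParameterSymbol parameter)
        => SymbolEqualityComparer.Default.Equals(method.ContainingType, _regexType)
           && _methodNames.Contains(method.Name)
           && parameter.Name == "pattern";
}
```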
Special handling for format strings in interpolated string expressions.
Example: $"{date:yyyy-MM-dd}"
Detection flow:
1. Locate the format clause (:yyyy-MM-dd) in the interpolation.
2. Determine the type of the interpolated expression (DateTime).
3. Inspect that type's IFormattable.ToString implementation for a [StringSyntax] attribute on the format parameter.

Detectors are compilation-scoped services that efficiently identify embedded language tokens. They cache type symbols
(like the Regex type) so they only need to be resolved once, and similarly cache the set of well-known method names
computed from the type's members. The detectors don't cache parsed trees, however—those are built on-demand only for
tokens that are currently visible in the editor.
The system uses a generic EmbeddedLanguageDetector infrastructure that works with language identifiers and the various
detection strategies described above.
For direct parameter annotation, the flow is straightforward: a string literal appears as an argument, which maps to a
parameter decorated with [StringSyntax], and detection succeeds immediately.
Local variable flow tracking is more complex. When a string literal is assigned to a local variable, no detection occurs
initially. Later, when that variable is passed to a well-known API like Regex.IsMatch, the detector backtracks from
the usage site to the original assignment and marks the string literal as containing an embedded language. This allows
a local to be declared without a comment, while still lighting it up as a language token. For example:
var v = "[a-z]*";
Regex.IsMatch(str, pattern: v); // Usage of 'v' here in this API will inform IDE that v's value is a Regex.
For fields with const or readonly modifiers, the detector finds the attribute on the field declaration, then scans the containing type for all references to that field and checks those usage sites as well to determine if the field's initializer should be treated as an embedded language string.
Comment overrides take precedence over all other strategies, allowing developers to explicitly specify the language when the automatic detection might be incorrect.
Rationale: Comments allow local overrides for edge cases
Embedded language parsers produce syntax trees that mirror Roslyn's core design principles while remaining independent and language-specific.
| Roslyn Construct | Embedded Language Equivalent |
|---|---|
| SyntaxToken | EmbeddedSyntaxToken<TSyntaxKind> |
| SyntaxNode | EmbeddedSyntaxNode<TSyntaxKind, TSyntaxNode> |
| SyntaxTrivia | EmbeddedSyntaxTrivia<TSyntaxKind> |
| SyntaxTree | EmbeddedSyntaxTree<...> |
| Green/Red split | VirtualChar only (not yet for nodes) |
Design principles shared with Roslyn: immutability, full fidelity, precise spans, and uniform child access via ChildAt(index).

Key difference: Currently there is no green/red split for embedded syntax nodes/tokens (only VirtualChar has this separation). See §7.1 for a future optimization.
Represents a token within an embedded language, backed by VirtualChars.
Core properties:
```csharp
internal readonly struct EmbeddedSyntaxToken<TSyntaxKind>
{
    public readonly TSyntaxKind Kind;
    public readonly ImmutableArray<EmbeddedSyntaxTrivia<TSyntaxKind>> LeadingTrivia;
    public readonly VirtualCharSequence VirtualChars;
    public readonly ImmutableArray<EmbeddedSyntaxTrivia<TSyntaxKind>> TrailingTrivia;
    public readonly ImmutableArray<EmbeddedDiagnostic> Diagnostics;
    public readonly object Value;  // Optional semantic interpretation

    // Helper properties and methods
}
```
Trivia handling is limited compared to Roslyn: only whitespace (significant in regex IgnorePatternWhitespace mode) and single-line (//) or multi-line (/* */) comments are represented.

Position: Derived from VirtualChars (from the first char's start to the last char's end).
Value examples: a numeric token's value may be an int or double, a string token's value a string, and a null literal's value null.

Abstract base for all non-terminal nodes in embedded syntax trees.
Key characteristics:
```csharp
internal abstract class EmbeddedSyntaxNode<TSyntaxKind, TSyntaxNode>
{
    public readonly TSyntaxKind Kind;

    internal abstract int ChildCount { get; }
    internal abstract EmbeddedSyntaxNodeOrToken<TSyntaxKind, TSyntaxNode> ChildAt(int index);

    // Helper methods, like:
    public TextSpan GetSpan() { ... }
    public bool Contains(VirtualChar virtualChar) { ... }
}
```
- Child access: ChildAt(index) returns nodes or tokens
- Enumeration: Supports foreach over children
Specialized structure for comma-delimited (or bar-delimited) constructs.
Storage pattern: Alternating nodes and separators in ImmutableArray
Indexer: Returns node at index * 2 (skips separators)
Example: JSON array [1, 2, 3] stored as [Node₁, Comma, Node₂, Comma, Node₃]
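The alternating storage and index * 2 lookup can be sketched as follows (a simplification of the generic list described above; member names are illustrative):

```csharp
internal readonly struct SeparatedNodeList<TNode> where TNode : class
{
    // Alternating storage: [node, separator, node, separator, node].
    private readonly ImmutableArray<EmbeddedSyntaxNodeOrToken> _nodesAndTokens;

    // Number of nodes, excluding separators: 5 entries -> 3 nodes.
    public int Length => (_nodesAndTokens.Length + 1) / 2;

    // Nodes live at even indices; separators at odd ones.
    public TNode this[int index] => (TNode)_nodesAndTokens[index * 2].Node;
}
```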
The root tree structure contains three main components. The VirtualCharSequence Text field is the source of truth for
all position information. The TCompilationUnitSyntax Root is the top-level syntax node, which is always present (never
null) and contains the entire parsed structure. Finally, ImmutableArray<EmbeddedDiagnostic> Diagnostics holds all
errors and warnings found during parsing, deduplicated so that the same diagnostic doesn't appear twice at the same
position.
Concrete instantiations of this structure include RegexTree, which adds language-specific dictionaries for capture
names and numbers, and JsonTree, which is a pure tree without additional properties.
The parsing pipeline flows from a source SyntaxToken through IVirtualCharService to produce a VirtualCharSequence,
which is then consumed by Parser.TryParse() to yield an EmbeddedSyntaxTree.
Parsers are designed to always succeed except in the rare case of stack overflow. They maintain full fidelity by representing every VirtualChar in the resulting tree. When errors are encountered, the parser synthesizes missing tokens and attaches diagnostics rather than failing. An important characteristic is that diagnostics reasonably replicate the error messages that native parsers would produce, ensuring consistency with runtime behavior.
Note that the exact same error messages are not necessary. This can sometimes be onerous in terms of all the varying checks the native parsers perform, as well as how they choose the exact position to place the diagnostic. As long as the diagnostics are reasonably close in terminology and location, we consider the result acceptable.
Lexer structure:
```csharp
internal struct JsonLexer
{
    public readonly VirtualCharSequence Text;
    public int Position;

    public JsonToken ScanNextToken()
    {
        var leadingTrivia = ScanTrivia(leading: true);
        if (Position == Text.Length)
            return CreateToken(JsonKind.EndOfFile, leadingTrivia,
                VirtualCharSequence.Empty, []);

        var (chars, kind, diagnostic) = ScanNextTokenWorker();
        var trailingTrivia = ScanTrivia(leading: false);
        var token = CreateToken(kind, leadingTrivia, chars, trailingTrivia);

        return diagnostic == null
            ? token
            : token.AddDiagnosticIfNone(diagnostic.Value);
    }

    private (VirtualCharSequence, JsonKind, EmbeddedDiagnostic?) ScanNextTokenWorker()
    {
        return this.CurrentChar.Value switch
        {
            '{' => ScanSingleCharToken(JsonKind.OpenBraceToken),
            '}' => ScanSingleCharToken(JsonKind.CloseBraceToken),
            '[' => ScanSingleCharToken(JsonKind.OpenBracketToken),
            ']' => ScanSingleCharToken(JsonKind.CloseBracketToken),
            ',' => ScanSingleCharToken(JsonKind.CommaToken),
            ':' => ScanSingleCharToken(JsonKind.ColonToken),
            '\'' or '"' => ScanString(),
            _ => ScanText(),
        };
    }

    private (VirtualCharSequence, JsonKind, EmbeddedDiagnostic?) ScanString() { ... }
}
```
Parser structure:
```csharp
internal partial struct JsonParser
{
    private JsonLexer _lexer;
    private JsonToken _currentToken;

    public static JsonTree? TryParse(VirtualCharSequence text, JsonOptions options)
    {
        try
        {
            if (text.IsDefaultOrEmpty())
                return null;

            return new JsonParser(text).ParseTree(options);
        }
        catch (InsufficientExecutionStackException)
        {
            return null;
        }
    }

    private JsonTree ParseTree(JsonOptions options)
    {
        var sequence = this.ParseSequence();
        var root = new JsonCompilationUnit(sequence, _currentToken);

        // Collect diagnostics from the tree and run validation passes
        var diagnostics = GetDiagnostics(root, options);
        return new JsonTree(_lexer.Text, root, diagnostics);
    }

    private ImmutableArray<JsonValueNode> ParseSequence()
    {
        var result = ArrayBuilder<JsonValueNode>.GetInstance();
        while (ShouldConsumeSequenceElement())
            result.Add(ParseValue());

        return result.ToImmutableAndFree();
    }

    private JsonValueNode ParseValue()
    {
        return _currentToken.Kind switch
        {
            JsonKind.OpenBraceToken => ParseObject(),
            JsonKind.OpenBracketToken => ParseArray(),
            _ => ParseLiteral(),
        };
    }

    private JsonObjectNode ParseObject() { ... }
}
```
The key patterns in this architecture are straightforward, and follow well understood patterns around recursive descent parsing. The lexer consumes VirtualChars and produces tokens that carry VirtualChar spans for precise position tracking. The parser then consumes these tokens to build the tree structure. Diagnostics are attached during parsing as errors are encountered, then aggregated into the final tree. When required tokens are missing, the parser synthesizes them with attached diagnostics to enable error recovery while maintaining a complete tree structure.
Highlights matching delimiters when cursor is adjacent.
The algorithm works in several steps. First, it converts the cursor position to a VirtualChar using
tree.Text.Find(position). Then it walks the tree to find the node containing that character through recursive descent.
Once found, it extracts the open and close tokens from the grouping, character class, or other bracketed node, and
returns the span pair for highlighting.
Example implementation (JSON), also demonstrating consumption of an embedded language tree:
```csharp
internal sealed class JsonBraceMatcher : IEmbeddedLanguageBraceMatcher
{
    public BraceMatchingResult? FindBraces(
        SemanticModel semanticModel,
        SyntaxToken token,
        int position,
        CancellationToken cancellationToken)
    {
        var tree = ParseJsonTree(token, semanticModel, cancellationToken);
        if (tree == null)
            return null;

        // Step 1: Find the VirtualChar at the cursor position
        var virtualChar = tree.Text.Find(position);
        if (virtualChar == null)
            return null;

        var ch = virtualChar.Value;

        // Step 2: Only process brace-like characters
        if (ch.Value is not ('{' or '[' or '(' or '}' or ']' or ')'))
            return null;

        // Step 3: Find the node containing this character
        return FindBraceMatchingResult(tree.Root, ch);
    }

    private static BraceMatchingResult? FindBraceMatchingResult(JsonNode node, VirtualChar ch) { ... }
}
```
Supported constructs: Parentheses, brackets, braces, comment delimiters
Provides syntax highlighting within embedded language strings.
The classification process walks all tokens in the embedded syntax tree and maps each token's kind to a classification
type (for example, RegexKind.NumberToken maps to "regex - quantifier"). The VirtualChar spans from each token are
extracted to produce source TextSpans, which are then published to the IDE for colorization. This approach provides
granular coloring where individual constructs like escape sequences, keywords, and operators each get distinct
highlighting.
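The walk can be sketched as a recursive traversal (the kind-to-classification mapping shown is illustrative):

```csharp
// Hypothetical classification walk over an embedded syntax tree.
static void Classify(RegexNode node, Action<TextSpan, string> addClassification)
{
    for (var i = 0; i < node.ChildCount; i++)
    {
        var child = node.ChildAt(i);
        if (child.IsNode)
        {
            Classify(child.Node, addClassification);
        }
        else
        {
            // Each token's VirtualChars yield a precise source span for coloring.
            addClassification(child.Token.GetSpan(), GetClassificationType(child.Token.Kind));
        }
    }
}
```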
Errors and warnings reported with precise source spans.
Diagnostics are attached during parsing to tokens and trivia as they're created, then aggregated into the final tree. The aggregation process ensures deduplication so that the same diagnostic doesn't appear multiple times at the same position. These diagnostics integrate seamlessly with the IDE because the VirtualChar spans map directly to the locations where squiggles should appear.
Context-sensitive suggestions within embedded language strings.
Trigger scenarios:
VirtualChar role: Precise replacement span calculation from character positions
Highlights all references to a symbol (e.g., regex capture group references).
Implementation: Find symbol at position → locate all references in tree → return spans
Parser structure: RegexCompilationUnit → RegexAlternationNode → RegexSequenceNode → expressions
Key node types:
- Grouping constructs: `(?:...)`, `(?<name>...)`, `(?=...)`, etc.
- Character classes: `[a-z]`, `[^0-9]`
- Quantifiers: `*`, `+`, `?`, `{n,m}`
- Anchors: `^`, `$`, `\b`, `\A`, `\z`
- Escapes: `\t`, `\d`, `\w`, `\p{Lu}`

Example: `"\\d+"` → CharacterClassEscapeNode (`\d`) + OneOrMoreQuantifierNode (`+`)
Capture tracking: Tree maintains dictionaries mapping capture names/numbers to their definition spans
The regex parser is built through deep introspection of the .NET runtime's System.Text.RegularExpressions.RegexParser
implementation. The goal is to replicate its parsing behavior as precisely as possible, matching both its acceptance of
valid patterns and its rejection of invalid ones with identical error messages. However, there's a critical difference
in what the parsers produce. The .NET runtime parser builds an interpreter or bytecode execution system for pattern
matching at runtime. In contrast, the Roslyn regex parser produces a concrete syntax tree that represents the structure
of the pattern without any execution capability. This syntax tree serves as the foundation for IDE features like syntax
highlighting, brace matching, and diagnostics. To validate this faithful reproduction of .NET's parsing behavior, the
implementation leverages the entire suite of regex tests from the .NET runtime repository. These tests verify two
critical properties:
Acceptance parity: Any regex pattern that the .NET parser accepts without errors must also be parsed successfully by the Roslyn parser without producing diagnostics. Rejection parity: Any regex pattern that causes the .NET parser to report an error must also produce a diagnostic from the Roslyn parser, ideally with the same error message text.
This rigorous testing approach has proven valuable beyond just validating the Roslyn implementation. During development, the team discovered several parsing bugs in the .NET runtime regex parser itself. These bugs were reported to the .NET team and subsequently fixed. While the original goal was to achieve feature parity, the implementation now focuses on matching the current, corrected behavior of the .NET runtime parser.
Parser structure: JsonCompilationUnit → JsonObjectNode / JsonArrayNode → properties/elements
Value types: Literals (string, number, true, false, null), Objects, Arrays
Example tree structure:
```
// Input: {"key": 123}
JsonCompilationUnit
├─ Sequence
│  └─ JsonObjectNode
│     ├─ OpenBraceToken: '{'
│     ├─ Sequence (separated list)
│     │  └─ JsonPropertyNode
│     │     ├─ NameToken: "key"
│     │     ├─ ColonToken: ':'
│     │     └─ Value: JsonLiteralNode
│     │        └─ LiteralToken: 123 (NumberToken)
│     └─ CloseBraceToken: '}'
└─ EndOfFileToken
```
Separated lists: Properties in objects, elements in arrays use EmbeddedSeparatedSyntaxNodeList
The JSON parser handles multiple dialects of JSON to accommodate different usage scenarios:
Strict RFC 8259 Mode: When operating in strict mode, the parser enforces the exact requirements of RFC 8259 (the official JSON specification). This includes restrictions on numeric format, disallowing comments, requiring double-quoted strings, and other strict conformance requirements.
Lenient Modes: The parser also supports several relaxed variants:

- Comments (// single-line and /* */ multi-line), trailing commas, and other features accepted by Microsoft's modern JSON library.
- Json.NET extensions: constructors (new Date(...)), unquoted property names, single-quoted strings, special numeric values like NaN and Infinity, and other non-standard constructs.

The implementation uses a two-phase approach. First, the parser builds a syntax tree that accepts a superset of all supported JSON dialects—essentially accepting anything that would be valid in any mode. Then, validation passes walk this permissive tree and report diagnostics based on the selected parsing mode. This design keeps the core parsing logic simple while allowing precise error reporting for each dialect.
The strict mode validation is particularly comprehensive, using a regular expression to validate numeric format against the RFC 8259 grammar and checking character-by-character for illegal constructs like improperly quoted strings or invalid whitespace characters. The Json.NET validation pass, on the other hand, enforces that mode's looser but still well-defined rules, such as ensuring numeric literal formats that Json.NET can successfully parse and validating constructor name syntax.
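For reference, RFC 8259's number grammar corresponds to a pattern along these lines (a sketch; not necessarily the exact expression Roslyn uses):

```csharp
using System.Text.RegularExpressions;

// RFC 8259: number = [ "-" ] int [ frac ] [ exp ]
//   int  = "0" / digit1-9 *DIGIT      (no leading zeros)
//   frac = "." 1*DIGIT
//   exp  = ("e" / "E") [ "+" / "-" ] 1*DIGIT
var rfc8259Number = new Regex(@"^-?(0|[1-9][0-9]*)(\.[0-9]+)?([eE][+-]?[0-9]+)?$");

Console.WriteLine(rfc8259Number.IsMatch("-1.5e10")); // True
Console.WriteLine(rfc8259Number.IsMatch("01"));      // False: leading zero
Console.WriteLine(rfc8259Number.IsMatch("NaN"));     // False: not a JSON number
```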
Used for code generation templates, dynamic compilation, IDE scenarios.
Detection: [StringSyntax("C#")] or // lang=C#
C# code within <code> blocks in XML documentation.
Processing: Extract content → process XML entities (`&lt;` → `<`) → create VirtualChar sequence → parse as C#
Special variant for writing Roslyn IDE tests with embedded annotations.
Features:
- `[|...|]` marks text spans
- `{|Name:...|}` marks spans with identifiers
- `$$` marks cursor positions

Processing: the markup is stripped, the marked spans and positions are recorded, and the remaining text is parsed as C#.
Use cases: IntelliSense tests, navigation tests, refactoring tests, diagnostic tests
Current state: Only VirtualChar has green/red separation
Potential optimization: Extend green/red pattern to entire embedded syntax trees
Benefits:
Current status: Not implemented because features only process visible strings (small working set), edits rarely affect multiple strings
Trigger: Would become valuable if features expand to whole-file analysis
Requirements for a new language: a lexer/parser that consumes a VirtualCharSequence and produces an EmbeddedSyntaxTree, plus a detection strategy for identifying its string literals.

Candidate languages: SQL, GraphQL, Markdown, CSS/SCSS, YAML/TOML
Integration path:
VirtualCharSequence → EmbeddedSyntaxTree

Scenario: Nested embedded languages (e.g., SQL containing JSON)
Challenge: Multiple escaping layers (C# + SQL + JSON)
Potential solution: Compose VirtualChar sequences across language boundaries
Current status: Not supported, but the architecture could accommodate it
Caching opportunities:
Streaming parsing: For very large string literals (not common)
Lazy tree construction: Build on-demand rather than eagerly
Improvements:
Helpers: VirtualChar sequence assertions, tree comparison utilities, diagnostic verification
Shared patterns: Immutability, full fidelity, span precision, uniform traversal, factory patterns
Green/red pattern: Currently VirtualChar only; future expansion opportunity
Memory optimizations: StringChunk zero-allocation, bit packing, lazy construction
Computational optimizations: Compilation-level caching, no redundant work, binary search for position lookup
Pragmatic trade-offs: No parent pointers, no update methods, visible-string focus
Generic abstractions: Language-agnostic core (TSyntaxKind, TSyntaxNode)
Pluggable detection: Multiple strategies, language-specific detectors
Feature contracts: Uniform interfaces for classification, brace matching, completion, etc.
The system enforces critical invariants including 1:1 character correspondence, contiguous span coverage, and well-formed input requirements. Error handling is designed to never throw exceptions—instead, parsers synthesize missing tokens and attach precise diagnostics. The testing strategy focuses on matching native parser behavior, verifying span mappings, and validating round-trip conversion.
The system aims for close parity with native parsers, ensuring that error messages closely match what the runtime would produce. Feature quality is measured by character-level precision, granular colorization, and exact diagnostic spans. From a performance perspective, features respond instantly for visible strings, run non-blocking on background threads, and are designed for incremental updates (planned for the future).