crates/icu_messageformat_parser/README.md
A Rust implementation of the ICU MessageFormat parser, optimized for performance and WebAssembly compilation.
In an optimized build, the Rust parser is roughly 2.6x to 3.7x faster than the JavaScript parser and about 7% to 13% faster than the SWC parser:
$ bazel run -c opt //crates/icu_messageformat_parser:comparison_bench
| Message Type | Rust Parser | JavaScript | Speedup vs JS | SWC Parser | vs SWC |
|---|---|---|---|---|---|
| complex_msg | 9.22 µs | 23.85 µs | 2.59x faster | 10.3 µs | 1.11x faster |
| normal_msg | 1.14 µs | 3.27 µs | 2.87x faster | 1.25 µs | 1.10x faster |
| simple_msg | 163 ns | 600 ns | 3.68x faster | 184 ns | 1.13x faster |
| string_msg | 118 ns | 320 ns | 2.71x faster | 126 ns | 1.07x faster |
Note: Always use -c opt for benchmarking to enable release optimizations.
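The "Speedup vs JS" column appears to be the JavaScript time divided by the Rust time. The sketch below recomputes it from the table values; the `speedup` helper and the `rows` array are illustrative names, not part of the crate or its benchmark harness.

```typescript
// Speedup factor: how many times longer the baseline takes than the candidate.
function speedup(baselineNs: number, candidateNs: number): number {
  return baselineNs / candidateNs
}

// [message type, Rust parser time, JavaScript parser time] in nanoseconds,
// copied from the table above (1 µs = 1000 ns).
const rows: Array<[string, number, number]> = [
  ['complex_msg', 9220, 23850],
  ['normal_msg', 1140, 3270],
  ['simple_msg', 163, 600],
  ['string_msg', 118, 320],
]

for (const [name, rustNs, jsNs] of rows) {
  console.log(`${name}: ${speedup(jsNs, rustNs).toFixed(2)}x faster`)
}
```

The published times are rounded, so a ratio recomputed this way can differ from a table cell in the last digit.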
- lib.rs - Main library and WASM bindings
- parser.rs - Core parser implementation
- types.rs - AST types
- error.rs - Error types
- date_time_pattern_generator.rs - Date/time pattern support
- manipulator.rs - AST manipulation utilities
- printer.rs - AST printing utilities

# Run tests
bazel test //crates/icu_messageformat_parser:icu_messageformat_parser_test
# Build library
bazel build //crates/icu_messageformat_parser:icu_messageformat_parser
# Run benchmarks
bazel run -c opt //crates/icu_messageformat_parser:parser_bench
The parser can be compiled to WebAssembly using Bazel's platform transition approach.
bazel build //crates/icu_messageformat_parser:formatjs_icu_messageformat_parser_wasm
This uses rust_shared_library with platform = "@rules_rust//rust/platform:wasm" to cross-compile to wasm32.
The WASM build includes:
- formatjs_icu_messageformat_parser_bg.wasm - The WASM binary (~1.2MB)
- formatjs_icu_messageformat_parser.js - JavaScript glue code generated by wasm-bindgen
- formatjs_icu_messageformat_parser.d.ts - TypeScript type definitions
- formatjs_icu_messageformat_parser_bg.wasm.d.ts - WASM module types

The WASM build uses:

- cdylib for dynamic library output
- wasm feature flag enables wasm-bindgen dependencies
- @rules_rust//rust/platform:wasm for wasm32 target
- wasm-bindgen and serde-wasm-bindgen for JS interop

See BUILD.bazel for the full configuration.
When compiled to WASM, the parser exports two functions:
parse(input: string): MessageFormatElement[]

Parse ICU MessageFormat with default options.
import init, {parse} from './formatjs_icu_messageformat_parser.js'
await init()
const ast = parse('Hello {name}!')
console.log(ast)
parse_ignore_tag(input: string): MessageFormatElement[]

Parse with ignore_tag option enabled (treats HTML-like tags as literals).
import init, {parse_ignore_tag} from './formatjs_icu_messageformat_parser.js'
await init()
const ast = parse_ignore_tag('<b>Bold {name}</b>')
console.log(ast)
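To make the difference between the two exports concrete, here is a minimal sketch of the two AST shapes one might expect for the same input. It assumes the WASM build mirrors the element shape of the JavaScript @formatjs/icu-messageformat-parser package (numeric `type` tags, with 0 for literal, 1 for argument, and 8 for tag); the type aliases and both arrays are hand-written for illustration and are not real parser output.

```typescript
// Simplified, assumed element shapes (not taken from this crate's generated .d.ts).
type LiteralEl = {type: 0; value: string}
type ArgumentEl = {type: 1; value: string}
type TagEl = {type: 8; value: string; children: El[]}
type El = LiteralEl | ArgumentEl | TagEl

// Count tag elements anywhere in the tree.
function countTags(ast: El[]): number {
  let count = 0
  for (const el of ast) {
    if (el.type === 8) {
      count += 1 + countTags(el.children)
    }
  }
  return count
}

// Assumed result of parse('<b>Bold {name}</b>'): the tag becomes its own node.
const withTags: El[] = [
  {
    type: 8,
    value: 'b',
    children: [
      {type: 0, value: 'Bold '},
      {type: 1, value: 'name'},
    ],
  },
]

// Assumed result of parse_ignore_tag('<b>Bold {name}</b>'): the markup stays in literals.
const tagsIgnored: El[] = [
  {type: 0, value: '<b>Bold '},
  {type: 1, value: 'name'},
  {type: 0, value: '</b>'},
]

console.log(countTags(withTags), countTags(tagsIgnored))
```

Check the generated formatjs_icu_messageformat_parser.d.ts for the actual element types before relying on these shapes.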
Both functions return the parsed AST as a JavaScript object or throw an error on parse failure.
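Because a parse failure surfaces as a thrown error, callers may want to wrap the call. The following is a sketch of one way to do that, not the package's own API: `tryParse` and `ParseResult` are hypothetical names, and `standInParse` is a stand-in for the WASM `parse` export so the example runs without loading the module.

```typescript
type ParseResult<T> = {ok: true; ast: T} | {ok: false; message: string}

// Run any throwing parse function and turn the outcome into a tagged result.
function tryParse<T>(parseFn: (input: string) => T, input: string): ParseResult<T> {
  try {
    return {ok: true, ast: parseFn(input)}
  } catch (err) {
    // The thrown value may not be an Error instance, so fall back to String().
    const message = err instanceof Error ? err.message : String(err)
    return {ok: false, message}
  }
}

// Hypothetical stand-in for the WASM `parse` export; it only checks for a closing brace.
function standInParse(input: string): string[] {
  if (input.includes('{') && !input.includes('}')) {
    throw new Error('unclosed argument brace')
  }
  return [input]
}

const good = tryParse(standInParse, 'Hello {name}!')
const bad = tryParse(standInParse, 'Hello {name')
console.log(good.ok, bad.ok)
```

With the real module, `parse` or `parse_ignore_tag` would be passed in place of `standInParse` after `await init()`.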
The WASM binary is used by the @formatjs/icu-messageformat-parser-wasm npm package, which provides a convenient JavaScript wrapper:
import {parse, parseIgnoreTag} from '@formatjs/icu-messageformat-parser-wasm'
// Automatically initializes WASM on first call
const ast = await parse('Hello {name}!')
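Initializing on first call is commonly done by caching the initialization promise. The sketch below shows that pattern under stated assumptions: it is not the wrapper's actual source, and `fakeInit`, `fakeParse`, `ensureInit`, and `parseLazy` are hypothetical names standing in for the wasm-bindgen `init` and `parse` exports.

```typescript
// Hypothetical stand-ins for the wasm-bindgen exports; no WASM module is loaded here.
let initCalls = 0
async function fakeInit(): Promise<void> {
  initCalls += 1
}
function fakeParse(input: string): string[] {
  return [input]
}

// Cached initialization promise, shared by every caller.
let ready: Promise<void> | undefined

function ensureInit(): Promise<void> {
  if (ready === undefined) {
    ready = fakeInit() // started once; later calls reuse the same promise
  }
  return ready
}

// Async wrapper: waits for initialization, then delegates to the parser.
async function parseLazy(input: string): Promise<string[]> {
  await ensureInit()
  return fakeParse(input)
}

parseLazy('Hello {name}!').then(ast => console.log(ast, initCalls))
```

Caching the promise rather than a boolean flag means concurrent first calls all wait on the same initialization instead of starting it twice.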
The build uses Bazel's platform transition feature to cross-compile from the host platform to wasm32:
rust_shared_library(
    name = "formatjs_icu_messageformat_parser_wasm",
    platform = "@rules_rust//rust/platform:wasm",
    crate_features = ["wasm"],
    # ...
)
This approach:
The wasm feature flag in Cargo.toml enables:
- wasm-bindgen for JS interop
- serde-wasm-bindgen for serializing complex types to JS
- parse and parse_ignore_tag functions

The Rust code uses #[cfg(feature = "wasm")] to conditionally compile WASM-specific code.
Dependencies:

- icu - Unicode/ICU functionality
- regex - Pattern matching
- serde - Serialization framework
- once_cell - Lazy static initialization

WASM-only dependencies (behind wasm feature):

- wasm-bindgen - JS interop
- serde-wasm-bindgen - Serialize to JS values

# Regenerate time data
bazel run //crates/icu_messageformat_parser:time-data
# Regenerate regex patterns
bazel run //crates/icu_messageformat_parser:regex
# Run Rust tests
bazel test //crates/icu_messageformat_parser:icu_messageformat_parser_test
# Run benchmarks
bazel run -c opt //crates/icu_messageformat_parser:parser_bench