ICU MessageFormat Parser (Rust)

A Rust implementation of the ICU MessageFormat parser, optimized for performance and WebAssembly compilation.

Features

  • Full ICU MessageFormat syntax support
  • High-performance parsing: 2.6-3.7x faster than the JavaScript parser in the benchmarks below
  • WebAssembly-ready with wasm-bindgen
  • Zero-copy parsing where possible
  • Comprehensive error handling

Performance

The Rust parser (optimized build) runs 2.6-3.7x faster than the JavaScript parser and 1.07-1.13x faster than the SWC parser, the other Rust implementation benchmarked:

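bash
$ bazel run -c opt //crates/icu_messageformat_parser:comparison_bench

| Message Type | Rust Parser | JavaScript | Speedup vs JS | SWC Parser | vs SWC       |
|--------------|-------------|------------|---------------|------------|--------------|
| complex_msg  | 9.22 µs     | 23.85 µs   | 2.59x faster  | 10.3 µs    | 1.11x faster |
| normal_msg   | 1.14 µs     | 3.27 µs    | 2.87x faster  | 1.25 µs    | 1.10x faster |
| simple_msg   | 163 ns      | 600 ns     | 3.68x faster  | 184 ns     | 1.13x faster |
| string_msg   | 118 ns      | 320 ns     | 2.71x faster  | 126 ns     | 1.07x faster |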

Note: Always use -c opt for benchmarking to enable release optimizations.
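
The "Speedup vs JS" column is the JavaScript time divided by the Rust time. A minimal sketch of that arithmetic, using the timings from the table above converted to nanoseconds (the `speedup` helper and the `rows` array are illustrative names, not part of the benchmark harness):

```javascript
// Speedup = baseline time / Rust parser time, both in the same unit.
function speedup(baselineNs, rustNs) {
  return baselineNs / rustNs
}

// Timings copied from the benchmark table, expressed in nanoseconds.
const rows = [
  {name: 'complex_msg', rust: 9220, js: 23850},
  {name: 'normal_msg', rust: 1140, js: 3270},
  {name: 'simple_msg', rust: 163, js: 600},
  {name: 'string_msg', rust: 118, js: 320},
]

for (const row of rows) {
  // Two decimal places, matching the precision used in the table.
  console.log(`${row.name}: ${speedup(row.js, row.rust).toFixed(2)}x faster than JS`)
}
```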

Project Structure

  • lib.rs - Main library and WASM bindings
  • parser.rs - Core parser implementation
  • types.rs - AST types
  • error.rs - Error types
  • date_time_pattern_generator.rs - Date/time pattern support
  • manipulator.rs - AST manipulation utilities
  • printer.rs - AST printing utilities

Building

Native Rust Library

bash
# Run tests
bazel test //crates/icu_messageformat_parser:icu_messageformat_parser_test

# Build library
bazel build //crates/icu_messageformat_parser:icu_messageformat_parser

# Run benchmarks (-c opt enables release optimizations)
bazel run -c opt //crates/icu_messageformat_parser:parser_bench

WebAssembly

The parser can be compiled to WebAssembly using Bazel's platform transition approach.

Build with Bazel

bash
bazel build //crates/icu_messageformat_parser:formatjs_icu_messageformat_parser_wasm

This uses rust_shared_library with platform = "@rules_rust//rust/platform:wasm" to cross-compile to wasm32.

What Gets Built

The WASM build includes:

  • formatjs_icu_messageformat_parser_bg.wasm - The WASM binary (~1.2MB)
  • formatjs_icu_messageformat_parser.js - JavaScript glue code generated by wasm-bindgen
  • formatjs_icu_messageformat_parser.d.ts - TypeScript type definitions
  • formatjs_icu_messageformat_parser_bg.wasm.d.ts - WASM module types

WASM Configuration

The WASM build uses:

  • crate-type: cdylib for dynamic library output
  • features: wasm feature flag enables wasm-bindgen dependencies
  • platform: @rules_rust//rust/platform:wasm for wasm32 target
  • dependencies: wasm-bindgen and serde-wasm-bindgen for JS interop

See BUILD.bazel for the full configuration.

WASM API

When compiled to WASM, the parser exports two functions:

parse(input: string): MessageFormatElement[]

Parses an ICU MessageFormat string with default options.

javascript
import init, {parse} from './formatjs_icu_messageformat_parser.js'

await init()
const ast = parse('Hello {name}!')
console.log(ast)

parse_ignore_tag(input: string): MessageFormatElement[]

Parses with the ignore_tag option enabled, which treats HTML-like tags as literal text instead of tag elements.

javascript
import init, {parse_ignore_tag} from './formatjs_icu_messageformat_parser.js'

await init()
const ast = parse_ignore_tag('<b>Bold {name}</b>')
console.log(ast)

Both functions return the parsed AST as a JavaScript object or throw an error on parse failure.
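
Because both exports throw on invalid input, callers that handle untrusted messages may want to catch the failure. The following is a minimal sketch, not part of the package: `tryParse` is a hypothetical helper, and `stubParse` is a stand-in that only mimics the throw-on-failure contract so the example runs without the WASM module. In real use, the `parse` or `parse_ignore_tag` export would be passed as `parseFn` after `init()` has resolved.

```javascript
// Hypothetical helper: wraps any throwing parser and returns a result object
// instead of propagating the exception.
function tryParse(parseFn, input) {
  try {
    return {ok: true, ast: parseFn(input)}
  } catch (error) {
    return {ok: false, error}
  }
}

// Stand-in for the WASM export (assumption: it is NOT a real ICU parser).
// It throws on an unclosed "{" and otherwise returns a placeholder array.
function stubParse(input) {
  if (input.includes('{') && !input.includes('}')) {
    throw new Error('unclosed argument')
  }
  return [{value: input}] // placeholder shape, not the real AST
}

const good = tryParse(stubParse, 'Hello!')
const bad = tryParse(stubParse, 'Hello {name')

console.log(good.ok) // true
console.log(bad.ok, bad.error.message) // false 'unclosed argument'
```

Returning a result object keeps error handling at the call site explicit; the shape of the error thrown by the real WASM exports is not specified here and should be checked against the generated type definitions.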

Usage in Packages

The WASM binary is used by the @formatjs/icu-messageformat-parser-wasm npm package, which provides a convenient JavaScript wrapper:

javascript
import {parse, parseIgnoreTag} from '@formatjs/icu-messageformat-parser-wasm'

// Automatically initializes WASM on first call
const ast = await parse('Hello {name}!')
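
One way a wrapper can initialize WASM on first call is to memoize the initialization promise so that concurrent callers share it. The sketch below illustrates that pattern under stated assumptions: `createLazyParser` is a hypothetical name, the package's actual internals may differ, and the `init` and `parseSync` arguments stand in for the wasm-bindgen exports (stubs are used here so the example runs on its own).

```javascript
// Hypothetical sketch of lazy, one-time initialization.
function createLazyParser(init, parseSync) {
  let ready = null // memoized init promise, shared by all callers

  return async function parse(input) {
    if (ready === null) {
      // Start initialization exactly once; later calls reuse the same promise.
      ready = Promise.resolve().then(() => init())
    }
    await ready
    return parseSync(input)
  }
}

// Stubs standing in for the WASM `init` and `parse` exports (assumptions).
let initCalls = 0
const parse = createLazyParser(
  async () => {
    initCalls += 1
  },
  input => [{value: input}] // placeholder shape, not the real AST
)

// Two concurrent calls trigger only one initialization.
const demo = Promise.all([parse('Hello'), parse('World')]).then(results => {
  console.log(initCalls) // 1
  return results
})
```

In this sketch a failed initialization stays cached, so every later call rejects with the same error; a production wrapper might reset `ready` to allow a retry.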

Implementation Notes

Platform Transition

The build uses Bazel's platform transition feature to cross-compile from the host platform to wasm32:

python
rust_shared_library(
    name = "formatjs_icu_messageformat_parser_wasm",
    platform = "@rules_rust//rust/platform:wasm",
    crate_features = ["wasm"],
    # ...
)

This approach:

  • ✅ Works entirely within Bazel's hermetic build system
  • ✅ No external tools (like wasm-pack) required at build time
  • ✅ Leverages rules_rust's native WASM support
  • ✅ Automatically uses the wasm32 dummy CC toolchain

WASM Bindgen Integration

The wasm feature flag in Cargo.toml enables:

  • wasm-bindgen for JS interop
  • serde-wasm-bindgen for serializing complex types to JS
  • Exported parse and parse_ignore_tag functions

The Rust code uses #[cfg(feature = "wasm")] to conditionally compile WASM-specific code.

Dependencies

  • icu - Unicode/ICU functionality
  • regex - Pattern matching
  • serde - Serialization framework
  • once_cell - Lazy static initialization

WASM-only dependencies (behind wasm feature):

  • wasm-bindgen - JS interop
  • serde-wasm-bindgen - Serialize to JS values

Development

Regenerate Generated Files

bash
# Regenerate time data
bazel run //crates/icu_messageformat_parser:time-data

# Regenerate regex patterns
bazel run //crates/icu_messageformat_parser:regex

Testing

bash
# Run Rust tests
bazel test //crates/icu_messageformat_parser:icu_messageformat_parser_test

# Run benchmarks (-c opt enables release optimizations)
bazel run -c opt //crates/icu_messageformat_parser:parser_bench

References