src/libraries/System.Text.RegularExpressions/tools/GenRegexNamedBlocks/README.md
This tool generates the named Unicode blocks for RegexCharClass.cs based on the Unicode Character Database (UCD) Blocks.txt file. Named blocks allow regex patterns to match characters in specific Unicode blocks using syntax like \p{IsBasicLatin} or \p{IsGreek}.
The current implementation is based on Unicode 17.0.
To update the named blocks when a new Unicode version is released:
1. Download the Blocks.txt file from the Unicode Consortium:
   https://www.unicode.org/Public/UCD/latest/ucd/Blocks.txt
2. Run the tool from this directory:
   dotnet run -- <path-to-Blocks.txt> ../../src/System/Text/RegularExpressions/RegexCharClass.Tables.cs
3. The tool regenerates the RegexCharClass.Tables.cs file with all named blocks.
4. If the new Unicode version adds blocks, update the tests in RegexCharacterSetTests.cs to cover them.
5. Build and run the tests to confirm that they all pass.
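
When checking a freshly downloaded Blocks.txt before running the tool, it helps to know the file layout: one block per line, written as a hexadecimal code point range, a semicolon, and the block name, with `#` starting a comment. The following is a minimal Python sketch of reading that layout. It is not the tool's own code; the names `Block`, `parse_blocks`, and `SAMPLE` are hypothetical and used only for illustration.

```python
import re
from typing import Iterator, NamedTuple


class Block(NamedTuple):
    """One Blocks.txt entry (hypothetical type, for illustration only)."""
    start: int  # first code point in the block
    end: int    # last code point in the block (inclusive)
    name: str   # block name as written in Blocks.txt


# Matches lines such as "0370..03FF; Greek and Coptic".
_BLOCK_LINE = re.compile(r"^([0-9A-Fa-f]{4,6})\.\.([0-9A-Fa-f]{4,6});\s*(.+?)\s*$")


def parse_blocks(text: str) -> Iterator[Block]:
    """Yield a Block for each data line, skipping comments and blank lines."""
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments, including "# @missing" lines
        if not line:
            continue
        match = _BLOCK_LINE.match(line)
        if match is None:
            raise ValueError(f"unrecognized Blocks.txt line: {raw!r}")
        yield Block(int(match.group(1), 16), int(match.group(2), 16), match.group(3))


# A short excerpt in the Blocks.txt layout, used here instead of the real file.
SAMPLE = """\
# Sample excerpt
0000..007F; Basic Latin
0370..03FF; Greek and Coptic

10000..1007F; Linear B Syllabary
"""

blocks = list(parse_blocks(SAMPLE))
for block in blocks:
    print(f"{block.start:04X}..{block.end:04X} {block.name}")
```

To inspect a real download, read the file with `open(path, encoding="utf-8").read()` and pass the text to `parse_blocks`.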
The tool automatically excludes:
Block names are converted by prefixing "Is" and keeping only alphanumeric characters and hyphens, so spaces are dropped (e.g., "Greek and Coptic" becomes "IsGreekandCoptic").
The tool sorts blocks alphabetically by name so that the output is consistent between runs.
For backward compatibility, some aliases, such as "IsGreek" (an alias for "IsGreekandCoptic"), should be maintained manually.
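
The naming and ordering rules above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the tool's implementation: it treats "alphanumeric" as ASCII letters and digits, sorts with Python's default code point ordering (the tool's comparison may differ), and uses a hypothetical `ALIASES` table only to check that each alias still points at a generated name.

```python
def to_regex_block_name(ucd_name: str) -> str:
    """Prefix "Is" and keep only ASCII alphanumerics and hyphens."""
    kept = "".join(
        ch for ch in ucd_name
        if (ch.isascii() and ch.isalnum()) or ch == "-"
    )
    return "Is" + kept


# A few block names as they appear in Blocks.txt.
ucd_names = ["Greek and Coptic", "Basic Latin", "Latin-1 Supplement"]

# Convert, then sort by the generated name.
generated = sorted(to_regex_block_name(name) for name in ucd_names)

# Hypothetical alias table: alias -> generated name it must resolve to.
ALIASES = {"IsGreek": "IsGreekandCoptic"}

# An alias whose target disappeared would need manual attention.
dangling = [alias for alias, target in ALIASES.items() if target not in generated]

print(generated)
print(dangling)
```

A check like `dangling` is one way to notice, after regenerating, that a manually maintained alias no longer matches any generated block name.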
src/libraries/System.Text.Encodings.Web/tools/GenUnicodeRanges/