GenerateRegexCasingTable Tool

src/libraries/System.Text.RegularExpressions/tools/Readme.md

Overview

This tool generates RegexCaseEquivalences.Data.cs, which contains the three tables used for matching operations when RegexOptions.IgnoreCase is specified. The tool must be run every time a new version of Unicode is ingested into the repo. The current table contains the Unicode data from version 16.0.0.

Updating the version of Unicode used

For instructions on how to update the Unicode version across the whole repo, see here.

These are the steps to follow in order to update the Regex case equivalence table:

  1. Download the UnicodeData.txt file for the Unicode version you are updating to from unicode.org. For example, for version 15.0.0, you can find that file here.
  2. Once you have that file locally, run the following command from the command line: dotnet run -- <pathToUnicodeData.txt>.
  3. A file named RegexCaseEquivalences.Data.cs will be created in this directory. Use it to replace the one at src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/.
  4. Update the Unicode version mentioned in the Overview section of this Readme so that it matches the version used to produce the table.
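
Steps 1 to 3 above could be scripted roughly as follows. This is a minimal sketch, not part of the tool: the function names are hypothetical, the download URL assumes the usual unicode.org layout (`Public/<version>/ucd/UnicodeData.txt`), and the script assumes `curl` is installed and that it is run from the directory containing the tool.

```shell
#!/bin/sh
# Hypothetical helpers sketching steps 1-3; none of these names exist in the repo.

# Print the download URL for a given Unicode version (e.g. 15.0.0).
# Assumption: unicode.org publishes the file at Public/<version>/ucd/UnicodeData.txt.
unicode_data_url() {
    printf 'https://www.unicode.org/Public/%s/ucd/UnicodeData.txt\n' "$1"
}

# Regenerate the table. Run from the tool's directory.
#   $1: Unicode version to ingest (e.g. 16.0.0)
#   $2: path to the root of the repo checkout
update_regex_casing_table() {
    version="$1"
    repo_root="$2"
    data_file="${TMPDIR:-/tmp}/UnicodeData-$version.txt"
    dest="$repo_root/src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/"

    # Step 1: download UnicodeData.txt for the requested version.
    curl -fsSL -o "$data_file" "$(unicode_data_url "$version")" || return 1

    # Step 2: generate RegexCaseEquivalences.Data.cs in the current directory.
    dotnet run -- "$data_file" || return 1

    # Step 3: replace the checked-in copy with the freshly generated file.
    mv RegexCaseEquivalences.Data.cs "$dest" || return 1

    rm -f "$data_file"
}

# Example invocation (not executed here):
#   update_regex_casing_table 16.0.0 /path/to/runtime
```

Step 4, updating the version number in this Readme, is left as a manual edit so that the documented version is reviewed alongside the regenerated table.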

Updating UnicodeCategoryRanges.cs file

UnicodeCategoryRanges.cs is a programmatically generated file that provides serialized Binary Decision Diagram (BDD) Unicode category definitions. Some tests can be expected to fail if the Unicode category data in the runtime is updated without also updating UnicodeCategoryRanges.cs. Here is an example of such a failure:

      System.Text.RegularExpressions.Tests.RegexMatchTests.StandardCharSets_SameMeaningAcrossAllEngines(singleCharPattern: "\\w") [FAIL]
        Expected: True
        Actual:   False
        Stack Trace:
          C:\oss\runtime\src\libraries\System.Text.RegularExpressions\tests\FunctionalTests\Regex.Match.Tests.cs(2500,0): at System.Text.RegularExpressions.Tests.RegexMatchTests.VerifyIsMatch(Regex r, String input, Boolean expected, TimeSpan timeout, String pattern, RegexOptions options)
          C:\oss\runtime\src\libraries\System.Text.RegularExpressions\tests\FunctionalTests\Regex.Match.Tests.cs(2456,0): at System.Text.RegularExpressions.Tests.RegexMatchTests.StandardCharSets_SameMeaningAcrossAllEngines(String singleCharPattern)

To update UnicodeCategoryRanges.cs: