src/libraries/System.Text.RegularExpressions/tools/Readme.md
This tool generates RegexCaseEquivalences.Data.cs, which contains the three tables used for matching operations when RegexOptions.IgnoreCase is specified. Run this tool every time a new version of Unicode is ingested into the repo. The current table contains the Unicode data from version 16.0.0.
For instructions on how to update the Unicode version across the whole repo, you can find the instructions here.
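To give a rough sense of what a case equivalence table holds, the following Python sketch groups characters that share a simple lowercase mapping in UnicodeData.txt-format input. It is an illustration only, not the tool's actual algorithm or output format: the function name build_equivalences and the embedded sample lines are assumptions made for this example, and only the simple lowercase mapping field (the 14th semicolon-separated field) is considered.

```python
from collections import defaultdict

# A few lines in UnicodeData.txt format (15 semicolon-separated fields).
# Field 0 is the code point; field 13 is the simple lowercase mapping.
SAMPLE = """\
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041
004B;LATIN CAPITAL LETTER K;Lu;0;L;;;;;N;;;;006B;
006B;LATIN SMALL LETTER K;Ll;0;L;;;;;N;;;004B;;004B
212A;KELVIN SIGN;Lu;0;L;004B;;;;N;DEGREES KELVIN;;;006B;
"""


def build_equivalences(text):
    """Map each character to the sorted list of characters sharing its lowercase form.

    Hypothetical helper for illustration; the real generator may differ.
    """
    groups = defaultdict(set)
    for line in text.splitlines():
        fields = line.split(";")
        # Skip malformed lines and characters without a lowercase mapping.
        if len(fields) < 15 or not fields[13]:
            continue
        char = chr(int(fields[0], 16))
        lower = chr(int(fields[13], 16))
        groups[lower].update((lower, char))
    # Every member of a group points at the same equivalence list.
    return {c: sorted(group) for group in groups.values() for c in group}


table = build_equivalences(SAMPLE)
print([f"U+{ord(c):04X}" for c in table["k"]])
```

In this sample, K, k, and the Kelvin sign (U+212A) land in one group because they share the lowercase form k, which is the kind of relationship a case-insensitive match needs to know about.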
These are the steps to follow in order to update the Regex case equivalence table:
Run dotnet run -- <pathToUnicodeData.txt>.
RegexCaseEquivalences.Data.cs will be created in this directory. Use it to replace the one at src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/.
UnicodeCategoryRanges.cs is a programmatically generated file that provides serialized Binary Decision Diagram (BDD) Unicode category definitions.
Some tests may fail if the Unicode category data in the runtime is updated without also updating UnicodeCategoryRanges.cs. Here is an example of such a failure:
System.Text.RegularExpressions.Tests.RegexMatchTests.StandardCharSets_SameMeaningAcrossAllEngines(singleCharPattern: "\\w") [FAIL]
Expected: True
Actual: False
Stack Trace:
C:\oss\runtime\src\libraries\System.Text.RegularExpressions\tests\FunctionalTests\Regex.Match.Tests.cs(2500,0): at System.Text.RegularExpressions.Tests.RegexMatchTests.VerifyIsMatch(Regex r, String input, Boolean expected, TimeSpan timeout, String pattern, RegexOptions options)
C:\oss\runtime\src\libraries\System.Text.RegularExpressions\tests\FunctionalTests\Regex.Match.Tests.cs(2456,0): at System.Text.RegularExpressions.Tests.RegexMatchTests.StandardCharSets_SameMeaningAcrossAllEngines(String singleCharPattern)
To update UnicodeCategoryRanges.cs:
Build in the Debug configuration.
In UnicodeCategoryRangesGenerator.Generate, set the Enabled property to true. Then run the test case System.Text.RegularExpressions.Tests.RegexExperiment.RegenerateUnicodeTables(), which will generate the UnicodeCategoryRanges.cs file in the %temp% folder.
Copy UnicodeCategoryRanges.cs from the %temp% folder to the path https://github.com/dotnet/runtime/blob/ad9efe886e16b179f2ce8e93221386d420ffe10d/src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/Symbolic
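The final copy from the temp folder can also be scripted. The Python sketch below is one possible way to do it, under some assumptions: copy_generated_file is a hypothetical helper, not part of the repo; the destination directory is taken from the URL above, relative to a local checkout; and tempfile.gettempdir() is assumed to resolve to the same folder as %temp%.

```python
import shutil
import tempfile
from pathlib import Path

GENERATED_NAME = "UnicodeCategoryRanges.cs"

# Destination directory relative to the repo root (assumed from the URL above).
SYMBOLIC_DIR = Path(
    "src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/Symbolic"
)


def copy_generated_file(repo_root, temp_dir=None):
    """Copy the generated file from the temp folder into a local checkout.

    Hypothetical helper: repo_root is the path to a local clone of the repo,
    temp_dir defaults to the system temp folder. Returns the destination path.
    """
    source = Path(temp_dir or tempfile.gettempdir()) / GENERATED_NAME
    if not source.is_file():
        raise FileNotFoundError(f"{source} not found; run the generator test first")

    dest_dir = Path(repo_root) / SYMBOLIC_DIR
    if not dest_dir.is_dir():
        raise NotADirectoryError(f"{dest_dir} does not exist; check repo_root")

    # copy2 overwrites any existing file and preserves timestamps.
    return Path(shutil.copy2(source, dest_dir / GENERATED_NAME))


# Example call (the checkout path is a placeholder):
# copy_generated_file(r"C:\oss\runtime")
```

The helper fails loudly when the generated file or the destination directory is missing, so a skipped generation step does not silently leave a stale file in place.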