src/libraries/System.Private.CoreLib/Tools/GenUnicodeProp/Updating-Unicode-Versions.md
Several places in this repository need to be updated when we ingest a new version of Unicode, mainly because different libraries in the runtime depend on specific data that can change with each update (e.g., new characters being added, casing information changing, etc.). These are the steps to follow when ingesting a new version of Unicode in dotnet/runtime:
The first step is to add the Unicode data somewhere that the dotnet/runtime repo can later ingest it from. This "somewhere" is a package that we build in the runtime-assets repo. The Unicode data can be downloaded from the Unicode website, specifically from the files published at https://www.unicode.org/Public/14.0.0/ (replace 14.0.0 with the version that you want to ingest). Go into the ucd folder and download the following files:
Once you have downloaded all those files, create a fork of the repo https://github.com/dotnet/runtime-assets and send a PR which creates a folder at src/System.Private.Runtime.UnicodeData/<YourUnicodeVersion> and places all of the downloaded files from step 1 there. You can look at a sample PR that did this for Unicode 14.0.0 here: https://github.com/dotnet/runtime-assets/pull/179
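The download and copy described in the two steps above can be scripted. The following is a minimal sketch and is not part of either repository: the helper names `ucd_url` and `fetch_ucd_files` are hypothetical, the file names are passed as arguments rather than hard-coded, and it assumes that `curl` is available, that the script runs from the root of a runtime-assets clone, and that the files are stored flat in the version folder.

```shell
#!/usr/bin/env bash
# Sketch only: these helper names are hypothetical, not part of dotnet/runtime-assets.

# Build the download URL for one file in the ucd folder of a Unicode version.
ucd_url() {
  local version="$1" file="$2"
  printf 'https://www.unicode.org/Public/%s/ucd/%s\n' "$version" "$file"
}

# Download each named file into src/System.Private.Runtime.UnicodeData/<version>.
# Usage: fetch_ucd_files <version> <file>...
fetch_ucd_files() {
  local version="$1"
  shift
  local dest="src/System.Private.Runtime.UnicodeData/$version"
  mkdir -p "$dest" || return 1
  local file
  for file in "$@"; do
    # -f makes curl fail on HTTP errors instead of saving an error page.
    # Assumption: files are stored flat, so only the base name is kept.
    curl -fsSL -o "$dest/${file##*/}" "$(ucd_url "$version" "$file")" || return 1
  done
}
```

For example, `fetch_ucd_files 14.0.0 UnicodeData.txt` would place UnicodeData.txt under src/System.Private.Runtime.UnicodeData/14.0.0; pass the remaining file names from step 1 as additional arguments.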
Ingesting the new package into dotnet/runtime should be done automatically by dependency flow, so in theory no user action is needed. We still call it out in these instructions because a problem in the ingestion would disrupt the rest of the process. The process works as follows: after the PR in the runtime-assets repo is merged, a new build is triggered in the runtime-assets pipeline, which produces the new Unicode package. Once that build is done (and assuming it succeeds), it triggers the subscription that dotnet/runtime has against the runtime-assets repo, which generates a dependency PR (like this one) that ingests the new package version into dotnet/runtime.
CharUnicodeInfoData.cs file and will tell you where you need to copy the generated file. After compiling the GenUnicodeProp tool, inspect the contents of the produced assembly and make sure it contains all of the updated resources embedded in it, since those embedded resources are what is used to produce CharUnicodeInfoData.cs. You can inspect the embedded resources in the assembly using a tool like ILSpy.
UnicodeHelpers.generated.cs and UnicodeRangesTests.generated.cs, which are consumed by both the test and the implementation projects for System.Text.Encodings.Web.
<UnicodeUcdVersion> and update it to use the new version. If a project defines this property, then it is very likely consuming the runtime-assets package in some form, so it needs to be updated to consume the new version. At the time of writing, the project files which need to be updated are:
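Because the set of projects can change over time, searching the source tree for the UnicodeUcdVersion property is one way to double-check it. This is a hedged sketch: the helper name `find_ucd_version_users` is hypothetical, and it assumes the property only appears in `.csproj`, `.props`, or `.targets` files.

```shell
# Sketch only: find_ucd_version_users is a hypothetical helper.
# Lists MSBuild files under a directory (default: src) that mention
# the UnicodeUcdVersion property.
find_ucd_version_users() {
  local root="${1:-src}"
  grep -rl \
    --include='*.csproj' --include='*.props' --include='*.targets' \
    'UnicodeUcdVersion' "$root" | sort
}
```

Running `find_ucd_version_users src/libraries` from the repository root prints one matching file per line; each hit still needs to be reviewed by hand before changing the version.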
src/native/minipal/unicodedata.c file. This file is used by most of the reflection stack whenever you specify BindingFlags.IgnoreCase. To regenerate the contents of the unicodedata.c file, run the program located at src/native/minipal/UnicodeDataGenerator/unicodedata.cs and pass the full path to the new UnicodeData.txt as a parameter. For example, in a Unix shell:
# download UnicodeData.txt
$ curl -sSLo /tmp/UnicodeData.txt https://www.unicode.org/Public/14.0.0/ucd/UnicodeData.txt
# update unicodedata.c
$ cd runtime
$ ./dotnet.sh run --project src/native/minipal/UnicodeDataGenerator /tmp/UnicodeData.txt > src/native/minipal/unicodedata.c
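Note that `curl` without `-f` saves whatever the server returns, including an error page if the version in the URL is wrong. As a hedged sketch (the helper name `check_unicode_data` is hypothetical and not part of the repository), the downloaded file can be sanity-checked before regenerating unicodedata.c, relying on the fact that every line of UnicodeData.txt has 15 semicolon-separated fields:

```shell
# Sketch only: check_unicode_data is a hypothetical helper.
# Succeeds if the file is non-empty and every line has exactly
# 15 semicolon-separated fields, as UnicodeData.txt does.
check_unicode_data() {
  local file="$1"
  if [ ! -s "$file" ]; then
    echo "missing or empty: $file" >&2
    return 1
  fi
  # Stop at the first malformed line and report failure via the exit status.
  awk -F';' 'NF != 15 { bad = 1; exit } END { exit bad + 0 }' "$file"
}
```

For example, `check_unicode_data /tmp/UnicodeData.txt && echo ok` prints `ok` only when the file looks like a real UnicodeData.txt.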
src/libraries/System.Text.RegularExpressions/tools/GenRegexNamedBlocks following the instructions in its README.md.
https://www.unicode.org/license.html to the section that has the Unicode license in our notices.