Updating Encodings

src/libraries/System.Text.Encodings.Web/tools/updating-encodings.md

Introduction

Note: For full instructions on how to update the Unicode version consumed by the whole repo (as opposed to just System.Text.Encodings.Web), please follow the steps in this guide instead.

This folder contains tools which allow updating the Unicode data within the System.Text.Encodings.Web package. These data files come from the Unicode Consortium's web site (see https://www.unicode.org/Public/UCD/latest/) and are used to generate the UnicodeRanges class and the internal "defined characters" bitmap against which characters to be escaped are checked.
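For orientation, Blocks.txt lists one block per line in the form `0000..007F; Basic Latin`, with `#` comment lines in between. The sketch below is not the GenUnicodeRanges tool; `list_blocks` is a hypothetical helper that only shows the shape of the input the tools consume, which can be handy when reviewing what changed between two Unicode versions.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repo's tools).
# Reads Blocks.txt-formatted text on stdin and prints one line per block:
#   <first code point> TAB <last code point> TAB <block name>
list_blocks() {
    awk -F'; *' '
        # Skip comment lines and blank lines.
        /^[ \t]*(#|$)/ { next }
        {
            # $1 is the range "XXXX..YYYY"; $2 is the block name.
            split($1, r, "[.][.]")
            printf "%s\t%s\t%s\n", r[1], r[2], $2
        }'
}

# Example (assumes Blocks.txt has been downloaded to the current directory):
#   list_blocks < Blocks.txt
```

Comparing the output for the old and new Blocks.txt (for example with `diff`) gives a quick list of blocks that were added.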

Current implementation

The current version of the Unicode data checked in is 16.0.0. The archived files can be found at https://unicode.org/Public/16.0.0/.

Updating the implementation

Updating the implementation consists of three steps: checking in a new version of the Unicode data files (into the runtime-assets repo), generating the shared files used by the runtime and the unit tests, and pointing the unit test files to the correct version of the data files.

As a prerequisite for running the tools, you will need the dotnet tool (version 3.1 or above) available from your local command prompt.
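One way to check the prerequisite before starting is sketched below. `version_at_least` is a hypothetical helper, not something the repo provides, and it only compares the major and minor components of the version string.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if version string $1 (e.g. "8.0.204")
# is at least major $2, minor $3.
version_at_least() {
    major="${1%%.*}"      # text before the first dot
    rest="${1#*.}"        # text after the first dot
    minor="${rest%%.*}"   # text before the next dot
    [ "$major" -gt "$2" ] || { [ "$major" -eq "$2" ] && [ "$minor" -ge "$3" ]; }
}

# Example: `dotnet --version` prints the SDK version in use.
#   version_at_least "$(dotnet --version)" 3 1 || echo "dotnet 3.1 or above is required" >&2
```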

  1. Update the runtime-assets repo with the new Unicode data files. Instructions for generating new packages are listed at the repo root. Preserve the directory structure already present at https://github.com/dotnet/runtime-assets/tree/master/src/System.Private.Runtime.UnicodeData when making the change.

  2. Get the latest UnicodeData.txt and Blocks.txt files from the Unicode Consortium web site. Drop them into a temporary location; they're not going to be committed to the main runtime repo.

  3. Open a command prompt and navigate to the src/libraries/System.Text.Encodings.Web/tools/GenDefinedCharList directory, then run the following command, replacing the first parameter with the path to the UnicodeData.txt file you downloaded in the previous step. This command will update the "defined characters" bitmap within the runtime folder. The test project also consumes the file from the src folder, so running this command will update both the runtime and the test project.

     dotnet run -- "path_to_UnicodeData.txt" ../../src/System/Text/Unicode/UnicodeHelpers.generated.cs

  4. Open a command prompt and navigate to the src/libraries/System.Text.Encodings.Web/tools/GenUnicodeRanges directory, then run the following command, replacing the first parameter with the path to the Blocks.txt file you downloaded earlier. This command will update the UnicodeRanges type in the runtime folder and update the unit tests to exercise the new APIs.

     dotnet run -- "path_to_Blocks.txt" ../../src/System/Text/Unicode/UnicodeRanges.generated.cs ../../tests/UnicodeRangesTests.generated.cs

  5. Update the ref APIs to reflect any new UnicodeRanges static properties which were added in the previous step; otherwise the unit test project will not be able to reference them. See https://github.com/dotnet/runtime/blob/main/docs/coding-guidelines/updating-ref-source.md for instructions on how to update the reference assemblies.

  6. Update the src/libraries/System.Text.Encodings.Web/tests/System.Text.Encodings.Web.Tests.csproj file to reference the new UnicodeData.txt file that was added to the runtime-assets repo in step (1). Open the .csproj file in a text editor and change the <UnicodeUcdVersion> property value near the top of the file so that it references the new UCD version being consumed.

  7. Finally, update the Current implementation section at the beginning of this markdown file to reflect the version of the Unicode data files that was given to the tools. Remember also to update the URL within that section so that these data files can be easily accessed in the future.

  8. Commit to Git the *.cs, *.csproj, and *.md files that were modified as part of the above process.
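The download, generation, and .csproj steps above can be sketched as a small script. This is a minimal sketch, not an official tool: the function names are hypothetical, the download URL layout `https://www.unicode.org/Public/<version>/ucd/<file>` is an assumption about the Unicode Consortium site, and the value written to `<UnicodeUcdVersion>` should follow whatever format the .csproj file already uses.

```shell
#!/bin/sh
# Sketch only: all helper names below are hypothetical.

# Build the download URL for one UCD file of a given Unicode version.
# Assumed layout: https://www.unicode.org/Public/<version>/ucd/<file>
ucd_url() {
    printf 'https://www.unicode.org/Public/%s/ucd/%s\n' "$1" "$2"
}

# Rewrite the <UnicodeUcdVersion> property value in .csproj text read from
# stdin and write the result to stdout. $1 is the new value.
set_ucd_version() {
    sed "s|<UnicodeUcdVersion>[^<]*</UnicodeUcdVersion>|<UnicodeUcdVersion>$1</UnicodeUcdVersion>|"
}

# Download both data files to a temporary directory and run the two
# generators. Run from src/libraries/System.Text.Encodings.Web/tools.
# $1 is the Unicode version, e.g. 16.0.0.
regenerate() {
    version="$1"
    tmp="$(mktemp -d)" || return 1
    curl -fsSL -o "$tmp/UnicodeData.txt" "$(ucd_url "$version" UnicodeData.txt)" || return 1
    curl -fsSL -o "$tmp/Blocks.txt" "$(ucd_url "$version" Blocks.txt)" || return 1
    (cd GenDefinedCharList &&
        dotnet run -- "$tmp/UnicodeData.txt" \
            ../../src/System/Text/Unicode/UnicodeHelpers.generated.cs) || return 1
    (cd GenUnicodeRanges &&
        dotnet run -- "$tmp/Blocks.txt" \
            ../../src/System/Text/Unicode/UnicodeRanges.generated.cs \
            ../../tests/UnicodeRangesTests.generated.cs) || return 1
}
```

The script deliberately leaves out the runtime-assets update, the ref API update, and the edit to this markdown file, since those steps need manual review. Check the result with `git status` and `git diff` before committing.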