docs/design/features/localization-options.md
The .NET Core host and runtime contain messages that can be displayed to both end-users and developers. Currently, all such messages are displayed in English.
Other managed components built on top of .NET Runtime (e.g. SDK, WinForms, WPF) have already been localized, so a process already exists for handling translation and localizing assets, while the runtime handles satellite assembly loading. The host and runtime are different in that they have messages that originate from native components and must continue to do so. While the runtime does contain some managed resources, this document focuses on localization of native resources.
The goal is to support:
cs, de, en, es, fr, it, ja, ko, pl, pt-BR, ru, tr, zh-Hans, zh-Hant
On Windows, resource script (.rc) files are used to create resources that will be embedded into a binary. These files define STRINGTABLE resources containing the resource strings. Each string has a resource identifier - a symbol name mapped to an integer value - which can be used to look up the string value.
The LoadString and FormatMessage APIs retrieve a string resource based on a specified identifier (the integer value of the resource identifier) from a specified module. These APIs leave it to their consumer to find and load the appropriate module containing the desired resources. While resources for all languages can be included in the main binary itself, it is common to separate language-specific resources into resource-only libraries.
The GNU gettext APIs and tools are the standard for internationalization and localization on Linux. The tools provide a way to extract strings from C/C++ sources into separate source string (.po) files (which could then be translated) and produce binary (.mo) files from those source string files. The APIs allow retrieval of the translated strings through a msgid (string), where the convention is to use the untranslated string as the msgid.
The gettext API looks for the binary files in a folder of the format:
<directory_name>/<locale>/LC_MESSAGES/<domain_name>.mo
The <directory_name> and <domain_name> can be configured via the dgettext and bindtextdomain APIs. The <locale> is that of the current process. Users can configure the locale through environment variables.
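The lookup rule above can be tried out with Python's standard `gettext` module, which probes the same `<directory_name>/<locale>/LC_MESSAGES/<domain_name>.mo` layout. This is only an illustrative sketch: the `hostfxr` domain name and the temporary directory are assumptions, and the catalog is an empty placeholder because only the path probing is shown.

```python
import gettext
import os
import tempfile

# Hypothetical domain name, chosen only for illustration.
DOMAIN = "hostfxr"

def find_catalog(localedir, locale_name):
    """Return the path of the .mo file that would be picked, or None."""
    return gettext.find(DOMAIN, localedir=localedir, languages=[locale_name])

# Lay out <localedir>/fr/LC_MESSAGES/hostfxr.mo in a scratch directory.
localedir = tempfile.mkdtemp()
catalog_dir = os.path.join(localedir, "fr", "LC_MESSAGES")
os.makedirs(catalog_dir)
# Empty placeholder file: find() only checks that the file exists.
open(os.path.join(catalog_dir, DOMAIN + ".mo"), "wb").close()

found_fr = find_catalog(localedir, "fr_FR")  # fr_FR is reduced to fr
found_de = find_catalog(localedir, "de_DE")  # no German catalog present
```

Here `found_fr` is the path under the `fr` folder, while `found_de` is `None`, which is the case where a caller would keep the untranslated string.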
For OSX bundles, separate strings resource (.strings) files are used for string localization. The platform provides a tool to extract strings from sources into .strings files and APIs for retrieving the strings from the .strings files. The .strings files are a mapping of key strings to corresponding value strings, where it is common to use the untranslated string as the key.
The Core Foundation framework provides CFCopyLocalizedString* macros for loading string resources. They will look for the strings files in a folder of the format:
<bundle_folder>/<locale>.lproj/<table_name>.strings
The <bundle_folder> and <table_name> are based on the bundle specified in the API call. The <locale> is that of the system.
Bundles are a concept applied to directories laid out in a known structure. Without an actual bundle, CFBundleCreate can still be used to create a bundle from any specified directory and CFCopyLocalizedStringFromTableInBundle can be used to retrieve the localized strings.
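To illustrate the key/value nature of .strings files and the folder convention, here is a minimal sketch of a parser for the common `"key" = "value";` form. It is a simplification and not the Core Foundation implementation: it strips `/* */` comments naively, handles only a few escapes, and the sample messages and the `hostfxr` table name are hypothetical.

```python
import re

# One `"key" = "value";` entry; backslash-escaped characters are allowed.
ENTRY = re.compile(r'"((?:[^"\\]|\\.)*)"\s*=\s*"((?:[^"\\]|\\.)*)"\s*;')
COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)
ESCAPES = {"n": "\n", "t": "\t"}

def unescape(text):
    # \" and \\ map to the escaped character itself; \n and \t are expanded.
    return re.sub(r"\\(.)", lambda m: ESCAPES.get(m.group(1), m.group(1)), text)

def strings_path(bundle_folder, locale_name, table_name):
    """Build <bundle_folder>/<locale>.lproj/<table_name>.strings."""
    return f"{bundle_folder}/{locale_name}.lproj/{table_name}.strings"

def parse_strings(text):
    """Parse simple .strings content into a dict of key -> value."""
    table = {}
    for match in ENTRY.finditer(COMMENT.sub("", text)):
        table[unescape(match.group(1))] = unescape(match.group(2))
    return table

SAMPLE = r'''
/* Hypothetical entries; keys are the untranslated English strings. */
"The framework could not be found." = "Le framework est introuvable.";
"Failed to load \"hostfxr\"." = "Impossible de charger \"hostfxr\".";
'''
table = parse_strings(SAMPLE)
path = strings_path("host/fxr/<version>", "fr", "hostfxr")
```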
All strings are currently hard-coded in the hosting components directly where they will be displayed. There are no utilities or infrastructure around resource strings. All strings that would require localization are from native components.
The host has multiple components that are deployed in different ways, live in separate places, and can be of different versions. This means that there will need to be separation between their resources as well. The approach to localization may also vary based on the different use cases for each component.
On Windows, the English resource strings are in separate mscorrc.debug and mscorrc libraries. On Linux and OSX, the English resource strings are compiled into coreclr itself (as string constants, not an embedded resource). String resources exist in both native (.rc) and managed (.resx).
Some infrastructure is in place for loading of resources on Windows, but is not fully tested. Infrastructure for resource loading on Linux existed (recently removed), but was also untested. There was never any attempt at support for resource loading on OSX.
Each host component will include English resource strings by default. If the resource for the appropriate locale could not be found, the host components can always fall back to English.
dotnet, hostfxr, and hostpolicy will each have separate resources that will be installed by the .NET runtime. These components can be different versions, so their resources must be separate.
apphost will have a single localized message. All languages will be included in the executable itself. The message will direct the user to a URL that will contain localized content.
ijwhost and nethost intentionally do not show messages by default. They will not be localized.
comhost also intentionally does not show messages, but it does populate an IErrorInfo with an error string, allowing consumers to access any error messages. This can take the same approach as apphost, but would be a lower priority for localization.
dotnet, hostfxr, and hostpolicy are all included as part of a .NET Core install. They can each carry their own separate resources in a known path relative to their current install locations.
The other entry-point hosts (apphost, comhost, ijwhost, nethost) add some complication as they represent and ship as part of the developer's application or component. They are also the most impactful in terms of file size, as they are not shared across applications the way that other components can be.
The messaging coming from the hosts themselves is a small portion of the host messaging. They are mostly around:
hostfxr (all hosts)
hostfxr (all hosts)
(apphost only)
Possible options for hosts:
Deploy resources with each host
Every host comes with its own resources, so compatibility will not be a problem
Hosts will still be localized when there is no runtime
Issues:
Options:
Embedded resources for hosts
Separate resource for each host
(nethost adds more complication as it is up to the users to acquire and deploy it)
Install resources with .NET Core
If the runtime is not installed/found, hosts will not have localized resources
If the hosts are newer than installed runtime, new messages would not be localized
Issues:
*host components are not normally part of the .NET runtime install
Options:
Separate resource for each host
Option: Shared resource for all hosts (except dotnet)
comhost, ijwhost, and nethost are designed to be consumed by a component that .NET Core does not own and intentionally do not show messages to the user. As such, they sit at a low priority for localization support.
apphost is the end-user-facing host. The amount of logic and messaging in apphost is intentionally limited. The most important message it contains for an end-user is for the missing .NET runtime scenario, so it should not rely on resources installed via the .NET runtime.
Embedding resources in the apphost would make for the most streamlined user experience (particularly around deployment). Since the apphost is sensitive to size, the number of messages will be pared down to one generic localized message which directs the user to a URL.
Options:
Both (2) and (3) represent similar amounts of work. (2) would ensure that the single message would not need to change and no other messages would need to be added in the future. (3) provides a slightly nicer user experience. In all cases, the user would be shown a URL that would direct to localized content.
The mscorrc.debug and mscorrc resource libraries will be combined into one. All native components use resources from mscorrc.
System.Private.CoreLib will have System.Private.CoreLib.resources satellite assemblies and rely on the satellite assembly loading infrastructure in .NET Core to work.
Localization for native components will be based on the user's locale.
Standard methods for native localization use the user's locale. However, managed satellite assemblies also respect the thread's current culture. If the managed thread's current culture is not the same as the user's locale, this could result in mixed languages. Attempting to have the native components also follow the managed thread's culture would introduce issues and add significant complexity:
AssemblyDependencyResolver) and host components do not have a simple way to access the managed thread's culture
gettext on Linux always uses the locale based on environment variables)
All localizable resources need to be in the XLIFF file format (.xlf). New tooling will be required to convert from an untranslated base format to language-specific .xlf files and from the language-specific .xlf files to a format (.rc/.po/.strings, UTF-8/UTF-16) that will be compiled into resource libraries (or deployed directly) for each platform.
The existing xliff-tasks tooling supports conversion between managed resource files (.resx) and .xlf files and building satellite resource libraries. It is all MSBuild-based and has no concept of native resources or build processes. Extending it in a way that works naturally with CMake builds across platforms would be non-trivial.
It is also an option to create tooling directly integrated in the dotnet/runtime repo itself. This would not be a generic and reusable component outside of the dotnet/runtime repo and its build system.
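As a rough idea of what the .xlf to .po step of such tooling might look like, the sketch below reads translation units from an XLIFF document and emits gettext entries. It assumes XLIFF 1.2 with plain `<source>`/`<target>` text; the unit id and messages are made up, and real tooling would also need plural forms, translation state handling, and a .po header.

```python
import xml.etree.ElementTree as ET

# Assumption: the .xlf files use the XLIFF 1.2 namespace.
XLIFF_NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

def po_quote(text):
    """Quote a string for a .po file (backslash, quote, and newline only)."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'

def xlf_to_po(xlf_text):
    """Convert XLIFF trans-units into msgid/msgstr entries."""
    root = ET.fromstring(xlf_text)
    lines = []
    for unit in root.iterfind(".//x:trans-unit", XLIFF_NS):
        source = unit.findtext("x:source", default="", namespaces=XLIFF_NS)
        target = unit.findtext("x:target", default="", namespaces=XLIFF_NS)
        lines.append(f"#. {unit.get('id')}")         # keep the id as a comment
        lines.append(f"msgid {po_quote(source)}")    # English text is the msgid
        lines.append(f"msgstr {po_quote(target)}")
        lines.append("")
    return "\n".join(lines)

# Hypothetical input; the id and messages are illustrative only.
SAMPLE_XLF = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file source-language="en" target-language="fr" datatype="plaintext" original="hostfxr">
    <body>
      <trans-unit id="IDS_RUNTIME_NOT_FOUND">
        <source>The "runtime" was not found.</source>
        <target>Le "runtime" est introuvable.</target>
      </trans-unit>
    </body>
  </file>
</xliff>"""

po_text = xlf_to_po(SAMPLE_XLF)
```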
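Each platform has its own standard mechanism and file formats for handling localization. There are two main approaches that can be taken here: (1) use the platform-specific localization support on each platform, or (2) create a custom solution shared across platforms.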
For single-file support without any separate files, (1) would not be sufficient and a custom solution (2) would be required. For single-file support where localized resources can be included as separate files, the platform-specific solutions could be used.
Resource script (.rc) files will be used as the main source of string values written in the development language (English). These base .rc files will be used to update the .xlf files, which will then be converted to language-specific .rc files and compiled into language-specific resource libraries.
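The step that turns translated strings back into a language-specific resource script could look roughly like the following. It is a sketch under assumptions: the numeric identifier and message are invented, quotes are doubled as the resource compiler expects inside string literals, and a real generator would also emit a `LANGUAGE` statement and the shared header of resource identifiers.

```python
def rc_quote(text):
    """Quote a string for an .rc file: a double quote is written as two."""
    escaped = text.replace("\\", "\\\\").replace('"', '""').replace("\n", "\\n")
    return f'"{escaped}"'

def make_stringtable(strings):
    """Render {resource id: text} as a STRINGTABLE block."""
    lines = ["STRINGTABLE", "BEGIN"]
    for res_id, text in sorted(strings.items()):
        lines.append(f"    {res_id} {rc_quote(text)}")
    lines.append("END")
    return "\n".join(lines) + "\n"

# Hypothetical identifier and translated message.
rc_text = make_stringtable({1001: 'Impossible de charger "hostfxr".'})
```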
The host and runtime will follow the typical Windows method for localization of native string resources. Resources for each language will be compiled into a resource-only library, laid out in a language-specific subfolder. For example:
host/fxr/<version>
    hostfxr.dll
    fr
        hostfxr.resources.dll
shared/Microsoft.NETCore.App/<version>
    coreclr.dll
    hostpolicy.dll
    fr
        hostpolicy.resources.dll
        mscorrc.dll
At run time, the .NET Core host/runtime will find and load the resource DLL from the subfolder corresponding to the user's current locale. If the resource cannot be found, English will be the fallback.
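The probing described above can be modelled as a small path search. This is a hedged sketch and not the host's actual logic: trying the parent of a locale name (for example `fr` for `fr-CA`) is an assumption of the sketch, and returning `None` stands for "use the built-in English strings".

```python
import os
import tempfile

def candidate_locales(locale_name):
    """fr-CA -> ["fr-CA", "fr"]; parent fallback is an assumption here."""
    parts = locale_name.split("-")
    return ["-".join(parts[:i]) for i in range(len(parts), 0, -1)]

def find_resource_library(component_dir, component, locale_name):
    """Return the resource DLL path for the locale, or None for English."""
    for name in candidate_locales(locale_name):
        path = os.path.join(component_dir, name, f"{component}.resources.dll")
        if os.path.isfile(path):
            return path
    return None

# Scratch layout mirroring the example: <dir>/fr/hostfxr.resources.dll
component_dir = tempfile.mkdtemp()
os.makedirs(os.path.join(component_dir, "fr"))
open(os.path.join(component_dir, "fr", "hostfxr.resources.dll"), "wb").close()

french = find_resource_library(component_dir, "hostfxr", "fr-CA")
japanese = find_resource_library(component_dir, "hostfxr", "ja-JP")
```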
The development language (English) strings will be compiled into the host and runtime directly. The language-specific .xlf files will be converted into .po files and then .mo files. Those binary .mo files will be laid out in a language-specific subfolder. For example:
host/fxr/<version>
    libhostfxr.so
    fr
        LC_MESSAGES
            hostfxr.mo
shared/Microsoft.NETCore.App/<version>
    libcoreclr.so
    libhostpolicy.so
    fr
        LC_MESSAGES
            hostpolicy.mo
            mscorrc.mo
The gettext APIs will be used to retrieve the appropriate message using the development language strings as the msgid.
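To show the whole flow end to end, the sketch below writes a tiny binary catalog into the layout above and looks a message up by its English msgid. It uses Python's `gettext` module as a stand-in for the C API; the hand-rolled `.mo` writer, the `hostfxr` domain, and the messages are assumptions for illustration (a real build would typically use `msgfmt`).

```python
import gettext
import os
import struct
import tempfile

def build_mo(messages):
    """Serialize {msgid: msgstr} as a little-endian .mo file without a hash table."""
    # The empty msgid carries catalog metadata, including the charset.
    entries = {"": "Content-Type: text/plain; charset=UTF-8\n", **messages}
    pairs = sorted((k.encode("utf-8"), v.encode("utf-8")) for k, v in entries.items())
    count = len(pairs)
    ids_blob = b"\0".join(k for k, _ in pairs) + b"\0"
    strs_blob = b"\0".join(v for _, v in pairs) + b"\0"
    id_pos = 28 + 16 * count            # 7-word header plus two index tables
    str_pos = id_pos + len(ids_blob)
    id_table, str_table = [], []
    for key, value in pairs:
        id_table += [len(key), id_pos]
        str_table += [len(value), str_pos]
        id_pos += len(key) + 1          # +1 for the NUL terminator
        str_pos += len(value) + 1
    header = struct.pack("<7I", 0x950412DE, 0, count, 28, 28 + 8 * count, 0, 0)
    tables = struct.pack(f"<{4 * count}I", *(id_table + str_table))
    return header + tables + ids_blob + strs_blob

MSGID = "The framework could not be found."  # hypothetical message
localedir = tempfile.mkdtemp()
target = os.path.join(localedir, "fr", "LC_MESSAGES")
os.makedirs(target)
with open(os.path.join(target, "hostfxr.mo"), "wb") as f:
    f.write(build_mo({MSGID: "Le framework est introuvable."}))

def localized(msgid, language):
    # fallback=True yields a catalog that returns the msgid unchanged.
    catalog = gettext.translation("hostfxr", localedir, languages=[language], fallback=True)
    return catalog.gettext(msgid)

french = localized(MSGID, "fr")
german = localized(MSGID, "de")  # no catalog: the English msgid comes back
```

Because the msgid is the English text itself, a missing catalog or a missing entry degrades to English without any extra fallback code.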
The development language (English) strings will be compiled into the host and runtime directly. The language-specific .xlf files will be converted into .strings files. Those .strings files will be laid out in a language-specific subfolder. For example:
host/fxr/<version>
    libhostfxr.dylib
    fr.lproj
        hostfxr.strings
shared/Microsoft.NETCore.App/<version>
    libcoreclr.dylib
    libhostpolicy.dylib
    fr.lproj
        hostpolicy.strings
        mscorrc.strings
The CFCopyLocalizedStringFromTableInBundle API will be used to retrieve the appropriate message using the development language strings as the key.
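The conversion into .strings files could be as simple as the following sketch, which renders translated pairs keyed by the English string. The messages are hypothetical, only a few escapes are handled, and encoding the result as UTF-16 is an assumption (the document leaves the UTF-8/UTF-16 choice open).

```python
def strings_quote(text):
    """Quote a string for a .strings file (backslash, quote, newline only)."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'

def make_strings_file(translations):
    """Render {English key: translated value} as .strings text."""
    lines = [
        f"{strings_quote(key)} = {strings_quote(value)};"
        for key, value in sorted(translations.items())
    ]
    return "\n".join(lines) + "\n"

# Hypothetical message pair.
content = make_strings_file({'Failed to load "hostfxr".': 'Impossible de charger "hostfxr".'})
encoded = content.encode("utf-16")  # assumed on-disk encoding
```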
The development language (English) strings will be compiled into the host and runtime directly. The language-specific .xlf files will be converted into a chosen storage format. Those files will be laid out in a language-specific subfolder. For example:
host/fxr/<version>
    hostfxr.dll (.so/.dylib)
    fr
        hostfxr.resources
shared/Microsoft.NETCore.App/<version>
    coreclr.dll (.so/.dylib)
    hostpolicy.dll (.so/.dylib)
    fr
        hostpolicy.resources
        mscorrc.resources
For single file, the language-specific resources will be bundled into the application's executable.
Cross-platform utilities will be created for resource loading. The reader/parser will support both reading from a file and memory.
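One way to get both entry points is to write the parser against a byte stream and wrap files and memory buffers around it. The sketch below assumes a placeholder storage format (UTF-8 lines of `key<TAB>value`), since the actual format has not been chosen; all names are hypothetical.

```python
import io
import os
import tempfile

def parse_resources(stream):
    """Parse a binary stream of UTF-8 `key<TAB>value` lines into a dict."""
    table = {}
    for raw in stream:
        line = raw.decode("utf-8").rstrip("\r\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("\t")
        table[key] = value
    return table

def load_from_file(path):
    with open(path, "rb") as f:
        return parse_resources(f)

def load_from_memory(data):
    # Same parser, fed from a buffer (for example, data bundled in a single file).
    return parse_resources(io.BytesIO(data))

# Hypothetical resource content.
DATA = "# fr\nThe framework could not be found.\tLe framework est introuvable.\n".encode("utf-8")

path = os.path.join(tempfile.mkdtemp(), "hostfxr.resources")
with open(path, "wb") as f:
    f.write(DATA)

from_file = load_from_file(path)
from_memory = load_from_memory(DATA)
```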
Both the host and runtime require support for native localization. Since (with the exception of apphost) they would use the same approach, rather than each having their own copy, it would make sense for them to share utilities around finding and loading resources.
Ideally, the host and runtime could use the same static lib. However, even though they are now in the same repo, their builds are still fairly partitioned. A reasonable middle ground could be to have source files that are compiled into both the host and runtime components. This does have the complication that the host and runtime have separate PALs, so any shared code would need to work properly with both sides.
Installers and packages will need to include the language-specific resource files. This could involve updating all existing installers or could mean the creation of multiple new installers.
Exactly how resources should be delivered is an open question.
Messages from the host and runtime can be user-facing or developer-facing. Some developers do not want to have localized messages, so there should be some way to override localizing to the user's locale. This would need to be a setting that both the host and runtime could easily check.
On Windows, the native component fully controls which resource library to load, so it would be able to check for an override (like an environment variable). The gettext APIs on Linux essentially allow this kind of override through environment variables. There is no clear way to configure the APIs on OSX to override the locale, but the .strings files can be loaded and read as a dictionary directly through the CFPropertyListCreateWithStream API.
The SDK already allows overriding of the locale through the DOTNET_CLI_UI_LANGUAGE environment variable.
Any automated testing would likely also require some form of language override.
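A possible shape for such an override check is sketched below. It assumes the host and runtime would honor the same `DOTNET_CLI_UI_LANGUAGE` variable as the SDK, which this document does not decide, and it maps the requested locale onto the supported language list with English as the final fallback.

```python
# Assumption: reuse the SDK's variable; the host/runtime setting is undecided.
OVERRIDE_VARIABLE = "DOTNET_CLI_UI_LANGUAGE"

SUPPORTED = ["cs", "de", "en", "es", "fr", "it", "ja", "ko",
             "pl", "pt-BR", "ru", "tr", "zh-Hans", "zh-Hant"]

def match_supported(tag):
    """Map a locale tag to a supported language, trimming subtags as needed."""
    by_lower = {name.lower(): name for name in SUPPORTED}
    parts = tag.replace("_", "-").split("-")
    while parts:
        candidate = "-".join(parts).lower()
        if candidate in by_lower:
            return by_lower[candidate]
        parts.pop()  # drop the most specific subtag and retry
    return "en"

def effective_language(environ, user_locale):
    """The override wins when set; otherwise the user's locale is used."""
    override = environ.get(OVERRIDE_VARIABLE, "").strip()
    return match_supported(override or user_locale)

default_case = effective_language({}, "fr_FR.UTF-8")
overridden = effective_language({OVERRIDE_VARIABLE: "en"}, "ja-JP")
```

Passing the environment as a plain mapping keeps the function easy to drive from automated tests.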
The standard way of doing native localization is based on having separate resource files. On Windows, it is possible to embed resources for multiple languages into one library and use FormatMessage or a combination of FindResourceEx and LoadResource to load the resource for a specific language. On Linux and OSX, no such platform support exists.
Extracting files to disk has proven to be extremely problematic across all platforms (permissions, anti-virus, clean up). Adding native resources to that extraction would only exacerbate the existing issues. This means that localized resources would need to be read from memory. A custom solution would need to be created and maintained:
If support for localization of native components in single-file scenarios without separate resource files is a priority, it would make sense to just use the custom solution (that could handle both reading from files and memory) for non-single-file scenarios as well.
Since localization would not be needed by all applications, localized resources could also be considered an add-on to single-file and not part of the single-file itself. To support localization, the developer would need to include separate localized resources alongside the single-file executable. In this case, the platform-specific solution would be used. Since Windows does provide a supported way to handle multiple embedded localized resources, the experience for localization could also be improved on Windows such that it does embed all resources into one library.
WPF and WinForms always include all languages in an install. Does the host/runtime do the same or have separate language packs? How are they delivered (e.g. single installer with runtime and options for different languages, separate installer for languages with options for different languages, separate installer per language)?
Self-contained applications would need to include resources for all languages they support. Some developer input would be required to specify the desired language support and the SDK would need to be updated to handle the different options:
Note: Building WPF self-contained applications currently includes resources for all languages.