doc/xnu_build_consolidation.md
XNU is supported on approximately 20 different targets. Whilst in some cases the differences between two given targets are small (e.g. when they both support the same ISA), XNU has traditionally required to have separate builds in cases where the topology of the targets differ (for example, when they feature different core/cluster counts or cache sizes). Similarly, SoC-specific fix-ups are usually conditionally compiled based on the target.
Given the time it takes to compile all three different variants (release, debug and development) for each supported SoC, usually several times a day for various teams across Apple, the goal of this project was to reduce the number of existing builds, as well as to set up a simple framework that makes it easier to share builds across different SoCs moving forward.
Although this effort could be extended to KEXTs, and hence lead to shared KernelCaches across devices, the scope of this document only includes XNU. In cases where KEXTs also differ across targets, or perhaps the required KEXTs are completely different in the first place, the kernel still needs to be linked appropriately with different sets of KEXTs and hence KernelCaches cannot be shared.
The kernel itself is relatively SoC-agnostic, although strongly architecture-dependent; this is because most of the SoC-specific aspects of the KernelCache are abstracted by the KEXTs. Things that pertain to the kernel include:
All of these problems can be solved through a mix of relying more heavily on device tree information and performing runtime checks. The latter is possible because both the ARM architecture and the Apple's extensions provide r/o registers that can be checked at runtime to discover supported features as well as various CPU-specific parameters.
Although not often, the kernel may still require deriving, one way or another, parameters like cache sizes
and number of set/ways. XNU needs most of this information to perform set/way clean/invalidate operations.
Prior to this work, these values were hardcoded for each supported target in proc_reg.h, and used in
caches_asm.s. The ARM architecture provides the CCSIDR_EL1 register, which can be used in conjunction
with CSSELR_EL1 to select the target cache and obtain geometry information.
CPU and revision checks may be required at various places, although the focus here has been the application of tunables at boot time.
Tunables are often applied:
This has led in the past to a number of nested, conditionally-compiled blocks of code that are not easy to understand or manage as new tunables are added or SoCs/revisions are deprecated.
The changes applied as part of this work focus mainly on:
start.s.tunables_h7.h, or tunables_h11.h).All of the macros live in the 64-bit version of proc_reg.h, and are SoC-agnostic; they simply check the
MIDR_EL1 register against a CPU revision that is passed as a parameter to the macro, where applicable.
Similarly, where a block of code is to be executed on a core type, rather than a specific core ID, a couple
of the provided macros can check this against MPIDR_EL1.
Some architectural features are optional, which means that, when disabled at compile-time, this may cause two same-generation SoCs to diverge.
Rather than disabling features, and assuming this does not pose security risks or performance regressions,
the preferred approach is to compile them in, but perform runtime checks to enable/disable them, possibly in
early boot. The way these checks are performed varies from feature to feature (for example, VHE is an ARM
feature, and the ARM ARM specifies how it can be discovered). For Apple-specific features, these are all
advertised through the AIDR_EL1 register. One of the changes is the addition of a function,
ml_feature_supported(), that may be used to check for the presence of a feature at runtime.
One of the aspects that until now has been hardcoded in XNU is the system topology: number of cores/clusters and their physical IDs. This effort piggybacks on other recent XNU changes which aimed to consolidate topology-related information into XNU, by parsing it from the device tree and exporting it to KEXTs through well-defined APIs.
Changes applied as part of the XNU consolidation project include:
ml_* API to extract cluster information from the topology parser. New APIs include the following:
ml_get_max_cluster_number()ml_get_cluster_count()ml_get_first_cpu_id()CPU_COUNT) and cluster counts (ARM_CLUSTER_COUNT) from XNU, and
replacing them with ml_* calls.In some cases, certain statically-allocated arrays/structures need to be cache line-aligned, or have one element per core or cluster. Whilst this information is not known precisely at compile time anymore, the following macros have been added to provide a reasonably close upper bound:
MAX_CPUSMAX_CPU_CLUSTERSMAX_L2_CLINEThese macros are defined in board_config.h, and should be set to the same value for a group of targets
sharing a single build. Note that these no longer reflect actual counts and sizes, and the real values need
to be queried at runtime through the ml_ API.
The L1 cache line size is still hardcoded, and defined as MMU_CLINE. Since this value is always the same
and very often checked at various places across XNU and elsewhere, it made sense to keep it as a compile
time macro rather than relying on runtime checks.
Currently, a family of per-SoC macros are defined at build time to enable XNU to conditionally compile code
for different targets. These are named ARM[64]_BOARD_CONFIG_[TARGET_NAME], and have historically been used
in different places across the kernel; for example, when applying tunables, various fixes, or enabling
disabling features. In order not to create divergences in the future across same-generation SoCs, but also
to keep the codebase consistent, the recommendation is to avoid the use of these macros whenever possible.
Instead, XNU itself defines yet another family of macros that are defined for all targets of a particular
generation. These are named after the P-CORE introduced by each (for example, APPLEMONSOON, or
APPLEVORTEX), and are preferred over the SoC-specific ones. Where a generation macro is not enough to
provide correctness (which happens, for example, when the code block at hand should not be executed on a
given SoC of the same family), appropriate runtime checks can be performed inside the conditionally-compiled
code block. machine_read_midr() and get_arm_cpu_version() may be used for this purpose.