.. _architecture_porting_guide:

Architecture Porting Guide
##########################

An architecture port is needed to enable Zephyr to run on an :abbr:`ISA (instruction set architecture)` or an :abbr:`ABI (Application Binary Interface)` that is not currently supported.
The following are examples of ISAs and ABIs that Zephyr supports:
For information on Kconfig configuration, see
:ref:`setting_configuration_values`. Architectures use a Kconfig configuration
scheme similar to boards.
An architecture port can be divided into several parts; most are required and some are optional:

* The early boot sequence: each architecture has different steps it must take when the CPU comes out of reset (required).
* Interrupt and exception handling: each architecture handles asynchronous and unrequested events in a specific manner (required).
* Thread context switching: the Zephyr context switch is dependent on the ABI and each ISA has a different set of registers to save (required).
* Thread creation and termination: a thread's initial stack frame is ABI and architecture-dependent, and thread abortion possibly as well (required).
* Device drivers: most often, the system clock timer and the interrupt controller are tied to the architecture (some required, some optional).
* Utility libraries: some common kernel APIs rely on an architecture-specific implementation for performance reasons (required).
* CPU idling/power management: most architectures implement instructions for putting the CPU to sleep (partly optional, most likely very desired).
* Fault management: for implementing architecture-specific debug help and handling of fatal errors in threads (partly optional).
* Linker scripts and toolchains: architecture-specific details will most likely be needed in the build system and when linking the image (required).
* Memory Management and Memory Mapping: for architecture-specific details on supporting memory management and memory mapping.
* Stack Objects: for architecture-specific details on memory protection hardware regarding stack objects.
* User Mode Threads: for supporting threads in user mode.
* GDB Stub: for supporting GDB stub to enable remote debugging.


Early Boot Sequence
*******************

The goal of the early boot sequence is to take the system from the state it is in after reset to a state where it can run C code and thus the common kernel initialization sequence. Most of the time, very few steps are needed, while some architectures require a bit more work to be performed.
Common steps for all architectures:

* Set up an initial stack.
* If running an :abbr:`XIP (eXecute-In-Place)` kernel, copy initialized data
  from ROM to RAM.
* If not using an ELF loader, zero the BSS section.
* Jump to :code:`z_cstart()`, the early kernel initialization.

  * :code:`z_cstart()` is responsible for context switching out of the fake
    context running at startup into the main thread.

Some examples of architecture-specific steps that have to be taken:
Zephyr exposes several hooks (described in :zephyr_file:`include/zephyr/platform/hooks.h`)
that allow execution of SoC- or board-specific code at precise moments of the boot process.
The kernel takes care of calling most of the hooks from architecture-agnostic code. However, some hooks must be called during the early boot sequence; since the sequence is implemented in architecture-specific code, the call to the hooks must also be done there. The following gives a rough overview of the early boot sequence and when hooks should be called by architecture-specific code:

#. Execution begins in an architecture-specific entry point whose name
   matches :kconfig:option:`CONFIG_KERNEL_ENTRY`.

#. Architecture-specific state is re-initialized immediately
   (if :kconfig:option:`CONFIG_INIT_ARCH_HW_AT_BOOT` is enabled).

#. :c:func:`soc_early_reset_hook` is called.

   .. note::

      It is not necessary to set up a valid stack before calling this hook.
      However, the hook is allowed to overwrite the stack pointer before returning.
      The architecture-specific code must not expect the stack pointer register
      value to be preserved across the call to :c:func:`soc_early_reset_hook`.

      On architectures with multiple stack pointers, there is usually a *"primary"*
      stack pointer accessible directly and *"secondary"* stack pointer register(s).
      :c:func:`soc_early_reset_hook` implementations may overwrite the *"primary"*
      stack pointer but must **not** read or modify the value of any *"secondary"*
      stack pointer. (This allows the architecture-specific code to set up any
      *"secondary"* stack pointer it desires before calling
      :c:func:`soc_early_reset_hook`.)

      For example, the ARM Cortex-A architecture defines several execution modes,
      each of which has its own stack pointer register :samp:`sp_{mode}`. When
      the processor is executing in mode :samp:`{X}`, operations involving the
      ``sp`` general-purpose register operate on :samp:`sp_{X}`. On this
      architecture, assuming that the processor is executing in mode :samp:`{M}`
      when :c:func:`soc_early_reset_hook` is called, the hook is allowed to
      overwrite :samp:`sp_{M}` (accessible through ``sp``) but **must not** read
      or overwrite any other :samp:`sp_{mode}` (where :samp:`{mode} != {M}`).

      :c:func:`soc_early_reset_hook` implementations are allowed to not return
      execution to the architecture-specific code, in which case they "take over"
      the system. Such hooks are not subject to the aforementioned rules and may
      read or overwrite any stack pointer. However, when such an implementation
      is provided, the rest of the early boot sequence does not execute.

#. An initial stack is set up for the next steps of the early boot sequence.

#. Architecture-specific "resume from suspend-to-RAM" logic is executed.

   .. note::

      Refer to :kconfig:option:`CONFIG_PM_S2RAM` and the architecture-specific
      implementation for more details, but note that the rest of the early boot
      sequence is not executed if this logic determines that an exit from
      suspend-to-RAM is ongoing.

#. :c:func:`soc_reset_hook` is called.

#. Architecture-specific operations (in assembly) are performed here...

#. :c:func:`z_prep_c` is called. This architecture-specific function is implemented in C.

#. :c:func:`z_prep_c` immediately calls :c:func:`soc_prep_hook`.

#. Architecture-specific operations (in C) are performed here...

#. :c:func:`z_cstart` is called. Architecture-agnostic code begins executing.

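
The C portion of this sequence can be modeled on a host as follows. This is
only a sketch under stated assumptions: the ``sketch_`` functions, the
``boot_image`` structure, and the call log are hypothetical stand-ins rather
than Zephyr symbols, and the exact point at which a port zeroes BSS or copies
initialized data is architecture-specific.

.. code-block:: c

   #include <stddef.h>
   #include <string.h>

   /* Hypothetical description of the image regions that early C code touches. */
   struct boot_image {
       const unsigned char *data_rom; /* load address of initialized data (XIP) */
       unsigned char *data_ram;       /* run address of initialized data */
       size_t data_size;
       unsigned char *bss;            /* start of the BSS section */
       size_t bss_size;
   };

   /* Call log so that the ordering can be inspected in a host-side test. */
   static const char *boot_log[4];
   static size_t boot_log_len;

   static void log_step(const char *name)
   {
       if (boot_log_len < sizeof(boot_log) / sizeof(boot_log[0])) {
           boot_log[boot_log_len++] = name;
       }
   }

   /* Stand-ins for soc_prep_hook() and z_cstart(); assumptions for this sketch. */
   static void sketch_soc_prep_hook(void) { log_step("soc_prep_hook"); }
   static void sketch_cstart(void) { log_step("z_cstart"); }

   /* Models the C part: hook first, then BSS and data setup, then the kernel. */
   static void sketch_prep_c(const struct boot_image *img)
   {
       sketch_soc_prep_hook();
       memset(img->bss, 0, img->bss_size);
       log_step("bss_zero");
       memcpy(img->data_ram, img->data_rom, img->data_size);
       log_step("data_copy");
       sketch_cstart();
   }

Calling ``sketch_prep_c()`` with a populated ``boot_image`` leaves the steps in
``boot_log`` in the order in which they ran, which makes the ordering
constraint easy to check.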

Interrupt and Exception Handling
********************************

Each architecture defines interrupt and exception handling differently.
When a device wants to signal the processor that there is some work to be done
on its behalf, it raises an interrupt. When a thread does an operation that is
not handled by the serial flow of the software itself, it raises an exception.
Both interrupts and exceptions pass control to a handler. The handler is
known as an :abbr:`ISR (Interrupt Service Routine)` in the case of
interrupts. The handler performs the work required by the exception or the
interrupt. For interrupts, that work is device-specific. For exceptions, it
depends on the exception, but most often the core kernel itself is responsible
for providing the handler.
The kernel has to perform some work in addition to the work the handler itself performs. For example:
Prior to handing control to the handler:
After getting control back from the handler:
This work is conceptually the same across architectures, but the details are completely different:
It thus needs an architecture-specific implementation, called the interrupt/exception stub.
Another issue is that the kernel defines the signature of ISRs as:

.. code-block:: C

    void (*isr)(void *parameter)

Architectures do not have a consistent or native way of handling parameters to an ISR. As such there are two commonly used methods for handling the parameter.

* Using some architecture-defined mechanism, the parameter value is forced in
  the stub. This is commonly found in x86-based architectures.

* The parameters to the ISR are inserted and tracked via a separate table
  requiring the architecture to discover at runtime which interrupt is
  executing. A common interrupt handler demuxer is installed for all entries of
  the real interrupt vector table, which then fetches the device's ISR and
  parameter from the separate table. This approach is commonly used in the ARC
  and ARM architectures via the :kconfig:option:`CONFIG_GEN_ISR_TABLES` implementation.

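
The table-based method can be illustrated with a small host-side model. The
names below (``sketch_isr_entry``, ``sketch_isr_table``,
``sketch_irq_connect``, ``sketch_isr_demux``) are hypothetical and chosen for
this sketch only; they are not the symbols generated by
``CONFIG_GEN_ISR_TABLES``, and a real port obtains the active interrupt number
from the interrupt controller rather than from a function argument.

.. code-block:: c

   #include <stddef.h>

   typedef void (*isr_t)(void *parameter);

   /* Hypothetical table entry pairing an ISR with its parameter. */
   struct sketch_isr_entry {
       isr_t isr;
       void *arg;
   };

   #define SKETCH_NUM_IRQS 4

   static struct sketch_isr_entry sketch_isr_table[SKETCH_NUM_IRQS];

   /* Record an ISR and its parameter; returns 0 on success, -1 on a bad IRQ. */
   static int sketch_irq_connect(unsigned int irq, isr_t isr, void *arg)
   {
       if (irq >= SKETCH_NUM_IRQS) {
           return -1;
       }
       sketch_isr_table[irq].isr = isr;
       sketch_isr_table[irq].arg = arg;
       return 0;
   }

   /*
    * Common handler installed in every vector: look up the entry for the
    * active interrupt and call the ISR with its stored parameter.
    * Returns -1 for an interrupt with no registered handler (spurious).
    */
   static int sketch_isr_demux(unsigned int irq)
   {
       if (irq >= SKETCH_NUM_IRQS || sketch_isr_table[irq].isr == NULL) {
           return -1;
       }
       sketch_isr_table[irq].isr(sketch_isr_table[irq].arg);
       return 0;
   }

   /* Example device ISR: increments the counter passed as its parameter. */
   static void sketch_counter_isr(void *parameter)
   {
       int *count = parameter;

       (*count)++;
   }

The ISR itself never needs to know which vector it was attached to; the
demuxer supplies the parameter, which is what keeps the ISR signature uniform
across architectures.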
You can find examples of the stubs by looking at :code:`_interrupt_enter()` in
x86, :code:`_isr_wrapper()` in ARM, or the full implementation description for
ARC in :zephyr_file:`arch/arc/core/isr_wrapper.S`.
Each architecture also has to implement primitives for interrupt control:

* :c:macro:`irq_lock()`, :c:macro:`irq_unlock()`.
* :c:macro:`IRQ_CONNECT()`.
* :c:func:`irq_priority_set`.
* :c:macro:`irq_enable()`, :c:macro:`irq_disable()`.

.. note::

   :c:macro:`IRQ_CONNECT` is a macro that uses assembler and/or linker script
   tricks to connect interrupts at build time, saving boot time and text size.

The vector table should contain a handler for each interrupt and exception that
can possibly occur. The handler can be as simple as a spinning loop. However,
we strongly suggest that handlers at least print some debug information. The
information helps in figuring out what went wrong when hitting an exception that
is a fault, like divide-by-zero or invalid memory access, or an interrupt that
is not expected (:dfn:`spurious interrupt`). See the ARM implementation in
:zephyr_file:`arch/arm/core/cortex_m/fault.c` for an example.

Thread Context Switching
************************

Multi-threading is the fundamental reason for having a kernel at all. Zephyr supports two types of threads: preemptible and cooperative. The rules for determining the next thread to schedule are handled by the kernel. However, it is up to the architecture port to implement the method of the context switch itself.

Zephyr provides two mutually exclusive interfaces for context switching. The
preferred interface to use is :code:`arch_switch`, which is selected when
:kconfig:option:`CONFIG_USE_SWITCH` is enabled. The alternative interface is
:code:`arch_swap`, which is selected when :kconfig:option:`CONFIG_USE_SWITCH`
is disabled. When porting to a new architecture, only one of these needs to be
implemented; however, for SMP platforms it must be :code:`arch_switch`.
A context switch can happen in several circumstances:

* When a thread executes a blocking operation, such as taking a semaphore that is currently unavailable.
* When a preemptible thread unblocks a thread of higher priority by releasing the object on which it was blocked.
* When an interrupt unblocks a thread of higher priority than the one currently executing, if the currently executing thread is preemptible.
* When a thread runs to completion.
* When a thread causes a fatal exception and is removed from the running threads. For example, referencing invalid memory.

The context switch must therefore be able to handle all these cases.
There are two types of context switches: :dfn:`cooperative` and :dfn:`preemptive`.

A cooperative context switch happens when a thread willfully gives control to another thread. There are two cases where this happens

A preemptive context switch happens when an ISR or a thread performs an operation that schedules a thread of higher priority than the one currently running, provided the currently running thread is preemptible. An example of such an operation is releasing an object on which the thread of higher priority was waiting.

.. note::

   Control is never taken from a cooperative thread while it is the running thread.

A cooperative context switch is always done by having a thread call the
internal kernel routine :code:`z_swap` (or one of its variants). This in turn
will call either :code:`arch_switch` or :code:`arch_swap` as appropriate.
When these are called, no checks are done to determine if the context switch is
to happen--the context switch must happen.

.. note::

   On x86 and Nios2, :code:`arch_swap` is generic enough and the architecture
   flexible enough that it can be called when exiting an interrupt to provoke
   the context switch. This should not be taken as a rule, since
   neither the ARM Cortex-M nor the ARCv2 port does this.

Since :code:`z_swap` is cooperative, the caller-saved registers from the ABI are
already on the stack. There is no need to save them in the ``k_thread`` structure.
A context switch can also be performed preemptively. This happens upon exiting an ISR, in the kernel interrupt exit stub:

* :code:`_interrupt_enter` on x86 after the handler is called.
* :code:`z_arm_exc_exit` and :code:`z_arm_int_exit` on ARM.
* :code:`_firq_exit` and :code:`_rirq_exit` on ARCv2.

The decision logic to invoke the context switch is simple and is only performed when exiting a non-nested interrupt:

* When :kconfig:option:`CONFIG_USE_SWITCH` is enabled ...

  * call :c:func:`z_get_next_switch_handle`, and
  * return to the thread context identified by the returned switch handle

* When :kconfig:option:`CONFIG_USE_SWITCH` is not enabled ...

  The interrupt exit code shall fetch the cached thread from the ready queue, and:

This is simple, but crucial: if this is not implemented correctly, the kernel will not function as intended and will experience bizarre crashes, mostly due to stack corruption.
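
The shape of the decision for the :kconfig:option:`CONFIG_USE_SWITCH` case can
be modeled on a host as shown below. Everything here is an assumption made for
illustration: ``sketch_cpu``, ``sketch_next_switch_handle()`` (a stand-in for
the kernel's scheduler query), and ``sketch_irq_exit()`` are hypothetical, and
a real exit stub is typically written in assembly and restores registers
instead of returning a pointer.

.. code-block:: c

   #include <stddef.h>

   /* Hypothetical per-CPU state used only by this sketch. */
   struct sketch_cpu {
       unsigned int nested; /* interrupt nesting depth; 1 means outermost */
       void *current;       /* switch handle of the interrupted thread */
       void *next;          /* handle the scheduler would select, or NULL */
   };

   /* Stand-in for the scheduler query made on interrupt exit. */
   static void *sketch_next_switch_handle(struct sketch_cpu *cpu,
                                          void *interrupted)
   {
       return cpu->next != NULL ? cpu->next : interrupted;
   }

   /*
    * Returns the handle to resume when leaving an interrupt. The scheduler
    * is consulted only when the outermost (non-nested) interrupt exits;
    * a nested exit always resumes the interrupted context.
    */
   static void *sketch_irq_exit(struct sketch_cpu *cpu)
   {
       void *resume = cpu->current;

       if (cpu->nested == 1U) {
           resume = sketch_next_switch_handle(cpu, cpu->current);
           cpu->current = resume;
       }
       cpu->nested--;
       return resume;
   }

Skipping the nesting check is one of the mistakes that tends to corrupt
stacks, because the outer interrupt's frame would still be live when the
switch happens.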

Thread Creation and Termination
*******************************

To start a new thread, a stack frame must be constructed so that the context
switch can pop it the same way it would pop one from a thread that had been
context switched out. This is to be implemented in an architecture-specific
:code:`_new_thread` internal routine.

The thread entry point is also not to be called directly, i.e. it should not be
set as the :abbr:`PC (program counter)` for the new thread. Rather it must be
wrapped in :code:`_thread_entry`. This means that the PC in the stack
frame shall be set to :code:`_thread_entry`, and the thread entry point shall
be passed as the first parameter to :code:`_thread_entry`. The specifics of
this depend on the ABI.
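
As an illustration of the idea, the following host-side sketch builds such an
initial frame at the top of a stack buffer. The frame layout
(``sketch_init_frame``), the wrapper (``sketch_thread_entry``), and the
constructor (``sketch_new_thread``) are hypothetical; a real port lays out the
frame exactly as its context switch code pops it, and places arguments where
the ABI expects them.

.. code-block:: c

   #include <stddef.h>
   #include <stdint.h>

   typedef void (*sketch_entry_t)(void *p1, void *p2, void *p3);

   /* Hypothetical initial frame; real layouts are ABI-specific. */
   struct sketch_init_frame {
       uintptr_t pc;   /* where execution resumes: the entry wrapper */
       uintptr_t arg0; /* first parameter: the real thread entry point */
       uintptr_t arg1; /* p1 */
       uintptr_t arg2; /* p2 */
       uintptr_t arg3; /* p3 */
   };

   /* Wrapper that the new thread starts in, instead of the entry point. */
   static void sketch_thread_entry(sketch_entry_t entry,
                                   void *p1, void *p2, void *p3)
   {
       entry(p1, p2, p3);
       /* A real wrapper would terminate the thread here. */
   }

   /* Build the initial frame at the (aligned) top of the stack buffer. */
   static struct sketch_init_frame *sketch_new_thread(unsigned char *stack,
                                                      size_t size,
                                                      sketch_entry_t entry,
                                                      void *p1, void *p2,
                                                      void *p3)
   {
       uintptr_t base = (uintptr_t)stack;
       uintptr_t align = 2U * sizeof(uintptr_t); /* assumed SP alignment */
       uintptr_t top = (base + size) & ~(align - 1U);
       struct sketch_init_frame *frame;

       if (top < base || top - base < sizeof(*frame)) {
           return NULL; /* stack too small for the initial frame */
       }
       frame = (struct sketch_init_frame *)(top - sizeof(*frame));
       frame->pc = (uintptr_t)sketch_thread_entry;
       frame->arg0 = (uintptr_t)entry;
       frame->arg1 = (uintptr_t)p1;
       frame->arg2 = (uintptr_t)p2;
       frame->arg3 = (uintptr_t)p3;
       return frame;
   }

   /* Example entry point: increments the int passed as p1. */
   static void sketch_example_entry(void *p1, void *p2, void *p3)
   {
       (void)p2;
       (void)p3;
       *(int *)p1 += 1;
   }

When the context switch later "pops" this frame, execution begins in the
wrapper with the real entry point as its first argument, which is the property
the text above requires.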
The need for an architecture-specific thread termination implementation depends on the architecture. There is a generic implementation, but it might not work for a given architecture.
One reason that has been encountered for having an architecture-specific implementation of thread termination is that aborting a thread might be different if aborting because of a graceful exit or because of an exception. This is the case for ARM Cortex-M, where the CPU has to be taken out of handler mode if the thread triggered a fatal exception, but not if the thread gracefully exits its entry point function.
This means implementing an architecture-specific version of
:c:func:`k_thread_abort`, and setting the Kconfig option
:kconfig:option:`CONFIG_ARCH_HAS_THREAD_ABORT` as needed for the architecture (e.g. see
:zephyr_file:`arch/arm/core/cortex_m/Kconfig`).

Thread Local Storage
********************

To enable thread local storage on a new architecture:

#. Implement :c:func:`arch_tls_stack_setup` to set up the TLS storage area in
   stack. Refer to the toolchain documentation on how the storage area needs
   to be structured. Some helper functions can be used:

   * :c:func:`z_tls_data_size` returns the size
     needed for thread local variables (excluding any extra data required by
     toolchain and architecture).
   * :c:func:`z_tls_copy` prepares the TLS storage area for
     thread local variables. This only copies the variables themselves and
     does not set up architecture- and/or toolchain-specific data.

#. In the context switching, grab the ``tls`` field inside the new thread's
   ``struct k_thread`` and put it into an appropriate register (or some
   other variable) for access to the TLS storage area. Refer to toolchain
   and architecture documentation on which registers to use.

#. In kconfig, add ``select CONFIG_ARCH_HAS_THREAD_LOCAL_STORAGE`` to
   kconfig related to the new architecture.

#. Run the ``tests/kernel/threads/tls`` to make sure the new code works.


Device Drivers
**************

The kernel requires very few hardware devices to function. In theory, the only required device is the interrupt controller, since the kernel can run without a system clock. In practice, to get access to most, if not all, of the sanity check test suite, a system clock is needed as well. Since these two are usually tied to the architecture, they are part of the architecture port.
There can be significant differences between the interrupt controllers and the interrupt concepts across architectures.
For example, x86 has the concept of an :abbr:`IDT (Interrupt Descriptor Table)`
and different interrupt controllers. The position of an interrupt in the IDT
determines its priority.

On the other hand, the ARM Cortex-M has the :abbr:`NVIC (Nested Vectored Interrupt Controller)`
as part of the architecture definition. There is no need
for an IDT-like table that is separate from the NVIC vector table. The position
in the table has nothing to do with priority of an IRQ: priorities are
programmable per-entry.
The ARCv2 has its interrupt unit as part of the architecture definition, which is somewhat similar to the NVIC. However, where ARC defines interrupts as having a one-to-one mapping between exception and interrupt numbers (i.e. exception 1 is IRQ1, and device IRQs start at 16), ARM has IRQ0 being equivalent to exception 16 (and weirdly enough, exception 1 can be seen as IRQ-15).
All these differences mean that very little, if anything, can be shared between architectures with regard to interrupt controllers.
x86 has APIC timers and the HPET as part of its architecture definition. ARM Cortex-M has the SYSTICK exception. Finally, ARCv2 has the timer0/1 device.
Kernel timeouts are handled in the context of the system clock timer driver's interrupt handler.
There is one other device that is almost a requirement for an architecture
port, since it is so useful for debugging. It is a simple polling, output-only,
serial port driver on which to send the console (:code:`printk`,
:code:`printf`) output.

It is not required, and a RAM console (:kconfig:option:`CONFIG_RAM_CONSOLE`)
can be used to send all output to a circular buffer that can be read
by a debugger instead.

Utility Libraries
*****************

The kernel depends on a few functions that can be implemented with very few instructions or in a lock-less manner in modern processors. Those are thus expected to be implemented as part of an architecture port.

* Atomic operators.

  * If instructions do exist for a given architecture, the implementation is
    configured using the :kconfig:option:`CONFIG_ATOMIC_OPERATIONS_ARCH` Kconfig
    option.

  * If instructions do not exist for a given architecture,
    a generic version that wraps :c:func:`irq_lock` and :c:func:`irq_unlock`
    around non-atomic operations exists. It is configured using the
    :kconfig:option:`CONFIG_ATOMIC_OPERATIONS_C` Kconfig option.

* Find-least-significant-bit-set and find-most-significant-bit-set.

  * It is possible to use compiler built-ins to implement these, but be careful
    they use the required compiler barriers.

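
A minimal sketch of both kinds of helper is shown below, assuming a GCC- or
Clang-compatible compiler and a 32-bit ``unsigned int``. All ``sketch_`` names
are hypothetical; in particular, ``sketch_irq_lock()`` and
``sketch_irq_unlock()`` only model the lock state in a variable so that the
example can run on a host, and the bit-finding helpers assume the convention
that bits are numbered from 1 and that 0 means no bit is set.

.. code-block:: c

   #include <stdint.h>

   /* 1-based index of the least significant set bit, or 0 if op is 0. */
   static inline unsigned int sketch_find_lsb_set(uint32_t op)
   {
       return (unsigned int)__builtin_ffs((int)op);
   }

   /* 1-based index of the most significant set bit, or 0 if op is 0. */
   static inline unsigned int sketch_find_msb_set(uint32_t op)
   {
       if (op == 0U) {
           return 0U; /* __builtin_clz(0) is undefined, so handle it here */
       }
       return 32U - (unsigned int)__builtin_clz(op);
   }

   /* Host-side model of the interrupt lock: 1 means interrupts are locked. */
   static unsigned int sketch_irq_locked;

   static unsigned int sketch_irq_lock(void)
   {
       unsigned int key = sketch_irq_locked;

       sketch_irq_locked = 1U;
       return key;
   }

   static void sketch_irq_unlock(unsigned int key)
   {
       sketch_irq_locked = key; /* restore the previous lock state */
   }

   /* Generic "atomic" add: a plain read-modify-write inside a locked region. */
   static long sketch_atomic_add(long *target, long value)
   {
       unsigned int key = sketch_irq_lock();
       long old = *target;

       *target = old + value;
       sketch_irq_unlock(key);
       return old;
   }

The lock-based version is only correct on a uniprocessor, which is one reason
a port with suitable instructions would normally prefer the
architecture-specific implementation.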

CPU Idling/Power Management
***************************

The kernel provides support for CPU power management with two functions:
:c:func:`arch_cpu_idle` and :c:func:`arch_cpu_atomic_idle`.

:c:func:`arch_cpu_idle` can be as simple as calling the power saving
instruction for the architecture with interrupts unlocked, for example
:code:`hlt` on x86, :code:`wfi` or :code:`wfe` on ARM, :code:`sleep` on ARC.
This function can be called in a loop within a context that does not care if it
gets interrupted before going to sleep. There are
basically two scenarios when it is correct to use this function:

* In a single-threaded system, in the only thread when the thread is not used for doing real work after initialization, i.e. it is sitting in a loop doing nothing for the duration of the application.
* In the idle thread.

:c:func:`arch_cpu_atomic_idle`, on the other hand, must be able to atomically
re-enable interrupts and invoke the power saving instruction. It can thus be
used in real application code, again in single-threaded systems.
Normally, idling the CPU should be left to the idle thread, but in some very special scenarios, these APIs can be used by applications.
Both functions must exist for a given architecture. However, the implementation can be simply the following steps, if desired:

#. unlock interrupts
#. NOP

However, a real implementation is strongly recommended.

Fault Management
****************

In the event of an unhandled CPU exception, the architecture
code must call into :c:func:`z_fatal_error`. This function dumps
out architecture-agnostic information and makes a policy
decision on what to do next by invoking :c:func:`k_sys_fatal_error`.
This function can be overridden to implement application-specific
policies that could include locking interrupts and spinning forever
(the default implementation) or even powering off the
system (if supported).

Toolchain and Linking
*********************

Toolchain support has to be added to the build system.
Some architecture-specific definitions are needed in :zephyr_file:`include/zephyr/toolchain/gcc.h`.
See what exists in that file for currently supported architectures.
Each architecture also needs its own linker script, even if most sections can be derived from the linker scripts of other architectures. Some sections might be specific to the new architecture, for example the SCB section on ARM and the IDT section on x86.

Memory Management and Memory Mapping
************************************

If the target platform enables paging and requires drivers to memory-map
their I/O regions, :kconfig:option:`CONFIG_MMU` needs to be enabled and the
following API implemented:

* :c:func:`arch_mem_map`
* :c:func:`arch_mem_unmap`
* :c:func:`arch_page_phys_get`

Stack Objects
*************

The presence of memory protection hardware affects how stack objects are
created. All architecture ports must specify the required alignment of the
stack pointer, which is some combination of CPU and ABI requirements. This
is defined in architecture headers with :c:macro:`ARCH_STACK_PTR_ALIGN` and
is typically something small like 4, 8, or 16 bytes.

Two types of thread stacks exist:

* "kernel" stacks defined with :c:macro:`K_KERNEL_STACK_DEFINE()` and related
  APIs, which can host kernel threads running in supervisor mode or
  used as the stack for interrupt/exception handling. These have significantly
  relaxed alignment requirements and use less reserved data. No memory is
  reserved for privilege elevation stacks.

* "thread" stacks which typically use more memory, but are capable of hosting
  threads running in user mode, as well as any use-cases for kernel stacks.

If :kconfig:option:`CONFIG_USERSPACE` is not enabled, "thread" and "kernel" stacks are
equivalent.
Additional macros may be defined in the architecture layer to specify the alignment of the base of stack objects, any reserved data inside the stack object not used for the thread's stack buffer, and how to round up stack sizes to support user mode threads. In the absence of definitions some defaults are assumed:

* :c:macro:`ARCH_KERNEL_STACK_RESERVED`: default no reserved space
* :c:macro:`ARCH_THREAD_STACK_RESERVED`: default no reserved space
* :c:macro:`ARCH_KERNEL_STACK_OBJ_ALIGN`: default align to
  :c:macro:`ARCH_STACK_PTR_ALIGN`
* :c:macro:`ARCH_THREAD_STACK_OBJ_ALIGN`: default align to
  :c:macro:`ARCH_STACK_PTR_ALIGN`
* :c:macro:`ARCH_THREAD_STACK_SIZE_ALIGN`: default round up to
  :c:macro:`ARCH_STACK_PTR_ALIGN`

All stack creation macros are defined in terms of these.
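
The fallback behavior described by this list can be sketched with ordinary
preprocessor guards. The value chosen for ``ARCH_STACK_PTR_ALIGN`` and the two
``sketch_`` helpers are assumptions made for illustration; the actual
stack-sizing macros in the kernel headers are more involved.

.. code-block:: c

   #include <stddef.h>

   /* Assumed for this sketch; a real port derives it from the CPU and ABI. */
   #define ARCH_STACK_PTR_ALIGN 8

   /* Defaults that apply when the architecture layer defines nothing. */
   #ifndef ARCH_KERNEL_STACK_RESERVED
   #define ARCH_KERNEL_STACK_RESERVED 0
   #endif
   #ifndef ARCH_THREAD_STACK_RESERVED
   #define ARCH_THREAD_STACK_RESERVED 0
   #endif
   #ifndef ARCH_KERNEL_STACK_OBJ_ALIGN
   #define ARCH_KERNEL_STACK_OBJ_ALIGN ARCH_STACK_PTR_ALIGN
   #endif
   #ifndef ARCH_THREAD_STACK_OBJ_ALIGN
   #define ARCH_THREAD_STACK_OBJ_ALIGN ARCH_STACK_PTR_ALIGN
   #endif
   #ifndef ARCH_THREAD_STACK_SIZE_ALIGN
   #define ARCH_THREAD_STACK_SIZE_ALIGN ARCH_STACK_PTR_ALIGN
   #endif

   /* Hypothetical helper: round x up to a multiple of align. */
   static size_t sketch_round_up(size_t x, size_t align)
   {
       return ((x + align - 1) / align) * align;
   }

   /* Hypothetical: bytes a thread stack object occupies for a request. */
   static size_t sketch_thread_stack_len(size_t requested)
   {
       return ARCH_THREAD_STACK_RESERVED +
              sketch_round_up(requested, ARCH_THREAD_STACK_SIZE_ALIGN);
   }

A port with a guard region or a privilege elevation stack would define the
reserved and alignment macros before this point, and the same sizing
expression would then account for the extra memory.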
Stack objects all have the following layout, with some regions potentially zero-sized depending on configuration. There are always two main parts: reserved memory at the beginning, and then the stack buffer itself. The bounds of some areas can only be determined at runtime in the context of its associated thread object. Other areas are entirely computable at build time.
Some architectures may need to carve out reserved memory at runtime from the
stack buffer, instead of unconditionally reserving it at build time, or to
supplement an existing reserved area (as is the case with the ARM FPU).
Such carve-outs will always be tracked in ``thread.stack_info.start``.
The region specified by ``thread.stack_info.start`` and
``thread.stack_info.size`` is always fully accessible by a user mode thread.
``thread.stack_info.delta`` denotes an offset which can be used to compute
the initial stack pointer from the very end of the stack object, taking into
account storage for TLS and ASLR random offsets.

.. code-block:: none

   +---------------------+ <- thread.stack_obj
   | Reserved Memory     | } K_(THREAD|KERNEL)_STACK_RESERVED
   +---------------------+
   | Carved-out memory   |
   |.....................| <- thread.stack_info.start
   | Unused stack buffer |
   |                     |
   |.....................| <- thread's current stack pointer
   | Used stack buffer   |
   |                     |
   |.....................| <- Initial stack pointer. Computable
   | ASLR Random offset  |      with thread.stack_info.delta
   +---------------------| <- thread.userspace_local_data
   | Thread-local data   |
   +---------------------+ <- thread.stack_info.start + thread.stack_info.size

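
The relationships in this layout can be written out as simple arithmetic. The
sketch below is a host-side model only: ``sketch_stack_info`` mirrors the
``start``, ``size``, and ``delta`` fields named above, while
``sketch_stack_setup()`` and ``sketch_initial_sp()`` are hypothetical helpers,
and the assumption that ``delta`` is the sum of the TLS area and the ASLR
offset is taken from the diagram rather than from the kernel sources.

.. code-block:: c

   #include <stddef.h>
   #include <stdint.h>

   /* Mirrors the fields discussed in the text; not the kernel's own type. */
   struct sketch_stack_info {
       uintptr_t start; /* lowest address of the stack buffer */
       size_t size;     /* size of the stack buffer in bytes */
       size_t delta;    /* bytes between the buffer end and the initial SP */
   };

   /* Derive stack_info from the object bounds and the reserved areas. */
   static struct sketch_stack_info sketch_stack_setup(uintptr_t stack_obj,
                                                      size_t obj_size,
                                                      size_t reserved,
                                                      size_t carve_out,
                                                      size_t tls_size,
                                                      size_t aslr_offset)
   {
       struct sketch_stack_info info;

       /* Reserved memory and any carve-out sit below the buffer. */
       info.start = stack_obj + reserved + carve_out;
       info.size = obj_size - reserved - carve_out;
       /* TLS and the ASLR offset sit at the top, above the initial SP. */
       info.delta = tls_size + aslr_offset;
       return info;
   }

   /* The stack grows downward, so the initial SP is measured from the end. */
   static uintptr_t sketch_initial_sp(const struct sketch_stack_info *info)
   {
       return info->start + info->size - info->delta;
   }

For a 1024-byte object at ``0x1000`` with 32 reserved bytes, 16 bytes of TLS,
and an 8-byte random offset, the buffer starts at ``0x1020`` and the initial
stack pointer is ``0x13E8``.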
At present, Zephyr does not support stacks that grow upward.
If no memory protection is in use, then the defaults are sufficient.
:kconfig:option:`CONFIG_HW_STACK_PROTECTION` uses hardware features to generate a fatal error if a thread in supervisor mode overflows its stack. This is useful for debugging, although for a couple of reasons, you can't reliably make any assertions about the state of the system after this happens:

* The kernel could have been inside a critical section when the overflow occurs, leaving important global data structures in a corrupted state.
* For systems that implement stack protection using a guard memory region, it's possible to overshoot the guard and corrupt adjacent data structures before the hardware detects this situation.

To enable the :kconfig:option:`CONFIG_HW_STACK_PROTECTION` feature, the system must
provide some kind of hardware-based stack overflow protection, and enable the
:kconfig:option:`CONFIG_ARCH_HAS_STACK_PROTECTION` option.

Two forms of HW-based stack overflow detection are supported: dedicated CPU features for this purpose, or special read-only guard regions immediately preceding stack buffers.

:kconfig:option:`CONFIG_HW_STACK_PROTECTION` only catches stack overflows for
supervisor threads. This is not required to catch stack overflow from user
threads; :kconfig:option:`CONFIG_USERSPACE` is orthogonal.

This feature only detects supervisor mode stack overflows, including stack overflows when handling system calls. It doesn't guarantee that the kernel has not been corrupted. Any stack overflow in supervisor mode should be treated as a fatal error, with no assertions about the integrity of the overall system possible.

Stack overflows in user mode are recoverable (from the kernel's perspective)
and require no special configuration; :kconfig:option:`CONFIG_HW_STACK_PROTECTION`
only applies to catching overflows when the CPU is in supervisor mode.
If we are detecting stack overflows in supervisor mode via special CPU registers (like ARM's SPLIM), then the defaults are sufficient.
If we are detecting supervisor mode stack overflows via a special memory protection region located immediately before the stack buffer that generates an exception on write, reserved memory will be used for the guard region.

:c:macro:`ARCH_KERNEL_STACK_RESERVED` should be defined to the minimum size
of a memory protection region. On most ARM CPUs this is 32 bytes.
:c:macro:`ARCH_KERNEL_STACK_OBJ_ALIGN` should also be set to the required
alignment for this region.

MMU-based systems should not reserve RAM for the guard region and instead simply leave a non-present virtual page below every stack when it is mapped into the address space. The stack object will still need to be properly aligned and sized to page granularity.

.. code-block:: none

   +-----------------------------+ <- thread.stack_obj
   | Guard reserved memory       | } K_KERNEL_STACK_RESERVED
   +-----------------------------+
   | Guard carve-out             |
   |.............................| <- thread.stack_info.start
   | Stack buffer                |
   .                             .

Guard carve-outs for kernel stacks are uncommon and should be avoided if possible. They tend to be needed for two situations:

* The same stack may be re-purposed to host a user thread, in which case the guard is unnecessary and shouldn't be unconditionally reserved. This is the case when privilege elevation stacks are not inside the stack object.

* The required guard size is variable and depends on context. For example, some ARM CPUs have lazy floating point stacking during exceptions and may decrement the stack pointer by a large value without writing anything, completely overshooting a minimally-sized guard and corrupting adjacent memory. Rather than unconditionally reserving a larger guard, the extra memory is carved out if the thread uses floating point.

Enabling user mode activates two new requirements:

* A separate fixed-sized privilege mode stack, specified by
  :kconfig:option:`CONFIG_PRIVILEGED_STACK_SIZE`, must be allocated that the user
  thread cannot access. It is used as the stack by the kernel when handling
  system calls. If stack guards are implemented, a stack guard region must
  be able to be placed before it, with support for carve-outs if necessary.

* The memory protection hardware must be able to program a region that exactly
  covers the thread's stack buffer, tracked in ``thread.stack_info``. This
  implies that :c:macro:`ARCH_THREAD_STACK_SIZE_ADJUST()` will need to round
  up the requested stack size so that a region may cover it, and that
  :c:macro:`ARCH_THREAD_STACK_OBJ_ALIGN()` is also specified per the
  granularity of the memory protection hardware.

This becomes more complicated if the memory protection hardware requires that
all memory regions be sized to a power of two, and aligned to their own size.
This is common on older MPUs and is indicated by
:kconfig:option:`CONFIG_MPU_REQUIRES_POWER_OF_TWO_ALIGNMENT`.

``thread.stack_info`` always tracks the user-accessible part of the stack
object; it must always be correct to program a memory protection region with
user access using the range stored within.
On systems without power-of-two region requirements, the reserved memory area
for thread stacks defined by :c:macro:`K_THREAD_STACK_RESERVED` may be used to
contain the privilege mode stack. The layout could be something like:

.. code-block:: none

   +------------------------------+ <- thread.stack_obj
   | Other platform data          |
   +------------------------------+
   | Guard region (if enabled)    |
   +------------------------------+
   | Guard carve-out (if needed)  |
   |..............................|
   | Privilege elevation stack    |
   +------------------------------| <- thread.stack_obj +
   | Stack buffer                 |      K_THREAD_STACK_RESERVED =
   .                              .      thread.stack_info.start

The guard region and any carve-out (if needed) would be configured as a read-only region when the thread is created.
If the thread is a supervisor thread, the privilege elevation region is just extra stack memory. An overflow will eventually crash into the guard region.
If the thread is running in user mode, a memory protection region will be configured to allow user threads access to the stack buffer, but nothing before or after it. An overflow in user mode will crash into the privilege elevation stack, which the user thread has no access to. An overflow when handling a system call will crash into the guard region.
On an MMU system there should be no physical guards; the privilege mode stack will be mapped into kernel memory, and the stack buffer in the user part of memory, each with non-present virtual guard pages below them to catch runtime stack overflows.
Other platform data may be stored before the guard region, but this is highly
discouraged if such data could be stored in ``thread.arch`` somewhere.

:c:macro:`ARCH_THREAD_STACK_RESERVED` will need to be defined to capture
the size of the reserved region containing platform data, privilege elevation
stacks, and guards. It must be appropriately sized such that an MPU region
to grant user mode access to the stack buffer can be placed immediately
after it.
Thread stack objects must be sized and aligned to the same power of two, without any reserved memory to allow efficient packing in memory. Thus, any guards in the thread stack must be completely carved out, and the privilege elevation stack must be allocated elsewhere.
:c:macro:`ARCH_THREAD_STACK_SIZE_ADJUST()` and
:c:macro:`ARCH_THREAD_STACK_OBJ_ALIGN()` should both be defined to
:c:macro:`Z_POW2_CEIL()`. :c:macro:`K_THREAD_STACK_RESERVED` must be 0.
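
To make the power-of-two constraint concrete, the sketch below rounds a
requested size up to the next power of two and checks whether a base address
and size would form a valid region. Both functions are hypothetical host-side
helpers written for this example; they are not the kernel's
``Z_POW2_CEIL()`` macro, which is evaluated at build time.

.. code-block:: c

   #include <stddef.h>
   #include <stdint.h>

   /*
    * Round x up to the next power of two. Returns 0 if the result would
    * not fit in size_t. A request of 0 yields 1 in this sketch.
    */
   static size_t sketch_pow2_ceil(size_t x)
   {
       const size_t max_pow2 = (size_t)1 << (sizeof(size_t) * 8 - 1);
       size_t p = 1;

       if (x > max_pow2) {
           return 0;
       }
       while (p < x) {
           p <<= 1;
       }
       return p;
   }

   /* A region is valid if its size is a power of two and base is aligned to it. */
   static int sketch_region_ok(uintptr_t base, size_t size)
   {
       if (size == 0 || (size & (size - 1)) != 0) {
           return 0;
       }
       return (base & (uintptr_t)(size - 1)) == 0;
   }

A request for 1000 bytes therefore occupies a 1024-byte object that must also
start on a 1024-byte boundary, which is why reserved memory inside the object
would waste space on such hardware.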
For the privilege stacks, the :kconfig:option:CONFIG_GEN_PRIV_STACKS must be,
enabled. For every thread stack found in the system, a corresponding fixed-size
kernel stack used for handling system calls is generated. The address
of the privilege stacks can be looked up quickly at runtime based on the
thread stack address using :c:func:z_priv_stack_find(). These stacks are
laid out the same way as other kernel-only stacks.
.. code-block:: none

   +-----------------------------+ <- z_priv_stack_find(thread.stack_obj)
   | Reserved memory             | } K_KERNEL_STACK_RESERVED
   +-----------------------------+
   | Guard carve-out (if needed) |
   |.............................|
   | Privilege elevation stack   |
   |                             |
   +-----------------------------+ <- z_priv_stack_find(thread.stack_obj) +
                                      K_KERNEL_STACK_RESERVED +
                                      CONFIG_PRIVILEGED_STACK_SIZE

   +-----------------------------+ <- thread.stack_obj
   | MPU guard carve-out         |
   | (supervisor mode only)      |
   |.............................| <- thread.stack_info.start
   | Stack buffer                |
   .                             .

The guard carve-out in the thread stack object is only used if the thread is
running in supervisor mode. If the thread drops to user mode, there is no guard
and the entire object is used as the stack buffer, with full access granted to
the associated user mode thread and thread.stack_info updated appropriately.
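
The bookkeeping described above can be sketched as follows. The structure,
the guard size, and ``sketch_stack_bounds()`` are hypothetical and only mirror
the idea of the start/size pair kept in ``thread.stack_info``; a real port
would also reprogram the MPU.

.. code-block:: c

   #include <stdbool.h>
   #include <stddef.h>
   #include <stdint.h>

   #define SKETCH_MPU_GUARD_SIZE 32U   /* hypothetical guard size */

   /* Hypothetical mirror of the start/size pair kept in thread.stack_info. */
   struct sketch_stack_info {
       uintptr_t start;
       size_t size;
   };

   /*
    * Compute the usable stack buffer for a power-of-two stack object.
    * Supervisor threads lose the guard carve-out at the low end of the
    * object; user threads get the whole object.
    */
   static struct sketch_stack_info sketch_stack_bounds(uintptr_t stack_obj,
                                                       size_t obj_size,
                                                       bool user_mode)
   {
       struct sketch_stack_info info;
       size_t guard = user_mode ? 0U : SKETCH_MPU_GUARD_SIZE;

       info.start = stack_obj + guard;
       info.size = obj_size - guard;
       return info;
   }

For a 1024-byte object, the sketch reports 992 usable bytes in supervisor mode
and the full 1024 bytes once the thread runs in user mode.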
User Mode Threads
To support user mode threads, several kernel-to-arch APIs need to be
implemented, and the system must enable the :kconfig:option:CONFIG_ARCH_HAS_USERSPACE
option. Please see the documentation for each of these functions for more
details:

* :c:func:arch_buffer_validate tests whether the current thread has
  access permissions to a particular memory region.
* :c:func:arch_user_mode_enter irreversibly drops a supervisor
  thread to user mode privileges. The stack must be wiped.
* :c:func:arch_syscall_oops generates a kernel oops when system
  call parameters can't be validated, in such a way that the oops appears to be
  generated from where the system call was invoked in the user thread.
* :c:func:arch_syscall_invoke0 through
  :c:func:arch_syscall_invoke6 invoke a system call with the
  appropriate number of arguments, which must all be passed in registers
  during the privilege elevation.
* :c:func:arch_is_user_context returns nonzero if the CPU is currently
  running in user mode.
* :c:func:arch_mem_domain_max_partitions_get indicates the maximum
  number of regions for a memory domain. MMU systems have an unlimited number;
  MPU systems have constraints on this.

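
To give a feel for the kind of check a buffer validation routine performs, the
sketch below tests a buffer against a small table of permitted regions. It is
not the Zephyr implementation: ``struct sketch_region``,
``sketch_buffer_validate()``, its parameters, and its convention of returning
0 when access is allowed are all assumptions made for this example. A real
port would consult its MPU or MMU configuration instead of a plain array.

.. code-block:: c

   #include <stdbool.h>
   #include <stddef.h>
   #include <stdint.h>

   /* Hypothetical description of one region the current thread may access. */
   struct sketch_region {
       uintptr_t start;
       size_t size;
       bool writable;
   };

   /*
    * Return 0 if [addr, addr + size) lies entirely inside one permitted
    * region with sufficient permissions, nonzero otherwise.
    */
   static int sketch_buffer_validate(const struct sketch_region *regions,
                                     size_t num_regions, uintptr_t addr,
                                     size_t size, bool write)
   {
       for (size_t i = 0; i < num_regions; i++) {
           const struct sketch_region *r = &regions[i];

           if (addr < r->start) {
               continue;
           }
           /* Compare offsets rather than end addresses to avoid overflow. */
           size_t offset = addr - r->start;

           if (offset > r->size || size > r->size - offset) {
               continue;
           }
           if (write && !r->writable) {
               continue;
           }
           return 0;
       }
       return -1;
   }

Comparing offsets instead of computing ``addr + size`` keeps the check correct
even when a user thread passes a size large enough to wrap around the address
space.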
Some architectures may need to update software memory management structures
or modify hardware registers on another CPU when memory domain APIs are invoked.
If so, :kconfig:option:CONFIG_ARCH_MEM_DOMAIN_SYNCHRONOUS_API must be selected by the
architecture and some additional APIs must be implemented. This is common
on MMU systems and uncommon on MPU systems:

* :c:func:arch_mem_domain_thread_add
* :c:func:arch_mem_domain_thread_remove
* :c:func:arch_mem_domain_partition_add
* :c:func:arch_mem_domain_partition_remove

Please see the doxygen documentation of these APIs for details.
In addition to implementing these APIs, there are some other tasks as well:

* :c:func:_new_thread needs to spawn threads with :c:macro:K_USER in
  user mode.
* On context switch, the outgoing thread's stack memory should be marked
  inaccessible to user mode by making the appropriate configuration changes in
  the memory management hardware. The incoming thread's stack memory should
  likewise be marked as accessible. This ensures that threads cannot access
  the stacks of other threads.
* On context switch, the system needs to switch between memory domains for
  the incoming and outgoing threads.
* Thread stack areas must include a kernel stack region. This should be
  inaccessible to user threads at all times. This stack will be used when
  system calls are made. This should be fixed size for all threads, and must
  be large enough to handle any system call.
* A software interrupt or some kind of privilege elevation mechanism needs to
  be established. This is closely tied to how the
  :c:func:arch_syscall_invoke0 through :c:func:arch_syscall_invoke6 functions
  are implemented. On system call, the appropriate handler function needs to
  be looked up in _k_syscall_table. Bad system call IDs should jump to the
  :c:enum:K_SYSCALL_BAD handler. Upon completion of the system call, care
  must be taken not to leak any register state back to user mode.

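
The table lookup with a fallback for bad system call IDs can be sketched as
below. The table layout, the two-argument handler signature, and all
``sketch_*`` names are hypothetical and exist only for this example. The real
handler table is generated by the build system, and the privilege elevation
and register scrubbing steps are architecture-specific and not shown.

.. code-block:: c

   #include <stddef.h>
   #include <stdint.h>

   #define SKETCH_SYSCALL_LIMIT 2U   /* number of valid IDs in this sketch */

   typedef uintptr_t (*sketch_syscall_handler_t)(uintptr_t arg1,
                                                 uintptr_t arg2);

   static uintptr_t sketch_add(uintptr_t a, uintptr_t b) { return a + b; }
   static uintptr_t sketch_sub(uintptr_t a, uintptr_t b) { return a - b; }

   /* Handler for out-of-range IDs; a real port would raise a kernel oops. */
   static uintptr_t sketch_bad(uintptr_t a, uintptr_t b)
   {
       (void)a;
       (void)b;
       return UINTPTR_MAX;
   }

   /* Table with one extra slot at the end for the bad-ID handler. */
   static const sketch_syscall_handler_t
   sketch_table[SKETCH_SYSCALL_LIMIT + 1U] = {
       sketch_add,
       sketch_sub,
       sketch_bad,
   };

   /* Look up and run the handler; invalid IDs go to the last slot. */
   static uintptr_t sketch_dispatch(uintptr_t id, uintptr_t a, uintptr_t b)
   {
       if (id >= SKETCH_SYSCALL_LIMIT) {
           id = SKETCH_SYSCALL_LIMIT;
       }
       return sketch_table[id](a, b);
   }

Clamping the ID before indexing means a user thread can never steer the kernel
to an address outside the table, whatever value it places in the ID register.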
GDB Stub
To enable GDB stub for remote debugging on a new architecture:
#. Create a new gdbstub.h header file under the appropriate architecture
   include directory (:file:include/zephyr/arch/<arch>/gdbstub.h).

   * Define a new structure, struct gdb_ctx, as the GDB context.

     * It must define a member named exception of type unsigned int to
       store the GDB exception reason. This value needs to be set before
       entering :c:func:z_gdb_main_loop.
     * The architecture can define as many members as needed for the GDB
       stub to function.
     * A pointer to this struct needs to be passed to
       :c:func:z_gdb_main_loop, where this pointer will be passed to other
       GDB stub functions.

#. Implement functions for entering and exiting the GDB stub main loop.

   * If the architecture relies on interrupts to service breakpoints,
     interrupt service routines (ISR) need to be implemented, which will
     serve as the entry point to the GDB stub main loop.
   * These functions need to save and restore context so code execution can
     continue as if no breakpoints have been encountered.
   * These functions need to call :c:func:z_gdb_main_loop after saving
     execution context to go into the GDB stub main loop to receive commands
     from GDB.
   * Before calling :c:func:z_gdb_main_loop, :c:member:gdb_ctx.exception
     must be set to specify the exception reason.

#. Implement necessary functions to support GDB stub functionality:

   * :c:func:arch_gdb_init

     * This needs to initialize necessary bits to support GDB stub
       functionality, for example, setting up the GDB context and
       connecting debug interrupts.
     * This must stop code execution via an architecture-specific method
       (e.g. raising debug interrupts). This allows GDB to connect during
       boot.

   * :c:func:arch_gdb_continue

     * This is called when GDB sends a c or continue command
       to continue code execution.

   * :c:func:arch_gdb_step

     * This is called when GDB sends a si or stepi command
       to execute one machine instruction, before returning to GDB prompt.

   * Hardware register read/write functions:

     * Since the GDB stub is running on the target, manipulation of
       hardware registers needs to be cached to avoid affecting the
       execution of the GDB stub. Think of it as context switching, where
       the execution context is changed to the GDB stub: the register
       values of the running thread before the context switch need to be
       stored. Manipulation of register values must only be done to this
       cached copy. The updated values will then be written to hardware
       registers before switching back to the previously running thread.

     * :c:func:arch_gdb_reg_readall

       * This collects all hardware register values that would appear in
         g/G packets, which will be sent back to GDB. The format of
         the G-packet is architecture specific. Consult the GDB
         documentation on what is expected.
       * Note that, for most architectures, a valid G-packet must be
         returned and sent to GDB. If a packet with incorrect length is
         sent to GDB, GDB will abort the debugging session.

     * :c:func:arch_gdb_reg_writeall
     * :c:func:arch_gdb_reg_readone
     * :c:func:arch_gdb_reg_writeone

   * Breakpoints:

     * :c:func:arch_gdb_add_breakpoint and
       :c:func:arch_gdb_remove_breakpoint
     * GDB may decide to use software breakpoints, which modify
       the memory at the breakpoint locations to replace the instruction
       with software breakpoint or trap instructions. GDB will then
       restore the memory content once execution reaches the breakpoints.
       GDB supports this by default and there is usually no need to
       handle software breakpoints in the architecture code (where
       breakpoint type is 0).
     * Hardware breakpoints (type 1) are required if the code is
       in ROM or flash that cannot be modified at runtime. Consult
       the architecture datasheet on how to enable hardware breakpoints.
     * If hardware breakpoints are not supported by the architecture, there
       is no need to implement these in architecture code. GDB will then
       rely on software breakpoints.

#. For architectures where certain memory regions are not accessible,
   an array named :c:var:gdb_mem_region_array of type
   :c:struct:gdb_mem_region needs to be defined to specify regions
   that are accessible. For each array item:

   * :c:member:gdb_mem_region.start specifies the start of a memory
     region.
   * :c:member:gdb_mem_region.end specifies the end of a memory
     region.
   * :c:member:gdb_mem_region.attributes specifies the permission
     of a memory region.

     * :c:macro:GDB_MEM_REGION_RO: region is read-only.
     * :c:macro:GDB_MEM_REGION_RW: region is read-write.

   * :c:member:gdb_mem_region.alignment specifies read/write alignment
     of a memory region. Use 0 if there is no alignment requirement
     and read/write can be done byte-by-byte.

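
A region table of the kind described in the last step, together with a check
that a GDB memory access respects it, might look like the sketch below. The
structure, attribute values, addresses, and ``sketch_access_ok()`` are all
hypothetical and only mirror the start, end, attributes, and alignment members
named above. Treating the end address as exclusive is an assumption of this
sketch, so check the real structure's documentation before relying on it.

.. code-block:: c

   #include <stdbool.h>
   #include <stddef.h>
   #include <stdint.h>

   /* Hypothetical attribute values and structure, for this sketch only. */
   #define SKETCH_REGION_RO 0x1U
   #define SKETCH_REGION_RW 0x2U

   struct sketch_mem_region {
       uintptr_t start;      /* first address of the region */
       uintptr_t end;        /* end of the region (exclusive here) */
       uint16_t attributes;  /* SKETCH_REGION_RO or SKETCH_REGION_RW */
       uint8_t alignment;    /* required access alignment, 0 for bytes */
   };

   /* Example map with made-up addresses. */
   static const struct sketch_mem_region sketch_regions[] = {
       { 0x08000000u, 0x08100000u, SKETCH_REGION_RO, 0 },  /* flash */
       { 0x20000000u, 0x20020000u, SKETCH_REGION_RW, 0 },  /* RAM */
       { 0x40000000u, 0x40001000u, SKETCH_REGION_RW, 4 },  /* peripheral */
   };

   /* Return true if the access fits one region and obeys its constraints. */
   static bool sketch_access_ok(uintptr_t addr, size_t len, bool write)
   {
       size_t count = sizeof(sketch_regions) / sizeof(sketch_regions[0]);

       for (size_t i = 0; i < count; i++) {
           const struct sketch_mem_region *r = &sketch_regions[i];

           if (addr < r->start || addr >= r->end || len > r->end - addr) {
               continue;
           }
           if (write && r->attributes != SKETCH_REGION_RW) {
               return false;
           }
           if (r->alignment != 0 &&
               ((addr % r->alignment) != 0 || (len % r->alignment) != 0)) {
               return false;
           }
           return true;
       }
       return false;
   }

In this sketch a write to the flash region is rejected, as is a read from the
peripheral region that does not start on a 4-byte boundary, and any address
outside the listed regions is refused.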
API Reference
.. doxygengroup:: arch-timing
.. doxygengroup:: arch-threads
.. doxygengroup:: arch-tls
.. doxygengroup:: arch-pm
.. doxygengroup:: arch-smp
.. doxygengroup:: arch-irq
.. doxygengroup:: arch-userspace
.. doxygengroup:: arch-mmu
.. doxygengroup:: arch-misc
.. doxygengroup:: arch-gdbstub