docs/user-manual/framework/run-multi-core.md
F´ provides support for leveraging multi-core processor architectures in embedded systems. This guide covers the fundamentals of multi-core processing, how F´ supports it, and practical guidelines for designing multi-core F´ applications.
Many modern processors have multiple processing cores. Each core executes its own code, allowing multiple execution contexts to run simultaneously. There are two main architectural approaches for managing multi-core systems:
In symmetric multiprocessing (SMP), a single operating system manages all cores as a shared compute resource. The OS provides APIs to "pin" a thread to a particular core, or to let the OS assign threads to cores dynamically based on load.
In an SMP system:
In asymmetric multiprocessing (AMP), more than one operating system runs, with a subset of cores assigned to each OS. This arrangement is usually managed by a hypervisor that partitions the cores among the OS instances.
In an AMP system:
The operating system provides APIs to control thread-to-core assignment, enabling developers to optimize for specific performance characteristics.
F´ provides an API in the OS abstraction layer that allows a thread to be pinned to a specific core. The relevant interface is defined in `Os/Task.hpp` as part of the `Arguments` class used when starting a task:

```cpp
//! \param cpuAffinity: (optional) cpu affinity of this task
```
The default behavior is to let the OS assign the thread to cores dynamically. The `cpuAffinity` parameter is delegated to the operating system implementation, which handles the platform-specific details of core affinity. For example, the Posix OSAL implementation delegates to the `pthread_attr_setaffinity_np` function (see source).
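As a sketch of the underlying mechanism the Posix OSAL relies on, the following Linux-specific example pins the calling thread to a core. Note that `pin_to_core` is a hypothetical helper written for illustration, not part of the F´ API, and it uses `pthread_setaffinity_np` on the current thread rather than the attribute-based call the OSAL applies before task start:

```cpp
// Hypothetical helper: pin the calling thread to one core using the
// non-portable (Linux/glibc) affinity API. Not part of the F´ API.
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <pthread.h>
#include <sched.h>

// Returns 0 on success, otherwise a pthread error code.
int pin_to_core(int core) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    // After this call the OS will only schedule this thread on `core`.
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}
```

In F´ itself, a project would instead pass `cpuAffinity` through the task `Arguments` and let the OSAL apply the mask when the task is started.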
Synchronization objects like mutexes are delegated to the OS and are SMP-safe based on the operating system implementation.
F´ is not inherently SMP-safe. It relies on the OS implementation and developer expertise to ensure safe multi-core operation. Developers must understand the threading model and synchronization requirements of their specific deployment.
Atomic operations: some portions of F´ use `U32` types to synchronize between threads. On many systems this is a safe atomic operation; however, it is not guaranteed on all systems. Projects should verify that their system behaves as expected. These usages are under review and will be corrected over time.
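To illustrate the concern, the sketch below (an assumed example, not F´ code) increments a shared counter from two threads through `std::atomic<uint32_t>`, which guarantees each read-modify-write completes atomically. Substituting a plain `uint32_t` could silently lose increments on an SMP system:

```cpp
// Two threads hammer one shared counter; the atomic guarantees the
// total is exact. A plain uint32_t here would race on SMP hardware.
#include <atomic>
#include <cstdint>
#include <thread>

// Run two threads that each add `perThread` increments to a shared
// atomic counter, then return the final value.
uint32_t concurrentCount(uint32_t perThread) {
    std::atomic<uint32_t> counter{0};
    auto work = [&counter, perThread]() {
        for (uint32_t i = 0; i < perThread; i++) {
            counter.fetch_add(1, std::memory_order_relaxed);
        }
    };
    std::thread t1(work);
    std::thread t2(work);
    t1.join();
    t2.join();
    return counter.load();
}
```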
As with any architectural choice, SMP involves tradeoffs, and opinions differ on the best approach. The following guidelines suggest how to assign threads to cores based on different optimization goals.
Assign threads performing a particular function to dedicated cores. This provides predictable, deterministic behavior.
Examples:
Considerations:
Allow the OS to assign threads to cores dynamically. This maximizes CPU utilization and can improve overall throughput.
Benefits:
Best used when:
Assign threads that share large data sets to the same core. This improves memory access patterns and reduces cache contention.
Benefits:
Important for:
A mixed pattern, pinning some threads and allowing the OS to schedule the others, can work effectively. For example:
Users need to be aware of data-sharing and reentrancy issues when designing multi-core applications. Proper synchronization mechanisms must be used to protect shared resources.
Key considerations:
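As a minimal sketch of protecting a shared resource, assuming `std::mutex` in place of the `Os::Mutex` an F´ component would typically use, the hypothetical `SharedBuffer` below serializes concurrent writers so no updates are corrupted or lost:

```cpp
// Illustrative example (not F´ code): a mutex-guarded container that
// multiple threads, possibly on different cores, can safely write to.
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

class SharedBuffer {
  public:
    void push(int value) {
        std::lock_guard<std::mutex> lock(m_mutex);  // serialize writers
        m_data.push_back(value);
    }
    std::size_t size() {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_data.size();
    }
  private:
    std::mutex m_mutex;
    std::vector<int> m_data;
};

// Fill one buffer from two threads and return the resulting size.
std::size_t concurrentFill(std::size_t perThread) {
    SharedBuffer buffer;
    auto writer = [&buffer, perThread]() {
        for (std::size_t i = 0; i < perThread; i++) {
            buffer.push(static_cast<int>(i));
        }
    };
    std::thread t1(writer);
    std::thread t2(writer);
    t1.join();
    t2.join();
    return buffer.size();
}
```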
F´ supports several deployment patterns for multi-core systems, depending on whether you are using SMP or AMP architectures.
For SMP within a single process (e.g., Linux), memory space is shared across all cores.
Characteristics:
Architecture:
```mermaid
flowchart LR
    subgraph SMP["SMP"]
        subgraph Process["Linux Process"]
            C1["C1"] --- C2["C2"]
            C2 --- C3["C3"]
            C3 --- C4["C4"]
        end
    end
    style SMP fill:#5b9bd5,stroke:#2e5c8a,color:#fff
    style Process fill:#70ad47,stroke:#4a7c2f,color:#fff
    style C1 fill:#ed7d31,stroke:#c65911,color:#fff
    style C2 fill:#ed7d31,stroke:#c65911,color:#fff
    style C3 fill:#ed7d31,stroke:#c65911,color:#fff
    style C4 fill:#ed7d31,stroke:#c65911,color:#fff
```
Use cases:
For SMP across multiple processes, use an F´ Generic Hub with a Linux named message queue or a pipe to pass data between processes.
Characteristics:
Architecture:
```mermaid
flowchart LR
    subgraph SMP["SMP"]
        direction LR
        subgraph Process1["Linux Process"]
            direction LR
            C1["C1"] --- C2["C2"] --- Hub1["Hub"]
        end
        subgraph Process2["Linux Process"]
            direction LR
            Hub2["Hub"] --- C3["C3"] --- C4["C4"]
        end
    end
    Hub1 <-->|IPC| Hub2
    style SMP fill:#5b9bd5,stroke:#2e5c8a,color:#fff
    style Process1 fill:#70ad47,stroke:#4a7c2f,color:#fff
    style Process2 fill:#70ad47,stroke:#4a7c2f,color:#fff
    style C1 fill:#ed7d31,stroke:#c65911,color:#fff
    style C2 fill:#ed7d31,stroke:#c65911,color:#fff
    style C3 fill:#ed7d31,stroke:#c65911,color:#fff
    style C4 fill:#ed7d31,stroke:#c65911,color:#fff
    style Hub1 fill:#ffc000,stroke:#d99000,color:#000
    style Hub2 fill:#ffc000,stroke:#d99000,color:#000
```
Use cases:
Implementation notes:
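A minimal sketch of the pipe transport mentioned above follows. Both ends live in one process here for brevity; a real hub deployment would exchange serialized port calls between separate processes (e.g. across a `fork()`), and `roundTrip` is a hypothetical helper, not part of the Generic Hub API:

```cpp
// Illustrative only: move bytes through a POSIX pipe, the simplest of
// the transports a Generic Hub driver could wrap for inter-process use.
#include <unistd.h>
#include <string>

// Write a message into one end of a pipe and read it back from the
// other, returning what was received.
std::string roundTrip(const std::string& msg) {
    int fds[2];
    if (pipe(fds) != 0) {
        return "";
    }
    (void) write(fds[1], msg.data(), msg.size());
    char buf[256] = {0};
    ssize_t n = read(fds[0], buf, sizeof(buf));
    close(fds[0]);
    close(fds[1]);
    return std::string(buf, n > 0 ? static_cast<std::size_t>(n) : 0);
}
```

A named message queue (`mq_open` and friends) offers the same byte-passing role with the added benefits of message boundaries and a name both processes can open independently.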
For AMP, use an F´ Generic Hub with a driver for the hypervisor-provided channel, or other middleware, to pass data between AMP partitions.
Characteristics:
Architecture:
```mermaid
flowchart LR
    subgraph Hypervisor["Hypervisor"]
        direction LR
        subgraph Partition1["Linux"]
            direction LR
            subgraph Process1["Linux Process"]
                direction LR
                C1["C1"] --- C2["C2"] --- Hub1["Hub"]
            end
        end
        subgraph Partition2["Linux"]
            direction LR
            subgraph Process2["Linux Process"]
                direction LR
                Hub2["Hub"] --- C3["C3"] --- C4["C4"]
            end
        end
    end
    Hub1 <-->|Hypervisor Channel| Hub2
    style Hypervisor fill:#a63c0f,stroke:#8a3e0c,color:#fff
    style Partition1 fill:#4472c4,stroke:#2e4c87,color:#fff
    style Partition2 fill:#4472c4,stroke:#2e4c87,color:#fff
    style Process1 fill:#70ad47,stroke:#4a7c2f,color:#fff
    style Process2 fill:#70ad47,stroke:#4a7c2f,color:#fff
    style C1 fill:#ed7d31,stroke:#c65911,color:#fff
    style C2 fill:#ed7d31,stroke:#c65911,color:#fff
    style C3 fill:#ed7d31,stroke:#c65911,color:#fff
    style C4 fill:#ed7d31,stroke:#c65911,color:#fff
    style Hub1 fill:#ffc000,stroke:#d99000,color:#000
    style Hub2 fill:#ffc000,stroke:#d99000,color:#000
```
Use cases:
Implementation notes:
When running F´ on multi-device systems (separate physical processors), users typically define a deployment for each device in the system. These deployments are then linked over the platform's inter-communication architecture. Should users want F´ execution across these deployments to appear as a single F´ deployment, they are advised to adopt the hub pattern to invoke F´ port calls across multiple devices.
This approach is similar to the multi-process and AMP patterns described above, but operates across physically separate processors rather than partitions or processes on the same processor.