tf namespace

taskflow namespace

Classes

template<typename T, unsigned N = 2> class SmallVectorclass to define a vector optimized for small array template<typename T> class Xorshiftclass to create a fast xorshift-based pseudo-random number generator template<typename T> class CachelineAlignedclass to ensure cacheline-aligned storage for an object. template<typename T> class IndexRangeclass to create an index range of integral indices with a step size class Graphclass to create a graph object class TaskParamsclass to create a task parameter object class DefaultTaskParamsclass to create an empty task parameter for compile-time optimization template<typename T> class UnboundedWSQclass to create a lock-free unbounded work-stealing queue template<typename T, size_t LogSize = TF_DEFAULT_BOUNDED_TASK_QUEUE_LOG_SIZE> class BoundedWSQclass to create a lock-free bounded work-stealing queue class FlowBuilderclass to build a task dependency graph class Subflowclass to construct a subflow graph from the execution of a dynamic task class NonblockingNotifierclass to create a non-blocking notifier class Workerclass to create a worker in an executor class WorkerViewclass to create an immutable view of a worker class WorkerInterfaceclass to configure worker behavior in an executor class Executorclass to create an executor class Taskclass to create a task handle over a taskflow node class TaskViewclass to access task information from the observer interface class AsyncTaskclass to hold a dependent asynchronous task with shared ownership class Runtimeclass to create a runtime task class TaskGroupclass to create a task group from a task class Semaphoreclass to create a semophore object for building a concurrency constraint class Taskflowclass to create a taskflow object template<typename T> class Futureclass to access the result of an execution class ObserverInterfaceclass to derive an executor observer class ChromeObserverclass to create an observer based on Chrome tracing format class TFProfObserverclass to create an observer based on the built-in taskflow profiler format class DefaultClosureWrapperclass to create a default closure wrapper template<typename C = DefaultClosureWrapper> class PartitionerBaseclass to derive a partitioner for scheduling parallel algorithms template<typename C = DefaultClosureWrapper> class GuidedPartitionerclass to create a guided partitioner for scheduling parallel algorithms template<typename C = DefaultClosureWrapper> class DynamicPartitionerclass to create a dynamic partitioner for scheduling parallel algorithms template<typename C = DefaultClosureWrapper> class StaticPartitionerclass to construct a static partitioner for scheduling parallel algorithms template<typename C = DefaultClosureWrapper> class RandomPartitionerclass to construct a random partitioner for scheduling parallel algorithms class Pipeflowclass to create a pipeflow object used by the pipe callable template<typename C = std::function<void(tf::Pipeflow&)>> class Pipeclass to create a pipe object for a pipeline stage template<typename... Ps> class Pipelineclass to create a pipeline scheduling framework template<typename P> class ScalablePipelineclass to create a scalable pipeline object template<typename Input, typename Output, typename C> class DataPipeclass to create a stage in a data-parallel pipeline template<typename... Ps> class DataPipelineclass to create a data-parallel pipeline scheduling framework class cudaScopedDeviceclass to create an RAII-styled context switch class cudaEventCreatorclass to create functors that construct CUDA events class cudaEventDeleterclass to create a functor that deletes a CUDA event template<typename Creator, typename Deleter> class cudaEventBaseclass to create a CUDA event with unique ownership class cudaStreamCreatorclass to create functors that construct CUDA streams class cudaStreamDeleterclass to create a functor that deletes a CUDA stream template<typename Creator, typename Deleter> class cudaStreamBaseclass to create a CUDA stream with unique ownership class cudaTaskclass to create a task handle of a CUDA Graph node class cudaGraphCreatorclass to create functors that construct CUDA graphs class cudaGraphDeleterclass to create a functor that deletes a CUDA graph template<typename Creator, typename Deleter> class cudaGraphBaseclass to create a CUDA graph with uunique ownership class cudaGraphExecCreatorclass to create functors for constructing executable CUDA graphs class cudaGraphExecDeleterclass to create a functor for deleting an executable CUDA graph template<typename Creator, typename Deleter> class cudaGraphExecBaseclass to create an executable CUDA graph with unique ownership

Enums

enum class TaskType: int { PLACEHOLDER = 0, STATIC, RUNTIME, SUBFLOW, CONDITION, MODULE, ASYNC, UNDEFINED }enumeration of all task typesenum class ObserverType: int { TFPROF = 0, CHROME, UNDEFINED }enumeration of all observer typesenum class PartitionerType: int { STATIC, DYNAMIC }enumeration of all partitioner typesenum class PipeType: int { PARALLEL = 1, SERIAL = 2 }enumeration of all pipe types

Typedefs

using DefaultNotifier = NonblockingNotifierthe default notifier type used by Taskflow using observer_stamp_t = std::chrono::time_point<std::chrono::steady_clock> default time point type of observers using DefaultPartitioner = GuidedPartitioner<> default partitioner set to tf::GuidedPartitioner using cudaEvent = cudaEventBase<cudaEventCreator, cudaEventDeleter> default smart pointer type to manage a cudaEvent_t object with unique ownership using cudaStream = cudaStreamBase<cudaStreamCreator, cudaStreamDeleter> default smart pointer type to manage a cudaStream_t object with unique ownership using cudaGraph = cudaGraphBase<cudaGraphCreator, cudaGraphDeleter> default smart pointer type to manage a cudaGraph_t object with unique ownership using cudaGraphExec = cudaGraphExecBase<cudaGraphExecCreator, cudaGraphExecDeleter> default smart pointer type to manage a cudaGraphExec_t object with unique ownership

Functions

template<typename T, std::enable_if_t<(std::is_unsigned_v<std::decay_t<T>> && sizeof(T)==8), void>* = nullptr> auto next_pow2(T x) -> T constexprrounds the given 64-bit unsigned integer to the nearest power of 2 template<typename T, std::enable_if_t<std::is_integral_v<std::decay_t<T>>, void>* = nullptr> auto is_pow2(const T& x) -> bool constexprchecks if the given number is a power of 2 template<size_t N> auto static_floor_log2() -> size_t constexprreturns the floor of log2(N) at compile time template<typename RandItr, typename C> auto median_of_three(RandItr l, RandItr m, RandItr r, C cmp) -> RandItrfinds the median of three numbers pointed to by iterators using the given comparator template<typename RandItr, typename C> auto pseudo_median_of_nine(RandItr beg, RandItr end, C cmp) -> RandItrfinds the pseudo median of a range of items using a spread of nine numbers template<typename Iter, typename Compare> void sort2(Iter a, Iter b, Compare comp)sorts two elements of dereferenced iterators using the given comparison function template<typename Iter, typename Compare> void sort3(Iter a, Iter b, Iter c, Compare comp)Sorts three elements of dereferenced iterators using the given comparison function. template<typename T, std::enable_if_t<std::is_integral_v<T>, void>* = nullptr> auto unique_id() -> Tgenerates a program-wide unique ID of the given type in a thread-safe manner template<typename T> void atomic_max(std::atomic<T>& v, const T& max_v) noexceptupdates an atomic variable with the maximum value template<typename T> void atomic_min(std::atomic<T>& v, const T& min_v) noexceptupdates an atomic variable with the minimum value template<typename T> auto seed() -> T noexceptgenerates a random seed based on the current system clockauto coprime(size_t N) -> size_t constexprcomputes a coprime of a given number template<size_t N> auto make_coprime_lut() -> std::array<size_t, N> constexprgenerates a compile-time array of coprimes for numbers from 0 to N-1auto get_env(const std::string& str) -> std::stringretrieves the value of an environment variableauto has_env(const std::string& str) -> boolchecks whether an environment variable is defined template<typename B, typename E, typename S> auto is_index_range_invalid(B beg, E end, S step) -> std::enable_if_t<std::is_integral_v<std::decay_t<B>> && std::is_integral_v<std::decay_t<E>> && std::is_integral_v<std::decay_t<S>>, bool> constexprchecks if the given index range is invalid template<typename B, typename E, typename S> auto distance(B beg, E end, S step) -> std::enable_if_t<std::is_integral_v<std::decay_t<B>> && std::is_integral_v<std::decay_t<E>> && std::is_integral_v<std::decay_t<S>>, size_t> constexprcalculates the number of iterations in the given index range template<typename T, typename... ArgsT> auto make_worker_interface(ArgsT && ... args) -> std::shared_ptr<T>helper function to create an instance derived from tf::WorkerInterfaceauto to_string(TaskType type) -> const char*convert a task type to a human-readable stringauto operator<<(std::ostream& os, const Task& task) -> std::ostream&overload of ostream inserter operator for Taskauto to_string(ObserverType type) -> const char*convert an observer type to a human-readable string template<typename Input, typename Output, typename C> auto make_data_pipe(PipeType d, C&& callable) -> autofunction to construct a data pipe (tf::DataPipe) template<typename T> auto make_module_task(T&& graph) -> autocreates a module task using the given graphauto cuda_get_num_devices() -> size_tqueries the number of available devicesauto cuda_get_device() -> intgets the current device associated with the caller threadvoid cuda_set_device(int id)switches to a given device contextvoid cuda_get_device_property(int i, cudaDeviceProp& p)obtains the device propertyauto cuda_get_device_property(int i) -> cudaDevicePropobtains the device propertyvoid cuda_dump_device_property(std::ostream& os, const cudaDeviceProp& p)dumps the device propertyauto cuda_get_device_max_threads_per_block(int d) -> size_tqueries the maximum threads per block on a deviceauto cuda_get_device_max_x_dim_per_block(int d) -> size_tqueries the maximum x-dimension per block on a deviceauto cuda_get_device_max_y_dim_per_block(int d) -> size_tqueries the maximum y-dimension per block on a deviceauto cuda_get_device_max_z_dim_per_block(int d) -> size_tqueries the maximum z-dimension per block on a deviceauto cuda_get_device_max_x_dim_per_grid(int d) -> size_tqueries the maximum x-dimension per grid on a deviceauto cuda_get_device_max_y_dim_per_grid(int d) -> size_tqueries the maximum y-dimension per grid on a deviceauto cuda_get_device_max_z_dim_per_grid(int d) -> size_tqueries the maximum z-dimension per grid on a deviceauto cuda_get_device_max_shm_per_block(int d) -> size_tqueries the maximum shared memory size in bytes per block on a deviceauto cuda_get_device_warp_size(int d) -> size_tqueries the warp size on a deviceauto cuda_get_device_compute_capability_major(int d) -> intqueries the major number of compute capability of a deviceauto cuda_get_device_compute_capability_minor(int d) -> intqueries the minor number of compute capability of a deviceauto cuda_get_device_unified_addressing(int d) -> boolqueries if the device supports unified addressingauto cuda_get_driver_version() -> intqueries the latest CUDA version (1000 * major + 10 * minor) supported by the driverauto cuda_get_runtime_version() -> intqueries the CUDA Runtime version (1000 * major + 10 * minor)auto cuda_get_free_mem(int d) -> size_tqueries the free memory (expensive call)auto cuda_get_total_mem(int d) -> size_tqueries the total available memory (expensive call) template<typename T> auto cuda_malloc_device(size_t N, int d) -> T*allocates memory on the given device for holding N elements of type T template<typename T> auto cuda_malloc_device(size_t N) -> T*allocates memory on the current device associated with the caller template<typename T> auto cuda_malloc_shared(size_t N) -> T*allocates shared memory for holding N elements of type T template<typename T> void cuda_free(T* ptr, int d)frees memory on the GPU device template<typename T> void cuda_free(T* ptr)frees memory on the GPU devicevoid cuda_memcpy_async(cudaStream_t stream, void* dst, const void* src, size_t count)copies data between host and device asynchronously through a streamvoid cuda_memset_async(cudaStream_t stream, void* devPtr, int value, size_t count)initializes or sets GPU memory to the given value byte by byte template<typename T, std::enable_if_t<!std::is_same_v<T, void>, void>* = nullptr> auto cuda_get_copy_parms(T* tgt, const T* src, size_t num) -> cudaMemcpy3DParmsgets the memcpy node parameter of a copy taskauto cuda_get_memcpy_parms(void* tgt, const void* src, size_t bytes) -> cudaMemcpy3DParmsgets the memcpy node parameter of a memcpy task (untyped)auto cuda_get_memset_parms(void* dst, int ch, size_t count) -> cudaMemsetParamsgets the memset node parameter of a memcpy task (untyped) template<typename T, std::enable_if_t<is_pod_v<T> && (sizeof(T)==1||sizeof(T)==2||sizeof(T)==4), void>* = nullptr> auto cuda_get_fill_parms(T* dst, T value, size_t count) -> cudaMemsetParamsgets the memset node parameter of a fill task (typed) template<typename T, std::enable_if_t<is_pod_v<T> && (sizeof(T)==1||sizeof(T)==2||sizeof(T)==4), void>* = nullptr> auto cuda_get_zero_parms(T* dst, size_t count) -> cudaMemsetParamsgets the memset node parameter of a zero task (typed)auto cuda_graph_get_num_root_nodes(cudaGraph_t graph) -> size_tqueries the number of root nodes in a native CUDA graphauto cuda_graph_get_num_nodes(cudaGraph_t graph) -> size_tqueries the number of nodes in a native CUDA graphauto cuda_graph_get_num_edges(cudaGraph_t graph, cudaGraphNode_t* from, cudaGraphNode_t* to) -> size_tHandles compatibility with CUDA <= 12.x and CUDA == 13.x.auto cuda_graph_node_get_dependencies(cudaGraphNode_t node, cudaGraphNode_t* dependencies) -> size_tHandles compatibility with CUDA <= 12.x and CUDA 13.auto cuda_graph_node_get_dependent_nodes(cudaGraphNode_t node, cudaGraphNode_t* dependent_nodes) -> size_tHandles compatibility with CUDA <= 12.x and CUDA 13.void cuda_graph_add_dependencies(cudaGraph_t graph, const cudaGraphNode_t* from, const cudaGraphNode_t* to, size_t numDependencies)Handles compatibility with CUDA <= 12.x and CUDA 13.auto cuda_graph_get_num_edges(cudaGraph_t graph) -> size_tqueries the number of edges in a native CUDA graphauto cuda_graph_get_nodes(cudaGraph_t graph) -> std::vector<cudaGraphNode_t>acquires the nodes in a native CUDA graphauto cuda_graph_get_root_nodes(cudaGraph_t graph) -> std::vector<cudaGraphNode_t>acquires the root nodes in a native CUDA graphauto cuda_graph_get_edges(cudaGraph_t graph) -> std::vector<std::pair<cudaGraphNode_t, cudaGraphNode_t>>acquires the edges in a native CUDA graphauto cuda_get_graph_node_type(cudaGraphNode_t node) -> cudaGraphNodeTypequeries the type of a native CUDA graph nodeauto to_string(cudaGraphNodeType type) -> const char* constexprconvert a cuda_task type to a human-readable stringauto operator<<(std::ostream& os, const cudaTask& ct) -> std::ostream&overload of ostream inserter operator for cudaTaskauto version() -> const char* constexprqueries the version information in a string format major.minor.patch

Variables

template<typename P> bool is_task_params_v constexprdetermines if the given type is a task parameter type template<typename T> bool has_graph_v constexprdetermines if the given type has a member function Graph& graph()std::array<TaskType, 7> TASK_TYPES constexprarray of all task types (used for iterating task types) template<typename C> bool is_static_task_v constexprdetermines if a callable is a static task template<typename C> bool is_subflow_task_v constexprdetermines if a callable is a subflow task template<typename C> bool is_runtime_task_v constexprdetermines if a callable is a runtime task template<typename C> bool is_condition_task_v constexprdetermines if a callable is a condition task template<typename C> bool is_multi_condition_task_v constexprdetermines if a callable is a multi-condition task template<typename P> bool is_partitioner_v constexprdetermines if a type is a partitioner

Enum documentation

enum class tf::TaskType: int

enumeration of all task types

Enumerators
PLACEHOLDER

placeholder task type

| | STATIC |

static task type

| | RUNTIME |

runtime task type

| | SUBFLOW |

dynamic (subflow) task type

| | CONDITION |

condition task type

| | MODULE |

module task type

| | ASYNC |

asynchronous task type

| | UNDEFINED |

undefined task type (for internal use only)

enum class tf::PartitionerType: int

enumeration of all partitioner types

Enumerators
STATIC

static partitioner type

| | DYNAMIC |

dynamic partitioner type

enum class tf::PipeType: int

enumeration of all pipe types

Enumerators
PARALLEL

parallel type

| | SERIAL |

serial type

Typedef documentation

typedef NonblockingNotifier tf::DefaultNotifier

the default notifier type used by Taskflow

By default, Taskflow uses tf::NonblockingNotifier due to its stable performance on most platforms. We do not use tf::AtomicNotifier since on some platforms and compiler versions, the atomic notification may exhibit suboptimal performance due to buggy wake-up mechanisms. These issues have been discussed in GCC bug reports and patch threads related to atomic wait/notify implementations.

using tf::DefaultPartitioner = GuidedPartitioner<>

default partitioner set to tf::GuidedPartitioner

Guided partitioning algorithm can achieve stable and decent performance for most parallel algorithms.

Function documentation

template<typename T, std::enable_if_t<(std::is_unsigned_v<std::decay_t<T>> && sizeof(T)==8), void>* = nullptr> T tf::next_pow2(T x) constexpr

rounds the given 64-bit unsigned integer to the nearest power of 2

rounds the given 32-bit unsigned integer to the nearest power of 2

template<typename T, std::enable_if_t<std::is_integral_v<std::decay_t<T>>, void>* = nullptr> bool tf::is_pow2(const T& x) constexpr

checks if the given number is a power of 2

Template parameters
T
Parameters
---
x
Returns

This function determines if the given integer is a power of 2.

template<typename RandItr, typename C> RandItr tf::median_of_three(RandItr l, RandItr m, RandItr r, C cmp)

finds the median of three numbers pointed to by iterators using the given comparator

Template parameters
RandItr
C
Parameters
---
l
m
r
cmp
Returns

This function determines the median value of the elements pointed to by three random-access iterators using the provided comparator.

template<typename RandItr, typename C> RandItr tf::pseudo_median_of_nine(RandItr beg, RandItr end, C cmp)

finds the pseudo median of a range of items using a spread of nine numbers

Template parameters
RandItr
C
Parameters
---
beg
end
cmp
Returns

This function computes an approximate median of a range of items by sampling nine values spread across the range and finding their median. It uses a combination of the median_of_three function to determine the pseudo median.

template<typename Iter, typename Compare> void tf::sort2(Iter a, Iter b, Compare comp)

sorts two elements of dereferenced iterators using the given comparison function

Template parameters
Iter
Compare
Parameters
---
a
b
comp

This function compares two elements pointed to by iterators and swaps them if they are out of order according to the provided comparator.

template<typename Iter, typename Compare> void tf::sort3(Iter a, Iter b, Iter c, Compare comp)

Sorts three elements of dereferenced iterators using the given comparison function.

Template parameters
Iter
Compare
Parameters
---
a
b
c
comp

This function sorts three elements pointed to by iterators in ascending order according to the provided comparator. The sorting is performed using a sequence of calls to the sort2 function to ensure the correct order of elements.

template<typename T, std::enable_if_t<std::is_integral_v<T>, void>* = nullptr> T tf::unique_id()

generates a program-wide unique ID of the given type in a thread-safe manner

Template parameters
T
Returns

This function provides a globally unique identifier of the specified integral type. It uses a static std::atomic counter to ensure thread safety and increments the counter in a relaxed memory ordering for efficiency.

template<typename T> void tf::atomic_max(std::atomic<T>& v, const T& max_v) noexcept

updates an atomic variable with the maximum value

Template parameters
T
Parameters
---
v
max_v

This function atomically updates the provided atomic variable v to hold the maximum of its current value and max_v. The update is performed using a relaxed memory ordering for efficiency in non-synchronizing contexts.

template<typename T> void tf::atomic_min(std::atomic<T>& v, const T& min_v) noexcept

updates an atomic variable with the minimum value

Template parameters
T
Parameters
---
v
min_v

This function atomically updates the provided atomic variable v to hold the minimum of its current value and min_v. The update is performed using a relaxed memory ordering for efficiency in non-synchronizing contexts.

template<typename T> T tf::seed() noexcept

generates a random seed based on the current system clock

Template parameters
T
Returns

This function returns a seed value derived from the number of clock ticks since the epoch as measured by the system clock. The seed can be used to initialize random number generators.

size_t tf::coprime(size_t N) constexpr

computes a coprime of a given number

Parameters
N
Returns

This function finds the largest number less than N that is coprime (i.e., has a greatest common divisor of 1) with N. If N is less than 3, it returns 1 as a default coprime.

template<size_t N> std::array<size_t, N> tf::make_coprime_lut() constexpr

generates a compile-time array of coprimes for numbers from 0 to N-1

Template parameters
N
Returns

This function constructs a constexpr array where each element at index i contains a coprime of i (the largest number less than i that is coprime to it).

std::string tf::get_env(const std::string& str)

retrieves the value of an environment variable

Parameters
str
Returns

This function fetches the value of an environment variable by name. If the variable is not found, it returns an empty string.

bool tf::has_env(const std::string& str)

checks whether an environment variable is defined

Parameters
str
Returns

This function determines if a specific environment variable exists in the current environment.

template<typename B, typename E, typename S> std::enable_if_t<std::is_integral_v<std::decay_t<B>> && std::is_integral_v<std::decay_t<E>> && std::is_integral_v<std::decay_t<S>>, bool> tf::is_index_range_invalid(B beg, E end, S step) constexpr

checks if the given index range is invalid

Template parameters
B
E
S
Parameters
---
beg
end
step
Returns

A range is considered invalid under the following conditions:

The step is zero and the begin and end values are not equal.
A positive range (begin < end) with a non-positive step.
A negative range (begin > end) with a non-negative step.

template<typename B, typename E, typename S> std::enable_if_t<std::is_integral_v<std::decay_t<B>> && std::is_integral_v<std::decay_t<E>> && std::is_integral_v<std::decay_t<S>>, size_t> tf::distance(B beg, E end, S step) constexpr

calculates the number of iterations in the given index range

Template parameters
B
E
S
Parameters
---
beg
end
step
Returns

The distance of a range represents the number of required iterations to traverse the range from the beginning index to the ending index (exclusive) with the given step size.

Example 1:

// Range: 0 to 10 with step size 2size\_t dist = distance(0, 10, 2);// Returns 5, the sequence is [0, 2, 4, 6, 8]

Example 2:

// Range: 10 to 0 with step size -2size\_t dist = distance(10, 0, -2);// Returns 5, the sequence is [10, 8, 6, 4, 2]

Example 3:

// Range: 5 to 20 with step size 5size\_t dist = distance(5, 20, 5);// Returns 3, the sequence is [5, 10, 15]

template<typename T, typename... ArgsT> std::shared_ptr<T> tf::make_worker_interface(ArgsT && ... args)

helper function to create an instance derived from tf::WorkerInterface

Template parameters
T
ArgsT
Parameters
---
args

const char* tf::to_string(TaskType type)

convert a task type to a human-readable string

The name of each task type is the litte-case string of its characters.

TaskType::PLACEHOLDER is of string placeholder
TaskType::STATIC is of string static
TaskType::RUNTIME is of string runtime
TaskType::SUBFLOW is of string subflow
TaskType::CONDITION is of string condition
TaskType::MODULE is of string module
TaskType::ASYNC is of string async

template<typename Input, typename Output, typename C> auto tf::make_data_pipe(PipeType d, C&& callable)

function to construct a data pipe (tf::DataPipe)

Template parameters
Input
Output
C

tf::make_data_pipe is a helper function to create a data pipe (tf::DataPipe) in a data-parallel pipeline (tf::DataPipeline). The first argument specifies the direction of the data pipe, either tf::PipeType::SERIAL or tf::PipeType::PARALLEL, and the second argument is a callable to invoke by the pipeline scheduler. Input and output data types are specified via template parameters, which will always be decayed by the library to its original form for storage purpose. The callable must take the input data type in its first argument and returns a value of the output data type.

tf::make\_data\_pipe\<int, std::string\>(tf::PipeType::SERIAL, [](int& input) {return std::to\_string(input + 100);});

The callable can additionally take a reference of tf::Pipeflow, which allows you to query the runtime information of a stage task, such as its line number and token number.

tf::make\_data\_pipe\<int, std::string\>(tf::PipeType::SERIAL, [](int& input, tf::Pipeflow& pf) {printf("token=%lu, line=%lu\n", pf.token(), pf.line());return std::to\_string(input + 100);});

template<typename T> auto tf::make_module_task(T&& graph)

creates a module task using the given graph

Template parameters
T
Parameters
---
graph
Returns

This example demonstrates how to create and launch multiple taskflows in parallel using modules with asynchronous tasking:

tf::Executor executor;tf::Taskflow A;tf::Taskflow B;tf::Taskflow C;tf::Taskflow D;A.emplace([](){ printf("Taskflow A\n"); }); B.emplace([](){ printf("Taskflow B\n"); }); C.emplace([](){ printf("Taskflow C\n"); }); D.emplace([](){ printf("Taskflow D\n"); }); // launch the four taskflows using asynchronous taskingexecutor.async(tf::make\_module\_task(A));executor.async(tf::make\_module\_task(B));executor.async(tf::make\_module\_task(C));executor.async(tf::make\_module\_task(D));executor.wait\_for\_all();

The module task maker, tf::make_module_task, is basically the same as tf::Taskflow::composed_of but provides a more generic interface that can be used beyond Taskflow. For instance, the following two approaches achieve the same functionality.

// approach 1: composition using composed\_oftf::Task m1 = taskflow1.composed\_of(taskflow2);// approach 2: composition using make\_module\_tasktf::Task m1 = taskflow1.emplace(tf::make\_module\_task(taskflow2));

template<typename T> T* tf::cuda_malloc_device(size_t N, int d)

allocates memory on the given device for holding N elements of type T

The function calls cudaMalloc to allocate N*sizeof(T) bytes of memory on the given device d and returns a pointer to the starting address of the device memory.

template<typename T> T* tf::cuda_malloc_device(size_t N)

allocates memory on the current device associated with the caller

The function calls malloc_device from the current device associated with the caller.

template<typename T> T* tf::cuda_malloc_shared(size_t N)

allocates shared memory for holding N elements of type T

The function calls cudaMallocManaged to allocate N*sizeof(T) bytes of memory and returns a pointer to the starting address of the shared memory.

template<typename T> void tf::cuda_free(T* ptr, int d)

frees memory on the GPU device

Template parameters
T
Parameters
---
ptr
d

This methods call cudaFree to free the memory space pointed to by ptr using the given device context.

template<typename T> void tf::cuda_free(T* ptr)

frees memory on the GPU device

Template parameters
T
Parameters
---
ptr

This methods call cudaFree to free the memory space pointed to by ptr using the current device context of the caller.

void tf::cuda_memcpy_async(cudaStream_t stream, void* dst, const void* src, size_t count)

copies data between host and device asynchronously through a stream

Parameters
stream
dst
src
count

The method calls cudaMemcpyAsync with the given stream using cudaMemcpyDefault to infer the memory space of the source and the destination pointers. The memory areas may not overlap.

void tf::cuda_memset_async(cudaStream_t stream, void* devPtr, int value, size_t count)

initializes or sets GPU memory to the given value byte by byte

Parameters
stream
devPtr
value
count

The method calls cudaMemsetAsync with the given stream to fill the first count bytes of the memory area pointed to by devPtr with the constant byte value value.

size_t tf::cuda_graph_node_get_dependencies(cudaGraphNode_t node, cudaGraphNode_t* dependencies)

Handles compatibility with CUDA <= 12.x and CUDA 13.

Parameters
node
dependencies

size_t tf::cuda_graph_node_get_dependent_nodes(cudaGraphNode_t node, cudaGraphNode_t* dependent_nodes)

Handles compatibility with CUDA <= 12.x and CUDA 13.

Parameters
node
dependent_nodes

void tf::cuda_graph_add_dependencies(cudaGraph_t graph, const cudaGraphNode_t* from, const cudaGraphNode_t* to, size_t numDependencies)

Handles compatibility with CUDA <= 12.x and CUDA 13.

Parameters
graph
from
to
numDependencies

cudaGraphNodeType tf::cuda_get_graph_node_type(cudaGraphNode_t node)

queries the type of a native CUDA graph node

valid type values are:

cudaGraphNodeTypeKernel = 0x00
cudaGraphNodeTypeMemcpy = 0x01
cudaGraphNodeTypeMemset = 0x02
cudaGraphNodeTypeHost = 0x03
cudaGraphNodeTypeGraph = 0x04
cudaGraphNodeTypeEmpty = 0x05
cudaGraphNodeTypeWaitEvent = 0x06
cudaGraphNodeTypeEventRecord = 0x07

const char* tf::version() constexpr

queries the version information in a string format major.minor.patch

Release notes are available here: https://taskflow.github.io/taskflow/Releases.html

Variable documentation

template<typename P> bool tf::is_task_params_v constexpr

determines if the given type is a task parameter type

Task parameters can be specified in one of the following types:

template<typename T> bool tf::has_graph_v constexpr

determines if the given type has a member function Graph& graph()

Template parameters
T

This trait determines if the provided type T contains a member function with the exact signature tf::Graph& graph(). It uses SFINAE and std::void_t to detect the presence of the member function and its return type.

Example usage:

struct A {tf::Graph& graph() { return my\_graph; };tf::Graph my\_graph;// other custom members to alter my\_graph};struct C {}; // No graph functionstatic\_assert(has\_graph\_v\<A\>, "A has graph()");static\_assert(!has\_graph\_v\<C\>, "C does not have graph()");

template<typename C> bool tf::is_static_task_v constexpr

determines if a callable is a static task

A static task is a callable object constructible from std::function<void()>.

template<typename C> bool tf::is_subflow_task_v constexpr

determines if a callable is a subflow task

A subflow task is a callable object constructible from std::function<void(Subflow&)>.

template<typename C> bool tf::is_runtime_task_v constexpr

determines if a callable is a runtime task

A runtime task is a callable object constructible from std::function<void(Runtime&)>.

template<typename C> bool tf::is_condition_task_v constexpr

determines if a callable is a condition task

A condition task is a callable object constructible from std::function<int()>.

template<typename C> bool tf::is_multi_condition_task_v constexpr

determines if a callable is a multi-condition task

A multi-condition task is a callable object constructible from std::function<tf::SmallVector<int>()>.

template<typename P> bool tf::is_partitioner_v constexpr

determines if a type is a partitioner

A partitioner is a derived type from tf::PartitionerBase.