4.1.071.4 KB
| | Taskflow: A General-purpose Task-parallel Programming System |

Classes | Concepts | Typedefs | Enumerations | Functions | Variables

tf Namespace Reference

taskflow namespace More...

|

Classes

| | class | AsyncTask | | | class to hold a dependent asynchronous task with shared ownership More...
| | | | class | BoundedWSQ | | | class to create a lock-free bounded work-stealing queue More...
| | | | class | CachelineAligned | | | class to ensure cacheline-aligned storage for an object. More...
| | | | class | cudaEventBase | | | class to create a CUDA event with unique ownership More...
| | | | class | cudaEventCreator | | | class to create functors that construct CUDA events More...
| | | | class | cudaEventDeleter | | | class to create a functor that deletes a CUDA event More...
| | | | class | cudaGraphBase | | | class to create a CUDA graph with unique ownership More...
| | | | class | cudaGraphCreator | | | class to create functors that construct CUDA graphs More...
| | | | class | cudaGraphDeleter | | | class to create a functor that deletes a CUDA graph More...
| | | | class | cudaGraphExecBase | | | class to create an executable CUDA graph with unique ownership More...
| | | | class | cudaGraphExecCreator | | | class to create functors for constructing executable CUDA graphs More...
| | | | class | cudaGraphExecDeleter | | | class to create a functor for deleting an executable CUDA graph More...
| | | | class | cudaScopedDevice | | | class to create an RAII-styled context switch More...
| | | | class | cudaStreamBase | | | class to create a CUDA stream with unique ownership More...
| | | | class | cudaStreamCreator | | | class to create functors that construct CUDA streams More...
| | | | class | cudaStreamDeleter | | | class to create a functor that deletes a CUDA stream More...
| | | | class | cudaTask | | | class to create a task handle of a CUDA Graph node More...
| | | | class | DataPipe | | | class to create a stage in a data-parallel pipeline More...
| | | | class | DataPipeline | | | class to create a data-parallel pipeline scheduling framework More...
| | | | class | DefaultClosureWrapper | | | class to create a default closure wrapper More...
| | | | class | DefaultTaskParams | | | class to create an empty task parameter for compile-time optimization More...
| | | | class | DynamicPartitioner | | | class to create a dynamic partitioner for scheduling parallel algorithms More...
| | | | class | Executor | | | class to create an executor More...
| | | | class | FlowBuilder | | | class to build a task dependency graph More...
| | | | class | Future | | | class to access the result of an execution More...
| | | | class | Graph | | | class to create a graph object More...
| | | | class | GuidedPartitioner | | | class to create a guided partitioner for scheduling parallel algorithms More...
| | | | class | IndexRanges | | | class to create an N-dimensional index range of integral indices More...
| | | | class | NonblockingNotifier | | | class to create a non-blocking notifier More...
| | | | class | ObjectPool | | | sharded fixed-size object allocator with a lock-free hot path More...
| | | | class | ObserverInterface | | | class to derive an executor observer More...
| | | | class | PartitionerBase | | | class to derive a partitioner for scheduling parallel algorithms More...
| | | | class | Pipe | | | class to create a pipe object for a pipeline stage More...
| | | | class | Pipeflow | | | class to create a pipeflow object used by the pipe callable More...
| | | | class | Pipeline | | | class to create a pipeline scheduling framework More...
| | | | class | RandomPartitioner | | | class to construct a random partitioner for scheduling parallel algorithms More...
| | | | class | Runtime | | | class to create a runtime task More...
| | | | class | ScalablePipeline | | | class to create a scalable pipeline object More...
| | | | class | Semaphore | | | class to create a semaphore object for building a concurrency constraint More...
| | | | class | SmallVector | | | class to define a vector optimized for small arrays More...
| | | | class | StaticPartitioner | | | class to construct a static partitioner for scheduling parallel algorithms More...
| | | | class | Subflow | | | class to construct a subflow graph from the execution of a dynamic task More...
| | | | struct | TaggedHead128 | | | tagged free-list head using a 128-bit (pointer, version) pair More...
| | | | struct | TaggedHead64 | | | tagged free-list head packed into a single 64-bit word More...
| | | | class | Task | | | class to create a task handle over a taskflow node More...
| | | | class | Taskflow | | | class to create a taskflow object More...
| | | | class | TaskGroup | | | class to create a task group from a task More...
| | | | class | TaskParams | | | class to create a task parameter object More...
| | | | class | TaskView | | | class to access task information from the observer interface More...
| | | | class | UnboundedWSQ | | | class to create a lock-free unbounded work-stealing queue More...
| | | | class | Worker | | | class to create a worker in an executor More...
| | | | class | WorkerInterface | | | class to configure worker behavior in an executor More...
| | | | class | WorkerView | | | class to create an immutable view of a worker More...
| | | | class | Xorshift | | | class to create a fast xorshift-based pseudo-random number generator More...
| | |

|

Concepts

| | concept | IndexRangesLike | | | concept to check if a type is a tf::IndexRanges, regardless of dimensionality
| | | | concept | IndexRanges1DLike | | | concept to check if a type is a tf::IndexRanges<T, 1> (i.e., tf::IndexRange<T>)
| | | | concept | IndexRangesMDLike | | | concept to check if a type is a tf::IndexRanges<T, N> with rank > 1
| | | | concept | StringLike | | | concept that determines if a type is string-like
| | | | concept | TaskParamsLike | | | determines if a type is a task parameter type
| | | | concept | GraphLike | | | concept that determines if a type owns or provides access to a tf::Graph
| | | | concept | StaticTaskLike | | | determines if a callable is a static task
| | | | concept | SubflowTaskLike | | | determines if a callable is a subflow task
| | | | concept | RuntimeTaskLike | | | determines if a callable is a runtime task
| | | | concept | ConditionTaskLike | | | determines if a callable is a condition task
| | | | concept | MultiConditionTaskLike | | | determines if a callable is a multi-condition task
| | | | concept | PartitionerLike | | | determines if a type is a partitioner
| | |

|

Typedefs

| | template<std::integral T> | | using | IndexRange = IndexRanges<T, 1> | | | alias for the common 1D case of tf::IndexRanges
| | | | using | DefaultNotifier = NonblockingNotifier | | | the default notifier type used by Taskflow
| | | | using | observer_stamp_t = std::chrono::time_point<std::chrono::steady_clock> | | | default time point type of observers
| | | | using | DefaultPartitioner = GuidedPartitioner<> | | | default partitioner set to tf::GuidedPartitioner
| | | | using | cudaEvent = cudaEventBase<cudaEventCreator, cudaEventDeleter> | | | default smart pointer type to manage a cudaEvent_t object with unique ownership
| | | | using | cudaStream = cudaStreamBase<cudaStreamCreator, cudaStreamDeleter> | | | default smart pointer type to manage a cudaStream_t object with unique ownership
| | | | using | cudaGraph = cudaGraphBase<cudaGraphCreator, cudaGraphDeleter> | | | default smart pointer type to manage a cudaGraph_t object with unique ownership
| | | | using | cudaGraphExec = cudaGraphExecBase<cudaGraphExecCreator, cudaGraphExecDeleter> | | | default smart pointer type to manage a cudaGraphExec_t object with unique ownership
| | |

|

Enumerations

| | enum class | TaskType : int {
PLACEHOLDER = 0 , STATIC , RUNTIME , SUBFLOW ,
CONDITION , MODULE , ASYNC , UNDEFINED
} | | | enumeration of all task types More...
| | | | enum class | PartitionerType : int { STATIC , DYNAMIC } | | | enumeration of all partitioner types More...
| | | | enum class | PipeType : int { PARALLEL = 1 , SERIAL = 2 } | | | enumeration of all pipe types More...
| | |

|

Functions

| | template<typename T>
requires (std::is_unsigned_v<std::decay_t<T>> && sizeof(T) == 8) | | constexpr T | next_pow2 (T x) | | | rounds the given 64-bit unsigned integer up to the next power of 2
| | | | template<typename T>
requires (std::is_unsigned_v<std::decay_t<T>> && sizeof(T) == 4) | | constexpr T | next_pow2 (T y) | | | rounds the given 32-bit unsigned integer up to the next power of 2
| | | | template<std::integral T> | | constexpr bool | is_pow2 (const T &x) | | | checks if the given number is a power of 2
| | | | template<size_t N> | | constexpr size_t | static_floor_log2 () | | | returns the floor of log2(N) at compile time
| | | | template<typename RandItr, typename C> | | RandItr | median_of_three (RandItr l, RandItr m, RandItr r, C cmp) | | | finds the median of three numbers pointed to by iterators using the given comparator
| | | | template<typename RandItr, typename C> | | RandItr | pseudo_median_of_nine (RandItr beg, RandItr end, C cmp) | | | finds the pseudo median of a range of items using a spread of nine numbers
| | | | template<typename Iter, typename Compare> | | void | sort2 (Iter a, Iter b, Compare comp) | | | sorts two elements of dereferenced iterators using the given comparison function
| | | | template<typename Iter, typename Compare> | | void | sort3 (Iter a, Iter b, Iter c, Compare comp) | | | sorts three elements of dereferenced iterators using the given comparison function
| | | | template<std::integral T> | | T | unique_id () | | | generates a program-wide unique ID of the given type in a thread-safe manner
| | | | template<typename T> | | void | atomic_max (std::atomic< T > &v, const T &max_v) noexcept | | | updates an atomic variable with the maximum value
| | | | template<typename T> | | void | atomic_min (std::atomic< T > &v, const T &min_v) noexcept | | | updates an atomic variable with the minimum value
| | | | template<typename T> | | T | seed () noexcept | | | generates a random seed based on the current system clock
| | | | constexpr size_t | coprime (size_t N) | | | computes a coprime of a given number
| | | | template<size_t N> | | constexpr std::array< size_t, N > | make_coprime_lut () | | | generates a compile-time array of coprimes for numbers from 0 to N-1
| | | | std::string | get_env (const std::string &str) | | | retrieves the value of an environment variable
| | | | bool | has_env (const std::string &str) | | | checks whether an environment variable is defined
| | | | template<std::integral T> | | constexpr bool | is_index_range_invalid (T beg, T end, T step) | | | checks if the given index range is invalid
| | | | template<std::integral T> | | constexpr size_t | distance (T beg, T end, T step) | | | calculates the number of iterations in the given index range
| | | | template<GraphLike T> | | Graph & | retrieve_graph (T &target) | | | retrieves a reference to the underlying tf::Graph from an object
| | | | template<typename T> | | constexpr auto | wsq_empty_value () | | | returns the empty sentinel for work-stealing steal operations
| | | | template<typename T, typename... ArgsT> | | std::shared_ptr< T > | make_worker_interface (ArgsT &&... args) | | | helper function to create an instance derived from tf::WorkerInterface
| | | | const char * | to_string (TaskType type) | | | converts a task type to a human-readable string
| | | | std::ostream & | operator<< (std::ostream &os, const Task &task) | | | overload of ostream inserter operator for Task
| | | | template<typename Input, typename Output, typename C> | | auto | make_data_pipe (PipeType d, C &&callable) | | | function to construct a data pipe (tf::DataPipe)
| | | | template<GraphLike T> | | auto | make_module_task (T &target) | | | creates a module task using the given graph
| | | | size_t | cuda_get_num_devices () | | | queries the number of available devices
| | | | int | cuda_get_device () | | | gets the current device associated with the caller thread
| | | | void | cuda_set_device (int id) | | | switches to a given device context
| | | | void | cuda_get_device_property (int i, cudaDeviceProp &p) | | | obtains the device property
| | | | cudaDeviceProp | cuda_get_device_property (int i) | | | obtains the device property
| | | | void | cuda_dump_device_property (std::ostream &os, const cudaDeviceProp &p) | | | dumps the device property
| | | | size_t | cuda_get_device_max_threads_per_block (int d) | | | queries the maximum threads per block on a device
| | | | size_t | cuda_get_device_max_x_dim_per_block (int d) | | | queries the maximum x-dimension per block on a device
| | | | size_t | cuda_get_device_max_y_dim_per_block (int d) | | | queries the maximum y-dimension per block on a device
| | | | size_t | cuda_get_device_max_z_dim_per_block (int d) | | | queries the maximum z-dimension per block on a device
| | | | size_t | cuda_get_device_max_x_dim_per_grid (int d) | | | queries the maximum x-dimension per grid on a device
| | | | size_t | cuda_get_device_max_y_dim_per_grid (int d) | | | queries the maximum y-dimension per grid on a device
| | | | size_t | cuda_get_device_max_z_dim_per_grid (int d) | | | queries the maximum z-dimension per grid on a device
| | | | size_t | cuda_get_device_max_shm_per_block (int d) | | | queries the maximum shared memory size in bytes per block on a device
| | | | size_t | cuda_get_device_warp_size (int d) | | | queries the warp size on a device
| | | | int | cuda_get_device_compute_capability_major (int d) | | | queries the major number of compute capability of a device
| | | | int | cuda_get_device_compute_capability_minor (int d) | | | queries the minor number of compute capability of a device
| | | | bool | cuda_get_device_unified_addressing (int d) | | | queries if the device supports unified addressing
| | | | int | cuda_get_driver_version () | | | queries the latest CUDA version (1000 * major + 10 * minor) supported by the driver
| | | | int | cuda_get_runtime_version () | | | queries the CUDA Runtime version (1000 * major + 10 * minor)
| | | | size_t | cuda_get_free_mem (int d) | | | queries the free memory (expensive call)
| | | | size_t | cuda_get_total_mem (int d) | | | queries the total available memory (expensive call)
| | | | template<typename T> | | T * | cuda_malloc_device (size_t N, int d) | | | allocates memory on the given device for holding N elements of type T
| | | | template<typename T> | | T * | cuda_malloc_device (size_t N) | | | allocates memory on the current device associated with the caller
| | | | template<typename T> | | T * | cuda_malloc_shared (size_t N) | | | allocates shared memory for holding N elements of type T
| | | | template<typename T> | | void | cuda_free (T *ptr, int d) | | | frees memory on the GPU device
| | | | template<typename T> | | void | cuda_free (T *ptr) | | | frees memory on the GPU device
| | | | void | cuda_memcpy_async (cudaStream_t stream, void *dst, const void *src, size_t count) | | | copies data between host and device asynchronously through a stream
| | | | void | cuda_memset_async (cudaStream_t stream, void *devPtr, int value, size_t count) | | | initializes or sets GPU memory to the given value byte by byte
| | | | template<typename T, std::enable_if_t<!std::is_same_v< T, void >, void > * = nullptr> | | cudaMemcpy3DParms | cuda_get_copy_parms (T *tgt, const T *src, size_t num) | | | gets the memcpy node parameter of a copy task
| | | | cudaMemcpy3DParms | cuda_get_memcpy_parms (void *tgt, const void *src, size_t bytes) | | | gets the memcpy node parameter of a memcpy task (untyped)
| | | | cudaMemsetParams | cuda_get_memset_parms (void *dst, int ch, size_t count) | | | gets the memset node parameter of a memset task (untyped)
| | | | template<typename T, std::enable_if_t< is_pod_v< T > &&(sizeof(T)==1||sizeof(T)==2||sizeof(T)==4), void > * = nullptr> | | cudaMemsetParams | cuda_get_fill_parms (T *dst, T value, size_t count) | | | gets the memset node parameter of a fill task (typed)
| | | | template<typename T, std::enable_if_t< is_pod_v< T > &&(sizeof(T)==1||sizeof(T)==2||sizeof(T)==4), void > * = nullptr> | | cudaMemsetParams | cuda_get_zero_parms (T *dst, size_t count) | | | gets the memset node parameter of a zero task (typed)
| | | | size_t | cuda_graph_get_num_root_nodes (cudaGraph_t graph) | | | queries the number of root nodes in a native CUDA graph
| | | | size_t | cuda_graph_get_num_nodes (cudaGraph_t graph) | | | queries the number of nodes in a native CUDA graph
| | | | size_t | cuda_graph_get_num_edges (cudaGraph_t graph, cudaGraphNode_t *from, cudaGraphNode_t *to) | | | Handles compatibility with CUDA <= 12.x and CUDA == 13.x.
| | | | size_t | cuda_graph_node_get_dependencies (cudaGraphNode_t node, cudaGraphNode_t *dependencies) | | | Handles compatibility with CUDA <= 12.x and CUDA 13.
| | | | size_t | cuda_graph_node_get_dependent_nodes (cudaGraphNode_t node, cudaGraphNode_t *dependent_nodes) | | | Handles compatibility with CUDA <= 12.x and CUDA 13.
| | | | void | cuda_graph_add_dependencies (cudaGraph_t graph, const cudaGraphNode_t *from, const cudaGraphNode_t *to, size_t numDependencies) | | | Handles compatibility with CUDA <= 12.x and CUDA 13.
| | | | size_t | cuda_graph_get_num_edges (cudaGraph_t graph) | | | queries the number of edges in a native CUDA graph
| | | | std::vector< cudaGraphNode_t > | cuda_graph_get_nodes (cudaGraph_t graph) | | | acquires the nodes in a native CUDA graph
| | | | std::vector< cudaGraphNode_t > | cuda_graph_get_root_nodes (cudaGraph_t graph) | | | acquires the root nodes in a native CUDA graph
| | | | std::vector< std::pair< cudaGraphNode_t, cudaGraphNode_t > > | cuda_graph_get_edges (cudaGraph_t graph) | | | acquires the edges in a native CUDA graph
| | | | cudaGraphNodeType | cuda_get_graph_node_type (cudaGraphNode_t node) | | | queries the type of a native CUDA graph node
| | | | constexpr const char * | to_string (cudaGraphNodeType type) | | | converts a cuda_task type to a human-readable string
| | | | std::ostream & | operator<< (std::ostream &os, const cudaTask &ct) | | | overload of ostream inserter operator for cudaTask
| | | | constexpr const char * | version () | | | queries the version information in a string format major.minor.patch
| | |

|

Variables

| | template<typename> | | constexpr bool | is_index_ranges_v = false | | | base type trait to detect if a type is a tf::IndexRanges
| | | | template<typename T, size_t N> | | constexpr bool | is_index_ranges_v< IndexRanges< T, N > > = true | | | specialization of the detector for tf::IndexRanges<T, N>
| | | | template<typename P> | | constexpr bool | is_task_params_v = TaskParamsLike<P> | | | determines if a type is a task parameter type (variable template)
| | | | template<typename C> | | constexpr bool | is_static_task_v = StaticTaskLike<C> | | | determines if a callable is a static task (variable template)
| | | | template<typename C> | | constexpr bool | is_subflow_task_v = SubflowTaskLike<C> | | | determines if a callable is a subflow task (variable template)
| | | | template<typename C> | | constexpr bool | is_runtime_task_v = RuntimeTaskLike<C> | | | determines if a callable is a runtime task (variable template)
| | | | template<typename C> | | constexpr bool | is_condition_task_v = ConditionTaskLike<C> | | | determines if a callable is a condition task (variable template)
| | | | template<typename C> | | constexpr bool | is_multi_condition_task_v = MultiConditionTaskLike<C> | | | determines if a callable is a multi-condition task (variable template)
| | | | template<typename P> | | constexpr bool | is_partitioner_v = PartitionerLike<P> | | | determines if a type is a partitioner (variable template)
| | |

Detailed Description

taskflow namespace

Typedef Documentation

DefaultNotifier

| using tf::DefaultNotifier = NonblockingNotifier |

the default notifier type used by Taskflow

By default, Taskflow uses tf::NonblockingNotifier due to its stable performance on most platforms. We do not use tf::AtomicNotifier since on some platforms and compiler versions, the atomic notification may exhibit suboptimal performance due to buggy wake-up mechanisms. These issues have been discussed in GCC bug reports and patch threads related to atomic wait/notify implementations.

DefaultPartitioner

| using tf::DefaultPartitioner = GuidedPartitioner<> |

default partitioner set to tf::GuidedPartitioner

The guided partitioning algorithm achieves stable and decent performance for most parallel algorithms.

IndexRange

template<std::integral T>

| using tf::IndexRange = IndexRanges<T, 1> |

alias for the common 1D case of tf::IndexRanges

Template Parameters

| T | the integral type of the indices |

tf::IndexRange<T> is equivalent to tf::IndexRanges<T, 1>. Class template argument deduction works through the alias, so the three-argument constructor can be used without an explicit template argument:

tf::IndexRange r(0, 10, 2); // deduced as tf::IndexRanges<int, 1>

tf::IndexRange<int> s(0, 100, 5); // same type, written explicitly

Enumeration Type Documentation

PartitionerType

|

| enum class tf::PartitionerType : int |

| strong |

enumeration of all partitioner types

Enumerator

| STATIC | static partitioner type |
| DYNAMIC | dynamic partitioner type |

PipeType

|

| enum class tf::PipeType : int |

| strong |

enumeration of all pipe types

Enumerator

| PARALLEL | parallel type |
| SERIAL | serial type |

TaskType

|

| enum class tf::TaskType : int |

| strong |

enumeration of all task types

Enumerator

| PLACEHOLDER | placeholder task type |
| STATIC | static task type |
| RUNTIME | runtime task type |
| SUBFLOW | dynamic (subflow) task type |
| CONDITION | condition task type |
| MODULE | module task type |
| ASYNC | asynchronous task type |
| UNDEFINED | undefined task type (for internal use only) |

Function Documentation

atomic_max()

template<typename T>

|

| void tf::atomic_max | ( | std::atomic< T > & | v, | | | | const T & | max_v ) |

| inline noexcept |

updates an atomic variable with the maximum value

Template Parameters

| T | The type of the atomic variable. Must be trivially copyable and comparable. |

Parameters

| v | The atomic variable to update. | | max_v | The value to compare with the current value of v. |

This function atomically updates the provided atomic variable v to hold the maximum of its current value and max_v. The update is performed using a relaxed memory ordering for efficiency in non-synchronizing contexts.

std::atomic<int> v{5};

tf::atomic_max(v, 10);

// v.load() == 10

Note: If multiple threads call this function concurrently, the value of v will be the maximum value seen across all threads.
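The description above does not show how the update is carried out. One common way to obtain this behavior is a compare-and-swap loop with relaxed ordering. The following is a minimal sketch under that assumption; `atomic_max_sketch` is a hypothetical name, not the library's implementation.

```cpp
#include <atomic>

// Hypothetical stand-in for tf::atomic_max; name and body are assumptions.
template <typename T>
void atomic_max_sketch(std::atomic<T>& v, const T& max_v) noexcept {
  T prev = v.load(std::memory_order_relaxed);
  // Retry while the stored value is still smaller than the candidate.
  // On failure, compare_exchange_weak reloads the current value into prev,
  // so the loop exits as soon as another thread has stored something >= max_v.
  while (prev < max_v &&
         !v.compare_exchange_weak(prev, max_v,
                                  std::memory_order_relaxed,
                                  std::memory_order_relaxed)) {
  }
}
```

Because the stored value only ever grows, a call with a smaller candidate leaves `v` unchanged.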

atomic_min()

template<typename T>

|

| void tf::atomic_min | ( | std::atomic< T > & | v, | | | | const T & | min_v ) |

| inline noexcept |

updates an atomic variable with the minimum value

Template Parameters

| T | The type of the atomic variable. Must be trivially copyable and comparable. |

Parameters

| v | The atomic variable to update. | | min_v | The value to compare with the current value of v. |

This function atomically updates the provided atomic variable v to hold the minimum of its current value and min_v. The update is performed using a relaxed memory ordering for efficiency in non-synchronizing contexts.

std::atomic<int> v{5};

tf::atomic_min(v, 2);

// v.load() == 2

Note: If multiple threads call this function concurrently, the value of v will be the minimum value seen across all threads.
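To illustrate the concurrent behavior described in the note, the sketch below lets several threads each offer one candidate and reads the result after joining them. `atomic_min_sketch` and `concurrent_min` are hypothetical helpers written for this example, and the compare-and-swap loop is an assumed implementation rather than the library's code.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical stand-in for tf::atomic_min (assumed CAS-loop implementation).
template <typename T>
void atomic_min_sketch(std::atomic<T>& v, const T& min_v) noexcept {
  T prev = v.load(std::memory_order_relaxed);
  while (prev > min_v &&
         !v.compare_exchange_weak(prev, min_v,
                                  std::memory_order_relaxed,
                                  std::memory_order_relaxed)) {
  }
}

// Thread i offers the candidate i + offset; the function returns the
// smallest value observed once every thread has joined.
int concurrent_min(int num_threads, int offset) {
  std::atomic<int> lowest{1000000};
  std::vector<std::thread> workers;
  for (int i = 0; i < num_threads; ++i) {
    workers.emplace_back([&lowest, i, offset] {
      atomic_min_sketch(lowest, i + offset);
    });
  }
  for (auto& w : workers) {
    w.join();  // join() makes every thread's update visible to the caller
  }
  return lowest.load();
}
```

Relaxed ordering is enough here because the final value is read only after `join()`; code that publishes other data through `v` would need stronger ordering.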

coprime()

|

| size_t tf::coprime | ( | size_t | N | ) | |

| constexpr |

computes a coprime of a given number

Parameters

| N | input number for which a coprime is to be found. |

Returns: the largest number < N that is coprime to N

This function finds the largest number less than N that is coprime (i.e., has a greatest common divisor of 1) with N. If N is less than 3, it returns 1 as a default coprime.

tf::coprime(10); // returns 9
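The search described above can be sketched as a downward scan using `std::gcd`. This is an illustrative reconstruction from the description, assuming C++17; `coprime_sketch` is a hypothetical name and not the library's code.

```cpp
#include <cstddef>
#include <numeric>  // std::gcd (C++17)

// Hypothetical stand-in for tf::coprime, written from the description above.
constexpr std::size_t coprime_sketch(std::size_t N) {
  if (N < 3) {
    return 1;  // documented default for small inputs
  }
  // Scan downward from N - 1 and return the first value whose gcd with N is 1.
  for (std::size_t x = N - 1; x > 0; --x) {
    if (std::gcd(x, N) == 1) {
      return x;
    }
  }
  return 1;
}

static_assert(coprime_sketch(10) == 9);
```

Since consecutive integers are always coprime, the scan in this sketch stops at its first candidate, N - 1, whenever N is at least 3.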

cuda_free() [1/2]

template<typename T>

| void tf::cuda_free | ( | T * | ptr | ) | |

frees memory on the GPU device

Template Parameters

| T | pointer type |

Parameters

| ptr | device pointer to memory to free |

This method calls cudaFree to free the memory space pointed to by ptr using the current device context of the caller.

cuda_free() [2/2]

template<typename T>

| void tf::cuda_free | ( | T * | ptr, | | | | int | d ) |

frees memory on the GPU device

Template Parameters

| T | pointer type |

Parameters

| ptr | device pointer to memory to free | | d | device context identifier |

This method calls cudaFree to free the memory space pointed to by ptr using the given device context.

cuda_get_graph_node_type()

|

| cudaGraphNodeType tf::cuda_get_graph_node_type | ( | cudaGraphNode_t | node | ) | |

| inline |

queries the type of a native CUDA graph node

valid type values are:

  • cudaGraphNodeTypeKernel = 0x00
  • cudaGraphNodeTypeMemcpy = 0x01
  • cudaGraphNodeTypeMemset = 0x02
  • cudaGraphNodeTypeHost = 0x03
  • cudaGraphNodeTypeGraph = 0x04
  • cudaGraphNodeTypeEmpty = 0x05
  • cudaGraphNodeTypeWaitEvent = 0x06
  • cudaGraphNodeTypeEventRecord = 0x07

cuda_graph_add_dependencies()

|

| void tf::cuda_graph_add_dependencies | ( | cudaGraph_t | graph, | | | | const cudaGraphNode_t * | from, | | | | const cudaGraphNode_t * | to, | | | | size_t | numDependencies ) |

| inline |

Handles compatibility with CUDA <= 12.x and CUDA 13.

Parameters

| graph | | | from | | | to | | | numDependencies | |

cuda_graph_node_get_dependencies()

|

| size_t tf::cuda_graph_node_get_dependencies | ( | cudaGraphNode_t | node, | | | | cudaGraphNode_t * | dependencies ) |

| inline |

Handles compatibility with CUDA <= 12.x and CUDA 13.

Parameters

| node | | | dependencies | |

Returns

cuda_graph_node_get_dependent_nodes()

|

| size_t tf::cuda_graph_node_get_dependent_nodes | ( | cudaGraphNode_t | node, | | | | cudaGraphNode_t * | dependent_nodes ) |

| inline |

Handles compatibility with CUDA <= 12.x and CUDA 13.

Parameters

| node | | | dependent_nodes | |

Returns

cuda_malloc_device() [1/2]

template<typename T>

| T * tf::cuda_malloc_device | ( | size_t | N | ) | |

allocates memory on the current device associated with the caller

The function calls cuda_malloc_device with the current device associated with the caller.

cuda_malloc_device() [2/2]

template<typename T>

| T * tf::cuda_malloc_device | ( | size_t | N, | | | | int | d ) |

allocates memory on the given device for holding N elements of type T

The function calls cudaMalloc to allocate N*sizeof(T) bytes of memory on the given device d and returns a pointer to the starting address of the device memory.

cuda_malloc_shared()

template<typename T>

| T * tf::cuda_malloc_shared | ( | size_t | N | ) | |

allocates shared memory for holding N elements of type T

The function calls cudaMallocManaged to allocate N*sizeof(T) bytes of memory and returns a pointer to the starting address of the shared memory.

cuda_memcpy_async()

|

| void tf::cuda_memcpy_async | ( | cudaStream_t | stream, | | | | void * | dst, | | | | const void * | src, | | | | size_t | count ) |

| inline |

copies data between host and device asynchronously through a stream

Parameters

| stream | stream identifier | | dst | destination memory address | | src | source memory address | | count | size in bytes to copy |

The method calls cudaMemcpyAsync with the given stream using cudaMemcpyDefault to infer the memory space of the source and the destination pointers. The memory areas must not overlap.

cuda_memset_async()

|

| void tf::cuda_memset_async | ( | cudaStream_t | stream, | | | | void * | devPtr, | | | | int | value, | | | | size_t | count ) |

| inline |

initializes or sets GPU memory to the given value byte by byte

Parameters

| stream | stream identifier | | devPtr | pointer to GPU memory | | value | value to set for each byte of the specified memory | | count | size in bytes to set |

The method calls cudaMemsetAsync with the given stream to fill the first count bytes of the memory area pointed to by devPtr with the constant byte value value.

distance()

template<std::integral T>

|

| size_t tf::distance | ( | T | beg, | | | | T | end, | | | | T | step ) |

| constexpr |

calculates the number of iterations in the given index range

Template Parameters

| T | integral type of the indices and step |

Parameters

| beg | starting index of the range | | end | ending index of the range | | step | step size to traverse the range |

Returns: the number of iterations required to traverse the range

The distance of a range represents the number of required iterations to traverse the range from the beginning index to the ending index (exclusive) with the given step size.

Example 1:

// Range: 0 to 10 with step size 2

size_t dist = distance(0, 10, 2); // Returns 5, the sequence is [0, 2, 4, 6, 8]

Example 2:

// Range: 10 to 0 with step size -2

size_t dist = distance(10, 0, -2); // Returns 5, the sequence is [10, 8, 6, 4, 2]

Example 3:

// Range: 5 to 20 with step size 5

size_t dist = distance(5, 20, 5); // Returns 3, the sequence is [5, 10, 15]

Attention: An invalid index range will return 0.
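The three examples amount to a ceiling division of the span by the step size. The following is a minimal sketch of that arithmetic; `distance_sketch` is a hypothetical name, the formula is an assumption derived from the examples, and overflow at the extremes of T is not handled.

```cpp
#include <cstddef>

// Hypothetical stand-in for tf::distance; name and body are assumptions.
template <typename T>
constexpr std::size_t distance_sketch(T beg, T end, T step) {
  if (step > 0 && beg < end) {
    // ceil((end - beg) / step) for an ascending range
    return static_cast<std::size_t>((end - beg + step - 1) / step);
  }
  if (step < 0 && beg > end) {
    // ceil((beg - end) / |step|) for a descending range
    return static_cast<std::size_t>((beg - end - step - 1) / (-step));
  }
  return 0;  // empty (beg == end) or invalid range
}
```

For the range 0 to 10 with step 2, the ascending branch evaluates (10 - 0 + 2 - 1) / 2 = 11 / 2, which integer division truncates to 5.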

get_env()

|

| std::string tf::get_env | ( | const std::string & | str | ) | |

| inline |

retrieves the value of an environment variable

Parameters

| str | The name of the environment variable to retrieve. |

Returns: The value of the environment variable as a string, or an empty string if not found.

This function fetches the value of an environment variable by name. If the variable is not found, it returns an empty string.

std::string path = tf::get_env("PATH");

Note: The implementation differs between Windows and POSIX platforms:

  • On Windows, it uses _dupenv_s to fetch the value.
  • On POSIX, it uses std::getenv.

has_env()

|

| bool tf::has_env | ( | const std::string & | str | ) | |

| inline |

checks whether an environment variable is defined

Parameters

| str | The name of the environment variable to check. |

Returns: true if the environment variable exists, false otherwise.

This function determines if a specific environment variable exists in the current environment.

if(tf::has_env("TF_NUM_THREADS")) {

// ...

}

Note: The implementation differs between Windows and POSIX platforms:

  • On Windows, it uses _dupenv_s to check for the variable's presence.
  • On POSIX, it uses std::getenv to check for the variable's presence.
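The POSIX branch of tf::get_env and tf::has_env can be approximated with `std::getenv` alone. The sketch below covers only that branch; the function names are hypothetical and the Windows `_dupenv_s` path is deliberately left out.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical POSIX-only stand-ins; the Windows branch is omitted.
inline bool has_env_sketch(const std::string& name) {
  // std::getenv returns a null pointer when the variable is not defined.
  return std::getenv(name.c_str()) != nullptr;
}

inline std::string get_env_sketch(const std::string& name) {
  const char* value = std::getenv(name.c_str());
  // Map "not defined" to an empty string, as the description states.
  return value ? std::string(value) : std::string();
}
```

An empty result from the getter alone cannot distinguish an undefined variable from one defined as an empty string, which is where the separate existence check is useful.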

is_index_range_invalid()

template<std::integral T>

|

| bool tf::is_index_range_invalid | ( | T | beg, | | | | T | end, | | | | T | step ) |

| constexpr |

checks if the given index range is invalid

Template Parameters

| T | integral type of the indices and step |

Parameters

| beg | starting index of the range | | end | ending index of the range | | step | step size to traverse the range |

Returns

true if the range is invalid; false otherwise.

A range is considered invalid under the following conditions:

  • The step is zero and the begin and end values are not equal.
  • A positive range (begin < end) with a non-positive step.
  • A negative range (begin > end) with a non-negative step.
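The three conditions above translate directly into a predicate. The sketch below is an assumption-laden illustration rather than the library code: the name is_index_range_invalid_sketch is hypothetical, and it uses a static_assert on std::is_integral_v instead of the std::integral constraint so that it also compiles as C++17.

```cpp
#include <type_traits>

// Hypothetical stand-in; mirrors the three documented conditions one by one.
template <typename T>
constexpr bool is_index_range_invalid_sketch(T beg, T end, T step) {
  static_assert(std::is_integral_v<T>, "indices and step must be integral");
  return (step == 0 && beg != end) ||  // zero step over a non-empty span
         (beg < end && step <= 0) ||   // positive range, non-positive step
         (beg > end && step >= 0);     // negative range, non-negative step
}

static_assert(!is_index_range_invalid_sketch(0, 10, 2));
static_assert( is_index_range_invalid_sketch(0, 10, 0));
static_assert( is_index_range_invalid_sketch(0, 10, -1));
static_assert( is_index_range_invalid_sketch(10, 0, 1));
static_assert(!is_index_range_invalid_sketch(10, 0, -2));
static_assert(!is_index_range_invalid_sketch(5, 5, 0));  // empty range is valid
```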

is_pow2()

template<std::integral T>

|

| bool tf::is_pow2 | ( | const T & | x | ) | |

| constexpr |

checks if the given number is a power of 2

Template Parameters

| T | integral type of the input |

Parameters

| x | The integer to check. |

Returns

true if x is a power of 2, otherwise false.

This function determines if the given integer is a power of 2 by testing that exactly one bit is set, i.e., (x & (x - 1)) == 0, while also excluding zero.

tf::is_pow2(8); // true

tf::is_pow2(10); // false

tf::is_pow2

constexpr bool is_pow2(const T &x)

checks if the given number is a power of 2

Definition math.hpp:92

Note

This function is constexpr and can be evaluated at compile time.
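The single-bit test can be sketched as below. The name is_pow2_sketch is hypothetical, and the x > 0 guard is an assumption chosen here to exclude zero and negative values. The parentheses around the bitwise AND matter in C++, because == binds more tightly than &.

```cpp
#include <type_traits>

// Hypothetical stand-in; not the Taskflow implementation.
template <typename T>
constexpr bool is_pow2_sketch(T x) {
  static_assert(std::is_integral_v<T>, "input must be integral");
  // Clearing the lowest set bit of a power of 2 leaves zero.
  return x > 0 && (x & (x - 1)) == 0;
}

static_assert( is_pow2_sketch(8));
static_assert(!is_pow2_sketch(10));
static_assert(!is_pow2_sketch(0));
static_assert( is_pow2_sketch(1u));
```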

make_coprime_lut()

template<size_t N>

|

| std::array< size_t, N > tf::make_coprime_lut | ( | | ) | |

| constexpr |

generates a compile-time array of coprimes for numbers from 0 to N-1

Template Parameters

| N | the size of the array to generate (should be greater than 0). |

Returns

a constexpr array of size N where each index holds a coprime of its value.

This function constructs a constexpr array where each element at index i contains a coprime of i (the largest number less than i that is coprime to it).

constexpr auto lut = tf::make_coprime_lut<8>();

// lut[5] holds a coprime of 5

tf::make_coprime_lut

constexpr std::array< size_t, N > make_coprime_lut()

generates a compile-time array of coprimes for numbers from 0 to N-1

Definition math.hpp:379
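A compile-time table of this kind can be built with std::gcd, which is constexpr since C++17. The sketch below is illustrative only: the names coprime_sketch and make_coprime_lut_sketch are hypothetical, and mapping indices below 3 to 1 is an assumption made for the example.

```cpp
#include <array>
#include <cstddef>
#include <numeric>

// Largest x < n with gcd(x, n) == 1 (hypothetical helper).
constexpr std::size_t coprime_sketch(std::size_t n) {
  if (n < 3) return 1;  // assumption: small inputs map to 1
  for (std::size_t x = n - 1; x > 0; --x) {
    if (std::gcd(x, n) == 1) return x;
  }
  return 1;
}

// Fills a table whose entry i holds a coprime of i.
template <std::size_t N>
constexpr std::array<std::size_t, N> make_coprime_lut_sketch() {
  static_assert(N > 0, "N must be greater than 0");
  std::array<std::size_t, N> lut{};
  for (std::size_t i = 0; i < N; ++i) {
    lut[i] = coprime_sketch(i);
  }
  return lut;
}

constexpr auto lut_sketch = make_coprime_lut_sketch<8>();
static_assert(lut_sketch[5] == 4);  // gcd(4, 5) == 1
static_assert(lut_sketch[6] == 5);  // gcd(5, 6) == 1
```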

make_data_pipe()

template<typename Input, typename Output, typename C>

| auto tf::make_data_pipe | ( | PipeType | d, | | | | C && | callable ) |

function to construct a data pipe (tf::DataPipe)

Template Parameters

| Input | input data type |
| Output | output data type |
| C | callable type |

tf::make_data_pipe is a helper function to create a data pipe (tf::DataPipe) in a data-parallel pipeline (tf::DataPipeline). The first argument specifies the type of the data pipe, either tf::PipeType::SERIAL or tf::PipeType::PARALLEL, and the second argument is a callable that the pipeline scheduler invokes. The input and output data types are specified via template parameters, which the library always decays to their original form for storage. The callable must take the input data type as its first argument and return a value of the output data type.

tf::make_data_pipe<int, std::string>(

tf::PipeType::SERIAL,

[](int& input) {

return std::to_string(input + 100);

}

);

tf::make_data_pipe

auto make_data_pipe(PipeType d, C &&callable)

function to construct a data pipe (tf::DataPipe)

Definition data_pipeline.hpp:171

tf::PipeType::SERIAL

@ SERIAL

serial type

Definition pipeline.hpp:117

The callable can additionally take a reference to tf::Pipeflow, which allows you to query the runtime information of a stage task, such as its line number and token number.

tf::make_data_pipe<int, std::string>(

tf::PipeType::SERIAL,

[](int& input, tf::Pipeflow& pf) {

printf("token=%zu, line=%zu\n", pf.token(), pf.line());

return std::to_string(input + 100);

}

);

tf::Pipeflow

class to create a pipeflow object used by the pipe callable

Definition pipeline.hpp:43

tf::Pipeflow::token

size_t token() const

queries the token identifier

Definition pipeline.hpp:78

tf::Pipeflow::line

size_t line() const

queries the line identifier of the present token

Definition pipeline.hpp:64

make_module_task()

template<GraphLike T>

| auto tf::Algorithm::make_module_task | ( | T & | target | ) | |

creates a module task using the given graph

Template Parameters

| T | type satisfying tf::GraphLike |

Parameters

| target | the target object used to create the module task |

Returns

a module task that can be used by Taskflow or asynchronous tasking

This example demonstrates how to create and launch multiple taskflows in parallel using modules with asynchronous tasking:

tf::Executor executor;

tf::Taskflow A;

tf::Taskflow B;

tf::Taskflow C;

tf::Taskflow D;

A.emplace([](){ printf("Taskflow A\n"); });

B.emplace([](){ printf("Taskflow B\n"); });

C.emplace([](){ printf("Taskflow C\n"); });

D.emplace([](){ printf("Taskflow D\n"); });

// launch the four taskflows using asynchronous tasking

executor.async(tf::make_module_task(A));

executor.async(tf::make_module_task(B));

executor.async(tf::make_module_task(C));

executor.async(tf::make_module_task(D));

executor.wait_for_all();

tf::Executor

class to create an executor

Definition executor.hpp:62

tf::Executor::wait_for_all

void wait_for_all()

waits for all tasks to complete

tf::Executor::async

auto async(P &&params, F &&func)

creates a parameterized asynchronous task to run the given function

tf::FlowBuilder::emplace

Task emplace(C &&callable)

creates a static task

Definition flow_builder.hpp:1571

tf::Taskflow

class to create a taskflow object

Definition taskflow.hpp:64

tf::make_module_task

auto make_module_task(T &target)

creates a module task using the given graph

Definition module.hpp:74

The module task maker, tf::make_module_task, is basically the same as tf::Taskflow::composed_of but provides a more generic interface that can be used beyond Taskflow. For instance, the following two approaches achieve the same functionality.

// approach 1: composition using composed_of

tf::Task m1 = taskflow1.composed_of(taskflow2);

// approach 2: composition using make_module_task

tf::Task m1 = taskflow1.emplace(tf::make_module_task(taskflow2));

tf::Task

class to create a task handle over a taskflow node

Definition task.hpp:569

tf::Task::composed_of

Task & composed_of(T &object)

creates a module task from a taskflow

Definition task.hpp:1290

Attention

Users are responsible for ensuring that the given target remains valid throughout its execution. The executor does not assume ownership of the target object.

make_worker_interface()

template<typename T, typename... ArgsT>

| std::shared_ptr< T > tf::make_worker_interface | ( | ArgsT &&... | args | ) | |

helper function to create an instance derived from tf::WorkerInterface

Template Parameters

| T | type derived from tf::WorkerInterface |
| ArgsT | argument types to construct T |

Parameters

| args | arguments to forward to the constructor of T |

median_of_three()

template<typename RandItr, typename C>

| RandItr tf::median_of_three | ( | RandItr | l, | | | | RandItr | m, | | | | RandItr | r, | | | | C | cmp ) |

finds the median of three numbers pointed to by iterators using the given comparator

Template Parameters

| RandItr | The type of the random-access iterator. |
| C | The type of the comparator. |

Parameters

| l | Iterator to the first element. |
| m | Iterator to the second element. |
| r | Iterator to the third element. |
| cmp | The comparator used to compare the dereferenced iterator values. |

Returns

The iterator pointing to the median value among the three elements.

This function determines the median value of the elements pointed to by three random-access iterators using the provided comparator.

std::vector<int> v = {5, 1, 3};

auto it = tf::median_of_three(v.begin(), v.begin()+1, v.begin()+2, std::less<int>{});

// *it == 3

tf::median_of_three

RandItr median_of_three(RandItr l, RandItr m, RandItr r, C cmp)

finds the median of three numbers pointed to by iterators using the given comparator

Definition math.hpp:143

next_pow2() [1/2]

template<typename T>
requires (std::is_unsigned_v<std::decay_t<T>> && sizeof(T) == 8)

|

| T tf::next_pow2 | ( | T | x | ) | |

| constexpr |

rounds the given 64-bit unsigned integer up to the next power of 2

Template Parameters

| T | 64-bit unsigned integral type |

Parameters

| x | the number to round up |

Returns

the smallest power of 2 that is greater than or equal to x

This overload participates in overload resolution only when T is an 8-byte unsigned integral type. It repeatedly fills in the lower bits of x - 1 until all bits below the highest set bit are 1, then adds 1 to obtain the next power of 2.

tf::next_pow2(uint64_t{17}); // returns 32

tf::next_pow2(uint64_t{32}); // returns 32

tf::next_pow2

constexpr T next_pow2(T x)

rounds the given 64-bit unsigned integer up to the next power of 2

Definition math.hpp:30

next_pow2() [2/2]

template<typename T>
requires (std::is_unsigned_v<std::decay_t<T>> && sizeof(T) == 4)

|

| T tf::next_pow2 | ( | T | y | ) | |

| constexpr |

rounds the given 32-bit unsigned integer up to the next power of 2

Template Parameters

| T | 32-bit unsigned integral type |

Parameters

| y | the number to round up |

Returns

the smallest power of 2 that is greater than or equal to y

This overload participates in overload resolution only when T is a 4-byte unsigned integral type. It uses the same bit-filling technique as the 64-bit overload, but only propagates bits up to the 32-bit width.

tf::next_pow2(uint32_t{17}); // returns 32

tf::next_pow2(uint32_t{32}); // returns 32
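The bit-filling technique shared by both overloads can be sketched as follows. The function name next_pow2_sketch is hypothetical, plain overloads stand in for the constrained templates, and returning 1 for an input of 0 is an assumption made for the example.

```cpp
#include <cstdint>

// 64-bit variant: smear the highest set bit of x - 1 into all lower bits.
constexpr std::uint64_t next_pow2_sketch(std::uint64_t x) {
  if (x == 0) return 1;  // assumption: treat 0 as rounding up to 1
  --x;
  x |= x >> 1;
  x |= x >> 2;
  x |= x >> 4;
  x |= x >> 8;
  x |= x >> 16;
  x |= x >> 32;
  return x + 1;
}

// 32-bit variant: same idea, but the last shift stops at 16.
constexpr std::uint32_t next_pow2_sketch(std::uint32_t y) {
  if (y == 0) return 1;  // assumption: treat 0 as rounding up to 1
  --y;
  y |= y >> 1;
  y |= y >> 2;
  y |= y >> 4;
  y |= y >> 8;
  y |= y >> 16;
  return y + 1;
}

static_assert(next_pow2_sketch(std::uint64_t{17}) == 32);
static_assert(next_pow2_sketch(std::uint64_t{32}) == 32);
static_assert(next_pow2_sketch(std::uint32_t{17}) == 32);
static_assert(next_pow2_sketch(std::uint32_t{1}) == 1);
```

Subtracting 1 before filling is what keeps exact powers of 2 unchanged: 32 becomes 31, the fill leaves 31 as is, and adding 1 restores 32.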

pseudo_median_of_nine()

template<typename RandItr, typename C>

| RandItr tf::pseudo_median_of_nine | ( | RandItr | beg, | | | | RandItr | end, | | | | C | cmp ) |

finds the pseudo median of a range of items using a spread of nine numbers

Template Parameters

| RandItr | The type of the random-access iterator. |
| C | The type of the comparator. |

Parameters

| beg | Iterator to the beginning of the range. |
| end | Iterator to the end of the range. |
| cmp | The comparator used to compare the dereferenced iterator values. |

Returns

The iterator pointing to the pseudo median of the range.

This function computes an approximate median of a range of items by sampling nine values spread across the range and finding their median. It combines several calls to median_of_three to determine the pseudo median.

std::vector<int> v = {9, 4, 1, 7, 3, 8, 2, 6, 5};

auto it = tf::pseudo_median_of_nine(v.begin(), v.end(), std::less<int>{});

tf::pseudo_median_of_nine

RandItr pseudo_median_of_nine(RandItr beg, RandItr end, C cmp)

finds the pseudo median of a range of items using a spread of nine numbers

Definition math.hpp:171

Note

The pseudo median is an approximation of the true median and may not be the exact middle value of the range.
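One common way to realize a pseudo median of nine is to take the median of three medians of three. The sketch below assumes that scheme with samples spaced at one eighth of the range; the function names, the sample spacing, and the non-empty precondition are assumptions for illustration and are not taken from the Taskflow source.

```cpp
#include <cassert>
#include <functional>
#include <iterator>
#include <vector>

// Returns the iterator whose value is the median of *l, *m, and *r.
template <typename RandItr, typename C>
RandItr median_of_three_sketch(RandItr l, RandItr m, RandItr r, C cmp) {
  return cmp(*l, *m) ? (cmp(*m, *r) ? m : (cmp(*l, *r) ? r : l))
                     : (cmp(*r, *m) ? m : (cmp(*r, *l) ? r : l));
}

// Median of three medians, each taken over three spread-out samples.
template <typename RandItr, typename C>
RandItr pseudo_median_of_nine_sketch(RandItr beg, RandItr end, C cmp) {
  assert(beg != end);  // the sketch requires a non-empty range
  const auto n = std::distance(beg, end);
  const auto offset = n / 8;  // assumed spacing between samples
  return median_of_three_sketch(
    median_of_three_sketch(beg, beg + offset, beg + offset * 2, cmp),
    median_of_three_sketch(beg + offset * 3, beg + offset * 4, beg + offset * 5, cmp),
    median_of_three_sketch(beg + offset * 6, beg + offset * 7, end - 1, cmp),
    cmp);
}

inline void pseudo_median_sketch_demo() {
  std::vector<int> v = {9, 4, 1, 7, 3, 8, 2, 6, 5};
  auto it = pseudo_median_of_nine_sketch(v.begin(), v.end(), std::less<int>{});
  // Group medians are 4 (of 9,4,1), 7 (of 7,3,8), and 5 (of 2,6,5).
  assert(*it == 5);
  (void)it;
}
```

For ranges shorter than eight elements the assumed spacing collapses to zero, so most samples coincide with the first element; the result is still a valid iterator into the range but a poor median estimate.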

retrieve_graph()

template<GraphLike T>

| Graph & tf::retrieve_graph | ( | T & | target | ) | |

retrieves a reference to the underlying tf::Graph from an object

This helper function abstracts the retrieval of a graph reference. It uses compile-time introspection to determine if the object provides a graph() member function or if it should be treated as a tf::Graph directly.

Template Parameters

| T | type satisfying the tf::GraphLike concept |

Parameters

| target | object from which to retrieve the graph |

Returns

a reference to the underlying tf::Graph

// Case 1: T has a .graph() member (composition)

struct CustomGraph1 {

tf::Graph& graph() { return _graph; }

tf::Graph _graph;

};

// Case 2: T is derived from tf::Graph (inheritance)

struct CustomGraph2 : public tf::Graph {

// ...

};

CustomGraph1 custom_graph1;

CustomGraph2 custom_graph2;

tf::Graph& g1 = tf::retrieve_graph(custom_graph1);

tf::Graph& g2 = tf::retrieve_graph(custom_graph2);

tf::Graph

class to create a graph object

Definition graph.hpp:47

tf::retrieve_graph

Graph & retrieve_graph(T &target)

retrieves a reference to the underlying tf::Graph from an object

Definition graph.hpp:1067

Note

The dispatch between the two cases is resolved at compile time via if constexpr, so it adds no runtime overhead.
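The compile-time dispatch can be illustrated without Taskflow. In the sketch below, GraphSketch is a hypothetical stand-in for tf::Graph, and the branch is selected with std::is_base_of_v so that the code compiles as C++17; the library itself is described as checking for a graph() member, so treat this as a swapped-in technique with the same observable behavior for the two cases shown.

```cpp
#include <cassert>
#include <type_traits>

struct GraphSketch {};  // hypothetical stand-in for tf::Graph

template <typename T>
GraphSketch& retrieve_graph_sketch(T& target) {
  if constexpr (std::is_base_of_v<GraphSketch, T>) {
    return target;          // T is, or derives from, the graph type
  } else {
    return target.graph();  // T exposes its graph through a member function
  }
}

// Case 1: composition.
struct ComposedSketch {
  GraphSketch& graph() { return g; }
  GraphSketch g;
};

// Case 2: inheritance.
struct DerivedSketch : GraphSketch {};

inline void retrieve_graph_sketch_demo() {
  ComposedSketch c;
  DerivedSketch d;
  assert(&retrieve_graph_sketch(c) == &c.g);
  assert(&retrieve_graph_sketch(d) == static_cast<GraphSketch*>(&d));
}
```

Only the selected branch is instantiated, so DerivedSketch does not need a graph() member and ComposedSketch does not need to be convertible to GraphSketch.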

seed()

template<typename T>

|

| T tf::seed | ( | | ) | |

| inline noexcept |

generates a random seed based on the current system clock

Template Parameters

| T | The type of the returned seed. Must be an integral type. |

Returns

A seed value based on the system clock.

This function returns a seed value derived from the number of clock ticks since the epoch as measured by the system clock. The seed can be used to initialize random number generators.

auto s = tf::seed<size_t>();

tf::Xorshift<uint64_t> rng(s);

tf::Xorshift

class to create a fast xorshift-based pseudo-random number generator

Definition math.hpp:412

tf::seed

T seed() noexcept

generates a random seed based on the current system clock

Definition math.hpp:331
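A clock-based seed of this kind can be sketched with std::chrono. The name seed_sketch is hypothetical and the cast to T is an assumption about how the tick count is narrowed; this is an illustration, not the library code.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in: ticks of the system clock since its epoch.
template <typename T>
T seed_sketch() noexcept {
  static_assert(std::is_integral_v<T>, "seed type must be integral");
  const auto ticks =
    std::chrono::system_clock::now().time_since_epoch().count();
  return static_cast<T>(ticks);  // may truncate for narrow T
}

inline void seed_sketch_demo() {
  auto s = seed_sketch<std::uint64_t>();
  assert(s != 0);  // the clock has advanced past its epoch
  (void)s;
}
```

Two calls made within the same clock tick yield the same value, so a clock-derived seed is suited to seeding generators once per run rather than to producing distinct values in a tight loop.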

sort2()

template<typename Iter, typename Compare>

| void tf::sort2 | ( | Iter | a, | | | | Iter | b, | | | | Compare | comp ) |

sorts two elements of dereferenced iterators using the given comparison function

Template Parameters

| Iter | The type of the iterator. |
| Compare | The type of the comparator. |

Parameters

| a | Iterator to the first element. |
| b | Iterator to the second element. |
| comp | The comparator used to compare the dereferenced iterator values. |

This function compares two elements pointed to by iterators and swaps them if they are out of order according to the provided comparator.

std::vector<int> v = {3, 1};

tf::sort2(v.begin(), v.begin()+1, std::less<int>{});

// v == {1, 3}

tf::sort2

void sort2(Iter a, Iter b, Compare comp)

sorts two elements of dereferenced iterators using the given comparison function

Definition math.hpp:201

sort3()

template<typename Iter, typename Compare>

| void tf::sort3 | ( | Iter | a, | | | | Iter | b, | | | | Iter | c, | | | | Compare | comp ) |

Sorts three elements of dereferenced iterators using the given comparison function.

Template Parameters

| Iter | The type of the iterator. |
| Compare | The type of the comparator. |

Parameters

| a | Iterator to the first element. |
| b | Iterator to the second element. |
| c | Iterator to the third element. |
| comp | The comparator used to compare the dereferenced iterator values. |

This function sorts three elements pointed to by iterators in ascending order according to the provided comparator. The sorting is performed using a sequence of calls to the sort2 function to ensure the correct order of elements.

std::vector<int> v = {3, 1, 2};

tf::sort3(v.begin(), v.begin()+1, v.begin()+2, std::less<int>{});

// v == {1, 2, 3}

tf::sort3

void sort3(Iter a, Iter b, Iter c, Compare comp)

Sorts three elements of dereferenced iterators using the given comparison function.

Definition math.hpp:226
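The relationship between the two helpers can be sketched as a three-step compare-and-swap network. The names sort2_sketch and sort3_sketch are hypothetical, and the particular order of the three sort2 calls is an assumption (it is one standard arrangement), not something confirmed by this page.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Swap the two elements if they are out of order under comp.
template <typename Iter, typename Compare>
void sort2_sketch(Iter a, Iter b, Compare comp) {
  if (comp(*b, *a)) {
    std::iter_swap(a, b);
  }
}

// Three compare-and-swap steps are enough to order three elements.
template <typename Iter, typename Compare>
void sort3_sketch(Iter a, Iter b, Iter c, Compare comp) {
  sort2_sketch(a, b, comp);  // now *a <= *b
  sort2_sketch(b, c, comp);  // largest element is now at c
  sort2_sketch(a, b, comp);  // order the remaining pair
}

inline void sort3_sketch_demo() {
  std::vector<int> v = {3, 1, 2};
  sort3_sketch(v.begin(), v.begin() + 1, v.begin() + 2, std::less<int>{});
  assert((v == std::vector<int>{1, 2, 3}));
}
```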

static_floor_log2()

template<size_t N>

|

| size_t tf::static_floor_log2 | ( | | ) | |

| constexpr |

returns the floor of log2(N) at compile time

Template Parameters

| N | the input value |

Returns

the largest integer k such that 2^k <= N

This function recursively halves N until it is smaller than 2, counting the number of halving steps performed, which equals the floor of the base-2 logarithm of N.

tf::static_floor_log2<16>(); // returns 4

tf::static_floor_log2<17>(); // returns 4

tf::static_floor_log2

constexpr size_t static_floor_log2()

returns the floor of log2(N) at compile time

Definition math.hpp:112
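The recursive halving can be sketched with if constexpr, which stops template instantiation once N drops below 2. The name static_floor_log2_sketch is hypothetical, and rejecting N == 0 with a static_assert is an assumption added for the example.

```cpp
#include <cstddef>

// Hypothetical stand-in: counts how many times N can be halved before it
// becomes smaller than 2.
template <std::size_t N>
constexpr std::size_t static_floor_log2_sketch() {
  static_assert(N > 0, "log2 is undefined for 0");
  if constexpr (N < 2) {
    return 0;
  } else {
    return 1 + static_floor_log2_sketch<N / 2>();
  }
}

static_assert(static_floor_log2_sketch<16>() == 4);
static_assert(static_floor_log2_sketch<17>() == 4);
static_assert(static_floor_log2_sketch<1>() == 0);
```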

to_string()

|

| const char * tf::to_string | ( | TaskType | type | ) | |

| inline |

converts a task type to a human-readable string

The name of each task type is the lower-case string of its characters.

unique_id()

template<std::integral T>

| T tf::unique_id | ( | | ) | |

generates a program-wide unique ID of the given type in a thread-safe manner

Template Parameters

| T | integral type of the ID to generate |

Returns

A unique ID of type T.

This function provides a globally unique identifier of the specified integral type. It uses a static std::atomic counter to ensure thread safety and increments the counter with relaxed memory ordering for efficiency.

size_t id1 = tf::unique_id<size_t>();

size_t id2 = tf::unique_id<size_t>();

// id1 != id2

tf::unique_id

T unique_id()

generates a program-wide unique ID of the given type in a thread-safe manner

Definition math.hpp:252

Note

The uniqueness of the ID is guaranteed only within the program's lifetime. The function does not throw exceptions.
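The counter-based scheme described above can be sketched as follows. The name unique_id_sketch and the starting value of 0 are assumptions for illustration. Relaxed ordering is sufficient here because fetch_add is still a single atomic read-modify-write, so no two callers can observe the same value even though no other memory is synchronized.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in: one function-local static counter per type T.
template <typename T>
T unique_id_sketch() {
  static_assert(std::is_integral_v<T>, "ID type must be integral");
  static std::atomic<T> counter{0};
  return counter.fetch_add(1, std::memory_order_relaxed);
}

inline void unique_id_sketch_demo() {
  std::size_t id1 = unique_id_sketch<std::size_t>();
  std::size_t id2 = unique_id_sketch<std::size_t>();
  assert(id1 != id2);
  (void)id1;
  (void)id2;
}
```

In this sketch each instantiation has its own counter, so IDs are unique per type T rather than across all types, and a narrow T wraps around once its range is exhausted.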

version()

|

| const char * tf::version | ( | | ) | |

| constexpr |

queries the version information in a string format major.minor.patch

Release notes are available here: https://taskflow.github.io/taskflow/Releases.html

wsq_empty_value()

template<typename T>

|

| auto tf::wsq_empty_value | ( | | ) | |

| constexpr |

returns the empty sentinel for work-stealing steal operations

For pointer types T, returns nullptr. For non-pointer types, returns std::nullopt. A steal operation returning this value means no task could be stolen (either the queue was empty or the CAS was lost to another thief).
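The pointer versus non-pointer split can be sketched with if constexpr. The name wsq_empty_value_sketch is hypothetical, and wrapping std::nullopt in std::optional<T> for non-pointer types is an assumption about the concrete return type, which this page leaves as auto.

```cpp
#include <optional>
#include <type_traits>

// Hypothetical stand-in for the empty sentinel of a steal operation.
template <typename T>
constexpr auto wsq_empty_value_sketch() {
  if constexpr (std::is_pointer_v<T>) {
    return T{nullptr};                   // pointers use null as "nothing stolen"
  } else {
    return std::optional<T>{std::nullopt};  // values use an empty optional
  }
}

static_assert(wsq_empty_value_sketch<int*>() == nullptr);
static_assert(!wsq_empty_value_sketch<int>().has_value());
```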

Variable Documentation

is_condition_task_v

template<typename C>

|

| bool tf::is_condition_task_v = ConditionTaskLike<C> |

| constexpr |

determines if a callable is a condition task (variable template)

Template Parameters

| C | callable type to check |

Equivalent to tf::ConditionTaskLike<C>.

is_index_ranges_v

template<typename>

|

| bool tf::is_index_ranges_v = false |

| constexpr |

base type trait to detect if a type is a tf::IndexRanges

Template Parameters

| T | The type to inspect. |

is_index_ranges_v< IndexRanges< T, N > >

template<typename T, size_t N>

|

| bool tf::is_index_ranges_v< IndexRanges< T, N > > = true |

| constexpr |

specialization of the detector for tf::IndexRanges<T, N>

Matches an IndexRanges of ANY dimensionality (1D, 2D, 3D, etc.).

Template Parameters

| T | the underlying coordinate type (e.g., size_t, int) |
| N | the number of dimensions |
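The primary template plus partial specialization pattern used by this detector can be sketched as follows. IndexRangesSketch is a hypothetical stand-in for tf::IndexRanges, and the variable template name is likewise made up for the example.

```cpp
#include <cstddef>

// Hypothetical stand-in for tf::IndexRanges<T, N>.
template <typename T, std::size_t N>
struct IndexRangesSketch {};

// Primary template: any type is not an index range by default.
template <typename>
constexpr bool is_index_ranges_sketch_v = false;

// Partial specialization: matches every IndexRangesSketch<T, N>.
template <typename T, std::size_t N>
constexpr bool is_index_ranges_sketch_v<IndexRangesSketch<T, N>> = true;

static_assert( is_index_ranges_sketch_v<IndexRangesSketch<int, 1>>);
static_assert( is_index_ranges_sketch_v<IndexRangesSketch<std::size_t, 3>>);
static_assert(!is_index_ranges_sketch_v<int>);
```

Because the specialization matches the exact type, cv-qualified or reference types fall back to the primary template in this sketch; callers would strip qualifiers first if they need that case to match.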

is_multi_condition_task_v

template<typename C>

|

| bool tf::is_multi_condition_task_v = MultiConditionTaskLike<C> |

| constexpr |

determines if a callable is a multi-condition task (variable template)

Template Parameters

| C | callable type to check |

Equivalent to tf::MultiConditionTaskLike<C>.

is_partitioner_v

template<typename P>

|

| bool tf::is_partitioner_v = PartitionerLike<P> |

| inline constexpr |

determines if a type is a partitioner (variable template)

Template Parameters

| P | type to check |

Equivalent to tf::PartitionerLike<P>. Provided for backward compatibility.

is_runtime_task_v

template<typename C>

|

| bool tf::is_runtime_task_v = RuntimeTaskLike<C> |

| constexpr |

determines if a callable is a runtime task (variable template)

Template Parameters

| C | callable type to check |

Equivalent to tf::RuntimeTaskLike<C>.

is_static_task_v

template<typename C>

|

| bool tf::is_static_task_v = StaticTaskLike<C> |

| constexpr |

determines if a callable is a static task (variable template)

Template Parameters

| C | callable type to check |

Equivalent to tf::StaticTaskLike<C>.

is_subflow_task_v

template<typename C>

|

| bool tf::is_subflow_task_v = SubflowTaskLike<C> |

| constexpr |

determines if a callable is a subflow task (variable template)

Template Parameters

| C | callable type to check |

Equivalent to tf::SubflowTaskLike<C>.

is_task_params_v

template<typename P>

|

| bool tf::is_task_params_v = TaskParamsLike<P> |

| constexpr |

determines if a type is a task parameter type (variable template)

Template Parameters

| P | type to check |

Equivalent to tf::TaskParamsLike<P>. Provided for backward compatibility.