taskflow namespace
- template<typename T, unsigned N = 2> class SmallVector - class to define a vector optimized for small arrays
- template<typename T> class Xorshift - class to create a fast xorshift-based pseudo-random number generator
- template<typename T> class CachelineAligned - class to ensure cacheline-aligned storage for an object
- template<typename T> class IndexRange - class to create an index range of integral indices with a step size
- class Graph - class to create a graph object
- class TaskParams - class to create a task parameter object
- class DefaultTaskParams - class to create an empty task parameter for compile-time optimization
- template<typename T> class UnboundedWSQ - class to create a lock-free unbounded work-stealing queue
- template<typename T, size_t LogSize = TF_DEFAULT_BOUNDED_TASK_QUEUE_LOG_SIZE> class BoundedWSQ - class to create a lock-free bounded work-stealing queue
- class FlowBuilder - class to build a task dependency graph
- class Subflow - class to construct a subflow graph from the execution of a dynamic task
- class NonblockingNotifier - class to create a non-blocking notifier
- class Worker - class to create a worker in an executor
- class WorkerView - class to create an immutable view of a worker
- class WorkerInterface - class to configure worker behavior in an executor
- class Executor - class to create an executor
- class Task - class to create a task handle over a taskflow node
- class TaskView - class to access task information from the observer interface
- class AsyncTask - class to hold a dependent asynchronous task with shared ownership
- class Runtime - class to create a runtime task
- class TaskGroup - class to create a task group from a task
- class Semaphore - class to create a semaphore object for building a concurrency constraint
- class Taskflow - class to create a taskflow object
- template<typename T> class Future - class to access the result of an execution
- class ObserverInterface - class to derive an executor observer
- class ChromeObserver - class to create an observer based on the Chrome tracing format
- class TFProfObserver - class to create an observer based on the built-in taskflow profiler format
- class DefaultClosureWrapper - class to create a default closure wrapper
- template<typename C = DefaultClosureWrapper> class PartitionerBase - class to derive a partitioner for scheduling parallel algorithms
- template<typename C = DefaultClosureWrapper> class GuidedPartitioner - class to create a guided partitioner for scheduling parallel algorithms
- template<typename C = DefaultClosureWrapper> class DynamicPartitioner - class to create a dynamic partitioner for scheduling parallel algorithms
- template<typename C = DefaultClosureWrapper> class StaticPartitioner - class to construct a static partitioner for scheduling parallel algorithms
- template<typename C = DefaultClosureWrapper> class RandomPartitioner - class to construct a random partitioner for scheduling parallel algorithms
- class Pipeflow - class to create a pipeflow object used by the pipe callable
- template<typename C = std::function<void(tf::Pipeflow&)>> class Pipe - class to create a pipe object for a pipeline stage
- template<typename... Ps> class Pipeline - class to create a pipeline scheduling framework
- template<typename P> class ScalablePipeline - class to create a scalable pipeline object
- template<typename Input, typename Output, typename C> class DataPipe - class to create a stage in a data-parallel pipeline
- template<typename... Ps> class DataPipeline - class to create a data-parallel pipeline scheduling framework
- class cudaScopedDevice - class to create an RAII-styled context switch
- class cudaEventCreator - class to create functors that construct CUDA events
- class cudaEventDeleter - class to create a functor that deletes a CUDA event
- template<typename Creator, typename Deleter> class cudaEventBase - class to create a CUDA event with unique ownership
- class cudaStreamCreator - class to create functors that construct CUDA streams
- class cudaStreamDeleter - class to create a functor that deletes a CUDA stream
- template<typename Creator, typename Deleter> class cudaStreamBase - class to create a CUDA stream with unique ownership
- class cudaTask - class to create a task handle of a CUDA Graph node
- class cudaGraphCreator - class to create functors that construct CUDA graphs
- class cudaGraphDeleter - class to create a functor that deletes a CUDA graph
- template<typename Creator, typename Deleter> class cudaGraphBase - class to create a CUDA graph with unique ownership
- class cudaGraphExecCreator - class to create functors for constructing executable CUDA graphs
- class cudaGraphExecDeleter - class to create a functor for deleting an executable CUDA graph
- template<typename Creator, typename Deleter> class cudaGraphExecBase - class to create an executable CUDA graph with unique ownership
- enum class TaskType : int { PLACEHOLDER = 0, STATIC, RUNTIME, SUBFLOW, CONDITION, MODULE, ASYNC, UNDEFINED } - enumeration of all task types
- enum class ObserverType : int { TFPROF = 0, CHROME, UNDEFINED } - enumeration of all observer types
- enum class PartitionerType : int { STATIC, DYNAMIC } - enumeration of all partitioner types
- enum class PipeType : int { PARALLEL = 1, SERIAL = 2 } - enumeration of all pipe types
- using DefaultNotifier = NonblockingNotifier - the default notifier type used by Taskflow
- using observer_stamp_t = std::chrono::time_point<std::chrono::steady_clock> - default time point type of observers
- using DefaultPartitioner = GuidedPartitioner<> - default partitioner set to tf::GuidedPartitioner
- using cudaEvent = cudaEventBase<cudaEventCreator, cudaEventDeleter> - default smart pointer type to manage a cudaEvent_t object with unique ownership
- using cudaStream = cudaStreamBase<cudaStreamCreator, cudaStreamDeleter> - default smart pointer type to manage a cudaStream_t object with unique ownership
- using cudaGraph = cudaGraphBase<cudaGraphCreator, cudaGraphDeleter> - default smart pointer type to manage a cudaGraph_t object with unique ownership
- using cudaGraphExec = cudaGraphExecBase<cudaGraphExecCreator, cudaGraphExecDeleter> - default smart pointer type to manage a cudaGraphExec_t object with unique ownership
template<typename T, std::enable_if_t<(std::is_unsigned_v<std::decay_t<T>> && sizeof(T)==8), void>* = nullptr>
auto next_pow2(T x) -> T constexpr - rounds the given 64-bit unsigned integer to the nearest power of 2

template<typename T, std::enable_if_t<std::is_integral_v<std::decay_t<T>>, void>* = nullptr>
auto is_pow2(const T& x) -> bool constexpr - checks if the given number is a power of 2

template<size_t N>
auto static_floor_log2() -> size_t constexpr - returns the floor of log2(N) at compile time

template<typename RandItr, typename C>
auto median_of_three(RandItr l, RandItr m, RandItr r, C cmp) -> RandItr - finds the median of three numbers pointed to by iterators using the given comparator

template<typename RandItr, typename C>
auto pseudo_median_of_nine(RandItr beg, RandItr end, C cmp) -> RandItr - finds the pseudo median of a range of items using a spread of nine numbers

template<typename Iter, typename Compare>
void sort2(Iter a, Iter b, Compare comp) - sorts two elements of dereferenced iterators using the given comparison function

template<typename Iter, typename Compare>
void sort3(Iter a, Iter b, Iter c, Compare comp) - sorts three elements of dereferenced iterators using the given comparison function

template<typename T, std::enable_if_t<std::is_integral_v<T>, void>* = nullptr>
auto unique_id() -> T - generates a program-wide unique ID of the given type in a thread-safe manner

template<typename T>
void atomic_max(std::atomic<T>& v, const T& max_v) noexcept - updates an atomic variable with the maximum value

template<typename T>
void atomic_min(std::atomic<T>& v, const T& min_v) noexcept - updates an atomic variable with the minimum value

template<typename T>
auto seed() -> T noexcept - generates a random seed based on the current system clock

auto coprime(size_t N) -> size_t constexpr - computes a coprime of a given number

template<size_t N>
auto make_coprime_lut() -> std::array<size_t, N> constexpr - generates a compile-time array of coprimes for numbers from 0 to N-1

auto get_env(const std::string& str) -> std::string - retrieves the value of an environment variable

auto has_env(const std::string& str) -> bool - checks whether an environment variable is defined

template<typename B, typename E, typename S>
auto is_index_range_invalid(B beg, E end, S step) -> std::enable_if_t<std::is_integral_v<std::decay_t<B>> && std::is_integral_v<std::decay_t<E>> && std::is_integral_v<std::decay_t<S>>, bool> constexpr - checks if the given index range is invalid

template<typename B, typename E, typename S>
auto distance(B beg, E end, S step) -> std::enable_if_t<std::is_integral_v<std::decay_t<B>> && std::is_integral_v<std::decay_t<E>> && std::is_integral_v<std::decay_t<S>>, size_t> constexpr - calculates the number of iterations in the given index range

template<typename T, typename... ArgsT>
auto make_worker_interface(ArgsT && ... args) -> std::shared_ptr<T> - helper function to create an instance derived from tf::WorkerInterface

auto to_string(TaskType type) -> const char* - converts a task type to a human-readable string

auto operator<<(std::ostream& os, const Task& task) -> std::ostream& - overload of the ostream inserter operator for Task

auto to_string(ObserverType type) -> const char* - converts an observer type to a human-readable string
template<typename Input, typename Output, typename C>
auto make_data_pipe(PipeType d, C&& callable) -> auto - function to construct a data pipe (tf::DataPipe)

template<typename T>
auto make_module_task(T&& graph) -> auto - creates a module task using the given graph

- auto cuda_get_num_devices() -> size_t - queries the number of available devices
- auto cuda_get_device() -> int - gets the current device associated with the caller thread
- void cuda_set_device(int id) - switches to a given device context
- void cuda_get_device_property(int i, cudaDeviceProp& p) - obtains the device property
- auto cuda_get_device_property(int i) -> cudaDeviceProp - obtains the device property
- void cuda_dump_device_property(std::ostream& os, const cudaDeviceProp& p) - dumps the device property
- auto cuda_get_device_max_threads_per_block(int d) -> size_t - queries the maximum threads per block on a device
- auto cuda_get_device_max_x_dim_per_block(int d) -> size_t - queries the maximum x-dimension per block on a device
- auto cuda_get_device_max_y_dim_per_block(int d) -> size_t - queries the maximum y-dimension per block on a device
- auto cuda_get_device_max_z_dim_per_block(int d) -> size_t - queries the maximum z-dimension per block on a device
- auto cuda_get_device_max_x_dim_per_grid(int d) -> size_t - queries the maximum x-dimension per grid on a device
- auto cuda_get_device_max_y_dim_per_grid(int d) -> size_t - queries the maximum y-dimension per grid on a device
- auto cuda_get_device_max_z_dim_per_grid(int d) -> size_t - queries the maximum z-dimension per grid on a device
- auto cuda_get_device_max_shm_per_block(int d) -> size_t - queries the maximum shared memory size in bytes per block on a device
- auto cuda_get_device_warp_size(int d) -> size_t - queries the warp size on a device
- auto cuda_get_device_compute_capability_major(int d) -> int - queries the major number of compute capability of a device
- auto cuda_get_device_compute_capability_minor(int d) -> int - queries the minor number of compute capability of a device
- auto cuda_get_device_unified_addressing(int d) -> bool - queries if the device supports unified addressing
- auto cuda_get_driver_version() -> int - queries the latest CUDA version (1000 * major + 10 * minor) supported by the driver
- auto cuda_get_runtime_version() -> int - queries the CUDA Runtime version (1000 * major + 10 * minor)
- auto cuda_get_free_mem(int d) -> size_t - queries the free memory (expensive call)
- auto cuda_get_total_mem(int d) -> size_t - queries the total available memory (expensive call)
template<typename T>
auto cuda_malloc_device(size_t N, int d) -> T* - allocates memory on the given device for holding N elements of type T

template<typename T>
auto cuda_malloc_device(size_t N) -> T* - allocates memory on the current device associated with the caller

template<typename T>
auto cuda_malloc_shared(size_t N) -> T* - allocates shared memory for holding N elements of type T

template<typename T>
void cuda_free(T* ptr, int d) - frees memory on the given GPU device

template<typename T>
void cuda_free(T* ptr) - frees memory on the current GPU device

- void cuda_memcpy_async(cudaStream_t stream, void* dst, const void* src, size_t count) - copies data between host and device asynchronously through a stream
- void cuda_memset_async(cudaStream_t stream, void* devPtr, int value, size_t count) - initializes or sets GPU memory to the given value byte by byte

template<typename T, std::enable_if_t<!std::is_same_v<T, void>, void>* = nullptr>
auto cuda_get_copy_parms(T* tgt, const T* src, size_t num) -> cudaMemcpy3DParms - gets the memcpy node parameter of a copy task

- auto cuda_get_memcpy_parms(void* tgt, const void* src, size_t bytes) -> cudaMemcpy3DParms - gets the memcpy node parameter of a memcpy task (untyped)
- auto cuda_get_memset_parms(void* dst, int ch, size_t count) -> cudaMemsetParams - gets the memset node parameter of a memset task (untyped)

template<typename T, std::enable_if_t<is_pod_v<T> && (sizeof(T)==1||sizeof(T)==2||sizeof(T)==4), void>* = nullptr>
auto cuda_get_fill_parms(T* dst, T value, size_t count) -> cudaMemsetParams - gets the memset node parameter of a fill task (typed)

template<typename T, std::enable_if_t<is_pod_v<T> && (sizeof(T)==1||sizeof(T)==2||sizeof(T)==4), void>* = nullptr>
auto cuda_get_zero_parms(T* dst, size_t count) -> cudaMemsetParams - gets the memset node parameter of a zero task (typed)

- auto cuda_graph_get_num_root_nodes(cudaGraph_t graph) -> size_t - queries the number of root nodes in a native CUDA graph
- auto cuda_graph_get_num_nodes(cudaGraph_t graph) -> size_t - queries the number of nodes in a native CUDA graph
- auto cuda_graph_get_num_edges(cudaGraph_t graph, cudaGraphNode_t* from, cudaGraphNode_t* to) -> size_t - handles compatibility with CUDA <= 12.x and CUDA 13
- auto cuda_graph_node_get_dependencies(cudaGraphNode_t node, cudaGraphNode_t* dependencies) -> size_t - handles compatibility with CUDA <= 12.x and CUDA 13
- auto cuda_graph_node_get_dependent_nodes(cudaGraphNode_t node, cudaGraphNode_t* dependent_nodes) -> size_t - handles compatibility with CUDA <= 12.x and CUDA 13
- void cuda_graph_add_dependencies(cudaGraph_t graph, const cudaGraphNode_t* from, const cudaGraphNode_t* to, size_t numDependencies) - handles compatibility with CUDA <= 12.x and CUDA 13
- auto cuda_graph_get_num_edges(cudaGraph_t graph) -> size_t - queries the number of edges in a native CUDA graph
- auto cuda_graph_get_nodes(cudaGraph_t graph) -> std::vector<cudaGraphNode_t> - acquires the nodes in a native CUDA graph
- auto cuda_graph_get_root_nodes(cudaGraph_t graph) -> std::vector<cudaGraphNode_t> - acquires the root nodes in a native CUDA graph
- auto cuda_graph_get_edges(cudaGraph_t graph) -> std::vector<std::pair<cudaGraphNode_t, cudaGraphNode_t>> - acquires the edges in a native CUDA graph
- auto cuda_get_graph_node_type(cudaGraphNode_t node) -> cudaGraphNodeType - queries the type of a native CUDA graph node
- auto to_string(cudaGraphNodeType type) -> const char* constexpr - converts a CUDA graph node type to a human-readable string
- auto operator<<(std::ostream& os, const cudaTask& ct) -> std::ostream& - overload of the ostream inserter operator for cudaTask
- auto version() -> const char* constexpr - queries the version information in a string format major.minor.patch
template<typename P>
bool is_task_params_v constexpr - determines if the given type is a task parameter type

template<typename T>
bool has_graph_v constexpr - determines if the given type has a member function Graph& graph()

std::array<TaskType, 7> TASK_TYPES constexpr - array of all task types (used for iterating task types)

template<typename C>
bool is_static_task_v constexpr - determines if a callable is a static task

template<typename C>
bool is_subflow_task_v constexpr - determines if a callable is a subflow task

template<typename C>
bool is_runtime_task_v constexpr - determines if a callable is a runtime task

template<typename C>
bool is_condition_task_v constexpr - determines if a callable is a condition task

template<typename C>
bool is_multi_condition_task_v constexpr - determines if a callable is a multi-condition task

template<typename P>
bool is_partitioner_v constexpr - determines if a type is a partitioner
enumeration of all task types

| Enumerator | Description |
|---|---|
| PLACEHOLDER | placeholder task type |
| STATIC | static task type |
| RUNTIME | runtime task type |
| SUBFLOW | dynamic (subflow) task type |
| CONDITION | condition task type |
| MODULE | module task type |
| ASYNC | asynchronous task type |
| UNDEFINED | undefined task type (for internal use only) |
enumeration of all partitioner types

| Enumerator | Description |
|---|---|
| STATIC | static partitioner type |
| DYNAMIC | dynamic partitioner type |
enumeration of all pipe types

| Enumerator | Description |
|---|---|
| PARALLEL | parallel type |
| SERIAL | serial type |
the default notifier type used by Taskflow
By default, Taskflow uses tf::NonblockingNotifier due to its stable performance on most platforms. We do not use tf::AtomicNotifier since on some platforms and compiler versions, the atomic notification may exhibit suboptimal performance due to buggy wake-up mechanisms. These issues have been discussed in GCC bug reports and patch threads related to atomic wait/notify implementations.
default partitioner set to tf::GuidedPartitioner
The guided partitioning algorithm achieves stable and decent performance for most parallel algorithms.
rounds the given 64-bit unsigned integer to the nearest power of 2
rounds the given 32-bit unsigned integer to the nearest power of 2
checks if the given number is a power of 2
| Template parameters |
|---|
| T |
| Parameters |
| --- |
| x |
| Returns |
This function determines if the given integer is a power of 2.
finds the median of three numbers pointed to by iterators using the given comparator
| Template parameters |
|---|
| RandItr |
| C |
| Parameters |
| --- |
| l |
| m |
| r |
| cmp |
| Returns |
This function determines the median value of the elements pointed to by three random-access iterators using the provided comparator.
finds the pseudo median of a range of items using a spread of nine numbers
| Template parameters |
|---|
| RandItr |
| C |
| Parameters |
| --- |
| beg |
| end |
| cmp |
| Returns |
This function computes an approximate median of a range of items by sampling nine values spread across the range and finding their median. It applies the median_of_three function repeatedly to determine the pseudo median.
sorts two elements of dereferenced iterators using the given comparison function
| Template parameters |
|---|
| Iter |
| Compare |
| Parameters |
| --- |
| a |
| b |
| comp |
This function compares two elements pointed to by iterators and swaps them if they are out of order according to the provided comparator.
Sorts three elements of dereferenced iterators using the given comparison function.
| Template parameters |
|---|
| Iter |
| Compare |
| Parameters |
| --- |
| a |
| b |
| c |
| comp |
This function sorts three elements pointed to by iterators in ascending order according to the provided comparator. The sorting is performed using a sequence of calls to the sort2 function to ensure the correct order of elements.
generates a program-wide unique ID of the given type in a thread-safe manner
| Template parameters |
|---|
| T |
| Returns |
This function provides a globally unique identifier of the specified integral type. It uses a static std::atomic counter to ensure thread safety and increments the counter in a relaxed memory ordering for efficiency.
updates an atomic variable with the maximum value
| Template parameters |
|---|
| T |
| Parameters |
| --- |
| v |
| max_v |
This function atomically updates the provided atomic variable v to hold the maximum of its current value and max_v. The update is performed using a relaxed memory ordering for efficiency in non-synchronizing contexts.
updates an atomic variable with the minimum value
| Template parameters |
|---|
| T |
| Parameters |
| --- |
| v |
| min_v |
This function atomically updates the provided atomic variable v to hold the minimum of its current value and min_v. The update is performed using a relaxed memory ordering for efficiency in non-synchronizing contexts.
generates a random seed based on the current system clock
| Template parameters |
|---|
| T |
| Returns |
This function returns a seed value derived from the number of clock ticks since the epoch as measured by the system clock. The seed can be used to initialize random number generators.
computes a coprime of a given number
| Parameters |
|---|
| N |
| Returns |
This function finds the largest number less than N that is coprime (i.e., has a greatest common divisor of 1) with N. If N is less than 3, it returns 1 as a default coprime.
generates a compile-time array of coprimes for numbers from 0 to N-1
| Template parameters |
|---|
| N |
| Returns |
This function constructs a constexpr array where each element at index i contains a coprime of i (the largest number less than i that is coprime to it).
retrieves the value of an environment variable
| Parameters |
|---|
| str |
| Returns |
This function fetches the value of an environment variable by name. If the variable is not found, it returns an empty string.
checks whether an environment variable is defined
| Parameters |
|---|
| str |
| Returns |
This function determines if a specific environment variable exists in the current environment.
checks if the given index range is invalid
| Template parameters |
|---|
| B |
| E |
| S |
| Parameters |
| --- |
| beg |
| end |
| step |
| Returns |
A range is considered invalid under any of the following conditions:

- the step size is zero
- the beginning index is smaller than the ending index while the step size is negative
- the beginning index is larger than the ending index while the step size is positive
calculates the number of iterations in the given index range
| Template parameters |
|---|
| B |
| E |
| S |
| Parameters |
| --- |
| beg |
| end |
| step |
| Returns |
The distance of a range represents the number of required iterations to traverse the range from the beginning index to the ending index (exclusive) with the given step size.
Example 1:

```cpp
// Range: 0 to 10 with step size 2
size_t dist = distance(0, 10, 2);
// Returns 5, the sequence is [0, 2, 4, 6, 8]
```

Example 2:

```cpp
// Range: 10 to 0 with step size -2
size_t dist = distance(10, 0, -2);
// Returns 5, the sequence is [10, 8, 6, 4, 2]
```

Example 3:

```cpp
// Range: 5 to 20 with step size 5
size_t dist = distance(5, 20, 5);
// Returns 3, the sequence is [5, 10, 15]
```
helper function to create an instance derived from tf::WorkerInterface
| Template parameters |
|---|
| T |
| ArgsT |
| Parameters |
| --- |
| args |
convert a task type to a human-readable string
The name of each task type is the lower-case string of its characters:
placeholder, static, runtime, subflow, condition, module, async

function to construct a data pipe (tf::DataPipe)
| Template parameters |
|---|
| Input |
| Output |
| C |
tf::make_data_pipe is a helper function to create a data pipe (tf::DataPipe) in a data-parallel pipeline (tf::DataPipeline). The first argument specifies the direction of the data pipe, either tf::PipeType::SERIAL or tf::PipeType::PARALLEL, and the second argument is a callable invoked by the pipeline scheduler. The input and output data types are specified via template parameters, which the library always decays to their original forms for storage purposes. The callable must take the input data type as its first argument and return a value of the output data type.
```cpp
tf::make_data_pipe<int, std::string>(
  tf::PipeType::SERIAL,
  [](int& input) {
    return std::to_string(input + 100);
  }
);
```
The callable can additionally take a reference of tf::Pipeflow, which allows you to query the runtime information of a stage task, such as its line number and token number.
```cpp
tf::make_data_pipe<int, std::string>(
  tf::PipeType::SERIAL,
  [](int& input, tf::Pipeflow& pf) {
    printf("token=%lu, line=%lu\n", pf.token(), pf.line());
    return std::to_string(input + 100);
  }
);
```
creates a module task using the given graph
| Template parameters |
|---|
| T |
| Parameters |
| --- |
| graph |
| Returns |
This example demonstrates how to create and launch multiple taskflows in parallel using modules with asynchronous tasking:
```cpp
tf::Executor executor;

tf::Taskflow A;
tf::Taskflow B;
tf::Taskflow C;
tf::Taskflow D;

A.emplace([](){ printf("Taskflow A\n"); });
B.emplace([](){ printf("Taskflow B\n"); });
C.emplace([](){ printf("Taskflow C\n"); });
D.emplace([](){ printf("Taskflow D\n"); });

// launch the four taskflows using asynchronous tasking
executor.async(tf::make_module_task(A));
executor.async(tf::make_module_task(B));
executor.async(tf::make_module_task(C));
executor.async(tf::make_module_task(D));
executor.wait_for_all();
```
The module task maker, tf::make_module_task, is basically the same as tf::Taskflow::composed_of but provides a more generic interface that can be used beyond Taskflow. For instance, the following two approaches achieve the same functionality.
```cpp
// approach 1: composition using composed_of
tf::Task m1 = taskflow1.composed_of(taskflow2);

// approach 2: composition using make_module_task
tf::Task m1 = taskflow1.emplace(tf::make_module_task(taskflow2));
```
allocates memory on the given device for holding N elements of type T
The function calls cudaMalloc to allocate N*sizeof(T) bytes of memory on the given device d and returns a pointer to the starting address of the device memory.
allocates memory on the current device associated with the caller
The function calls cuda_malloc_device using the current device associated with the caller.
allocates shared memory for holding N elements of type T
The function calls cudaMallocManaged to allocate N*sizeof(T) bytes of memory and returns a pointer to the starting address of the shared memory.
frees memory on the GPU device
| Template parameters |
|---|
| T |
| Parameters |
| --- |
| ptr |
| d |
This method calls cudaFree to free the memory space pointed to by ptr using the given device context.
frees memory on the GPU device
| Template parameters |
|---|
| T |
| Parameters |
| --- |
| ptr |
This method calls cudaFree to free the memory space pointed to by ptr using the current device context of the caller.
copies data between host and device asynchronously through a stream
| Parameters |
|---|
| stream |
| dst |
| src |
| count |
The method calls cudaMemcpyAsync with the given stream using cudaMemcpyDefault to infer the memory space of the source and the destination pointers. The memory areas may not overlap.
initializes or sets GPU memory to the given value byte by byte
| Parameters |
|---|
| stream |
| devPtr |
| value |
| count |
The method calls cudaMemsetAsync with the given stream to fill the first count bytes of the memory area pointed to by devPtr with the constant byte value value.
Handles compatibility with CUDA <= 12.x and CUDA 13.
| Parameters |
|---|
| node |
| dependencies |
Handles compatibility with CUDA <= 12.x and CUDA 13.
| Parameters |
|---|
| node |
| dependent_nodes |
Handles compatibility with CUDA <= 12.x and CUDA 13.
| Parameters |
|---|
| graph |
| from |
| to |
| numDependencies |
queries the type of a native CUDA graph node
valid type values are:
queries the version information in a string format major.minor.patch
Release notes are available here: https://taskflow.github.io/taskflow/Releases.html
determines if the given type is a task parameter type
Task parameters can be specified in one of the following types:

- tf::TaskParams: assigns the full set of task parameters (such as the name)
- tf::DefaultTaskParams: assigns nothing (for compile-time optimization)
- std::string: assigns only the task name
determines if the given type has a member function Graph& graph()
| Template parameters |
|---|
| T |
This trait determines if the provided type T contains a member function with the exact signature tf::Graph& graph(). It uses SFINAE and std::void_t to detect the presence of the member function and its return type.
Example usage:
```cpp
struct A {
  tf::Graph& graph() { return my_graph; };
  tf::Graph my_graph;
  // other custom members to alter my_graph
};

struct C {};  // no graph() function

static_assert(has_graph_v<A>, "A has graph()");
static_assert(!has_graph_v<C>, "C does not have graph()");
```
determines if a callable is a static task
A static task is a callable object constructible from std::function<void()>.
determines if a callable is a subflow task
A subflow task is a callable object constructible from std::function<void(Subflow&)>.
determines if a callable is a runtime task
A runtime task is a callable object constructible from std::function<void(Runtime&)>.
determines if a callable is a condition task
A condition task is a callable object constructible from std::function<int()>.
determines if a callable is a multi-condition task
A multi-condition task is a callable object constructible from std::function<tf::SmallVector<int>()>.
determines if a type is a partitioner
A partitioner is a derived type from tf::PartitionerBase.