Taskflow: A General-purpose Task-parallel Programming System
Release 3.10.0 (2025/05/01)
This release improves scheduling performance through optimized work-stealing threshold tuning and a constrained decentralized buffer. It also introduces index-range-based parallel-for and parallel-reduction algorithms and modifies subflow tasking behavior to significantly enhance the performance of recursive parallelism.
Taskflow 3.10.0 can be downloaded from here.
To use Taskflow v3.10.0, you need a compiler that supports C++17.
Taskflow works on Linux, Windows, and Mac OS X.
Attention: Although Taskflow primarily supports C++17, you can enable C++20 compilation with -std=c++20 to achieve better performance from new C++20 features.
// initialize data1 and data2 to 10 using two different approaches
std::vector<int> data1(100), data2(100);
// Approach 1: initialize data1 using explicit index range
taskflow.for_each_index(0, 100, 1, [&](int i){ data1[i] = 10; });
// Approach 2: initialize data2 using tf::IndexRange
tf::IndexRange<int> range(0, 100, 1);
taskflow.for_each_by_index(range, [&](tf::IndexRange<int>& subrange){
  for(int i=subrange.begin(); i<subrange.end(); i+=subrange.step_size()) {
    data2[i] = 10;
  }
});
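To make the subrange-based callback above more concrete, here is a minimal standard-library sketch of how an index range might be cut into step-aligned subranges that are then handed to a callback. It is not Taskflow's implementation: `Range`, `split_range`, and `fill_by_subranges` are hypothetical names introduced only for illustration, and the subranges are processed sequentially here, whereas a task-parallel runtime would hand each one to a worker.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for a 1D index range [begin, end) with a positive step.
// It only mirrors the begin()/end()/step_size() accessors used in the snippet above.
struct Range {
  int b, e, s;
  int begin() const { return b; }
  int end() const { return e; }
  int step_size() const { return s; }
};

// Splits `range` into at most `parts` contiguous subranges whose starting
// indices stay aligned to the original step (assumption: parts > 0, step > 0).
std::vector<Range> split_range(const Range& range, int parts) {
  assert(parts > 0 && range.step_size() > 0);
  std::vector<Range> out;
  // number of iterations the full range would perform
  const int n = (range.end() - range.begin() + range.step_size() - 1) / range.step_size();
  const int chunk = (n + parts - 1) / parts;  // iterations per subrange, rounded up
  for (int first = 0; first < n; first += chunk) {
    const int last = std::min(first + chunk, n);
    out.push_back(Range{range.begin() + first * range.step_size(),
                        std::min(range.begin() + last * range.step_size(), range.end()),
                        range.step_size()});
  }
  return out;
}

// Sets every element of a vector of `size` ints to 10, one subrange at a time.
std::vector<int> fill_by_subranges(int size, int parts) {
  std::vector<int> data(size, 0);
  for (const Range& sub : split_range(Range{0, size, 1}, parts)) {
    // same loop shape as the body of the subrange callback above
    for (int i = sub.begin(); i < sub.end(); i += sub.step_size()) {
      data[i] = 10;
    }
  }
  return data;
}
```

Calling `fill_by_subranges(100, 4)` visits the four subranges [0, 25), [25, 50), [50, 75), and [75, 100). The point of handing the callback a whole subrange rather than a single index is that the loop body runs without a per-index function call.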
std::vector<double> data(100000);
double res = 1.0;
taskflow.reduce_by_index(
  // index range
  tf::IndexRange<size_t>(0, data.size(), 1),
  // final result
  res,
  // local reducer
  [&](tf::IndexRange<size_t> subrange, std::optional<double> running_total) {
    double residual = running_total ? *running_total : 0.0;
    for(size_t i=subrange.begin(); i<subrange.end(); i+=subrange.step_size()) {
      data[i] = 1.0;
      residual += data[i];
    }
    printf("partial sum = %lf\n", residual);
    return residual;
  },
  // global reducer
  std::plus<double>()
);
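The roles of the local and global reducers above can be modeled with a short standard-library sketch. It is not Taskflow's implementation: `reduce_by_chunks` and `sum_ones` are hypothetical names, the chunks run sequentially, and every chunk starts with an empty running total. The sketch also assumes that the initial value of the result takes part in the final reduction.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <vector>

// Hypothetical model of an index-range reduction over [0, n):
//  - `local` folds one chunk [first, last) into a partial result
//  - `global` merges each partial result into the running result
// Assumption: `init` participates in the reduction, like `res` in the snippet above.
template <typename Local, typename Global>
double reduce_by_chunks(std::size_t n, std::size_t chunk, double init,
                        Local local, Global global) {
  assert(chunk > 0);
  double result = init;
  for (std::size_t first = 0; first < n; first += chunk) {
    const std::size_t last = std::min(first + chunk, n);
    // each chunk starts with no running total, so the local reducer falls back to 0.0
    const double partial = local(first, last, std::optional<double>{});
    result = global(result, partial);
  }
  return result;
}

// Writes 1.0 into every element and sums the elements, starting from 1.0.
double sum_ones(std::size_t n, std::size_t chunk) {
  std::vector<double> data(n);
  return reduce_by_chunks(
    n, chunk, 1.0,
    [&](std::size_t first, std::size_t last, std::optional<double> running_total) {
      double residual = running_total ? *running_total : 0.0;
      for (std::size_t i = first; i < last; ++i) {
        data[i] = 1.0;
        residual += data[i];
      }
      return residual;
    },
    std::plus<double>());
}
```

Under these assumptions, `sum_ones(100000, 1024)` returns 100001.0: the 100000 ones plus the initial value 1.0. The optional running total lets a worker that processes several subranges in a row keep accumulating into one partial result instead of starting from scratch each time.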
static keyword to the executor creation in taskflow benchmarks
num_empty_steals (#681)
corrected the terminology by replacing 'dependents' with 'predecessors'
disabled the support for tf::Subflow::detach due to multiple intricate and unresolved issues
changed the default behavior of tf::Subflow to no longer retain its task graph after join
tf::Taskflow taskflow;
tf::Executor executor;
taskflow.emplace([&](tf::Subflow& sf){
  sf.retain(true);  // retain the subflow after join for visualization
  auto A = sf.emplace([](){ std::cout << "A\n"; });
  auto B = sf.emplace([](){ std::cout << "B\n"; });
  auto C = sf.emplace([](){ std::cout << "C\n"; });
  A.precede(B, C);  // A runs before B and C
});  // subflow implicitly joins here
executor.run(taskflow).wait();
// The subflow graph is now retained and can be visualized using taskflow.dump(...)
taskflow.dump(std::cout);
// programming tf::cudaGraph is consistent with Nvidia CUDA Graph but offers a simpler
// and more intuitive interface by abstracting away low-level CUDA Graph boilerplate.
tf::cudaGraph cg;
cg.kernel(...); // same as cudaFlow/cudaFlowCapturer
// unlike cudaFlow/cudaFlowCapturer, you need to explicitly instantiate an executable
// CUDA graph now and submit it to a stream for execution
tf::cudaGraphExec exec(cg);
tf::cudaStream stream;
stream.run(exec).synchronize();
If you are interested in collaborating with us on applying Taskflow to your projects, please feel free to reach out to Dr. Tsung-Wei Huang!