docs/release-3-10-0.html
This release improves scheduling performance through optimized work-stealing threshold tuning and a constrained decentralized buffer. It also introduces index-range-based parallel-for and parallel-reduction algorithms and modifies subflow tasking behavior to significantly enhance the performance of recursive parallelism.
Taskflow 3.10.0 can be downloaded from the Taskflow GitHub releases page.
To use Taskflow v3.10.0, you need a compiler that supports C++17:
Taskflow works on Linux, Windows, and Mac OS X.
// initialize data1 and data2 to 10 using two different approaches
std::vector<int> data1(100), data2(100);

// Approach 1: initialize data1 using an explicit index range
taskflow.for_each_index(0, 100, 1, [&](int i){ data1[i] = 10; });

// Approach 2: initialize data2 using tf::IndexRange
tf::IndexRange<int> range(0, 100, 1);
taskflow.for_each_by_index(range, [&](tf::IndexRange<int>& subrange){
  for(int i=subrange.begin(); i<subrange.end(); i+=subrange.step_size()) {
    data2[i] = 10;
  }
});
std::vector<double> data(100000);
const size_t N = data.size();
double res = 1.0;
taskflow.reduce_by_index(
  // index range
  tf::IndexRange<size_t>(0, N, 1),
  // final result
  res,
  // local reducer
  [&](tf::IndexRange<size_t> subrange, std::optional<double> running_total) {
    double residual = running_total ? *running_total : 0.0;
    for(size_t i=subrange.begin(); i<subrange.end(); i+=subrange.step_size()) {
      data[i] = 1.0;
      residual += data[i];
    }
    printf("partial sum = %lf\n", residual);
    return residual;
  },
  // global reducer
  std::plus<double>()
);
- static keyword to the executor creation in taskflow benchmarks
- num_empty_steals (#681)
- corrected the terminology by replacing 'dependents' with 'predecessors'
disabled support for tf::Subflow::detach due to multiple intricate and unresolved issues
changed the default behavior of tf::Subflow to no longer retain its task graph after join
tf::Taskflow taskflow;
tf::Executor executor;
taskflow.emplace([&](tf::Subflow& sf){
  sf.retain(true);  // retain the subflow after join for visualization
  auto A = sf.emplace([](){ std::cout << "A\n"; });
  auto B = sf.emplace([](){ std::cout << "B\n"; });
  auto C = sf.emplace([](){ std::cout << "C\n"; });
  A.precede(B, C);  // A runs before B and C
});  // subflow implicitly joins here
executor.run(taskflow).wait();
// the subflow graph is now retained and can be visualized using taskflow.dump(...)
taskflow.dump(std::cout);
// programming tf::cudaGraph is consistent with Nvidia CUDA Graph but offers a simpler
// and more intuitive interface by abstracting away low-level CUDA Graph boilerplate
tf::cudaGraph cg;
cg.kernel(...);  // same as cudaFlow/cudaFlowCapturer

// unlike cudaFlow/cudaFlowCapturer, you need to explicitly instantiate an executable
// CUDA graph now and submit it to a stream for execution
tf::cudaGraphExec exec(cg);
tf::cudaStream stream;
stream.run(exec).synchronize();
If you are interested in collaborating with us on applying Taskflow to your projects, please feel free to reach out to Dr. Tsung-Wei Huang!