Taskflow: A General-purpose Task-parallel Programming System

Release 3.6.0 (2023/05/07)

Taskflow 3.6.0 is the 7th release in the 3.x line! This release brings dynamic task graph parallelism, improved parallel algorithms, a modified GPU tasking interface, and updated documentation, examples, and unit tests.

Download

Taskflow 3.6.0 can be downloaded from here.

System Requirements

To use Taskflow v3.6.0, you need a compiler that supports C++17:

GNU C++ Compiler at least v8.4 with -std=c++17
Clang C++ Compiler at least v6.0 with -std=c++17
Microsoft Visual Studio at least v19.27 with /std:c++17
AppleClang Xcode Version at least v12.0 with -std=c++17
Nvidia CUDA Toolkit and Compiler (nvcc) at least v11.1 with -std=c++17
Intel C++ Compiler at least v19.0.1 with -std=c++17
Intel DPC++ Clang Compiler at least v13.0.0 with -std=c++17 and SYCL20

Taskflow works on Linux, Windows, and Mac OS X.

Release Summary

This release contains several changes that substantially improve the programmability of GPU tasking and standard parallel algorithms. More importantly, we have introduced a new dependent asynchronous tasking model that offers great flexibility for expressing dynamic task graph parallelism.

New Features

Taskflow Core

Added new async methods to support dynamic task graph creation
Added new async and join methods to tf::Runtime
Added a new partitioner interface to optimize parallel algorithms
Added parallel-scan algorithms to Taskflow
Added parallel-find algorithms to Taskflow
- tf::Taskflow::find_if(B first, E last, T& result, UOP predicate, P&& part)
- tf::Taskflow::find_if_not(B first, E last, T& result, UOP predicate, P&& part)
- tf::Taskflow::min_element(B first, E last, T& result, C comp, P&& part)
- tf::Taskflow::max_element(B first, E last, T& result, C comp, P&& part)
Modified tf::Subflow as a derived class from tf::Runtime
Extended parallel algorithms to support different partitioning algorithms
- tf::Taskflow::for_each_index(B first, E last, S step, C callable, P&& part)
- tf::Taskflow::for_each(B first, E last, C callable, P&& part)
- tf::Taskflow::transform(B first1, E last1, O d_first, C c, P&& part)
- tf::Taskflow::transform(B1 first1, E1 last1, B2 first2, O d_first, C c, P&& part)
- tf::Taskflow::reduce(B first, E last, T& result, O bop, P&& part)
- tf::Taskflow::transform_reduce(B first, E last, T& result, BOP bop, UOP uop, P&& part)
Improved the performance of tf::Taskflow::sort for plain-old-data (POD) types

cudaFlow

Removed algorithms that require a buffer from tf::cudaFlow due to update limitations
Removed support for a dedicated cudaFlow task in Taskflow
- all uses of tf::cudaFlow and tf::cudaFlowCapturer are now standalone

Utilities

Added all_same templates to check if a parameter pack has the same type

Taskflow Profiler (TFProf)

Removed cudaFlow and syclFlow tasks

Bug Fixes

Fixed the compilation error caused by MAX_PRIORITY clashing with winspool.h (#459)
Fixed the compilation error caused by tf::TaskView::for_each_successor and tf::TaskView::for_each_dependent
Fixed the infinite-loop bug when corunning a module task from tf::Runtime

If you encounter any potential bugs, please submit an issue on the issue tracker.

Breaking Changes

Dropped support for cancelling asynchronous tasks

// previous - no longer supported
tf::Future<int> fu = executor.async([](){
  return 1;
});
fu.cancel();
std::optional<int> res = fu.get();  // res may be std::nullopt or 1

// now - use std::future instead
std::future<int> fu = executor.async([](){
  return 1;
});
int res = fu.get();

Dropped in-place support for running tf::cudaFlow from a dedicated task

// previous - no longer supported
taskflow.emplace([](tf::cudaFlow& cf){
  cf.offload();
});

// now - users fully control tf::cudaFlow for maximum flexibility
taskflow.emplace([](){
  tf::cudaFlow cf;
  // offload the cudaflow asynchronously through a stream
  tf::cudaStream stream;
  cf.run(stream);
  // wait for the cudaflow to complete
  stream.synchronize();
});

Dropped in-place support for running tf::cudaFlowCapturer from a dedicated task

// previous - no longer supported
taskflow.emplace([](tf::cudaFlowCapturer& cf){
  cf.offload();
});

// now - users fully control tf::cudaFlowCapturer for maximum flexibility
taskflow.emplace([](){
  tf::cudaFlowCapturer cf;
  // offload the cudaflow asynchronously through a stream
  tf::cudaStream stream;
  cf.run(stream);
  // wait for the cudaflow to complete
  stream.synchronize();
});

Dropped in-place support for running tf::syclFlow from a dedicated task
- SYCL can just be used out of box together with Taskflow
Moved all buffer query methods of CUDA standard algorithms inside the execution policy
- tf::cudaExecutionPolicy<NT, VT>::reduce_bufsz
- tf::cudaExecutionPolicy<NT, VT>::scan_bufsz
- tf::cudaExecutionPolicy<NT, VT>::merge_bufsz
- tf::cudaExecutionPolicy<NT, VT>::min_element_bufsz
- tf::cudaExecutionPolicy<NT, VT>::max_element_bufsz

// previous - no longer supported
tf::cuda_reduce_buffer_size<tf::cudaDefaultExecutionPolicy, int>(N);

// now (and similarly for other parallel algorithms)
tf::cudaDefaultExecutionPolicy policy(stream);
policy.reduce_bufsz<int>(N);

Renamed tf::Executor::run_and_wait to tf::Executor::corun for expressiveness
Renamed tf::Executor::loop_until to tf::Executor::corun_until for expressiveness
Renamed tf::Runtime::run_and_wait to tf::Runtime::corun for expressiveness
Disabled argument support for all asynchronous tasking features
- users are responsible for creating their own wrapper to make the callable

// previous - async allows passing arguments to the callable
executor.async([](int i){ std::cout << i << std::endl; }, 4);

// now - users are responsible for wrapping the arguments into a callable
executor.async([i=4](){ std::cout << i << std::endl; });
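When many call sites relied on the old argument-forwarding form, a small generic wrapper may be easier than rewriting each lambda by hand. The helper below is a hypothetical sketch that uses only the C++17 standard library; `bind_arguments` and `demo_bind_arguments` are names invented for this example, not part of Taskflow.

```cpp
#include <cassert>
#include <tuple>
#include <utility>

// Hypothetical helper (not a Taskflow API): copies or moves the callable and
// its arguments into a closure that can later be invoked with no arguments.
template <typename F, typename... Args>
auto bind_arguments(F&& f, Args&&... args) {
  return [fn = std::forward<F>(f),
          bound = std::make_tuple(std::forward<Args>(args)...)]() {
    // Unpack the stored tuple and call the stored callable with its elements.
    return std::apply(fn, bound);
  };
}

// Small self-check: the wrapped callable takes no arguments and still sees
// the values bound at wrap time.
inline void demo_bind_arguments() {
  auto task = bind_arguments([](int a, int b) { return a * 10 + b; }, 4, 2);
  assert(task() == 42);
}
```

The resulting closure takes no arguments, so it should be usable wherever the new interface expects a plain callable, for example `executor.async(bind_arguments(f, 4))`. The arguments are stored by value; pass `std::ref` explicitly if reference semantics are needed.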

Replaced named_async with an async overload that takes the name string as the first argument

// previous - explicitly calling named_async to assign a name to an async task
executor.named_async("name", [](){});

// now - overload
executor.async("name", [](){});

Documentation

Revised Request Cancellation to remove support for cancelling async tasks
Revised Asynchronous Tasking to include asynchronous tasking from tf::Runtime
- Launch Async Tasks from a Runtime
Revised Taskflow algorithms to include execution policy
Added Parallel Scan
Added Asynchronous Tasking with Dependencies

Miscellaneous Items

We have published Taskflow in the following venues:

Dian-Lun Lin, Yanqing Zhang, Haoxing Ren, Shih-Hsin Wang, Brucek Khailany and Tsung-Wei Huang, "GenFuzz: GPU-accelerated Hardware Fuzzing using Genetic Algorithm with Multiple Inputs," ACM/IEEE Design Automation Conference (DAC), San Francisco, CA, 2023
Tsung-Wei Huang, "qTask: Task-parallel Quantum Circuit Simulation with Incrementality," IEEE International Parallel and Distributed Processing Symposium (IPDPS), St. Petersburg, Florida, 2023
Elmir Dzaka, Dian-Lun Lin, and Tsung-Wei Huang, "Parallel And-Inverter Graph Simulation Using a Task-graph Computing System," IEEE International Parallel and Distributed Processing Symposium Workshop (IPDPSW), St. Petersburg, Florida, 2023

Please do not hesitate to contact Dr. Tsung-Wei Huang if you intend to collaborate with us on using Taskflow in your scientific computing projects.

Maintained by Dr. Tsung-Wei Huang — Generated by 1.13.1