Taskflow: A General-purpose Task-parallel Programming System

Release 3.6.0 (2023/05/07)

Taskflow 3.6.0 is the 7th release in the 3.x line! This release includes several new changes, such as dynamic task graph parallelism, improved parallel algorithms, modified GPU tasking interface, documentation, examples, and unit tests.

Download

Taskflow 3.6.0 can be downloaded from here.

System Requirements

To use Taskflow v3.6.0, you need a compiler that supports C++17:

  • GNU C++ Compiler at least v8.4 with -std=c++17
  • Clang C++ Compiler at least v6.0 with -std=c++17
  • Microsoft Visual Studio at least v19.27 with /std:c++17
  • AppleClang Xcode Version at least v12.0 with -std=c++17
  • Nvidia CUDA Toolkit and Compiler (nvcc) at least v11.1 with -std=c++17
  • Intel C++ Compiler at least v19.0.1 with -std=c++17
  • Intel DPC++ Clang Compiler at least v13.0.0 with -std=c++17 and SYCL20

Taskflow works on Linux, Windows, and Mac OS X.

Release Summary

This release contains several changes that substantially improve the programmability of GPU tasking and standard parallel algorithms. More importantly, it introduces a new dependent asynchronous tasking model that offers great flexibility for expressing dynamic task graph parallelism.

New Features

Taskflow Core

cudaFlow

  • removed algorithms that require buffer from tf::cudaFlow due to update limitation
  • removed support for a dedicated cudaFlow task in Taskflow
    • all usage of tf::cudaFlow and tf::cudaFlowCapturer are standalone now

Utilities

  • Added all_same templates to check if a parameter pack has the same type
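As an illustration of what such a check involves, the following is a minimal C++17 sketch of a trait that tests whether every type in a parameter pack matches the first one. The names `all_same_sketch` and `all_same_sketch_v` are hypothetical and chosen for this example; Taskflow's own `all_same` templates may be declared differently.

```cpp
#include <type_traits>

// Hypothetical sketch, not Taskflow's actual definition:
// true when every type in Ts... is the same as T.
template <typename T, typename... Ts>
struct all_same_sketch : std::conjunction<std::is_same<T, Ts>...> {};

// Convenience variable template in the style of the standard *_v traits.
template <typename T, typename... Ts>
inline constexpr bool all_same_sketch_v = all_same_sketch<T, Ts...>::value;

// Compile-time checks of the three interesting cases.
static_assert(all_same_sketch_v<int, int, int>, "identical types");
static_assert(!all_same_sketch_v<int, int, double>, "one type differs");
static_assert(all_same_sketch_v<int>, "a single type is trivially uniform");
```

Because `std::conjunction` short-circuits, instantiation stops at the first mismatching type, which keeps compile times reasonable for long packs.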

Taskflow Profiler (TFProf)

  • Removed cudaFlow and syclFlow tasks

Bug Fixes

  • Fixed the compilation error caused by MAX_PRIORITY clashing with winspool.h (#459)
  • Fixed the compilation error caused by tf::TaskView::for_each_successor and tf::TaskView::for_each_dependent
  • Fixed the infinite-loop bug when corunning a module task from tf::Runtime

If you encounter any potential bugs, please submit an issue at the issue tracker.

Breaking Changes

  • Dropped support for cancelling asynchronous tasks

    // previous - no longer supported
    tf::Future<int> fu = executor.async([](){
      return 1;
    });
    fu.cancel();
    std::optional<int> res = fu.get();  // res may be std::nullopt or 1

    // now - use std::future instead
    std::future<int> fu = executor.async([](){
      return 1;
    });
    int res = fu.get();
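Code that relied on cancelling an asynchronous task now has to arrange its own stop signal. The following is a minimal sketch of one common approach, cooperative cancellation through a shared `std::atomic<bool>`. It is not a Taskflow feature: the helper names `count_until_stopped` and `launch_counting` are hypothetical, and `std::async` stands in for `executor.async` so that the sketch depends only on the standard library.

```cpp
#include <atomic>
#include <future>

// Hypothetical worker: performs up to `limit` steps, checking the stop
// flag before each one. Returns the number of steps actually completed.
inline int count_until_stopped(const std::atomic<bool>& stop, int limit) {
  int steps = 0;
  while (steps < limit && !stop.load(std::memory_order_relaxed)) {
    ++steps;
  }
  return steps;
}

// Hypothetical launcher: returns a plain std::future, the same result type
// that async tasks produce after this release. std::async is an assumption
// used here in place of executor.async. The caller must keep `stop` alive
// until the future has been waited on.
inline std::future<int> launch_counting(const std::atomic<bool>& stop, int limit) {
  return std::async(std::launch::async, [&stop, limit]() {
    return count_until_stopped(stop, limit);
  });
}
```

The caller sets the flag to `true` to request a stop and then calls `get()` on the future as usual; the task decides where it is safe to observe the flag and return early.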


  • Dropped in-place support for running tf::cudaFlow from a dedicated task

    // previous - no longer supported
    taskflow.emplace([](tf::cudaFlow& cf){
      cf.offload();
    });

    // now - users fully control tf::cudaFlow for maximum flexibility
    taskflow.emplace([](){
      tf::cudaFlow cf;

      // offload the cudaflow asynchronously through a stream
      tf::cudaStream stream;
      cf.run(stream);

      // wait for the cudaflow to complete
      stream.synchronize();
    });


  • Dropped in-place support for running tf::cudaFlowCapturer from a dedicated task

    // previous - no longer supported
    taskflow.emplace([](tf::cudaFlowCapturer& cf){
      cf.offload();
    });

    // now - users fully control tf::cudaFlowCapturer for maximum flexibility
    taskflow.emplace([](){
      tf::cudaFlowCapturer cf;

      // offload the cudaflow asynchronously through a stream
      tf::cudaStream stream;
      cf.run(stream);

      // wait for the cudaflow to complete
      stream.synchronize();
    });


  • Dropped in-place support for running tf::syclFlow from a dedicated task

    • SYCL can simply be used out of the box together with Taskflow
  • Moved all buffer query methods of CUDA standard algorithms into the execution policy

    • tf::cudaExecutionPolicy<NT, VT>::reduce_bufsz
    • tf::cudaExecutionPolicy<NT, VT>::scan_bufsz
    • tf::cudaExecutionPolicy<NT, VT>::merge_bufsz
    • tf::cudaExecutionPolicy<NT, VT>::min_element_bufsz
    • tf::cudaExecutionPolicy<NT, VT>::max_element_bufsz

    // previous - no longer supported
    tf::cuda_reduce_buffer_size<tf::cudaDefaultExecutionPolicy, int>(N);

    // now (and similarly for other parallel algorithms)
    tf::cudaDefaultExecutionPolicy policy(stream);
    policy.reduce_bufsz<int>(N);

  • Renamed tf::Executor::run_and_wait to tf::Executor::corun for expressiveness
  • Renamed tf::Executor::loop_until to tf::Executor::corun_until for expressiveness
  • Renamed tf::Runtime::run_and_wait to tf::Runtime::corun for expressiveness
  • Disabled argument support for all asynchronous tasking features
    • users are responsible for creating their own wrapper to make the callable

    // previous - async allowed passing arguments to the callable
    executor.async([](int i){ std::cout << i << std::endl; }, 4);

    // now - users are responsible for wrapping the arguments into a callable
    executor.async([i=4](){ std::cout << i << std::endl; });
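When many call sites used to pass arguments, writing a capture list by hand at each one can get repetitive. The following is a minimal sketch of a generic wrapper that binds arguments to a callable and returns a nullary callable. The name `wrap_args` is hypothetical and not part of Taskflow; the sketch uses only the C++17 standard library.

```cpp
#include <tuple>
#include <utility>

// Hypothetical helper (not a Taskflow API): stores copies of the callable
// and its arguments, and returns a callable that takes no arguments.
template <typename F, typename... Args>
auto wrap_args(F&& f, Args&&... args) {
  return [fn = std::forward<F>(f),
          bound = std::make_tuple(std::forward<Args>(args)...)]() mutable {
    // Unpack the stored tuple into the stored callable.
    return std::apply(fn, bound);
  };
}
```

Under these assumptions, the earlier call could be written as `executor.async(wrap_args([](int i){ std::cout << i << std::endl; }, 4));`. Note that `std::make_tuple` stores the arguments by value, so reference arguments would need `std::ref` at the call site.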

  • Replaced named_async with an async overload that takes the name string as the first argument

    // previous - explicitly calling named_async to assign a name to an async task
    executor.named_async("name", [](){});

    // now - overload
    executor.async("name", [](){});

Documentation

Miscellaneous Items

We have published Taskflow in the following venues:

Please do not hesitate to contact Dr. Tsung-Wei Huang if you intend to collaborate with us on using Taskflow in your scientific computing projects.