Back to Taskflow

Problem Formulation

docs/ExamplesFibonacciNumber.html

4.1.09.0 KB
Original Source

| | Taskflow: A General-purpose Task-parallel Programming System |

Loading...

Searching...

No Matches

Fibonacci Number

We study recursive task parallelism using the Fibonacci sequence as a concrete example, demonstrating how tf::Runtime enables dynamic task creation and cooperative synchronisation without blocking worker threads.

Problem Formulation

The Fibonacci sequence is defined such that each number is the sum of the two preceding ones, starting from 0 and 1:

0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, ...

The natural recursive implementation is:

int fibonacci(int n) {

if(n < 2) return n;

return fibonacci(n-1) + fibonacci(n-2);

}

The two recursive calls are mutually independent and can therefore run in parallel. The challenge is to express this recursive parallelism efficiently — spawning tasks that themselves spawn tasks — without blocking worker threads or creating unbounded overhead.

Recursive Fibonacci with Runtime Tasking

A runtime task accepts a tf::Runtime& parameter, giving it a live handle to the scheduling runtime. We use tf::Runtime::silent_async to spawn both recursive branches in parallel and tf::Runtime::corun to wait for them cooperatively. corun does not block the calling worker — it participates in the work-stealing loop while waiting, so other tasks can run on the same thread:

#include <taskflow/taskflow.hpp>

size_t fibonacci(size_t N, tf::Runtime& rt) {

if(N < 2) return N;

size_t res1, res2;

// spawn both branches as async runtime tasks

rt.silent_async([N, &res1](tf::Runtime& rt1) {

res1 = fibonacci(N-1, rt1);

});

rt.silent_async([N, &res2](tf::Runtime& rt2) {

res2 = fibonacci(N-2, rt2);

});

// cooperatively wait for both branches without blocking the worker

rt.corun();

return res1 + res2;

}

int main() {

tf::Executor executor;

size_t N = 30, res;

executor.silent_async([N, &res](tf::Runtime& rt) {

res = fibonacci(N, rt);

});

executor.wait_for_all();

std::cout << N << "-th Fibonacci number is " << res << '\n';

return 0;

}

tf::Executor::silent_async

void silent_async(P &&params, F &&func)

similar to tf::Executor::async but does not return a future object

tf::Executor::wait_for_all

void wait_for_all()

waits for all tasks to complete

tf::Runtime

class to create a runtime task

Definition runtime.hpp:47

tf::Runtime::silent_async

void silent_async(F &&f)

runs the given function asynchronously without returning any future object

Definition runtime.hpp:671

tf::Runtime::corun

void corun()

corun all tasks spawned by this runtime with other workers

Definition runtime.hpp:646

Each call to fibonacci recursively spawns two async tasks and then waits cooperatively for both to finish. The executor distributes tasks across all available workers using work-stealing, achieving parallelism at every level of the recursion tree. The execution diagram for fibonacci(4) is shown below. The suffixes _1 and _2 denote the left and right children spawned by their parent runtime:

Embedded content

Tail Recursion Optimisation

Spawning both branches asynchronously doubles the number of async tasks created: one branch is always available for inline computation in the current execution context, so spawning it asynchronously only adds scheduling overhead without increasing parallelism. The standard optimisation is to compute one branch inline (directly in the current context) and spawn only the other asynchronously. This halves the task count, reduces stack pressure, and cuts scheduling overhead at every level of recursion:

size_t fibonacci(size_t N, tf::Runtime& rt) {

if(N < 2) return N;

size_t res1, res2;

// spawn only the left branch asynchronously

rt.silent_async([N, &res1](tf::Runtime& rt1) {

res1 = fibonacci(N-1, rt1);

});

// compute the right branch inline in the current context

res2 = fibonacci(N-2, rt);

// cooperatively wait for the left branch

rt.corun();

return res1 + res2;

}

The optimised execution diagram for fibonacci(4) is shown below. The right branch at each level has been eliminated, replaced by inline computation:

Embedded content

Benchmarking

The table below compares wall-clock runtime of the two implementations across different values of N on a multi-core machine:

Nwith tail optimisationwithout tail optimisation
200.23 ms0.31 ms
252 ms4 ms
3023 ms42 ms
35269 ms483 ms
403003 ms5124 ms

The performance gap widens as N increases because the number of tasks created grows exponentially. At N = 40, tail optimisation cuts runtime by over 40% by eliminating half the task-creation and scheduling overhead at every recursive level.

Recursive Fibonacci with Task Group

tf::TaskGroup offers a lighter-weight alternative to tf::Runtime for recursive parallelism from any calling context — no need for the caller itself to be a runtime task. The interface is the same: spawn sub-tasks with silent_async, then call corun to wait cooperatively.

tf::Executor executor;

std::function<size_t(size_t)> fibonacci = [&](size_t N) -> size_t {

if(N < 2) return N;

size_t res1, res2;

tf::TaskGroup tg = executor.task_group();

// spawn left branch; compute right branch inline (tail optimisation)

tg.silent_async(N, &res1, &fibonacci {

res1 = fibonacci(N-1);

});

res2 = fibonacci(N-2);

tg.corun(); // cooperatively wait for the left branch

return res1 + res2;

};

int main() {

size_t N = 30;

size_t res = executor.async(& { return fibonacci(N); }).get();

std::cout << N << "-th Fibonacci number is " << res << '\n';

return 0;

}

tf::Executor

class to create an executor

Definition executor.hpp:62

tf::Executor::task_group

TaskGroup task_group()

creates a task group that executes a collection of asynchronous tasks

Definition task_group.hpp:875

tf::Executor::async

auto async(P &&params, F &&func)

creates a parameterized asynchronous task to run the given function

tf::TaskGroup

class to create a task group from a task

Definition task_group.hpp:61

tf::TaskGroup::corun

void corun()

corun all tasks spawned by this task group with other workers

Definition task_group.hpp:725

tf::TaskGroup::silent_async

void silent_async(F &&f)

runs the given function asynchronously without returning any future object

Definition task_group.hpp:756

NotePrefer tf::Runtime when you are already inside a runtime task, as it avoids the construction overhead of a task group object. Use tf::TaskGroup when the recursive function is called from a context that does not hold a tf::Runtime reference.