3rd-party/tbb/examples/task/tree_sum/readme.html
This directory contains a simple example that sums values in a tree.
The example exhibits some speedup, but not a lot, because it quickly saturates the system bus on a multiprocessor. Good speedup requires more computation cycles per memory reference. The point of the example is to teach how to use the raw task interface, so the computation is deliberately trivial.
The performance of this example is better when objects are allocated by the scalable_allocator instead of the default "operator new". The reason is that the scalable_allocator typically packs small objects more tightly than the default "operator new", resulting in a smaller memory footprint, and thus more efficient use of cache and virtual memory. In addition, the scalable_allocator performs better for multi-threaded allocations.
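The recursive summation described above can be sketched without TBB. The following is a minimal illustration, not the code shipped in this directory: it swaps the raw TBB task interface for std::async from the C++ standard library, and the names TreeNode, build_tree, serial_sum, parallel_sum, and spawn_depth are all hypothetical (the real declarations live in common.h). It assumes a C++14 or later compiler with thread support.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <memory>

// Hypothetical node type, used only for this sketch.
struct TreeNode {
    double value = 0.0;
    std::unique_ptr<TreeNode> left;
    std::unique_ptr<TreeNode> right;
};

// Build a balanced binary tree with `count` nodes, each holding 1.0.
std::unique_ptr<TreeNode> build_tree(std::size_t count) {
    if (count == 0) return nullptr;
    auto node = std::make_unique<TreeNode>();
    node->value = 1.0;
    const std::size_t left_count = (count - 1) / 2;
    node->left = build_tree(left_count);
    node->right = build_tree(count - 1 - left_count);
    return node;
}

// Sequential reference: visit every node once and add its value.
double serial_sum(const TreeNode* node) {
    if (!node) return 0.0;
    return node->value + serial_sum(node->left.get()) +
           serial_sum(node->right.get());
}

// Parallel variant: while spawn_depth is positive, the left subtree is
// summed in a separate std::async task and the right subtree is summed by
// the calling thread. Below that depth the work falls back to serial_sum,
// which keeps the number of threads bounded by roughly 2^spawn_depth.
double parallel_sum(const TreeNode* node, int spawn_depth) {
    assert(spawn_depth >= 0);
    if (!node) return 0.0;
    if (spawn_depth == 0) return serial_sum(node);
    auto left = std::async(std::launch::async, parallel_sum,
                           node->left.get(), spawn_depth - 1);
    const double right = parallel_sum(node->right.get(), spawn_depth - 1);
    return node->value + left.get() + right;
}
```

Calling parallel_sum(tree.get(), 3) on a tree returned by build_tree gives the same result as serial_sum(tree.get()). Each node does a single addition per pointer dereference, which is why this kind of workload tends to be limited by memory traffic rather than by computation.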
System Requirements
For the most up-to-date system requirements, see the release notes.
Files
SerialSumTree.cpp - Sums sequentially.
SimpleParallelSumTree.cpp - Sums in parallel without any fancy tricks.
OptimizedParallelSumTree.cpp - Sums in parallel, using "recycling" and "continuation-passing" tricks. In this case, it is only slightly faster than the simple version.
common.h - Shared declarations.
main.cpp - Main program which parses command line options and runs the algorithm.
Makefile - Makefile for building the example.
Directories
msvs - Contains Microsoft* Visual Studio* workspace for building and running the example (Windows* systems only).
xcode - Contains Xcode* IDE workspace for building and running the example (macOS* systems only).
For information about the minimum supported IDE versions, see the release notes.
Build instructions
General build directions can be found here.
Usage
tree_sum -h
Prints the help for command line options.
tree_sum [n-of-threads=value] [number-of-nodes=value] [silent] [stdmalloc]
tree_sum [n-of-threads [number-of-nodes]] [silent] [stdmalloc]
n-of-threads is the number of threads to use; a range of the form low[:high], where low and optional high are non-negative integers or 'auto' for the default.
number-of-nodes is the number of nodes in the tree.
silent - no output except elapsed time.
stdmalloc - causes the default "operator new" to be used for memory allocations instead of the scalable_allocator.
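To make the low[:high] form of n-of-threads concrete, here is a minimal parsing sketch. It is not the parser used by main.cpp: ThreadRange, default_threads, parse_count, and parse_thread_range are hypothetical names, and two behaviors are assumptions made for illustration only, namely that 'auto' resolves to std::thread::hardware_concurrency() and that a range with high below low is rejected. It requires C++17 for std::optional and does not guard against integer overflow.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <thread>

// Hypothetical result type: an inclusive range of thread counts.
struct ThreadRange {
    unsigned low;
    unsigned high;
};

// Assumption: 'auto' means the hardware thread count, or 1 if unknown.
unsigned default_threads() {
    const unsigned n = std::thread::hardware_concurrency();
    return n == 0 ? 1 : n;
}

// Parse one bound: either the word "auto" or a non-negative integer.
std::optional<unsigned> parse_count(const std::string& text) {
    if (text == "auto") return default_threads();
    if (text.empty()) return std::nullopt;
    unsigned value = 0;
    for (char c : text) {
        if (c < '0' || c > '9') return std::nullopt;
        value = value * 10 + static_cast<unsigned>(c - '0');
    }
    return value;
}

// Parse "low" or "low:high". A single value yields a one-element range.
std::optional<ThreadRange> parse_thread_range(const std::string& text) {
    const std::size_t colon = text.find(':');
    const auto low = parse_count(text.substr(0, colon));
    if (!low) return std::nullopt;
    if (colon == std::string::npos) return ThreadRange{*low, *low};
    const auto high = parse_count(text.substr(colon + 1));
    if (!high || *high < *low) return std::nullopt;
    assert(*low <= *high);
    return ThreadRange{*low, *high};
}
```

Under these assumptions, parse_thread_range("2:8") yields the range 2 through 8, parse_thread_range("4") yields 4 through 4, and malformed input such as "x" yields an empty optional.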
To run a short version of this example, e.g., for use with Intel® Parallel Inspector:
Build a debug version of the example (see the build instructions).
Run it with a small problem size and the desired number of threads, e.g., tree_sum 4 100000.
Legal Information
Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.
* Other names and brands may be claimed as the property of others.
© 2020, Intel Corporation