hadoop-mapreduce-project/hadoop-mapreduce-examples/src/main/java/org/apache/hadoop/examples/pi/package.html
This package consists of a map/reduce application, distbbp, which computes exact binary digits of the mathematical constant π. distbbp is designed for computing the nth bit of π for large n, say n > 100,000,000. For computing bits at lower positions of π, consider using bbp.
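Computing "the nth bit" directly, without the bits before it, is what BBP-type formulas make possible. As a rough illustration, here is a minimal single-machine sketch. It is hypothetical and not part of the Hadoop examples: it uses the original Bailey-Borwein-Plouffe formula π = Σ_{k≥0} 16^(-k) (4/(8k+1) - 2/(8k+4) - 1/(8k+5) - 1/(8k+6)) in double precision, whereas distbbp may use a different BBP-type formula and exact arithmetic. The names `BbpSketch` and `hexDigitsAtBit` are illustrative, and only the first several hex digits are reliable, and only for moderate n.

```java
/**
 * Hypothetical single-machine sketch, not part of the Hadoop examples:
 * BBP digit extraction with the original Bailey-Borwein-Plouffe formula in
 * double precision. Only the first several hex digits are reliable, and only
 * for moderate n.
 */
class BbpSketch {

  /** Computes 2^e mod m by square-and-multiply; m is small enough that products fit in a long. */
  static long pow2Mod(long e, long m) {
    if (m == 1) {
      return 0;
    }
    long result = 1;
    long base = 2 % m;
    while (e > 0) {
      if ((e & 1) == 1) {
        result = result * base % m;
      }
      base = base * base % m;
      e >>= 1;
    }
    return result;
  }

  /** Returns the fractional part of sum_{k >= 0} 2^(m - 4k) / (8k + j). */
  static double series(long m, int j) {
    double s = 0;
    long k = 0;
    // Terms with a non-negative exponent: only 2^(m-4k) mod (8k+j) affects the fraction.
    for (; 4 * k <= m; k++) {
      long d = 8 * k + j;
      s += (double) pow2Mod(m - 4 * k, d) / d;
      s -= Math.floor(s);
    }
    // Terms with a negative exponent shrink by a factor of 16 each step.
    while (true) {
      double term = Math.pow(2, m - 4 * k) / (8 * k + j);
      if (term < 1e-17) {
        break;
      }
      s += term;
      k++;
    }
    return s - Math.floor(s);
  }

  /** Returns count hex digits of pi starting at bit position n (n >= 1). */
  static String hexDigitsAtBit(long n, int count) {
    long m = n - 1; // the first bit of frac(2^(n-1) * pi) is bit n of pi
    double x = 4 * series(m, 1) - 2 * series(m, 4) - series(m, 5) - series(m, 6);
    x -= Math.floor(x);
    StringBuilder hex = new StringBuilder();
    for (int i = 0; i < count; i++) {
      x *= 16;
      int digit = (int) x;
      hex.append(Character.toUpperCase(Character.forDigit(digit, 16)));
      x -= digit;
    }
    return hex.toString();
  }

  public static void main(String[] args) {
    System.out.println(hexDigitsAtBit(1, 8));   // compare with Row 0 of the table
    System.out.println(hexDigitsAtBit(101, 8)); // compare with Row 2 of the table
  }
}
```

The leading digits printed for positions 1 and 101 should agree with Rows 0 and 2 of the results table. Rounding error grows with n, which is one reason a distributed, exact computation is needed for positions such as 10^15.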
The main class is DistBbp, and the actual computation is done by DistSum jobs. The steps for launching the jobs are:
The Bits of Pi
The table below lists the results computed by distbbp, in the following groups:
- Row 0 to Row 7
- Row 8 to Row 14
- The first part of Row 15 (6216B06)
- The second part of Row 15 (D3611)
| Row | Position n | π bits (in hex) starting at n |
| --- | --- | --- |
| 0 | 1 | 243F6A8885A3* |
| 1 | 11 | FDAA22168C23 |
| 2 | 101 | 3707344A409 |
| 3 | 1,001 | 574E69A458F |
| 4 | 10,001 | 44EC5716F2B |
| 5 | 100,001 | 944F7A204 |
| 6 | 1,000,001 | 6FFFA4103 |
| 7 | 10,000,001 | 6CFDD54E3 |
| 8 | 100,000,001 | A306CFA7 |
| 9 | 1,000,000,001 | 3E08FF2B |
| 10 | 10,000,000,001 | 0A8BD8C0 |
| 11 | 100,000,000,001 | B2238C1 |
| 12 | 1,000,000,000,001 | 0FEE563 |
| 13 | 10,000,000,000,001 | 896DC3 |
| 14 | 100,000,000,000,001 | C216EC |
| 15 | 1,000,000,000,000,001 | 6216B06 ... D3611 |
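* By representing π in decimal, hexadecimal and binary, we have:
| Base | π |
| --- | --- |
| Decimal | 3.1415926535 8979323846 2643383279 ... |
| Hexadecimal | 3.243F6A8885 A308D31319 8A2E037073 ... |
| Binary | 11.0010010000 1111110110 1010100010 ... |
The first ten bits of π after the binary point are 0010010000. Positions are counted from 1, starting right after the binary point, so Row 0 (position 1) begins with 243F6A8885A3, the hexadecimal digits immediately after the point.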
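The hexadecimal and binary rows are related mechanically: each hex digit expands to exactly four bits. The small helper below makes that check explicit. It is an illustrative sketch; the class name `HexBits` and method `hexToBits` are hypothetical and not part of the Hadoop examples.

```java
/** Illustrative helper (hypothetical): expands hexadecimal digits into bits. */
class HexBits {

  /** Expands each hexadecimal digit into its 4-bit binary form. */
  static String hexToBits(String hex) {
    StringBuilder bits = new StringBuilder();
    for (char c : hex.toCharArray()) {
      int v = Character.digit(c, 16);
      if (v < 0) {
        throw new IllegalArgumentException("not a hex digit: " + c);
      }
      // Left-pad to 4 bits, e.g. 2 -> "0010".
      String b = Integer.toBinaryString(v);
      bits.append("0000".substring(b.length())).append(b);
    }
    return bits.toString();
  }

  public static void main(String[] args) {
    // Hex digits right after the point of pi (the integer part 3 is binary 11).
    String bits = hexToBits("243F6A8885");
    System.out.println(bits.substring(0, 10)); // bits at positions 1..10: 0010010000
  }
}
```

The same expansion explains Row 1: dropping the first ten bits of 243F6A8885A308D3 leaves bits that start with FDAA.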
The command line format is:
$ hadoop org.apache.hadoop.examples.pi.DistBbp \
<b> <nThreads> <nJobs> <type> <nPart> <remoteDir> <localDir>
The parameters are:
| Parameter | Description |
| --- | --- |
| <b> | The number of bits to skip, i.e. compute the bits starting at position b+1. |
| <nThreads> | The number of working threads. |
| <nJobs> | The number of jobs per sum. |
| <type> | 'm' for map-side jobs, 'r' for reduce-side jobs, 'x' for a mix of both. |
| <nPart> | The number of parts per job. |
| <remoteDir> | Remote directory for submitting jobs. |
| <localDir> | Local directory for storing output files. |
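To make the parameter list concrete, here is a hypothetical sketch of how the seven positional arguments could map onto typed values. The class `BbpArgs`, its field names and the rule of stripping thousands separators are assumptions for illustration (the example commands write <b> with commas, so some such normalization is needed); this is not DistBbp's own parser.

```java
import java.util.Arrays;

/** Hypothetical holder for the seven DistBbp arguments; not DistBbp's own parser. */
final class BbpArgs {
  enum JobType { MAP_SIDE, REDUCE_SIDE, MIX }

  final long b;            // <b>: bits to skip, so computation starts at position b+1
  final int nThreads;      // <nThreads>: number of working threads
  final int nJobs;         // <nJobs>: number of jobs per sum
  final JobType type;      // <type>: 'm', 'r' or 'x'
  final int nPart;         // <nPart>: number of parts per job
  final String remoteDir;  // <remoteDir>: directory for submitting jobs
  final String localDir;   // <localDir>: directory for storing output files

  private BbpArgs(long b, int nThreads, int nJobs, JobType type, int nPart,
      String remoteDir, String localDir) {
    this.b = b;
    this.nThreads = nThreads;
    this.nJobs = nJobs;
    this.type = type;
    this.nPart = nPart;
    this.remoteDir = remoteDir;
    this.localDir = localDir;
  }

  /** Parses a number, ignoring thousands separators as in "1,000,000,000,000,056" (assumed). */
  static long parseNumber(String s) {
    return Long.parseLong(s.trim().replace(",", ""));
  }

  static BbpArgs parse(String[] args) {
    if (args.length != 7) {
      throw new IllegalArgumentException(
          "expected 7 arguments but got " + args.length + ": " + Arrays.toString(args));
    }
    JobType type;
    switch (args[3]) {
      case "m": type = JobType.MAP_SIDE; break;
      case "r": type = JobType.REDUCE_SIDE; break;
      case "x": type = JobType.MIX; break;
      default: throw new IllegalArgumentException("<type> must be m, r or x: " + args[3]);
    }
    return new BbpArgs(parseNumber(args[0]),
        Math.toIntExact(parseNumber(args[1])),
        Math.toIntExact(parseNumber(args[2])),
        type,
        Math.toIntExact(parseNumber(args[4])),
        args[5], args[6]);
  }

  public static void main(String[] args) {
    BbpArgs a = parse(new String[] {
        "1,000,000,000,000,056", "20", "1000", "x", "500", "remote/a", "local/output"});
    System.out.println("first bit position: " + (a.b + 1) + ", type: " + a.type);
  }
}
```

With the example arguments, the first computed position is b + 1 = 1,000,000,000,000,057, i.e. 10^15 + 57.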
Note that it may take a long time to finish all the jobs when <b> is large. If the program is killed in the middle of the execution, the same command with a different <remoteDir> can be used to resume the execution. For example, suppose we use the following command to compute the (10^15+57)th bit of π, i.e. with b = 10^15+56 bits skipped:
$ hadoop org.apache.hadoop.examples.pi.DistBbp \
1,000,000,000,000,056 20 1000 x 500 remote/a local/output
It uses 20 threads to submit jobs, so there are at most 20 concurrent jobs. Each sum (there are 14 sums in total) is partitioned into 1000 jobs, and the jobs are a mix of map-side and reduce-side jobs. Each job has 500 parts. The remote directory for the jobs is remote/a, and the local directory for storing output is local/output. Depending on the cluster configuration, the entire execution may take many days. If the execution is killed, we can resume it by running the same command with a different remote directory, remote/b:
$ hadoop org.apache.hadoop.examples.pi.DistBbp \
1,000,000,000,000,056 20 1000 x 500 remote/b local/output
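The scheduling in the example above can be modeled with a fixed-size thread pool: with 20 threads, at most 20 submissions are in flight at once. The counts follow from the example command: 14 × 1000 = 14,000 jobs in total, and 14,000 × 500 = 7,000,000 parts. The sketch below is a hypothetical model, not DistBbp's implementation; `runJob` is a placeholder standing in for submitting one job and waiting for it, not a Hadoop API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical model of bounded job submission; runJob is a placeholder, not a Hadoop API. */
class JobScheduleSketch {
  static final AtomicInteger running = new AtomicInteger();
  static final AtomicInteger maxRunning = new AtomicInteger();

  /** Placeholder for submitting one job of one sum and waiting for it to finish. */
  static void runJob(int sum, int job) {
    int now = running.incrementAndGet();
    maxRunning.accumulateAndGet(now, Math::max);
    try {
      Thread.yield(); // a real job would run on the cluster here
    } finally {
      running.decrementAndGet();
    }
  }

  /** Runs nSums * nJobs placeholder jobs on a pool of nThreads threads; returns the job count. */
  static int runAll(int nSums, int nJobs, int nThreads) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(nThreads);
    try {
      List<Future<?>> futures = new ArrayList<>();
      for (int s = 0; s < nSums; s++) {
        for (int j = 0; j < nJobs; j++) {
          final int sum = s;
          final int job = j;
          futures.add(pool.submit(() -> runJob(sum, job)));
        }
      }
      for (Future<?> f : futures) {
        f.get(); // rethrows a job failure, if any
      }
      return futures.size();
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    int nSums = 14, nJobs = 1000, nPart = 500, nThreads = 20;
    int jobs = runAll(nSums, nJobs, nThreads);
    System.out.println(jobs + " jobs, " + (long) jobs * nPart + " parts, at most "
        + maxRunning.get() + " running at once");
  }
}
```

The pool size, not the number of jobs, bounds concurrency, which matches the statement that 20 threads allow at most 20 concurrent jobs.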