en/docs/chapter_divide_and_conquer/divide_and_conquer.md
<u>Divide and conquer</u> is a very important and common algorithm strategy. It is typically implemented with recursion and consists of two steps: "divide" (recursively decompose the problem into smaller subproblems) and "conquer" (solve the smallest subproblems, then merge their solutions to obtain the solution of the original problem).
As shown in the figure below, "merge sort" is one of the typical applications of the divide and conquer strategy.
Whether a problem is suitable for a divide-and-conquer solution can usually be judged by the following criteria.

1. **The problem can be decomposed**: the original problem can be broken into smaller subproblems of the same form, and the decomposition can be applied recursively.
2. **The subproblems are independent**: the subproblems do not overlap and can be solved separately from one another.
3. **The solutions of the subproblems can be merged**: the solutions of the subproblems can be combined to form the solution of the original problem.
Clearly, merge sort satisfies these three criteria: the array is recursively split into two subarrays, each subarray can be sorted independently, and the two sorted subarrays can be merged into one sorted array.
Divide and conquer can not only effectively solve algorithmic problems but often also improve algorithm efficiency. In sorting algorithms, quick sort, merge sort, and heap sort are faster than selection, bubble, and insertion sort because they apply the divide and conquer strategy.
This raises the question: Why can divide and conquer improve algorithm efficiency, and what is the underlying logic? In other words, why is dividing a large problem into multiple subproblems, solving the subproblems, and merging their solutions more efficient than directly solving the original problem? This question can be discussed from two aspects: operation count and parallel computation.
Taking "bubble sort" as an example, processing an array of length $n$ requires $O(n^2)$ time. Suppose we divide the array into two subarrays from the midpoint as shown in the figure below, the division requires $O(n)$ time, sorting each subarray requires $O((n / 2)^2)$ time, and merging the two subarrays requires $O(n)$ time, resulting in an overall time complexity of:
$$ O(n + (\frac{n}{2})^2 \times 2 + n) = O(\frac{n^2}{2} + 2n) $$
Next, we derive the following inequality, where the left-hand and right-hand sides are the total number of operations before and after the division, respectively:

$$
\begin{aligned}
n^2 & > \frac{n^2}{2} + 2n \newline
n^2 - \frac{n^2}{2} - 2n & > 0 \newline
\frac{n^2}{2} - 2n & > 0 \newline
n(n - 4) & > 0
\end{aligned}
$$
This means that when $n > 4$, the number of operations after the division is smaller, and sorting should be more efficient. Note that the time complexity after the division is still quadratic, $O(n^2)$; only the constant coefficient of the complexity has become smaller.
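To make this comparison concrete, the following Python sketch counts element comparisons when bubble-sorting an array directly versus dividing it once at the midpoint, sorting each half, and merging. The helper names `bubble_sort_count` and `merge_count` are ours, not from the text; the absolute counts differ from the expressions above by constant factors, but the divided approach consistently performs fewer comparisons once $n$ is large enough.

```python
import random

def bubble_sort_count(nums: list[int]) -> int:
    """Bubble sort in place and return the number of element comparisons."""
    count = 0
    for i in range(len(nums) - 1, 0, -1):
        for j in range(i):
            count += 1
            if nums[j] > nums[j + 1]:
                nums[j], nums[j + 1] = nums[j + 1], nums[j]
    return count

def merge_count(left: list[int], right: list[int]) -> tuple[list[int], int]:
    """Merge two sorted lists and return the merged list plus the comparison count."""
    merged, i, j, count = [], 0, 0, 0
    while i < len(left) and j < len(right):
        count += 1
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, count

if __name__ == "__main__":
    n = 16
    data = random.sample(range(100), n)

    # Sort the whole array directly: about n^2 / 2 comparisons
    direct = bubble_sort_count(list(data))

    # Divide at the midpoint, sort each half, then merge: about n^2 / 4 + n comparisons
    left, right = data[: n // 2], data[n // 2 :]
    divided = bubble_sort_count(left) + bubble_sort_count(right)
    _, merge_ops = merge_count(left, right)
    divided += merge_ops

    print(f"direct:  {direct} comparisons")   # 120 for n = 16
    print(f"divided: {divided} comparisons")  # between 64 and 71 for n = 16
```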
Going a step further, what if we keep dividing the subarrays at their midpoints until each subarray contains only one element? This approach is exactly "merge sort", with a time complexity of $O(n \log n)$.
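The idea can be sketched in a few lines of Python. This is a simplified, out-of-place version for illustration; an in-place implementation that reuses a single auxiliary array is more common in practice.

```python
def merge_sort(nums: list[int]) -> list[int]:
    """Divide the array at the midpoint until one element remains, then merge sorted halves."""
    if len(nums) <= 1:
        return nums
    mid = len(nums) // 2
    left = merge_sort(nums[:mid])    # divide: recursively sort the left half
    right = merge_sort(nums[mid:])   # divide: recursively sort the right half
    # conquer: merge the two sorted halves into one sorted array
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

print(merge_sort([5, 2, 9, 1, 7]))  # [1, 2, 5, 7, 9]
```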
Thinking further, what if we set multiple division points and evenly divide the original array into $k$ subarrays? This situation is very similar to "bucket sort", which is well-suited for sorting massive amounts of data, with a theoretical time complexity of $O(n + k)$.
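As a rough illustration, assuming the input values lie in $[0, 1)$ so that each value maps to a bucket by simple scaling, a $k$-bucket version might look like the sketch below; a full implementation would also handle arbitrary value ranges.

```python
def bucket_sort(nums: list[float], k: int) -> list[float]:
    """Distribute values in [0, 1) evenly into k buckets, sort each bucket, then concatenate."""
    buckets: list[list[float]] = [[] for _ in range(k)]
    for x in nums:
        buckets[int(x * k)].append(x)  # assumes 0 <= x < 1
    result: list[float] = []
    for bucket in buckets:
        bucket.sort()                  # each bucket holds only a few elements
        result.extend(bucket)
    return result

print(bucket_sort([0.42, 0.07, 0.93, 0.31, 0.58], k=5))  # [0.07, 0.31, 0.42, 0.58, 0.93]
```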
We know that the subproblems generated by divide and conquer are independent of each other, so they can typically be solved in parallel. This means that divide and conquer not only reduces the time complexity of an algorithm but also lends itself to parallel optimization by the operating system.
Parallel optimization is particularly effective in multi-core or multi-processor environments, as the system can simultaneously handle multiple subproblems, making fuller use of computing resources and significantly reducing overall runtime.
For example, in the "bucket sort" shown in the figure below, we evenly distribute massive data into various buckets, and the sorting tasks for all buckets can be distributed to various computing units. After completion, the results are merged.
On one hand, divide and conquer can be used to solve many classic algorithmic problems.
On the other hand, divide and conquer is widely applied in the design of algorithms and data structures.
It can be seen that divide and conquer is a "subtly pervasive" algorithmic idea, embedded in various algorithms and data structures.