code/string_algorithms/src/levenshtein_distance/README.md
The Levenshtein distance between two strings measures the similarity between them by calculating the minimum number of one-character insertions, deletions, or substitutions required to change one string into another. This edit distance is often used in spell chekers, plagiarism detectors, and OCR.
For example, each of these examples have a Levensthein distance of 1 with cosmos:
Recursively solving this problem is simple, but computationally very inefficient because it recalculates the distance between substrings it has already covered.
Function LevDistance(String S, String T):
S) or (length of T) is 0, return the maximum of (length of S) and (length of T)
S without the last character, T)S, T without last character)S without the last character, T without the last character)This algorithm is more efficient than the recursive one because the matrix stores the distance calculations for substrings, saving computational time.
Function LevDistance(String S, String T):
SL = length of S and TL = length of Tmatrix (2d array) of height SL and width TL0 to TL0 to SLs of S (from i = 1 to SL):
t of T (from j = 1 to TL):
s equals t, the cost is 0s does not equal t, the cost is 1matrix[i][j] to the minimum of:
matrix[i - 1, j] + 1 (this is for deletion)matrix[i, j - 1] + 1 (this is for insertion)matrix[i - 1, j - 1] + cost (this is for substitution)matrix[SL, TL]Each bottom-up dynamic program works by setting each cell's value to the minimum edit distance between the substrings of S and T.
This is the final Levenshtein distance matrix for the strings "cosmos" and "catmouse". The minimum edit distance is the value in the bottom right corner: 4.
| c | a | t | m | o | u | s | e | ||
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | |
| c | 1 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| o | 2 | 1 | 1 | 2 | 3 | 3 | 4 | 5 | 6 |
| s | 3 | 2 | 2 | 2 | 3 | 4 | 4 | 4 | 5 |
| m | 4 | 3 | 3 | 3 | 2 | 3 | 4 | 5 | 5 |
| o | 5 | 4 | 4 | 4 | 3 | 2 | 3 | 4 | 5 |
| s | 6 | 5 | 5 | 5 | 4 | 3 | 3 | 3 | 4 |