manual/src/line_diffing.md
Most diff tools today use LCS algorithms. They typically apply to lines, but in some cases they're word-based.
Myers' diff is the default diff algorithm in GNU diff and git diff. It finds the longest common subsequence (LCS) of the two inputs and is applied on a line-by-line basis.
There's a great introduction here, and the original paper is "An O(ND) Difference Algorithm and Its Variations" (Myers, 1986).
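To make the LCS idea concrete, here is a minimal sketch of a line diff built on a longest common subsequence. It uses a plain dynamic programming table rather than Myers' O(ND) algorithm, so it illustrates the result and not the technique GNU diff or git actually use. The names `lcs_table` and `line_diff` are hypothetical.

```python
def lcs_table(a, b):
    """table[i][j] is the length of the LCS of a[i:] and b[j:]."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) - 1, -1, -1):
        for j in range(len(b) - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    return table


def line_diff(old, new):
    """Return (tag, line) pairs, where tag is ' ' (kept), '-' or '+'."""
    table = lcs_table(old, new)
    i = j = 0
    out = []
    while i < len(old) and j < len(new):
        if old[i] == new[j]:
            # The line is part of the common subsequence.
            out.append((" ", old[i]))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            # Dropping the old line loses nothing from the LCS.
            out.append(("-", old[i]))
            i += 1
        else:
            out.append(("+", new[j]))
            j += 1
    # Whatever remains on either side is a pure removal or addition.
    out.extend(("-", line) for line in old[i:])
    out.extend(("+", line) for line in new[j:])
    return out


if __name__ == "__main__":
    for tag, line in line_diff(["a", "b", "c"], ["a", "x", "c"]):
        print(tag + line)
```

The table costs O(NM) time and memory, which is why real tools prefer Myers' approach for large files.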
# Modern diff supports colour, but see also
# https://www.colordiff.org/
$ diff --color=always -u sample_files/css_1.css sample_files/css_2.css
Note that GNU diff originally used the Hunt-McIlroy algorithm.
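Myers' diff has a problem with sliders: cases where an added or removed block could be reported at more than one position, and the algorithm picks a less readable one. For example, it can produce: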
 if (!$smtp_server) {
+    $smtp_server = $repo->config('sendemail.smtpserver');
+}
+if (!$smtp_server) {
     foreach (qw( /usr/sbin/sendmail /usr/lib/sendmail )) {
         if (-x $_) {
             $smtp_server = $_;
Instead of:
+if (!$smtp_server) {
+    $smtp_server = $repo->config('sendemail.smtpserver');
+}
 if (!$smtp_server) {
     foreach (qw( /usr/sbin/sendmail /usr/lib/sendmail )) {
         if (-x $_) {
Git has an --indent-heuristic option that was added to reduce the likelihood of choosing a bad slider position. There's also a corpus of test files, where the ideal diff has been chosen by a human, for testing different heuristics.
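The slider ambiguity can be sketched in a few lines of code. Given the new file and the range of lines reported as inserted, the hypothetical helper `slider_positions` lists every other range that yields exactly the same old file. The selection rule in `pick_position` is a toy stand-in that prefers the block starting at the least indented line. It is an assumption made for illustration, not git's actual indent heuristic, which scores candidates in a more involved way.

```python
def slider_positions(new_lines, start, end):
    """All (start, end) ranges that are equally valid as 'the inserted block'.

    Assumes new_lines[start:end] is one valid insertion.
    """
    positions = [(start, end)]
    # Slide up: valid while the line entering the block at the top equals
    # the line leaving it at the bottom.
    s, e = start, end
    while s > 0 and new_lines[s - 1] == new_lines[e - 1]:
        s, e = s - 1, e - 1
        positions.append((s, e))
    # Slide down: valid while the line leaving the block at the top equals
    # the line entering it at the bottom.
    s, e = start, end
    while e < len(new_lines) and new_lines[s] == new_lines[e]:
        s, e = s + 1, e + 1
        positions.append((s, e))
    return sorted(positions)


def indent(line):
    return len(line) - len(line.lstrip(" "))


def pick_position(new_lines, positions):
    # Toy rule (NOT git's heuristic): prefer the block whose first line
    # is least indented.
    return min(positions, key=lambda p: indent(new_lines[p[0]]))


new_lines = [
    "if (!$smtp_server) {",
    "    $smtp_server = $repo->config('sendemail.smtpserver');",
    "}",
    "if (!$smtp_server) {",
    "    foreach (qw( /usr/sbin/sendmail /usr/lib/sendmail )) {",
]

if __name__ == "__main__":
    candidates = slider_positions(new_lines, 1, 4)
    print(candidates)
    print(pick_position(new_lines, candidates))
```

Here the block reported at lines 1 to 3 can also slide up to lines 0 to 2, and the toy rule picks the latter, which matches the more readable diff.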
The patience diff algorithm is an LCS-based algorithm that aims to do a better job with sliders. It often produces more readable results, at the cost of doing more work.
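As a rough illustration of the core step in patience diff, the sketch below finds the anchor lines: lines that occur exactly once in each file, kept only where they appear in the same relative order in both. The ordering step is a longest increasing subsequence found by patience sorting. A complete implementation would then recurse into the gaps between anchors and fall back to another algorithm when no unique lines remain, which this sketch leaves out. The function names are hypothetical.

```python
from bisect import bisect_left
from collections import Counter


def unique_common_lines(old, new):
    """(old_index, new_index) for lines occurring exactly once in each file."""
    old_counts, new_counts = Counter(old), Counter(new)
    new_index = {line: j for j, line in enumerate(new)}
    return [
        (i, new_index[line])
        for i, line in enumerate(old)
        if old_counts[line] == 1 and new_counts[line] == 1
    ]


def patience_anchors(old, new):
    """Longest run of unique common lines in the same order in both files."""
    pairs = unique_common_lines(old, new)  # already ordered by old index
    tails = []      # tails[k]: smallest new index ending a run of length k + 1
    tail_pos = []   # position in `pairs` of the entry stored in tails[k]
    prev = [None] * len(pairs)  # back-pointers for rebuilding the run
    for p, (_, j) in enumerate(pairs):
        k = bisect_left(tails, j)
        if k == len(tails):
            tails.append(j)
            tail_pos.append(p)
        else:
            tails[k] = j
            tail_pos[k] = p
        prev[p] = tail_pos[k - 1] if k > 0 else None
    # Walk the back-pointers from the end of the longest run.
    anchors = []
    p = tail_pos[-1] if tail_pos else None
    while p is not None:
        anchors.append(pairs[p])
        p = prev[p]
    return anchors[::-1]


if __name__ == "__main__":
    old = ["int a;", "}", "int b;", "}"]
    new = ["int a;", "}", "int c;", "}", "int b;", "}"]
    print(patience_anchors(old, new))
```

Repeated lines such as a lone `}` never become anchors, which is what keeps the algorithm from matching unrelated closing braces.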
# Original behaviour
$ git diff --no-indent-heuristic --no-index sample_files/css_1.css sample_files/css_2.css
# This heuristic was added in git 2.11 and has been enabled by default since git 2.14.
$ git diff --indent-heuristic --no-index sample_files/css_1.css sample_files/css_2.css
# The patience algorithm does a better job in this example.
$ git diff --patience --no-index sample_files/css_1.css sample_files/css_2.css
Diff Match Patch also has some excellent discussions of diff design on the author's website (e.g. diff strategies).
Git 1.7.7+ also has a histogram algorithm, which aims to produce better results than Myers' algorithm but without the slowdown of the patience algorithm.
# Inferior to patience on this example file.
$ git diff --histogram --no-index sample_files/css_1.css sample_files/css_2.css
prettydiff does really well out of the box with the sample files here. It implements LCS on words.
diff-so-fancy consumes normal diff output, so it's line-based. It also performs word highlighting within lines, and generally has a prettier set of defaults.
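Word highlighting within a changed line can be sketched with Python's standard `difflib`. This is an illustration only: `difflib.SequenceMatcher` uses its own matching strategy rather than a strict LCS, and neither prettydiff nor diff-so-fancy is implemented this way. The `tokenize` and `highlight_words` helpers and the `[-…-]` / `{+…+}` markers are choices made for this example.

```python
import difflib
import re


def tokenize(line):
    """Split a line into words and the single characters between them."""
    return re.findall(r"\w+|\W", line)


def highlight_words(old_line, new_line):
    """Mark removed tokens as [-x-] and added tokens as {+x+}."""
    old_tokens, new_tokens = tokenize(old_line), tokenize(new_line)
    matcher = difflib.SequenceMatcher(a=old_tokens, b=new_tokens, autojunk=False)
    old_out, new_out = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        old_part = "".join(old_tokens[i1:i2])
        new_part = "".join(new_tokens[j1:j2])
        if tag == "equal":
            old_out.append(old_part)
            new_out.append(new_part)
        else:
            # 'replace', 'delete' and 'insert' all land here; one side
            # may be empty.
            if old_part:
                old_out.append("[-" + old_part + "-]")
            if new_part:
                new_out.append("{+" + new_part + "+}")
    return "".join(old_out), "".join(new_out)


if __name__ == "__main__":
    before, after = highlight_words("color: red;", "color: blue;")
    print("-" + before)
    print("+" + after)
```

Running the comparison per pair of changed lines keeps the overall diff line-based while still showing which words differ.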