Back to Difftastic

Line Diffing

manual/src/line_diffing.md

0.69.03.4 KB
Original Source

Line Diffing

Most diff tools today use LCS algorithms. They typically apply to lines, but in some cases they're word-based.

Myers' diff algorithm

This is the default diff algorithm in GNU diff and git diff. It finds the longest common subsequence (LCS) and is used on a line-by-line basis.

There's a great introduction here and the original paper is An O(ND) Difference Algorithm and Its Variations, Myers 1986.

# Modern diff supports colour, but see also
# https://www.colordiff.org/
$ diff --color=always -u sample_files/css_1.css sample_files/css_2.css

Note that GNU diff originally used the Hunt-McIlroy algorithm.

Patience Diff

Myers' diff has a problem with sliders:

diff
 if (!$smtp_server) {
+       $smtp_server = $repo->config('sendemail.smtpserver');
+}
+if (!$smtp_server) {
        foreach (qw( /usr/sbin/sendmail /usr/lib/sendmail )) {
                if (-x $_) {
                        $smtp_server = $_;

Instead of:

diff
+if (!$smtp_server) {
+       $smtp_server = $repo->config('sendemail.smtpserver');
+}
 if (!$smtp_server) {
        foreach (qw( /usr/sbin/sendmail /usr/lib/sendmail )) {
                if (-x $_) {

Git has a --indent-heuristic that was added to reduce the likelihood of making a bad choice. There's a corpus of test files where the ideal diff has been chosen by a human, to test different heuristics.

The patience diff algorithm is an LCS algorithm that aims to do a better job with sliders. It produces great results by doing more work.

# Original behaviour
$ git diff --no-indent-heuristic --no-index sample_files/css_1.css sample_files/css_2.css
# As of git 2.11, this heuristic is enabled by default.
$ git diff --indent-heuristic --no-index sample_files/css_1.css sample_files/css_2.css
# Patience algorithm does a better a job in this example.
$ git diff --patience --no-index sample_files/css_1.css sample_files/css_2.css

Diff Match Patch also has some excellent discussions of diff designs on the author's website (e.g diff strategies).

Histogram Diff

Git 1.7.7+ also has a histogram algorithm, which aims to produce better results than Myers' algorithm but without the slowdown of the patience algorithm.

# Inferior to patience on this example file.
$ git diff --histogram --no-index sample_files/css_1.css sample_files/css_2.css

prettydiff

prettydiff does really well out of the box with the sample files here. It implements LCS on words.

diff-so-fancy

diff-so-fancy consumes normal diff output, so it's line based. It also performs word highlighting within lines, and generally has a prettier set of defaults.