package patience_diff

  1. Overview
  2. Docs

This is a port of Bram Cohen's patience diff algorithm, as found in the Bazaar 1.14.1 source code, available at http://bazaar-vcs.org.

This copyright notice was included:

# Copyright (C) 2005 Bram Cohen, Copyright (C) 2005, 2006 Canonical Ltd # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program; if not, write to the Free Software # Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA

Bram Cohen's comment from the original Python code (with syntax changed to OCaml):

get_matching_blocks a b returns a list of triples describing matching subsequences.

Each triple is of the form (i, j, n), and means that a <|> (i,i+n) = b <|> (j,j+n). The triples are monotonically increasing in i and in j.

The last triple is a dummy, (Array.length a, Array.length b, 0), and is the only triple with n=0.

Example: get_matching_blocks |"a";"b";"x";"c";"d"| |"a";"b";"c";"d"| returns (0, 0, 2), (3, 2, 2), (5, 4, 0)

module Matching_block : sig ... end

For handling diffs abstractly. A range is a subarray of the two original arrays with a constructor defining its relationship to the two original arrays. A Same range contains a series of elements which can be found in both arrays. A Next range contains elements found only in the second array, while an Prev range contains elements found only in the first array.

A Replace contains two arrays: elements in the first output array are elements found only in the first input array, which have been replaced by elements in the second output array, which are elements found only in the second input array.

module Range : sig ... end
module Hunk : sig ... end

In diff terms, a hunk is a unit of consecutive ranges with some Same context before and after Next, Prev, and Replace ranges. Each hunk contains information about the original arrays, specifically the starting indexes and the number of elements in both arrays to which the hunk refers.

module Hunks : sig ... end
module type S = sig ... end
module Make (Elt : Core_kernel.Hashtbl.Key) : S with type elt = Elt.t
module String : S with type elt = string
module Stable : sig ... end