package mula

  1. Overview
  2. Docs

Universal Levenshtein Automata

The paper by Touzet details the Universal Levenshtein Automata.

Some nice computational properties of the (not-deterministic) automata include:

  • There are no epsilon transitions.
  • The automata are computable on demand, there is no need to store states and transitions in a data structure.
  • The states of the automata carry error counts.
  • There is a simple subsumption relation that prunes sets of states, so transitions produce small sets of states.

These allow for efficient implementation together with interesting uses.

  1. We not only know if two strings are within a certain edit distance, but we also know what the edit distance is if they are within the edit distance limit.
  2. If several strings are compared, we can rank them by their edit distance.
  3. We can lazily feed characters and string fragments to the automata and get the current error count.

This module offers a functor to build matchers for different representations of strings and characters. For a prebuilt matcher for OCaml strings and characters, see Strings.

String Abstraction

We abstract over strings and characters so that we do not rely on any specific encoding. We only need the following:

  • a function to compute length of strings,
  • a function to fetch a character at an index, and
  • a function to check if two characters are equal.
module type S = sig ... end

Levenshtein Automata

Given a string representation, we produce two Automata:

  • Make.Lev, for the standard Levenshtein distance.
  • Make.Dem, for the Demarau-Levenshtein distance which includes transpositions as a primitve edit operation.
module Make (St : S) : sig ... end