uuseg

Unicode text segmentation for OCaml

v15.0.0

Uuseg is an OCaml library for segmenting Unicode text. It implements
the locale-independent Unicode text segmentation algorithms (UAX #29)
to detect grapheme cluster, word and sentence boundaries, and the
Unicode line breaking algorithm (UAX #14) to detect line break
opportunities.
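The four boundary kinds can be compared on small inputs with a sketch like the one below. It assumes the optional UTF-8 string support: the Uuseg_string module, provided by the uuseg.string library (the findlib name is an assumption here) and requiring Uutf. The helper names segments and show are hypothetical, written for illustration only.

```ocaml
(* Split a UTF-8 string into segments of the requested kind.
   Assumes Uuseg_string from the optional uuseg.string library (needs Uutf). *)
let segments (b : Uuseg.boundary) (s : string) : string list =
  (* Uuseg_string.fold_utf_8 calls [add] on each segment, in text order;
     empty segments are skipped defensively. *)
  let add acc seg = if seg = "" then acc else seg :: acc in
  List.rev (Uuseg_string.fold_utf_8 b add [] s)

let () =
  (* Print each segment in brackets so boundaries are visible. *)
  let show name b s =
    let segs = List.map (fun seg -> "[" ^ seg ^ "]") (segments b s) in
    Printf.printf "%-11s %s\n" name (String.concat " " segs)
  in
  (* "e" followed by U+0301 COMBINING ACUTE ACCENT forms one grapheme cluster. *)
  show "grapheme" `Grapheme_cluster "e\u{0301}a";
  show "word" `Word "Hello, world!";
  show "sentence" `Sentence "Hi there. Bye.";
  show "line break" `Line_break "Hello world"
```

Note that word segmentation also reports punctuation and spaces as segments, and line breaking yields break opportunities after a space rather than before it.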

The library is independent from any IO mechanism or Unicode text data
structure and it can process text without a complete in-memory
representation.
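Because the core segmenter consumes one Unicode character at a time, input can arrive in arbitrary chunks. The sketch below uses Uuseg.create and Uuseg.add from the core library only, and collects segments as lists of Uchar.t so that no decoder is needed. The record type state and the functions create, drain, feed and finish are hypothetical helpers, not part of Uuseg.

```ocaml
(* Incremental segmentation with the core Uuseg API. Segments are returned
   as lists of Unicode characters, so no string decoding library is needed. *)
type state = {
  seg : Uuseg.t;                      (* the underlying segmenter *)
  mutable current : Uchar.t list;     (* pending segment, reversed *)
  mutable finished : Uchar.t list list; (* completed segments, reversed *)
}

let create boundary =
  { seg = Uuseg.create boundary; current = []; finished = [] }

(* Move the pending characters, if any, to the list of completed segments. *)
let flush st =
  match st.current with
  | [] -> ()
  | cur -> st.finished <- List.rev cur :: st.finished; st.current <- []

(* Add [v], then keep calling `Await until the segmenter asks for more
   input (`Await) or reports the end of the text (`End). *)
let rec drain st v =
  match Uuseg.add st.seg v with
  | `Uchar u -> st.current <- u :: st.current; drain st `Await
  | `Boundary -> flush st; drain st `Await
  | `Await | `End -> ()

(* Feed one chunk of characters; a chunk may end in the middle of a segment. *)
let feed st chunk = List.iter (fun u -> drain st (`Uchar u)) chunk

(* Signal the end of input and return all segments in order. *)
let finish st =
  drain st `End;
  flush st;
  List.rev st.finished

let () =
  let st = create `Grapheme_cluster in
  (* "e" arrives in one chunk and its combining acute accent in the next;
     the segmenter's state keeps them in the same cluster. *)
  feed st [ Uchar.of_char 'e' ];
  feed st [ Uchar.of_int 0x0301; Uchar.of_char 'a' ];
  Printf.printf "%d grapheme clusters\n" (List.length (finish st))
```

The drain loop follows the protocol of Uuseg.add: after a `Uchar or `Boundary result, the client must call add with `Await before adding a new character or `End.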

Uuseg depends on Uucp and optionally on Uutf for support of OCaml
UTF-X encoded strings. It is distributed under the ISC license.

Homepage: http://erratique.ch/software/uuseg

Installation

Uuseg can be installed with opam:

opam install uuseg
opam install uutf uuseg # for support on OCaml UTF-X encoded strings

If you don't use opam, consult the opam file for build
instructions.

Documentation

The documentation and API reference can be consulted online or
via odig doc uuseg.

Sample programs

If you installed Uuseg with opam, sample programs are located in
the directory given by opam config var uuseg:doc.

In the distribution, sample programs are located in the test
directory. They can be built with:

topkg build --tests true

  • test.native tests the library; nothing should fail.

  • usegtrip.native reads Unicode text on stdin and rewrites its
    segments on stdout. Invoke it with --help for more information.
    Depends on Uutf and Cmdliner.
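A rough idea of what such a segment rewriter does can be given with a few lines of OCaml. This is a minimal sketch in the spirit of usegtrip, not its actual implementation: the function rewrite and its sep parameter are hypothetical, and it assumes Uuseg_string from the optional uuseg.string library (which needs Uutf).

```ocaml
(* Rewrite a UTF-8 string with a visible separator between segments.
   Assumes Uuseg_string from the optional uuseg.string library. *)
let rewrite ?(sep = "|") (b : Uuseg.boundary) (s : string) : string =
  let buf = Buffer.create (2 * String.length s) in
  (* The fold accumulator records whether we are still before the first
     segment, so the separator only goes between segments. *)
  let add_segment first seg =
    if seg = "" then first
    else begin
      if not first then Buffer.add_string buf sep;
      Buffer.add_string buf seg;
      false
    end
  in
  ignore (Uuseg_string.fold_utf_8 b add_segment true s);
  Buffer.contents buf

let () = print_endline (rewrite `Word "Hello, world!")
```

To process standard input instead, the input can be read whole (for example with In_channel.input_all stdin on OCaml 4.14 or later) and passed to rewrite; usegtrip itself may work differently.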

Package information

Published: 15 Sep 2022

Sources: uuseg-15.0.0.tbz
  sha512=37ea83b582dd779a026cfae11f08f5d67ef79fce65a2cf03f2a9aabc7eb5de60c8e812524fa7531e4ff6e22a3b18228e3438a0143ce43be95f23237cc283576f

Dependencies:

  • uucp >= "15.0.0" & < "16.0.0"

  • topkg build & >= "1.0.3"

  • ocaml >= "4.03.0"