package uutf


Fold over the characters of UTF encoded OCaml string values.

Note. Since OCaml 4.14, UTF decoders are available in Stdlib.String. You are encouraged to migrate to them.
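As a rough illustration of such a migration, the following sketch reproduces a UTF-8 fold using only the standard library (OCaml 4.14 or later). The name fold_utf_8_stdlib is hypothetical and not part of uutf, and the way malformed byte sequences are split into `Malformed segments may differ from what uutf reports.

```ocaml
(* Sketch only: a UTF-8 fold built on Stdlib decoders (OCaml >= 4.14).
   [fold_utf_8_stdlib] is a hypothetical name, not part of uutf. *)
let fold_utf_8_stdlib f acc s =
  let rec loop acc i =
    if i >= String.length s then acc
    else
      let d = String.get_utf_8_uchar s i in
      (* Number of bytes consumed by this decode, valid or not. *)
      let n = Uchar.utf_decode_length d in
      let v =
        if Uchar.utf_decode_is_valid d
        then `Uchar (Uchar.utf_decode_uchar d)
        else `Malformed (String.sub s i n)
      in
      loop (f acc i v) (i + n)
  in
  loop acc 0

(* Start indices of each decoded item in "a", U+00E9, "b". *)
let starts = List.rev (fold_utf_8_stdlib (fun acc i _ -> i :: acc) [] "a\xC3\xA9b")
```

Here starts evaluates to [0; 1; 3], since the second character occupies two bytes.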

Encoding guess

val encoding_guess : string -> [ `UTF_8 | `UTF_16BE | `UTF_16LE ] * bool

encoding_guess s is the encoding guessed for s coupled with true iff there's an initial BOM.
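A minimal usage sketch, assuming the function is exposed as Uutf.String.encoding_guess and that the program is linked against the uutf library. The helper describe is hypothetical and only formats the result.

```ocaml
(* Sketch only: requires the uutf library.
   Assumes the function lives in the Uutf.String module. *)
let describe (s : string) : string =
  let enc, bom = Uutf.String.encoding_guess s in
  let name =
    match enc with
    | `UTF_8 -> "UTF-8"
    | `UTF_16BE -> "UTF-16BE"
    | `UTF_16LE -> "UTF-16LE"
  in
  (* The boolean reports whether an initial BOM was found. *)
  Printf.sprintf "%s%s" name (if bom then " (BOM)" else "")

(* A string starting with the UTF-8 BOM, EF BB BF. *)
let with_bom = describe "\xEF\xBB\xBFhi"
```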

String folders

Note. Initial BOMs are also folded over.

type 'a folder = 'a -> int -> [ `Uchar of Uchar.t | `Malformed of string ] -> 'a

The type for character folders. The integer is the index in the string where the `Uchar or `Malformed starts.
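To make the shape of a folder concrete, here is a small sketch that restates the type locally so it compiles without uutf. The folder tally is a hypothetical example: it counts decoded characters and records the indices of malformed sequences.

```ocaml
(* Local restatement of the folder type, so this sketch needs only Stdlib. *)
type 'a folder =
  'a -> int -> [ `Uchar of Uchar.t | `Malformed of string ] -> 'a

(* Hypothetical folder: the accumulator pairs the number of decoded
   characters with the indices (most recent first) of malformed sequences. *)
let tally : (int * int list) folder =
  fun (n, bad) i -> function
    | `Uchar _ -> (n + 1, bad)
    | `Malformed _ -> (n, i :: bad)

(* Applying the folder by hand to two decoding events. *)
let step1 = tally (0, []) 0 (`Uchar (Uchar.of_int 0x61))
let step2 = tally step1 1 (`Malformed "\xFF")
```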

val fold_utf_8 : ?pos:int -> ?len:int -> 'a folder -> 'a -> string -> 'a

fold_utf_8 ?pos ?len f a s is f (... (f (f a pos u0) j1 u1) ...) jn un, where ui and ji are the characters and their start positions in the UTF-8 encoded substring of s that starts at pos and is len bytes long. pos defaults to 0 and len defaults to String.length s - pos.
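The sketch below shows how the indices passed to the folder relate to pos and len. It assumes the function is reachable as Uutf.String.fold_utf_8 and that the uutf library is linked in; the helper starts is hypothetical.

```ocaml
(* Sketch only: requires the uutf library.
   Assumes the folders live in the Uutf.String module. *)

(* Collect the start index of every `Uchar or `Malformed in the range. *)
let starts ?pos ?len (s : string) : int list =
  List.rev (Uutf.String.fold_utf_8 ?pos ?len (fun acc i _ -> i :: acc) [] s)

(* "a", U+00E9 (two bytes), "b" *)
let sample = "a\xC3\xA9b"

let all = starts sample                   (* whole string *)
let tail = starts ~pos:1 sample           (* skip the first byte *)
let middle = starts ~pos:1 ~len:2 sample  (* only the two-byte character *)
```

The indices are positions in s itself, not offsets relative to pos, so tail begins at 1 rather than 0.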

val fold_utf_16be : ?pos:int -> ?len:int -> 'a folder -> 'a -> string -> 'a

fold_utf_16be ?pos ?len f a s is f (... (f (f a pos u0) j1 u1) ...) jn un, where ui and ji are the characters and their start positions in the UTF-16BE encoded substring of s that starts at pos and is len bytes long. pos defaults to 0 and len defaults to String.length s - pos.

val fold_utf_16le : ?pos:int -> ?len:int -> 'a folder -> 'a -> string -> 'a

fold_utf_16le ?pos ?len f a s is f (... (f (f a pos u0) j1 u1) ...) jn un, where ui and ji are the characters and their start positions in the UTF-16LE encoded substring of s that starts at pos and is len bytes long. pos defaults to 0 and len defaults to String.length s - pos.
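A short sketch of the UTF-16 folders in use, assuming they are exposed as Uutf.String.fold_utf_16be and Uutf.String.fold_utf_16le and that the uutf library is linked in. The helper code_points is hypothetical; substituting U+FFFD for malformed input is a choice made for this example, not something the library does for you.

```ocaml
(* Sketch only: requires the uutf library.
   Assumes the folders live in the Uutf.String module. *)

(* Decode to a list of code points, replacing malformed sequences
   with U+FFFD (Uchar.rep). [fold] is one of the UTF-16 folders. *)
let code_points fold (s : string) : int list =
  let f acc _ = function
    | `Uchar u -> Uchar.to_int u :: acc
    | `Malformed _ -> Uchar.to_int Uchar.rep :: acc
  in
  List.rev (fold f [] s)

(* "hi" in both byte orders. *)
let be = code_points (fun f a s -> Uutf.String.fold_utf_16be f a s) "\x00h\x00i"
let le = code_points (fun f a s -> Uutf.String.fold_utf_16le f a s) "h\x00i\x00"

(* A surrogate pair, D83D DE00, decodes to the single character U+1F600. *)
let pair = code_points (fun f a s -> Uutf.String.fold_utf_16be f a s) "\xD8\x3D\xDE\x00"
```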