package fmlib_parse


Parser for streams of unicode characters.

Unicode characters can be encoded in byte streams in several ways.

  • utf8: Encodes a unicode character in 1 to 4 bytes. The ascii characters are included as the 1-byte special case. Mostly used to transfer unicode text data on the internet and on Unix-based platforms (such as macOS).
  • utf16: Encodes a unicode character in 2 or 4 bytes. The whole basic multilingual plane is encoded in 2 bytes; all other planes need 4 bytes. Mostly used on Windows platforms and in JavaScript. For text streams, big endian and little endian byte order have to be distinguished.
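
The byte counts above can be checked with a small sketch that uses only the OCaml standard library (`Buffer.add_utf_8_uchar` and friends), not this package. The helper names `encoded_length`, `utf8_length`, `utf16_length` and `utf16_bytes` are hypothetical and exist only for this illustration.

```ocaml
(* Illustration only: standard library functions, not the fmlib_parse API. *)

(* Encode one code point with the given Buffer function and return the
   number of bytes it produced. *)
let encoded_length (add : Buffer.t -> Uchar.t -> unit) (cp : int) : int =
  let buf = Buffer.create 4 in
  add buf (Uchar.of_int cp);
  Buffer.length buf

let utf8_length = encoded_length Buffer.add_utf_8_uchar
let utf16_length = encoded_length Buffer.add_utf_16le_uchar

(* The bytes of one code point in utf-16, in the requested byte order. *)
let utf16_bytes ~big_endian (cp : int) : int list =
  let buf = Buffer.create 4 in
  (if big_endian then Buffer.add_utf_16be_uchar
   else Buffer.add_utf_16le_uchar)
    buf (Uchar.of_int cp);
  List.map Char.code (List.of_seq (String.to_seq (Buffer.contents buf)))

let () =
  (* 'A', 'é', '€' and an emoji outside the basic multilingual plane. *)
  List.iter
    (fun cp ->
      Printf.printf "U+%04X: utf8 = %d bytes, utf16 = %d bytes\n"
        cp (utf8_length cp) (utf16_length cp))
    [ 0x41; 0xE9; 0x20AC; 0x1F600 ]
```

The same code point yields the same two or four bytes in both utf-16 variants; only the order of the bytes within each 16-bit unit differs, which is why a text stream has to state its endianness.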

The following modules are available:

  • Make_utf8: Parse text streams encoded in utf-8.
  • Make_utf16_le: Parse text streams encoded in utf-16 little endian.
  • Make: Parse text streams in any encoding. The encoder and decoder have to be provided as a module parameter.

All parsers in this module work like a character parser (see Character.Make) with some additional combinators to recognize unicode characters.

Parse an input stream consisting of unicode characters encoded in utf-8.

Parse an input stream consisting of unicode characters encoded in utf-16 big endian.

Parse an input stream consisting of unicode characters encoded in utf-16 little endian.

Parse an input stream consisting of unicode characters. The unicode characters are encoded and decoded by using the module Codec.
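
What a decoder has to do for each encoding can be sketched with the OCaml standard library alone (this assumes OCaml 4.14 or later for `String.get_utf_8_uchar` and its utf-16 counterparts). This is not the `Codec` interface of this package; `decode_all` and the three `decode_*` names are hypothetical. The sketch only shows how one generic loop can be parameterized by the decoding function.

```ocaml
(* Illustration only: standard library functions (OCaml >= 4.14),
   not the fmlib_parse API. *)

(* Decode a whole byte string into code points with the given decoding
   function. Returns [Error pos] with the byte offset of the first
   malformed sequence. *)
let decode_all
    (get : string -> int -> Uchar.utf_decode)
    (s : string)
  : (int list, int) result =
  let rec go pos acc =
    if pos >= String.length s then
      Ok (List.rev acc)
    else
      let d = get s pos in
      if Uchar.utf_decode_is_valid d then
        go
          (pos + Uchar.utf_decode_length d)
          (Uchar.to_int (Uchar.utf_decode_uchar d) :: acc)
      else
        Error pos
  in
  go 0 []

(* One decoder per encoding, all sharing the same loop. *)
let decode_utf8 = decode_all String.get_utf_8_uchar
let decode_utf16_le = decode_all String.get_utf_16le_uchar
let decode_utf16_be = decode_all String.get_utf_16be_uchar
```

A parser built on top of such a decoder sees a stream of unicode characters regardless of how they were laid out in bytes.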
