package ubase

Remove diacritics from latin utf8 strings

Sources

0.03.tar.gz
md5=ace4603b1618f36a2b181d18a523132e
sha512=307206da7097ae87ef9a8b69d13f0f06709d3ce9445a2d811d70c583792722ff632f108043f1190914559b824e83aee2224f1ab295c864ae487f4781d3aa84cd

Description

Published: 19 Jul 2020

README

Ubase

OCaml library for removing diacritics (accents, etc.) from Latin letters in UTF-8 strings.

It should work for all UTF-8 strings, regardless of normalization form (NFC, NFD, NFKC, NFKD).

Please don't use this library to store your strings without accents! On the contrary, store them in full UTF-8 encoding, and use this library to simplify searching and comparison.

Example

let nfc = "V\197\169 Ng\225\187\141c Phan";; 
let nfd = "Vu\204\131 Ngo\204\163c Phan";;

print_endline nfc;; 
Vũ Ngọc Phan

print_endline nfd;; 
Vũ Ngọc Phan

Ubase.from_utf8_string nfc;;
- : string = "Vu Ngoc Phan"

Ubase.from_utf8_string nfd;; 
- : string = "Vu Ngoc Phan"
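The two literals above render identically but are different byte sequences, which is why a plain string comparison fails until the diacritics are removed. A minimal check of that claim, using only the standard library (no Ubase call is involved here):

```ocaml
(* Same visible text in two encodings: precomposed (NFC) and decomposed (NFD). *)
let nfc = "V\197\169 Ng\225\187\141c Phan"
let nfd = "Vu\204\131 Ngo\204\163c Phan"

let () =
  (* String.length counts bytes, not user-visible characters. *)
  Printf.printf "nfc: %d bytes, nfd: %d bytes, equal: %b\n"
    (String.length nfc) (String.length nfd) (String.equal nfc nfd)
(* prints: nfc: 15 bytes, nfd: 16 bytes, equal: false *)
```

Both strings map to "Vu Ngoc Phan" once passed through Ubase.from_utf8_string, as the toplevel session shows.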

Usage

val from_utf8_string : ?malformed:string -> ?strip:string -> string -> string
(** Remove all diacritics on latin letters from a standard string containing
    UTF8 text. Any malformed UTF8 will be replaced by the [malformed] parameter
    (by default "?"). If the optional parameter [strip] is present, all
    non-ascii, non-latin unicode characters will be replaced by the [strip]
    string (which can be empty). *)
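As a rough sketch of how the two optional parameters might be combined, the helper below builds a lowercase ASCII key for indexing. The name search_key is hypothetical and not part of Ubase; only from_utf8_string and its labels come from the signature above. It assumes the ubase library is linked.

```ocaml
(* Assumes the ubase library is available to the build. *)

(* Hypothetical helper, not part of Ubase: build a plain-ASCII search key.
   ~malformed:"" drops malformed UTF-8 instead of inserting "?".
   ~strip:"" replaces non-ASCII, non-Latin characters with the empty string,
   as described in the doc comment of from_utf8_string. *)
let search_key (s : string) : string =
  Ubase.from_utf8_string ~malformed:"" ~strip:"" s
  |> String.lowercase_ascii

let () = print_endline (search_key "V\197\169 Ng\225\187\141c Phan")
```

Keeping the original string alongside such a key follows the advice above: store the full UTF-8 text and use the simplified form only for lookups.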

Install

Ubase depends (only) on uutf.

Download the repository, move into the ubase directory, and run:

dune build
opam install .

Testing

From the ubase directory:

dune utop

From the command line

Once you have installed the library, you can run the ubase program from a terminal:

$ ubase Déjà vu !
Deja vu !

$ ubase "et grønt træ"
et gront trae

(Note that the quotes "" are not required.)

Doc

Documentation and API are available here.

To build the docs manually, from the ubase directory:

dune build @doc
firefox ./_build/default/_doc/_html/ubase/Ubase/index.html

Using Ubase for accent-insensitive searching

Have a look at Ufind, a small search engine based on Ubase.
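For small collections, an accent-insensitive search can be sketched directly on top of Ubase. This is an illustration only and makes no claim about how Ufind works; fold, contains and search are hypothetical helpers, and the ubase library is assumed to be linked.

```ocaml
(* Assumes the ubase library is available to the build. *)

(* Remove diacritics, then fold ASCII case, so both sides compare as plain text. *)
let fold (s : string) : string =
  String.lowercase_ascii (Ubase.from_utf8_string s)

(* Naive substring test; the standard library does not provide one. *)
let contains ~needle hay =
  let n = String.length needle and h = String.length hay in
  let rec go i = i + n <= h && (String.sub hay i n = needle || go (i + 1)) in
  go 0

(* Keep the entries whose folded form contains the folded query.
   The entries themselves are returned unchanged, accents included. *)
let search (query : string) (entries : string list) : string list =
  let q = fold query in
  List.filter (fun e -> contains ~needle:q (fold e)) entries

let names =
  [ "V\197\169 Ng\225\187\141c Phan";   (* NFC spelling *)
    "Vu\204\131 Ngo\204\163c Phan";     (* NFD spelling *)
    "Ada Lovelace" ]

let () = List.iter print_endline (search "ngoc" names)
```

Given the results shown in the Example section, the query "ngoc" should match both spellings of the name and skip the third entry.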

Dependencies (3)

  1. ocaml >= "4.05.0"
  2. uutf >= "1.0.1"
  3. dune >= "1.11"

Dev Dependencies

None

Used by

None

Conflicts

None
